Logistic Regression revisited

I recently came across a different way of using logistic regression to attribute credit and, having tested it for a few months, I am relatively happy with the results.

As before, a standard glm logit model is run against an instance data table, and a coefficient is determined for each campaign (plus an intercept). These coefficients are then converted to odds by exponentiating them, and the odds act as stakes in a proportional claim on each conversion.


So for example, assume we have two campaigns yielding log-odds coefficients as follows:

Intercept             Coefficient = -5.5

Campaign A        Coefficient = +1.6

Campaign B        Coefficient = +1.0


We translate these into odds by exponentiating each coefficient:

Intercept = exp(-5.5) = 0.004

A = exp(1.6) = 4.95

B = exp(1.0) = 2.72


So for a conversion where both channels are present, the claim for channel A is 4.95 / (4.95 + 2.72 + 0.004) = 0.65. Channel B claims 0.35, and the intercept claims a very small fraction (~0) in this instance.
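
The proportional claim can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used in testing: the function name `odds_claims` and the dictionary layout are my own assumptions, and the coefficients are the example values from this post. It uses full-precision exponentials, so the shares may differ in the second decimal place from figures rounded by hand.

```python
import math


def odds_claims(coefficients, present):
    """Share of one conversion claimed by each term on the converting path.

    coefficients: log-odds coefficients keyed by name, including "Intercept".
    present: campaigns seen on the path (the intercept always takes part).
    Both the function name and this layout are illustrative assumptions.
    """
    terms = ["Intercept"] + [c for c in present if c != "Intercept"]
    # exp() of a real number is always positive, so every stake is positive.
    odds = {term: math.exp(coefficients[term]) for term in terms}
    total = sum(odds.values())
    return {term: stake / total for term, stake in odds.items()}


# Example coefficients from the post.
coefs = {"Intercept": -5.5, "Campaign A": 1.6, "Campaign B": 1.0}
shares = odds_claims(coefs, ["Campaign A", "Campaign B"])
for term, share in shares.items():
    print(f"{term}: {share:.3f}")
```

For a path where only one campaign appears, pass just that campaign in `present`; the stakes are then shared between that campaign and the intercept.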

Quite a neat method, I think, which avoids the negative contributions that can occur when resolving to a probability formula: an exponentiated coefficient is always positive, so every claim is positive too.


Touched upon in the prior post was the technique of partitioning data prior to running the regression in an attempt to reduce collinearity effects (i.e. impression-only paths/click-only/mixed). As part of our testing, we have also partitioned data into activity occurring before and after the first site landing. This yields models with two partitions (pre/post), three (impression-only/click-only/mixed) or six (both splits together), which we combine and, in the case of pre/post splits, weight by unique converters.
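
As a rough sketch of how such a split and recombination might look in Python: everything here is assumed for illustration, including the field names (`campaign`, `type`, `stage`), the function names, and the reading of "weight by unique converters" as a converter-weighted average of each partition's credit shares. The share and converter numbers in the usage lines are made up purely to show the arithmetic.

```python
from collections import defaultdict


def split_touches(touches):
    """Group one user's touches into up to six partitions.

    touches: list of dicts with hypothetical keys "campaign",
    "type" ("impression" or "click") and "stage" ("pre" or "post"
    first site landing).
    Returns {(stage, mix): [campaign, ...]}.
    """
    by_stage = defaultdict(list)
    for touch in touches:
        by_stage[touch["stage"]].append(touch)

    partitions = {}
    for stage, group in by_stage.items():
        kinds = {touch["type"] for touch in group}
        if kinds == {"impression"}:
            mix = "impression-only"
        elif kinds == {"click"}:
            mix = "click-only"
        else:
            mix = "mixed"  # both kinds present (or an unrecognised type)
        partitions[(stage, mix)] = [touch["campaign"] for touch in group]
    return partitions


def combine_by_converters(partition_shares, converters):
    """Converter-weighted average of per-partition credit shares."""
    total = sum(converters[p] for p in partition_shares)
    combined = defaultdict(float)
    for partition, shares in partition_shares.items():
        weight = converters[partition] / total
        for campaign, share in shares.items():
            combined[campaign] += share * weight
    return dict(combined)


# Illustrative usage with made-up values.
touches = [
    {"campaign": "Campaign A", "type": "impression", "stage": "pre"},
    {"campaign": "Campaign B", "type": "click", "stage": "post"},
    {"campaign": "Campaign A", "type": "impression", "stage": "post"},
]
parts = split_touches(touches)

partition_shares = {
    "pre": {"Campaign A": 0.7, "Campaign B": 0.3},
    "post": {"Campaign A": 0.4, "Campaign B": 0.6},
}
converters = {"pre": 300, "post": 100}
combined = combine_by_converters(partition_shares, converters)
print(parts)
print(combined)
```

Each partition would get its own regression and its own set of claims; the second function only shows one plausible way of bringing the results back together.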

Partitioning data by site visit isn’t a new approach. The logic behind it is to separate activity from users who are likely in quite distinct decision states: pre-site advertising’s designated function is to raise brand and product awareness, whereas post-site advertising is likely functioning as retargeting or is employed in a purely navigational role (where a user follows a brand search click rather than entering the URL).


3 thoughts on “Logistic Regression revisited”

  1. Jonathan, thanks for the insights. After reading the previous blog posts, and having some (not too deep) understanding of the different models, I wanted to ask you: which one do you think assigns the most valid attribution scores? And furthermore, is it best to measure it only on paths that actually converted, instead of taking into account those that did not?

    Thanks for your time! And awesome job with your posts! I stumbled upon your blog by chance and it has been really valuable for me.


    1. Hi Ezequiel,

      I would say that each model has pros and cons, and that actually the most important part is how you prepare the data prior to modelling; such as partitioning on prior site visit, whether to include post-conversion activity, keep non-converters etc. This heavily influences the results that are generated.

      To quote another author, “A challenge for attribution analysis is that the “truth” is a quantity that may never be known, so there is no benchmark or holdout set to evaluate “better” in a quantitative sense” (Dalessandro et al. 2012). In other words, it’s about regular testing and gut feel!

      If you are looking for a ‘pure’ digital attribution system, then my preference would be for logistic regression. For this you need to have non-converter data.

      Sometimes the data ‘lies’ though; for example SEM Brand activity may be highly driven by other campaigns or offline factors. Where you have no other measure of this influence, the Markov approach is a highly flexible framework. I would choose to use non-converting data, but this approach can be applied with converter-only data with not hugely dissimilar results (in my experience), which is another boon.

      Lastly, Game Theory as a solution seems to have (from what I can see) more proponents among commercial providers. There’s an inherent ‘fairness’ to the theory that I like, although there’s never a time when all your coalitions are defined, so as far as I can see some approximations must be applied. I’ve seen this approach applied with and without non-converter data, but again, my preference would be to have non-converting data, as it moderates volume bias (i.e. a campaign with lots of activity inevitably appears in converting paths regardless of whether it increased conversion probability).


      1. First of all, thanks for taking the time to answer my question. I really appreciate it.
        I will try both Logistic and Markov on a couple of our biggest clients and, once I have some clean results, work out which one to stick with. I’m guessing Markov may be the more flexible one and thus the easiest to adapt to each client’s needs.

        Again thanks for everything and I hope you keep up with this amazing work.

