Logistic Regression revisited

I recently came across a different method of using logistic regression to yield attributed credit, and, having tested it for a few months, I am relatively happy with the results it generates.

As previously, a standard glm logit model is run against an instance data table and coefficients are determined for each campaign (plus an intercept). These coefficients are then resolved to odds via an exponential transformation, and the odds act as stakes in a proportional claim on each conversion.


So for example, assume we have two campaigns yielding log-odds coefficients as follows:

Intercept             Coefficient = -5.5

Campaign A        Coefficient = +1.6

Campaign B        Coefficient = +1.0


We translate these into odds:

Intercept = exp(-5.5) ≈ 0.004

A = exp(1.6) ≈ 4.95

B = exp(1.0) ≈ 2.72


So for a conversion where both channels are present, the claim for channel A is 4.95 / (4.95 + 2.72 + 0.004) ≈ 0.65. Channel B claims 0.35, and the intercept takes a negligible fraction (~0) in this instance.
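Recomputing the worked example directly from the coefficients, as a quick sanity check (the dictionary keys are just labels for this example):

```python
import numpy as np

# Odds stakes from the exponentiated log-odds coefficients.
odds = {
    "intercept": np.exp(-5.5),
    "A": np.exp(1.6),
    "B": np.exp(1.0),
}

# Each party's claim is its stake as a proportion of the total.
total = sum(odds.values())
shares = {k: v / total for k, v in odds.items()}

print({k: round(v, 2) for k, v in shares.items()})
# → A ≈ 0.65, B ≈ 0.35, intercept ≈ 0.00
```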

Quite a neat method, I think, as it avoids the negative contributions that can arise when coefficients are instead resolved through the full probability formula.


Touched upon in the prior post was the technique of partitioning the data before running the regression, in an attempt to reduce collinearity effects (i.e. impression-only paths / click-only / mixed). As part of our testing, we have also tried partitioning activity into pre- and post- first site landing. This yields models with two, three and six separate partitions respectively, which we combine and, in the case of the pre/post splits, weight by unique converters.
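A sketch of the three-way path-type split described above, assuming an instance table with per-path impression and click counts; the column names and the example data are illustrative, and a separate logit would then be fitted per partition:

```python
import pandas as pd

def partition_paths(df: pd.DataFrame) -> dict:
    """Split instance rows into impression-only / click-only / mixed partitions."""
    has_click = df["n_clicks"] > 0
    has_imp = df["n_impressions"] > 0
    return {
        "impression_only": df[has_imp & ~has_click],
        "click_only": df[has_click & ~has_imp],
        "mixed": df[has_click & has_imp],
    }

# Example instance table: one row per user path.
df = pd.DataFrame({
    "n_impressions": [3, 0, 2, 0],
    "n_clicks":      [0, 1, 1, 2],
    "converted":     [0, 1, 1, 0],
})

parts = partition_paths(df)
print({name: len(part) for name, part in parts.items()})
# → {'impression_only': 1, 'click_only': 2, 'mixed': 1}
```

The pre-/post-landing split would be applied in the same way (a flag per row for whether the activity preceded the first site visit), after which the per-partition models are combined with the unique-converter weighting mentioned above.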

Partitioning data by site visit isn't a new approach; the logic behind it is to separate activity from users who are likely in quite distinct decision states: pre-site advertising's designated function is to raise brand and product awareness, while post-site advertising likely functions as retargeting or plays a purely navigational role (where a user follows a brand search click rather than entering the URL).