If you’re ‘doing it on the side’ then your first port of call, I’d suggest, would be DoubleClick’s (or Marin’s, Kenshoo’s, etc.) path-to-conversion reports. There’s an equivalent in Google Analytics which also shows top converting paths. This will give you some data to play around with; however, you’ll be limited in that you can’t see non-converting paths. Unfortunately this limits which models you can use: the Markov package will work across this data (albeit biased without non-converters), but you won’t be able to use logistic regression.
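For illustration, a minimal Python sketch of what a Markov (removal-effect) model does with converter-only path data of this shape — the channel names and paths below are invented:

```python
from collections import defaultdict

# Toy converter-only paths (every path ends in a conversion), the shape of
# data a path-to-conversion report gives you; channel names are made up.
paths = [
    ["Display", "Search"],
    ["Search"],
    ["Display", "Social", "Search"],
    ["Social"],
]

def transition_counts(paths):
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        states = ["START"] + path + ["CONV"]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return counts

def conv_prob(counts, removed=None, iters=100):
    """P(absorbing in CONV from START); a removed channel is rerouted to NULL."""
    states = set(counts) | {t for outs in counts.values() for t in outs}
    p = {s: 0.0 for s in states}
    p["CONV"], p["NULL"] = 1.0, 0.0
    for _ in range(iters):  # value iteration; converges for absorbing chains
        for s, outs in counts.items():
            if s == removed:
                continue
            total = sum(outs.values())
            p[s] = sum(n / total * p["NULL" if t == removed else t]
                       for t, n in outs.items())
    return p["START"]

counts = transition_counts(paths)
base = conv_prob(counts)  # 1.0 here, since every observed path converts
# Removal effect: how much conversion probability drops without each channel.
effects = {ch: 1 - conv_prob(counts, removed=ch) / base
           for ch in counts if ch != "START"}
total = sum(effects.values())
attribution = {ch: e / total for ch, e in effects.items()}
```

Note the bias the comment mentions: with converter-only data the baseline conversion probability is trivially 1, so only the *relative* removal effects carry information.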

As an entry point, there are models available within the Google platforms. That may provide an acceptable trade-off: access to non-converting data in exchange for settling for an automated ‘black box’ solution.

If you have access to full log data then obviously you’re in a stronger position. I’d start with R’s glm function, as it’s already written and incredibly quick.
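As a rough, language-swapped illustration of that approach (plain gradient-descent logistic regression in Python rather than R’s glm; the exposure data and the positive-coefficient credit heuristic at the end are invented for the sketch):

```python
import numpy as np

# Hypothetical full-log data: one row per user, a 0/1 exposure flag per
# channel, and a converted flag -- note non-converters are included.
channels = ["Display", "Search", "Social"]
data = np.array([
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
], dtype=float)
X, y = data[:, :-1], data[:, -1]

# Plain-vanilla logistic regression via gradient descent -- the analogue
# of glm(converted ~ ., family = binomial) in R.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(5000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# One (of several) credit heuristics: allocate in proportion to the
# positive coefficients. This choice is illustrative, not standard.
pos = np.clip(w, 0, None)
credit = dict(zip(channels, pos / pos.sum()))
```

In this toy data Search perfectly separates converters from non-converters, so it receives nearly all the credit; real logs are messier and usually need regularisation.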


I would love to start doing this (on the side at my company), just not sure where to start!



Love your articles, but I’m a bit confused about certain parts of this one; in particular, the statement:

“Run a series of regressions comparing the importance value of each campaign in turn with each of the others as a pair, triplet, or higher order combination. Allocate the conversions observed when these combinations occur in the attribution data based on the Shapley Value regression approach”

Where are these regressions actually happening in the example? It just seems like straightforward math…



I will try both logistic and Markov on a couple of our biggest clients and, once I have some clean results, analyze which is the best way of deciding which one to stick to. I’m guessing Markov may be the more flexible one and thus the easiest to adapt to each client’s needs.

Again, thanks for everything, and I hope you keep up this amazing work.


I would say that each model has pros and cons, and that actually the most important part is how you prepare the data prior to modelling: whether to partition on prior site visit, whether to include post-conversion activity, whether to keep non-converters, etc. This heavily influences the results that are generated.
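As a hypothetical sketch of that preparation step in Python — truncating each user’s path at their first conversion (dropping post-conversion activity) while keeping non-converters; the event log and conversion indices are made up:

```python
from collections import defaultdict

# Hypothetical touch log: (user, channel) in time order, plus the touch
# index at which each converting user's first conversion occurred.
touches = [
    ("u1", "Display"), ("u1", "Search"), ("u1", "Search"),
    ("u2", "Social"), ("u2", "Display"),
    ("u3", "Search"),
]
conversion_at = {"u1": 2, "u3": 1}  # touches seen before first conversion

raw_paths = defaultdict(list)
for user, channel in touches:
    raw_paths[user].append(channel)

prepared = {}
for user, path in raw_paths.items():
    if user in conversion_at:
        # Drop post-conversion activity: keep only pre-conversion touches.
        prepared[user] = (path[:conversion_at[user]], True)
    else:
        # Keep non-converters -- e.g. logistic regression needs them.
        prepared[user] = (path, False)
```

Each choice here (truncate vs. keep post-conversion touches, keep vs. drop non-converters) changes the downstream model’s answer, which is the comment’s point.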

To quote another author, “A challenge for attribution analysis is that the ‘truth’ is a quantity that may never be known, so there is no benchmark or holdout set to evaluate ‘better’ in a quantitative sense” (Dalessandro et al., 2012). i.e. it’s about regular testing and gut feel!

If you are looking for a ‘pure’ digital attribution system, then my preference would be for logistic regression. For this you need to have non-converter data.

Sometimes the data ‘lies’ though; for example SEM Brand activity may be highly driven by other campaigns or offline factors. Where you have no other measure of this influence, the Markov approach is a highly flexible framework. I would choose to use non-converting data, but this approach can be applied with converter-only data with not hugely dissimilar results (in my experience), which is another boon.

Lastly, game theory as a solution seems to have (from what I can see) more proponents among commercial providers. There’s an inherent ‘fairness’ to the theory that I like, albeit there’s never a time when all your coalitions are defined, so as far as I can see some approximations must be applied. I’ve seen this approach applied with and without non-converter data, but again, my preference would be to have non-converting data, as it moderates against volume bias (i.e. doing lots of activity inevitably drops into conversions regardless of whether the campaign increased conversion probability).
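To illustrate the coalition idea with invented numbers: an exact Shapley computation over three channels, where v(S) is the conversions observed when exactly that combination of channels appears in paths. In practice, as noted above, many coalitions are never observed and must be approximated:

```python
from itertools import permutations

# Hypothetical characteristic function: conversions observed for each
# channel combination in the attribution data (numbers are made up).
v = {
    frozenset(): 0,
    frozenset({"Display"}): 10,
    frozenset({"Search"}): 40,
    frozenset({"Social"}): 5,
    frozenset({"Display", "Search"}): 60,
    frozenset({"Display", "Social"}): 18,
    frozenset({"Search", "Social"}): 50,
    frozenset({"Display", "Search", "Social"}): 70,
}
players = ["Display", "Search", "Social"]

# Exact Shapley value: each player's marginal contribution, averaged
# over every possible arrival order -- the 'fairness' referred to above.
orders = list(permutations(players))
shapley = {p: 0.0 for p in players}
for order in orders:
    seen = set()
    for p in order:
        marginal = v[frozenset(seen | {p})] - v[frozenset(seen)]
        shapley[p] += marginal / len(orders)
        seen.add(p)
```

By construction the values sum to v of the full coalition (70 here), so the observed conversions are fully allocated.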


Thanks for your time! And awesome job with your posts! I stumbled upon your blog by chance and it has been really valuable for me.


