Markov Chain approach to digital attribution

I’ve been using a Markov chain approach for some time now, and refocused on it soon after the release of Davide Altomare’s excellent ChannelAttribution package for R back in January 2016. We’ve had a basic implementation of this method built in R for some time, but the package’s speed (from C libraries) and the simplicity of its front end have really allowed extensive testing of variables and input data.

Over the past few months it has been pleasing to see this approach gain wider exposure in two other blog posts, though their working examples predominantly use data from Google Analytics, which can have gaps when it comes to display impression tracking.

Rather than recite the methodology here (both the above posts cover it well, as does Davide’s original slideshare), I’d prefer to share my experience with this approach. As is the case with other attribution techniques, the model is sensitive to the data you feed into it, and I feel that (so far) this hasn’t been discussed in detail.


If you use purely path-to-conversion data, that is, only successfully converting journeys, you will get a very different result from the model than if you also include non-converting journeys (and apply the var_null parameter). Principally this is because your starting and subsequent transition probabilities will vary hugely if you capture and incorporate all display impressions.

Objectively, having only conversions isn’t a true representation of the underlying data, but whether including the non-converting data improves the results is more subjective. As mentioned in previous posts, running independent A/B uplift tests can give you a target against which to select an approach.
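To make the effect of non-converting journeys concrete, here is a minimal first-order sketch in Python. It is not the ChannelAttribution implementation; the `transition_probs` helper, the state labels and the toy paths are all made up for illustration. It simply counts transitions with and without the non-converting paths.

```python
from collections import Counter, defaultdict

def transition_probs(paths):
    """Estimate first-order transition probabilities.

    paths: list of (channels, converted) pairs, where channels is a list of
    channel names and converted is a bool. Hypothetical input format.
    """
    counts = defaultdict(Counter)
    for channels, converted in paths:
        end = "(conversion)" if converted else "(null)"
        states = ["(start)"] + list(channels) + [end]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

# Toy data, invented for illustration only.
converting = [(["display", "search"], True), (["search"], True)]
non_converting = [(["display"], False), (["display"], False),
                  (["display", "search"], False)]

conv_only = transition_probs(converting)
all_paths = transition_probs(converting + non_converting)

print(conv_only["(start)"])   # display and search start equally often
print(all_paths["(start)"])   # display now dominates the starting state
print(all_paths["display"])   # and half of display's transitions go to (null)
```

In this toy data, adding three non-converting journeys moves the probability of starting on display from 0.5 to 0.8 and introduces a display-to-null transition that the conversion-only data cannot contain at all.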


Secondly, changing the order of your Markov model also has a significant effect on your results. If you read the paper on which the package is based (Eva Anderl et al.), the authors settled on a third order model based on the model fit to the data, but I have read results from other sources presenting best fit with second order Markov chains.

These decisions are predominantly based on a few statistical tests: how well the model predicts the successes and failures in the original data, how consistent the results are (stability), and top-decile lift. These tests are fairly arbitrary in my mind, and from a practitioner’s point of view I am more concerned with how the results match up with other, independently sourced real world results. While I can respect and appreciate that having the model replicate the structure of the data is a good thing, having result consensus with independent tests is, for me, a better indicator of ‘goodness’.

As a result, I generally work with second and third order models and try to match external test results.

I think first order Markov is too simplistic, and suffers from a passive serving effect: that is, display impressions that appear in a user’s path but which likely haven’t influenced the conversion. Fourth order loses the finer detail; when the majority of conversions occur within a few steps then four-tuple sequences lose the transitional detail.
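A simplified way to picture the effect of model order is to treat each state as a tuple of the last k channels. The sketch below is my own illustration in Python, with a hypothetical `to_order_k_states` helper and an invented path; it is not a description of how the package builds its states internally.

```python
def to_order_k_states(channels, k):
    """Collapse a channel path into overlapping k-tuples (order-k states).

    Paths no longer than k become a single state, so no transition
    between channel states is left to observe.
    """
    if len(channels) <= k:
        return [tuple(channels)]
    return [tuple(channels[i:i + k]) for i in range(len(channels) - k + 1)]

# Invented three-step journey.
path = ["display", "search", "email"]

for k in (1, 2, 4):
    states = to_order_k_states(path, k)
    # Number of channel-to-channel transitions the model can learn from.
    print(k, states, len(states) - 1)
```

For this three-step path the first order view has two transitions between channel states, the second order view has one, and the fourth order view has none, since the whole journey collapses into a single state.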

Having mentioned passive touchpoints such as impression serving (i.e. advertising that is dictated by a targeting algorithm rather than a user’s choice to interact such as that seen with a click), it is worth a further comment on this issue.

Where regression techniques pick out incremental correlation, the Markov approach rewards any touchpoint that occurs in a successful tuple. Our initial findings indicate that, without prior distinction, display impressions in particular receive more credit than a ‘causation’ based test would suggest. Model order selection does vary this credit, but the thornier issue here is impression viewability.
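The way credit accrues to any touchpoint on a successful path can be seen in a small first-order removal-effect calculation. This is a rough Python sketch under my own assumptions (the helper names, state labels and toy paths are hypothetical, and the probabilities are solved by simple iteration), not the package's code.

```python
from collections import Counter, defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"

def transition_probs(paths):
    """First-order transition probabilities from (channels, converted) pairs."""
    counts = defaultdict(Counter)
    for channels, converted in paths:
        states = [START] + list(channels) + [CONV if converted else NULL]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

def conversion_prob(probs, removed=None, iters=1000):
    """Probability of reaching CONV from START.

    A removed channel is treated like NULL: its value is held at zero.
    """
    value = {s: 0.0 for s in probs}
    value[CONV], value[NULL] = 1.0, 0.0
    for _ in range(iters):
        for s, outs in probs.items():
            if s != removed:
                value[s] = sum(p * value[b] for b, p in outs.items())
    return value[START]

# Toy data, invented for illustration: both conversions end on search.
paths = [(["display", "search"], True), (["search"], True),
         (["display"], False), (["display"], False),
         (["display", "search"], False)]

probs = transition_probs(paths)
base = conversion_prob(probs)
channels = [s for s in probs if s != START]
removal = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
total = sum(removal.values())
shares = {c: r / total for c, r in removal.items()}
print(base, shares)
```

A last-touch model would give search all of the credit in this toy data, since both conversions end on search. The removal-effect calculation gives display roughly 40% simply because it sits on one of the two converting paths, whether or not it influenced the outcome.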

Wider reading can be found on many media industry sites, but the quick version is that not all served impressions are, or can be, actually seen. Without data to ascertain the viewability of each impression, there is a danger that you reward a campaign for its ability to identify warm targets rather than its ability to influence the conversion outcome.

To mitigate this, we exclude non-viewable impressions whenever we have the option.
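As a rough idea of what that pre-processing step can look like, here is a small Python sketch. The event format (a `channel`, a `type` and an optional `viewable` flag) and the `drop_non_viewable` helper are assumptions of mine for illustration; real ad-server logs will differ.

```python
def drop_non_viewable(events):
    """Remove impressions explicitly flagged as non-viewable.

    Clicks and impressions with unknown viewability are kept, since we can
    only exclude what the data lets us identify.
    """
    return [e for e in events
            if not (e["type"] == "impression" and e.get("viewable") is False)]

# Invented journey for one user, in time order.
events = [
    {"channel": "display", "type": "impression", "viewable": False},
    {"channel": "search", "type": "click"},
    {"channel": "display", "type": "impression", "viewable": True},
    {"channel": "display", "type": "impression"},  # viewability unknown
]

cleaned = drop_non_viewable(events)
# Join into a path string; " > " is used here as the step separator.
path = " > ".join(e["channel"] for e in cleaned)
print(path)
```

Note that the filter changes the first touchpoint of this journey from display to search, which is exactly the kind of shift in starting probabilities discussed earlier.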


Our work on this approach continues. Despite some of the limitations above, the results are not unreasonable when compared against a range of sense checks. A key advantage is that all channels are described in this model, though a downside remains that no offline/exogenous baseline contribution is identified.

Most interesting for us is how the contribution of display impressions appears to increase notably vs. last-touch models, and we see much closer comparisons with A/B test results from this modelling approach than from others we’ve deployed.

We are investigating further methods to refine the modelling approach. Redistribution of credit for single-click and direct-to-site conversions is one such idea: where these conversions were driven by online activity, can we assume that incomplete path data was a factor? If so, we have a probabilistic model already built to suggest what the previous step may have been.

There may be some merit in using VLMCs (Variable length Markov chains) rather than fixed length chains in order to accommodate the very different path lengths observed. Also, the possibility of integrating some offline touchpoints (such as a TV advert) based on time sequencing is an appealing proposal for further development.

Hopefully, as adoption of this approach widens, a broader set of ‘data pre-processing’ approaches will be uncovered and tested.

