Digital Attribution Modelling

Having a feel for your typical customer journey is an integral part of the attribution model design process.

For example, understanding the number of steps your average user takes, and over what time frame, can massively influence the assumptions you make and hence your choice of model. Fortunately, almost all digital reporting platforms provide this kind of insight as part of their standard reporting toolkit, under a name such as “time lag to conversion”, “path length”, or something similar.
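If your platform lets you export raw journey data rather than just the canned reports, both summaries are easy to reproduce yourself. Here is a minimal Python sketch. It assumes each converting journey is a time-ordered list of (timestamp, channel) touch points. The journeys and channel names below are made up purely for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical converting journeys: each is a time-ordered list of
# (timestamp, channel) touch points; the last touch precedes the conversion.
journeys = [
    [(datetime(2016, 3, 1, 9), "display"),
     (datetime(2016, 3, 4, 20), "generic_search"),
     (datetime(2016, 3, 9, 12), "brand_search")],
    [(datetime(2016, 3, 2, 14), "price_comparison")],
    [(datetime(2016, 3, 3, 8), "email"),
     (datetime(2016, 3, 3, 9), "brand_search")],
]


def path_length_report(journeys):
    """Number of converting journeys with 1, 2, 3, ... touch points."""
    return Counter(len(journey) for journey in journeys)


def time_lag_days(journeys):
    """Whole days between the first recorded touch and the final touch."""
    return [(journey[-1][0] - journey[0][0]).days for journey in journeys]


print(sorted(path_length_report(journeys).items()))  # [(1, 1), (2, 1), (3, 1)]
print(time_lag_days(journeys))                       # [8, 0, 0]
```

If the bulk of your journeys sit in the one-touch, zero-day bucket, most of what follows about model choice matters rather less.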

If most customer journeys are only a single step and a customer ‘converts’ there and then, you need to make few assumptions: whatever model you’ve chosen, there are few ways to distribute credit.

In my experience, short paths are common for insurance products. A customer’s mindset is one of speed, convenience and price, often with a renewal deadline looming. With price comparison sites prominent as one-stop shops, a customer has few opportunities to be exposed to advertising and so generates few data “touch points”.

As the research phase extends, we collect more advertising interactions and evaluating the path becomes more complex. Where a purchase decision may take weeks or months, we must decide not only how strongly each advert resonated but also what part time plays.

Early in their journey, customers may just be getting a feel for the marketplace: which brands are out there, what are their perceptions or previous experiences of these, do they trust the site, and does it have what they are in the market for?

As they progress: maybe their activity is more about deciding between a shortlist of products or brands? We expect refined search terms and specific information sought on dedicated web sites. Towards the point of conversion have they in effect made up their mind, and are now looking simply for reassurance through second opinions on social media and product review sites?

I see longer journeys more commonly with high value purchases and/or ‘desirable’ goods – homes, holidays and mobile phones for example: areas where customers are happy to spend more time browsing and researching to get a specific deal or set of product features.

From an analytical viewpoint then: does our understanding of the customer journey extend to allowing us to identify which stages are more important than others? Are they equally important? Do we believe that as the user nears decision then so the advert’s influence becomes more important? Or do we assume the early (or first) brand association step was key to the chain? Is a particular advert channel known to be strong in this sector?

In some cases we can answer these questions with data, in others we must make an educated choice.

If you’ve already dipped your toe into the Google Analytics world, you will probably have seen its [relatively] simple modelling tools. These reports offer a simple, low-cost entry into envisaging the results of a particular credit attribution model.


I would say if you have fairly short conversion paths of one or two steps, and/or your online budget levels are fairly small, then these are perfectly adequate tools to begin exploring and testing models.
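To make the differences between these positional models concrete, here is a rough Python sketch of how each might spread credit over a single path. It is my own approximation, not Google's implementation. The 7-day half-life for time decay and the 40/20/40 split for the position-based model are assumed defaults, both configurable in GA. The path, day counts and function names are hypothetical.

```python
from collections import defaultdict


def weights_first(n):
    """First interaction: all credit to the first touch."""
    return [1.0] + [0.0] * (n - 1)


def weights_last(n):
    """Last interaction: all credit to the final touch."""
    return [0.0] * (n - 1) + [1.0]


def weights_linear(n):
    """Linear: equal credit to every touch."""
    return [1.0 / n] * n


def weights_time_decay(days_before_conversion, half_life=7.0):
    """Time decay: a touch t days before conversion gets 2 ** (-t / half_life),
    normalised so the weights sum to 1. The 7-day half-life is an assumption."""
    raw = [2 ** (-d / half_life) for d in days_before_conversion]
    total = sum(raw)
    return [r / total for r in raw]


def weights_position_based(n, end_share=0.4):
    """Position based: end_share each to the first and last touch, the rest
    spread evenly over the middle. Paths of 1-2 touches split evenly (assumption)."""
    if n <= 2:
        return [1.0 / n] * n
    middle = (1.0 - 2 * end_share) / (n - 2)
    return [end_share] + [middle] * (n - 2) + [end_share]


def attribute(path, weights):
    """Sum each channel's weights across the path (repeat touches add up)."""
    credit = defaultdict(float)
    for channel, weight in zip(path, weights):
        credit[channel] += weight
    return dict(credit)


# Hypothetical four-step journey and days before conversion for each touch.
path = ["display", "generic_search", "email", "brand_search"]
days = [20, 9, 3, 0]

models = {
    "first": weights_first(len(path)),
    "last": weights_last(len(path)),
    "linear": weights_linear(len(path)),
    "time_decay": weights_time_decay(days),
    "position": weights_position_based(len(path)),
}
for name, weights in models.items():
    credit = attribute(path, weights)
    print(name, {c: round(v, 2) for c, v in credit.items()})
```

Note that a channel appearing more than once on a path collects credit from every appearance, which is worth bearing in mind for the third limitation discussed below.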

As a recommendation, my advice would be to try to steer clear of the ‘absolute’ models (First interaction and Last interaction). If you’ve read any of the previous posts, you’ll hopefully agree that giving all the credit to just one interaction is like giving all the credit in a football match to your goal scorer. Assuming you optimise on the results of your model (if not – why bother at all?), you’d end up with a team made up purely of forwards, which is more than likely “a bad strategy™”.

There are some other notable limitations to these positional models. I would advocate being aware of them, even though they don’t necessarily prevent you from drawing useful insight.

Firstly: technological limitations. Online tracking technology is not perfect. The primary tracking mechanism is a cookie, which is device specific; by that I mean tied to your mobile, tablet, laptop or computer. Many customers will use multiple devices depending on where they are and whom they are with: assuming they don’t identify themselves to the browser (such as signing in to Google+), they will generate independent cookie data on each device.

While you can see the device on which the user completes a purchase (the last click), you have no idea where they really started their journey. Putting all your eggs in the basket of the first known step is, I’d suggest, at best imprecise and at worst horribly wrong.

Secondly: these models cover the digital domain only, and make no allowance for offline advertising such as TV and radio, or even plain old word-of-mouth. You will certainly be overestimating the contribution from your online advertising, particularly through any brand name search advertising, which is commonly seen as a means of site navigation rather than advertising per se. You might be able to apply a gut-feel scale factor to adjust these numbers, particularly if you’ve been monitoring offline campaigns for a while, but unfortunately at this level of complexity there’s not much more science you can apply.

Thirdly: in the GA implementation at least, the data used to feed these models only contains the paths of successful ‘converters’ (i.e. customers who complete a certain phase such as quote request or sale). Each occurrence of a particular advert is rewarded with incremental credit regardless of its realistic influence at a given step, and there is no negative feedback from failed conversions against which this incrementality may be offset. In more complex models, we observe that repeat advert appearances can in fact be an indicator of decreased effectiveness.
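A toy example shows why the missing negative feedback matters. The Python sketch below contrasts two views of the same data. The first is what a converters-only report sees: every appearance on a converting path. The second is a simple per-channel conversion rate that also counts journeys which went nowhere. The journeys are invented for illustration. The conversion rate of journeys touching a channel is just one simple way of bringing failures back into the picture, not a model in itself.

```python
from collections import Counter

# Hypothetical journeys as (path, converted) pairs.
journeys = [
    (["display", "display", "display", "brand_search"], True),
    (["generic_search", "brand_search"], True),
    (["display", "display", "display"], False),
    (["display", "display"], False),
    (["generic_search"], False),
]


def converter_only_appearances(journeys):
    """Credit as a converters-only report would see it: one unit per appearance."""
    counts = Counter()
    for path, converted in journeys:
        if converted:
            counts.update(path)
    return counts


def conversion_rate_by_channel(journeys):
    """Share of journeys containing each channel that went on to convert."""
    seen, converted_with = Counter(), Counter()
    for path, converted in journeys:
        for channel in set(path):  # count each journey once per channel
            seen[channel] += 1
            if converted:
                converted_with[channel] += 1
    return {channel: converted_with[channel] / seen[channel] for channel in seen}


print(converter_only_appearances(journeys))
print(conversion_rate_by_channel(journeys))
```

Here display collects the most credit from converting paths purely because it repeats, yet the journeys containing it convert least often.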

Still, no model is perfect, and given that the results return quickly it is worth comparing a couple of these models side by side. If the results come out pretty similar over a month, then it’s probably not worth getting too hung up on exactly which is best: take one and try it out for a while (i.e. see whether making the changes the results imply actually produces better returns).
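One quick way to put a number on “pretty similar” is to convert each model’s output into credit shares and measure how much credit would have to move to turn one into the other. This measure is the total variation distance. Below is a minimal Python sketch with hypothetical monthly totals. What counts as similar enough remains a judgement call rather than anything the maths gives you.

```python
def shares(credit):
    """Convert absolute credited conversions into shares summing to 1."""
    total = sum(credit.values())
    return {channel: value / total for channel, value in credit.items()}


def compare_models(credit_a, credit_b):
    """Per-channel share difference (b minus a) and total variation distance,
    i.e. the fraction of credit that would have to move to turn a into b."""
    a, b = shares(credit_a), shares(credit_b)
    channels = set(a) | set(b)
    diffs = {c: b.get(c, 0.0) - a.get(c, 0.0) for c in channels}
    tvd = 0.5 * sum(abs(d) for d in diffs.values())
    return tvd, diffs


# Hypothetical monthly credited conversions under two models.
last_click = {"brand_search": 120, "generic_search": 50, "display": 30}
linear = {"brand_search": 90, "generic_search": 65, "display": 45}

tvd, diffs = compare_models(last_click, linear)
print(round(tvd, 3))  # 0.15
print({c: round(d, 3) for c, d in sorted(diffs.items())})
```

With these made-up figures about 15% of the credit moves between the two models, almost all of it away from brand search.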

When your online advertising expenditure sits in the “higher” category, or your customer journeys are more protracted, you are probably at a stage where a more complex model, or certainly a better-fitted one, is worth the investment.

I’ll get on to techniques later, but a good rule of thumb here is that these models shouldn’t just apply more complex rules, but more complex techniques. This means that rather than making more assumptions, you make fewer.

By more complex techniques, we’re really talking about utilising more complete data sets: pushing them through statistical models (for example, logistic regression or decision trees), crunching them brute-force style with algorithmic solvers to produce weighted positional or Markov chain models, or using solutions from mathematical game theory (such as Shapley value techniques).
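Of these, the Markov chain approach is perhaps the easiest to sketch. One common formulation, though not the only one, works in three steps:

1. Treat each channel as a state.
2. Estimate transition probabilities from all journeys, converting and non-converting.
3. Credit each channel by its “removal effect”: how much the overall probability of conversion drops if that channel is taken out of the graph.

Below is a minimal first-order Python sketch. It assumes journeys arrive as (path, converted) pairs. The state names, helper functions and data are my own hypothetical choices.

```python
from collections import defaultdict

START, CONVERSION, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """First-order transition probabilities estimated from (path, converted)
    pairs; every journey ends in either CONVERSION or NULL."""
    counts = defaultdict(lambda: defaultdict(int))
    for path, converted in journeys:
        states = [START] + list(path) + [CONVERSION if converted else NULL]
        for here, there in zip(states, states[1:]):
            counts[here][there] += 1
    return {
        here: {there: n / sum(nxt.values()) for there, n in nxt.items()}
        for here, nxt in counts.items()
    }


def conversion_prob(probs, removed=None, sweeps=1000):
    """Probability of eventually converting from START, found by repeated
    sweeps over the absorbing chain. A removed channel contributes zero,
    i.e. all traffic into it is treated as lost (sent to NULL)."""
    p = defaultdict(float)
    p[CONVERSION] = 1.0
    for _ in range(sweeps):
        for here, nxt in probs.items():
            if here != removed:
                p[here] = sum(q * (0.0 if there == removed else p[there])
                              for there, q in nxt.items())
    return p[START]


def removal_effect_attribution(journeys):
    """Share out total conversions in proportion to each channel's removal
    effect. Assumes at least one journey converted."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = {c for path, _ in journeys for c in path}
    effects = {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}
    total_effect = sum(effects.values())
    conversions = sum(1 for _, converted in journeys if converted)
    return {c: conversions * e / total_effect for c, e in effects.items()}


# Hypothetical journeys as (path, converted) pairs.
journeys = [
    (["display", "generic_search", "brand_search"], True),
    (["generic_search", "brand_search"], True),
    (["display", "display"], False),
    (["generic_search"], False),
    (["brand_search"], True),
]
for channel, credit in sorted(removal_effect_attribution(journeys).items()):
    print(channel, round(credit, 2))
```

Because non-converting journeys feed the transition probabilities, this approach picks up some of the negative feedback that the converters-only positional models lack.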

I’ve seen, and admittedly produced (under protest!), rule-based attribution models that apply an inordinate number of “if-then” conditions, all of which were based entirely on conjecture. These results were reported upon, but with an ever-changing strategy landscape and high seasonal volatility, no feasible testing structure would ever have been possible. In other words, it was guesswork.

Testing is a basic requirement for any model development lifecycle. If you make changes based on a model that doesn’t result in greater success (all other things being equal), then you haven’t got a good model and you’re likely wasting money. Moreover, if you can’t work back from results to understand which part of your model might be at fault, you’re doomed to repeat the failings.

The best way to test your model is, of course, to spend some money. If it looks like doing more of “A” is cost-effective, then do some more “A” and see whether your sales increase and efficiency remains roughly consistent. Utilise A/B tests to help calibrate your model.
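When an A/B test does get the go-ahead, a simple two-proportion z-test is one way of checking whether the difference between a control cell and an “A” cell is more than noise. Below is a minimal Python sketch using only the standard library. The cell sizes and conversion counts are hypothetical. In practice you would also compare the observed uplift with the uplift your model predicted, which is where the calibration comes in.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference in conversion rate between two cells."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical cells: control vs. a cell with extra spend on advert "A".
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"uplift z = {z:.2f}, two-sided p = {p:.4f}")
```

With these made-up numbers, z comes out at roughly 2.8 and the two-sided p-value is below 0.01.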

However, it is often difficult to convince a stakeholder to ‘gamble’ on a model, and the marketplace frequently differs significantly from one month to the next, making the ‘all other things being equal’ stipulation awkwardly unequal. If you have a regular periodic measurement framework in place, though, and maybe a second model running in parallel for comparison, you can begin to see which adverts in a campaign are consistently performing well, and certainly which aren’t, and from this begin to make some campaign decisions.
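As an illustration of that kind of periodic check, the Python sketch below sorts adverts into two groups. The first beats a target cost per attributed conversion in every period under both models. The second misses it every time. The advert names, spend, credit figures and the target CPA of 20 are all hypothetical, and CPA is just one possible yardstick.

```python
import math


def consistent_adverts(spend, credit_by_model, target_cpa):
    """Adverts that beat (or miss) target_cpa in every period under every model.

    spend:           {period: {advert: cost}}
    credit_by_model: {model: {period: {advert: attributed conversions}}}
    """
    verdicts = {}
    for periods in credit_by_model.values():
        for period, credit in periods.items():
            for advert, cost in spend[period].items():
                conversions = credit.get(advert, 0.0)
                cpa = cost / conversions if conversions > 0 else math.inf
                verdicts.setdefault(advert, set()).add(cpa <= target_cpa)
    always_good = sorted(a for a, v in verdicts.items() if v == {True})
    always_poor = sorted(a for a, v in verdicts.items() if v == {False})
    return always_good, always_poor


# Hypothetical monthly spend and attributed conversions from two models.
spend = {
    "2016-01": {"display": 1000, "generic_search": 800, "brand_search": 300},
    "2016-02": {"display": 1200, "generic_search": 900, "brand_search": 300},
}
credit_by_model = {
    "last_click": {
        "2016-01": {"display": 10, "generic_search": 50, "brand_search": 60},
        "2016-02": {"display": 12, "generic_search": 40, "brand_search": 70},
    },
    "linear": {
        "2016-01": {"display": 30, "generic_search": 55, "brand_search": 35},
        "2016-02": {"display": 35, "generic_search": 50, "brand_search": 45},
    },
}
print(consistent_adverts(spend, credit_by_model, target_cpa=20))
# (['brand_search'], ['display'])
```

Adverts whose verdict flips, like generic search here, are the ones where the choice of model actually matters. They are probably where further testing money is best spent.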
