Game Theory (Shapley) revisited

Setting some time aside for development work, I decided finally that I should try and write a script to evaluate game theory solutions. Truthfully, I’d been put off previously by the apparent complexity in generating the set structure to evaluate the regression.

For every n campaigns, there are n! (factorial) + 1 combinations of sets for a fully defined game. As seen in the previous post, with three channels this is relatively trivial and can be done by hand. However, with a campaign taxonomy of even 10 campaigns you would need to generate 3.6million sets! Most of these sets would be empty as real user data doesn’t come close to having examples of each of these combinations.

However, when I actually spent some time reading through the maths I soon realised that there is a mathematical approach that avoids evaluating ‘empty’ sets. I found a handy explanation of the formula:

Shapley_latex

here  https://linguisticcapital.wordpress.com/2015/06/09/the-shapley-value-an-extremely-short-introduction/  which was essentially that the formula can be calculated in two stages.

Imagine a step-wise process, which loops through each unique combination (“coalition”) of campaigns you have. If you refer to my previous post (https://thedataanalyst.wordpress.com/2016/08/30/shapley-value-regression-game-theory-as-an-attribution-solution/) then the equivalent would be looping through each row of the table presented there.

For each unique combination of campaigns, S = number of ‘campaigns’ in that combination instance, and n is the total number of unique campaigns you have.

So for a 3 campaign taxonomy, and a set containing 2 campaigns (e.g. row 4, PPC Brand and SEO), that first bit is

Factorial(2-1) * Factorial(3-2) / Factorial(3) =   1/6   = 0.1667

 

The second bit is simply the value difference with and without the campaign. For example, if we’re evaluating SEO in the set {PPCBrand, SEO}, and we know from the value estimation stage that {PPCBrand, SEO} = 424 and {PPCBrand} = 270, then the credit SEO receives for this combination is (424 – 270) * 0.1667 = 25.7

 

Evaluating {SEO} = 199, then PPC Brand receives (424-199) * 0.1667 = 37.5

You can then move on to the next known unique combination of campaigns. When all have been calculated, a sum of credit across each set by campaign yields an attributed share.

 

So far so good. In theory.

 

Except: unless you have a trivial case-study, you won’t have all your combinations described in your customer path data, so you are missing vast swathes of the sub-sets you need to evaluate uplift.

Real world data being what it is too, there’s every chance that a key rule of “additivity” has also been broken. Additivity states that in this cooperative game, adding a channel to a coalition should not reduce its value: but at least in the data I’ve worked with it is not uncommon for (say) a single SEM click to have a higher conversion rate than an SEM click with a series of prospecting display adverts.

How does this impact on the results?

 

In the instance of partially defined coalitions, fortunately real world data aids us here: consider it not uncommon that a typical data set that shows 1 step paths accounting for 50% of conversions, 2 steps or fewer ~ 80%, and cumulatively 3 or fewer steps ~ 90% – this then tails away over the remaining data.

Chances are you have combinations of 3 channels described fully: at least for campaigns that make up the bulk of conversions.  This leaves only a small proportion of conversions caught up in non-described games.

From reading around, evaluating ‘partially defined games’ appears to be an unsolved problem with active investigation (https://www.goshen.edu/wp-content/uploads/sites/27/2015/05/Linear.pdf).

If you are determining value by some kind of algorithm, then it may be possible to generate these in-situ (e.g. logistic regression as your model, ref: https://huayin.wordpress.com/tag/attribution-modeling/).

For pre-modelled values though I’ve not yet worked out an answer: *makes note – this would be a good question to put to vendors..!*.  I can imagine for a simplistic resolution you can adjust for these undescribed games by ignoring them, and simply scaling known credit back up. Maybe also by applying a hybrid approach to those sets missing subsets, where known shares are used where available and the remainder shared equally between the remaining channels?

 

In the latter instance of additivity, data partitioning (as described previously) has been suggested as a means to separate upper funnel and lower funnel activity. By modelling those returning clicks close to conversion differently from brand/product awareness activity, and adjusting credit between the pools there is an implicit push of value back up the funnel.

I’ve no doubt that there will be cases still where a channel appears to have a negative effect: setting a lower boundary of zero credit is a blunt way of approaching this, though this necessitates some modest rescaling of results as by doing this your model will inevitably generate more conversions than actually occurred.

 

And so work continues. A welcome addition to the portfolio of approaches I can apply even if it isn’t 100% there yet. Though, what model is?

Advertisements