Game Theory (Shapley) revisited

Setting some time aside for development work, I finally decided to try and write a script to evaluate game theory solutions. Truthfully, I’d previously been put off by the apparent complexity of generating the set structure needed to evaluate the regression.

For n campaigns, a brute-force evaluation of a fully defined game works through every ordering of the players: n! (factorial) of them, plus the empty set. As seen in the previous post, with three channels this is relatively trivial and can be done by hand. However, with a campaign taxonomy of even 10 campaigns you would need to generate over 3.6 million sets! Most of these sets would be empty, as real user data doesn’t come close to having examples of each of these combinations.

However, when I actually spent some time reading through the maths I soon realised that there is a mathematical approach that avoids evaluating ‘empty’ sets. I found a handy explanation of the formula:

\[
\phi_i(v) = \sum_{\substack{S \subseteq N \\ i \in S}} \frac{(|S|-1)!\,(n-|S|)!}{n!}\,\bigl(v(S) - v(S \setminus \{i\})\bigr)
\]

here: https://linguisticcapital.wordpress.com/2015/06/09/the-shapley-value-an-extremely-short-introduction/. The gist is that the formula can be calculated in two stages.

Imagine a step-wise process, which loops through each unique combination (“coalition”) of campaigns you have. If you refer to my previous post (https://thedataanalyst.wordpress.com/2016/08/30/shapley-value-regression-game-theory-as-an-attribution-solution/) then the equivalent would be looping through each row of the table presented there.

For each unique combination of campaigns, S is the number of campaigns in that combination (including the one being evaluated), and n is the total number of unique campaigns you have.

So for a 3-campaign taxonomy, and a set containing 2 campaigns (e.g. row 4, PPC Brand and SEO), that first bit is:

(2−1)! × (3−2)! / 3! = 1/6 ≈ 0.1667

The second bit is simply the value difference with and without the campaign. For example, if we’re evaluating SEO in the set {PPC Brand, SEO}, and we know from the value estimation stage that {PPC Brand, SEO} = 424 and {PPC Brand} = 270, then the credit SEO receives for this combination is (424 − 270) × 0.1667 = 25.7.

Evaluating {SEO} = 199 likewise, PPC Brand receives (424 − 199) × 0.1667 = 37.5 for this combination.

You can then move on to the next known unique combination of campaigns. When all have been calculated, summing the credit across sets for each campaign yields its attributed share.
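To make this concrete, here is a minimal sketch of the calculation in Python. The coalition values are illustrative – {PPC Brand} = 270, {SEO} = 199 and {PPC Brand, SEO} = 424 are the figures quoted above, while the Display coalitions are made up for the example – since in practice these would all come from your value estimation stage:

```python
import math
from itertools import combinations

# Coalition values v(S) from the value estimation stage. The PPC Brand/SEO
# figures are those quoted above; the Display ones are illustrative only.
v = {
    frozenset(): 0,
    frozenset({"PPC Brand"}): 270,
    frozenset({"SEO"}): 199,
    frozenset({"Display"}): 5,
    frozenset({"PPC Brand", "SEO"}): 424,
    frozenset({"PPC Brand", "Display"}): 178,
    frozenset({"SEO", "Display"}): 141,
    frozenset({"PPC Brand", "SEO", "Display"}): 295,
}

players = ["PPC Brand", "SEO", "Display"]
n = len(players)

shapley = {}
for i in players:
    credit = 0.0
    for size in range(1, n + 1):
        for combo in combinations(players, size):
            S = frozenset(combo)
            if i not in S:
                continue
            # First bit: the weighting term (S-1)!(n-S)!/n!
            weight = (math.factorial(len(S) - 1)
                      * math.factorial(n - len(S))
                      / math.factorial(n))
            # Second bit: the value difference with and without the campaign.
            credit += weight * (v[S] - v[S - {i}])
    shapley[i] = round(credit, 1)

print(shapley)  # the credit sums to the grand coalition's value, 295
```

The {PPC Brand, SEO} term for SEO evaluates to (424 − 270)/6 ≈ 25.7, matching the hand calculation above. Note too that only coalitions actually present in v need visiting – no ‘empty’ sets – and that Display ends up with negative credit here, which is exactly the problem discussed further down.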

 

So far so good. In theory.

 

Except: unless you have a trivial case study, you won’t have all your combinations described in your customer path data, so you are missing vast swathes of the subsets you need to evaluate uplift.

Real-world data being what it is, there’s also every chance that a key property of the game, monotonicity, has been broken. Monotonicity states that in this cooperative game, adding a channel to a coalition should not reduce its value: but at least in the data I’ve worked with it is not uncommon for (say) a single SEM click to have a higher conversion rate than an SEM click alongside a series of prospecting display adverts.

How does this impact the results?

 

In the instance of partially defined coalitions, real-world data fortunately aids us: it is not uncommon for a typical data set to show 1-step paths accounting for ~50% of conversions, 2 steps or fewer ~80%, and 3 steps or fewer ~90%, with the remainder tailing away over longer paths.

Chances are you have combinations of up to 3 channels described fully, at least for the campaigns that make up the bulk of conversions. This leaves only a small proportion of conversions caught up in non-described games.

From reading around, evaluating ‘partially defined games’ appears to be an unsolved problem under active investigation (https://www.goshen.edu/wp-content/uploads/sites/27/2015/05/Linear.pdf).

If you are determining value by some kind of algorithm, then it may be possible to generate the missing values in situ (e.g. with logistic regression as your model; ref: https://huayin.wordpress.com/tag/attribution-modeling/).

For pre-modelled values, though, I’ve not yet worked out an answer: *makes note – this would be a good question to put to vendors..!*. For a simplistic resolution, I can imagine adjusting for these undescribed games by ignoring them and simply scaling the known credit back up – see the sketch below. Maybe also a hybrid approach for those sets missing subsets, where known shares are used where available and the remainder is shared equally between the remaining channels?
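As a rough illustration of the ‘ignore and scale up’ idea, here is a hedged sketch (the function name and structure are my own, not from any vendor implementation): any term whose sub-coalition value is unobserved is simply skipped, and the surviving credit is then rescaled so it still sums to the grand coalition’s value.

```python
import math
from itertools import combinations

def shapley_partial(v, players):
    """Shapley credit over a partially defined game. Terms that require an
    unobserved coalition value are ignored, and the surviving credit is
    scaled back up to the grand coalition's value (which is assumed to be
    observed)."""
    n = len(players)
    credit = {i: 0.0 for i in players}
    for i in players:
        for size in range(1, n + 1):
            for combo in combinations(players, size):
                S = frozenset(combo)
                if i not in S:
                    continue
                if S not in v or S - {i} not in v:
                    continue  # undescribed sub-game: ignore this term
                weight = (math.factorial(size - 1)
                          * math.factorial(n - size)
                          / math.factorial(n))
                credit[i] += weight * (v[S] - v[S - {i}])
    total = sum(credit.values())
    scale = v[frozenset(players)] / total if total else 0.0
    return {i: c * scale for i, c in credit.items()}
```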

 

In the latter instance, the monotonicity problem, data partitioning (as described previously) has been suggested as a means to separate upper-funnel and lower-funnel activity. By modelling returning clicks close to conversion differently from brand/product awareness activity, and adjusting credit between the pools, there is an implicit push of value back up the funnel.

I’ve no doubt that there will still be cases where a channel appears to have a negative effect: setting a lower boundary of zero credit is a blunt way of approaching this, though it necessitates some modest rescaling of the results, as with the floor applied your model will inevitably generate more conversions than actually occurred.
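In code, that floor-and-rescale step might look like this (a minimal sketch; the function name is mine):

```python
def clip_and_rescale(credit, actual_conversions):
    """Floor negative channel credit at zero, then rescale so the total
    matches the number of conversions actually observed."""
    clipped = {ch: max(0.0, c) for ch, c in credit.items()}
    total = sum(clipped.values())
    scale = actual_conversions / total if total else 0.0
    return {ch: c * scale for ch, c in clipped.items()}
```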

 

And so work continues. A welcome addition to the portfolio of approaches I can apply even if it isn’t 100% there yet. Though, what model is?


Shapley Value regression (Game Theory) as an Attribution solution

Shapley Value regression (commonly referred to as Game Theory) is at its core quite an elegant solution, and one I’ve left until quite late in the series to post about, as it is an approach I haven’t implemented despite understanding its principles.

[ I therefore offer no real practicable insight, short of that from discussions with 3rd party practitioners and published papers. ]

 

At the core of this algorithm sits a fairly simple two-stage approach:

1. Identify a baseline ‘importance’ value for each campaign that represents its expected conversion performance (number of conversions, or anticipated conversion rate).

2. Run a series of regressions comparing the importance value of each campaign in turn with each of the others as a pair, triplet, or higher-order combination, and allocate the conversions observed when these combinations occur in the attribution data based on the Shapley Value approach.

 

In essence, the approach simulates a series of multivariate (A/B) test uplift comparisons. I first came across it when assisting in pilot testing with GoogleNY shortly after their acquisition of Adometry. At the time I understood little of the approach, focusing more on the credibility of the attributed outputs.

I came across it again when research uncovered the Shao and Li paper “Data-driven Multi-touch Attribution Models”, which details a 2nd order Shapley regression almost as an aside to their ensemble logistic regression approach (the latter of which, frustratingly, I cannot get to produce valid results using our data!).

A more comprehensive step-by-step detailing of the technique is incorporated into the Abakus patent (http://www.google.co.uk/patents/US8775248). Incidentally, Abakus provide a series of short videos on their site explaining how the technique is applied, though the algorithm itself (as far as I could see) isn’t covered.

 

Hence, a simplified example of how stages 1 and 2 are executed (as I understand it). Note that the approach below is based upon the Abakus method described in their patent; other methods for determining the stage 1 ‘player values’ – logistic regression, time-decay based models, etc. – are alluded to, if not detailed, in other articles (http://huayin.wordpress.com/tag/attribution-modeling/).

 

Imagine this core result set obtained from a hypothetical attribution data set:

| S1 | S2 | S3 | Desc | Unique Users | Conversions | cr% |
|----|----|----|------|--------------|-------------|-----|
| PPC Brand | | | p1 | 800 | 60 | 7.500% |
| SEO | | | p2 | 850 | 45 | 5.294% |
| Display | | | p3 | 150,000 | 10 | 0.007% |
| PPC Brand | SEO | | p1,2 | 300 | 25 | 8.333% |
| PPC Brand | Display | | p1,3 | 1,800 | 65 | 3.611% |
| SEO | Display | | p2,3 | 1,900 | 55 | 2.895% |
| PPC Brand | SEO | Display | p1,2,3 | 700 | 40 | 5.714% |
| P (no media) | | | p0 | 3,000,000 | 100 | 0.003% |

From this aggregated data set, we ascertain each campaign’s ‘playing value’ – i.e. stage 1 above. Starting with PPC Brand, we multiply its conversion rate in isolation (7.5%) against all the combinations in which it appears (800 + 300 + 1800 + 700), and add to that what would occur anyway (the baseline rate of 0.003% against 850 + 150,000 + 1,900 + 3,000,000). This gives a playing value V1 (PPC Brand only) of 375. This is repeated for each of the single-campaign combinations (V2 and V3).

For the two-campaign sets, each channel’s base conversion rate is applied against the combinations it occurs in but the OTHER campaign does not, PLUS the combined rate where they appear together. So for PPC + SEO, this set (V1,V2) is worth:

7.500% × (800 + 1800)               (the PPC component of the set)

+ 5.294% × (850 + 1900)             (the SEO component of the set)

+ 8.333% × (300 + 700)              (the combined PPC+SEO component)

+ 0.003% × (150,000 + 3,000,000)    (the baseline × what’s left)

= 529

 

The last remaining combination of all sets (V1,V2,V3) is effectively the ‘game’ total – a sum of all conversions (400). Then we account for the baseline set, V0 (no media), which is 105 (0.003% × all users).
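As a sketch of this stage 1 calculation (my own reading of the method; the names and structure are illustrative, not the patent’s): the value of any coalition S can be computed from the table above by giving each observed combination the conversion rate of the part of it covered by S, with an empty intersection falling back to the baseline rate.

```python
# Stage 1 sketch: coalition 'playing values' from the aggregated table above.
users = {
    frozenset(): 3_000_000,  # P: users exposed to no media
    frozenset({"PPC Brand"}): 800,
    frozenset({"SEO"}): 850,
    frozenset({"Display"}): 150_000,
    frozenset({"PPC Brand", "SEO"}): 300,
    frozenset({"PPC Brand", "Display"}): 1_800,
    frozenset({"SEO", "Display"}): 1_900,
    frozenset({"PPC Brand", "SEO", "Display"}): 700,
}
conversions = {
    frozenset(): 100,
    frozenset({"PPC Brand"}): 60,
    frozenset({"SEO"}): 45,
    frozenset({"Display"}): 10,
    frozenset({"PPC Brand", "SEO"}): 25,
    frozenset({"PPC Brand", "Display"}): 65,
    frozenset({"SEO", "Display"}): 55,
    frozenset({"PPC Brand", "SEO", "Display"}): 40,
}
cr = {combo: conversions[combo] / users[combo] for combo in users}

def coalition_value(S):
    """v(S): each observed combination contributes its users multiplied by
    the conversion rate of the part of it covered by S; an empty
    intersection falls back to the baseline rate cr[frozenset()]."""
    S = frozenset(S)
    return sum(cr[combo & S] * n for combo, n in users.items())

print(round(coalition_value({"PPC Brand"})))                    # 375 = V1
print(round(coalition_value({"PPC Brand", "SEO"})))             # 529 = (V1,V2)
print(round(coalition_value(set())))                            # 105 = V0
print(round(coalition_value({"PPC Brand", "SEO", "Display"})))  # 400 = the game
```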

This yields a ‘game’ into which we enter each ‘campaign value’ in the order seen (starting with the special value, V0 = 105). On adding each subsequent set, we evaluate the ‘incremental’ difference.

E.g. in the first combination, PPC Brand, we know its V1 score is 375. Therefore after V0 has claimed 105, only 270 is left.

In the second combination, PPC Brand+SEO, we apply 105, then 270, then 529-270-105 = 154

And so on as follows:

 

Marginal Improvement Values

| Step 1 | Step 2 | Step 3 | V0 (Base) | PPC Brand | SEO | Display |
|--------|--------|--------|-----------|-----------|-----|---------|
| PPC Brand | | | 105 | 270 | | |
| PPC Brand | SEO | | 105 | 270 | 154 | |
| PPC Brand | Display | | 105 | 270 | | -92 |
| PPC Brand | SEO | Display | 105 | 270 | 154 | -129 |
| PPC Brand | Display | SEO | 105 | 270 | 117 | -92 |
| SEO | | | 105 | | 198 | |
| SEO | PPC Brand | | 105 | 225 | 198 | |
| SEO | Display | | 105 | | 198 | -57 |
| SEO | PPC Brand | Display | 105 | 225 | 198 | -129 |
| SEO | Display | PPC Brand | 105 | 154 | 198 | -57 |
| Display | | | 105 | | | 5 |
| Display | PPC Brand | | 105 | 173 | | 5 |
| Display | SEO | | 105 | | 136 | 5 |
| Display | PPC Brand | SEO | 105 | 173 | 117 | 5 |
| Display | SEO | PPC Brand | 105 | 154 | 136 | 5 |
| Attribution: | | | 105 | 223 | 164 | 5 |
| Target is 400 conversions: | | | 105 | 167.7 | 123.5 | 3.9 |

 

The attribution score for each channel is the average contribution of its column. Summed, these typically exceed the actual number of conversions, and hence are scaled down to a target number of conversions (here 400, with the V0 base held at 105 and the channel credit scaled to the remainder).
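A sketch of this stage 2 enumeration, reusing coalition_value() from the stage 1 sketch above (again my own rendering of the idea rather than any vendor’s code):

```python
from itertools import permutations

def marginal_rows(players):
    """Enumerate every permutation of every non-empty subset of players
    (the rows of the table above) and record each player's marginal
    improvement when added to the coalition built so far."""
    rows = []
    for size in range(1, len(players) + 1):
        for order in permutations(players, size):
            seen = frozenset()
            row = {"V0 (Base)": round(coalition_value(seen))}
            for p in order:
                step = seen | {p}
                row[p] = round(coalition_value(step) - coalition_value(seen))
                seen = step
            rows.append((order, row))
    return rows

for order, row in marginal_rows(["PPC Brand", "SEO", "Display"]):
    print(" > ".join(order), row)
```

Averaging each channel’s column and rescaling then gives the attribution rows. One caveat: a straight average of the Display column comes out negative rather than the 5 shown above, so some handling of the negative marginals (such as the zero floor discussed above) appears to have been applied before averaging.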

 

Negative values occur because, while the model is based on cooperative game theory (that is, the worth of a combination of participants’ activity should not be worse than the best ‘player’ alone), in reality two channels occurring together may well have a conversion rate LESS than the parts achieve “independently”. This effect is observed in the Anderl et al. result set (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2343077). Does this mean that a channel has a negative effect? Likely not. It’s much as we observe in logistic regression, where the likelihood is that two different customer states are being pooled.

 

Hence we again turn to data pre-processing, whereupon I can offer no real practical advice beyond that recommended in the Abakus patent and elsewhere (such as Archak et al., “Mining Advertiser-specific User Behaviour Using Adfactors”): pre- and post-site-visit segmentation of the data. The logic is to ascertain which clicks are most likely ‘navigational’ in nature, and thus push credit across this divide into ‘influencing’ activity.

In addition, a proportion is extracted from the influencing bracket and allocated to a baseline of non-digital media conversions. I am admittedly unclear how these proportions are decided, given that the exact number of non-converting, non-exposed users is not directly measurable, but trust that an estimate of cookies in market can at least give a steer. I imagine also that a calibration of the outputs is enacted to bring the results in line with those expected (again from independent tests).

 

So in summary: game theory as an approach isn’t overly complex. The difficulty is the efficient reduction of the data into distinct sets, and then the stepwise allocation of ‘values’ as seen in stage 1. Stage 2, as it happens, has been solved in a variety of freely available R packages (and is quick to execute).

 

If you’ve access to GA Premium, then it is ultimately a version of this approach that is deployed in the Data-Driven Algorithm available under the attribution section. It’s not easy, though, to determine what data manipulation has been employed (if any?) given that the algorithm is locked up in a black-box, click-and-go product.

Even so, I find the results interesting, if only for the ability to see the ‘direct conversion’ land grab. Display campaigns increase against a last-click view of the world, but maybe not quite so much as we see in, say, the Markov approach. In addition, some non-brand PPC terms also seem to increase – the opposite of what I’ve seen in other approaches. This might be via collection from the direct conversion pool, but either way, it is always interesting to have another model, especially one requiring little effort from the user.

__

Note for reference: An independent comparison of results is available in the aforementioned Anderl paper and is well worth a read given the differences observed between models.