Data Science in Training 1: Marketing Mix Modeling (MMM) and Multi-Touch Attribution (MTA)

Shobeir Seddington
Apr 2, 2023 · 8 min read

This post is not an expert opinion; it is my way of trying to learn and think through a new field and document my learnings. To the experts in the field of MMM and MTA, your comments are much appreciated in correcting my view or expanding on any section.

Introduction

Marketing attribution is an essential part of understanding the effectiveness of your marketing campaigns. Attribution modeling is the process of determining which marketing activities are responsible for driving conversions or sales. It helps businesses to allocate their marketing budget more effectively and optimize their marketing strategies.

There are two main types of attribution modeling: Marketing Mix Modeling (MMM) and Multi-Touch Attribution (MTA).
MMM models use statistical methods to analyze historical data and estimate the impact of different marketing channels on sales. MTA models focus on tracking individual user journeys across multiple touchpoints to determine the impact of each touchpoint on conversions or sales.

In this blog post, we will explore the various methodologies for evaluating MMM and MTA models, and discuss the strengths and weaknesses of each method. We will also examine the use of deep learning methods for attribution modeling, and discuss the pros and cons of using these methods. Finally, we will discuss how to choose the best attribution modeling method for your business and explore future directions for attribution modeling research and development.

Bullet point approach

  • First, let’s see when to use each: MMM works with aggregate, channel-level data and estimates each channel’s impact on overall sales, while MTA tracks individual user journeys and credits each touchpoint along the way.

Everything starts with data!

The data for MMM is at the aggregate level. Impressions can also work instead of dollars spent.

MMM Sample Data

But for MTA the data needs to be more granular and on the individual level.

MTA Sample Data
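
Since the sample-data table may not come through here, below is a small, made-up example of the row-level shape that the MTA snippets later in this post assume. The column names ('User ID', 'Channel', 'Date', 'Purchase Date', 'Revenue', 'Converted') are my assumption, chosen to match the code that follows.

import pandas as pd

# Hypothetical individual-level data: one row per user touchpoint
mta_sample = pd.DataFrame({
    'User ID': [1, 1, 1, 2, 2],
    'Channel': ['Social', 'Email', 'Search', 'TV', 'Search'],
    'Date': ['2023-01-01', '2023-01-03', '2023-01-05', '2023-01-02', '2023-01-04'],
    'Purchase Date': ['2023-01-06', '2023-01-06', '2023-01-06', '2023-01-05', '2023-01-05'],
    'Revenue': [0, 0, 120, 0, 80],   # revenue recorded on the converting touch
    'Converted': [0, 0, 1, 0, 1],
})
# mta_sample.to_csv('mta_data.csv', index=False)  # the snippets below read this file
print(mta_sample)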

Code for MMM

The goal is to analyze the impact of each marketing channel on sales. There are different approaches you can use, such as linear regression, time series analysis, or machine learning algorithms.

Here’s an example of how you can use linear regression to calculate MMM in Python:

import pandas as pd
import numpy as np
import statsmodels.api as sm

# Load the data into a Pandas dataframe
data = pd.DataFrame({
'sales': [1000000, 1200000, 1150000, 1300000, 1400000],
'tv_spend': [100000, 120000, 110000, 130000, 140000],
'online_spend': [50000, 55000, 60000, 65000, 70000],
'social_spend': [20000, 22000, 25000, 27000, 30000],
'other_spend': [30000, 32000, 35000, 40000, 45000]
})

# Create the X and Y matrices for linear regression
X = data[['tv_spend', 'online_spend', 'social_spend', 'other_spend']]
X = sm.add_constant(X)
y = data['sales']

# Fit the linear regression model
model = sm.OLS(y, X).fit()

# Print the model summary
print(model.summary())

How can I trust my model?

If you have the luxury of creating variability and running tests, do it. Beyond that, you should obviously follow the normal modeling practices: checking model errors, keeping a train/test split, and so on. There are various methodologies for evaluating MMM models, including holdout testing, time series analysis, incrementality testing, attribution lift, and sensitivity analysis.
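
As a small illustration of the simplest of these, here is a rough sketch of a time-based holdout: fit the regression on the earlier periods and check the error on the most recent ones. It assumes a (realistically much longer) dataframe `data` with the same columns as the MMM snippet above; the 80/20 split and the MAPE metric are just my choices.

import pandas as pd
import numpy as np
import statsmodels.api as sm

# Assuming `data` is a weekly/monthly dataframe with the columns from the MMM snippet above
features = ['tv_spend', 'online_spend', 'social_spend', 'other_spend']
split = int(len(data) * 0.8)  # earlier periods for training, most recent periods held out

X_train = sm.add_constant(data[features].iloc[:split])
X_test = sm.add_constant(data[features].iloc[split:], has_constant='add')
y_train, y_test = data['sales'].iloc[:split], data['sales'].iloc[split:]

holdout_model = sm.OLS(y_train, X_train).fit()
pred = holdout_model.predict(X_test)

# Mean absolute percentage error on the held-out periods
mape = np.mean(np.abs((y_test - pred) / y_test)) * 100
print(f"Holdout MAPE: {mape:.1f}%")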

The double-edged sword is that your numbers should also pass your partners’ sniff test! If you show a number that doesn’t feel right to them, you may have a hard time defending it. Sometimes you can add constraints to your model’s optimization function to keep the coefficients within a certain range. At this point, I am not sure if it’s “science” anymore.
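
For what it’s worth, here is a minimal sketch of what I mean by constraining the coefficients: swap plain OLS for a bounded least-squares solve. The non-negativity bounds on the spend coefficients are only an example of a “sniff test” constraint, and `data` is again the dataframe from the MMM snippet above.

import numpy as np
from scipy.optimize import lsq_linear

# Assuming `data` has the same columns as the MMM snippet above
features = ['tv_spend', 'online_spend', 'social_spend', 'other_spend']
A = np.column_stack([np.ones(len(data)), data[features].to_numpy(dtype=float)])  # intercept + spends
b = data['sales'].to_numpy(dtype=float)

# Constrain the spend coefficients to be non-negative; leave the intercept unbounded
lower = [-np.inf] + [0.0] * len(features)
upper = [np.inf] * (len(features) + 1)
result = lsq_linear(A, b, bounds=(lower, upper))

for name, coef in zip(['intercept'] + features, result.x):
    print(f"{name}: {coef:,.2f}")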

MTA: the sucker’s game!

MTA is personalization for MMM! But I believe it is the sucker’s game, since in reality there is no ground truth for what you are measuring. This makes it a balance of art and science, where you as a practitioner have to find ways to prove yourself wrong.

There are various methodologies for attribution.

First/Last Touch Attribution

This method assigns 100% of the credit to the first/last touchpoint in the customer journey. Obviously, I only mention it for completeness; I have almost zero trust in this approach. Only use it if this is the very first step in your organization’s measurement journey.

import pandas as pd

# Load data
data = pd.read_csv('mta_data.csv')

# Perform first/last touch attribution (assumes rows are in chronological order per user)
first_touch = data.groupby('User ID')['Channel'].first()
last_touch = data.groupby('User ID')['Channel'].last()

# Calculate revenue per channel
revenue_by_channel = data.groupby('Channel')['Revenue'].sum()

# Output results
print("First touch attribution:")
print(first_touch)
print("\nLast touch attribution:")
print(last_touch)
print("\nRevenue by channel:")
print(revenue_by_channel)

Linear Attribution

Slightly better than first/last touch: this method assigns credit equally to all touchpoints in the journey.

import pandas as pd

# Load data
data = pd.read_csv('mta_data.csv')

# Calculate total touches per user
total_touches = data.groupby('User ID')['Channel'].count()

# Calculate revenue per touch
revenue_per_touch = data.groupby('Channel')['Revenue'].sum() / data.groupby('Channel')['User ID'].count()

# Calculate linear attribution
linear = pd.DataFrame({'User ID': total_touches.index, 'Touches': total_touches.values})
linear['Credit'] = linear['Touches'] * revenue_per_touch.mean()

# Calculate revenue per channel
revenue_by_channel = data.groupby('Channel')['Revenue'].sum()

# Output results
print("Linear attribution:")
print(linear[['User ID', 'Credit']])
print("\nRevenue by channel:")
print(revenue_by_channel)

Time Decay Attribution

This method assigns more credit to touchpoints that occur closer in time to the purchase.

import pandas as pd
import numpy as np

# Load data
data = pd.read_csv('mta_data.csv')

# Calculate time between each touchpoint and purchase
data['Days to Purchase'] = (pd.to_datetime(data['Purchase Date']) - pd.to_datetime(data['Date'])).dt.days

# Calculate weight for each touchpoint based on time decay
data['Weight'] = np.exp(-0.1 * data['Days to Purchase'])

# Calculate revenue per channel
revenue_by_channel = data.groupby('Channel')['Revenue'].sum()

# Calculate time decay attribution
data['Weighted Revenue'] = data['Revenue'] * data['Weight']
time_decay = data.groupby('User ID')['Weighted Revenue'].sum()

# Output results
print("Time decay attribution:")
print(time_decay)
print("\nRevenue by channel:")
print(revenue_by_channel)

U-Shaped & W-Shaped Attribution

The scientist in me cringes! This method assigns 40% of the credit to the first touch, 40% to the last touch, and spreads the remaining 20% across all the steps in the middle. An improvement on this method is to make the percentages a learnable parameter (a rough sketch of that idea follows after the code below)!
The W-shaped variant also gives a large share of the credit to a key mid-journey touchpoint (commonly the lead-creation step). I haven’t read enough to know why those particular splits!

import pandas as pd

# Load data
data = pd.read_csv('mta_data.csv')

# Calculate total touches per user
total_touches = data.groupby('User ID')['Channel'].count()

# Calculate revenue per touch
revenue_per_touch = data.groupby('Channel')['Revenue'].sum() / data.groupby('Channel')['User ID'].count()

# Calculate u-shaped attribution
u_shaped = pd.DataFrame({'User ID': total_touches.index, 'Touches': total_touches.values})
u_shaped['Credit'] = u_shaped['Touches'] * revenue_per_touch.mean()
u_shaped.loc[u_shaped['Touches'] == 1, 'Credit'] *= 0.4
u_shaped.loc[u_shaped['Touches'] > 2, 'Credit'] *= 0.2
u_shaped.loc[u_shaped['Touches'] == 2, 'Credit'] *= 0.4

# Calculate w-shaped attribution
w_shaped = pd.DataFrame({'User ID': total_touches.index, 'Touches': total_touches.values})
w_shaped['Credit'] = w_shaped['Touches'] * revenue_per_touch.mean()
w_shaped.loc[w_shaped['Touches'] == 1, 'Credit'] *= 0.4
w_shaped.loc[w_shaped['Touches'] > 3, 'Credit'] *= 0.1
w_shaped.loc[w_shaped['Touches'] == 3, 'Credit'] *= 0.3
w_shaped.loc[w_shaped['Touches'] == 2, 'Credit'] *= 0.2
# Find each user's most common middle-touch channel (None for journeys of two steps or fewer)
middle_touch = data.groupby('User ID').apply(lambda x: x.iloc[1:-1]['Channel'].value_counts().index[0] if len(x) > 2 else None)
w_shaped['Credit'] = w_shaped.apply(lambda x: x['Credit'] * 1.5 if middle_touch.get(x['User ID']) is not None else x['Credit'], axis=1)

# Calculate revenue per channel
revenue_by_channel = data.groupby('Channel')['Revenue'].sum()

# Output results
print("U-shaped attribution:")
print(u_shaped[['User ID', 'Credit']])
print("\nRevenue by channel:")
print(revenue_by_channel)
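
As a rough illustration of making the percentages learnable (my own sketch, not a standard recipe): count how many first, middle, and last touches each journey has, regress journey revenue on those counts with non-negative least squares, and normalize the fitted coefficients into position weights. It assumes the same mta_data.csv schema as above.

import pandas as pd
import numpy as np
from scipy.optimize import nnls

data = pd.read_csv('mta_data.csv')

def position_counts(journey):
    # How many first / middle / last touches a journey of length n has
    n = len(journey)
    return pd.Series({'first': 1, 'middle': max(n - 2, 0), 'last': 1 if n > 1 else 0})

# One row per user: position counts (features) and total revenue (target)
X = data.groupby('User ID')['Channel'].apply(list).apply(position_counts)
y = data.groupby('User ID')['Revenue'].sum()

# Non-negative least squares keeps the learned weights interpretable; normalize to shares
weights, _ = nnls(X.to_numpy(dtype=float), y.to_numpy(dtype=float))
shares = weights / weights.sum() if weights.sum() else weights
print(dict(zip(X.columns, np.round(shares, 3))))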

Shapley Value Attribution

This method is based on cooperative game theory and assigns credit to each touchpoint (in practice, each channel) based on its marginal contribution to the conversion outcome. The total credit is distributed among all touchpoints, taking into account their interactions and dependencies. Remember that Shapley values are in the units of your dependent variable (here, revenue).

import pandas as pd
from itertools import combinations
from math import factorial

# Load data
data = pd.read_csv('mta_data.csv')

# The "players" of the cooperative game are the marketing channels in the data
channels = list(data['Channel'].unique())

# Pre-compute each user's set of channels and total revenue
user_channels = data.groupby('User ID')['Channel'].apply(set)
user_revenue = data.groupby('User ID')['Revenue'].sum()

# Value function: revenue from users whose journey only used channels in the coalition
def value_function(coalition):
    coalition = set(coalition)
    covered = user_channels.apply(lambda s: s.issubset(coalition))
    return user_revenue[covered].sum()

# Shapley value: weighted average marginal contribution over coalitions of the other channels
# (fine for a handful of channels; the number of coalitions grows exponentially)
def shapley_value(channel, players, value_function):
    others = [p for p in players if p != channel]
    n = len(players)
    total = 0.0
    for k in range(len(others) + 1):
        for coalition in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (value_function(coalition + (channel,)) - value_function(coalition))
    return total

# Calculate Shapley value attribution per channel
shapley = pd.DataFrame({'Channel': channels})
shapley['Credit'] = [shapley_value(c, channels, value_function) for c in channels]

# Calculate revenue per channel (raw totals, for comparison)
revenue_by_channel = data.groupby('Channel')['Revenue'].sum()

# Output results
print("Shapley value attribution:")
print(shapley)
print("\nRevenue by channel:")
print(revenue_by_channel)

Logistic Regression Attribution

This method uses a logistic regression model to estimate the probability of conversion given each touchpoint. The credit is assigned based on the estimated impact of each touchpoint on the conversion probability.


import pandas as pd
from sklearn.linear_model import LogisticRegression

# Load data
data = pd.read_csv('mta_data.csv')

# Encode channels as binary variables (one row per touchpoint)
channels = pd.get_dummies(data['Channel'], prefix='Channel')

# Fit logistic regression model: did this touchpoint's user convert?
model = LogisticRegression()
model.fit(channels, data['Converted'])

# Calculate logistic regression attribution:
# credit each channel by the average change in predicted conversion probability
# when that channel's indicator is switched on vs. off (other indicators held as observed)
credits = []
for channel in channels.columns:
    with_channel = channels.copy()
    with_channel[channel] = 1
    without_channel = channels.copy()
    without_channel[channel] = 0
    lift = model.predict_proba(with_channel)[:, 1] - model.predict_proba(without_channel)[:, 1]
    credits.append(lift.mean())
lr = pd.DataFrame({'Channel': channels.columns, 'Credit': credits})

# Calculate revenue per channel
revenue_by_channel = data.groupby('Channel')['Revenue'].sum()

# Output results
print("Logistic regression attribution:")
print(lr)
print("\nRevenue by channel:")
print(revenue_by_channel)

Markov Chain Attribution

Good old statistical mechanics is back! This method uses a probabilistic model to estimate the probability of moving from one touchpoint to another in the customer journey. The credit for each touchpoint is calculated as the expected number of conversions that would have been lost if that touchpoint had been removed from the customer journey.

In most Markov chains we make an assumption about the memory of the system (e.g., a first-order chain, where the next touchpoint depends only on the current one) to keep the problem and its computation tractable. A sketch of the removal-effect calculation itself follows after the code below.

import pandas as pd

# Load data
data = pd.read_csv('mta_data.csv')

# Create first-order transition matrix (assumes rows are in chronological order per user)
channels = data['Channel'].unique()
transition_matrix = pd.DataFrame(0, index=channels, columns=channels)
for user_id in data['User ID'].unique():
    user_data = data[data['User ID'] == user_id]
    for i in range(len(user_data) - 1):
        current_channel = user_data.iloc[i]['Channel']
        next_channel = user_data.iloc[i + 1]['Channel']
        transition_matrix.loc[current_channel, next_channel] += 1
# Normalize rows into probabilities; channels with no outgoing transitions get all zeros
transition_matrix = transition_matrix.div(transition_matrix.sum(axis=1), axis=0).fillna(0)

# Calculate Markov chain attribution: here each journey is scored by its path probability
# under the fitted transition matrix (a simplification of the full removal-effect calculation)
markov = pd.DataFrame({'User ID': data['User ID'].unique()})
markov['Credit'] = 0.0
for user_id in markov['User ID']:
    user_data = data[data['User ID'] == user_id]
    credit = 1.0
    for i in range(len(user_data) - 1):
        current_channel = user_data.iloc[i]['Channel']
        next_channel = user_data.iloc[i + 1]['Channel']
        credit *= transition_matrix.loc[current_channel, next_channel]
    markov.loc[markov['User ID'] == user_id, 'Credit'] = credit
markov = markov.set_index('User ID')

# Calculate revenue per channel
revenue_by_channel = data.groupby('Channel')['Revenue'].sum()

# Output results
print("Markov chain attribution:")
print(markov[['Credit']])
print("\nRevenue by channel:")
print(revenue_by_channel)
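
And here is a hedged sketch of the removal effect itself, using explicit start/conversion/null states: fit the transition matrix once with all channels and once with a channel redirected into the null state, then compare the chain’s conversion probabilities. The 'Converted' and 'Date' columns, the state names, and the fixed 50-step power iteration are my own assumptions and simplifications.

import pandas as pd

data = pd.read_csv('mta_data.csv')

# Each user's ordered path plus whether they converted
converted = data.groupby('User ID')['Converted'].max()
paths = data.sort_values('Date').groupby('User ID')['Channel'].apply(list)
channels = sorted(data['Channel'].unique())
states = ['start'] + channels + ['conversion', 'null']

def build_transition_matrix(paths, converted, removed=None):
    counts = pd.DataFrame(0.0, index=states, columns=states)
    for user_id, path in paths.items():
        end = 'conversion' if converted.loc[user_id] else 'null'
        full_path = ['start'] + path + [end]
        if removed is not None and removed in full_path:
            # Removing a channel cuts the journey off into the null state
            full_path = full_path[:full_path.index(removed)] + ['null']
        for a, b in zip(full_path[:-1], full_path[1:]):
            counts.loc[a, b] += 1
    probs = counts.div(counts.sum(axis=1).replace(0, 1), axis=0)
    probs.loc['conversion', 'conversion'] = 1.0  # absorbing states
    probs.loc['null', 'null'] = 1.0
    return probs

def conversion_probability(P, steps=50):
    # Probability mass absorbed in 'conversion' when starting from 'start',
    # approximated by iterating the chain a fixed number of steps
    state = pd.Series(0.0, index=states)
    state['start'] = 1.0
    for _ in range(steps):
        state = state @ P
    return state['conversion']

# Removal effect: relative drop in conversion probability when a channel is removed
# (assumes at least one converting journey in the data)
base = conversion_probability(build_transition_matrix(paths, converted))
removal_effect = {c: 1 - conversion_probability(build_transition_matrix(paths, converted, removed=c)) / base
                  for c in channels}

# Normalize removal effects into attribution shares
total = sum(removal_effect.values())
attribution = {c: v / total for c, v in removal_effect.items()} if total else removal_effect
print(pd.Series(attribution).sort_values(ascending=False))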

Deep Learning Attribution

I had to do it! I’m still reading up on the topic, so this part might get updated in the future.

import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Load data
data = pd.read_csv('mta_data.csv')

# Encode channels as one-hot vectors
# (on scikit-learn >= 1.2 the argument is sparse_output=False instead of sparse=False)
encoder = OneHotEncoder(sparse=False)
channels = encoder.fit_transform(data[['Channel']])

# Split data into train and test sets
train_size = int(len(data) * 0.8)
train_channels = channels[:train_size]
train_revenue = data['Revenue'][:train_size]
test_channels = channels[train_size:]
test_revenue = data['Revenue'][train_size:]

# Build neural network model
model = Sequential()
model.add(Dense(64, input_dim=train_channels.shape[1], activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')

# Train model
model.fit(train_channels, train_revenue, epochs=50, batch_size=32, validation_data=(test_channels, test_revenue))

# Calculate Deep Learning Attribution: sum the predicted revenue over each user's touches
dl = pd.DataFrame({'User ID': data['User ID'].unique()})
dl['Credit'] = 0.0
for user_id in dl['User ID']:
    user_data = data[data['User ID'] == user_id]
    user_channels = encoder.transform(user_data[['Channel']])
    credit = model.predict(user_channels)
    dl.loc[dl['User ID'] == user_id, 'Credit'] = credit.sum()
dl = dl.set_index('User ID')

# Calculate revenue per channel
revenue_by_channel = data.groupby('Channel')['Revenue'].sum()

# Output results
print("Deep Learning Attribution:")
print(dl[['Credit']])
print("\nRevenue by channel:")
print(revenue_by_channel)

Some great resources I found
