Probabilistic Modeling: The Pareto/NBD model
By Ramesh Adavi
Introduction to the Pareto/NBD model
• The value of a business is the aggregate of the Customer Lifetime Value of its customer base. In particular, the Pareto/NBD model applies to situations where a customer can transact at any time (continuous) and the organization does not formally know whether a customer is still active (non-contractual).
• Given the Recency, Frequency & Monetary value “history” of the customer base, and using Probabilistic Modeling (Bayesian) techniques, the Pareto/NBD model can estimate the Expected Lifetime Value for each customer (and for the customer base), and also the components that constitute the lifetime value, such as:
  – Number of purchase transactions expected
  – The expected lifetime
  – The transaction intensity of each customer
  – The likelihood of dropping out
  Understanding customer behavior at this level of granularity is arguably the current gold standard in customer analytics.
• The most endearing aspect of the model is that the input data need only be a simple transaction log with 3 elements:
  – Customer ID
  – Date of transaction
  – Purchase Value of the transaction
Why Probabilistic Modeling?
• Limitations of regression-like techniques:
  – They essentially summarize the data, and that is all
    • There is little understanding of what is responsible for the “observed facts”
  – As reality changes, the error increases unless you have the current data (in its entirety)
    • Tough performance requirements
  – Poor predictive performance
    • Overfitting the current data
• So, probabilistic models:
  – Estimate “latent” parameters that explain the current observations, and hence behavior
    • Recency, Frequency & Monetary value, in that order, explain behavior better than any other attributes, incl. demographics, etc.
  – Take heterogeneity into account
    • “Segmentation”: given the RFM attributes of a customer, you can predict purchase behavior
  – Bayesian Learning
    • Learn (customer) behavior by estimating the parameters (purchase rate, dropout rate)
    • New data brings in new insights! That’s what LEARNING ORGANIZATIONS do!
    • Predict the present, better!
The setting / the context / the problem
The Logic of Probability Models
• The actual data-generating process that lies behind any given data on buyer behavior embodies a huge number of factors.
• Even if the actual process were completely deterministic, it would be impossible to measure all the variables that determine an individual’s buying behavior in any setting.
• Any account of buyer behavior must therefore be expressed in probabilistic/random/stochastic terms, so as to account for our ignorance regarding (and/or lack of data on) all the determinants.
• Rather than try to tease out the effects of various marketing, personal, and situational variables, we embrace the notion of randomness and view the behavior of interest as the outcome of some probabilistic process.
• We propose a model of individual-level behavior that is “summed” across individuals (taking individual differences into account) to obtain a model of aggregate behavior.
Applications of Probabilistic Models
• Summarize and interpret patterns of market-level behavior
• Predict behavior in future periods, be it in the aggregate or at a more granular level (e.g., conditional on past behavior)
• Make inferences about behavior given summary measures (i.e., RFM)
  – Recency, Frequency & Monetary value, in that order, are considered to be the most “significant” predictors of buying behavior
• Profile behavioral propensities of individuals
• Generate benchmarks/norms
Building the Pareto/NBD model
i) Determine the marketing decision problem / information needed.
ii) Identify the observable individual-level behavior of interest.
  – We denote this by x. For example, x may be the number of purchase transactions made in a period, the lifetime of a customer, …
iii) Select a probability distribution that characterizes this individual-level behavior.
  – This is denoted by f(x|θ).
  – We view the parameters of this distribution (θ) as individual-level latent traits. Examples of parameters are the purchase rate and the dropout rate.
  – In the Pareto/NBD model, the number of purchases is modeled as a Poisson distribution and the lifetime is modeled as an exponential distribution.
iv) Specify a distribution to characterize the distribution of the latent trait variable(s) across the population.
  – We denote this by g(θ). This is often called the mixing distribution.
  – The mixing distribution accounts for heterogeneity. All customers are not the same: each may have a different purchase rate, dropout rate, etc.
v) Derive the corresponding aggregate or observed distribution for the behavior of interest:
  – f(x) = ∫ f(x|θ) g(θ) dθ
  – Example: f(x = # of purchases) is NBD and f(x = lifetime) is a Pareto type II
vi) Estimate the parameters of the mixing distribution (also known as hyper-parameters) by fitting the aggregate distribution to the observed data. This is where Bayesian techniques kick in.
  – A popular technique for hyper-parameter estimation is MAXIMUM LIKELIHOOD ESTIMATION applied to the aggregate distribution (an empirical-Bayes approach).
vii) Use the model to solve the marketing decision problem / provide the required information.
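Step (v) can be checked numerically. The following is a minimal sketch, not part of the original slides. It assumes the gamma mixing distribution has shape r and rate α (density proportional to λ^(r-1) e^(-αλ)), and it uses hypothetical parameter values. Integrating the Poisson probability against the gamma density should reproduce the closed-form NBD probability.

```python
import math

def poisson_pmf(x, lam, t=1.0):
    """P(X = x) for a Poisson count with rate lam over a period of length t."""
    return math.exp(-lam * t + x * math.log(lam * t) - math.lgamma(x + 1))

def gamma_pdf(lam, r, alpha):
    """Gamma density with shape r and rate alpha (assumed parameterization)."""
    return math.exp(r * math.log(alpha) + (r - 1) * math.log(lam)
                    - alpha * lam - math.lgamma(r))

def nbd_pmf(x, r, alpha, t=1.0):
    """Closed-form NBD probability, i.e. the Poisson-gamma mixture."""
    log_p = (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
             + r * math.log(alpha / (alpha + t))
             + x * math.log(t / (alpha + t)))
    return math.exp(log_p)

def mixture_pmf(x, r, alpha, t=1.0, upper=50.0, steps=100_000):
    """Midpoint-rule approximation of the integral of f(x | lam) g(lam) d lam."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        total += poisson_pmf(x, lam, t) * gamma_pdf(lam, r, alpha)
    return total * h

# Hypothetical hyper-parameters, chosen only for illustration.
R, ALPHA = 2.0, 4.0
for x in range(4):
    print(x, nbd_pmf(x, R, ALPHA), mixture_pmf(x, R, ALPHA))
```

The two columns should agree to several decimal places. The midpoint rule is used only to keep the sketch dependency-free; a shape parameter below 1 makes the gamma density unbounded near zero and would need a more careful integration scheme.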
Estimating CLV
The expected lifetime value of an as-yet-to-be-acquired customer is given by
    E(CLV) = \int_0^\infty E[v(t)] \, S(t) \, d(t) \, dt
where
• E[v(t)] = expected value (or net cashflow) of the customer at time t (if alive)
• S(t) = the probability that the customer is alive beyond time t
• d(t) = discount factor that reflects the present value of money received at time t
Standing at T, the expected residual lifetime value of an existing customer is given by
    E(RLV) = \int_T^\infty E[v(t)] \, S(t \mid t > T) \, d(t - T) \, dt
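As a worked illustration of the expected lifetime value of a new customer, here is a minimal sketch under simplifying assumptions that are not in the slides: a constant net cashflow rate v, an exponential survival function S(t) = exp(-μt), and continuous discounting d(t) = exp(-δt). In that special case the integral of E[v(t)] S(t) d(t) over time has the closed form v / (μ + δ), which the numerical integration below should approximate. All parameter values are hypothetical.

```python
import math

def expected_clv(cashflow, survival, discount, horizon=200.0, steps=200_000):
    """Midpoint-rule approximation of the integral of v(t) * S(t) * d(t) on [0, horizon]."""
    h = horizon / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += cashflow(t) * survival(t) * discount(t)
    return total * h

# Hypothetical inputs: 100 currency units per year while alive,
# dropout rate mu = 0.05 per year, discount rate delta = 0.10 per year.
V, MU, DELTA = 100.0, 0.05, 0.10

clv = expected_clv(
    cashflow=lambda t: V,
    survival=lambda t: math.exp(-MU * t),
    discount=lambda t: math.exp(-DELTA * t),
)
closed_form = V / (MU + DELTA)
print(clv, closed_form)
```

The finite horizon truncates the integral, so it has to be long enough for S(t) d(t) to become negligible.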
Pareto/NBD Model Assumptions
• Occam’s razor
  – Balance between “accuracy” and “complexity”
  – You add assumptions (hypotheses) so that you approach the “truth” step by step
• Assumptions of the Pareto/NBD model
  – 5 assumptions. Note the parameters added with every assumption:
  (i) While active, the number of transactions made by a customer in a time period of length t is distributed Poisson with mean λt.
  (ii) Heterogeneity in the transaction rate λ across customers follows a gamma distribution with shape parameter r and scale parameter α.
  (iii) Each customer has an unobserved “lifetime” of length τ. This point at which the customer becomes inactive is distributed exponential with dropout rate μ.
  (iv) Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter s and scale parameter β.
  (v) The transaction rate λ and the dropout rate μ vary independently across customers.
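The five assumptions can be read as a recipe for simulating a customer. The sketch below is illustrative only. It assumes that α and β enter the gamma densities as rate parameters (so the mean transaction rate is r/α), which is why the code passes 1/α and 1/β as the scale argument of Python's `gammavariate`. The function name and the parameter values are hypothetical.

```python
import random

def simulate_customer(r, alpha, s, beta, T, rng):
    """Simulate one customer over (0, T] and return (x, t_x, T)."""
    # (ii) and (iv): latent rates drawn from gamma mixing distributions.
    # (v): the two draws are independent.
    lam = rng.gammavariate(r, 1.0 / alpha)  # transaction rate
    mu = rng.gammavariate(s, 1.0 / beta)    # dropout rate
    # (iii): unobserved lifetime is exponential with rate mu.
    tau = rng.expovariate(mu)
    active_until = min(tau, T)
    # (i): a Poisson process while active, generated through
    # exponential inter-purchase times with rate lam.
    times = []
    t = rng.expovariate(lam)
    while t <= active_until:
        times.append(t)
        t += rng.expovariate(lam)
    x = len(times)                      # number of repeat transactions
    t_x = times[-1] if times else 0.0   # time of last transaction (0 if none)
    return x, t_x, T

# Hypothetical hyper-parameters, time measured in weeks.
rng = random.Random(42)
cohort = [simulate_customer(0.8, 8.0, 0.7, 10.0, 39.0, rng) for _ in range(5)]
for row in cohort:
    print(row)
```

Each simulated customer is summarized by the same (x, tx, T) triple that the model takes as input, so a simulated cohort can be used to sanity-check an estimation routine.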
A sample - the CDNOW database
• New customers at CDNOW, 1/97–3/97
• Systematic sample (1/10) drawn from panel of 23,570 new customers
• 39-week calibration period
• 39-week forecasting (holdout) period
• Initial focus on transaction “flow”
  – Transaction “flow” describes underlying behavior very well.
  – Recency and then Frequency are the most important predictors of behavior.
  – Average transaction value of each customer is quite a stable indicator of Monetary value.
[Figure: transaction timelines for the sampled customers (ID = 0001 … ID = 2357), from Week 0 through 3/97 to T = Week 39]
• tx = time from first transaction to last transaction before the calibration period ends
• x = number of “repeat” transactions during the calibration period
• T.cal = period from first transaction to T (end of calibration period)
The Pareto/NBD Model for transaction flow behavior
Transaction Process:
• While alive, the number of transactions made by a customer follows a Poisson process with mean transaction rate λ.
• Heterogeneity in transaction rates across customers is distributed gamma(r, α).
Latent Attrition Process:
• Each customer has an unobserved “lifetime” of length τ, which is distributed exponential with death rate μ.
• Heterogeneity in death rates across customers is distributed gamma(s, β).
Summary statistics:
• Given the model assumptions, we do not require information on when each of the x transactions occurred.
• The only customer-level information required by this model is recency and frequency.
• The notation used to represent this information is (x, tx, T):
  – x is the number of transactions observed in the time interval (0, T]
  – tx (0 < tx ≤ T) is the time of the last transaction
  – T is the end of the calibration period. From this, T.cal for each customer is calculated as [T – (date of first transaction)]
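The summary statistics can be computed directly from a transaction log. The following is a minimal sketch with made-up data. It assumes time is measured in weeks from each customer's first purchase, that several purchases on the same day count as one transaction, and that tx is set to 0 for a customer with no repeat transactions. The function name `rf_summary` is hypothetical.

```python
from datetime import date

def rf_summary(log, calibration_end):
    """Return {customer_id: (x, t_x, T)} from (customer_id, date, amount) rows."""
    days_by_customer = {}
    for customer_id, day, _amount in log:
        if day <= calibration_end:  # ignore the holdout period
            days_by_customer.setdefault(customer_id, set()).add(day)
    summary = {}
    for customer_id, days in days_by_customer.items():
        ordered = sorted(days)
        first, last = ordered[0], ordered[-1]
        x = len(ordered) - 1                     # repeat transactions only
        t_x = (last - first).days / 7.0          # recency, in weeks
        T = (calibration_end - first).days / 7.0 # T.cal, in weeks
        summary[customer_id] = (x, t_x, T)
    return summary

# Made-up transaction log: (Customer ID, Date of transaction, Purchase Value).
example_log = [
    ("A", date(1997, 1, 1), 20.0),
    ("A", date(1997, 1, 15), 15.0),
    ("A", date(1997, 3, 5), 30.0),
    ("B", date(1997, 2, 1), 12.0),
    ("A", date(1997, 11, 1), 9.0),  # falls after the calibration period
]
print(rf_summary(example_log, date(1997, 10, 1)))
```

The purchase value is carried in the log but not used here, since the transaction-flow model needs only recency and frequency.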
The Results that the model will provide
ASSESSING CURRENT OPERATIONS:
1. E[X(t)], the expected number of transactions in a time period of length t, which is central to computing the expected transaction volume for the whole customer base over time.
2. P[X(t) = x], the probability of observing x transactions in a time period of length t.
PREDICTION:
1. P(Alive | x, tx, T), the probability that a customer with a given history (x, tx, T) is alive the day after T.
2. E[X(T, T + t) | x, tx, T], the expected number of transactions in the future period (T, T + t] for an individual with observed behavior (x, tx, T).
3. DERT(d | x, tx, T): for a given discount rate d, how many “discounted” transactions a customer with observed behavior (x, tx, T) will make in the remainder of his lifetime.
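The first quantity, E[X(t)], has a simple closed form under the model assumptions. The slides do not state the formula. The sketch below uses the expression commonly given in the Pareto/NBD literature, E[X(t)] = rβ / (α(s − 1)) · [1 − (β / (β + t))^(s − 1)], and assumes that α and β act as rate parameters of the gamma distributions. As a cross-check it also integrates (r/α) · (β / (β + u))^s over u from 0 to t, that is, the mean transaction rate times the Pareto type II survival probability. Parameter values are hypothetical.

```python
def expected_transactions(t, r, alpha, s, beta):
    """Closed-form E[X(t)] for a randomly chosen customer (requires s != 1)."""
    return (r * beta) / (alpha * (s - 1.0)) * (
        1.0 - (beta / (beta + t)) ** (s - 1.0)
    )

def expected_transactions_numeric(t, r, alpha, s, beta, steps=20_000):
    """Cross-check: mean rate r/alpha times the integrated survival probability."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += (beta / (beta + u)) ** s  # P(customer still alive at time u)
    return (r / alpha) * total * h

# Hypothetical hyper-parameters, time measured in weeks.
params = dict(r=0.8, alpha=8.0, s=0.7, beta=10.0)
print(expected_transactions(39.0, **params))
print(expected_transactions_numeric(39.0, **params))
```

Multiplying E[X(t)] by the number of customers in a cohort gives the expected transaction volume for that cohort over a period of length t measured from each customer's first purchase.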