The Marketing Science Signal

Marketing Data Science

Mike is a leader in the field of Marketing Data Science & Operational Strategy with 20+ years leading global Data Science, AI/ML, and Marketing Analytics teams at Dell Technologies, Cisco, Pure Storage, Hitachi Vantara, and Hearst Media. He is also an Accredited Professional Statistician™ with the American Statistical Association.
  • Solving the Discovery Problem: Collaborative Filtering and the Science of “Look-Alike” Behavior

    Why Collaborative Filtering (CF) Matters

    In modern B2B or B2C marketing, the “Discovery Problem” is the primary barrier to growth. Collaborative Filtering addresses this by:

    • Driving Growth: Unlocking cross-sell and up-sell opportunities by identifying hidden preferences.
    • Managing the “Cold Start”: Effectively engaging new entry-point customers and prospects by leveraging “look-alike” behavior.
    • Precision for “Grey Sheep”: Navigating the complexity of low-frequency customers who purchase niche products.
    • Strategic Segmentation: Providing a mathematical foundation for persona development that goes beyond basic demographics.

    This is the third and final installment in my series on recommender systems, presented in the order I first applied them in B2B marketing. Fourteen years ago, my team began using Association Rules for product recommendations and media mix optimization — a technique that learns from individual “baskets” or bundles. We then progressed to Markov Chains, which decode a single customer’s sequence of actions. In this article, we move to Collaborative Filtering: a method that looks across the entire ecosystem of customers and items to identify “look-alike” patterns.

    It is a powerful tool for engaging prospects and solving the “cold start” and “grey sheep” problems that arise when historical data is scarce. Recently, Jidan Duan built a recommendation engine using CF with Matrix Factorization that proved effective in B2B digital hardware marketing. In practice, keeping a Human in the Loop is critical: recommendations should be inspected to ensure that the products served are optimal in terms of margin, future versus legacy products, and special strategic initiatives. This can be accomplished through an additional layer of business rules.

    As we will see in this article, the techniques can be used together – as well as independently – when needed to solve a particular marketing problem.

    Article | Algorithm               | Unit of Analysis           | Question
    1       | Association Rules       | Baskets / sessions         | What travels together?
    2       | Markov Chain            | Single customer’s sequence | What comes next?
    3       | Collaborative Filtering | All customers × all items  | Who buys like me?

    The Dataset: High-Volume Retail Reality

    To demonstrate these concepts, I am evaluating a subset of the H&M Personalized Fashion Recommendations dataset. In its raw form, the training data consists of over 31 million transactions.

    Here is a brief description of H&M Group and the dataset:

    “H&M Group is a family of brands and businesses with 53 online markets and approximately 4,850 stores. Our online store offers shoppers an extensive selection of products to browse through. But with too many choices, customers might not quickly find what interests them or what they are looking for, and ultimately, they might not make a purchase. To enhance the shopping experience, product recommendations are key. More importantly, helping customers make the right choices also has a positive implication for sustainability, as it reduces returns, and thereby minimizes emissions from transportation… the purchase history of customers across time, along with supporting metadata.”


    Summary Technical Overview of Process

    Data Preparation: Loading raw transactions and quality filter

    At over 31 million rows, simply loading the three datasets (customers, articles, and transactions) and attempting to merge them triggered a Memory Error. I needed an approach that would preserve the purchasing patterns while using less memory. I removed unneeded columns, reformatted the string IDs, removed customers with fewer than 3 purchases, and filtered out all but the last six months of fashion data, arriving at the following:

    Loading Transactions     | Records
    Raw rows loaded          | 2,788,325
    After quality filter     | 2,576,695
    Unique customers         | 300,006
    Unique clothing articles | 37,878
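    The filtering steps described above can be sketched in pandas. This is a minimal illustration on synthetic data; the column names (`t_dat`, `customer_id`, `article_id`) follow the public H&M dataset schema, and the thresholds mirror the ones described in the text.

```python
import pandas as pd

def prepare_transactions(tx: pd.DataFrame) -> pd.DataFrame:
    """Quality-filter raw transactions: drop unneeded columns, keep only
    the last six months, and remove customers with fewer than 3 purchases."""
    tx = tx[["t_dat", "customer_id", "article_id"]].copy()  # drop extra columns
    tx["t_dat"] = pd.to_datetime(tx["t_dat"])
    # Keep only the most recent six months of data
    cutoff = tx["t_dat"].max() - pd.DateOffset(months=6)
    tx = tx[tx["t_dat"] >= cutoff]
    # Remove low-signal customers with fewer than 3 purchases
    counts = tx["customer_id"].value_counts()
    keep = counts[counts >= 3].index
    return tx[tx["customer_id"].isin(keep)]

# Tiny synthetic stand-in for the raw transaction file
demo = pd.DataFrame({
    "t_dat": ["2020-01-01", "2020-08-01", "2020-08-15", "2020-09-01", "2020-09-10"],
    "customer_id": ["a", "b", "b", "b", "c"],
    "article_id": [1, 2, 3, 2, 4],
    "price": [0.1] * 5,
})
filtered = prepare_transactions(demo)
```

    On the real 31-million-row file, the same steps run per-chunk keep peak memory manageable.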

    Data Preparation: Exploratory Data Analysis

    The Long Tail and Marketing Efficiency

    If 20% of products drive 80% of H&M revenue (the Pareto Principle), a simple top-sellers ranking could cover most purchases, so this result is no surprise. Collaborative Filtering shines in the long tail: the 80% of items that collectively drive 20%+ of sales but are invisible to a simple ranking. In the data below:

    • 17.4% of unique items drive 80% of sales.
    • The remaining 82.6% is the long tail CF was built for.

    H&M has a catalog depth of roughly 38,000 items, but their current momentum is driven by just 17% of them. To grow, they don’t need more products; they need better cross- and up-sell of existing products. A traditional merchandising approach would focus on best-sellers. By employing collaborative filtering, the roughly 31,500 long-tail items can be marketed to the niche audiences they suit.
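    The head/tail split quoted above can be computed directly from per-item sales counts. Here is a minimal sketch on a synthetic power-law catalog (the function and data are illustrative, not the article’s actual code):

```python
import numpy as np

def pareto_share(item_sales: np.ndarray, revenue_share: float = 0.80) -> float:
    """Fraction of unique items needed to cover `revenue_share` of total sales."""
    sales = np.sort(item_sales)[::-1]          # best-sellers first
    cum = np.cumsum(sales) / sales.sum()       # cumulative share of sales
    n_head = np.searchsorted(cum, revenue_share) + 1
    return n_head / len(sales)

# Synthetic power-law catalog: a few hits, a long tail
rng = np.random.default_rng(0)
sales = rng.zipf(1.5, size=1000).astype(float)
head_fraction = pareto_share(sales)
print(f"{head_fraction:.1%} of items drive 80% of sales")
```

    Run against the real per-article sales counts, this is the calculation behind the 17.4% figure above.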

    A retention campaign could be targeted at the most loyal customers (those with roughly 30 purchases) to proactively prevent churn.

    Matrix Sparsity: The Need for “Latent” Logic

    With nearly 38,000 products and over 300,000 users, the resulting Utility Matrix (where rows are users and columns are items) is more than 99.9% empty space. This is “Sparsity.” Simple statistics can’t fill these gaps. We need an algorithm that can “see” the hidden—or latent—factors that connect a customer’s preference for a specific blouse to their likely interest in a particular style of sweater.

    Record Category           | # of Records/Cells
    Users                     | 300,006
    Items                     | 37,878
    Matrix (300,006 × 37,878) | 11,363,627,268
    Filled (0.0227%)          | 2,576,695
    Sparsity (99.9773%)       | 11,361,050,573
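    The sparsity figures in the table reduce to a few lines of arithmetic:

```python
# Reproduce the sparsity arithmetic from the table above.
n_users, n_items = 300_006, 37_878
n_filled = 2_576_695                      # observed user-item interactions

n_cells = n_users * n_items               # 11,363,627,268 possible cells
density = n_filled / n_cells              # fraction of cells with data
sparsity = 1.0 - density                  # fraction of empty cells

print(f"Matrix cells: {n_cells:,}")
print(f"Density     : {density:.4%}")     # ~0.0227%
print(f"Sparsity    : {sparsity:.4%}")    # ~99.9773%
```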

    Visualizing the Utility Matrix

    A Spy Plot can be used to understand the data.  By sorting the matrix by activity (most active users at the top, most popular items on the left), we can map our strategic challenges.

    Visualizing Sparsity can help us build targeting segments for tactical use in messaging and promotion to improve sales through cross-/ up-sell, etc.
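    A minimal sketch of such a spy plot, using SciPy and Matplotlib on a synthetic interaction matrix (the sizes and data are illustrative, not the H&M matrix itself):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted runs
import matplotlib.pyplot as plt
from scipy.sparse import coo_matrix

# Synthetic user-item interactions (replace with real transaction pairs)
rng = np.random.default_rng(42)
n_users, n_items, n_obs = 200, 100, 600
rows = rng.integers(0, n_users, n_obs)
cols = rng.integers(0, n_items, n_obs)
m = coo_matrix((np.ones(n_obs), (rows, cols)), shape=(n_users, n_items)).tocsr()

# Sort rows/columns by activity: busiest users at top, hottest items at left
row_order = np.argsort(-m.getnnz(axis=1))
col_order = np.argsort(-m.getnnz(axis=0))
m_sorted = m[row_order][:, col_order]

fig, ax = plt.subplots(figsize=(6, 4))
ax.spy(m_sorted, markersize=1, aspect="auto")
ax.set_title("Utility matrix (sorted by activity)")
fig.savefig("spy_plot.png", dpi=120)
```

    The four quadrants discussed below fall out of exactly this sorting: dense top-left, sparse bottom-right.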

    The Four Quadrants of Interaction

    • Core Business (top left) are the “Champion” frequent shoppers of H&M popular items.  Association Rules from my earlier article could be applied here to increase their market basket of purchases and ensure retention.
    • Loyal Explorers (top right) heavily purchase the least popular items in the long tail.  Collaborative filtering can help in recommending low-volume niche products which appeal to look-alike customers.
    • Entry Point (bottom left) H&M shoppers may be new customers who have come in to purchase popular items. This is a Cold Start scenario where little data exists on the customer beyond popular product purchase and more of those should be served up – until more information is available.
    • Grey Sheep (bottom right) are occasional H&M shoppers who could be purchasing as a gift for someone else, or seasonally shopping.  Here again, there is little historical signal to help with recommendations.

    Selecting Latent Factors

    Output: Hidden Dimensions: Latent Factor Loadings by Product Category

    Statistically (using SVD or Singular Value Decomposition) we can separate the H&M customer base into groups or factors based on their product purchases, which allows us to make recommendations for each customer that are aligned with the typical purchases compelling to other customers in that grouping.

    Looking at the patterns in the factor loadings, we can develop some personas based on the types of products purchased.  This can become the starting place for general customer segmentation by adding demographic, intent and other third party syndicated data as well as internal financial data.

    Factor | Primary Product              | Marketing Label    | Strategic Insight
    F1     | All Categories               | General Popularity | High-volume “must-haves” across all segments (retention, x-/up-sell marketing).
    F5/F9  | T-shirts, Shorts, Vest tops  | Active Summer      | High potential for seasonal x-sell (sunscreen/sunglasses).
    F2     | Underwear, Vest tops         | Foundation Core    | High-loyalty, repeat-purchase items (retention focus).
    F6     | Blouses, Sweaters, Outerwear | Professional       | “Back-to-Work” or “Evening” shopper persona.
    F8     | Foundation Wardrobe          | Casual             | X-sell complements of shorts, shirts, trousers, etc.
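    A hedged sketch of how such latent factors can be extracted with truncated SVD via scikit-learn. The matrix here is synthetic, and the persona labels in the table come from human interpretation of the loadings, not from the code:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Synthetic sparse utility matrix standing in for users x items purchases
utility = sparse_random(500, 200, density=0.02, random_state=7, format="csr")

svd = TruncatedSVD(n_components=10, random_state=7)
user_factors = svd.fit_transform(utility)      # (500, 10) user loadings
item_factors = svd.components_.T               # (200, 10) item loadings

# Inspect the heaviest-loading items per factor to name personas by hand
top_items_f1 = np.argsort(-np.abs(item_factors[:, 0]))[:5]
print("Top items on factor 1:", top_items_f1)
print("Explained variance   :", svd.explained_variance_ratio_.sum().round(3))
```

    Mapping each factor’s top-loading articles back to product categories is what produces the persona table above.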

    Summary

    Once tuned and put into place, collaborative filtering is a powerful tool whose usefulness extends beyond product recommendations; it shines in its ability to handle Grey Sheep and Cold Starts. It can also be combined with other techniques to build a marketing segmentation scheme that enhances classic RFM Segmentation, and CF outputs can be blended with demographic and intent data to develop holistic customer personas.

    I’ll come back to recommenders as I explore more applications of Agentic AI and marketing strategy.

  • MIKE THE ROBOT: SCALING EXPERTISE INTO THE SINGULARITY

    In a previous article, I explored the Analytics Shoot-Out: Mike vs. Agentic AI. Today, I’m shifting the lens from competition with AI to integration of AI to provide a scalable extension of my personal marketing methodology.

    Perhaps the most profound realization in building this agent is the ability to have a specialized ‘staff’ at one’s disposal—an editor, a researcher, a designer and a strategic consultant—all with access to a specific methodology and working at high velocity. That is the Singularity in practice.

    To be clear, I am defining this as a ‘Functional Singularity’—the point where a specific methodology is successfully codified into an agentic system, finally decoupling professional impact from the linear constraints of biological hours. This is not a claim of ‘better’ logic or a machine entering a recursive self-improvement loop; it is about a specialized engine delivering assessments grounded in a consistent, mathematically sound approach to quantitative marketing.


    Why Implementing An AI Agent Matters

    Building an extensive analytical portfolio is one thing; making it actionable is another. Transforming a library of insights (as documented in my analytics portfolio on The Marketing Science Signal) into a functional enterprise asset is the core value proposition of the Mike Agent. This shift moves an organization away from business intuition-based decision making and toward a Reasoning Engine that preserves a unique methodology while scaling into operations.

    • Methodology Preservation: It prevents “Methodology Dilution” by weighing all AI responses against specific frameworks found on my website (The Marketing Science Signal).
    • Democratized Expertise: It has the potential to make specialized marketing science accessible to every department—Sales, Finance, Marketing, and Ops—via a simple natural language interface.
    • Operational Velocity: It automates the transition from identifying a business problem to generating a structured, technical action plan.
    • Probabilistic Accuracy: It replaces “gut feeling” and qualitative intuition with calculated Markov transition and absorption probabilities.

    The Cyborg Evolution

    I find Yuval Noah Harari to be the most thought-provoking historian and philosopher alive today. He frequently argues that the biological era of human evolution is nearing its end, to be replaced by a technological one.

    “I think it is very likely that within a century or two, Homo sapiens as we know them will disappear. We will use technology to upgrade ourselves… into something different… the next evolution will be to become cyborgs. We are already becoming cyborgs. If ‘cyborg’ means a being that combines organic and inorganic parts, then we are already there. Your smartphone is not just a gadget; it’s part of you.”

    Yuval Noah Harari, Homo Deus: A Brief History of Tomorrow (2016)

    The “Singularity” won’t be a hostile machine takeover; it will be the point where man and machine merge into a single, unified entity. Scaling “best human thinking” into an AI agent is effectively the process of moving from a blog to a Reasoning Engine that applies a specific methodology to live data.


    2. Technical Architecture: How the Agent Thinks

    To scale this methodology across an organization, the Mike Agent operates on a “Consulting Triad” — three inputs that work together every time a question is asked:

    • My Methodology: At the start of every session, the agent reads The Marketing Science Signal website in real time, pulling my published content directly into the conversation as its primary context. This grounds every answer in my specific approach rather than generic AI advice.
    • The Business Context: The structured dynamic discovery questions capture the user’s immediate situation — their CRM state, channel gaps, churn signals, and forecasting blind spots — so the agent responds to the actual problem, not a hypothetical one.
    • The AI Engine: Gemini serves as the language and reasoning engine, synthesizing my site content against the user’s answers to produce a structured strategic assessment.

    This architecture keeps my intellectual property as the dominant input. The AI does not replace my thinking — it applies my thinking at scale.

    Preventing Methodology Dilution: Because my published site content is loaded into every session as the agent’s primary context, responses are grounded in my specific frameworks rather than the AI’s general knowledge. A user asking about customer retention will receive advice rooted in my churn modeling frameworks — for example, the behavioral finding that customers often leave because they never fully adopted the product, not because a competitor was cheaper.

    The Path to Enterprise Grade: The current implementation is a working prototype that demonstrates the core logic. A full enterprise deployment would extend this foundation by connecting to live data lakes, telemetry, CRM and marketing automation systems — enabling the agent to move from strategic advice to running actual calculations against real pipeline data.


    3. Deployment: The Department-Aware Consultant

    For the agent to deliver maximum value, it must speak the language of the person asking the questions. The current prototype uses a universal discovery flow that applies across all departments. Specifically, the “Mike the Robot” discovery session uses a structured yet flexible inquiry process in which questions are both categorized and dynamic, adapting to user input. The same five question categories surface the critical gaps regardless of who is asking. In a full enterprise deployment, the system would identify the stakeholder and tailor its response to their specific business problem.

    Department | Use Case            | The “Mike” Angle
    Sales      | Lead Prioritization | Weighted lead scoring based on historical conversion probability
    Finance    | Budget Allocation   | Removal Effect simulation to identify truly dead-weight spend
    Marketing  | Campaign Planning   | State-Transition Matrices to predict lead leakage before it happens
    Ops        | Data Hygiene        | CRISP-DM framework applied to systematically flag dirty CRM data

    The department-routing capability represents the next phase of deployment, where a single agent (or multiple agents) serves the entire organization through one natural language interface.


    4. Implementation Roadmap

    Scaling expertise into an enterprise asset would be a phased process designed to ensure my methodology remains the dominant logic at every stage:

    • Phase 1 — Library (Operational Now): The agent reads and ingests The Marketing Science Signal in real time at the start of every session, establishing my published frameworks as its primary context. This phase is demonstrated in the accompanying prototype.
    • Phase 2 — Personality (Methodology Encoding): Refining the system prompt to explicitly prioritize Markov-based logic and my proprietary frameworks, sharpening the “Mike bias” beyond the current working baseline.
    • Phase 3 — Execution (Live Data Integration): Connecting to live CRM and marketing automation APIs so the agent moves from giving advice to running calculations against actual enterprise data, such as pipeline data.
    • Phase 4 — Access (Enterprise Deployment): Deploying via Slack or Microsoft Teams, making the agent available to every department through the tools they already use — no technical onboarding required.

    5. The “Mike Agent” in Action: A Real Consultation Session

    The following is a live interaction from the enhanced working prototype, run against a real B2B technology scenario. In this version, the agent scraped 12 articles from The Marketing Science Signal — double the six used in the earlier example — and generated five contextual discovery questions, each one adapting based on the previous answer, before delivering the Master Strategic Assessment below. Q1 is fixed — it opens every session identically. Questions 2 through 5 are written by the agent in real time, using the full conversation history as context. The exact wording below is unique to this session.

    The Core Objective “What is the core marketing objective today?”

    User: “Reduce customer churn.”

    The Discovery Session

    The agent’s dynamic questioning engine generated five contextually aware questions, each probing deeper based on the previous answer. This is not a fixed survey — it is a structured conversation.

    Q1: “Data Readiness: What is the current state of your CRM and historical sales data? (platforms used, years of history, data quality issues)”

    User: We use Salesforce.com, Marketo, and Adobe products with five years of history. Data completeness and matching across systems are the main issues.

    Q2: “Given the matching issues across your systems, how exactly are customer interactions and retention efforts currently handed off and tracked across Account Managers, Telemarketing, and Channel Partners?”

    User: All channels operate independently. Marketing targets at-risk accounts with retention campaigns, and those accounts are flagged in Salesforce — but sales has no guidance around where to execute competitive take-outs, pricing actions, or cross-sell campaigns.

    Q3: “When those flagged accounts ultimately churn, what are the top 2-3 reasons they cite for leaving, and how well do the reasons logged in Salesforce match what your Account Managers are actually hearing in the field?”

    User: The top reasons cited are competition, brand awareness, and price.

    Q4: “Since brand awareness and competition are key churn drivers, what ad platforms are you currently using to defend these accounts, and can your Marketo/Adobe stack feed them dynamic, real-time audiences of those flagged at-risk customers?”

    User: The at-risk accounts have been identified, but formal campaigns and account-specific targeting have not been developed because management does not realize the true cost of churn.

    Q5: “To help management quantify the true financial impact of this churn, how often do your sales teams provide qualitative revenue forecasts, and what is your current forecast error rate when predicting these at-risk renewals?”

    User: Forecasts are done quarterly and can be off by 5–15%. Churn is not part of the formal forecasting process.

    Master Strategic Assessment (Live Agent Output)

    The agent scraped 12 articles from The Marketing Science Signal — including articles on Markov Chains, the Hybrid Forecast, Multi-Channel Force Multiplier, Recommender Systems, and Predictive CLV — and synthesized them against the five discovery answers above. Every recommendation below is grounded in a specific published framework.


    Strategic Gap Analysis

    Gap | Severity | Evidence | Technique
    The Churn Misdiagnosis (The “Price” Fallacy) | 8/10 | Sales cites competition, brand awareness, and price as primary churn drivers — relying on qualitative CRM drop-downs that mask true behavioral friction. | Kaplan-Meier Survival Curves
    The Financial Blindspot | 9/10 | Management does not realize the cost of churn because it is not quantified in the language of finance, preventing budget approval for retention campaigns. | Predictive Customer Lifetime Value (CLV) Modeling
    Siloed Forecasting and Execution | 7/10 | Quarterly forecasts carry a 5–15% error rate and completely ignore churn, while Marketing and Sales operate as independent units. | Hybrid Forecast (XGBoost Propensity Model)

    The Churn Misdiagnosis — Severity 8/10: When sales teams attribute churn to “price” or “competition,” they are almost always reading from CRM drop-down menus filled in at close — not from behavioral data. The real question is whether customers ever fully adopted the product in the first place. Kaplan-Meier Survival Curves, built on product telemetry rather than self-reported close reasons, will identify exactly when in the customer lifecycle defection risk becomes critical — and whether it correlates with adoption milestones rather than competitive events.
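    For readers who want to see the mechanics, here is a from-scratch Kaplan-Meier estimator on synthetic tenure data (illustrative only; production work would typically use a library such as lifelines, and the real input would be telemetry-derived tenures):

```python
import numpy as np

def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate. `durations` are tenures (months);
    `observed`=1 means churn was seen, 0 means censored (still active)."""
    durations = np.asarray(durations, float)
    observed = np.asarray(observed, int)
    times = np.unique(durations[observed == 1])   # distinct churn times
    surv, s = [], 1.0
    for t in times:
        at_risk = (durations >= t).sum()          # accounts still under observation
        deaths = ((durations == t) & (observed == 1)).sum()
        s *= 1 - deaths / at_risk                 # product-limit update
        surv.append((t, s))
    return surv

# Synthetic tenures: churns cluster early, before adoption milestones
durations = [2, 3, 3, 5, 8, 12, 12, 18, 24, 24]
observed  = [1, 1, 1, 1, 1,  0,  1,  0,  0,  0]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"month {t:>4.0f}: S(t) = {s:.2f}")
```

    A sharp early drop in S(t), before any competitive event could plausibly matter, is the signature of the adoption failure described above.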

    The Financial Blindspot — Severity 9/10: This is the highest-severity gap in the assessment. When management does not approve retention budget, it is almost always because churn has not been translated into their language. As detailed in my article The Financial Side of Marketing: Beyond RFM to Predictive CLV, the solution is not to ask for budget — it is to build a Predictive CLV model that quantifies the net present value of the at-risk accounts and presents churn as a revenue haircut on the forecast. Finance cannot ignore a number they helped build.

    Siloed Forecasting and Execution — Severity 7/10: A 5–15% forecast error is typical when churn is treated as an afterthought rather than a modeled input. As outlined in The Hybrid Forecast: Integrating Field Sales “Expert Opinion” with Deep Learning Ensembles, blending an XGBoost Propensity Model with the sales team’s pipeline data produces a mathematically grounded baseline that forces the sales team’s optimism to compete with real churn probabilities — rather than simply override them.


    Technical Roadmap

    Phase   | Action | Technique | Timeline
    Phase 1 | Address Adobe/Marketo/Salesforce matching issues by establishing a unified data model and defining discrete customer states. | CRISP-DM Framework | Short-term
    Phase 2 | Build a state-transition matrix to calculate exact transition probabilities and map non-linear journeys to identify adoption dead ends. | Markov Chains | Medium-term
    Phase 3 | Build a Recommender System to mathematically prescribe the next-likely-purchase for sales to pitch, deepening product adoption. | Association Rules (Market Basket Analysis) | Medium-term
    Phase 4 | Determine the true fractional contribution of Marketo emails, Adobe digital touches, and Sales calls in preventing churn. | Markov Chain Removal Effect | Long-term

    Phase 1 — Unify the State Space via CRISP-DM: Break down the silos between product telemetry, Marketo, and SFDC. Using the CRISP-DM framework, engineer a unified dataset where every account is assigned a behavioral State — Onboarded, Low-Adoption, Executive-Engaged, or At-Risk. This is the prerequisite for every subsequent model.

    Phase 2 — Implement a Markov Chain Attribution & Journey Model: Discard First/Last Touch. Build a State-Transition Matrix using Markov Chains to quantify how accounts move through media, partner touches, and product usage over time. Calculate Absorption Probabilities to forecast the exact likelihood of an account moving from Active to Churned based on its current state, providing the early warning signal you currently lack entirely.

    Phase 3 — Deploy the Hybrid Forecast: Transition away from pure Delphi Method forecasting. Build an XGBoost Propensity Model that scores renewal likelihood based on product telemetry and marketing engagement. Then use a Champion-Challenger Method to blend this ML baseline with the AMs’ pipeline data. This grounds the forecast in mathematical reality while preserving the operational context that sales leaders provide.

    Phase 4 — Automate Dynamic Interventions: Stop manually exporting static MQL lists. Integrate Markov state outputs directly into your ad-tech stack. When an account transitions into a “Low Adoption” state, automatically trigger a C-suite LinkedIn campaign. Use Agentic AI integrated with XGBoost outputs to generate personalized outreach to dark partner accounts before the renewal window closes.


    High-Value Counter-Intuitive Advice

    Stop fighting price wars and trying to reactivate dead accounts. Instead, force predicted churn into the financial forecast to secure the retention budget.

    Markov Rationale: Use Markov Absorption Probabilities to find exactly where customers stopped adopting the product, identifying the true friction points rather than relying on subjective sales feedback.

    Counter-Intuitive Element: Do not ask management for a retention budget. Instead, inject a mathematically sound revenue haircut directly into the quarterly forecast — and let the number make the argument for you. A CFO who helped build the model cannot dismiss its output as a marketing opinion.


    Key Observation: The expanded 12-article scrape allowed the agent to draw on the full methodology library — including Predictive CLV and Market Basket Analysis — in addition to the Markov and Hybrid Forecast frameworks used in the earlier example. The Financial Blindspot gap, rated the highest severity in this session, was surfaced by connecting the CLV article directly to the management budget problem the user described in Q4. That cross-article synthesis — connecting a financial modeling framework to a political obstacle inside the organization — is the core value of methodology encoding at scale.


    A note on scrape depth: The earlier prototype session used six articles from The Marketing Science Signal to keep context windows lean. This session ran against 12 articles (approx. 100 pages). The difference is visible in the output — the agent surfaced the CLV and Market Basket frameworks that were not available in the smaller context, and the gap analysis is sharper as a result. As the implementation roadmap matures toward a live data connection, this breadth (including future articles) will be the default, not the exception.


    6. The Code: The “Consultant” Logic

    The Agent’s Instructions (System Prompt)

    This is where the methodology encoding happens. Rather than a generic instruction, the system prompt explicitly names the frameworks the agent must apply, the churn lens it must use, and the output structure it must follow:
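    The production prompt is not reproduced here, but a hedged sketch of the shape described above might look like this. The wording and the `build_messages` helper are illustrative assumptions; the framework names come from the article:

```python
# Illustrative sketch only -- not the production prompt.
SYSTEM_PROMPT = """You are 'Mike the Robot', a marketing-science consultant.
Ground every answer in the published frameworks from The Marketing Science
Signal, in priority order:
  1. Markov Chains (state-transition matrices, absorption probabilities,
     Removal Effect) for journey mapping and attribution.
  2. Kaplan-Meier Survival Curves for churn diagnosis.
  3. Predictive CLV for translating churn into financial terms.
  4. Association Rules for next-likely-purchase recommendations.
Churn lens: assume defection stems from incomplete product adoption until
behavioral data says otherwise.
Output structure: Strategic Gap Analysis (severity-scored) ->
Technical Roadmap (phased) -> one piece of counter-intuitive advice.
If no documented framework fits, flag the gap instead of improvising."""

def build_messages(user_question: str, site_context: str) -> list[dict]:
    """Assemble the chat payload: methodology first, scraped site second."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": f"Site context:\n{site_context}"},
        {"role": "user", "content": user_question},
    ]
```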

    The Dynamic Follow-Up Engine

    Questions 2-5 are not fixed scripts. The agent reads the full conversation history and generates each question dynamically, probing deeper into the gaps the user has already revealed:
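    A minimal sketch of this idea, folding the full conversation history into each question-generation prompt (function and variable names are illustrative, not the production code):

```python
def followup_prompt(history: list[tuple[str, str]], question_number: int) -> str:
    """Build a prompt asking the LLM to write the next discovery question,
    conditioned on every prior Q/A pair."""
    transcript = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    return (
        f"You have asked {question_number - 1} discovery questions so far.\n"
        f"Conversation so far:\n{transcript}\n\n"
        f"Write discovery question {question_number}. Probe the most "
        "significant gap the user's last answer revealed. One question only."
    )

# The fixed opener plus the user's first answer seeds the loop
history = [("What is the core marketing objective today?", "Reduce customer churn.")]
prompt = followup_prompt(history, question_number=2)
```

    Each answer is appended to `history` before the next call, which is why Q2 through Q5 differ from session to session.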

    The Master Synthesis Loop

    The final synthesis call combines the live site scrape, full conversation history, and explicit methodology instructions into a single structured assessment (note: this excerpt is from my initial run using six articles; as noted above, the test run for this article used the full 12-article set):
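    A hedged sketch of the prompt-assembly step (names and wording are illustrative; in the real pipeline the resulting string becomes the user message of a single LLM call via LiteLLM’s `completion` API):

```python
def synthesis_prompt(articles: list[str], history: list[tuple[str, str]]) -> str:
    """Fold the live site scrape and the full discovery transcript into
    one request for the Master Strategic Assessment."""
    context = "\n\n---\n\n".join(articles)                      # scraped articles
    transcript = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    return (
        "Using ONLY the methodology below, produce a Master Strategic "
        "Assessment with a Strategic Gap Analysis (severity-scored), a "
        "phased Technical Roadmap, and one piece of counter-intuitive "
        f"advice.\n\n=== METHODOLOGY ===\n{context}\n\n"
        f"=== DISCOVERY TRANSCRIPT ===\n{transcript}"
    )

# e.g. litellm.completion(model=..., messages=[{"role": "user",
#      "content": synthesis_prompt(articles, history)}]) in the real pipeline
```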


    Conclusion: The Singularity is Strategic

    The future of leadership is about encoding my best thinking so it can be in every room at once. Mike the Robot is not a replacement for human expertise — it is the mechanism that scales it.


    Technical Appendix: The “Mike the Robot” Architecture

    For the analysts and engineers interested in the plumbing.

    The engine is built on a modular Python stack designed for real-time retrieval and reasoning. It doesn’t rely on a static database; it treats The Marketing Science Signal website as its live brain.

    The Tech Stack:

    • Orchestration: LiteLLM serves as the API gateway, providing a unified interface to toggle between Gemini 3.1 Pro Preview and other models for cost/latency optimization.
    • Retrieval (RAG): BeautifulSoup4 and Requests perform targeted scraping of my published articles, which are then vectorized into ChromaDB for context-injection.
    • Logic Layer: A custom CRISP-DM inspired prompt chain that forces the model to follow a specific “Discovery → Diagnosis → Roadmap” sequence rather than jumping to conclusions.
    • Environment: Developed in VS Code and deployed via a modular Python setup to ensure the “Mike-style” logic remains consistent across different sessions.
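    As an illustration of the retrieval step, here is a minimal BeautifulSoup4 cleaning pass on an inline HTML snippet. The real pipeline fetches live pages with Requests and stores embeddings in ChromaDB; this sketch covers only the parse-and-clean stage, and the HTML is invented:

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <nav>Home | About</nav>
  <article><h1>The Removal Effect</h1>
  <p>Markov Chains quantify channel contribution.</p></article>
  <footer>(c) The Marketing Science Signal</footer>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
for tag in soup(["nav", "footer", "script", "style"]):
    tag.decompose()                      # strip navigation chrome and boilerplate
article_text = soup.get_text(separator=" ", strip=True)
```

    Only the cleaned article body is vectorized, which keeps navigation chrome out of the agent’s context window.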

    Key Logic Constraint: The system is explicitly instructed to prioritize my documented frameworks (e.g., Markov Chains for journey mapping or Survival Analysis for churn) over generic LLM training data. If a scenario lacks a documented “Mike-style” framework, the agent is programmed to flag the gap rather than hallucinating a generic fix.

    Technical References: Python Libraries Used in the Notebook

    Library        | Role in Pipeline             | Citation
    LiteLLM        | LLM API Gateway              | BerriAI. (2025). LiteLLM: Python SDK and proxy server (v1.81.8). github.com/BerriAI/litellm
    BeautifulSoup4 | Website Scraping             | Richardson, L. (2024). Beautiful Soup (v4.12). crummy.com/software/BeautifulSoup
    Requests       | HTTP Client                  | Reitz, K. (2023). Requests: HTTP for Humans (v2.31). docs.python-requests.org
    ChromaDB       | Vector Knowledge Base        | Trychroma. (2024). Chroma: The AI-native open-source embedding database. trychroma.com
    Python-dotenv  | Environment / Key Management | Bertrand, S. (2024). python-dotenv (v1.0). github.com/theskumar/python-dotenv
    ipywidgets     | Interactive Notebook Widgets | Jupyter Team. (2024). ipywidgets: Interactive HTML widgets. ipywidgets.readthedocs.io

    The Marketing Science Signal  ·  AI & Human Intelligence  ·  mikesdatamarketing.com


  • The Future Only Depends on the Present: Markov Chains and the Customer Journey.

    Why Recommender Systems Matter

    • Map Non-Linear Journeys: Customers move between “states” (social, email, search) rather than a straight line. Use Markov Chains to visualize this web and capture micro-conversions.
    • Identify “Dead Ends”: Transition probabilities reveal where your funnel “leaks.” High churn from the intent stage indicates friction at opportunity closed/won (B2B) or pricing (B2C).
    • Optimize Marketing Mix: Use multi-touch attribution to credit awareness channels fairly. This prevents cutting essential top-of-funnel budget.
    • Increase Retention: Forecast the likelihood of customers moving from Active to Churned. This provides a window to intervene before they leave.

    Introduction

    In my first article on recommender systems entitled Recommender Systems: Market Basket Analysis & Next-Likely-Purchase in Cross-Sell, I explored Association Rules (Market Basket Analysis), which identifies which products tend to be purchased together in a single transaction. It’s a powerful “snapshot” tool and is built on deep historical data, like a lot of statistical and ML models.

    In contrast, one of the most elegant ideas in data science is the Markov property: the future depends only on the present state. It is a seemingly simple philosophy, but it has profound implications. What a customer is doing “now” is a state rich enough to predict what comes next. This is the world of state-based journeys, where Markov Chains quantify how customers move through media, lead funnels, and product purchases over time.

    Modern GTM strategies require more than a snapshot; they require a map. To optimize lead pipeline, media mix, and sequential selling, we need to quantify the customer journey over time. One of the most effective techniques for this is the Markov Chain. I was first introduced to the power of this method when a member of my team, Yexiazi (Summer) Song, utilized it to decode the complex sequences of B2B hardware and software sales for a global tech corporation.

    GTM Use Cases: Moving Beyond “Last Touch”

    Legacy attribution often fails because it looks at touchpoints in a vacuum. As RevSure AI notes, Markov Chain models offer a “full-funnel, unbiased insight” that standard models miss:

    “In complex B2B funnels, a lot happens between the first touch and the closed deal. Relying on first- or last-touch attribution misses the rich middle… This leaves marketers knowing what happened but not why—or what to do about it.”

    “Markov Chains and Next Best Action,” RevSure AI (2025)

    By using Markov Chains, we move from good business intuition to an evidence-based understanding of the Next Best Action.


    Methodology: The Power of the “Removal Effect”

    At its core, a Markov Chain is a stochastic model, named after the Russian mathematician Andrey Markov, built on the principle that the probability of a system moving to a future state depends solely on the current state, not on the sequence of states that preceded it. In a marketing context, we define our State Space (S) as the various channels (Email, LinkedIn, Direct Sales) plus the terminal states: “Start,” “Conversion,” and “Churn.”

    The real “magic” lies in the Removal Effect. By mathematically “removing” a specific channel from the chain and observing how much the total probability of conversion drops, we can assign a precise weight (or value) to that channel.

    This calculation allows us to see which touchpoints are the “force multipliers” in the funnel and which are merely noise. Ultimately, the Markov Chain is expressed as a transition matrix, where each row holds the probabilities of moving from the current state to each possible next state and sums to one:
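    As a minimal sketch of how such a matrix is estimated (the journey sequences below are hypothetical), we can count transitions between consecutive states and normalize each row so it sums to one:

```python
from collections import defaultdict

# Hypothetical customer journeys as ordered state sequences.
journeys = [
    ["Start", "Email", "LinkedIn", "Conversion"],
    ["Start", "Email", "Churn"],
    ["Start", "LinkedIn", "Email", "Conversion"],
    ["Start", "Direct Sales", "Conversion"],
]

# Count observed transitions between consecutive states.
counts = defaultdict(lambda: defaultdict(int))
for path in journeys:
    for current, nxt in zip(path, path[1:]):
        counts[current][nxt] += 1

# Normalize each row so its probabilities sum to one.
transition_matrix = {
    state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
    for state, nexts in counts.items()
}

print(transition_matrix["Email"])  # each observed next state is 1/3 here
```

    At production scale the same counting logic runs over millions of sessionized event rows, but the row-normalization step is identical.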


    Applying the Probabilistic Lens to the B2B SFDC Funnel

    While we often visualize the B2B funnel as a linear, gravity-fed pipe, the reality within Salesforce is far more probabilistic. In data science, we call this a stochastic process—meaning the path forward isn’t a fixed rail, but a series of possibilities influenced by where the prospect stands today. A prospect doesn’t just slide from Response to Won; they loop back for further discovery, stall in “Nurture” for six months, or skip stages entirely when a champion fast-tracks a deal.

    By modeling each SFDC stage as a “state” in a Markov process, we move from simple counting to advanced Probabilistic Attribution:

    • The Transition Matrix: We quantify the probability of moving from any stage (e.g., MQL) to any other stage (e.g., SAL or Lost). This reveals the true “leakage” points—such as a 60% drop-off between Sales Acceptance and Qualification—that a standard funnel report often masks.
    • The Removal Effect: This is the “killer app” for the B2B marketer. By statistically “removing” a stage from the chain and observing the drop in the final “Won” probability, we can calculate the exact value-add of mid-funnel activities. If removing “Stage 2: Solution Scoping” drops our total win probability by 40%, we have a mathematical mandate to invest in sales engineering and technical content.
    • Weighted Forecasting: Instead of applying a flat historical win rate to an opportunity, a Markov approach calculates the Absorption Probability. This tells us the likelihood that an account, given its current state and historical movement patterns, will eventually terminate in a “Closed/Won” state, providing a far more accurate revenue forecast for the CFO.
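    A small sketch of the last two ideas, using a hypothetical SFDC transition matrix: the absorption ("Won") probability is solved from the standard fundamental-matrix system, and the removal effect of a stage is the relative drop in that probability when all flow into the stage is redirected to "Lost":

```python
import numpy as np

# Hypothetical transition matrix; rows sum to one.
# States: 0=Start, 1=MQL, 2=SAL, 3=Won (absorbing), 4=Lost (absorbing)
P = np.array([
    [0.0, 0.8, 0.0, 0.0, 0.2],   # Start
    [0.0, 0.0, 0.6, 0.1, 0.3],   # MQL
    [0.0, 0.1, 0.0, 0.5, 0.4],   # SAL
    [0.0, 0.0, 0.0, 1.0, 0.0],   # Won
    [0.0, 0.0, 0.0, 0.0, 1.0],   # Lost
])

def win_probability(P, start=0, won=3):
    """Absorption probability of ending in 'Won', starting from 'Start'."""
    transient = [0, 1, 2]
    Q = P[np.ix_(transient, transient)]            # transient-to-transient
    R = P[np.ix_(transient, [won])]                # transient-to-Won
    N = np.linalg.inv(np.eye(len(transient)) - Q)  # fundamental matrix
    return (N @ R)[start, 0]

baseline = win_probability(P)

# Removal effect of "SAL": redirect all SAL-bound flow to "Lost".
P_removed = P.copy()
P_removed[:, 4] += P_removed[:, 2]
P_removed[:, 2] = 0.0

removal_effect = (baseline - win_probability(P_removed)) / baseline
print(f"Baseline win prob: {baseline:.3f}, removal effect of SAL: {removal_effect:.1%}")
```

    With these made-up numbers, removing the SAL stage costs roughly three quarters of the total win probability, which is exactly the kind of mathematical mandate described above.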

    The Data: Scaling to Real-World Complexity

    To demonstrate this, I’m moving away from small transactional samples to a high-resolution eCommerce behavioral dataset consisting of over 285 million user events. A sample is shown below. This clickstream data allows us to model the journey from “View” to “Cart” to “Purchase,” providing the scale necessary to see these transition probabilities in action.

    I am going to use this eCommerce dataset for my primary Markov Chain modeling exercise dealing with product sales, and at the end I am going to examine using Markov Chain for media mix optimization.

    Due to the eCommerce dataset’s size (which tested the limits of my Alienware workstation!), I implemented a chunk-based down-sampling strategy. I processed 100,000-row segments and took a random 1% sample from each to ensure a representative, non-biased subset that wouldn’t crash the Python kernel.
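    The down-sampling strategy can be sketched as follows; the inline CSV here stands in for the real multi-gigabyte file, and the column names are hypothetical (in practice you would pass the filename and a chunksize of 100,000 to pandas.read_csv):

```python
import io
import pandas as pd

# Stand-in for the multi-gigabyte clickstream file (hypothetical columns).
raw_csv = "event_time,event_type,category_code,user_id\n" + "\n".join(
    f"2019-10-{(i % 28) + 1:02d},view,electronics.smartphone,{i}" for i in range(1000)
)

samples = []
# Read fixed-size chunks and keep a random 1% of each chunk.
for chunk in pd.read_csv(io.StringIO(raw_csv), chunksize=100):
    samples.append(chunk.sample(frac=0.01, random_state=42))

events = pd.concat(samples, ignore_index=True)
print(len(events))  # 10 rows: 1% from each of the 10 chunks
```

    Sampling per chunk, rather than loading everything and sampling once, keeps memory flat while still drawing rows evenly from across the whole file.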

    High Level EDA – The Funnel Reality

    Product volume (below) follows a heavily skewed distribution, weighted towards electronics, so we can anticipate that the Markov Chain results will be more reliable in the high-volume categories than in the lower-volume or sparse product categories.

    Looking at the raw data, we can calculate the Transition Probabilities for these top categories, which show the probability that a customer in one product category moves next to another. These are simply an intermediate step in the analytical process, so interpret them with caution; they are not the final Attribution Weights used for budget allocation.

    Before building a Markov Chain, we need to see the States. In this dataset those are view, cart and purchase. I’ve seen many funnels like this in B2B marketing, where a high input (i.e. leads, unique visitors, responders, etc.) results in relatively few sales! That said, we see a 1.8% conversion rate, which is right at the average for consumer electronics but low for appliances, and there is always room for improvement.
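    As an illustration (the event counts below are invented to mirror the roughly 1.8% rate observed), the state counts and view-to-purchase conversion rate fall out of a simple tally:

```python
import pandas as pd

# Hypothetical event counts mirroring a view -> cart -> purchase funnel.
events = pd.Series(["view"] * 9640 + ["cart"] * 183 + ["purchase"] * 177)

counts = events.value_counts()
conversion_rate = counts["purchase"] / counts["view"]
print(f"{conversion_rate:.1%}")  # 1.8% view-to-purchase conversion
```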

    Behavioral Trends: Activity over Time

    Which states are most active, and when should we time interventions? This chart shows that a particular October weekend (Sat/Sun) was strongest in terms of views as well as conversions.


    Results

    After filtering for only the top categories, I ran the Markov Chain and arrived at the following weights. These tell the marketer which product categories lead to a final sale, and they become the weights for funding decisions:

    Cross-Promotion Opportunities

    This is the final step — now that we have the weighting, we can look at key categories based on contribution to the funnel as well as last touch. If it has high weight but low direct sales, it is helping mid-funnel, which would be missed with first- or last-touch attribution:

    As a marketer, I would bundle smartphone promotions with headphones, and follow up every TV, clock and refrigerator sale with a mobile phone offering. Ideally, TVs, clocks, refrigerators and headphones can be moved to the right-hand quadrant to become high-volume, high-impact drivers by recalibrating marketing investment from the low-impact products to the mid-funnel products. That would yield higher conversion rates and revenue.


    Media Mix Optimization

    We can use the same approach to optimize the media mix. Here I took the UCI bank telemarketing campaign dataset from my article The Multi-Channel Force Multiplier: How Bridging Digital Nurture and Direct Outreach Triples Conversion Lift, which initially used association rules.

    Based on the dataset, the attributed conversions for each touchpoint/channel are as follows and these can be converted to weights for budget planning:

    • Cellular: ~3,484 conversions
    • Brand Awareness: ~579 conversions
    • Telemarketing: ~311 conversions
    • Email: ~218 conversions
    • New Product Launch: ~179 conversions

    Total conversions in the dataset: 5,289 (cases where y = 'yes').
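    Converting the attributed conversion counts above into budget-planning weights is a straightforward normalization; a minimal sketch:

```python
# Attributed conversions per channel (counts from the analysis above).
attributed = {
    "Cellular": 3484,
    "Brand Awareness": 579,
    "Telemarketing": 311,
    "Email": 218,
    "New Product Launch": 179,
}

total_attributed = sum(attributed.values())

# Normalize attributed conversions into budget-planning weights.
weights = {channel: n / total_attributed for channel, n in attributed.items()}

for channel, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{channel:20s} {w:.1%}")
```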

    By taking the ratio of the Markov Weight to the Last-Touch Weight, we can find the channels that are most effective in the middle of the media mix in contributing to conversion: Brand Awareness, New Product Launch, and Email (below):


    Summary

    The primary advantage that Markov Chains have over Association Rules Mining is the ability to capture the chronological sequence of events. While Association Rules give us a powerful “snapshot” of what happens together in a single transaction, the Markov Journey requires time-stamped, sessionized data to map how one event leads to the next over time.

    The real payoff is in the Attribution Weights: the model provides a mathematical justification for budget allocation—proving, for instance, exactly why you should put 12% of your budget into TV promotions based on their mid-funnel contribution rather than just their last-click performance. I am still a professional fan of Association Rules for their simplicity and ease of explanation, but the predictive flow of a Markov Chain has revealed strategic capabilities and a level of visibility into the customer journey that a static snapshot simply cannot match.


    Key GTM Takeaways

    • Sequence Matters: Don’t just look at what was bought; look at the path taken to get there.
    • Quantify the “Middle”: Use the Removal Effect to stop guessing which mid-funnel activities actually drive revenue.
    • Probabilistic Forecasting: Move away from flat win-rates and toward absorption probabilities for a more accurate SFDC pipeline.

    References

    • Kakalejčík, L., et al. (2018): Multichannel Marketing Attribution Using Markov Chains. Journal of Applied Management and Investments.
    • Anderl, E., et al. (2014): Mapping the Customer Journey with Markov Chains.
    • RevSure AI (2025): Markov Chains and Next Best Action: The Future of Conversion Optimization.

    Python Libraries & Documentation

    The analysis in this post was conducted using the following open-source tools:

    • Anaconda & Python Jupyter Notebooks: For environment management and interactive development.
    • Pandas & NumPy: For data manipulation and matrix calculations.
    • Seaborn & Matplotlib: For generating heatmaps and scatter plots.
    • Statsmodels: For statistical modeling.
    • Kaleido: For high-resolution funnel visualizations.

  • Recommender Systems: Market Basket Analysis & Next-Likely-Purchase In Cross-Sell.

    Why Recommender Systems Matter

    • Discover Product Affinities: Identify items that frequently sell together to uncover hidden customer behaviors. Use these insights to optimize cross-selling and product bundles that reflect how people actually shop.
    • Boost Average Order Value (AOV): Use “Lift” metrics to place high-affinity products near each other in digital or physical layouts. This turns single-item purchases into multi-item baskets by simplifying the discovery of related goods.
    • Personalize Promotions: Move beyond generic discounts by offering coupons for “consequent” products based on what is already in the cart. This increases conversion rates by making offers feel tailored rather than random.
    • Optimize Inventory & Placement: Predict which products will face increased demand when a “seed” item goes on sale. Use these patterns to inform stocking levels and strategic shelf (or landing page) positioning.
    • Enhance Loyalty: By recommending the “next logical purchase” before the customer realizes they need it, you improve the user experience. This builds long-term retention by positioning your brand as an intuitive partner in their journey.

    Introduction

    In the first article of this series entitled My Favorite Segmentation Scheme (https://mikesdatamarketing.com/2025/12/10/my-favorite-segmentation-scheme/) I identified the Loyal or High-Potential Customer segment and postulated a strategic portfolio management strategy to migrate the HiPo customers to Champions through cross-sell and up-sell. Here is a data visualization of the customer base from that article:

    The question for a quantitative marketer in a company with multiple products is: what is the next likely purchase by these customers?  By identifying the next likely purchase we have the highest likelihood of selling and up-leveling these customers to Champions.

    Aside from an opinion-based approach, or perhaps a long-term strategy to introduce new products, the only scalable quantitative option is to use a recommender system.

    Any company that sells multiple products can benefit from a recommender system for cross-sell.  My expertise is primarily in B2B technology hardware marketing, but many of the techniques originated in B2C marketing.  My personal experience is in building recommenders using Association Rules Mining (also known as Market Basket Analysis) which I was first introduced to by Ling (Xiaoling) Huang and later Fuqiang (Kevin) Shi. 

    However, there are other ways to build recommenders, and each has its own advantages and disadvantages.  Yexiazi (Summer) Song built a recommender using Markov Chain when she was on my team years ago.  More recently, Jidan (Joanna) Duan developed a recommender using Collaborative Filtering with Matrix Factorization to reduce latency.  In my view, any of these are key for website personalization, or telemarketing to a list of high propensity accounts that are not targeted at a single product.

    For my next three articles, I am going to look at each of these three techniques, beginning with Association Rules (since that has been my “go to” technique for many years now and I am most comfortable with it).  During the course of these articles, I will do a capabilities assessment of each and compare it to the other techniques to the extent that the output and metrics are comparable.

    The Goal: To move away from “one-size-fits-all” marketing and toward high-propensity targeting that increases Average Order Value (AOV).


    Data Overview

    I returned to the Online Retail Data Set from the UC Irvine Machine Learning Repository from my article on customer lifetime value entitled The Financial Side of Marketing: Beyond RFM to Predictive CLV (https://mikesdatamarketing.com/2025/12/28/the-financial-side-of-marketing-beyond-rfm-to-predictive-clv/). This is a publicly available real-life dataset used by students for customer analytics, RFM (Recency, Frequency, Monetary) modeling, and market basket analysis. Although the values are consumer products, I have used association rules on tech hardware and software and so the technique is just as applicable to B2B as B2C.

    Dataset Description

    This is a transactional dataset containing all transactions occurring between December 1, 2010, and December 9, 2011, for a UK-based, non-store online retail company (Chen, D., 2012).

    • Business Nature: The company primarily sells unique all-occasion giftware.
    • Customer Base: While many are individuals, a significant portion are wholesalers (which accounts for the extreme outliers in spending and quantity).
    • Scale: 541,909 transactions and 8 attributes.

    Here is a sample from that dataset:

    I performed much of the same data cleaning tasks that I wrote about in the CLV article to prepare the data for modeling and exploratory data analysis. 

    Data Distributions: Outliers and Skewness

    Histograms of the RFM data show that none of the three components are normal (Gaussian) distributions.  We see a high concentration of customers who bought recently and tapered off, a lot of customers who were one-time purchasers, and a right-skew in monetary value due to high spending customers. So this is a great population for increasing purchase frequency through cross-sell using a recommender system!

    Here are the top products:

    Some seasonality around the Holidays, so this would be a good time for implementation:

    We can also look at shopping patterns, which are mid-morning into the afternoon:


    Association Rules

    In his text on marketing data science Thomas Miller (2015) describes Association Rules:

    “Association Rules Mining is another way of building recommender systems. Association rules modeling asks: What goes with what? What products are ordered or purchased together? What activities go together? What website areas are viewed together? A good way of understanding association rules is to consider their application to market basket analysis.”

    Miller (2015)

    Methodology

    I utilized the Apriori algorithm to identify frequent item sets, setting a minimum support threshold of 0.01 to ensure I wasn’t chasing statistical noise. According to Tan et al., the Apriori principle states that “if an itemset is frequent, then all of its subsets must also be frequent… conversely, if an itemset is infrequent, then all of its supersets must be infrequent too.”

    When interpreting the output of the Apriori algorithm, the resulting table provides several key interest measures. While analysis typically centers on the ‘Big Three’—Support, Confidence, and Lift—the specific focus often shifts depending on the project’s objectives. A primary advantage of association rules mining is the flexibility to filter and rank these rules using various combinations of metrics, ensuring the final recommendations align with specific business needs.

    • Antecedents: These are the items already present in the customer’s purchase history.
    • Consequents: This is the item the model proposes as a recommendation.
    • Support: The percentage of all transactions that contain both the antecedent and the consequent. It measures how popular the rule is.
      • Support (A->C) = P(A union C)
    • Confidence: The probability that a customer will buy the consequent given that they have the antecedent. It measures reliability.
      • Confidence (A->C) = Support(A union C)/Support(A)
    • Leverage: This calculates the difference between the observed frequency of A and C appearing together and the frequency that would be expected if they were independent. A leverage of 0 indicates independence.
    • Conviction: This measures the degree of implication. A high conviction value means that the consequent is highly dependent on the antecedent.
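    These measures can be illustrated in a few lines over toy baskets (hypothetical items; in practice, mlxtend’s apriori and association_rules functions compute them across the full rule table):

```python
# Toy baskets (hypothetical) to illustrate the interest measures.
transactions = [
    {"teacup", "cakestand"},
    {"teacup", "cakestand", "t-light holder"},
    {"t-light holder"},
    {"teacup", "t-light holder"},
    {"cakestand"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"teacup"}, {"cakestand"}

sup = support(antecedent | consequent)    # both items together: 2/5
confidence = sup / support(antecedent)    # P(C | A): 2/3
lift = confidence / support(consequent)   # vs. C's base popularity: ~1.11

print(sup, confidence, lift)
```

    A lift above 1.0 here says teacup buyers pick up the cakestand more often than its base popularity alone would predict.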

    Lift is the Superior Ranking Metric

    Support-based pruning (Tan et al., 2005) is widely used.  While Confidence tells us how likely a purchase is, it can be highly misleading. If a “best-seller” (like your White Hanging Heart T-Light Holder) is bought by 50% of all customers, any antecedent will naturally have high confidence in leading to that product simply because it is popular—not because there is a meaningful relationship.

    Lift solves this by accounting for the base popularity of the consequent:

    • Lift (A->C) = [Support(A union C)]/[Support(A) x Support(C)]

    Why I rank by Lift:

    1. Filters out “Noise”: A Lift value of 1.0 means the two items are independent. Ranking by Lift ensures you aren’t just recommending your most popular items to everyone (which provides no strategic value).
    2. Identifies “Niche Bundles”: I’ve used Association Rules for media mix optimization and this was particularly important when a marketing group ran a lot of tactical programmatic programs but very few truly integrated campaigns (therefore low support) that nonetheless had high lift.  With regards to this open-sourced dataset, a high Lift (e.g., > 3.0) indicates a strong, specific relationship. For example, if the Regency Teacup leads to the Regency Cakestand with a high Lift, it proves that the purchase isn’t random—it’s a deliberate “Tea Party” bundle.

    Actionable Insight

    For business professionals, Lift represents the incremental gain. It tells the marketing team exactly which products, when placed together or promoted via email, telemarketing or on websites, will change customer behavior rather than just reflecting existing trends.

    So, now we have the top selling market baskets:

    Business Rules to Increase Precision

    A primary limitation of recommendation engines is their reliance on historical data, which restricts suggestions to items and behaviors already present in the dataset. To address this, business logic overlays or mapping tables can be introduced. These allow for the dynamic replacement of ‘End-of-Support’ items with newer alternatives, or the substitution of low-margin products with higher-margin equivalents. Furthermore, if the customer base is non-homogeneous—varying by geography, firmographics, or demographics—the data can be segmented into distinct subsets, allowing for specialized models tailored to each specific sub-segment.
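    A minimal sketch of such an overlay, with a hypothetical mapping table applied after the recommender runs:

```python
# Hypothetical mapping table: End-of-Support or low-margin consequents
# are swapped for strategic replacements before recommendations ship.
replacement_map = {
    "Legacy Server X100": "Server X200",           # End-of-Support swap
    "Basic Support Plan": "Premium Support Plan",  # margin upgrade
}

def apply_business_rules(recommendations):
    """Post-process model output with the business-logic overlay."""
    return [replacement_map.get(item, item) for item in recommendations]

raw = ["Legacy Server X100", "Rack Kit", "Basic Support Plan"]
print(apply_business_rules(raw))
# ['Server X200', 'Rack Kit', 'Premium Support Plan']
```

    Because the overlay sits outside the model, the mapping table can be maintained by product marketing without retraining anything.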


    Summary

    The “Tea Party” and “Craft Kit” bundles identified above aren’t just interesting coincidences—they are actionable revenue drivers. By using Association Rules, we move beyond simply knowing what our best sellers are to understanding the context of the purchase.

    Key Takeaways for this Method

    • Cold-Start Efficiency: Unlike other models, Association Rules don’t require a deep customer history. They can serve new customers because they are based on item and product co-occurrence within transactions rather than on an individual’s purchase history.
    • Bundled-Pricing: Sales and Marketing can bundle, for example, hardware and software packages to increase sales revenue.
    • Targeting: Given the Apriori output, if a marketer knows that a customer has purchased products A and B, and a high-lift rule is (A, B) -> C, then a telemarketing campaign can focus on selling product C.
    • Incremental Lift: By ranking recommendations by Lift rather than just popularity, we ensure our marketing efforts are driving new behaviors (i.e. selling high margin or newer products) rather than just suggesting items the customer likely would have found on their own.

    While Association Rules are a powerful starting point for cross-sell, they treat every transaction as a “snapshot” in time. They don’t account for the order in which a customer explores a site or how their interests evolve over a single session.

    In a subsequent article, I will explore Markov Chains, a technique that looks at the path a customer takes, allowing us to predict the next step in their journey based on their most recent move. 


    References: Methodology & Python Packages

    Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), 487-499.

    Chen, D. (2012). Online Retail Data Set. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Online+Retail

    Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Raschka, S. (2018). MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. Journal of Open Source Software, 3(24), 638. https://doi.org/10.21105/joss.00638

    Tan, P. N., Steinbach, M., & Kumar, V. (2018). Introduction to Data Mining (2nd ed.). Pearson Education.

    Waskom, M. L. (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021


  • Analytics Shoot-Out: Mike vs. Agentic AI

    Why Human-in-the-Loop Agentic AI Matters

    • Human vs. AI Strengths: While AI excels at scale and creative iteration, human analysts remain superior in governance and context. Recognize that AI can generate complex playbooks in seconds but still requires human oversight to avoid historical or logical errors and provide business rules that are not incorporated in the data.
    • Scale Your Execution: Use Agentic AI to automate the “last mile” of marketing—such as generating personalized subject lines and GTM strategies—tasks that are often too time-consuming for a single analyst to perform manually.
    • The Importance of Governance: AI can hallucinate or apply modern trends to old data (e.g., suggesting TikTok for a 2008 dataset). Always maintain a Human-in-the-Loop to interpret model outputs and ensure they align with business reality.
    • Precision and Speed: The ultimate competitive advantage isn’t choosing between humans or AI, but fusing a predictive foundation (like XGBoost) with an agentic execution layer (like Gemini) to achieve both accuracy and velocity.

    Introduction

    I like to think of myself as a capable analyst. I’ve been analyzing modeled output and large datasets for many years now, and my favorite part of modeling is interpreting results and telling a story, so I was naturally skeptical that an AI Agent could outperform me when it came to interpreting marketing data.

    To test this, I staged a “shoot-out.” I built an AI Agent using the Gemini 2.0 Flash LLM, integrated my existing XGBoost propensity model code, and compared its analysis of the Bank Marketing Dataset to my own findings.

    I was in for a surprise.

    It is hard to compete with an always-on AI agent in terms of speed and scale. While a human analyst might identify top leads, an AI Agent has the capability to analyze each lead individually and generate marketing recommendations tailored to their specific demographics in seconds. This level of micro-segmentation at scale is simply not practical for a single analyst.

    However, my experiment also proved that there are still massive advantages to having a “Human-In-The-Loop.”


    The Foundation: The Propensity-to-Buy Workhorse

    In a previous article, I built an ML classifier using the Bank Marketing Dataset from the UC Irvine repository (representing a real-world Portuguese bank campaign from 2008–2010). My initial human interpretation focused on the forensic view. Here is a sample of the data:

    The model is overconfident and under-performs at the lower probabilities. That said, accuracy increases as we move toward the better leads, so I am taking the top 500 leads to examine:
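    Selecting the top leads from scored output is a one-liner in pandas; a sketch with hypothetical lead IDs and synthetic propensity scores:

```python
import pandas as pd

# Hypothetical scored leads: model-predicted purchase probabilities.
scored = pd.DataFrame({
    "lead_id": range(1, 2001),
    "propensity": [((i * 37) % 1000) / 1000 for i in range(2000)],
})

# Keep only the highest-propensity leads, where calibration is strongest.
top_leads = scored.nlargest(500, "propensity").reset_index(drop=True)

print(len(top_leads), top_leads["propensity"].min())
```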

    In my initial article on propensity-to-buy, I had four key findings based on the most influential model features:

    • Primary Driver: Communication Method (Cellular). This was the strongest predictor of a purchase.
    • The “Momentum” Effect: Success in previous campaigns (poutcome_success) was a massive indicator of future conversion, validating the importance of RFM (Recency/Frequency/Monetary) models.
    • Financial Stability: Customers without housing loans and higher bank balances correlated positively with conversion.
    • Timing: Outreach in specific months (June/March) influenced the model, suggesting seasonal adjustments for telemarketing programs.

    The Challenger: The AI Agent

    I wanted to see if an AI Agent could run my model and interpret the findings in a way that was actually actionable for a sales team. I provided the Agent with these System Instructions:

    “You are a Senior Strategic Marketing Agent. Analyze these top leads, provide personalized pitch strategies, and reference specific rows to justify your advice.”

    First, the Agent provided a profile of the Top 500 Leads, identifying that while the demographics varied, the common thread was the high propensity score and the success of previous outcomes.

    Note: As shown in the chart above, there isn’t a direct correlation between account balance and propensity—intent is far more complex than just a bank balance.
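    A sketch of how the System Instructions and the top-lead rows might be assembled into a single prompt for the LLM (the lead fields shown are hypothetical, and the actual Gemini API call is omitted):

```python
SYSTEM_INSTRUCTIONS = (
    "You are a Senior Strategic Marketing Agent. Analyze these top leads, "
    "provide personalized pitch strategies, and reference specific rows "
    "to justify your advice."
)

# Hypothetical top-lead rows drawn from the scored dataset.
top_leads = [
    {"row": 1, "age": 35, "job": "management", "contact": "cellular", "propensity": 0.97},
    {"row": 2, "age": 41, "job": "technician", "contact": "cellular", "propensity": 0.96},
    {"row": 3, "age": 21, "job": "student", "contact": "cellular", "propensity": 0.95},
]

def build_prompt(leads):
    """Flatten lead rows into the message sent alongside the system prompt."""
    rows = "\n".join(
        f"Row {l['row']}: age={l['age']}, job={l['job']}, "
        f"contact={l['contact']}, propensity={l['propensity']:.2f}"
        for l in leads
    )
    return f"{SYSTEM_INSTRUCTIONS}\n\nTop leads:\n{rows}"

prompt = build_prompt(top_leads)
print(prompt)
```

    Keeping prompt assembly in plain code like this also makes the Human-in-the-Loop review easier, since exactly what the Agent saw is reproducible.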

    Micro-segmentation and The Agent’s Strategic Brief (Verbatim Output)

    Lead Profile Summary: The dataset contains a variety of demographics. Most leads own houses, don’t have loans, and prefer cellular contact. “Success” in previous campaigns is a dominant indicator.

    Strategy 1: Young to Middle-Aged Professionals (Leads 1 & 2)

    • Focus: Career advancement and financial stability. Highlight long-term planning.
    • Subject Line: “[Name], Unlock Your Management Potential with [Product/Service]”

    Strategy 2: Students (Lead 3)

    • Focus: Education and skill-building. Promote affordable solutions.
    • Subject Line: “[Name], Ace Your Studies and Future Career with [Product/Service]”

    The Tree Map Reveal

    The Agent then generated a tree map to visualize these segments. It quickly identified that Management and Administrative job titles weren’t just numerous; they represented a massive volume of the bank’s total account balance opportunity.


    Agent Detailed Data Output And Top Lead Strategies for Hyper-Targeting

    Once segmentation is developed, the agent shines at hyper-targeting: activating specific customers or micro-segments with tailored recommendations and content.

    The dataset contains a variety of individuals with varying demographics, including:

    • Age: Wide range, from teenagers to retirees.
    • Job: Diverse occupations, including management, students, self-employed, technicians, etc.
    • Marital Status: Mostly married or single.
    • Education: Secondary and Tertiary education are common, with a few having Primary.
    • Balance: Varies significantly.
    • Housing/Loan: Mostly own houses and don’t have loans.
    • Contact: Cellular is the primary mode of contact.
    • Poutcome: “Success” in previous campaigns is a strong indicator.
    • Propensity Score: All leads have very high propensity scores, indicating a high likelihood of conversion.

    1. Strategy for Young to Middle-Aged Professionals (e.g., Leads 1 & 2):

    • Focus: Career advancement and financial stability. Highlight products or services that aid in professional development, investment opportunities, or long-term financial planning.
    • Messaging: Emphasize efficiency, time-saving benefits, and the potential for increased income or wealth.
    • Channel: Utilize digital channels such as email, LinkedIn, and targeted online advertising.
    • Personalized Email Subject Line (Lead 1): “[Name], Unlock Your Management Potential with [Product/Service]”
    • Personalized Email Subject Line (Lead 2): “[Name], Enhance Your Financial Security with [Product/Service]”

    2. Strategy for Students (e.g., Lead 3):

    • Focus: Education, skill-building, and prospects. Promote products/services that enhance their learning experience, provide career guidance, or offer affordable financial solutions tailored to students.
    • Messaging: Emphasize affordability, accessibility, and the potential for boosting their resume and job prospects.
    • Channel: Utilize social media platforms popular among students (e.g., Instagram, TikTok), student-focused websites, and university partnerships.
    • Personalized Email Subject Line (Lead 3): “[Name], Ace Your Studies and Future Career with [Product/Service]”

    AI Agent Important Considerations

    1. “Success” Poutcome: Since the “poutcome” is “success” for a prior campaign for all of these individuals, emphasize the proven track record and build on their positive past experiences.
    2. A/B Testing: Continuously test different messaging and channels to optimize results within each segment.
    3. Respect Privacy: Always adhere to privacy regulations and provide an easy way for recipients to opt-out of communications.

    The Verdict: It’s a Draw (and a Win) – Human in the Loop is best!

    • The Edge: I still hold the advantage in Context, Storytelling, and Governance. When the Agent suggested TikTok to reach students using a 2008 dataset, it missed the temporal reality—TikTok didn’t exist then. Human expertise ensures data is interpreted within its historical and social truth.
    • The Scale: The AI Agent wins on Creative Iteration. It took my complex XGBoost output and, in seconds, generated three distinct GTM playbooks and 500 personalized subject lines—tasks that would take a marketing team days. For micro-segmentation at scale, the Agent is the clear victor.

    The Real Winner: It’s not Human vs. AI; it’s the organization that fuses a human-tuned predictive foundation with an Agentic Execution Layer. By moving from manual interpretation to automated hyper-targeting, we finally achieve the precision of a statistician with the instantaneous speed of AI.


    References & Further Reading

    • Moro, S., Cortez, P., & Rita, P. (2014). A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31. [The definitive study for the Bank Marketing Dataset].
    • UCI Machine Learning Repository. Bank Marketing Data Set. Available at: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
    • Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [The foundational paper for the XGBoost algorithm].
    • Google DeepMind. (2023). Gemini: A Family of Highly Capable Multimodal Models. Google Technical Report. [Context for the Agentic LLM used in the shoot-out].
    • Fader, P. S., Hardie, B. G., & Lee, K. L. (2005). RFM and CLV: Using Iso-Value Curves for Customer Base Analysis. Journal of Marketing Research. [Supporting the Recency, Frequency, and Monetary value methodology].
    • American Statistical Association (ASA). Ethical Guidelines for Statistical Practice. [Reference for the “Human-in-the-Loop” governance and ethical AI interpretation mentioned in the verdict].

  • The Hybrid Forecast: Integrating Field Sales “Expert Opinion” with Deep Learning Ensembles

    Why Statistical and ML Forecasting Matters

    • Blend Intuition with Math: Traditional “black box” models often fail because they ignore human context. A hybrid approach integrates sales pipeline data (human intuition) with machine learning to create a forecast that leadership can actually trust.
    • Solve the Data Integrity Gap: Sparse or low-quality data doesn’t have to break your model. By focusing on outlier management and data engineering, you can transform poor quality CRM records into a reliable foundation for prediction.
    • The Champion-Challenger Method: Don’t rely on a single algorithm. Use a “shootout” between multiple methodologies to identify which model performs best for your specific business cycle and customer behavior.
    • Bridge the Trust Barrier: The primary obstacle to ML adoption is lack of visibility. Hybrid architectures provide the transparency stakeholders need by showing exactly how human inputs and statistical trends combine to drive the final number.

    Introduction

    I am always surprised when I join a company to find that the GTM and Finance functions are still relying solely on Excel spreadsheets and field sales “expert opinion” (the Delphi Method) for forecasting. This persists despite the wealth of statistical, machine-learning, and deep-learning methods available today. Often, this reliance stems from a lack of trust in “black box” technologies; if leadership doesn’t understand the path a model takes from Point A to Point B, their hesitation is understandable—especially when financial performance and board-level predictions are on the line.

    However, transitioning to modern forecasting doesn’t require abandoning qualitative insights. In fact, expert human opinion can be directly integrated into ML and statistical frameworks to improve accuracy (e.g., using field sales input to refine deal-size estimates within CRM data like Salesforce). In this article, I examine three techniques my teams and I have used ranging from traditional modeling to Deep Learning.

    On a related note, as modelers, it can be tempting to pre-select a favorite technique and assume it is the best fit for the problem. However, I was trained to always evaluate at least three distinct approaches to identify which produces the highest predictive accuracy with the lowest error.

    In this pursuit, statisticians often follow Occam’s Razor (the principle of parsimony): the philosophical rule that when competing models explain data equally well, the simplest model should be preferred. In this case, that would be a relatively simple univariate time series. However, when encountering significant unexplained variance, additional predictors—exogenous variables—are required. While a model like SARIMAX can outperform a standard ARIMA by accounting for these external factors, every additional variable risks increasing model complexity and error if not managed correctly. Similarly, while neural networks rule the world of Big Data, they can significantly underperform if the dataset is too small.


    The Models

    Rather than choosing a winner in advance, this three-model shoot-out identifies the best-performing prediction approach for my selected business case.

    1. SARIMA: The Statistical Foundation

    This is the traditional statistical method. It looks strictly at the history of a single variable (e.g., Won Revenue) for prediction. It decomposes data into its past values (Autoregressive), its past errors (Moving Average), and its repeating cycles (Seasonality).

    2. SARIMAX: The Context-Aware Statistical Model

    The “X” stands for eXogenous variables. Building on SARIMA with additional features to explain random variation, SARIMAX looks at the calendar plus external factors, like the sales account manager’s Forecasted Revenue, marketing spend, or economic indicators. It provides the power of time series + linear regression.

    3. LSTM: The Deep Learning Memory Model

    As François Chollet explains in Deep Learning with Python, LSTMs are a specialized type of Recurrent Neural Network (RNN). While traditional models may forget the beginning of a sequence by the time they reach the end, LSTMs have a carry (Cell State)—a way to keep track of long-term dependencies. Unlike SARIMA, which uses fixed formulas, the LSTM creates its own features through layers of neurons. It is Long Short-Term because it decides what to remember (Long-term) and what to throw away (Short-term).


    The Dataset and Exploratory Data Analysis (EDA)

    Sparse Data Cannot Be Used For Forecasting

    My first attempt to do a forecasting technique shoot-out leveraged an SFDC/CRM-style dataset (Source: Chioma Iwuchukwu – Sales Funnel Revenue Forecast). The raw data captured roughly 100 days of activity starting January 1st. While this seed data provided a realistic snapshot of B2B deal flow, it presented a cold-start problem: with only ~18 actual “Won Revenue” events, the data was too sparse for deep learning models to distinguish between a recurring pattern and random luck.

    Sometimes the data is so bad that it cannot be used for forecasting, and despite my best efforts this was as close as I could get with an n=18 ultra-sparse dataset that looked like this:

    As a result, my forecasts were way off the mark:

    In its original state, the data provided a snapshot of performance but was limited in volume, making it difficult for deep learning models like LSTMs to generalize without overfitting. A review of the summary statistics (df.describe()) reveals a high standard deviation (275% of the mean for Won Revenue) and an enormous range (0 to $46K for Won/Loss) across all key variables. In a B2B context, this indicates a high-variance environment where individual large deals can significantly swing daily totals. Another indicator was the fact that the mean for Won Revenue was $3,986 while the median was $0.

    The Augmented Dataset (n=200): Scaling for Deep Learning

    To compare techniques fairly, I performed a significant clean-up and rebuild of the n=100 dataset, using synthetic data augmentation to expand it to 200 daily observations that preserved the underlying structure, weekly cadence, and growth of the original data while removing its extreme volatility. This was a deliberate data engineering step to create a robust training environment for the LSTM Neural Network. The augmentation was calibrated to mirror the original B2B cycle mathematically:

    • Deterministic Trend: I preserved the original growth trajectory, ensuring the models evaluate a business that is scaling.
    • Seasonal Harmonic: Using a sinusoidal function, I reinforced the 7-day weekly cycle. This captures the essential B2B weekend dip and mid-week peak patterns (shown below).
    • Stochastic Noise: I injected Gaussian Noise (random variance) to simulate market volatility. This forces the LSTM to learn signal over noise—distinguishing between a structural shift and daily chatter.
    • Lookback Depth: Doubling the volume allowed for a 14-day sliding window. This gives the neural network enough temporal depth to learn from two full business cycles before making its next prediction.
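The four-part recipe above can be sketched directly in NumPy. All constants (base level, slope, amplitudes, noise scale) are illustrative assumptions standing in for the calibrated values.

```python
# Hedged sketch of the augmentation recipe: deterministic trend, a 7-day
# sinusoidal harmonic, Gaussian noise, and a 14-day sliding window for the LSTM.
# All constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
n = 200
t = np.arange(n)

trend = 5000 + 15 * t                        # deterministic growth trajectory
weekly = 1200 * np.sin(2 * np.pi * t / 7)    # seasonal harmonic (weekend dip, mid-week peak)
noise = rng.normal(0, 400, n)                # stochastic Gaussian noise
won_revenue = np.clip(trend + weekly + noise, 0, None)   # revenue cannot go negative

# 14-day lookback windows: each sample covers two full weekly cycles,
# and the target is the next day's revenue.
lookback = 14
X = np.stack([won_revenue[i:i + lookback] for i in range(n - lookback)])
y = won_revenue[lookback:]
print(X.shape, y.shape)  # (186, 14) (186,)
```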

    Analysis of Variability and Decomposition

    For the new dataset, things look much more promising. I removed the extreme observations (outliers) because the three $45,000 observations alone were adding $1,350 to the daily average. The standard deviation is now about 19% of the mean because the maximum is much lower, so the min-to-max range is far tighter. Another check showed that the median was $5,236, very close to the $5,249 mean, indicating good symmetry and low skewness. The dataset is ready to use.

    Here are some bar charts to show the daily trends, which are not highly variable but do show the weekend sales dip:

    Typically, Time Series Decomposition (as shown in the chart below) allows us to strip away the noise to see the underlying mechanics:

    1. Trend: A clear, 15% positive slope indicating long-term growth.
    2. Seasonality: A heavy daily/weekly influence.
    3. Residuals: A significant amount of randomness.
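The decomposition chart referenced above comes from statsmodels' seasonal decomposition; the same additive logic (trend + weekly seasonality + residual) can be sketched by hand on a synthetic series to show where the residual noise comes from. The series and constants are illustrative.

```python
# Hedged sketch of additive time-series decomposition on a synthetic series;
# statsmodels' seasonal_decompose automates this.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(200)
s = pd.Series(5000 + 15 * t + 1200 * np.sin(2 * np.pi * t / 7)
              + rng.normal(0, 400, 200),
              index=pd.date_range("2024-01-01", periods=200, freq="D"))

trend = s.rolling(window=7, center=True).mean()                    # smooths out the weekly cycle
detrended = s - trend
seasonal = detrended.groupby(s.index.dayofweek).transform("mean")  # mean effect per weekday
residual = s - trend - seasonal                                    # unexplained noise

# High residual variance is the signal that exogenous predictors (SARIMAX) are needed.
print(f"residual std ~ {residual.std():.0f} vs. raw std ~ {s.std():.0f}")
```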

    Because of this high residual noise, a pure time series model like ARIMA (which only looks at past values) is likely to be insufficient. This is why I have utilized SARIMAX to incorporate Forecasted Revenue (the field sales pipeline, which is opinion based and gives us a blend of business intuition and machine learning) as an exogenous variable and an LSTM to capture non-linear relationships that traditional statistics might miss.

    By increasing the density of the dataset, I have smoothed the influence of extreme outliers, ensuring that the resulting forecast is a reflection of systemic performance rather than a reaction to a few bad days.


    Forecasting Model Results

    The logic is driven by the test_size = 21 variable. In time-series forecasting, we typically hold out a test set to validate how well the models perform against real data.

    • Total Dataset (n=200): Since the frequency is daily (freq=’D’) the dataset covers approximately 6.5 months (from January 1st to mid-July).
    • Training Period: The first 179 days.
    • Forecast (Test) Period: The final 21 days.
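The split and scoring can be sketched as follows. The naive seasonal baseline here is an illustrative stand-in for the SARIMA/SARIMAX/LSTM outputs, and the error measures (RMSE, MAPE) are typical accuracy metrics for this kind of shoot-out.

```python
# Hedged sketch of the 179/21 holdout split and forecast accuracy scoring.
# The synthetic series and naive baseline are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
actual = 5000 + 15 * t + 1200 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 400, 200)

test_size = 21
train, test = actual[:-test_size], actual[-test_size:]   # 179 training days, 21 test days

# Naive seasonal baseline: repeat the last observed week across the 3-week test window.
forecast = np.tile(train[-7:], 3)

rmse = np.sqrt(np.mean((test - forecast) ** 2))          # penalizes large misses
mape = np.mean(np.abs((test - forecast) / test)) * 100   # scale-free percentage error
print(f"RMSE={rmse:,.0f}  MAPE={mape:.1f}%")
```

Each candidate model is scored on the same 21 held-out days, so the comparison is apples-to-apples.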

    Based on the measures of predictive accuracy (below), SARIMAX is the winner, followed closely by LSTM. This is not surprising: we saw the randomness (exogenous factors beyond trend and seasonality) that was driving sales, and these two techniques can capture some of that noise, especially because they incorporate human opinion (similar to the Delphi Method) by adding the sales pipeline data into the model. Sales account managers know things that are outside the model’s sight: a key contact who just quit (or joined) a company, competitor pricing and promotions, new product launch timing, and so on. The forecast now includes this information.

    The fact that the SARIMAX model so clearly outperformed the SARIMA model shows that, in this case at least, human opinion can add a great deal to machine learning models; for this dataset (developed using past historical forecast and actual data), the two are highly correlated:

    So, humans still have a place in the world of AI!


    Citations

    • Chollet, F. (2021). Deep Learning with Python (2nd ed.). Manning Publications. (For logic regarding LSTM architecture and temporal representations).
    • Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering. (For data visualization and time-series plotting).
    • Harris, C. R., et al. (2020). Array programming with NumPy. Nature. (For synthetic data generation and mathematical arrays).
    • Iwuchukwu, C. (2024). Sales Funnel Revenue Forecast [Dataset]. GitHub/Personal Collection. Augmented and expanded to n=200 using Python synthetic generation techniques (2024).
    • Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education. (For the application of SARIMA/SARIMAX in a business revenue context).
    • Pandas Development Team. (2023). pandas-dev/pandas: Pandas 2.0.0. Zenodo. (For data manipulation and time-series resampling).
    • Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and Statistical Modeling with Python. Proceedings of the 9th Python in Science Conference. (For SARIMA and Seasonal Decomposition implementation).
    • Synthetic Revenue Dataset. (2024). Enlarged B2B Sales Funnel Forecast Data [Generated Dataset]. (Derived from original Sales Funnel Revenue patterns using Python-based augmentation).



  • The Multi-Channel Force Multiplier: How Bridging Digital Nurture and Direct Outreach Triples Conversion Lift.

    Why Marketing Mix Optimization Matters

    • Discover Channel Synergy: Move beyond single-channel metrics to media mix Lift measurement. Use Association Rules to find specific channel combinations—like Brand Awareness and Email—that work together to increase conversion rates.
    • Identify High-Impact Sequences: Design effective integrated campaigns. Create successful product bundles and promotions, (such as combining Retargeting with Discount Offers). These synergies act as force multipliers, driving more value than each channel could achieve in isolation.
    • Balance Scale and Predictability: Use Support to find marketing mixes that reach a large audience, and Confidence to identify those that offer the most reliable path to a sale. This ensures your strategy is both broad and dependable.
    • Model the Hybrid Journey: Don’t treat digital and high-touch channels as separate silos. By merging disparate datasets, you can visualize a unified customer journey that reflects how people truly interact with your brand across every platform.

    Introduction

    The Problem: Beyond the “3+ Rule”: It is widely accepted that a synergistic media mix will always outperform a single media vehicle. Historically, the industry adhered to the “3+ rule”—popularized in the 1970s—which suggested that three exposures to a message were required to influence a purchase. In the digital age, however, that threshold has risen to a frequency of 7 or more. While media and advertising agencies have used techniques like linear programming and mainframe-based syndicated survey data since the 1980s to optimize these mixes, modern integrated marketing campaigns require a more sophisticated touch.

    The Implementation Gap: Throughout my career, my data science teams and I have built media mix optimization models for numerous B2B companies. While these models are often adopted in principle at the executive level, they frequently prove too complex for practical implementation. Often, marketing groups remain so tactically focused that executing a systematically integrated campaign feels unfeasible, rendering the optimization an “academic exercise.” Although modern vendors provide multi-touch attribution (MTA) methods to track touches against opportunities and allocate budgets, the real value lies in using this data as a foundation for deeper optimization work.

    A Strategic Priority for the CMO: In practice, CMOs—such as Todd Forsythe and Jonathan Martin—leverage these models to calibrate marketing budgets and enhance overall effectiveness. Building a media mix model is typically the first task I undertake when launching a new data science practice. It is a baseline expectation that a CMO understands the optimal mix for generating pipeline and revenue and allocates their budget accordingly.


    Innovation through Association Rules

    My approach to media mix modeling has evolved toward leveraging association analysis. The idea originated from Ling (Xiaoling) Huang, and I refined the methodology through collaborations with Yexiazi (Summer) Song and, most recently, Fuqiang Shi, to develop models for diverse business units.

    According to Miller (2015), this technique is commonly referred to as Market Basket Analysis, a concept born in retail:

    “Market basket analysis, (also called affinity or association analysis) asks, what goes with what? What products are ordered or purchased together?”

    Miller (2015)

    By applying this retail-focused logic to media, we can uncover the hidden relationships between marketing channels. As Zhao et al. (2019) noted:

    “The challenge of multi-channel attribution lies not just in identifying the final touchpoint, but in uncovering the hidden synergies where the presence of one marketing stimulus significantly amplifies the effectiveness of another.”

    Zhao et al. (2019)

    The Data: Constructing a Hybrid Marketing Funnel

    To demonstrate this methodology, I synthesized a consolidated dataset of over 45,000 interactions by merging the UCI Bank Marketing Dataset with the Kaggle/Criteo Multi-Touch Attribution (MTA) Dataset. While these sources represent different industries—fixed-term bank deposits and e-commerce—their combination provides a comprehensive view of the modern buyer’s journey, blending high-frequency digital touchpoints with high-touch personal outreach.

    Dataset Profiles

    1. UCI Bank Marketing Dataset

    This dataset is famous for being highly imbalanced, which is a “real-world” marketing scenario.

    • Non-Converters: About 88% of the rows. Customers who were called but did not subscribe to the term deposit.
    • Converters: About 12% of the rows.
    • Why it matters: Because most people don’t convert, when the model finds a channel like Mobile Outreach that has a high Lift, it’s statistically meaningful.

    2. Kaggle/Criteo Attribution Dataset

    This dataset is designed for Multi-Touch Attribution (MTA).

    • Non-Converters: These are “Journeys” where a user clicked on ads (Email, Display, etc.) without converting to a sale.
    • Converters: Customer journeys that resulted in sales.
    • Why it matters: this helps improve marketing effectiveness and efficiency by identifying waste (channels that people click on but never lead to a sale).

    Data Preprocessing

    Raw data labels were standardized into a unified “Media Mix” master dataframe to ensure consistency across sources:

    • (Kaggle MTA) Email Nurture: Represents the high-volume digital baseline.
    • Cellular (UCI Bank Mkt.) Mobile Outreach: Represents direct personal contact via mobile.
    • Telephone (UCI Bank Mkt.) Telemarketing: Represents landline-based outreach.

    The preprocessing phase involved concatenating the sources, indexing, removing non-essential characters, and handling missing values. I then formatted the dates and standardized the channel categories to enable a seamless cross-platform analysis.
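A minimal pandas sketch of that standardization step, using toy rows and the label mapping listed above (the column names and toy values are assumptions):

```python
# Hedged sketch: concatenate the two sources into one "Media Mix" dataframe,
# standardize channel labels, and handle missing values. Toy data only.
import pandas as pd

uci = pd.DataFrame({"contact": ["cellular", "telephone", None],
                    "converted": [1, 0, 0]})
mta = pd.DataFrame({"channel": ["email", "email"], "converted": [1, 0]})

# Unified label mapping across sources (mirrors the list above).
label_map = {"cellular": "Mobile Outreach", "telephone": "Telemarketing",
             "email": "Email Nurture"}

uci = uci.rename(columns={"contact": "channel"})
media_mix = pd.concat([uci, mta], ignore_index=True)
media_mix["channel"] = media_mix["channel"].map(label_map).fillna("Unknown")

print(media_mix["channel"].tolist())
# → ['Mobile Outreach', 'Telemarketing', 'Unknown', 'Email Nurture', 'Email Nurture']
```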

    The Data: A Hybrid Funnel Approach

    By merging these records, we are modeling a hybrid B2C customer journey. This reflects the reality of high-value industries like financial services, where a customer might be prompted by a high-volume digital ad (Criteo) but requires a personal, high-touch mobile conversation (UCI) to finalize a complex transaction.

    This balanced view allows the association rules to uncover synergies across the entire funnel, rather than looking at digital or offline channels in isolation.


    Apriori Methodology

    Using the Apriori Algorithm (Agrawal et al., 1996; Raschka, 2018), I processed over 45,000 interactions to identify Synergy Lift, which is one of several measures of impact. According to Miller (2015), “an association rule is a division of each item set into two subsets with one subset, the antecedent, thought of as preceding the other subset, the consequent. The Apriori algorithm … deals with the large numbers of rules problem by using selection criteria that reflect the potential utility of association rules.” Essentially, the rule is Antecedent (marketing mix) → Consequent (conversion). Here are the key measures:

    • Support (scale): how often this mix occurs.
      • Support = occurrences of the rule (mix + conversion) / total customer base
    • Confidence (predictability): how often a customer converts when exposed to this mix.
      • Confidence = conversions for the mix / occurrences of the mix
    • Lift (synergy): how well this mix performs vs. the average.
      • Lift = Confidence / Support of the consequent (i.e., the average conversion rate)
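The three measures can be computed by hand for a single rule on a toy set of customer journeys; mlxtend's apriori/association_rules automates this at scale. The journeys and channel names below are invented for illustration.

```python
# Hedged sketch: Support, Confidence, and Lift by hand for one association rule
# on toy customer journeys (mlxtend automates this at scale).
transactions = [
    {"Email", "Retargeting", "Converted"},
    {"Email", "Retargeting", "Converted"},
    {"Email"},
    {"Telemarketing", "Converted"},
    {"Email", "Retargeting"},
]
n = len(transactions)

antecedent = {"Email", "Retargeting"}   # the marketing mix
consequent = {"Converted"}              # the conversion event

both = sum(1 for t in transactions if antecedent | consequent <= t)
ante = sum(1 for t in transactions if antecedent <= t)
cons = sum(1 for t in transactions if consequent <= t)

support = both / n                  # scale: how often mix + conversion occurs
confidence = both / ante            # predictability: conversion rate given the mix
lift = confidence / (cons / n)      # synergy: mix conversion rate vs. the average

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# → support=0.40 confidence=0.67 lift=1.11
```

A Lift above 1 (here 1.11) means the Email + Retargeting pair converts above the baseline rate, which is exactly the "force multiplier" signal the model output section looks for.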

    Model Output

    Once I ran the Apriori algorithm and selected the top rules by highest lift (mix performance vs. the average), it was clear that the volume of marketing interaction was not aligned to conversion potential. The chart below must be interpreted with caution: it illustrates the danger of looking at any channel in isolation, because Email looks high-volume/low-lift on its own, yet it often appears in high-potential integrated combinations.

    Lift (>1) is the force multiplier, so that is how I evaluated the combinations for effectiveness in converting customers. For example, many combinations without mobile, direct mail, email, and telemarketing had high impact. Activities on the right-hand side of the arrows (>>) occur at the same time as the customer conversion (Converted), and so should be considered part of the overall mix.

    New product launches, combined with brand awareness, email, and retargeting had the most impact, followed by a similar mix that replaced launches with discounting; both are time-sensitive calls to action, so this makes sense. Activities on the left-hand side of the arrows typically represent the ‘nurture’ phase, while the right-hand side is the conversion event.

    This visual shows the top media mix combinations and their performance relative to the baseline.  So, for a financial services marketer this would be the roadmap for funding and executing integrated marketing campaigns:

    If we want to look at combinations to check a particular pair of marketing channels, or create a particular tactic, a correlogram like the one below shows the pairs with the most lift.


    Optimizing Marketing Budgets: A Data-Driven Approach

    From a funding perspective, analyzing our 55,211-record blended dataset through Ridge regression allows us to move beyond raw interaction volume to true contribution. By generating and normalizing beta coefficients, we can isolate the unique impact of each channel on the final conversion event, providing a mathematical foundation for marketing spend allocation.
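The Ridge step can be sketched with the closed-form estimator, which is what scikit-learn's Ridge fits when no intercept is used. The channel names, toy exposure data, and penalty value below are illustrative assumptions, not the actual analysis.

```python
# Hedged sketch: regress conversion on channel-exposure flags via ridge regression,
# then normalize the betas into budget-allocation weights. Toy data only.
import numpy as np

rng = np.random.default_rng(3)
channels = ["Email", "Referral", "Social Media", "Search Ads", "Display"]
X = rng.integers(0, 2, size=(1000, len(channels))).astype(float)  # exposure flags
true_beta = np.array([0.30, 0.25, 0.20, 0.15, 0.10])              # assumed "true" impact
y = X @ true_beta + rng.normal(0, 0.1, 1000)                      # conversion propensity

alpha = 1.0                                                       # ridge penalty (assumed)
# Closed-form ridge: beta = (X'X + alpha*I)^(-1) X'y
beta = np.linalg.solve(X.T @ X + alpha * np.eye(len(channels)), X.T @ y)

weights = np.clip(beta, 0, None)
weights /= weights.sum()                                          # normalized contribution
for ch, w in sorted(zip(channels, weights), key=lambda p: -p[1]):
    print(f"{ch:>12}: {w:.0%}")
```

The normalized weights sum to 100%, giving the channel-contribution breakdown used for spend allocation.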

    Based on this specific analysis, here is the performance breakdown of the primary drivers and their normalized contribution to conversion:


    Summary

    • The Efficiency of Retention: Email and Referral show the highest normalized contribution, suggesting that “warm” audience paths are the most reliable foundation for the budget.
    • The Synergy Mandate: Funding should prioritize synergistic pairs rather than siloed channels. For example, the high weights of Social Media and Search Ads suggest they function best when funded in tandem to capture both interest and intent.
    • Awareness as Air-Cover: “Brand Awareness” channels (Social/Display) provide the necessary air-cover for time-sensitive, high-conversion calls to action like New Product Launches and Discount Offers.

    Citations

    Criteo Labs (2018). Criteo Attribution Modeling & Bidding Dataset. Kaggle. Available at: https://www.kaggle.com/c/criteo-attribution/data

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Moro, S., Cortez, P., & Rita, P. (2014). A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, 62, 22-31. Elsevier. https://doi.org/10.1016/j.dss.2014.03.001

    Raschka, S. (2018). MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. Journal of Open Source Software, 3(24), 638. https://doi.org/10.21105/joss.00638

    Zhao, K., et al. (2019). Deep Learning and Association Rules for Multi-Channel Attribution. In Proceedings of the 2019 International Conference on Data Mining & Marketing Analytics.

    UCI Machine Learning Repository. (2012). Bank Marketing Dataset. https://archive.ics.uci.edu/ml/datasets/Bank+Marketing


  • From Accounts to Contact Personas: A Data-Driven Framework for B2B Targeting and Segmentation – When Contacts Matter.

    Why Contact Targeting Matters

    • High Potential Contacts in High Potential Accounts: Identify and target the right contacts within high propensity to buy accounts.
    • Humanize the Data: Transition from seeing Accounts as rows in a CRM to identifying the Buying Committee. Use data science to map the specific personas—from the Business User/Influencer to the New Technology Decision Maker.
    • Precision Targeting: Stop the “spray and pray” approach by aligning your messaging with the specific pain points of each persona. By predicting which contact is most likely to engage, you can improve targeting precision against the individuals who drive the highest conversion.
    • Optimize Multi-Touch Strategy: Design an integrated campaign to hit the right persona at the right time. This ensures that your Account-Based Marketing (ABM) efforts are statistically optimized for the collective decision-making process.
    • Contact Data Quality: Proactively maintain key contact coverage, permissions and data quality in high potential accounts.

    Introduction & The B2B Modeling Hierarchy

    In my experience, B2B contact data lacks the predictive weight found in B2C or subscriber databases. Having managed contact data at Hearst, Cisco, and DellEMC, I’ve seen firsthand that while contact attributes (PII, job titles, and history) are essential for execution, they are often secondary for targeting.

    Because contact data is lower in dimensionality and higher in volatility, it rarely contributes to B2B targeting models in a statistically significant way. Even in “greedy” models utilizing 250+ features (including firmographics, purchasing history, competitive install, intent and installed-base telemetrics) the company-level attributes consistently push out contact-level demographics.

    Machine learning and statistical algorithms (such as logistic regression, SVM, Random Forest, and XGBoost) will almost always prioritize company-level variables, often rejecting contact-level data as “noise” that offers negligible improvement in predictive accuracy.


    Starting with the Account

    B2B propensity-to-buy (as well as response or churn likelihood), RFM or cluster segmentation, and Customer Lifetime Value models are all fundamentally based on the company (or account). This has always been the case. To maximize impact, start with the company.

    Typically, the most influential variables (features) for these models include:

    • Past Purchases (RFM): The strongest indicator of future behavior.
      • Recency, Frequency, and Monetary values at the account level provide the historical baseline for any CLV calculation.
    • Firmographics: Standardizing variables like Industry and Revenue is critical.
      • These act as the primary “branches” in decision-tree models like XGBoost to segment high-value targets.
    • Engagement Data: Third-party syndicated media usage and first-party website traffic.
      • “Intent” signals indicate where an account is in the buying journey.
    • Pipeline Dynamics: Lead volume and velocity.
      • How quickly multiple contacts from the same account are engaging.
    • Market Context: Installed base telemetrics (usage) and competitive product footprint.
      • Gap analysis of current products that are at maximum use or underutilized for cross-sell/up-sell, and identification of accounts primed for a displacement campaign based on service contracts or competitive product purchases.
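The first of those features, account-level RFM, can be sketched as a pandas roll-up of transaction history. The toy orders and column names below are illustrative assumptions.

```python
# Hedged sketch: rolling order history up to account-level RFM features,
# the "past purchases" foundation described above. Toy data only.
import pandas as pd

orders = pd.DataFrame({
    "account": ["Acme", "Acme", "Globex", "Initech", "Globex"],
    "order_date": pd.to_datetime(["2024-05-01", "2024-06-15", "2024-03-10",
                                  "2024-06-28", "2024-06-20"]),
    "amount": [12000, 8000, 30000, 4500, 15000],
})
asof = pd.Timestamp("2024-07-01")   # scoring date

rfm = orders.groupby("account").agg(
    recency_days=("order_date", lambda d: (asof - d.max()).days),  # Recency
    frequency=("order_date", "count"),                             # Frequency
    monetary=("amount", "sum"),                                    # Monetary
)
print(rfm)
```

These three columns become the historical baseline that feeds propensity and CLV models at the account grain.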

    However, once the targeting or segmentation scheme is developed, that is the time that contact penetration, quality and associated attributes become critical for go-to-market execution.


    Maintaining a Contact Data Foundation

    The foundation for contact data should ideally be in place prior to program execution. This means achieving high contact penetration in key functional areas with established brand awareness and permissions. Because contact data is notoriously volatile, maintenance must be an ongoing process to prevent decay.

    Technical Note: For this analysis, I generated synthetic data mirroring a typical B2B tech environment. All data wrangling and modeling were performed using Python (pandas/NumPy) in a Jupyter Notebook, with Matplotlib/Seaborn for graphics and Scikit-Learn for K-Means clustering.

    Contact Profiling: Turning Noise into Segments

    Marketing to hundreds of thousands of free-form job titles is impossible at scale. To develop a systematic approach, a contact hierarchy is critical. While companies once had to build internal mapping tables, most modern vendors now provide standardized hierarchies to assess quality and facilitate execution.

    Start with a demographic profile of contacts currently in the marketing data lake or datamart.  The table below is an illustration of a basic contact profiling scorecard that can be used to analyze and track the total marketable contact data foundation.

    Hypothetical Contact Profiling Scorecard

    Addendum: Contact Data Health Checklist

    Framework Reference: Forrester (formerly SiriusDecisions) Data Strategy Standards

    To ensure the “Workhorse” models discussed in previous articles perform at peak efficiency, I recommend auditing your contact database against these five critical health benchmarks:

    • Accuracy: >95% for core predictive features (Job Title, Industry).
      • Scientific Impact: High accuracy reduces “label noise” and improves the gain in classification trees.
    • Density: 100% completeness for critical path fields (Email, Account Name, Contact Name, Title).
      • Scientific Impact: Eliminates data sparsity, ensuring your models don’t rely on biased imputations.
    • Timeliness/Validity: <12 months since last verification.
      • The Decay Factor: B2B data decays at ~2.1% per month (25% annually). Records older than one year significantly increase bounce rates and skew survival analysis.
    • Consistency: 100% standardization on categorical variables (Country, Company Name, Seniority).
      • Scientific Impact: Standardizing “US” vs “USA” is essential for Wickham’s (2014) Tidy Data principles and ensures correct feature grouping.
    • Buying Group Linkage: >3 contacts mapped per target account.
      • Strategic Impact: Essential for moving from individual “Lead Scoring” to “Account-Based Propensity” models.

    I wanted to work from a single synthetic dataset to maintain continuity. For the dataset I’ll be using (below), here is a basic job title distribution to use as a starting point and later we will do some basic Persona mapping.
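As a preview of that persona mapping, free-form titles can be rolled up with simple keyword rules over a standardized hierarchy. Everything here (titles, keywords, persona labels) is an illustrative assumption, not a vendor taxonomy.

```python
# Hedged sketch: mapping free-form job titles onto standardized personas
# via keyword rules. Titles, keywords, and persona names are illustrative.
import pandas as pd

titles = pd.Series(["VP of Infrastructure", "Sr. Data Engineer", "CIO",
                    "Marketing Manager", "Storage Administrator"])

# Rules are checked in order; first match wins.
rules = [("cio|cto|vp", "Technology Decision Maker"),
         ("engineer|administrator|architect", "Business User / Influencer"),
         ("manager|director", "Strategic Manager")]

def to_persona(title: str) -> str:
    t = title.lower()
    for pattern, persona in rules:
        if any(keyword in t for keyword in pattern.split("|")):
            return persona
    return "Unmapped"

personas = titles.map(to_persona)
print(personas.value_counts().to_dict())
```

Unmapped titles surface data-quality gaps to fix before campaign execution.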


    Fitting to the Account Target

    Further cross-tabs can be done on the contact data to ascertain whether the contacts exist in high or low potential geographies, industries, etc.

    There are three ways to do this.  If only a few features are available or a specific industry is being targeted, then match the contacts to the companies:

    Contact Distribution by Country and Industry — Top 10 in Rank Order.

    Once the foundation is set, cross-tabulations can determine if contacts exist within high-potential geographies or industries. We can visualize this through a Tree Map of the top countries and industries. The larger the box, the higher the concentration of a specific job title within that segment.

    Tree Map: Top 10 Countries, Industries and Job Titles based on contact coverage.

    The “Sweet Spot” for Sales and Marketing

    For a synthesized view, we return to the Propensity-to-Buy (P2B) models. By identifying the top job titles within high-propensity accounts, we find the “sweet spot” for marketing spend.

    Propensity to Buy Customer Distribution

    Here we have the top job titles for high-propensity accounts initially in a bar chart (below):

    Taking segmentation a step further, we use K-Means clustering to group companies by Industry, Country, and P2B score. According to Tan et al. (2019),

    “Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. The goal is that the objects within a group be similar or related to one another and different from (or unrelated to) the objects in the other groups. The greater the similarity (or homogeneity) within a group and the greater the difference between groups, the better or more distinct the clustering.”

    Tan et al. (2019)

    This allows us to drill down into specific personas (e.g., TDMs and Strategic Managers) within “High Value/High Growth” clusters.

    Distribution of contact personas in the High Value cluster segment.
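
    The clustering step described above can be sketched in scikit-learn; the toy account table, the column names, and k=3 are illustrative assumptions, not the article’s actual data:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical account table -- Industry, Country, and a P2B score per account.
accounts = pd.DataFrame({
    "industry": ["Tech", "Tech", "Finance", "Health", "Finance", "Health"],
    "country":  ["US", "US", "UK", "US", "UK", "DE"],
    "p2b":      [0.91, 0.87, 0.40, 0.75, 0.35, 0.20],
})

# One-hot encode the categoricals and scale so no feature dominates the distance metric.
X = pd.get_dummies(accounts[["industry", "country"]])
X["p2b"] = accounts["p2b"]
X = X.astype(float)
X_scaled = StandardScaler().fit_transform(X)

# k=3 is a placeholder; in practice choose k via the elbow or silhouette method.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
accounts["cluster"] = km.fit_predict(X_scaled)

# Profile each cluster (e.g., mean P2B) to label "High Value/High Growth" segments.
print(accounts.groupby("cluster")["p2b"].mean())
```

    Once clusters are labeled, the persona drill-down described above is a simple filter on the high-value cluster.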

    Program Execution

    The table below uses fictitious, synthetic PII to illustrate a final list of contacts at high propensity-to-buy companies. Now the hypothetical program is ready for execution.

    Illustration only: synthetic data not intended to represent actual persons or PII.


    Conclusion

    By prioritizing account fit over contact volume, marketing effectiveness improves across every metric:

    • Lower cost-per-lead.
    • Higher response rates through relevance.
    • Better pipeline quality and higher conversion rates.
    • Increased revenue.

    Targeting precision in B2B marketing isn’t about how many contacts you reach, but about reaching the right people within the right organizations. Start with the account to find your target, then use high-quality contact data to hit it—this is the most reliable path to maximizing both ROI and market impact.


    Citations

    Forrester Research. (2024). The B2B Marketing and Sales Data Strategy Toolkit. [Online]. Available at: https://www.forrester.com/report/the-forrester-b2b-marketing-and-sales-data-strategy-toolkit/RES172091

    Hoffmann, J. P. (2016). Generalized Linear Models: An Applied Approach (2nd ed.). Routledge.

    Integrate. (2025). Implementing the B2B Data Quality Toolkit: Standards for 2026. [Online]. Available at: https://www.integrate.com/resources/b2b-data-quality-toolkit

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2019). Introduction to data mining (2nd ed.). Pearson Education.

    Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10


  • Plugging the Leak: Gradient Boosting and Survival Analysis for Customer Retention

    Why Churn Modeling Matters

    Coming from subscriber marketing, customer churn and retention were always paramount for B2C marketers. The “Leaky Bucket” metaphor was first popularized by subscription marketers using CRM in the ’90s, and with the rise of XaaS and Cloud technology I see it finally making its way into B2B marketing as a mission-critical framework.

    Churn modeling fits nicely with Propensity-to-Buy and Customer Lifetime Value because it employs many of the same techniques. There are two ways to look at churn which require methodologies we’ve seen in my past articles:

    What is the likelihood of a customer leaving in the next 30 days? This requires a statistical or machine-learning classifier such as XGBoost (which I used in my article on propensity to buy).  Here we want a list of customers that have a high churn likelihood, so the target for prediction is changed from purchase to churn (yes/no).

    When will a customer churn? Knowing the timing of churn means turning to survival modeling, which I employed in my article on Customer Lifetime Value analysis; here the recommended Python library is lifelines. It can also generate some great visualizations of survival probability for different customer segments, which is an excellent way to identify the drivers and profile the customers who churn – whether they are people (B2C) or businesses (B2B accounts).


    The Dataset

    The IBM Telco Customer Churn dataset is a famous dataset used by data scientists to predict churn and work with marketers to develop customer retention strategies. It is a fictional telecommunications customer database, providing a mix of demographic, service, and financial data. It consists of data on 7,043 customers with 21 features including:

    Demographics: Information about the customer’s gender, age range (Senior Citizen), and whether they have partners or dependents.

    Account Information: How long they’ve been a customer (tenure), their contract type (Month-to-month, One year, Two year), payment method, paperless billing, and charges (MonthlyCharges and TotalCharges).

    Services: Specific services the customer has signed up for, including Phone, Multiple Lines, Internet (DSL, Fiber Optic, etc.), Online Security, Online Backup, Device Protection, Tech Support, and Streaming TV/Movies.

    The Target: The Churn column, indicating whether the customer left within the last month (Yes/No).


    Exploratory Data Analysis (EDA) and Data Quality

    An analysis of the dataset showed that it was very complete, with only 11 missing TotalCharges values, so other than a few data transformations (strings → integers) I wasn’t concerned about doing a lot of cleaning and data manipulation.

    Since the data had a field for Churn (Yes/No) a customer profile could be generated which provided a lot of insight into churned customers:

    From these charts, we can see that locking customers in with long term contracts and automatic withdrawal seems a good retention strategy, as churners tend to be on monthly contracts and paying by check.  Perhaps Senior Citizens are more price-sensitive and have bandwidth to shop for discounts, but that would have to be tested.  I also generated a correlation matrix which showed the correlation of the features to customer churn as another way of looking at the characteristics of churned customers:


    The Classification Question: Will they leave? (XGBoost)

    Real-world tradeoffs in modeling the probability and timing of customer churn.

    There are a lot of features that can be used together to predict churn from this dataset. The first question is, “Will a customer churn?” So I think of this as a propensity-to-churn model, which classifies churners based on the statistical probability of churn (essentially a propensity-to-buy model with the target changed from “purchase” to “churn”).

    Typically, in sales and marketing models we have to decide when we target:

    1. Do we have a conservative model that is relatively accurate in predicting purchases or churn, but misses a lot of customers because it is so conservative? From a marketing expense perspective this is efficient.
    2. Do we tune more aggressively and cast a wider net, but target a lot of prospects or customers that will not purchase or churn (false positives)? This is less cost-efficient but will uncover more absolute revenue or churn “by knocking on more doors.”

    In my experience, I have always leaned towards the more aggressive model (within reason), and sacrifice precision to hit more potential purchasers/churners.

    Looking at the confusion matrix below for my baseline (aggressive) model, it is great at capturing churn within a 30-day window: Recall = 0.82 means it captures 82% of the customers who churned. That comes at a high cost of false positives (Precision = 0.50), meaning any marketing or sales effort will cast a very wide net and be inefficient. Further tuning improved precision, but at the cost of rejecting many churning customers who had lower probability scores based on the available data, so I would not go with the conservative model, or I would continue to tune.
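
    The conservative-vs-aggressive tradeoff can be illustrated by scoring the same predictions at two classification thresholds; the labels and probabilities below are invented for illustration, not the article’s model output:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy churn labels (1 = churned) and classifier probabilities -- illustrative only.
y_true  = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_proba = np.array([0.90, 0.80, 0.55, 0.35, 0.30, 0.60, 0.40, 0.38, 0.20, 0.10])

results = {}
for threshold in (0.5, 0.3):             # conservative vs. aggressive cutoffs
    y_pred = (y_proba >= threshold).astype(int)
    p = precision_score(y_true, y_pred)  # of those flagged, how many actually churn
    r = recall_score(y_true, y_pred)     # of the churners, how many we catch
    results[threshold] = (round(p, 3), round(r, 3))
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
    print(confusion_matrix(y_true, y_pred))
```

    Lowering the cutoff from 0.5 to 0.3 catches every churner in this toy sample but flags more loyal customers, which is exactly the wider, less efficient net described above.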

    In short: a false positive will increase marketing or discounting, while a false negative will lose a customer.

    In the high-risk customer base, we have two groups:

    1. High risk and high monetary value customers that are on month-to-month contracts that should be encouraged to sign long term contracts.
    2. High risk and lower monetary value customers that are locked into one- to two-year contracts that are targets for long term retention and customer satisfaction programs.

    For some more input on sales and marketing program design, XGBoost also produces a list of the features (variables below) that are most influential in the model, which can be used to test tactical adjustments such as discounting for two-year contracts and content rebalancing.


    The Time-based Question: When will they leave? (Lifelines)

    While the XGBoost classifier was able to tell us “This customer is at-risk” a survival model can tell us “This customer has a 60% chance of making it a year” so that a marketer can time retention efforts. To find out when a customer will churn, we need a model that is designed to predict the timing of events.  Miller finds that “a good example of a duration or survival model in marketing is customer lifetime estimation” (Miller, 2015). These are generally categorized as Survival Models:

    “… medical researchers are often interested in the effects of certain drugs on the timing of death (or recovery) among a sample of patients. In fact, these statistical models are known most as survival models because they are used often by biostatisticians, epidemiologists, and other researchers to study the time between diagnosis and death. … social and behavioral scientists have adopted these models for a variety of purposes.”

    Hoffmann, J. P. (2016)

    In Python there is a package called lifelines (from lifelines import KaplanMeierFitter), which I used for this task. The survival probability curve by contract type highlights the value of two-year contracts.
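
    For intuition, here is the product-limit (Kaplan-Meier) estimate computed by hand on toy tenure data; lifelines’ KaplanMeierFitter produces the same curve, and the durations below are hypothetical:

```python
import pandas as pd

# Toy (tenure, churned) records; tenure in months, event=1 means the customer churned,
# event=0 means still active (censored). Values are hypothetical.
data = pd.DataFrame({
    "tenure": [2, 3, 3, 5, 8, 8, 10, 12],
    "event":  [1, 1, 0, 1, 1, 0, 1,  0],
})

# Product-limit estimate: at each churn time t, S(t) *= (1 - d_t / n_t), where d_t is
# churns at t and n_t is customers still at risk (tenure >= t). This is the curve
# that KaplanMeierFitter().fit(data["tenure"], data["event"]) returns.
surv, s = {}, 1.0
for t in sorted(data.loc[data["event"] == 1, "tenure"].unique()):
    n_t = int((data["tenure"] >= t).sum())
    d_t = int(((data["tenure"] == t) & (data["event"] == 1)).sum())
    s *= 1 - d_t / n_t
    surv[t] = s

print(surv)  # survival probability just after each churn time
```

    Fitting one curve per contract type (month-to-month vs. one-year vs. two-year) and plotting them side by side is how the comparison above is typically visualized.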


    Financial Impact

    By combining the two churn modeling approaches, sales and marketing can evolve from a reactive strategy to proactively developing retention programs that improve financial performance. An aggressive XGBoost model gives the business the ability to identify high-risk, high-monetary-value customers in advance, while Survival Analysis improves marketing effectiveness by guiding the timing of retention and customer satisfaction campaigns.

    The addition of churned customer profiling can be used to develop relevant marketing communications and pricing strategies.


    Citations:

    Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). ACM. https://doi.org/10.1145/2939672.2939785

    Davidson-Pilon, C. (2023). lifelines: survival analysis in Python (Version 0.27.8) [Software]. Zenodo. https://doi.org/10.5281/zenodo.8259706

    Hoffmann, J. P. (2016). Generalized Linear Models: An Applied Approach (2nd ed.). Routledge.

    IBM Sample Data Sets. (n.d.). Telco Customer Churn: Focused customer retention programs. Retrieved from https://community.ibm.com/community/user/businessanalytics/viewdocument/telco-customer-churn

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.


  • The Financial Side of Marketing: Beyond RFM to Predictive CLV

    Why Customer Lifetime Value Matters

    • From History to Horizons: Traditional RFM (Recency, Frequency, Monetary) analysis only describes where a customer was. Predictive CLV uses survival models to forecast where they are going, transforming marketing from a cost center into a predictable revenue engine.
    • Quantify Customer Equity: Treat your customer base as a financial portfolio. By calculating the Discounted Cash Flow of future purchases, leadership can determine the true value of the business beyond the current balance sheet.
    • Optimize CAC:LTV Ratios: Stop measuring campaign success by immediate ROI. High-growth organizations use Predictive CLV to justify higher Customer Acquisition Costs (CAC) for segments that are statistically likely to become high-value “whales” over time.
    • Solve the “Long Tail” Problem: Identify the difference between an At-Risk customer and a Dormant customer. Predictive modeling provides the governance needed to stop wasting spend on attempting to reactivate dormant churned users while doubling down on those who have briefly paused their buying cycle and are at-risk of moving to the competition.

    Introduction

    My previous articles on RFM analysis and propensity-to-buy modeling explored stand-alone frameworks for segmentation and targeting. However, these models also serve as the foundational inputs for the ultimate metric in account prioritization: Customer Lifetime Value (CLV).

    While I have typically used CLV with subscription-based B2C models or B2B IT hardware (storage, routers, and switches), I believe that the principles are identical for online retail. By analyzing historical purchasing patterns through RFM (Recency, Frequency, and Monetary Value), we establish a behavioral baseline. To effectively prioritize sales coverage and marketing spend, we can then project the customer’s expected life.

    Leveraging RFM we have the CLV formula:

    CLV = (Frequency × Monetary Value) × Expected Lifespan

    By shifting from standard propensity modeling (the simple probability of purchase) to survival analysis (the timing and likelihood of the next purchase), we can distinguish between historical high-spenders and customers with the highest probability of future revenue. This allows us to treat customer acquisition and retention as a strategic financial investment—ensuring that projected cash in-flows exceed the Customer Acquisition Cost (CAC). As Miller (2015) notes:

    “Customer lifetime value analysis draws on concepts from financial management. We evaluate investments in terms of cash in-flows and out-flows. Before we pursue a prospective customer, we want to know that [sales] will exceed [costs]”

    Miller (2015)

    From this, we can derive Net CLV (CLV – CAC) or the Financial Efficiency Ratio (CLV:CAC). For this exercise, I have targeted the 3:1 KPI (popularized by venture capitalist David Skok) as the benchmark for a healthy account.
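
    The arithmetic can be made concrete in a few lines; the frequency, monetary, and lifespan inputs below are illustrative, with the $400 CAC and 3:1 ratio carried over from this article:

```python
# Illustrative inputs -- not taken from the article's dataset.
frequency = 4            # purchases per year
monetary = 250.0         # average revenue per purchase
expected_lifespan = 3    # expected customer lifespan in years
cac = 400.0              # customer acquisition cost benchmark

clv = (frequency * monetary) * expected_lifespan  # CLV = (Frequency x Monetary) x Lifespan
net_clv = clv - cac                               # Net CLV
efficiency = clv / cac                            # CLV:CAC financial efficiency ratio
clv_threshold = 3 * cac                           # the 3:1 KPI implies a $1,200 floor at $400 CAC

print(clv, net_clv, efficiency, clv_threshold)
```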


    Data Overview

    The Online Retail Data Set from the UC Irvine Machine Learning Repository is a publicly available real-life dataset used by students for customer analytics, RFM (Recency, Frequency, Monetary) modeling, and market basket analysis.

    Dataset Description

    This is a transactional dataset containing all transactions occurring between December 1, 2010, and December 9, 2011, for a UK-based, non-store online retail company (Chen, D., 2012).

    • Business Nature: The company primarily sells unique all-occasion giftware.
    • Customer Base: While many are individuals, a significant portion are wholesalers (which accounts for the extreme outliers in spending and quantity).
    • Scale: 541,909 transactions and 8 attributes.

    Here is a sample from that dataset:

    Exploratory Data Analysis (EDA) and Data Cleaning

     Visual inspection revealed several data quality issues. I filtered the dataset to focus strictly on product purchases, removing entries related to “bad data”: returns, exchanges, and duplicative descriptions. Examples of the values removed are below (there were many more): 

    filter_values = [
        'add stock to allocate online orders', 'adjust', 'adjustment',
        'alan hodge cant mamage this section', 'allocate stock for dotcom orders ta',
        'barcode problem', 'broken', 'came coded as 20713', "can't find",
        'check', 'check?', 'code mix up? 84930', 'counted', 'cracked', 'crushed',
        'crushed boxes', 'crushed ctn', 'damaged', 'damaged stock', 'damages',
        'damages wax', 'damages/credits from ASOS', 'damages/display',
        'damages/dotcom?', 'damages/showroom etc', 'damages?']

    Data Distributions: Outliers and Skewness

    Histograms of the RFM data show that none of the three components are normal (Gaussian) distributions.  We see a high concentration of customers who bought recently and tapered off, a lot of customers who were one-time purchasers, and a right-skew in monetary value due to high spending customers.


    Model Selection and Development: Heuristic vs. Statistical Modeling

    The Manual “Heuristic” Model

    This is a “quick and dirty” approach using business rules of thumb. I utilized a step function to assign the probability of a customer remaining “alive” based on their last purchase:

    • 0–30 days since last purchase: 95% chance.
    • 31–180 days since last purchase: 80% chance.
    • Over 180 days: 10% chance.
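
    The step function above can be written directly as a business rule; a minimal sketch:

```python
def heuristic_p_alive(days_since_last_purchase: int) -> float:
    """Step-function probability a customer is still 'alive' (business-rule heuristic)."""
    if days_since_last_purchase <= 30:
        return 0.95
    if days_since_last_purchase <= 180:
        return 0.80
    return 0.10

print([heuristic_p_alive(d) for d in (10, 90, 365)])
```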

    The Lifetimes Statistical Model (BG/NBD)

    Next, I applied the BG/NBD (Beta-Geometric/Negative Binomial Distribution) model via the Lifetimes library (Davidson-Pilon, C., 2021). Unlike the heuristic, this model analyzes the individual cadence of every customer. If a customer who typically buys every 10 days hasn’t purchased in 30, the model flags them as “at risk” much faster than a once-a-year purchaser.
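
    Under the hood, BG/NBD has a closed-form P(alive) (Fader, Hardie & Lee, 2005) that the Lifetimes library’s BetaGeoFitter exposes once its four parameters are estimated; the parameter values below are hypothetical, purely to illustrate the “individual cadence” point:

```python
def bgnbd_p_alive(x, t_x, T, r, alpha, a, b):
    """P(customer still active) under the BG/NBD model, valid for x >= 1.

    x = repeat purchases, t_x = time of last purchase, T = total observation time;
    (r, alpha, a, b) are the model parameters a fitter estimates from transaction data.
    """
    return 1.0 / (1.0 + (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x))

# Hypothetical parameter values -- illustrative only, not fitted to real data.
params = dict(r=0.25, alpha=4.4, a=0.79, b=2.43)

# Both customers have been silent for 30 days, but a fast ~10-day cadence makes
# that silence far more alarming than it is for the slower buyer.
fast_buyer = bgnbd_p_alive(x=9, t_x=90, T=120, **params)
slow_buyer = bgnbd_p_alive(x=2, t_x=300, T=330, **params)
print(round(fast_buyer, 3), round(slow_buyer, 3))
```

    With these toy inputs the frequent buyer’s P(alive) drops well below the infrequent buyer’s, which is exactly the cadence-sensitive flagging described above.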

    I’ll return to the choice of heuristic, business-rules-based models vs. statistical survival analysis in a later article on customer churn, where this comparison is also relevant.

    Results and Model Accuracy

    The Lifetimes model proved to be more conservative than the heuristic approach. It yielded a Mean Absolute Error (MAE) of 0.98, indicating that the model’s prediction was off by less than one transaction per customer over a three-month holdout window.

    While I utilized a 12-month window to account for retail seasonality and maintain precision, this horizon is flexible; for technology hardware with longer refresh cycles, a 3–5 year window is often more appropriate.

    Model Performance Metrics: The following metrics compare the model’s predictions against actual customer behavior during the holdout period:

    • MAE: 0.9846
    • Actual Avg. Purchases (Holdout): 1.4462
    • Predicted Avg. Purchases (Holdout): 1.0706

    Financial Output & Targeting Logic

    By applying the model to our customer base, we derived the following financial benchmarks:

    • Mean CLV: $3,816.56
    • Mean Opportunity Ratio: 1.30

    Using an average Customer Acquisition Cost (CAC) benchmark of $400 (a conservative estimate based on the consumer electronics industry) and the 3:1 efficiency ratio, we establish a CLV threshold of $1,200. Any customer with a predicted CLV below this mark represents a net loss, whereas those above it are prioritized for sales coverage.

    Putting this all together we have a targeting grid for planning and targeting high value customers:


    Conclusion

    Enterprise-level data is often more complex, so more work can be involved in tuning the CLV model; however, the core approach here remains the same for strategic planning, account-based planning, and target marketing. Understanding the future value of a customer (vs. only historical spend) is one of the most important capabilities in a marketer’s toolkit.


    Citations

    Chen, D. (2012). Online Retail [Dataset]. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Online+Retail

    Davidson-Pilon, C. (2021). Lifetimes: Measuring customer lifetime value in Python (Version 0.11.3) [Computer software]. https://github.com/CamDavidsonPilon/lifetimes

    Hoffmann, J. P. (2016). Generalized Linear Models: An Applied Approach (2nd ed.). Routledge.

    Marianantoni, A. (2025, March 19). CLV to CAC Ratio: Guide for Startups 2025. M Accelerator. https://maccelerator.la/en/blog/entrepreneurship/clv-to-cac-ratio-guide-for-startups-2025/

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Shopify Staff. (2024, July 29). Customer acquisition costs by industry (2025). Shopify Blog. https://www.shopify.com/blog/customer-acquisition-cost-by-industry#4

    Skok, D. (2010, February 17). SaaS metrics 2.0 – A guide to measuring and improving what matters. For Entrepreneurs. https://www.forentrepreneurs.com/saas-metrics-2/


  • The Demand Generation Workhorse

    Why Propensity-to-Buy Targeting Matters

    • Precision Resource Allocation: It identifies the specific customers and prospects with the highest likelihood to purchase, allowing marketing and sales teams to focus their budget and energy where it will yield the highest return.
    • Churn Mitigation: By predicting which contacts or companies are at risk of leaving for a competitor, P2B allows for proactive retention strategies before the customer actually departs.
    • Optimized Marketing Frequency: It prevents “over-indexing” or inundating low-potential customers with excessive outreach, identifying the “tipping point” where increased contact frequency actually decreases the probability of a sale.
    • Foundation for Advanced Metrics: P2B generates the critical probability scores required to calculate more complex financial models, such as Customer Lifetime Value (CLV), effectively bridging the gap between marketing activity and long-term profitability.
    • Strategic Profiling: It isolates high-propensity populations to uncover the influential variables (e.g., communication channel, past success, or financial stability) that define the “ideal” customer profile for future targeting.

    Introduction

    The “workhorse” of modern demand generation is a family of binary classification models. These models are designed to predict a specific response, such as “…whether a customer buys, whether a customer stays with the company or leaves to buy from another company, and whether the customer recommends a company’s products to another customer” (Miller, 2015). Bruce Ratner (2017), in his work on machine learning and data mining, calls this approach the “workhorse of response modeling.”

    There are many statistical and machine-learning techniques for building Propensity-to-Buy (P2B) models—whether predicting churn, response rates, or identifying CRM opportunities that will convert to sales. The concept has been around since the early days of direct mail (1980s), when direct mail marketers relied on Logistic Regression (still a powerful technique). Now, fueled by big data and high-performance computing, one of the most popular and effective techniques is eXtreme Gradient Boosting (XGBoost), which I use here to examine how it can be applied in marketing.

    Propensity-to-buy has been the first technique deployed at every company I’ve had the pleasure of working in, and it has evolved to become the foundation of modern digital marketing. Technically efficient and scalable, a P2B model can be trained to predict different targets with limited recalibration once the base code is set.

    Key Applications of P2B:

    1. Purchase Likelihood for a Product: Identifying which customers and prospects are most likely to purchase specific products. This applies to both new product launches and up-sell/cross-sell for existing products.
    2. Churn Risk: Predicting which companies or individual contacts are at risk of leaving to competitors.
    3. Lead Scoring and Conversion Probability: Forecasting which CRM opportunities are most likely to convert to “Closed-Won” to optimize the marketing and sales pipeline.
    4. CLV Integration: Generating the probability scores required to calculate a Customer Lifetime Value (CLV) model.
    5. Campaign Response: Determining which contacts are most likely to engage with or respond to a specific marketing program.
    6. Customer Valuation: Identifying which customers will be the most valuable overall across their entire purchase history.  For example, comparing a company’s Total IT Spend with P2B.
    7. Strategic Profiling: Isolating a “high-propensity” population to build ideal demographic and firmographic profiles for future targeting.

    For the following example, XGBoost was perfect (an open-source machine learning ensemble method that builds multiple decision trees to correct previous errors). First introduced to me six years ago by data scientist Fuqiang Shi, it has become the gold standard for structured data. XGBoost gained global fame around 2014–2016 for dominating Kaggle competitions (often outperforming popular methods like Random Forest, Support Vector Machine, Bayesian Classifier, etc.).  That said, in practice a data scientist should test several methods to find the best model by comparing performance metrics.


    The Data

    To demonstrate how to build and score a model, I utilized the Bank Marketing Dataset from the well-known UC Irvine Machine Learning Repository. This dataset consists of 45,211 rows and 17 columns, representing a real-world scenario of a direct telemarketing campaign from a Portuguese banking institution (2008-2010). Here are the first five rows:

    Bank Marketing Dataset from the well-known UC Irvine Machine Learning Repository.

    UCI Machine Learning Repository: Bank Dataset. University of California, Irvine.


    Model Development: A High-Level Overview

    Propensity model development process.

    Predictive Performance

    The model achieved a predictive performance of 81% (ROC AUC), which in my experience is within range for a real-world application (I have seen anywhere from 65% for prospecting models to 90% for customer models). While I initially hard-coded the parameters, I followed up with a grid search (GridSearchCV) to ensure optimization. Since both approaches achieved the same predictive performance, I stopped tuning at that point. [Environment: Python Jupyter Notebook (Anaconda)].
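
    A sketch of that tuning loop is below. It uses synthetic data and scikit-learn’s GradientBoostingClassifier so it runs without xgboost installed; xgboost.XGBClassifier follows the same sklearn-style estimator API and can be dropped in directly, and the parameter grid here is an arbitrary illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the bank dataset, with a ~12% positive rate like the real data.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.88], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3], "learning_rate": [0.1]},
    scoring="roc_auc",   # matches the article's evaluation metric
    cv=3,
)
grid.fit(X_tr, y_tr)

# Evaluate the tuned model on the held-out split.
auc = roc_auc_score(y_te, grid.best_estimator_.predict_proba(X_te)[:, 1])
print(grid.best_params_, round(auc, 3))
```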


    Using the Model for Decision Support: Rebalancing Marketing Frequency

    From a targeting perspective, we now have a list of customers and can develop a profile based on the characteristics of that population – either by analyzing the segment directly or by examining the most influential variables used in the P2B model.

    The model reveals a ‘tipping point’ in telemarketing outreach. The campaign variable shows that as the number of contacts increases (red dots moving left in the SHAP plot), the propensity to buy drops. This suggests the bank is currently over-indexing on low-potential customers, essentially ‘inundating’ them—while missing the opportunity to focus that energy on high-potential segments that require lower frequency to convert.

    Scatterplot showing customers by number of telemarketing contacts and propensity to purchase with size = account balance.

    The Negative Correlation: In the summary plot, the high values for campaign/telemarketing calls (dark red dots) shift to the left of the center line. This indicates that a high number of contacts during a single campaign actually decreases the probability of a purchase.

    Further examination (and data) is required here to determine whether messaging, media mix, brand awareness, or other factors also come into play during execution, but this is definitely a red flag since typical effective reach is around three or more touches (~3X+).

    SHAP Insights:

    SHAP (SHapley Additive exPlanations) visualization showing how variables change the outcome (positive or negative) and bar chart of feature importance.

    Based on the SHAP (SHapley Additive exPlanations) visualizations provided, we can determine exactly which “levers” drive purchase propensity. This is the “explainability” phase that translates a black-box model like XGBoost into actionable business insights.

    • The Bar Chart (Left) – Importance. It shows variables prioritized by the model (e.g., “Cellular contact is the most important piece of information”).  One caveat here: this is an older dataset from the ML Library and for illustration only; interpret with caution!
    • The Summary Plot (Right) – Direction. Shows how those variables change the outcome (e.g., “Being contacted via cell phone increases propensity, while a housing loan decreases it”).

    Summary of Feature Influence

    The model shows that a mix of communication channels, past behavior, and economic stability are the primary drivers of a “Yes” prediction.

    1. Primary Driver: Communication Method (contact_cellular). This is the most influential variable and the strongest predictor of a purchase.
    2. The “Momentum” Effect (poutcome_success). Success in previous marketing campaigns is a powerful indicator of future success. This validates my previous blog’s assertion that RFM (Recency/Frequency/Monetary Value) are highly influential in P2B models. 
    3. Financial Stability (housing no and balance). Customers without housing loans (housing_no) show a higher propensity to purchase. Further, higher bank balance levels correlate positively with conversion.
    4. Timing and Outreach (day_of_week, month_jun, campaign). The specific timing of the outreach (months like June or March) influences the model, though to a lesser degree than the contact method. So, a telemarketing group and marketing programs should be adjusted for seasonality.

    Conclusion

    I’ll be returning to propensity-to-buy in future articles, since like RFM analysis this technique is foundational to successful quantitative marketing.  Both techniques trace their origins to the 1980s as statistical tools for direct mailers and have evolved over time to become the foundational “workhorse” of modern digital marketing.


    Citations

    Miller, Thomas W. Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. FT Press, 2015.

    Moro, S., Rita, P., & Cortez, P. (2014). Bank Marketing [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5K306.

    Ratner, Bruce. Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data. 3rd ed., CRC Press, 2017.


  • My Favorite Segmentation Scheme

    Why RFM Segmentation Matters

    • Improves Retention Strategy
      Identifies lapsed buyers and at-risk segments for reactivation.
    • Optimizes Campaign ROI
      Targets only the most responsive and profitable audiences.
    • Supports Personalization
      Enables tailored offers based on behavioral patterns.
    • Simplifies Scoring & Execution
      Easy-to-implement framework with clear segment definitions.

    Introduction


    As I recall from school, the idea of optimizing a portfolio of B2B accounts goes back as far as Boston Consulting Group’s growth/share matrix and the initial list scoring techniques of direct marketers — and although the technology we use to segment customers has changed (i.e. Machine Learning) the underlying principles are the same: segment your B2C customers or B2B account base based on potential to optimize your go-to-market.

    The Segmentation Challenge

    The journey to effective segmentation often faces two extremes:

    The Black Box (Technical): Techniques like K-means clustering and principal components analysis (PCA) are powerful ML tools, but they often require massive datasets and can lack interpretability. Justifying a segmentation strategy by explaining eigenvalues or complex algorithms to a business leadership team can create friction and slow adoption.

    The Gray Area (Subjective): Conversely, creating detailed customer personas is highly intuitive but often subjective. Because they are based on opinion or aspiration, the segments can be endlessly debated, leading to unclear targeting and weak tactical execution.


    The Power and Simplicity of RFM

    In my experience, one of the most elegant, simple and powerful segmentation schemes is based on RFM scoring, which allows the marketer to:

    • Maximize ROI: Strategically allocate sales and marketing resources to the highest-potential customers.
    • Drive Growth: Target active, high-value accounts for cross-sell and up-sell campaigns to increase purchase frequency.
    • Minimize Churn: Increase retention and proactively intervene with the most valuable customers who show signs of drifting away.
    • Improve Acquisition: Identify and target “lookalike” prospects who share the profiles of your best customers.
    • Inform Value Metrics: Serve as a core input for calculating and improving Customer Lifetime Value (CLV).

    In its simplest form, RFM (Recency, Frequency and Monetary Value) only requires customer purchase transaction data and is both statistically significant in predicting future purchases (on its own and when nested within other purchase-likelihood models) and easily understood by the business.

    As Thomas Miller describes in his textbook, Marketing Data Science:

    “Direct and database marketers build models for predicting who will buy in response to marketing promotions. Traditional models, or what are known as RFM models, consider the recency (date of most recent purchase), frequency (number of purchases), and monetary value (sales revenue) of previous purchases. More complicated models utilize a variety of explanatory variables relating to recency, frequency, monetary value, and customer demographics.”

    Miller, Thomas W. Marketing Data Science. Pearson Education LTD., 2015.

    Methodology: From Data to Actionable Segments

    For each of the three categories, the customer is given a score on a scale of 1 to 5, where one is the lowest score and five is the highest (best) score. Assigning these scores effectively breaks your population into quintiles (20% groups) on each dimension. For simple prioritization, add the three scores together for a total RFM score between 3 and 15, creating a ranked list of highest-potential accounts.
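    The quintile scoring described above can be sketched in plain Python. The data layout (a dict mapping customer ID to a list of (purchase_date, revenue) transactions) and the function names are illustrative assumptions, not a prescribed implementation:

    ```python
    from datetime import date

    def quintile_scores(values, higher_is_better=True):
        """Rank a list of values into quintile scores 1-5 (5 = best)."""
        n = len(values)
        # Order indices from worst to best on the given dimension.
        idx = sorted(range(n), key=lambda i: values[i],
                     reverse=not higher_is_better)
        scores = [0] * n
        for rank, i in enumerate(idx):        # rank 0 = worst
            scores[i] = 1 + (rank * 5) // n   # map rank into quintiles 1..5
        return scores

    def rfm_scores(customers, as_of):
        """customers: dict of customer_id -> [(purchase_date, revenue), ...]."""
        ids = list(customers)
        recency = [(as_of - max(d for d, _ in tx)).days
                   for tx in customers.values()]        # days since last purchase
        frequency = [len(tx) for tx in customers.values()]
        monetary = [sum(r for _, r in tx) for tx in customers.values()]
        R = quintile_scores(recency, higher_is_better=False)  # fewer days = better
        F = quintile_scores(frequency)
        M = quintile_scores(monetary)
        return {cid: {"R": r, "F": f, "M": m, "RFM": r + f + m}
                for cid, r, f, m in zip(ids, R, F, M)}
    ```

    The same logic is a one-liner with `pandas.qcut` at scale; the stdlib version above just makes the quintile mechanics explicit.
    
    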

    Patterns in the purchase data can be used as the foundation of personas (additional attributes can be layered on for profiling), and I have found that the scores are consistently influential (statistically significant) as inputs into purchase-likelihood ML models for scoring accounts. The following segments are typically derived from the scores in the RFM Scores table (above):

    • Champions have the highest score in all three categories (RFM) and highest total scores.
    • New or Highest Potential Customers have high recency and monetary value scores, but have just started purchasing, and so their frequency score will be in the lower quintiles.
    • Past or Churned customers have high monetary and frequency scores, but very low recency scores (i.e. bottom 20%).
    • Additional segments can be created for average customers (to benchmark), new prospects, or totally lost customers.
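    These rule-of-thumb segments can be expressed as a simple score lookup. The exact thresholds below are illustrative assumptions for demonstration, not fixed definitions:

    ```python
    def label_segment(r: int, f: int, m: int) -> str:
        """Map RFM quintile scores (1-5 each) to an illustrative segment name."""
        if r >= 4 and f >= 4 and m >= 4:
            return "Champion"                 # top scores across all three
        if r >= 4 and m >= 4 and f <= 2:
            return "New / High Potential"     # recent, valuable, few purchases yet
        if r == 1 and f >= 4 and m >= 4:
            return "Past / Churned"           # bottom-20% recency, formerly strong
        return "Average"                      # benchmark bucket for everyone else
    ```

    In practice this layer is where business rules live, so thresholds should be tuned with the sales and marketing teams rather than fixed in code.
    
    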

    Summary

    I’ll return to RFM segmentation later for use in precision targeting, segmentation for relevant marketing communication, and media mix optimization. As I mentioned at the beginning, this technique was pioneered by early direct mail marketers using spreadsheet analysis; although we can now build Python, SQL, and R scripts to run it, the fundamentals remain the same. Readers who want to investigate further will find abundant RFM reference material in academic papers and videos with a quick web search.


    Citations

    https://www.investopedia.com/terms/r/rfm-recency-frequency-monetary-value.asp

    https://mailchimp.com/resources/rfm-analysis/