The Marketing Science Signal

Marketing Data Science

Mike is a leader in the field of Marketing Data Science & Operational Strategy with 20+ years leading global Data Science, AI/ML, and Marketing Analytics teams at Dell Technologies, Cisco, Pure Storage, Hitachi Vantara, and Hearst Media. He is also an Accredited Professional Statistician™ with the American Statistical Association.
  • Solving the Discovery Problem: Collaborative Filtering and the Science of “Look-Alike” Behavior

    Why Collaborative Filtering (CF) Matters

    In modern B2B or B2C marketing, the “Discovery Problem” is the primary barrier to growth. Collaborative Filtering addresses this by:

    • Driving Growth: Unlocking cross-sell and up-sell opportunities by identifying hidden preferences.
    • Managing the “Cold Start”: Effectively engaging new entry-point customers and prospects by leveraging “look-alike” behavior.
    • Precision for “Grey Sheep”: Navigating the complexity of low-frequency customers who purchase niche products.
    • Strategic Segmentation: Providing a mathematical foundation for persona development that goes beyond basic demographics.

    This is the third and final installment in my series on recommender systems, presented in the order I first applied them in B2B marketing. Fourteen years ago, my team began using Association Rules for product recommendations and media mix optimization — a technique that learns from individual “baskets” or bundles. We then progressed to Markov Chains, which decode a single customer’s sequence of actions. In this article, we move to Collaborative Filtering: a method that looks across the entire ecosystem of customers and items to identify “look-alike” patterns.

    It is a powerful tool for engaging prospects and solving the “cold start” and “grey sheep” problems that arise when historical data is scarce. Recently, Jidan Duan built a recommendation engine using CF with Matrix Factorization that proved effective in B2B digital hardware marketing. In practice, keeping a Human in the Loop is critical: recommendations should be inspected to ensure that the products served are optimal in terms of margin, future versus legacy products, and special strategic initiatives. This can be accomplished through an additional layer of business rules.

    As we will see in this article, the techniques can be used together – as well as independently – when needed to solve a particular marketing problem.

    Article | Algorithm               | Unit of Analysis           | Question
    1       | Association Rules       | Baskets / sessions         | What travels together?
    2       | Markov Chain            | Single customer’s sequence | What comes next?
    3       | Collaborative Filtering | All customers × all items  | Who buys like me?

    The Dataset: High-Volume Retail Reality

    To demonstrate these concepts, I am evaluating a subset of the H&M Personalized Fashion Recommendations dataset. In its raw form, the training data consists of over 31 million transactions.

    Here is a brief description of H&M Group and the dataset:

    “H&M Group is a family of brands and businesses with 53 online markets and approximately 4,850 stores. Our online store offers shoppers an extensive selection of products to browse through. But with too many choices, customers might not quickly find what interests them or what they are looking for, and ultimately, they might not make a purchase. To enhance the shopping experience, product recommendations are key. More importantly, helping customers make the right choices also has a positive implication for sustainability, as it reduces returns, and thereby minimizes emissions from transportation… the purchase history of customers across time, along with supporting metadata.”


    Summary Technical Overview of Process

    Data Preparation: Loading raw transactions and quality filter

    At over 31 million rows, simply loading the three datasets (customers, articles, and transactions) and attempting to merge them triggered a Memory Error. I needed an approach that would preserve the purchasing patterns while using less memory. I removed unneeded columns, reformatted the string IDs, removed customers with fewer than 3 purchases, and filtered out all but the last six months of fashion data, arriving at the following:

    Loading Transactions     | Records
    Raw rows loaded          | 2,788,325
    After quality filter     | 2,576,695
    Unique customers         | 300,006
    Unique clothing articles | 37,878
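    The filtering steps described above can be sketched in pandas. This is a minimal illustration on synthetic data; the column names (`t_dat`, `customer_id`, `article_id`) follow the public H&M dataset schema, and the thresholds mirror the ones described in the text.

```python
import pandas as pd

def prepare_transactions(tx: pd.DataFrame) -> pd.DataFrame:
    """Quality-filter raw transactions: drop unneeded columns, keep only
    the last six months, and remove customers with fewer than 3 purchases."""
    tx = tx[["t_dat", "customer_id", "article_id"]].copy()  # drop extra columns
    tx["t_dat"] = pd.to_datetime(tx["t_dat"])
    # Keep only the most recent six months of data
    cutoff = tx["t_dat"].max() - pd.DateOffset(months=6)
    tx = tx[tx["t_dat"] >= cutoff]
    # Remove low-signal customers with fewer than 3 purchases
    counts = tx["customer_id"].value_counts()
    keep = counts[counts >= 3].index
    return tx[tx["customer_id"].isin(keep)]

# Tiny synthetic stand-in for the raw transaction file
demo = pd.DataFrame({
    "t_dat": ["2020-01-01", "2020-08-01", "2020-08-15", "2020-09-01", "2020-09-10"],
    "customer_id": ["a", "b", "b", "b", "c"],
    "article_id": [1, 2, 3, 2, 4],
    "price": [0.1] * 5,
})
filtered = prepare_transactions(demo)
```

    On the real 31-million-row file, the same steps run per-chunk keep peak memory manageable.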

    Data Preparation: Exploratory Data Analysis

    The Long Tail and Marketing Efficiency

    If 20% of products drive 80% of H&M revenue (the Pareto Principle), a simple top-sellers ranking could cover most purchases, so this result is no surprise. Collaborative Filtering shines in the long tail: the 80% of items that collectively drive 20%+ of sales but are invisible to a simple ranking. In the data below:

    • 17.4% of unique items drive 80% of sales.
    • The remaining 82.6% is the long tail CF was built for.

    H&M has a catalog depth of roughly 38,000 items, but their current momentum is driven by just 17% of them. To grow, they don’t need more products; they need better cross- and up-sell of existing products. A traditional merchandising approach would focus on best-sellers. By employing collaborative filtering, the roughly 31,500 long-tail items can be marketed to the niche audiences they suit.
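    The head/tail split quoted above can be computed directly from per-item sales counts. Here is a minimal sketch on a synthetic power-law catalog (the function and data are illustrative, not the article’s actual code):

```python
import numpy as np

def pareto_share(item_sales: np.ndarray, revenue_share: float = 0.80) -> float:
    """Fraction of unique items needed to cover `revenue_share` of total sales."""
    sales = np.sort(item_sales)[::-1]          # best-sellers first
    cum = np.cumsum(sales) / sales.sum()       # cumulative share of sales
    n_head = np.searchsorted(cum, revenue_share) + 1
    return n_head / len(sales)

# Synthetic power-law catalog: a few hits, a long tail
rng = np.random.default_rng(0)
sales = rng.zipf(1.5, size=1000).astype(float)
head_fraction = pareto_share(sales)
print(f"{head_fraction:.1%} of items drive 80% of sales")
```

    Run against the real per-article sales counts, this is the calculation behind the 17.4% figure above.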

    A retention campaign could be targeted at the most loyal customers (those with roughly 30 purchases) to proactively prevent churn.

    Matrix Sparsity: The Need for “Latent” Logic

    With nearly 38,000 products and over 300,000 users, the resulting Utility Matrix (where rows are users and columns are items) is more than 99.9% empty space. This is “Sparsity.” Simple statistics can’t fill these gaps. We need an algorithm that can “see” the hidden—or latent—factors that connect a customer’s preference for a specific blouse to their likely interest in a particular style of sweater.

    Record Category           | # of Records/Cells
    Users                     | 300,006
    Items                     | 37,878
    Matrix (300,006 × 37,878) | 11,363,627,268
    Filled (0.0227%)          | 2,576,695
    Sparsity (99.9773%)       | 11,361,050,573
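    The sparsity figures in the table reduce to a few lines of arithmetic:

```python
# Reproduce the sparsity arithmetic from the table above.
n_users, n_items = 300_006, 37_878
n_filled = 2_576_695                      # observed user-item interactions

n_cells = n_users * n_items               # 11,363,627,268 possible cells
density = n_filled / n_cells              # fraction of cells with data
sparsity = 1.0 - density                  # fraction of empty cells

print(f"Matrix cells: {n_cells:,}")
print(f"Density     : {density:.4%}")     # ~0.0227%
print(f"Sparsity    : {sparsity:.4%}")    # ~99.9773%
```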

    Visualizing the Utility Matrix

    A Spy Plot can be used to understand the data.  By sorting the matrix by activity (most active users at the top, most popular items on the left), we can map our strategic challenges.

    Visualizing Sparsity can help us build targeting segments for tactical use in messaging and promotion to improve sales through cross-/ up-sell, etc.
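    A minimal sketch of such a spy plot, using SciPy and Matplotlib on a synthetic interaction matrix (the sizes and data are illustrative, not the H&M matrix itself):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted runs
import matplotlib.pyplot as plt
from scipy.sparse import coo_matrix

# Synthetic user-item interactions (replace with real transaction pairs)
rng = np.random.default_rng(42)
n_users, n_items, n_obs = 200, 100, 600
rows = rng.integers(0, n_users, n_obs)
cols = rng.integers(0, n_items, n_obs)
m = coo_matrix((np.ones(n_obs), (rows, cols)), shape=(n_users, n_items)).tocsr()

# Sort rows/columns by activity: busiest users at top, hottest items at left
row_order = np.argsort(-m.getnnz(axis=1))
col_order = np.argsort(-m.getnnz(axis=0))
m_sorted = m[row_order][:, col_order]

fig, ax = plt.subplots(figsize=(6, 4))
ax.spy(m_sorted, markersize=1, aspect="auto")
ax.set_title("Utility matrix (sorted by activity)")
fig.savefig("spy_plot.png", dpi=120)
```

    The four quadrants discussed below fall out of exactly this sorting: dense top-left, sparse bottom-right.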

    The Four Quadrants of Interaction

    • Core Business (top left) are the “Champion” frequent shoppers of H&M popular items.  Association Rules from my earlier article could be applied here to increase their market basket of purchases and ensure retention.
    • Loyal Explorers (top right) heavily purchase the least popular items in the long tail.  Collaborative filtering can help in recommending low-volume niche products which appeal to look-alike customers.
    • Entry Point (bottom left) H&M shoppers may be new customers who have come in to purchase popular items. This is a Cold Start scenario where little data exists on the customer beyond popular product purchase and more of those should be served up – until more information is available.
    • Grey Sheep (bottom right) are occasional H&M shoppers who could be purchasing as a gift for someone else, or seasonally shopping.  Here again, there is little historical signal to help with recommendations.

    Selecting Latent Factors

    Output: Hidden Dimensions: Latent Factor Loadings by Product Category

    Statistically (using SVD or Singular Value Decomposition) we can separate the H&M customer base into groups or factors based on their product purchases, which allows us to make recommendations for each customer that are aligned with the typical purchases compelling to other customers in that grouping.

    Looking at the patterns in the factor loadings, we can develop some personas based on the types of products purchased.  This can become the starting place for general customer segmentation by adding demographic, intent and other third party syndicated data as well as internal financial data.

    Factor | Primary Product              | Marketing Label    | Strategic Insight
    F1     | All Categories               | General Popularity | High-volume “must-haves” across all segments (retention, x-/up-sell marketing).
    F5/F9  | T-shirts, Shorts, Vest tops  | Active Summer      | High potential for seasonal x-sell (sunscreen/sunglasses).
    F2     | Underwear, Vest tops         | Foundation Core    | High-loyalty, repeat-purchase items (retention focus).
    F6     | Blouses, Sweaters, Outerwear | Professional       | “Back-to-Work” or “Evening” shopper persona.
    F8     | Foundation Wardrobe          | Casual             | X-sell complements of shorts, shirts, trousers, etc.
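    A hedged sketch of how such latent factors can be extracted with truncated SVD via scikit-learn. The matrix here is synthetic, and the persona labels in the table come from human interpretation of the loadings, not from the code:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Synthetic sparse utility matrix standing in for users x items purchases
utility = sparse_random(500, 200, density=0.02, random_state=7, format="csr")

svd = TruncatedSVD(n_components=10, random_state=7)
user_factors = svd.fit_transform(utility)      # (500, 10) user loadings
item_factors = svd.components_.T               # (200, 10) item loadings

# Inspect the heaviest-loading items per factor to name personas by hand
top_items_f1 = np.argsort(-np.abs(item_factors[:, 0]))[:5]
print("Top items on factor 1:", top_items_f1)
print("Explained variance   :", svd.explained_variance_ratio_.sum().round(3))
```

    Mapping each factor’s top-loading articles back to product categories is what produces the persona table above.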

    Summary

    Once tuned and put into place, collaborative filtering is a powerful tool whose usefulness extends beyond product recommendations; it shines in its ability to handle Grey Sheep and Cold Starts. It can also be combined with other techniques to build a marketing segmentation scheme that enhances classic RFM Segmentation, and CF outputs can be blended with demographic and intent data to develop holistic customer personas.

    I’ll come back to recommenders as I explore more applications of Agentic AI and marketing strategy.

  • MIKE THE ROBOT: SCALING EXPERTISE INTO THE SINGULARITY

    In a previous article, I explored the Analytics Shoot-Out: Mike vs. Agentic AI. Today, I’m shifting the lens from competition with AI to integration of AI to provide a scalable extension of my personal marketing methodology.

    Perhaps the most profound realization in building this agent is the ability to have a specialized ‘staff’ at one’s disposal—an editor, a researcher, a designer and a strategic consultant—all with access to a specific methodology and working at high velocity. That is the Singularity in practice.

    To be clear, I am defining this as a ‘Functional Singularity’—the point where a specific methodology is successfully codified into an agentic system, finally decoupling professional impact from the linear constraints of biological hours. This is not a claim of ‘better’ logic or a machine entering a recursive self-improvement loop; it is about a specialized engine delivering assessments grounded in a consistent, mathematically sound approach to quantitative marketing.


    Why Implementing An AI Agent Matters

    Building an extensive analytical portfolio is one thing; making it actionable is another. Transforming a library of insights (as documented in my analytics portfolio on The Marketing Science Signal) into a functional enterprise asset is the core value proposition of the Mike Agent. This shift moves an organization away from business intuition-based decision making and toward a Reasoning Engine that preserves a unique methodology while scaling into operations.

    • Methodology Preservation: It prevents “Methodology Dilution” by weighing all AI responses against specific frameworks found on my website (The Marketing Science Signal).
    • Democratized Expertise: It has the potential to make specialized marketing science accessible to every department—Sales, Finance, Marketing, and Ops—via a simple natural language interface.
    • Operational Velocity: It automates the transition from identifying a business problem to generating a structured, technical action plan.
    • Probabilistic Accuracy: It replaces “gut feeling” and qualitative intuition with calculated Markov transition and absorption probabilities.

    The Cyborg Evolution

    I find Yuval Noah Harari to be the most thought-provoking historian and philosopher alive today. He frequently argues that the biological era of human evolution is nearing its end, to be replaced by a technological one.

    “I think it is very likely that within a century or two, Homo sapiens as we know them will disappear. We will use technology to upgrade ourselves… into something different… the next evolution will be to become cyborgs. We are already becoming cyborgs. If ‘cyborg’ means a being that combines organic and inorganic parts, then we are already there. Your smartphone is not just a gadget; it’s part of you.”

    Yuval Noah Harari, Homo Deus: A Brief History of Tomorrow (2016)

    The “Singularity” won’t be a hostile machine takeover; it will be the point where man and machine merge into a single, unified entity. Scaling “best human thinking” into an AI agent is effectively the process of moving from a blog to a Reasoning Engine that applies a specific methodology to live data.


    2. Technical Architecture: How the Agent Thinks

    To scale this methodology across an organization, the Mike Agent operates on a “Consulting Triad” — three inputs that work together every time a question is asked:

    • My Methodology: At the start of every session, the agent reads The Marketing Science Signal website in real time, pulling my published content directly into the conversation as its primary context. This grounds every answer in my specific approach rather than generic AI advice.
    • The Business Context: The structured dynamic discovery questions capture the user’s immediate situation — their CRM state, channel gaps, churn signals, and forecasting blind spots — so the agent responds to the actual problem, not a hypothetical one.
    • The AI Engine: Gemini serves as the language and reasoning engine, synthesizing my site content against the user’s answers to produce a structured strategic assessment.

    This architecture keeps my intellectual property as the dominant input. The AI does not replace my thinking — it applies my thinking at scale.

    Preventing Methodology Dilution: Because my published site content is loaded into every session as the agent’s primary context, responses are grounded in my specific frameworks rather than the AI’s general knowledge. A user asking about customer retention will receive advice rooted in my churn modeling frameworks — for example, the behavioral finding that customers often leave because they never fully adopted the product, not because a competitor was cheaper.

    The Path to Enterprise Grade: The current implementation is a working prototype that demonstrates the core logic. A full enterprise deployment would extend this foundation by connecting to live data lakes, telemetry, CRM and marketing automation systems — enabling the agent to move from strategic advice to running actual calculations against real pipeline data.


    3. Deployment: The Department-Aware Consultant

    For the agent to deliver maximum value, it must speak the language of the person asking the questions. The current prototype uses a universal discovery flow that applies across all departments. Specifically, the “Mike the Robot” discovery session uses a structured yet flexible inquiry process in which questions are both categorized and dynamic, adapting to user input. The same five question categories surface the critical gaps regardless of who is asking. In a full enterprise deployment, the system would identify the stakeholder and tailor its response to their specific business problem.

    Department | Use Case            | The “Mike” Angle
    Sales      | Lead Prioritization | Weighted lead scoring based on historical conversion probability
    Finance    | Budget Allocation   | Removal Effect simulation to identify truly dead-weight spend
    Marketing  | Campaign Planning   | State-Transition Matrices to predict lead leakage before it happens
    Ops        | Data Hygiene        | CRISP-DM framework applied to systematically flag dirty CRM data

    The department-routing capability represents the next phase of deployment, where a single agent (or multiple agents) serves the entire organization through one natural language interface.


    4. Implementation Roadmap

    Scaling expertise into an enterprise asset would be a phased process designed to ensure my methodology remains the dominant logic at every stage:

    • Phase 1 — Library (Operational Now): The agent reads and ingests The Marketing Science Signal in real time at the start of every session, establishing my published frameworks as its primary context. This phase is demonstrated in the accompanying prototype.
    • Phase 2 — Personality (Methodology Encoding): Refining the system prompt to explicitly prioritize Markov-based logic and my proprietary frameworks, sharpening the “Mike bias” beyond the current working baseline.
    • Phase 3 — Execution (Live Data Integration): Connecting to live CRM and marketing automation APIs so the agent moves from giving advice to running calculations against actual enterprise data, such as pipeline data.
    • Phase 4 — Access (Enterprise Deployment): Deploying via Slack or Microsoft Teams, making the agent available to every department through the tools they already use — no technical onboarding required.

    5. The “Mike Agent” in Action: A Real Consultation Session

    The following is a live interaction from the enhanced working prototype, run against a real B2B technology scenario. In this version, the agent scraped 12 articles from The Marketing Science Signal — double the six used in the earlier example — and generated five contextual discovery questions, each one adapting based on the previous answer, before delivering the Master Strategic Assessment below. Q1 is fixed — it opens every session identically. Questions 2 through 5 are written by the agent in real time, using the full conversation history as context. The exact wording below is unique to this session.

    The Core Objective “What is the core marketing objective today?”

    User: “Reduce customer churn.”

    The Discovery Session

    The agent’s dynamic questioning engine generated five contextually aware questions, each probing deeper based on the previous answer. This is not a fixed survey — it is a structured conversation.

    Q1: “Data Readiness: What is the current state of your CRM and historical sales data? (platforms used, years of history, data quality issues)”

    User: We use Salesforce.com, Marketo, and Adobe products with five years of history. Data completeness and matching across systems are the main issues.

    Q2: “Given the matching issues across your systems, how exactly are customer interactions and retention efforts currently handed off and tracked across Account Managers, Telemarketing, and Channel Partners?”

    User: All channels operate independently. Marketing targets at-risk accounts with retention campaigns, and those accounts are flagged in Salesforce — but sales has no guidance around where to execute competitive take-outs, pricing actions, or cross-sell campaigns.

    Q3: “When those flagged accounts ultimately churn, what are the top 2-3 reasons they cite for leaving, and how well do the reasons logged in Salesforce match what your Account Managers are actually hearing in the field?”

    User: The top reasons cited are competition, brand awareness, and price.

    Q4: “Since brand awareness and competition are key churn drivers, what ad platforms are you currently using to defend these accounts, and can your Marketo/Adobe stack feed them dynamic, real-time audiences of those flagged at-risk customers?”

    User: The at-risk accounts have been identified, but formal campaigns and account-specific targeting have not been developed because management does not realize the true cost of churn.

    Q5: “To help management quantify the true financial impact of this churn, how often do your sales teams provide qualitative revenue forecasts, and what is your current forecast error rate when predicting these at-risk renewals?”

    User: Forecasts are done quarterly and can be off by 5–15%. Churn is not part of the formal forecasting process.

    Master Strategic Assessment (Live Agent Output)

    The agent scraped 12 articles from The Marketing Science Signal — including articles on Markov Chains, the Hybrid Forecast, Multi-Channel Force Multiplier, Recommender Systems, and Predictive CLV — and synthesized them against the five discovery answers above. Every recommendation below is grounded in a specific published framework.


    Strategic Gap Analysis

    Gap | Severity | Evidence | Technique
    The Churn Misdiagnosis (The “Price” Fallacy) | 8/10 | Sales cites competition, brand awareness, and price as primary churn drivers — relying on qualitative CRM drop-downs that mask true behavioral friction. | Kaplan-Meier Survival Curves
    The Financial Blindspot | 9/10 | Management does not realize the cost of churn because it is not quantified in the language of finance, preventing budget approval for retention campaigns. | Predictive Customer Lifetime Value (CLV) Modeling
    Siloed Forecasting and Execution | 7/10 | Quarterly forecasts carry a 5–15% error rate and completely ignore churn, while Marketing and Sales operate as independent units. | Hybrid Forecast (XGBoost Propensity Model)

    The Churn Misdiagnosis — Severity 8/10: When sales teams attribute churn to “price” or “competition,” they are almost always reading from CRM drop-down menus filled in at close — not from behavioral data. The real question is whether customers ever fully adopted the product in the first place. Kaplan-Meier Survival Curves, built on product telemetry rather than self-reported close reasons, will identify exactly when in the customer lifecycle defection risk becomes critical — and whether it correlates with adoption milestones rather than competitive events.
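    For readers who want to see the mechanics, here is a from-scratch Kaplan-Meier estimator on synthetic tenure data (illustrative only; production work would typically use a library such as lifelines, and the real input would be telemetry-derived tenures):

```python
import numpy as np

def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate. `durations` are tenures (months);
    `observed`=1 means churn was seen, 0 means censored (still active)."""
    durations = np.asarray(durations, float)
    observed = np.asarray(observed, int)
    times = np.unique(durations[observed == 1])   # distinct churn times
    surv, s = [], 1.0
    for t in times:
        at_risk = (durations >= t).sum()          # accounts still under observation
        deaths = ((durations == t) & (observed == 1)).sum()
        s *= 1 - deaths / at_risk                 # product-limit update
        surv.append((t, s))
    return surv

# Synthetic tenures: churns cluster early, before adoption milestones
durations = [2, 3, 3, 5, 8, 12, 12, 18, 24, 24]
observed  = [1, 1, 1, 1, 1,  0,  1,  0,  0,  0]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"month {t:>4.0f}: S(t) = {s:.2f}")
```

    A sharp early drop in S(t), before any competitive event could plausibly matter, is the signature of the adoption failure described above.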

    The Financial Blindspot — Severity 9/10: This is the highest-severity gap in the assessment. When management does not approve retention budget, it is almost always because churn has not been translated into their language. As detailed in my article The Financial Side of Marketing: Beyond RFM to Predictive CLV, the solution is not to ask for budget — it is to build a Predictive CLV model that quantifies the net present value of the at-risk accounts and presents churn as a revenue haircut on the forecast. Finance cannot ignore a number they helped build.

    Siloed Forecasting and Execution — Severity 7/10: A 5–15% forecast error is typical when churn is treated as an afterthought rather than a modeled input. As outlined in The Hybrid Forecast: Integrating Field Sales “Expert Opinion” with Deep Learning Ensembles, blending an XGBoost Propensity Model with the sales team’s pipeline data produces a mathematically grounded baseline that forces the sales team’s optimism to compete with real churn probabilities — rather than simply override them.


    Technical Roadmap

    Phase   | Action | Technique | Timeline
    Phase 1 | Address Adobe/Marketo/Salesforce matching issues by establishing a unified data model and defining discrete customer states. | CRISP-DM Framework | Short-term
    Phase 2 | Build a state-transition matrix to calculate exact transition probabilities and map non-linear journeys to identify adoption dead ends. | Markov Chains | Medium-term
    Phase 3 | Build a Recommender System to mathematically prescribe the next-likely-purchase for sales to pitch, deepening product adoption. | Association Rules (Market Basket Analysis) | Medium-term
    Phase 4 | Determine the true fractional contribution of Marketo emails, Adobe digital touches, and Sales calls in preventing churn. | Markov Chain Removal Effect | Long-term

    Phase 1 — Unify the State Space via CRISP-DM: Break down the silos between product telemetry, Marketo, and SFDC. Using the CRISP-DM framework, engineer a unified dataset where every account is assigned a behavioral State — Onboarded, Low-Adoption, Executive-Engaged, or At-Risk. This is the prerequisite for every subsequent model.

    Phase 2 — Implement a Markov Chain Attribution & Journey Model: Discard First/Last Touch. Build a State-Transition Matrix using Markov Chains to quantify how accounts move through media, partner touches, and product usage over time. Calculate Absorption Probabilities to forecast the exact likelihood of an account moving from Active to Churned based on its current state, providing the early warning signal you currently lack entirely.

    Phase 3 — Deploy the Hybrid Forecast: Transition away from pure Delphi Method forecasting. Build an XGBoost Propensity Model that scores renewal likelihood based on product telemetry and marketing engagement. Then use a Champion-Challenger Method to blend this ML baseline with the AMs’ pipeline data. This grounds the forecast in mathematical reality while preserving the operational context that sales leaders provide.

    Phase 4 — Automate Dynamic Interventions: Stop manually exporting static MQL lists. Integrate Markov state outputs directly into your ad-tech stack. When an account transitions into a “Low Adoption” state, automatically trigger a C-suite LinkedIn campaign. Use Agentic AI integrated with XGBoost outputs to generate personalized outreach to dark partner accounts before the renewal window closes.


    High-Value Counter-Intuitive Advice

    Stop fighting price wars and trying to reactivate dead accounts. Instead, force predicted churn into the financial forecast to secure the retention budget.

    Markov Rationale: Use Markov Absorption Probabilities to find exactly where customers stopped adopting the product, identifying the true friction points rather than relying on subjective sales feedback.

    Counter-Intuitive Element: Do not ask management for a retention budget. Instead, inject a mathematically sound revenue haircut directly into the quarterly forecast — and let the number make the argument for you. A CFO who helped build the model cannot dismiss its output as a marketing opinion.


    Key Observation: The expanded 12-article scrape allowed the agent to draw on the full methodology library — including Predictive CLV and Market Basket Analysis — in addition to the Markov and Hybrid Forecast frameworks used in the earlier example. The Financial Blindspot gap, rated the highest severity in this session, was surfaced by connecting the CLV article directly to the management budget problem the user described in Q4. That cross-article synthesis — connecting a financial modeling framework to a political obstacle inside the organization — is the core value of methodology encoding at scale.


    A note on scrape depth: The earlier prototype session used six articles from The Marketing Science Signal to keep context windows lean. This session ran against 12 articles (approx. 100 pages). The difference is visible in the output — the agent surfaced the CLV and Market Basket frameworks that were not available in the smaller context, and the gap analysis is sharper as a result. As the implementation roadmap matures toward a live data connection, this breadth (including future articles) will be the default, not the exception.


    6. The Code: The “Consultant” Logic

    The Agent’s Instructions (System Prompt)

    This is where the methodology encoding happens. Rather than a generic instruction, the system prompt explicitly names the frameworks the agent must apply, the churn lens it must use, and the output structure it must follow:
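    The production prompt is not reproduced here, but a hedged sketch of the shape described above might look like this. The wording and the `build_messages` helper are illustrative assumptions; the framework names come from the article:

```python
# Illustrative sketch only -- not the production prompt.
SYSTEM_PROMPT = """You are 'Mike the Robot', a marketing-science consultant.
Ground every answer in the published frameworks from The Marketing Science
Signal, in priority order:
  1. Markov Chains (state-transition matrices, absorption probabilities,
     Removal Effect) for journey mapping and attribution.
  2. Kaplan-Meier Survival Curves for churn diagnosis.
  3. Predictive CLV for translating churn into financial terms.
  4. Association Rules for next-likely-purchase recommendations.
Churn lens: assume defection stems from incomplete product adoption until
behavioral data says otherwise.
Output structure: Strategic Gap Analysis (severity-scored) ->
Technical Roadmap (phased) -> one piece of counter-intuitive advice.
If no documented framework fits, flag the gap instead of improvising."""

def build_messages(user_question: str, site_context: str) -> list[dict]:
    """Assemble the chat payload: methodology first, scraped site second."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": f"Site context:\n{site_context}"},
        {"role": "user", "content": user_question},
    ]
```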

    The Dynamic Follow-Up Engine

    Questions 2-5 are not fixed scripts. The agent reads the full conversation history and generates each question dynamically, probing deeper into the gaps the user has already revealed:
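    A minimal sketch of this idea, folding the full conversation history into each question-generation prompt (function and variable names are illustrative, not the production code):

```python
def followup_prompt(history: list[tuple[str, str]], question_number: int) -> str:
    """Build a prompt asking the LLM to write the next discovery question,
    conditioned on every prior Q/A pair."""
    transcript = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    return (
        f"You have asked {question_number - 1} discovery questions so far.\n"
        f"Conversation so far:\n{transcript}\n\n"
        f"Write discovery question {question_number}. Probe the most "
        "significant gap the user's last answer revealed. One question only."
    )

# The fixed opener plus the user's first answer seeds the loop
history = [("What is the core marketing objective today?", "Reduce customer churn.")]
prompt = followup_prompt(history, question_number=2)
```

    Each answer is appended to `history` before the next call, which is why Q2 through Q5 differ from session to session.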

    The Master Synthesis Loop

    The final synthesis call combines the live site scrape, full conversation history, and explicit methodology instructions into a single structured assessment (note: this excerpt is from my initial run using six articles; as noted above, the test run for this article used the full 12-article set):
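    A hedged sketch of the prompt-assembly step (names and wording are illustrative; in the real pipeline the resulting string becomes the user message of a single LLM call via LiteLLM’s `completion` API):

```python
def synthesis_prompt(articles: list[str], history: list[tuple[str, str]]) -> str:
    """Fold the live site scrape and the full discovery transcript into
    one request for the Master Strategic Assessment."""
    context = "\n\n---\n\n".join(articles)                      # scraped articles
    transcript = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    return (
        "Using ONLY the methodology below, produce a Master Strategic "
        "Assessment with a Strategic Gap Analysis (severity-scored), a "
        "phased Technical Roadmap, and one piece of counter-intuitive "
        f"advice.\n\n=== METHODOLOGY ===\n{context}\n\n"
        f"=== DISCOVERY TRANSCRIPT ===\n{transcript}"
    )

# e.g. litellm.completion(model=..., messages=[{"role": "user",
#      "content": synthesis_prompt(articles, history)}]) in the real pipeline
```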


    Conclusion: The Singularity is Strategic

    The future of leadership is about encoding my best thinking so it can be in every room at once. Mike the Robot is not a replacement for human expertise — it is the mechanism that scales it.


    Technical Appendix: The “Mike the Robot” Architecture

    For the analysts and engineers interested in the plumbing.

    The engine is built on a modular Python stack designed for real-time retrieval and reasoning. It doesn’t rely on a static database; it treats The Marketing Science Signal website as its live brain.

    The Tech Stack:

    • Orchestration: LiteLLM serves as the API gateway, providing a unified interface to toggle between Gemini 3.1 Pro Preview and other models for cost/latency optimization.
    • Retrieval (RAG): BeautifulSoup4 and Requests perform targeted scraping of my published articles, which are then vectorized into ChromaDB for context-injection.
    • Logic Layer: A custom CRISP-DM inspired prompt chain that forces the model to follow a specific “Discovery → Diagnosis → Roadmap” sequence rather than jumping to conclusions.
    • Environment: Developed in VS Code and deployed via a modular Python setup to ensure the “Mike-style” logic remains consistent across different sessions.
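    As an illustration of the retrieval step, here is a minimal BeautifulSoup4 cleaning pass on an inline HTML snippet. The real pipeline fetches live pages with Requests and stores embeddings in ChromaDB; this sketch covers only the parse-and-clean stage, and the HTML is invented:

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <nav>Home | About</nav>
  <article><h1>The Removal Effect</h1>
  <p>Markov Chains quantify channel contribution.</p></article>
  <footer>(c) The Marketing Science Signal</footer>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
for tag in soup(["nav", "footer", "script", "style"]):
    tag.decompose()                      # strip navigation chrome and boilerplate
article_text = soup.get_text(separator=" ", strip=True)
```

    Only the cleaned article body is vectorized, which keeps navigation chrome out of the agent’s context window.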

    Key Logic Constraint: The system is explicitly instructed to prioritize my documented frameworks (e.g., Markov Chains for journey mapping or Survival Analysis for churn) over generic LLM training data. If a scenario lacks a documented “Mike-style” framework, the agent is programmed to flag the gap rather than hallucinating a generic fix.

    Technical References: Python Libraries Used in the Notebook

    Library        | Role in Pipeline             | Citation
    LiteLLM        | LLM API Gateway              | BerriAI. (2025). LiteLLM: Python SDK and proxy server (v1.81.8). github.com/BerriAI/litellm
    BeautifulSoup4 | Website Scraping             | Richardson, L. (2024). Beautiful Soup (v4.12). crummy.com/software/BeautifulSoup
    Requests       | HTTP Client                  | Reitz, K. (2023). Requests: HTTP for Humans (v2.31). docs.python-requests.org
    ChromaDB       | Vector Knowledge Base        | Trychroma. (2024). Chroma: The AI-native open-source embedding database. trychroma.com
    Python-dotenv  | Environment / Key Management | Bertrand, S. (2024). python-dotenv (v1.0). github.com/theskumar/python-dotenv
    ipywidgets     | Interactive Notebook Widgets | Jupyter Team. (2024). ipywidgets: Interactive HTML widgets. ipywidgets.readthedocs.io

    The Marketing Science Signal  ·  AI & Human Intelligence  ·  mikesdatamarketing.com


  • The Future Only Depends on the Present: Markov Chains and the Customer Journey.

    Why Recommender Systems Matter

    • Map Non-Linear Journeys: Customers move between “states” (social, email, search) rather than a straight line. Use Markov Chains to visualize this web and capture micro-conversions.
    • Identify “Dead Ends”: Transition probabilities reveal where your funnel “leaks.” High churn from the intent stage indicates friction at opportunity closed/won (B2B) or pricing (B2C).
    • Optimize Marketing Mix: Use multi-touch attribution to credit awareness channels fairly. This prevents cutting essential top-of-funnel budget.
    • Increase Retention: Forecast the likelihood of customers moving from Active to Churned. This provides a window to intervene before they leave.

    Introduction

    In my first article on recommender systems entitled Recommender Systems: Market Basket Analysis & Next-Likely-Purchase in Cross-Sell, I explored Association Rules (Market Basket Analysis), which identifies which products tend to be purchased together in a single transaction. It’s a powerful “snapshot” tool and is built on deep historical data, like a lot of statistical and ML models.

    In contrast, one of the most elegant ideas in data science is the Markov property: the future depends only on the present state. It is a seemingly simple philosophy, but it has profound implications. What a customer is doing “now” is a state rich enough to predict what comes next. This is the world of state-based journeys, where Markov Chains quantify how customers move through media, lead funnels, and product purchases over time.

    Modern GTM strategies require more than a snapshot; they require a map. To optimize lead pipeline, media mix, and sequential selling, we need to quantify the customer journey over time. One of the most effective techniques for this is the Markov Chain. I was first introduced to the power of this method when a member of my team, Yexiazi (Summer) Song, utilized it to decode the complex sequences of B2B hardware and software sales for a global tech corporation.

    GTM Use Cases: Moving Beyond “Last Touch”

    Legacy attribution often fails because it looks at touchpoints in a vacuum. As RevSure AI notes, Markov Chain models offer a “full-funnel, unbiased insight” that standard models miss:

    “In complex B2B funnels, a lot happens between the first touch and the closed deal. Relying on first- or last-touch attribution misses the rich middle… This leaves marketers knowing what happened but not why—or what to do about it.”

    “Markov Chains and Next Best Action,” RevSure AI (2025)

    By using Markov Chains, we move from good business intuition to an evidence-based understanding of the Next Best Action.


    Methodology: The Power of the “Removal Effect”

    At its core, a Markov Chain is a stochastic model, named after the Russian mathematician Andrey Markov, built on the principle that the probability of a system moving to a future state depends solely on the current state, not on the sequence of states that preceded it. In a marketing context, we define our State Space (S) as the various channels (Email, LinkedIn, Direct Sales) plus the terminal states: “Start,” “Conversion,” and “Churn.”

    The real “magic” lies in the Removal Effect. By mathematically “removing” a specific channel from the chain and observing how much the total probability of conversion drops, we can assign a precise weight (or value) to that channel.

    This calculation allows us to see which touchpoints are the “force multipliers” in the funnel and which are merely noise. Ultimately, the Markov Chain is expressed as a transition matrix, where each row holds the probabilities of moving from the current state to each possible next state and sums to one:
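    As a minimal sketch of how such a matrix is estimated (the journey sequences below are hypothetical), we can count transitions between consecutive states and normalize each row so it sums to one:

```python
from collections import defaultdict

# Hypothetical customer journeys as ordered state sequences.
journeys = [
    ["Start", "Email", "LinkedIn", "Conversion"],
    ["Start", "Email", "Churn"],
    ["Start", "LinkedIn", "Email", "Conversion"],
    ["Start", "Direct Sales", "Conversion"],
]

# Count observed transitions between consecutive states.
counts = defaultdict(lambda: defaultdict(int))
for path in journeys:
    for current, nxt in zip(path, path[1:]):
        counts[current][nxt] += 1

# Normalize each row so its probabilities sum to one.
transition_matrix = {
    state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
    for state, nexts in counts.items()
}

print(transition_matrix["Email"])  # each observed next state is 1/3 here
```

    At production scale the same counting logic runs over millions of sessionized event rows, but the row-normalization step is identical.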


    Applying the Probabilistic Lens to the B2B SFDC Funnel

    While we often visualize the B2B funnel as a linear, gravity-fed pipe, the reality within Salesforce is far more probabilistic. In data science, we call this a stochastic process—meaning the path forward isn’t a fixed rail, but a series of possibilities influenced by where the prospect stands today. A prospect doesn’t just slide from Response to Won; they loop back for further discovery, stall in “Nurture” for six months, or skip stages entirely when a champion fast-tracks a deal.

    By modeling each SFDC stage as a “state” in a Markov process, we move from simple counting to advanced Probabilistic Attribution:

    • The Transition Matrix: We quantify the probability of moving from any stage (e.g., MQL) to any other stage (e.g., SAL or Lost). This reveals the true “leakage” points—such as a 60% drop-off between Sales Acceptance and Qualification—that a standard funnel report often masks.
    • The Removal Effect: This is the “killer app” for the B2B marketer. By statistically “removing” a stage from the chain and observing the drop in the final “Won” probability, we can calculate the exact value-add of mid-funnel activities. If removing “Stage 2: Solution Scoping” drops our total win probability by 40%, we have a mathematical mandate to invest in sales engineering and technical content.
    • Weighted Forecasting: Instead of applying a flat historical win rate to an opportunity, a Markov approach calculates the Absorption Probability. This tells us the likelihood that an account, given its current state and historical movement patterns, will eventually terminate in a “Closed/Won” state, providing a far more accurate revenue forecast for the CFO.
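    A small sketch of the last two ideas, using a hypothetical SFDC transition matrix: the absorption ("Won") probability is solved from the standard fundamental-matrix system, and the removal effect of a stage is the relative drop in that probability when all flow into the stage is redirected to "Lost":

```python
import numpy as np

# Hypothetical transition matrix; rows sum to one.
# States: 0=Start, 1=MQL, 2=SAL, 3=Won (absorbing), 4=Lost (absorbing)
P = np.array([
    [0.0, 0.8, 0.0, 0.0, 0.2],   # Start
    [0.0, 0.0, 0.6, 0.1, 0.3],   # MQL
    [0.0, 0.1, 0.0, 0.5, 0.4],   # SAL
    [0.0, 0.0, 0.0, 1.0, 0.0],   # Won
    [0.0, 0.0, 0.0, 0.0, 1.0],   # Lost
])

def win_probability(P, start=0, won=3):
    """Absorption probability of ending in 'Won', starting from 'Start'."""
    transient = [0, 1, 2]
    Q = P[np.ix_(transient, transient)]            # transient-to-transient
    R = P[np.ix_(transient, [won])]                # transient-to-Won
    N = np.linalg.inv(np.eye(len(transient)) - Q)  # fundamental matrix
    return (N @ R)[start, 0]

baseline = win_probability(P)

# Removal effect of "SAL": redirect all SAL-bound flow to "Lost".
P_removed = P.copy()
P_removed[:, 4] += P_removed[:, 2]
P_removed[:, 2] = 0.0

removal_effect = (baseline - win_probability(P_removed)) / baseline
print(f"Baseline win prob: {baseline:.3f}, removal effect of SAL: {removal_effect:.1%}")
```

    With these made-up numbers, removing the SAL stage costs roughly three quarters of the total win probability, which is exactly the kind of mathematical mandate described above.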

    The Data: Scaling to Real-World Complexity

    To demonstrate this, I’m moving away from small transactional samples to a high-resolution eCommerce behavioral dataset consisting of over 285 million user events. A sample is shown below. This clickstream data allows us to model the journey from “View” to “Cart” to “Purchase,” providing the scale necessary to see these transition probabilities in action.

    I am going to use this eCommerce dataset for my primary Markov Chain modeling exercise dealing with product sales, and at the end I am going to examine using Markov Chain for media mix optimization.

    Due to the eCommerce dataset’s size (which tested the limits of my Alienware workstation!), I implemented a chunk-based down-sampling strategy. I processed 100,000-row segments and took a random 1% sample from each to ensure a representative, non-biased subset that wouldn’t crash the Python kernel.
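    The down-sampling strategy can be sketched as follows; the inline CSV here stands in for the real multi-gigabyte file, and the column names are hypothetical (in practice you would pass the filename and a chunksize of 100,000 to pandas.read_csv):

```python
import io
import pandas as pd

# Stand-in for the multi-gigabyte clickstream file (hypothetical columns).
raw_csv = "event_time,event_type,category_code,user_id\n" + "\n".join(
    f"2019-10-{(i % 28) + 1:02d},view,electronics.smartphone,{i}" for i in range(1000)
)

samples = []
# Read fixed-size chunks and keep a random 1% of each chunk.
for chunk in pd.read_csv(io.StringIO(raw_csv), chunksize=100):
    samples.append(chunk.sample(frac=0.01, random_state=42))

events = pd.concat(samples, ignore_index=True)
print(len(events))  # 10 rows: 1% from each of the 10 chunks
```

    Sampling per chunk, rather than loading everything and sampling once, keeps memory flat while still drawing rows evenly from across the whole file.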

    High Level EDA – The Funnel Reality

    Product volume (below) follows a heavily skewed distribution, weighted towards electronics, so we can anticipate that the Markov Chain results will be more reliable in the high-volume categories than in the lower-volume or sparse product categories.

    Looking at the raw data, we can calculate the Transition Probabilities for these top categories, which show the probability that a customer in one product category moves next to another. These are simply an intermediate step in the analytical process, so interpret them with caution; they are not the final Attribution Weights used for budget allocation.

    Before building a Markov Chain, we need to see the States. In this dataset those are view, cart and purchase. I’ve seen many funnels like this in B2B marketing, where a high input (i.e. leads, unique visitors, responders, etc.) results in relatively few sales! That said, we see a 1.8% conversion rate, which is right at the average for consumer electronics but low for appliances, and there is always room for improvement.
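    As an illustration (the event counts below are invented to mirror the roughly 1.8% rate observed), the state counts and view-to-purchase conversion rate fall out of a simple tally:

```python
import pandas as pd

# Hypothetical event counts mirroring a view -> cart -> purchase funnel.
events = pd.Series(["view"] * 9640 + ["cart"] * 183 + ["purchase"] * 177)

counts = events.value_counts()
conversion_rate = counts["purchase"] / counts["view"]
print(f"{conversion_rate:.1%}")  # 1.8% view-to-purchase conversion
```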

    Behavioral Trends: Activity over Time

    Which states are most active, and when should we time interventions? This chart shows that a particular October weekend (Sat/Sun) was strongest in terms of views as well as conversions.


    Results

    After filtering for only the top categories, I ran the Markov Chain and arrived at the following weights. These tell the marketer which product categories lead to a final sale, and they become the weights for funding decisions:

    Cross-Promotion Opportunities

    This is the final step — now that we have the weighting, we can look at key categories based on contribution to the funnel as well as last touch. If it has high weight but low direct sales, it is helping mid-funnel, which would be missed with first- or last-touch attribution:

    As a marketer, I would bundle smartphone promotions with headphones, and follow up every TV, clock and refrigerator sale with a mobile phone offering. Ideally, TVs, clocks, refrigerators and headphones can be moved to the right-hand quadrant to become high-volume, high-impact drivers by recalibrating marketing investment from the low-impact products to the mid-funnel products. That would yield higher conversion rates and revenue.


    Media Mix Optimization

    We can use the same approach to optimize the media mix. Here I took the UCI bank telemarketing campaign dataset from my article The Multi-Channel Force Multiplier: How Bridging Digital Nurture and Direct Outreach Triples Conversion Lift, which initially used association rules.

    Based on the dataset, the attributed conversions for each touchpoint/channel are as follows and these can be converted to weights for budget planning:

    • Cellular: ~3,484 conversions
    • Brand Awareness: ~579 conversions
    • Telemarketing: ~311 conversions
    • Email: ~218 conversions
    • New Product Launch: ~179 conversions

    Total conversions in the dataset: 5,289 (cases where y = 'yes').
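    Converting the attributed conversion counts above into budget-planning weights is a straightforward normalization; a minimal sketch:

```python
# Attributed conversions per channel (counts from the analysis above).
attributed = {
    "Cellular": 3484,
    "Brand Awareness": 579,
    "Telemarketing": 311,
    "Email": 218,
    "New Product Launch": 179,
}

total_attributed = sum(attributed.values())

# Normalize attributed conversions into budget-planning weights.
weights = {channel: n / total_attributed for channel, n in attributed.items()}

for channel, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{channel:20s} {w:.1%}")
```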

    By taking the ratio of the Markov Weight to the Last-Touch Weight, we can find the channels that are most effective in the middle of the media mix in contributing to conversion: Brand Awareness, New Product Launch, and Email (below):


    Summary

    The primary advantage that Markov Chains have over Association Rules Mining is the ability to capture the chronological sequence of events. While Association Rules give us a powerful “snapshot” of what happens together in a single transaction, the Markov Journey requires time-stamped, sessionized data to map how one event leads to the next over time.

    The real payoff is in the Attribution Weights: the model provides a mathematical justification for budget allocation—proving, for instance, exactly why you should put 12% of your budget into TV promotions based on their mid-funnel contribution rather than just their last-click performance. I am still a professional fan of Association Rules for their simplicity and ease of explanation, but the predictive flow of a Markov Chain has revealed strategic capabilities and a level of visibility into the customer journey that a static snapshot simply cannot match.


    Key GTM Takeaways

    • Sequence Matters: Don’t just look at what was bought; look at the path taken to get there.
    • Quantify the “Middle”: Use the Removal Effect to stop guessing which mid-funnel activities actually drive revenue.
    • Probabilistic Forecasting: Move away from flat win-rates and toward absorption probabilities for a more accurate SFDC pipeline.

    References

    • Kakalejčík, L., et al. (2018): Multichannel Marketing Attribution Using Markov Chains. Journal of Applied Management and Investments.
    • Anderl, E., et al. (2014): Mapping the Customer Journey with Markov Chains.
    • RevSure AI (2025): Markov Chains and Next Best Action: The Future of Conversion Optimization.

    Python Libraries & Documentation

    The analysis in this post was conducted using the following open-source tools:

    • Anaconda & Python Jupyter Notebooks: For environment management and interactive development.
    • Pandas & NumPy: For data manipulation and matrix calculations.
    • Seaborn & Matplotlib: For generating heatmaps and scatter plots.
    • Statsmodels: For statistical modeling.
    • Kaleido: For high-resolution funnel visualizations.

  • Recommender Systems: Market Basket Analysis & Next-Likely-Purchase In Cross-Sell.

    Why Recommender Systems Matter

    • Discover Product Affinities: Identify items that frequently sell together to uncover hidden customer behaviors. Use these insights to optimize cross-selling and product bundles that reflect how people actually shop.
    • Boost Average Order Value (AOV): Use “Lift” metrics to place high-affinity products near each other in digital or physical layouts. This turns single-item purchases into multi-item baskets by simplifying the discovery of related goods.
    • Personalize Promotions: Move beyond generic discounts by offering coupons for “consequent” products based on what is already in the cart. This increases conversion rates by making offers feel tailored rather than random.
    • Optimize Inventory & Placement: Predict which products will face increased demand when a “seed” item goes on sale. Use these patterns to inform stocking levels and strategic shelf (or landing page) positioning.
    • Enhance Loyalty: By recommending the “next logical purchase” before the customer realizes they need it, you improve the user experience. This builds long-term retention by positioning your brand as an intuitive partner in their journey.

    Introduction

    In the first article of this series entitled My Favorite Segmentation Scheme (https://mikesdatamarketing.com/2025/12/10/my-favorite-segmentation-scheme/) I identified the Loyal or High-Potential Customer segment and postulated a strategic portfolio management strategy to migrate the HiPo customers to Champions through cross-sell and up-sell. Here is a data visualization of the customer base from that article:

    The question for a quantitative marketer in a company with multiple products is: what is the next likely purchase by these customers?  By identifying the next likely purchase we have the highest likelihood of selling and up-leveling these customers to Champions.

    Aside from an opinion-based approach, or perhaps a long-term strategy to introduce new products, the only scalable quantitative option is to use a recommender system.

    Any company that sells multiple products can benefit from a recommender system for cross-sell.  My expertise is primarily in B2B technology hardware marketing, but many of the techniques originated in B2C marketing.  My personal experience is in building recommenders using Association Rules Mining (also known as Market Basket Analysis) which I was first introduced to by Ling (Xiaoling) Huang and later Fuqiang (Kevin) Shi. 

    However, there are other ways to build recommenders, and each has its own advantages and disadvantages.  Yexiazi (Summer) Song built a recommender using Markov Chain when she was on my team years ago.  More recently, Jidan (Joanna) Duan developed a recommender using Collaborative Filtering with Matrix Factorization to reduce latency.  In my view, any of these are key for website personalization, or telemarketing to a list of high propensity accounts that are not targeted at a single product.

    For my next three articles, I am going to look at each of these three techniques, beginning with Association Rules (since that has been my “go to” technique for many years now and I am most comfortable with it).  During the course of these articles, I will do a capabilities assessment of each and compare it to the other techniques to the extent that the output and metrics are comparable.

    The Goal: To move away from “one-size-fits-all” marketing and toward high-propensity targeting that increases Average Order Value (AOV).


    Data Overview

    I returned to the Online Retail Data Set from the UC Irvine Machine Learning Repository from my article on customer lifetime value entitled The Financial Side of Marketing: Beyond RFM to Predictive CLV (https://mikesdatamarketing.com/2025/12/28/the-financial-side-of-marketing-beyond-rfm-to-predictive-clv/). This is a publicly available real-life dataset used by students for customer analytics, RFM (Recency, Frequency, Monetary) modeling, and market basket analysis. Although the values are consumer products, I have used association rules on tech hardware and software and so the technique is just as applicable to B2B as B2C.

    Dataset Description

    This is a transactional dataset containing all transactions occurring between December 1, 2010, and December 9, 2011, for a UK-based, non-store online retail company (Chen, D., 2012).

    • Business Nature: The company primarily sells unique all-occasion giftware.
    • Customer Base: While many are individuals, a significant portion are wholesalers (which accounts for the extreme outliers in spending and quantity).
    • Scale: 541,909 transactions and 8 attributes.

    Here is a sample from that dataset:

    I performed much of the same data cleaning tasks that I wrote about in the CLV article to prepare the data for modeling and exploratory data analysis. 

    Data Distributions: Outliers and Skewness

    Histograms of the RFM data show that none of the three components are normal (Gaussian) distributions.  We see a high concentration of customers who bought recently and tapered off, a lot of customers who were one-time purchasers, and a right-skew in monetary value due to high spending customers. So this is a great population for increasing purchase frequency through cross-sell using a recommender system!

    Here are the top products:

    Some seasonality around the Holidays, so this would be a good time for implementation:

    We can also look at shopping patterns, which are mid-morning into the afternoon:


    Association Rules

    In his text on marketing data science Thomas Miller (2015) describes Association Rules:

    “Association Rules Mining is another way of building recommender systems. Association rules modeling asks: What goes with what? What products are ordered or purchased together? What activities go together? What website areas are viewed together? A good way of understanding association rules is to consider their application to market basket analysis.”

    Miller (2015)

    Methodology

    I utilized the Apriori algorithm to identify frequent item sets, setting a minimum support threshold of 0.01 to ensure I wasn’t chasing statistical noise. According to Tan et al., the Apriori principle states that “if an itemset is frequent, then all of its subsets must also be frequent… conversely, if an itemset is infrequent, then all of its supersets must be infrequent too.”

    When interpreting the output of the Apriori algorithm, the resulting table provides several key interest measures. While analysis typically centers on the ‘Big Three’—Support, Confidence, and Lift—the specific focus often shifts depending on the project’s objectives. A primary advantage of association rules mining is the flexibility to filter and rank these rules using various combinations of metrics, ensuring the final recommendations align with specific business needs.

    • Antecedents: These are the items already present in the customer’s purchase history.
    • Consequents: This is the item the model proposes as a recommendation.
    • Support: The percentage of all transactions that contain both the antecedent and the consequent. It measures how popular the rule is.
      • Support (A->C) = P(A union C)
    • Confidence: The probability that a customer will buy the consequent given that they have the antecedent. It measures reliability.
      • Confidence (A->C) = Support(A union C)/Support(A)
    • Leverage: This calculates the difference between the observed frequency of A and C appearing together and the frequency that would be expected if they were independent. A leverage of 0 indicates independence.
    • Conviction: This measures the degree of implication. A high conviction value means that the consequent is highly dependent on the antecedent.
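    These measures can be illustrated in a few lines over toy baskets (hypothetical items; in practice, mlxtend’s apriori and association_rules functions compute them across the full rule table):

```python
# Toy baskets (hypothetical) to illustrate the interest measures.
transactions = [
    {"teacup", "cakestand"},
    {"teacup", "cakestand", "t-light holder"},
    {"t-light holder"},
    {"teacup", "t-light holder"},
    {"cakestand"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"teacup"}, {"cakestand"}

sup = support(antecedent | consequent)    # both items together: 2/5
confidence = sup / support(antecedent)    # P(C | A): 2/3
lift = confidence / support(consequent)   # vs. C's base popularity: ~1.11

print(sup, confidence, lift)
```

    A lift above 1.0 here says teacup buyers pick up the cakestand more often than its base popularity alone would predict.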

    Lift is the Superior Ranking Metric

    Support-based pruning (Tan et al., 2005) is widely used.  While Confidence tells us how likely a purchase is, it can be highly misleading. If a “best-seller” (like your White Hanging Heart T-Light Holder) is bought by 50% of all customers, any antecedent will naturally have high confidence in leading to that product simply because it is popular—not because there is a meaningful relationship.

    Lift solves this by accounting for the base popularity of the consequent:

    • Lift (A->C) = [Support(A union C)]/[Support(A) x Support(C)]

    Why I rank by Lift:

    1. Filters out “Noise”: A Lift value of 1.0 means the two items are independent. Ranking by Lift ensures you aren’t just recommending your most popular items to everyone (which provides no strategic value).
    2. Identifies “Niche Bundles”: I’ve used Association Rules for media mix optimization and this was particularly important when a marketing group ran a lot of tactical programmatic programs but very few truly integrated campaigns (therefore low support) that nonetheless had high lift.  With regards to this open-sourced dataset, a high Lift (e.g., > 3.0) indicates a strong, specific relationship. For example, if the Regency Teacup leads to the Regency Cakestand with a high Lift, it proves that the purchase isn’t random—it’s a deliberate “Tea Party” bundle.

    Actionable Insight

    For business professionals, Lift represents the incremental gain. It tells the marketing team exactly which products, when placed together or promoted via email, telemarketing or on websites, will change customer behavior rather than just reflecting existing trends.

    So, now we have the top selling market baskets:

    Business Rules to Increase Precision

    A primary limitation of recommendation engines is their reliance on historical data, which restricts suggestions to items and behaviors already present in the dataset. To address this, business logic overlays or mapping tables can be introduced. These allow for the dynamic replacement of ‘End-of-Support’ items with newer alternatives, or the substitution of low-margin products with higher-margin equivalents. Furthermore, if the customer base is non-homogeneous—varying by geography, firmographics, or demographics—the data can be segmented into distinct subsets, allowing for specialized models tailored to each specific sub-segment.
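    A minimal sketch of such an overlay, with a hypothetical mapping table applied after the recommender runs:

```python
# Hypothetical mapping table: End-of-Support or low-margin consequents
# are swapped for strategic replacements before recommendations ship.
replacement_map = {
    "Legacy Server X100": "Server X200",           # End-of-Support swap
    "Basic Support Plan": "Premium Support Plan",  # margin upgrade
}

def apply_business_rules(recommendations):
    """Post-process model output with the business-logic overlay."""
    return [replacement_map.get(item, item) for item in recommendations]

raw = ["Legacy Server X100", "Rack Kit", "Basic Support Plan"]
print(apply_business_rules(raw))
# ['Server X200', 'Rack Kit', 'Premium Support Plan']
```

    Because the overlay sits outside the model, the mapping table can be maintained by product marketing without retraining anything.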


    Summary

    The “Tea Party” and “Craft Kit” bundles identified above aren’t just interesting coincidences—they are actionable revenue drivers. By using Association Rules, we move beyond simply knowing what our best sellers are to understanding the context of the purchase.

    Key Takeaways for this Method

    • Cold-Start Efficiency: Unlike other models, Association Rules don’t require a deep customer history. They can serve new customers because they are based on item and product co-occurrence within transactions rather than on an individual’s purchase history.
    • Bundled-Pricing: Sales and Marketing can bundle, for example, hardware and software packages to increase sales revenue.
    • Targeting: Given the Apriori output, if a marketer knows that a customer has purchased products A and B, and a high-lift rule is (A, B) -> C, then a telemarketing campaign can focus on selling product C.
    • Incremental Lift: By ranking recommendations by Lift rather than just popularity, we ensure our marketing efforts are driving new behaviors (i.e. selling high margin or newer products) rather than just suggesting items the customer likely would have found on their own.

    While Association Rules are a powerful starting point for cross-sell, they treat every transaction as a “snapshot” in time. They don’t account for the order in which a customer explores a site or how their interests evolve over a single session.

    In a subsequent article, I will explore Markov Chains, a technique that looks at the path a customer takes, allowing us to predict the next step in their journey based on their most recent move. 


    References: Methodology & Python Packages

    Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), 487-499.

    Chen, D. (2012). Online Retail Data Set. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Online+Retail

    Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Raschka, S. (2018). MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. Journal of Open Source Software, 3(24), 638. https://doi.org/10.21105/joss.00638

    Tan, P. N., Steinbach, M., & Kumar, V. (2018). Introduction to Data Mining (2nd ed.). Pearson Education.

    Waskom, M. L. (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021


  • Analytics Shoot-Out: Mike vs. Agentic AI

    Why Human-in-the-Loop Agentic AI Matters

    • Human vs. AI Strengths: While AI excels at scale and creative iteration, human analysts remain superior in governance and context. Recognize that AI can generate complex playbooks in seconds but still requires human oversight to avoid historical or logical errors and provide business rules that are not incorporated in the data.
    • Scale Your Execution: Use Agentic AI to automate the “last mile” of marketing—such as generating personalized subject lines and GTM strategies—tasks that are often too time-consuming for a single analyst to perform manually.
    • The Importance of Governance: AI can hallucinate or apply modern trends to old data (e.g., suggesting TikTok for a 2008 dataset). Always maintain a Human-in-the-Loop to interpret model outputs and ensure they align with business reality.
    • Precision and Speed: The ultimate competitive advantage isn’t choosing between humans or AI, but fusing a predictive foundation (like XGBoost) with an agentic execution layer (like Gemini) to achieve both accuracy and velocity.

    Introduction

    I like to think of myself as a capable analyst. I’ve been analyzing modeled output and large datasets for many years now, and my favorite part of modeling is interpreting results and telling a story, so I was naturally skeptical that an AI Agent could outperform me when it came to interpreting marketing data.

    To test this, I staged a “shoot-out.” I built an AI Agent using the Gemini 2.0 Flash LLM, integrated my existing XGBoost propensity model code, and compared its analysis of the Bank Marketing Dataset to my own findings.

    I was in for a surprise.

    It is hard to compete with an always-on AI agent in terms of speed and scale. While a human analyst might identify top leads, an AI Agent has the capability to analyze each lead individually and generate marketing recommendations tailored to their specific demographics in seconds. This level of micro-segmentation at scale is simply not practical for a single analyst.

    However, my experiment also proved that there are still massive advantages to having a “Human-In-The-Loop.”


    The Foundation: The Propensity-to-Buy Workhorse

    In a previous article, I built an ML classifier using the Bank Marketing Dataset from the UC Irvine repository (representing a real-world Portuguese bank campaign from 2008–2010). My initial human interpretation focused on the forensic view. Here is a sample of the data:

    The model is overconfident and under-performs at the lower probabilities. That said, accuracy increases as we move toward the better leads, so I am taking the top 500 leads to examine:
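    Selecting the top leads from scored output is a one-liner in pandas; a sketch with hypothetical lead IDs and synthetic propensity scores:

```python
import pandas as pd

# Hypothetical scored leads: model-predicted purchase probabilities.
scored = pd.DataFrame({
    "lead_id": range(1, 2001),
    "propensity": [((i * 37) % 1000) / 1000 for i in range(2000)],
})

# Keep only the highest-propensity leads, where calibration is strongest.
top_leads = scored.nlargest(500, "propensity").reset_index(drop=True)

print(len(top_leads), top_leads["propensity"].min())
```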

    In my initial article on propensity-to-buy, I had four key findings based on the most influential model features:

    • Primary Driver: Communication Method (Cellular). This was the strongest predictor of a purchase.
    • The “Momentum” Effect: Success in previous campaigns (poutcome_success) was a massive indicator of future conversion, validating the importance of RFM (Recency/Frequency/Monetary) models.
    • Financial Stability: Customers without housing loans and higher bank balances correlated positively with conversion.
    • Timing: Outreach in specific months (June/March) influenced the model, suggesting seasonal adjustments for telemarketing programs.

    The Challenger: The AI Agent

    I wanted to see if an AI Agent could run my model and interpret the findings in a way that was actually actionable for a sales team. I provided the Agent with these System Instructions:

    “You are a Senior Strategic Marketing Agent. Analyze these top leads, provide personalized pitch strategies, and reference specific rows to justify your advice.”

    First, the Agent provided a profile of the Top 500 Leads, identifying that while the demographics varied, the common thread was the high propensity score and the success of previous outcomes.

    Note: As shown in the chart above, there isn’t a direct correlation between account balance and propensity—intent is far more complex than just a bank balance.
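    A sketch of how the System Instructions and the top-lead rows might be assembled into a single prompt for the LLM (the lead fields shown are hypothetical, and the actual Gemini API call is omitted):

```python
SYSTEM_INSTRUCTIONS = (
    "You are a Senior Strategic Marketing Agent. Analyze these top leads, "
    "provide personalized pitch strategies, and reference specific rows "
    "to justify your advice."
)

# Hypothetical top-lead rows drawn from the scored dataset.
top_leads = [
    {"row": 1, "age": 35, "job": "management", "contact": "cellular", "propensity": 0.97},
    {"row": 2, "age": 41, "job": "technician", "contact": "cellular", "propensity": 0.96},
    {"row": 3, "age": 21, "job": "student", "contact": "cellular", "propensity": 0.95},
]

def build_prompt(leads):
    """Flatten lead rows into the message sent alongside the system prompt."""
    rows = "\n".join(
        f"Row {l['row']}: age={l['age']}, job={l['job']}, "
        f"contact={l['contact']}, propensity={l['propensity']:.2f}"
        for l in leads
    )
    return f"{SYSTEM_INSTRUCTIONS}\n\nTop leads:\n{rows}"

prompt = build_prompt(top_leads)
print(prompt)
```

    Keeping prompt assembly in plain code like this also makes the Human-in-the-Loop review easier, since exactly what the Agent saw is reproducible.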

    Micro-segmentation and The Agent’s Strategic Brief (Verbatim Output)

    Lead Profile Summary: The dataset contains a variety of demographics. Most leads own houses, don’t have loans, and prefer cellular contact. “Success” in previous campaigns is a dominant indicator.

    Strategy 1: Young to Middle-Aged Professionals (Leads 1 & 2)

    • Focus: Career advancement and financial stability. Highlight long-term planning.
    • Subject Line: “[Name], Unlock Your Management Potential with [Product/Service]”

    Strategy 2: Students (Lead 3)

    • Focus: Education and skill-building. Promote affordable solutions.
    • Subject Line: “[Name], Ace Your Studies and Future Career with [Product/Service]”

    The Tree Map Reveal

    The Agent then generated a tree map to visualize these segments. It quickly identified that Management and Administrative job titles weren’t just numerous; they represented a massive volume of the bank’s total account balance opportunity.


    Agent Detailed Data Output And Top Lead Strategies for Hyper-Targeting

    Once segmentation is developed, the agent shines at hyper-targeting: activating specific customers or micro-segments with tailored recommendations and content.

    The dataset contains a variety of individuals with varying demographics, including:

    • Age: Wide range, from teenagers to retirees.
    • Job: Diverse occupations, including management, students, self-employed, technicians, etc.
    • Marital Status: Mostly married or single.
    • Education: Secondary and Tertiary education are common, with a few having Primary.
    • Balance: Varies significantly.
    • Housing/Loan: Mostly own houses and don’t have loans.
    • Contact: Cellular is the primary mode of contact.
    • Poutcome: “Success” in previous campaigns is a strong indicator.
    • Propensity Score: All leads have very high propensity scores, indicating a high likelihood of conversion.

    1. Strategy for Young to Middle-Aged Professionals (e.g., Leads 1 & 2):

    • Focus: Career advancement and financial stability. Highlight products or services that aid in professional development, investment opportunities, or long-term financial planning.
    • Messaging: Emphasize efficiency, time-saving benefits, and the potential for increased income or wealth.
    • Channel: Utilize digital channels such as email, LinkedIn, and targeted online advertising.
    • Personalized Email Subject Line (Lead 1): “[Name], Unlock Your Management Potential with [Product/Service]”
    • Personalized Email Subject Line (Lead 2): “[Name], Enhance Your Financial Security with [Product/Service]”

    2. Strategy for Students (e.g., Lead 3):

    • Focus: Education, skill-building, and prospects. Promote products/services that enhance their learning experience, provide career guidance, or offer affordable financial solutions tailored to students.
    • Messaging: Emphasize affordability, accessibility, and the potential for boosting their resume and job prospects.
    • Channel: Utilize social media platforms popular among students (e.g., Instagram, TikTok), student-focused websites, and university partnerships.
    • Personalized Email Subject Line (Lead 3): “[Name], Ace Your Studies and Future Career with [Product/Service]”

    AI Agent Important Considerations

    1. “Success” Poutcome: Since the “poutcome” is “success” for a prior campaign for all of these individuals, emphasize the proven track record and build on their positive past experiences.
    2. A/B Testing: Continuously test different messaging and channels to optimize results within each segment.
    3. Respect Privacy: Always adhere to privacy regulations and provide an easy way for recipients to opt-out of communications.

    The Verdict: It’s a Draw (and a Win) – Human in the Loop is best!

    • The Edge: I still hold the advantage in Context, Storytelling, and Governance. When the Agent suggested TikTok to reach students using a 2008 dataset, it missed the temporal reality—TikTok didn’t exist then. Human expertise ensures data is interpreted within its historical and social truth.
    • The Scale: The AI Agent wins on Creative Iteration. It took my complex XGBoost output and, in seconds, generated three distinct GTM playbooks and 500 personalized subject lines—tasks that would take a marketing team days. For micro-segmentation at scale, the Agent is the clear victor.

    The Real Winner: It’s not Human vs. AI; it’s the organization that fuses a human-tuned predictive foundation with an Agentic Execution Layer. By moving from manual interpretation to automated hyper-targeting, we finally achieve the precision of a statistician with the instantaneous speed of AI.


    References & Further Reading

    • Moro, S., Cortez, P., & Rita, P. (2014). A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31. [The definitive study for the Bank Marketing Dataset].
    • UCI Machine Learning Repository. Bank Marketing Data Set. Available at: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
    • Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [The foundational paper for the XGBoost algorithm].
    • Google DeepMind. (2023). Gemini: A Family of Highly Capable Multimodal Models. Google Technical Report. [Context for the Agentic LLM used in the shoot-out].
    • Fader, P. S., Hardie, B. G., & Lee, K. L. (2005). RFM and CLV: Using Iso-Value Curves for Customer Base Analysis. Journal of Marketing Research. [Supporting the Recency, Frequency, and Monetary value methodology].
    • American Statistical Association (ASA). Ethical Guidelines for Statistical Practice. [Reference for the “Human-in-the-Loop” governance and ethical AI interpretation mentioned in the verdict].

  • The Hybrid Forecast: Integrating Field Sales “Expert Opinion” with Deep Learning Ensembles

    Why Statistical and ML Forecasting Matters

    • Blend Intuition with Math: Traditional “black box” models often fail because they ignore human context. A hybrid approach integrates sales pipeline data (human intuition) with machine learning to create a forecast that leadership can actually trust.
    • Solve the Data Integrity Gap: Sparse or low-quality data doesn’t have to break your model. By focusing on outlier management and data engineering, you can transform poor quality CRM records into a reliable foundation for prediction.
    • The Champion-Challenger Method: Don’t rely on a single algorithm. Use a “shootout” between multiple methodologies to identify which model performs best for your specific business cycle and customer behavior.
    • Bridge the Trust Barrier: The primary obstacle to ML adoption is lack of visibility. Hybrid architectures provide the transparency stakeholders need by showing exactly how human inputs and statistical trends combine to drive the final number.

    Introduction

    I am always surprised when I join a company to find that the GTM and Finance functions are still relying solely on Excel spreadsheets and field sales “expert opinion” (the Delphi Method) for forecasting. This persists despite the wealth of statistical, machine-learning, and deep-learning methods available today. Often, this reliance stems from a lack of trust in “black box” technologies; if leadership doesn’t understand the path a model takes from Point A to Point B, their hesitation is understandable—especially when financial performance and board-level predictions are on the line.

    However, transitioning to modern forecasting doesn’t require abandoning qualitative insights. In fact, expert human opinion can be directly integrated into ML and statistical frameworks to improve accuracy (e.g., using field sales input to refine deal-size estimates within CRM data like Salesforce). In this article, I examine three techniques my teams and I have used ranging from traditional modeling to Deep Learning.

    On a related note, as modelers, it can be tempting to pre-select a favorite technique and assume it is the best fit for the problem. However, I was trained to always evaluate at least three distinct approaches to identify which produces the highest predictive accuracy with the lowest error.

    In this pursuit, statisticians often follow Occam’s Razor (the principle of parsimony): the philosophical rule that when competing models explain data equally well, the simplest model should be preferred. In this case, that would be a relatively simple univariate time series. However, when encountering significant unexplained variance, additional predictors—exogenous variables—are required. While a model like SARIMAX can outperform a standard ARIMA by accounting for these external factors, every additional variable risks increasing model complexity and error if not managed correctly. Similarly, while neural networks rule the world of Big Data, they can significantly underperform if the dataset is too small.


    The Models

    Rather than choosing a winner in advance, this three-model shoot-out identifies the best-performing prediction approach for my selected business case.

    1. SARIMA: The Statistical Foundation

    This is the traditional statistical method. It looks strictly at the history of a single variable (e.g., Won Revenue) for prediction. It decomposes data into its past values (Autoregressive), its past errors (Moving Average), and its repeating cycles (Seasonality).

    2. SARIMAX: The Context-Aware Statistical Model

    The “X” stands for eXogenous variables. Building on SARIMA with additional features to explain random variation, SARIMAX looks at the calendar plus external factors, like the sales account manager’s Forecasted Revenue, marketing spend, or economic indicators. It provides the power of time series + linear regression.

    3. LSTM: The Deep Learning Memory Model

    As François Chollet explains in Deep Learning with Python, LSTMs are a specialized type of Recurrent Neural Network (RNN). While traditional models may forget the beginning of a sequence by the time they reach the end, LSTMs have a carry (Cell State)—a way to keep track of long-term dependencies. Unlike SARIMA, which uses fixed formulas, the LSTM creates its own features through layers of neurons. It is Long Short-Term because it decides what to remember (Long-term) and what to throw away (Short-term).


    The Dataset and Exploratory Data Analysis (EDA)

    Sparse Data Cannot Be Used For Forecasting

    My first attempt to do a forecasting technique shoot-out leveraged an SFDC/CRM-style dataset (Source: Chioma Iwuchukwu – Sales Funnel Revenue Forecast). The raw data captured roughly 100 days of activity starting January 1st. While this seed data provided a realistic snapshot of B2B deal flow, it presented a cold-start problem: with only ~18 actual “Won Revenue” events, the data was too sparse for deep learning models to distinguish between a recurring pattern and random luck.

    Sometimes the data is so bad that it cannot be used for forecasting, and despite my best efforts this was as close as I could get with an n=18 ultra-sparse dataset that looked like this:

    As a result, my forecasts were way off the mark:

    In its original state, the data provided a snapshot of performance but was limited in volume, making it difficult for deep learning models like LSTMs to generalize without overfitting. A review of the summary statistics (df.describe()) reveals a high standard deviation (275% of the mean for Won Revenue) and an enormous range (0 to $46K for Won/Loss) across all key variables. In a B2B context, this indicates a high-variance environment where individual large deals can significantly swing daily totals. Another indicator was the fact that the mean for Won Revenue was $3,986 while the median was $0.

    The Augmented Dataset (n=200): Scaling for Deep Learning

    To compare techniques fairly, I performed a significant clean-up and rebuild of the n=100 dataset, using synthetic data augmentation to expand it to 200 daily observations that preserved the underlying structure, weekly cadence, and growth of the original data while removing its extreme volatility. This was a deliberate data engineering step to create a robust training environment for the LSTM Neural Network. The augmentation was calibrated to mirror the original B2B cycle mathematically:

    • Deterministic Trend: I preserved the original growth trajectory, ensuring the models evaluate a business that is scaling.
    • Seasonal Harmonic: Using a sinusoidal function, I reinforced the 7-day weekly cycle. This captures the essential B2B weekend dip and mid-week peak patterns (shown below).
    • Stochastic Noise: I injected Gaussian Noise (random variance) to simulate market volatility. This forces the LSTM to learn signal over noise—distinguishing between a structural shift and daily chatter.
    • Lookback Depth: Doubling the volume allowed for a 14-day sliding window. This gives the neural network enough temporal depth to learn from two full business cycles before making its next prediction.
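The four-part recipe above can be sketched directly in NumPy. All constants (base level, slope, amplitudes, noise scale) are illustrative assumptions standing in for the calibrated values.

```python
# Hedged sketch of the augmentation recipe: deterministic trend, a 7-day
# sinusoidal harmonic, Gaussian noise, and a 14-day sliding window for the LSTM.
# All constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
n = 200
t = np.arange(n)

trend = 5000 + 15 * t                        # deterministic growth trajectory
weekly = 1200 * np.sin(2 * np.pi * t / 7)    # seasonal harmonic (weekend dip, mid-week peak)
noise = rng.normal(0, 400, n)                # stochastic Gaussian noise
won_revenue = np.clip(trend + weekly + noise, 0, None)   # revenue cannot go negative

# 14-day lookback windows: each sample covers two full weekly cycles,
# and the target is the next day's revenue.
lookback = 14
X = np.stack([won_revenue[i:i + lookback] for i in range(n - lookback)])
y = won_revenue[lookback:]
print(X.shape, y.shape)  # (186, 14) (186,)
```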

    Analysis of Variability and Decomposition

    For the new dataset, things look much more promising. I removed the extreme observations (outliers) because the three $45,000 observations alone were adding $1,350 to the daily average. The standard deviation is now about 19% of the mean because the maximum is much lower, so the min-to-max range is far tighter. Another check showed that the median was $5,236, very close to the $5,249 mean, indicating good symmetry and low skewness. The dataset is ready to use.

    Here are some bar charts to show the daily trends, which are not highly variable but do show the weekend sales dip:

    Typically, Time Series Decomposition (as shown in the chart below) allows us to strip away the noise to see the underlying mechanics:

    1. Trend: A clear, 15% positive slope indicating long-term growth.
    2. Seasonality: A heavy daily/weekly influence.
    3. Residuals: A significant amount of randomness.
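The decomposition chart referenced above comes from statsmodels' seasonal decomposition; the same additive logic (trend + weekly seasonality + residual) can be sketched by hand on a synthetic series to show where the residual noise comes from. The series and constants are illustrative.

```python
# Hedged sketch of additive time-series decomposition on a synthetic series;
# statsmodels' seasonal_decompose automates this.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(200)
s = pd.Series(5000 + 15 * t + 1200 * np.sin(2 * np.pi * t / 7)
              + rng.normal(0, 400, 200),
              index=pd.date_range("2024-01-01", periods=200, freq="D"))

trend = s.rolling(window=7, center=True).mean()                    # smooths out the weekly cycle
detrended = s - trend
seasonal = detrended.groupby(s.index.dayofweek).transform("mean")  # mean effect per weekday
residual = s - trend - seasonal                                    # unexplained noise

# High residual variance is the signal that exogenous predictors (SARIMAX) are needed.
print(f"residual std ~ {residual.std():.0f} vs. raw std ~ {s.std():.0f}")
```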

    Because of this high residual noise, a pure time series model like ARIMA (which only looks at past values) is likely to be insufficient. This is why I have utilized SARIMAX to incorporate Forecasted Revenue (the field sales pipeline, which is opinion based and gives us a blend of business intuition and machine learning) as an exogenous variable and an LSTM to capture non-linear relationships that traditional statistics might miss.

    By increasing the density of the dataset, I have smoothed the influence of extreme outliers, ensuring that the resulting forecast is a reflection of systemic performance rather than a reaction to a few bad days.


    Forecasting Model Results

    The logic is driven by the test_size = 21 variable. In time-series forecasting, we typically hold out a test set to validate how well the models perform against real data.

    • Total Dataset (n=200): Since the frequency is daily (freq=’D’) the dataset covers approximately 6.5 months (from January 1st to mid-July).
    • Training Period: The first 179 days.
    • Forecast (Test) Period: The final 21 days.
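The split and scoring can be sketched as follows. The naive seasonal baseline here is an illustrative stand-in for the SARIMA/SARIMAX/LSTM outputs, and the error measures (RMSE, MAPE) are typical accuracy metrics for this kind of shoot-out.

```python
# Hedged sketch of the 179/21 holdout split and forecast accuracy scoring.
# The synthetic series and naive baseline are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
actual = 5000 + 15 * t + 1200 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 400, 200)

test_size = 21
train, test = actual[:-test_size], actual[-test_size:]   # 179 training days, 21 test days

# Naive seasonal baseline: repeat the last observed week across the 3-week test window.
forecast = np.tile(train[-7:], 3)

rmse = np.sqrt(np.mean((test - forecast) ** 2))          # penalizes large misses
mape = np.mean(np.abs((test - forecast) / test)) * 100   # scale-free percentage error
print(f"RMSE={rmse:,.0f}  MAPE={mape:.1f}%")
```

Each candidate model is scored on the same 21 held-out days, so the comparison is apples-to-apples.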

    Based on the measures of predictive accuracy (below), SARIMAX is the winner, followed closely by LSTM. This is not surprising: we saw the randomness (exogenous factors beyond trend and seasonality) that was driving sales, and these two techniques can capture some of that noise, especially because they incorporate human opinion (similar to the Delphi Method) by adding the sales pipeline data into the model. Sales account managers know things that are outside the model’s sight: a key contact who just quit (or joined) a company, competitor pricing and promotions, new product launch timing, and so on. The forecast now includes this information.

    The fact that the SARIMAX model so clearly outperformed the SARIMA model shows that, in this case at least, human opinion can add a great deal to machine learning models; for this dataset (developed using past historical forecast and actual data), the two are highly correlated:

    So, humans still have a place in the world of AI!


    Citations

    • Chollet, F. (2021). Deep Learning with Python (2nd ed.). Manning Publications. (For logic regarding LSTM architecture and temporal representations).
    • Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering. (For data visualization and time-series plotting).
    • Harris, C. R., et al. (2020). Array programming with NumPy. Nature. (For synthetic data generation and mathematical arrays).
    • Iwuchukwu, C. (2024). Sales Funnel Revenue Forecast [Dataset]. GitHub/Personal Collection. Augmented and expanded to n=200 using Python synthetic generation techniques (2024).
    • Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education. (For the application of SARIMA/SARIMAX in a business revenue context).
    • Pandas Development Team. (2023). pandas-dev/pandas: Pandas 2.0.0. Zenodo. (For data manipulation and time-series resampling).
    • Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and Statistical Modeling with Python. Proceedings of the 9th Python in Science Conference. (For SARIMA and Seasonal Decomposition implementation).
    • Synthetic Revenue Dataset. (2024). Enlarged B2B Sales Funnel Forecast Data [Generated Dataset]. (Derived from original Sales Funnel Revenue patterns using Python-based augmentation).



  • The Multi-Channel Force Multiplier: How Bridging Digital Nurture and Direct Outreach Triples Conversion Lift.

    Why Marketing Mix Optimization Matters

    • Discover Channel Synergy: Move beyond single-channel metrics to media mix Lift measurement. Use Association Rules to find specific channel combinations—like Brand Awareness and Email—that work together to increase conversion rates.
    • Identify High-Impact Sequences: Design effective integrated campaigns. Create successful product bundles and promotions, (such as combining Retargeting with Discount Offers). These synergies act as force multipliers, driving more value than each channel could achieve in isolation.
    • Balance Scale and Predictability: Use Support to find marketing mixes that reach a large audience, and Confidence to identify those that offer the most reliable path to a sale. This ensures your strategy is both broad and dependable.
    • Model the Hybrid Journey: Don’t treat digital and high-touch channels as separate silos. By merging disparate datasets, you can visualize a unified customer journey that reflects how people truly interact with your brand across every platform.

    Introduction

    The Problem: Beyond the “3+ Rule”: It is widely accepted that a synergistic media mix will always outperform a single media vehicle. Historically, the industry adhered to the “3+ rule”—popularized in the 1970s—which suggested that three exposures to a message were required to influence a purchase. In the digital age, however, that threshold has risen to a frequency of 7 or more. While media and advertising agencies have used techniques like linear programming and mainframe-based syndicated survey data since the 1980s to optimize these mixes, modern integrated marketing campaigns require a more sophisticated touch.

    The Implementation Gap: Throughout my career, my data science teams and I have built media mix optimization models for numerous B2B companies. While these models are often adopted in principle at the executive level, they frequently prove too complex for practical implementation. Often, marketing groups remain so tactically focused that executing a systematically integrated campaign feels unfeasible, rendering the optimization an “academic exercise.” Although modern vendors provide multi-touch attribution (MTA) methods to track touches against opportunities and allocate budgets, the real value lies in using this data as a foundation for deeper optimization work.

    A Strategic Priority for the CMO: In practice, CMOs—such as Todd Forsythe and Jonathan Martin—leverage these models to calibrate marketing budgets and enhance overall effectiveness. Building a media mix model is typically the first task I undertake when launching a new data science practice. It is a baseline expectation that a CMO understands the optimal mix for generating pipeline and revenue and allocates their budget accordingly.


    Innovation through Association Rules

    My approach to media mix modeling has evolved toward leveraging association analysis. The idea originated from Ling (Xiaoling) Huang, and I refined the methodology through collaborations with Yexiazi (Summer) Song and, most recently, Fuqiang Shi, to develop models for diverse business units.

    According to Miller (2015), this technique is commonly referred to as Market Basket Analysis, a concept born in retail:

    “Market basket analysis, (also called affinity or association analysis) asks, what goes with what? What products are ordered or purchased together?”

    Miller (2015)

    By applying this retail-focused logic to media, we can uncover the hidden relationships between marketing channels. As Zhao et al. (2019) noted:

    “The challenge of multi-channel attribution lies not just in identifying the final touchpoint, but in uncovering the hidden synergies where the presence of one marketing stimulus significantly amplifies the effectiveness of another.”

    Zhao et al. (2019)

    The Data: Constructing a Hybrid Marketing Funnel

    To demonstrate this methodology, I synthesized a consolidated dataset of over 45,000 interactions by merging the UCI Bank Marketing Dataset with the Kaggle/Criteo Multi-Touch Attribution (MTA) Dataset. While these sources represent different industries—fixed-term bank deposits and e-commerce—their combination provides a comprehensive view of the modern buyer’s journey, blending high-frequency digital touchpoints with high-touch personal outreach.

    Dataset Profiles

    1. UCI Bank Marketing Dataset

    This dataset is famous for being highly imbalanced, which is a “real-world” marketing scenario.

    • Non-Converters: About 88% of the rows. Customers who were called but did not subscribe to the term deposit.
    • Converters: About 12% of the rows.
    • Why it matters: Because most people don’t convert, when the model finds a channel like Mobile Outreach that has a high Lift, it’s statistically meaningful.

    2. Kaggle/Criteo Attribution Dataset

    This dataset is designed for Multi-Touch Attribution (MTA).

    • Non-Converters: These are “Journeys” where a user clicked on ads (Email, Display, etc.) without converting to a sale.
    • Converters: Customer journeys that resulted in sales.
    • Why it matters: this helps improve marketing effectiveness and efficiency by identifying waste (channels that people click on but never lead to a sale).

    Data Preprocessing

    Raw data labels were standardized into a unified “Media Mix” master dataframe to ensure consistency across sources:

    • (Kaggle MTA) Email Nurture: Represents the high-volume digital baseline.
    • Cellular (UCI Bank Mkt.) Mobile Outreach: Represents direct personal contact via mobile.
    • Telephone (UCI Bank Mkt.) Telemarketing: Represents landline-based outreach.

    The preprocessing phase involved concatenating the sources, indexing, removing non-essential characters, and handling missing values. I then formatted the dates and standardized the channel categories to enable a seamless cross-platform analysis.
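A minimal pandas sketch of that standardization step, using toy rows and the label mapping listed above (the column names and toy values are assumptions):

```python
# Hedged sketch: concatenate the two sources into one "Media Mix" dataframe,
# standardize channel labels, and handle missing values. Toy data only.
import pandas as pd

uci = pd.DataFrame({"contact": ["cellular", "telephone", None],
                    "converted": [1, 0, 0]})
mta = pd.DataFrame({"channel": ["email", "email"], "converted": [1, 0]})

# Unified label mapping across sources (mirrors the list above).
label_map = {"cellular": "Mobile Outreach", "telephone": "Telemarketing",
             "email": "Email Nurture"}

uci = uci.rename(columns={"contact": "channel"})
media_mix = pd.concat([uci, mta], ignore_index=True)
media_mix["channel"] = media_mix["channel"].map(label_map).fillna("Unknown")

print(media_mix["channel"].tolist())
# → ['Mobile Outreach', 'Telemarketing', 'Unknown', 'Email Nurture', 'Email Nurture']
```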

    The Data: A Hybrid Funnel Approach

    By merging these records, we are modeling a hybrid B2C customer journey. This reflects the reality of high-value industries like financial services, where a customer might be prompted by a high-volume digital ad (Criteo) but requires a personal, high-touch mobile conversation (UCI) to finalize a complex transaction.

    This balanced view allows the association rules to uncover synergies across the entire funnel, rather than looking at digital or offline channels in isolation.


    Apriori Methodology

    Using the Apriori Algorithm (Agrawal et al., 1996; Raschka, 2018), I processed over 45,000 interactions to identify Synergy Lift, which is one of several measures of impact. According to Miller (2015), “an association rule is a division of each item set into two subsets with one subset, the antecedent, thought of as preceding the other subset, the consequent. The Apriori algorithm … deals with the large numbers of rules problem by using selection criteria that reflect the potential utility of association rules.” Essentially, the rule is Antecedent (marketing mix) → Consequent (conversion). Here are the key measures:

    • Support (scale): how often this mix occurs.
      • Support = occurrences of the rule (mix + conversion) / total customer base
    • Confidence (predictability): how often a customer converts when exposed to this mix.
      • Confidence = conversions for the mix / occurrences of the mix
    • Lift (synergy): how well this mix performs vs. the average.
      • Lift = Confidence / Support of the consequent (i.e., the average conversion rate)
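The three measures can be computed by hand for a single rule on a toy set of customer journeys; mlxtend's apriori/association_rules automates this at scale. The journeys and channel names below are invented for illustration.

```python
# Hedged sketch: Support, Confidence, and Lift by hand for one association rule
# on toy customer journeys (mlxtend automates this at scale).
transactions = [
    {"Email", "Retargeting", "Converted"},
    {"Email", "Retargeting", "Converted"},
    {"Email"},
    {"Telemarketing", "Converted"},
    {"Email", "Retargeting"},
]
n = len(transactions)

antecedent = {"Email", "Retargeting"}   # the marketing mix
consequent = {"Converted"}              # the conversion event

both = sum(1 for t in transactions if antecedent | consequent <= t)
ante = sum(1 for t in transactions if antecedent <= t)
cons = sum(1 for t in transactions if consequent <= t)

support = both / n                  # scale: how often mix + conversion occurs
confidence = both / ante            # predictability: conversion rate given the mix
lift = confidence / (cons / n)      # synergy: mix conversion rate vs. the average

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# → support=0.40 confidence=0.67 lift=1.11
```

A Lift above 1 (here 1.11) means the Email + Retargeting pair converts above the baseline rate, which is exactly the "force multiplier" signal the model output section looks for.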

    Model Output

    Once I ran the Apriori algorithm and selected the top rules by highest lift (mix performance vs. the average), it was clear that the volume of marketing interaction was not aligned to conversion potential. The chart below must be interpreted with caution: it illustrates the danger of looking at any channel in isolation, because Email looks high-volume/low-lift on its own, yet it often appears in high-potential integrated combinations.

    Lift (>1) is the force multiplier, so that is how I evaluated the combinations for effectiveness in converting customers. For example, many combinations without mobile, direct mail, email, and telemarketing had high impact. Activities on the right-hand side of the arrows (>>) occur at the same time as the customer conversion (Converted), and so should be considered part of the overall mix.

    New product launches, combined with brand awareness, email, and retargeting had the most impact, followed by a similar mix that replaced launches with discounting; both are time-sensitive calls to action, so this makes sense. Activities on the left-hand side of the arrows typically represent the ‘nurture’ phase, while the right-hand side is the conversion event.

    This visual shows the top media mix combinations and their performance relative to the baseline.  So, for a financial services marketer this would be the roadmap for funding and executing integrated marketing campaigns:

    If we want to look at combinations to check a particular pair of marketing channels, or create a particular tactic, a correlogram like the one below shows the pairs with the most lift.


    Optimizing Marketing Budgets: A Data-Driven Approach

    From a funding perspective, analyzing our 55,211-record blended dataset through Ridge regression allows us to move beyond raw interaction volume to true contribution. By generating and normalizing beta coefficients, we can isolate the unique impact of each channel on the final conversion event, providing a mathematical foundation for marketing spend allocation.
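The Ridge step can be sketched with the closed-form estimator, which is what scikit-learn's Ridge fits when no intercept is used. The channel names, toy exposure data, and penalty value below are illustrative assumptions, not the actual analysis.

```python
# Hedged sketch: regress conversion on channel-exposure flags via ridge regression,
# then normalize the betas into budget-allocation weights. Toy data only.
import numpy as np

rng = np.random.default_rng(3)
channels = ["Email", "Referral", "Social Media", "Search Ads", "Display"]
X = rng.integers(0, 2, size=(1000, len(channels))).astype(float)  # exposure flags
true_beta = np.array([0.30, 0.25, 0.20, 0.15, 0.10])              # assumed "true" impact
y = X @ true_beta + rng.normal(0, 0.1, 1000)                      # conversion propensity

alpha = 1.0                                                       # ridge penalty (assumed)
# Closed-form ridge: beta = (X'X + alpha*I)^(-1) X'y
beta = np.linalg.solve(X.T @ X + alpha * np.eye(len(channels)), X.T @ y)

weights = np.clip(beta, 0, None)
weights /= weights.sum()                                          # normalized contribution
for ch, w in sorted(zip(channels, weights), key=lambda p: -p[1]):
    print(f"{ch:>12}: {w:.0%}")
```

The normalized weights sum to 100%, giving the channel-contribution breakdown used for spend allocation.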

    Based on this specific analysis, here is the performance breakdown of the primary drivers and their normalized contribution to conversion:


    Summary

    • The Efficiency of Retention: Email and Referral show the highest normalized contribution, suggesting that “warm” audience paths are the most reliable foundation for the budget.
    • The Synergy Mandate: Funding should prioritize synergistic pairs rather than siloed channels. For example, the high weights of Social Media and Search Ads suggest they function best when funded in tandem to capture both interest and intent.
    • Awareness as Air-Cover: “Brand Awareness” channels (Social/Display) provide the necessary air-cover for time-sensitive, high-conversion calls to action like New Product Launches and Discount Offers.

    Citations

    Criteo Labs (2018). Criteo Attribution Modeling & Bidding Dataset. Kaggle. Available at: https://www.kaggle.com/c/criteo-attribution/data

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Moro, S., Cortez, P., & Rita, P. (2014). A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, 62, 22-31. Elsevier. https://doi.org/10.1016/j.dss.2014.03.001

    Raschka, S. (2018). MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. Journal of Open Source Software, 3(24), 638. https://doi.org/10.21105/joss.00638

    Zhao, K., et al. (2019). Deep Learning and Association Rules for Multi-Channel Attribution. In Proceedings of the 2019 International Conference on Data Mining & Marketing Analytics.

    UCI Machine Learning Repository. (2012). Bank Marketing Dataset. https://archive.ics.uci.edu/ml/datasets/Bank+Marketing


  • From Accounts to Contact Personas: A Data-Driven Framework for B2B Targeting and Segmentation – When Contacts Matter.

    Why Contact Targeting Matters

    • High Potential Contacts in High Potential Accounts: Identify and target the right contacts within high propensity to buy accounts.
    • Humanize the Data: Transition from seeing Accounts as rows in a CRM to identifying the Buying Committee. Use data science to map the specific personas—from the Business User/Influencer to the New Technology Decision Maker.
    • Precision Targeting: Stop the “spray and pray” approach by aligning your messaging with the specific pain points of each persona. By predicting which contact is most likely to engage, you can improve targeting precision against the individuals who drive the highest conversion.
    • Optimize Multi-Touch Strategy: Design an integrated campaign to hit the right persona at the right time. This ensures that your Account-Based Marketing (ABM) efforts are statistically optimized for the collective decision-making process.
    • Contact Data Quality: Proactively maintain key contact coverage, permissions and data quality in high potential accounts.

    Introduction & The B2B Modeling Hierarchy

    In my experience, B2B contact data lacks the predictive weight found in B2C or subscriber databases. Having managed contact data at Hearst, Cisco, and DellEMC, I’ve seen firsthand that while contact attributes (PII, job titles, and history) are essential for execution, they are often secondary for targeting.

    Because contact data is lower in dimensionality and higher in volatility, it rarely contributes to B2B targeting models in a statistically significant way. Even in “greedy” models utilizing 250+ features (including firmographics, purchasing history, competitive install, intent and installed-base telemetrics) the company-level attributes consistently push out contact-level demographics.

    Machine learning and statistical algorithms (such as logistic regression, SVM, Random Forest, and XGBoost) will almost always prioritize company-level variables, often rejecting contact-level data as “noise” that offers negligible improvement in predictive accuracy.


    Starting with the Account

    B2B propensity-to-buy (as well as response or churn likelihood), RFM or cluster segmentation, and Customer Lifetime Value models are all fundamentally based on the company (or account). This has always been the case. To maximize impact, start with the company.

    Typically, the most influential variables (features) for these models include:

    • Past Purchases (RFM): The strongest indicator of future behavior.
      • Recency, Frequency, and Monetary values at the account level provide the historical baseline for any CLV calculation.
    • Firmographics: Standardizing variables like Industry and Revenue is critical.
      • These act as the primary “branches” in decision-tree models like XGBoost to segment high-value targets.
    • Engagement Data: Third-party syndicated media usage and first-party website traffic.
      • “Intent” signals indicate where an account is in the buying journey.
    • Pipeline Dynamics: Lead volume and velocity.
      • How quickly multiple contacts from the same account are engaging.
    • Market Context: Installed base telemetrics (usage) and competitive product footprint.
      • Gap analysis of current products that are at maximum use or underutilized for cross-sell/up-sell, and identification of accounts primed for a displacement campaign based on service contracts or competitive product purchases.
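The first of those features, account-level RFM, can be sketched as a pandas roll-up of transaction history. The toy orders and column names below are illustrative assumptions.

```python
# Hedged sketch: rolling order history up to account-level RFM features,
# the "past purchases" foundation described above. Toy data only.
import pandas as pd

orders = pd.DataFrame({
    "account": ["Acme", "Acme", "Globex", "Initech", "Globex"],
    "order_date": pd.to_datetime(["2024-05-01", "2024-06-15", "2024-03-10",
                                  "2024-06-28", "2024-06-20"]),
    "amount": [12000, 8000, 30000, 4500, 15000],
})
asof = pd.Timestamp("2024-07-01")   # scoring date

rfm = orders.groupby("account").agg(
    recency_days=("order_date", lambda d: (asof - d.max()).days),  # Recency
    frequency=("order_date", "count"),                             # Frequency
    monetary=("amount", "sum"),                                    # Monetary
)
print(rfm)
```

These three columns become the historical baseline that feeds propensity and CLV models at the account grain.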

    However, once the targeting or segmentation scheme is developed, that is the time that contact penetration, quality and associated attributes become critical for go-to-market execution.


    Maintaining a Contact Data Foundation

    The foundation for contact data should ideally be in place prior to program execution. This means achieving high contact penetration in key functional areas with established brand awareness and permissions. Because contact data is notoriously volatile, maintenance must be an ongoing process to prevent decay.

    Technical Note: For this analysis, I generated synthetic data mirroring a typical B2B tech environment. All data wrangling and modeling were performed using Python (pandas/NumPy) in a Jupyter Notebook, with Matplotlib/Seaborn for graphics and Scikit-Learn for K-Means clustering.

    Contact Profiling: Turning Noise into Segments

    Marketing to hundreds of thousands of free-form job titles is impossible at scale. To develop a systematic approach, a contact hierarchy is critical. While companies once had to build internal mapping tables, most modern vendors now provide standardized hierarchies to assess quality and facilitate execution.

    Start with a demographic profile of contacts currently in the marketing data lake or datamart.  The table below is an illustration of a basic contact profiling scorecard that can be used to analyze and track the total marketable contact data foundation.

    Hypothetical Contact Profiling Scorecard

    Addendum: Contact Data Health Checklist

    Framework Reference: Forrester (formerly SiriusDecisions) Data Strategy Standards

    To ensure the “Workhorse” models discussed in previous articles perform at peak efficiency, I recommend auditing your contact database against these five critical health benchmarks:

    • Accuracy: >95% for core predictive features (Job Title, Industry).
      • Scientific Impact: High accuracy reduces “label noise” and improves the gain in classification trees.
    • Density: 100% completeness for critical path fields (Email, Account Name, Contact Name, Title).
      • Scientific Impact: Eliminates data sparsity, ensuring your models don’t rely on biased imputations.
    • Timeliness/Validity: <12 months since last verification.
      • The Decay Factor: B2B data decays at ~2.1% per month (25% annually). Records older than one year significantly increase bounce rates and skew survival analysis.
    • Consistency: 100% standardization on categorical variables (Country, Company Name, Seniority).
      • Scientific Impact: Standardizing “US” vs “USA” is essential for Wickham’s (2014) Tidy Data principles and ensures correct feature grouping.
    • Buying Group Linkage: >3 contacts mapped per target account.
      • Strategic Impact: Essential for moving from individual “Lead Scoring” to “Account-Based Propensity” models.

    I wanted to work from a single synthetic dataset to maintain continuity. For the dataset I’ll be using (below), here is a basic job title distribution to use as a starting point and later we will do some basic Persona mapping.
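As a preview of that persona mapping, free-form titles can be rolled up with simple keyword rules over a standardized hierarchy. Everything here (titles, keywords, persona labels) is an illustrative assumption, not a vendor taxonomy.

```python
# Hedged sketch: mapping free-form job titles onto standardized personas
# via keyword rules. Titles, keywords, and persona names are illustrative.
import pandas as pd

titles = pd.Series(["VP of Infrastructure", "Sr. Data Engineer", "CIO",
                    "Marketing Manager", "Storage Administrator"])

# Rules are checked in order; first match wins.
rules = [("cio|cto|vp", "Technology Decision Maker"),
         ("engineer|administrator|architect", "Business User / Influencer"),
         ("manager|director", "Strategic Manager")]

def to_persona(title: str) -> str:
    t = title.lower()
    for pattern, persona in rules:
        if any(keyword in t for keyword in pattern.split("|")):
            return persona
    return "Unmapped"

personas = titles.map(to_persona)
print(personas.value_counts().to_dict())
```

Unmapped titles surface data-quality gaps to fix before campaign execution.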


    Fitting to the Account Target

    Further cross-tabs can be done on the contact data to ascertain whether the contacts exist in high or low potential geographies, industries, etc.

    There are three ways to do this.  If only a few features are available or a specific industry is being targeted, then match the contacts to the companies:

    Contact Distribution by Country and Industry — Top 10 in Rank Order.

    Once the foundation is set, cross-tabulations can determine if contacts exist within high-potential geographies or industries. We can visualize this through a Tree Map of the top countries and industries. The larger the box, the higher the concentration of a specific job title within that segment.

    Tree Map: Top 10 Countries, Industries and Job Titles based on contact coverage.

    The “Sweet Spot” for Sales and Marketing

    For a synthesized view, we return to the Propensity-to-Buy (P2B) models. By identifying the top job titles within high-propensity accounts, we find the “sweet spot” for marketing spend.

    Propensity to Buy Customer Distribution

    Here we have the top job titles for high-propensity accounts initially in a bar chart (below):

    Taking segmentation a step further, we use K-Means clustering to group companies by Industry, Country, and P2B score. According to Tan et al. (2019),

    “Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. The goal is that the objects within a group be similar or related to one another and different from (or unrelated to) the objects in the other groups. The greater the similarity (or homogeneity) within a group and the greater the difference between groups, the better or more distinct the clustering.”

    Tan et al. (2019)

    This allows us to drill down into specific personas (e.g., TDMs and Strategic Managers) within “High Value/High Growth” clusters.

    Distribution of contact personas in the High Value cluster segment.
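
    The clustering step described above can be sketched in scikit-learn; the toy account table, the column names, and k=3 are illustrative assumptions, not the article’s actual data:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical account table -- Industry, Country, and a P2B score per account.
accounts = pd.DataFrame({
    "industry": ["Tech", "Tech", "Finance", "Health", "Finance", "Health"],
    "country":  ["US", "US", "UK", "US", "UK", "DE"],
    "p2b":      [0.91, 0.87, 0.40, 0.75, 0.35, 0.20],
})

# One-hot encode the categoricals and scale so no feature dominates the distance metric.
X = pd.get_dummies(accounts[["industry", "country"]])
X["p2b"] = accounts["p2b"]
X = X.astype(float)
X_scaled = StandardScaler().fit_transform(X)

# k=3 is a placeholder; in practice choose k via the elbow or silhouette method.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
accounts["cluster"] = km.fit_predict(X_scaled)

# Profile each cluster (e.g., mean P2B) to label "High Value/High Growth" segments.
print(accounts.groupby("cluster")["p2b"].mean())
```

    Once clusters are labeled, the persona drill-down described above is a simple filter on the high-value cluster.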

    Program Execution

    The table below uses fictitious, synthetic PII to illustrate a final list of contacts at high propensity-to-buy companies. Now the hypothetical program is ready for execution.

    Illustration only: synthetic data not intended to represent actual persons or PII.


    Conclusion

    By prioritizing account fit over contact volume, marketing effectiveness improves across every metric:

    • Lower cost-per-lead.
    • Higher response rates through relevance.
    • Better pipeline quality and higher conversion rates.
    • Increased revenue.

    Targeting precision in B2B marketing isn’t about how many contacts you reach, but about reaching the right people within the right organizations. Start with the account to find your target, then use high-quality contact data to hit it—this is the most reliable path to maximizing both ROI and market impact.


    Citations

    Forrester Research. (2024). The B2B Marketing and Sales Data Strategy Toolkit. [Online]. Available at: https://www.forrester.com/report/the-forrester-b2b-marketing-and-sales-data-strategy-toolkit/RES172091

    Hoffmann, J. P. (2016). Generalized Linear Models: An Applied Approach (2nd ed.). Routledge.

    Integrate. (2025). Implementing the B2B Data Quality Toolkit: Standards for 2026. [Online]. Available at: https://www.integrate.com/resources/b2b-data-quality-toolkit

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2019). Introduction to data mining (2nd ed.). Pearson Education.

    Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10


  • Plugging the Leak: Gradient Boosting and Survival Analysis for Customer Retention

    Why Churn Modeling Matters

    Coming from subscriber marketing, customer churn and retention were always paramount for B2C marketers. The “Leaky Bucket” metaphor was first popularized by subscription marketers using CRM in the ’90s, and with the rise of XaaS and Cloud technology I see it finally making its way into B2B marketing as a mission-critical framework.

    Churn modeling fits nicely with Propensity-to-Buy and Customer Lifetime Value because it employs many of the same techniques. There are two ways to look at churn which require methodologies we’ve seen in my past articles:

    What is the likelihood of a customer leaving in the next 30 days? This requires a statistical or machine-learning classifier such as XGBoost (which I used in my article on propensity to buy).  Here we want a list of customers that have a high churn likelihood, so the target for prediction is changed from purchase to churn (yes/no).

    When will a customer churn? Knowing the timing of churn means turning to survival modeling, which I employed in my article on Customer Lifetime Value analysis; here the recommended Python library is lifelines. It can also generate some great visualizations of survival probability for different customer segments, which is an excellent way to identify the drivers and profile the customers who churn – whether they are people (B2C) or businesses (B2B accounts).


    The Dataset

    The IBM Telco Customer Churn dataset is a famous dataset used by data scientists to predict churn and work with marketers to develop customer retention strategies. It is a fictional telecommunications customer database, providing a mix of demographic, service, and financial data. It consists of data on 7,043 customers with 21 features including:

    Demographics: Information about the customer’s gender, age range (Senior Citizen), and whether they have partners or dependents.

    Account Information: How long they’ve been a customer (tenure), their contract type (Month-to-month, One year, Two year), payment method, paperless billing, and charges (MonthlyCharges and TotalCharges).

    Services: Specific services the customer has signed up for, including Phone, Multiple Lines, Internet (DSL, Fiber Optic, etc.), Online Security, Online Backup, Device Protection, Tech Support, and Streaming TV/Movies.

    The Target: The Churn column, indicating whether the customer left within the last month (Yes/No).


    Exploratory Data Analysis (EDA) and Data Quality

    An analysis of the dataset showed that it was very complete, with only 11 missing TotalCharges values, so other than a few data transformations (strings → integers) I wasn’t concerned about doing a lot of cleaning and data manipulation.

    Since the data had a field for Churn (Yes/No) a customer profile could be generated which provided a lot of insight into churned customers:

    From these charts, we can see that locking customers in with long term contracts and automatic withdrawal seems a good retention strategy, as churners tend to be on monthly contracts and paying by check.  Perhaps Senior Citizens are more price-sensitive and have bandwidth to shop for discounts, but that would have to be tested.  I also generated a correlation matrix which showed the correlation of the features to customer churn as another way of looking at the characteristics of churned customers:


    The Classification Question: Will they leave? (XGBoost)

    Real-world tradeoffs in modeling the probability and timing of customer churn.

    There are a lot of features that can be used together to predict churn from this dataset. The first question is, “Will a customer churn?” So I think of this as a propensity-to-churn model, which classifies churners based on the statistical probability of churn (essentially a propensity-to-buy model with the target changed from “purchase” to “churn”).

    Typically, in sales and marketing models we have to decide when we target:

    1. Do we have a conservative model that is relatively accurate in predicting purchases or churn, but misses a lot of customers because it is so conservative? From a marketing expense perspective this is efficient.
    2. Do we tune more aggressively and cast a wider net, but target a lot of prospects or customers that will not purchase or churn (false positives)? This is less cost-efficient but will uncover more absolute revenue or churn “by knocking on more doors.”

    In my experience, I have always leaned towards the more aggressive model (within reason), and sacrifice precision to hit more potential purchasers/churners.

    Looking at the confusion matrix below for my baseline (aggressive) model, it is great at capturing churn within a 30-day window: Recall = 0.82 means it captures 82% of the customers who churned. That comes at a high cost of false positives (Precision = 0.50), meaning any marketing or sales effort will cast a very wide net and be inefficient. Further tuning improved precision, but at the cost of rejecting many churning customers who had lower probability scores based on the available data, so I would not go with the conservative model, or I would continue to tune.
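
    The conservative-vs-aggressive tradeoff can be illustrated by scoring the same predictions at two classification thresholds; the labels and probabilities below are invented for illustration, not the article’s model output:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy churn labels (1 = churned) and classifier probabilities -- illustrative only.
y_true  = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_proba = np.array([0.90, 0.80, 0.55, 0.35, 0.30, 0.60, 0.40, 0.38, 0.20, 0.10])

results = {}
for threshold in (0.5, 0.3):             # conservative vs. aggressive cutoffs
    y_pred = (y_proba >= threshold).astype(int)
    p = precision_score(y_true, y_pred)  # of those flagged, how many actually churn
    r = recall_score(y_true, y_pred)     # of the churners, how many we catch
    results[threshold] = (round(p, 3), round(r, 3))
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
    print(confusion_matrix(y_true, y_pred))
```

    Lowering the cutoff from 0.5 to 0.3 catches every churner in this toy sample but flags more loyal customers, which is exactly the wider, less efficient net described above.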

    In short: a false positive will increase marketing or discounting, while a false negative will lose a customer.

    In the high-risk customer base, we have two groups:

    1. High risk and high monetary value customers that are on month-to-month contracts that should be encouraged to sign long term contracts.
    2. High risk and lower monetary value customers that are locked into one- to two-year contracts that are targets for long term retention and customer satisfaction programs.

    For some more input on sales and marketing program design, XGBoost also produces a list of the features (variables below) that are most influential in the model, which can be used to test tactical adjustments such as discounting for two-year contracts and content rebalancing.


    The Time-based Question: When will they leave? (Lifelines)

    While the XGBoost classifier was able to tell us “This customer is at-risk” a survival model can tell us “This customer has a 60% chance of making it a year” so that a marketer can time retention efforts. To find out when a customer will churn, we need a model that is designed to predict the timing of events.  Miller finds that “a good example of a duration or survival model in marketing is customer lifetime estimation” (Miller, 2015). These are generally categorized as Survival Models:

    “… medical researchers are often interested in the effects of certain drugs on the timing of death (or recovery) among a sample of patients. In fact, these statistical models are known most as survival models because they are used often by biostatisticians, epidemiologists, and other researchers to study the time between diagnosis and death. … social and behavioral scientists have adopted these models for a variety of purposes.”

    Hoffmann, J. P. (2016)

    In Python there is a package called lifelines (from lifelines import KaplanMeierFitter), which I used for this task. The survival probability curve by contract type highlights the value of two-year contracts.
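
    For intuition, here is the product-limit (Kaplan-Meier) estimate computed by hand on toy tenure data; lifelines’ KaplanMeierFitter produces the same curve, and the durations below are hypothetical:

```python
import pandas as pd

# Toy (tenure, churned) records; tenure in months, event=1 means the customer churned,
# event=0 means still active (censored). Values are hypothetical.
data = pd.DataFrame({
    "tenure": [2, 3, 3, 5, 8, 8, 10, 12],
    "event":  [1, 1, 0, 1, 1, 0, 1,  0],
})

# Product-limit estimate: at each churn time t, S(t) *= (1 - d_t / n_t), where d_t is
# churns at t and n_t is customers still at risk (tenure >= t). This is the curve
# that KaplanMeierFitter().fit(data["tenure"], data["event"]) returns.
surv, s = {}, 1.0
for t in sorted(data.loc[data["event"] == 1, "tenure"].unique()):
    n_t = int((data["tenure"] >= t).sum())
    d_t = int(((data["tenure"] == t) & (data["event"] == 1)).sum())
    s *= 1 - d_t / n_t
    surv[t] = s

print(surv)  # survival probability just after each churn time
```

    Fitting one curve per contract type (month-to-month vs. one-year vs. two-year) and plotting them side by side is how the comparison above is typically visualized.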


    Financial Impact

    By combining the two churn modeling approaches, sales and marketing can evolve from a reactive strategy to proactively developing retention programs that improve financial performance. An aggressive XGBoost model gives the business the ability to identify high-risk, high-monetary-value customers in advance, while Survival Analysis improves marketing effectiveness by guiding the timing of retention and customer satisfaction campaigns.

    The addition of churned customer profiling can be used to develop relevant marketing communications and pricing strategies.


    Citations:

    Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). ACM. https://doi.org/10.1145/2939672.2939785

    Davidson-Pilon, C. (2023). lifelines: survival analysis in Python (Version 0.27.8) [Software]. Zenodo. https://doi.org/10.5281/zenodo.8259706

    Hoffmann, J. P. (2016). Generalized Linear Models: An Applied Approach (2nd ed.). Routledge.

    IBM Sample Data Sets. (n.d.). Telco Customer Churn: Focused customer retention programs. Retrieved from https://community.ibm.com/community/user/businessanalytics/viewdocument/telco-customer-churn

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.


  • The Financial Side of Marketing: Beyond RFM to Predictive CLV

    Why Customer Lifetime Value Matters

    • From History to Horizons: Traditional RFM (Recency, Frequency, Monetary) analysis only describes where a customer was. Predictive CLV uses survival models to forecast where they are going, transforming marketing from a cost center into a predictable revenue engine.
    • Quantify Customer Equity: Treat your customer base as a financial portfolio. By calculating the Discounted Cash Flow of future purchases, leadership can determine the true value of the business beyond the current balance sheet.
    • Optimize CAC:LTV Ratios: Stop measuring campaign success by immediate ROI. High-growth organizations use Predictive CLV to justify higher Customer Acquisition Costs (CAC) for segments that are statistically likely to become high-value “whales” over time.
    • Solve the “Long Tail” Problem: Identify the difference between an At-Risk customer and a Dormant customer. Predictive modeling provides the governance needed to stop wasting spend on attempting to reactivate dormant churned users while doubling down on those who have briefly paused their buying cycle and are at-risk of moving to the competition.

    Introduction

    My previous articles on RFM analysis and propensity-to-buy modeling explored stand-alone frameworks for segmentation and targeting. However, these models also serve as the foundational inputs for the ultimate metric in account prioritization: Customer Lifetime Value (CLV).

    While I have typically used CLV with subscription-based B2C models or B2B IT hardware (storage, routers, and switches), I believe that the principles are identical for online retail. By analyzing historical purchasing patterns through RFM (Recency, Frequency, and Monetary Value), we establish a behavioral baseline. To effectively prioritize sales coverage and marketing spend, we can then project the customer’s expected life.

    Leveraging RFM we have the CLV formula:

    CLV = (Frequency × Monetary Value) × Expected Lifespan

    By shifting from standard propensity modeling (the simple probability of purchase) to survival analysis (the timing and likelihood of the next purchase), we can distinguish between historical high-spenders and customers with the highest probability of future revenue. This allows us to treat customer acquisition and retention as a strategic financial investment—ensuring that projected cash in-flows exceed the Customer Acquisition Cost (CAC). As Miller (2015) notes:

    “Customer lifetime value analysis draws on concepts from financial management. We evaluate investments in terms of cash in-flows and out-flows. Before we pursue a prospective customer, we want to know that [sales] will exceed [costs]”

    Miller (2015)

    From this, we can derive Net CLV (CLV – CAC) or the Financial Efficiency Ratio (CLV:CAC). For this exercise, I have targeted the 3:1 KPI (popularized by venture capitalist David Skok) as the benchmark for a healthy account.
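
    The arithmetic can be made concrete in a few lines; the frequency, monetary, and lifespan inputs below are illustrative, with the $400 CAC and 3:1 ratio carried over from this article:

```python
# Illustrative inputs -- not taken from the article's dataset.
frequency = 4            # purchases per year
monetary = 250.0         # average revenue per purchase
expected_lifespan = 3    # expected customer lifespan in years
cac = 400.0              # customer acquisition cost benchmark

clv = (frequency * monetary) * expected_lifespan  # CLV = (Frequency x Monetary) x Lifespan
net_clv = clv - cac                               # Net CLV
efficiency = clv / cac                            # CLV:CAC financial efficiency ratio
clv_threshold = 3 * cac                           # the 3:1 KPI implies a $1,200 floor at $400 CAC

print(clv, net_clv, efficiency, clv_threshold)
```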


    Data Overview

    The Online Retail Data Set from the UC Irvine Machine Learning Repository is a publicly available real-life dataset used by students for customer analytics, RFM (Recency, Frequency, Monetary) modeling, and market basket analysis.

    Dataset Description

    This is a transactional dataset containing all transactions occurring between December 1, 2010, and December 9, 2011, for a UK-based, non-store online retail company (Chen, D., 2012).

    • Business Nature: The company primarily sells unique all-occasion giftware.
    • Customer Base: While many are individuals, a significant portion are wholesalers (which accounts for the extreme outliers in spending and quantity).
    • Scale: 541,909 transactions and 8 attributes.

    Here is a sample from that dataset:

    Exploratory Data Analysis (EDA) and Data Cleaning

     Visual inspection revealed several data quality issues. I filtered the dataset to focus strictly on product purchases, removing entries related to “bad data”: returns, exchanges, and duplicative descriptions. Examples of the values removed are below (there were many more): 

    filter_values = [
        'add stock to allocate online orders', 'adjust', 'adjustment',
        'alan hodge cant mamage this section', 'allocate stock for dotcom orders ta',
        'barcode problem', 'broken', 'came coded as 20713', "can't find",
        'check', 'check?', 'code mix up? 84930', 'counted', 'cracked', 'crushed',
        'crushed boxes', 'crushed ctn', 'damaged', 'damaged stock', 'damages',
        'damages wax', 'damages/credits from ASOS', 'damages/display',
        'damages/dotcom?', 'damages/showroom etc', 'damages?']

    Data Distributions: Outliers and Skewness

    Histograms of the RFM data show that none of the three components are normal (Gaussian) distributions.  We see a high concentration of customers who bought recently and tapered off, a lot of customers who were one-time purchasers, and a right-skew in monetary value due to high spending customers.


    Model Selection and Development: Heuristic vs. Statistical Modeling

    The Manual “Heuristic” Model

    This is a “quick and dirty” approach using business rules of thumb. I utilized a step function to assign the probability of a customer remaining “alive” based on their last purchase:

    • 0–30 days since last purchase: 95% chance.
    • 31–180 days since last purchase: 80% chance.
    • Over 180 days: 10% chance.
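
    The step function above can be written directly as a business rule; a minimal sketch:

```python
def heuristic_p_alive(days_since_last_purchase: int) -> float:
    """Step-function probability a customer is still 'alive' (business-rule heuristic)."""
    if days_since_last_purchase <= 30:
        return 0.95
    if days_since_last_purchase <= 180:
        return 0.80
    return 0.10

print([heuristic_p_alive(d) for d in (10, 90, 365)])
```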

    The Lifetimes Statistical Model (BG/NBD)

    Next, I applied the BG/NBD (Beta-Geometric/Negative Binomial Distribution) model via the Lifetimes library (Davidson-Pilon, C., 2021). Unlike the heuristic, this model analyzes the individual cadence of every customer. If a customer who typically buys every 10 days hasn’t purchased in 30, the model flags them as “at risk” much faster than a once-a-year purchaser.
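
    Under the hood, BG/NBD has a closed-form P(alive) (Fader, Hardie & Lee, 2005) that the Lifetimes library’s BetaGeoFitter exposes once its four parameters are estimated; the parameter values below are hypothetical, purely to illustrate the “individual cadence” point:

```python
def bgnbd_p_alive(x, t_x, T, r, alpha, a, b):
    """P(customer still active) under the BG/NBD model, valid for x >= 1.

    x = repeat purchases, t_x = time of last purchase, T = total observation time;
    (r, alpha, a, b) are the model parameters a fitter estimates from transaction data.
    """
    return 1.0 / (1.0 + (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x))

# Hypothetical parameter values -- illustrative only, not fitted to real data.
params = dict(r=0.25, alpha=4.4, a=0.79, b=2.43)

# Both customers have been silent for 30 days, but a fast ~10-day cadence makes
# that silence far more alarming than it is for the slower buyer.
fast_buyer = bgnbd_p_alive(x=9, t_x=90, T=120, **params)
slow_buyer = bgnbd_p_alive(x=2, t_x=300, T=330, **params)
print(round(fast_buyer, 3), round(slow_buyer, 3))
```

    With these toy inputs the frequent buyer’s P(alive) drops well below the infrequent buyer’s, which is exactly the cadence-sensitive flagging described above.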

    I’ll return to the choice of heuristic, business-rules-based models vs. statistical survival analysis in a later article on customer churn, where this comparison is also relevant.

    Results and Model Accuracy

    The Lifetimes model proved to be more conservative than the heuristic approach. It yielded a Mean Absolute Error (MAE) of 0.98, indicating that the model’s prediction was off by less than one transaction per customer over a three-month holdout window.

    While I utilized a 12-month window to account for retail seasonality and maintain precision, this horizon is flexible; for technology hardware with longer refresh cycles, a 3–5 year window is often more appropriate.

    Model Performance Metrics: The following metrics compare the model’s predictions against actual customer behavior during the holdout period:

    • MAE: 0.9846
    • Actual Avg. Purchases (Holdout): 1.4462
    • Predicted Avg. Purchases (Holdout): 1.0706

    Financial Output & Targeting Logic

    By applying the model to our customer base, we derived the following financial benchmarks:

    • Mean CLV: $3,816.56
    • Mean Opportunity Ratio: 1.30

    Using an average Customer Acquisition Cost (CAC) benchmark of $400 (a conservative estimate based on the consumer electronics industry) and the 3:1 efficiency ratio, we establish a CLV threshold of $1,200. Any customer with a predicted CLV below this mark represents a net loss, whereas those above it are prioritized for sales coverage.

    Putting this all together we have a targeting grid for planning and targeting high value customers:


    Conclusion

    Enterprise-level data is often more complex, so more work can be involved in tuning the CLV model; however, the core approach here remains the same for strategic planning, account-based planning, and target marketing. Understanding the future value of a customer (vs. only historical spend) is one of the most important capabilities in a marketer’s toolkit.


    Citations

    Chen, D. (2012). Online Retail [Dataset]. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Online+Retail

    Davidson-Pilon, C. (2021). Lifetimes: Measuring customer lifetime value in Python (Version 0.11.3) [Computer software]. https://github.com/CamDavidsonPilon/lifetimes

    Hoffmann, J. P. (2016). Generalized Linear Models: An Applied Approach (2nd ed.). Routledge.

    Marianantoni, A. (2025, March 19). CLV to CAC Ratio: Guide for Startups 2025. M Accelerator. https://maccelerator.la/en/blog/entrepreneurship/clv-to-cac-ratio-guide-for-startups-2025/

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Shopify Staff. (2024, July 29). Customer acquisition costs by industry (2025). Shopify Blog. https://www.shopify.com/blog/customer-acquisition-cost-by-industry#4

    Skok, D. (2010, February 17). SaaS metrics 2.0 – A guide to measuring and improving what matters. For Entrepreneurs. https://www.forentrepreneurs.com/saas-metrics-2/


  • The Demand Generation Workhorse

    Why Propensity-to-Buy Targeting Matters

    • Precision Resource Allocation: It identifies the specific customers and prospects with the highest likelihood to purchase, allowing marketing and sales teams to focus their budget and energy where it will yield the highest return.
    • Churn Mitigation: By predicting which contacts or companies are at risk of leaving for a competitor, P2B allows for proactive retention strategies before the customer actually departs.
    • Optimized Marketing Frequency: It prevents “over-indexing” or inundating low-potential customers with excessive outreach, identifying the “tipping point” where increased contact frequency actually decreases the probability of a sale.
    • Foundation for Advanced Metrics: P2B generates the critical probability scores required to calculate more complex financial models, such as Customer Lifetime Value (CLV), effectively bridging the gap between marketing activity and long-term profitability.
    • Strategic Profiling: It isolates high-propensity populations to uncover the influential variables (e.g., communication channel, past success, or financial stability) that define the “ideal” customer profile for future targeting.

    Introduction

    The “workhorse” of modern demand generation is a family of binary classification models. These models are designed to predict a specific response, such as “…whether a customer buys, whether a customer stays with the company or leaves to buy from another company, and whether the customer recommends a company’s products to another customer” (Miller, 2015). Bruce Ratner (2017), in his work on machine learning and data mining, calls this approach the “workhorse of response modeling.”

    There are many statistical and machine-learning techniques for building Propensity-to-Buy (P2B) models—whether predicting churn, response rates, or identifying CRM opportunities that will convert to sales. The concept has been around since the early days of direct mail (1980s), when direct mail marketers relied on Logistic Regression (still a powerful technique). Now, fueled by big data and high-performance computing, one of the most popular and effective techniques is eXtreme Gradient Boosting (XGBoost), which I use here to examine how it can be applied in marketing.

    Propensity-to-buy has been the first technique deployed at every company I’ve had the pleasure of working in, and it has evolved to become the foundation of modern digital marketing. Technically efficient and scalable, a P2B model can be trained to predict different targets with limited recalibration once the base code is set.

    Key Applications of P2B:

    1. Purchase Likelihood for a Product: Identifying which customers and prospects are most likely to purchase specific products. This applies to both new product launches and up-sell/cross-sell for existing products.
    2. Churn Risk: Predicting which companies or individual contacts are at risk of leaving to competitors.
    3. Lead Scoring and Conversion Probability: Forecasting which CRM opportunities are most likely to convert to “Closed-Won” to optimize the marketing and sales pipeline.
    4. CLV Integration: Generating the probability scores required to calculate a Customer Lifetime Value (CLV) model.
    5. Campaign Response: Determining which contacts are most likely to engage with or respond to a specific marketing program.
    6. Customer Valuation: Identifying which customers will be the most valuable overall across their entire purchase history.  For example, comparing a company’s Total IT Spend with P2B.
    7. Strategic Profiling: Isolating a “high-propensity” population to build ideal demographic and firmographic profiles for future targeting.

    For the following example, XGBoost was perfect (an open-source machine learning ensemble method that builds multiple decision trees to correct previous errors). First introduced to me six years ago by data scientist Fuqiang Shi, it has become the gold standard for structured data. XGBoost gained global fame around 2014–2016 for dominating Kaggle competitions (often outperforming popular methods like Random Forest, Support Vector Machine, Bayesian Classifier, etc.).  That said, in practice a data scientist should test several methods to find the best model by comparing performance metrics.


    The Data

    To demonstrate how to build and score a model, I utilized the Bank Marketing Dataset from the well-known UC Irvine Machine Learning Repository. This dataset consists of 45,211 rows and 17 columns, representing a real-world scenario of a direct telemarketing campaign from a Portuguese banking institution (2008-2010). Here are the first five rows:

    Bank Marketing Dataset from the well-known UC Irvine Machine Learning Repository.

    UCI Machine Learning Repository: Bank Dataset. University of California, Irvine.


    Model Development: A High-Level Overview

    Propensity model development process.

    Predictive Performance

    The model achieved a predictive performance of 81% (ROC AUC), which in my experience is within range for a real-world application (I have seen anywhere from 65% for prospecting models to 90% for customer models). While I initially hard-coded the parameters, I followed up with a grid search (GridSearchCV) to ensure optimization. Since both approaches achieved the same predictive performance, I stopped tuning at that point. [Environment: Python Jupyter Notebook (Anaconda)].
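
    A sketch of that tuning loop is below. It uses synthetic data and scikit-learn’s GradientBoostingClassifier so it runs without xgboost installed; xgboost.XGBClassifier follows the same sklearn-style estimator API and can be dropped in directly, and the parameter grid here is an arbitrary illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the bank dataset, with a ~12% positive rate like the real data.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.88], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3], "learning_rate": [0.1]},
    scoring="roc_auc",   # matches the article's evaluation metric
    cv=3,
)
grid.fit(X_tr, y_tr)

# Evaluate the tuned model on the held-out split.
auc = roc_auc_score(y_te, grid.best_estimator_.predict_proba(X_te)[:, 1])
print(grid.best_params_, round(auc, 3))
```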


    Using the Model for Decision Support: Rebalancing Marketing Frequency

    From a targeting perspective, we now have a list of customers and can develop a profile based on the characteristics of that population – either by analyzing the segment directly or by examining the most influential variables used in the P2B model.

    The model reveals a ‘tipping point’ in telemarketing outreach. The campaign variable shows that as the number of contacts increases (red dots moving left in the SHAP plot), the propensity to buy drops. This suggests the bank is currently over-indexing on low-potential customers, essentially ‘inundating’ them—while missing the opportunity to focus that energy on high-potential segments that require lower frequency to convert.

    Scatterplot showing customers by number of telemarketing contacts and propensity to purchase with size = account balance.

    The Negative Correlation: In the summary plot, the high values for campaign/telemarketing calls (dark red dots) shift to the left of the center line. This indicates that a high number of contacts during a single campaign actually decreases the probability of a purchase.

    Further examination (and data) is required here to determine whether messaging, media mix, brand awareness, or other factors also come into play during execution, but this is definitely a red flag since typical effective reach is around three or more touches (~3X+).

    SHAP Insights:

    SHAP (SHapley Additive exPlanations) visualization showing how variables change the outcome (positive or negative) and bar chart of feature importance.

    Based on the SHAP (SHapley Additive exPlanations) visualizations provided, we can determine exactly which “levers” drive purchase propensity. This is the “explainability” phase that translates a black-box model like XGBoost into actionable business insights.

    • The Bar Chart (Left) – Importance. It shows variables prioritized by the model (e.g., “Cellular contact is the most important piece of information”).  One caveat here: this is an older dataset from the ML Library and for illustration only; interpret with caution!
    • The Summary Plot (Right) – Direction. Shows how those variables change the outcome (e.g., “Being contacted via cell phone increases propensity, while a housing loan decreases it”).

    Summary of Feature Influence

    The model shows that a mix of communication channels, past behavior, and economic stability are the primary drivers of a “Yes” prediction.

    1. Primary Driver: Communication Method (contact_cellular). This is the most influential variable and the strongest predictor of a purchase.
    2. The “Momentum” Effect (poutcome_success). Success in previous marketing campaigns is a powerful indicator of future success. This validates my previous blog’s assertion that RFM (Recency/Frequency/Monetary Value) are highly influential in P2B models. 
    3. Financial Stability (housing no and balance). Customers without housing loans (housing_no) show a higher propensity to purchase. Further, higher bank balance levels correlate positively with conversion.
    4. Timing and Outreach (day_of_week, month_jun, campaign). The specific timing of the outreach (months like June or March) influences the model, though to a lesser degree than the contact method. So, a telemarketing group and marketing programs should be adjusted for seasonality.

    Conclusion

    I’ll be returning to propensity-to-buy in future articles, since like RFM analysis this technique is foundational to successful quantitative marketing.  Both techniques trace their origins to the 1980s as statistical tools for direct mailers and have evolved over time to become the foundational “workhorse” of modern digital marketing.


    Citations

    Miller, Thomas W. Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. FT Press, 2015.

    Moro, S., Rita, P., & Cortez, P. (2014). Bank Marketing [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5K306.

    Ratner, Bruce. Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data. 3rd ed., CRC Press, 2017.


  • My Favorite Segmentation Scheme

    Why RFM Segmentation Matters

    • Improves Retention Strategy
      Identifies lapsed buyers and at-risk segments for reactivation.
    • Optimizes Campaign ROI
      Targets only the most responsive and profitable audiences.
    • Supports Personalization
      Enables tailored offers based on behavioral patterns.
    • Simplifies Scoring & Execution
      Easy-to-implement framework with clear segment definitions.

    Introduction


    As I recall from school, the idea of optimizing a portfolio of B2B accounts goes back as far as Boston Consulting Group’s growth/share matrix and the initial list scoring techniques of direct marketers — and although the technology we use to segment customers has changed (i.e. Machine Learning) the underlying principles are the same: segment your B2C customers or B2B account base based on potential to optimize your go-to-market.

    The Segmentation Challenge

    The journey to effective segmentation often faces two extremes:

    The Black Box (Technical): Techniques like K-means clustering and principal components analysis (PCA) are powerful ML tools, but they often require massive datasets and can lack interpretability. Justifying a segmentation strategy by explaining eigenvalues or complex algorithms to a business leadership team can create friction and slow adoption.

    The Gray Area (Subjective): Conversely, creating detailed customer personas is highly intuitive but often subjective. Because they are based on opinion or aspiration, the segments can be endlessly debated, leading to unclear targeting and weak tactical execution.


    The Power and Simplicity of RFM

    In my experience, one of the most elegant, simple and powerful segmentation schemes is based on RFM scoring, which allows the marketer to:

    • Maximize ROI: Strategically allocate sales and marketing resources to the highest-potential customers.
    • Drive Growth: Target active, high-value accounts for cross-sell and up-sell campaigns to increase purchase frequency.
    • Minimize Churn: Increase retention and proactively intervene with the most valuable customers who show signs of drifting away.
    • Improve Acquisition: Identify and target “lookalike” prospects who share the profiles of your best customers.
    • Inform Value Metrics: Serve as a core input for calculating and improving Customer Lifetime Value (CLV).

    In its simplest form, RFM (Recency, Frequency and Monetary Value) only requires customer purchase transaction data and is both statistically significant in predicting future purchases (on its own and when nested within other purchase-likelihood models) and easily understood by the business.

    As Thomas Miller describes in his textbook, Marketing Data Science:

    “Direct and database marketers build models for predicting who will buy in response to marketing promotions. Traditional models, or what are known as RFM models, consider the recency (date of most recent purchase), frequency (number of purchases), and monetary value (sales revenue) of previous purchases. More complicated models utilize a variety of explanatory variables relating to recency, frequency, monetary value, and customer demographics.”

    Miller, Thomas W. Marketing Data Science. Pearson Education LTD., 2015.

    Methodology: From Data to Actionable Segments

    For each of the three categories, the customer is given a score on a scale of 1 to 5, where one is the lowest score and five is the highest (best) score. Assigning these scores effectively breaks your population into quintiles (20% groups) on each dimension. For simple prioritization, add the three scores together for a total RFM score between 3 and 15, creating a ranked list of highest-potential accounts.
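    The quintile scoring described above can be sketched in plain Python. The data layout (a dict mapping customer ID to a list of (purchase_date, revenue) transactions) and the function names are illustrative assumptions, not a prescribed implementation:

    ```python
    from datetime import date

    def quintile_scores(values, higher_is_better=True):
        """Rank a list of values into quintile scores 1-5 (5 = best)."""
        n = len(values)
        # Order indices from worst to best on the given dimension.
        idx = sorted(range(n), key=lambda i: values[i],
                     reverse=not higher_is_better)
        scores = [0] * n
        for rank, i in enumerate(idx):        # rank 0 = worst
            scores[i] = 1 + (rank * 5) // n   # map rank into quintiles 1..5
        return scores

    def rfm_scores(customers, as_of):
        """customers: dict of customer_id -> [(purchase_date, revenue), ...]."""
        ids = list(customers)
        recency = [(as_of - max(d for d, _ in tx)).days
                   for tx in customers.values()]        # days since last purchase
        frequency = [len(tx) for tx in customers.values()]
        monetary = [sum(r for _, r in tx) for tx in customers.values()]
        R = quintile_scores(recency, higher_is_better=False)  # fewer days = better
        F = quintile_scores(frequency)
        M = quintile_scores(monetary)
        return {cid: {"R": r, "F": f, "M": m, "RFM": r + f + m}
                for cid, r, f, m in zip(ids, R, F, M)}
    ```

    The same logic is a one-liner with `pandas.qcut` at scale; the stdlib version above just makes the quintile mechanics explicit.
    
    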

    Patterns in the purchase data can be used as the foundation of personas (additional attributes can be layered on for profiling), and I have found that the scores are consistently influential (statistically significant) as inputs into purchase-likelihood ML models for scoring accounts. The following segments are typically derived from the scores in the RFM Scores table (above):

    • Champions have the highest score in all three categories (RFM) and highest total scores.
    • New or Highest Potential Customers have high recency and monetary value scores, but have just started purchasing, and so their frequency score will be in the lower quintiles.
    • Past or Churned customers have high monetary and frequency scores, but very low recency scores (i.e. bottom 20%).
    • Additional segments can be created for average customers (to benchmark), new prospects, or totally lost customers.
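    These rule-of-thumb segments can be expressed as a simple score lookup. The exact thresholds below are illustrative assumptions for demonstration, not fixed definitions:

    ```python
    def label_segment(r: int, f: int, m: int) -> str:
        """Map RFM quintile scores (1-5 each) to an illustrative segment name."""
        if r >= 4 and f >= 4 and m >= 4:
            return "Champion"                 # top scores across all three
        if r >= 4 and m >= 4 and f <= 2:
            return "New / High Potential"     # recent, valuable, few purchases yet
        if r == 1 and f >= 4 and m >= 4:
            return "Past / Churned"           # bottom-20% recency, formerly strong
        return "Average"                      # benchmark bucket for everyone else
    ```

    In practice this layer is where business rules live, so thresholds should be tuned with the sales and marketing teams rather than fixed in code.
    
    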

    Summary

    I’ll return to RFM segmentation later for use in precision targeting, segmentation for relevant marketing communication, and media mix optimization. As I mentioned at the beginning, this technique was pioneered by early direct mail marketers using spreadsheet analysis; although we can now build Python, SQL, and R scripts to run it, the fundamentals remain the same. Readers who want to investigate further will find abundant RFM reference material in academic papers and videos with a quick web search.


    Citations

    https://www.investopedia.com/terms/r/rfm-recency-frequency-monetary-value.asp

    https://mailchimp.com/resources/rfm-analysis/