Coupon Redemption Rate Analysis¶

This notebook analyzes coupon usage patterns to identify products with the highest redemption rates.

Setup and Data Loading¶

In [1]:

Copied!





import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from completejourney_py import get_data

# Set plotting style
plt.style.use('default')
sns.set_palette("Set2")
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from completejourney_py import get_data

# Set plotting style
plt.style.use('default')
sns.set_palette("Set2")

In [2]:

Copied!





# Load all coupon-related datasets
data = get_data(["coupons", "coupon_redemptions", "products", "campaigns", "campaign_descriptions"])
coupons = data["coupons"]
coupon_redemptions = data["coupon_redemptions"]
products = data["products"]
campaigns = data["campaigns"]
campaign_descriptions = data["campaign_descriptions"]

print(f"Coupons available: {len(coupons):,}")
print(f"Coupon redemptions: {len(coupon_redemptions):,}")
print(f"Products: {len(products):,}")
# Load all coupon-related datasets
data = get_data(["coupons", "coupon_redemptions", "products", "campaigns", "campaign_descriptions"])
coupons = data["coupons"]
coupon_redemptions = data["coupon_redemptions"]
products = data["products"]
campaigns = data["campaigns"]
campaign_descriptions = data["campaign_descriptions"]

print(f"Coupons available: {len(coupons):,}")
print(f"Coupon redemptions: {len(coupon_redemptions):,}")
print(f"Products: {len(products):,}")

Coupons available: 116,204
Coupon redemptions: 2,102
Products: 92,331

Data Exploration: Understanding Coupon Structure¶

In [3]:

Copied!





# Examine coupon data structure
print("Coupon Data Sample:")
print(coupons.head())
print("\nCoupon Redemption Data Sample:")
print(coupon_redemptions.head())

# Check for unique coupons and products
print(f"\nUnique coupons: {coupons['coupon_upc'].nunique():,}")
print(f"Unique products with coupons: {coupons['product_id'].nunique():,}")
print(f"Unique campaigns with coupons: {coupons['campaign_id'].nunique():,}")

# Show available columns for reference
print(f"\nCoupons columns: {list(coupons.columns)}")
print(f"Coupon redemptions columns: {list(coupon_redemptions.columns)}")
print(f"Products columns: {list(products.columns)}")
# Examine coupon data structure
print("Coupon Data Sample:")
print(coupons.head())
print("\nCoupon Redemption Data Sample:")
print(coupon_redemptions.head())

# Check for unique coupons and products
print(f"\nUnique coupons: {coupons['coupon_upc'].nunique():,}")
print(f"Unique products with coupons: {coupons['product_id'].nunique():,}")
print(f"Unique campaigns with coupons: {coupons['campaign_id'].nunique():,}")

# Show available columns for reference
print(f"\nCoupons columns: {list(coupons.columns)}")
print(f"Coupon redemptions columns: {list(coupon_redemptions.columns)}")
print(f"Products columns: {list(products.columns)}")

Coupon Data Sample:
    coupon_upc  product_id  campaign_id
0  10000085207     9676830           26
1  10000085207     9676943           26
2  10000085207     9676944           26
3  10000085207     9676947           26
4  10000085207     9677008           26

Coupon Redemption Data Sample:
   household_id   coupon_upc  campaign_id redemption_date
0          1029  51380041013           26      2017-01-01
1          1029  51380041313           26      2017-01-01
2           165  53377610033           26      2017-01-03
3           712  51380041013           26      2017-01-07
4           712  54300016033           26      2017-01-07

Unique coupons: 981
Unique products with coupons: 41,857
Unique campaigns with coupons: 27

Coupons columns: ['coupon_upc', 'product_id', 'campaign_id']
Coupon redemptions columns: ['household_id', 'coupon_upc', 'campaign_id', 'redemption_date']
Products columns: ['product_id', 'manufacturer_id', 'department', 'brand', 'product_category', 'product_type', 'package_size']

Calculate Coupon Redemption Rates by Product¶

In [4]:

Copied!





# Let's investigate the data structure more carefully to understand the duplication

print("=== INVESTIGATING COUPON DATA STRUCTURE ===")

# Look at a specific example product with high redemption rate
sample_product = 1029743  # One of the products from your screenshot

print(f"\nAnalyzing Product {sample_product}:")

# Check coupons table for this product
product_coupons = coupons[coupons['product_id'] == sample_product]
print(f"Coupons table entries for product {sample_product}:")
print(product_coupons)
print(f"Total coupon records: {len(product_coupons)}")
print(f"Unique coupon UPCs: {product_coupons['coupon_upc'].nunique()}")

# Check redemptions for this product's coupons
product_coupon_upcs = product_coupons['coupon_upc'].tolist()
product_redemptions = coupon_redemptions[coupon_redemptions['coupon_upc'].isin(product_coupon_upcs)]
print(f"\nRedemptions for product {sample_product} coupons:")
print(f"Total redemption records: {len(product_redemptions)}")
print(f"Unique coupon UPCs redeemed: {product_redemptions['coupon_upc'].nunique()}")
print(f"Unique households redeeming: {product_redemptions['household_id'].nunique()}")

if len(product_redemptions) > 0:
    print("\nSample redemption records:")
    print(product_redemptions.head())

# Check if one coupon UPC can be used for multiple products
print(f"\n=== CHECKING COUPON UPC RELATIONSHIPS ===")
sample_coupon_upc = product_coupons['coupon_upc'].iloc[0] if len(product_coupons) > 0 else None
if sample_coupon_upc:
    coupon_products = coupons[coupons['coupon_upc'] == sample_coupon_upc]
    print(f"Coupon UPC {sample_coupon_upc} is valid for {len(coupon_products)} products:")
    print(coupon_products[['product_id', 'campaign_id']].head(10))

# Check the join we're doing
print(f"\n=== CHECKING OUR JOIN LOGIC ===")
joined_redemptions = (
    coupon_redemptions
    .merge(coupons[['coupon_upc', 'product_id']], on='coupon_upc')
)
print(f"Original redemptions: {len(coupon_redemptions)}")
print(f"After joining with coupons: {len(joined_redemptions)}")
print(f"Duplication factor: {len(joined_redemptions) / len(coupon_redemptions):.2f}x")

# Show the issue
if len(joined_redemptions) > len(coupon_redemptions):
    print("\n⚠️  JOIN IS CREATING DUPLICATES!")
    print("This happens because one coupon UPC can be redeemed for multiple products")
    
# Show correct approach
print(f"\n=== CORRECT APPROACH ===")
print("The issue: One coupon redemption gets counted multiple times")
print("if that coupon is valid for multiple products.")
print("\nWe need to decide:")
print("1. Count redemption per specific product intended?")
print("2. Or analyze overall coupon redemption regardless of product?")

# Let's see the distribution of coupons per UPC
coupons_per_upc = coupons.groupby('coupon_upc')['product_id'].nunique()
print(f"\nCoupon UPC to Product Relationship:")
print(f"Coupons valid for 1 product: {(coupons_per_upc == 1).sum()}")
print(f"Coupons valid for multiple products: {(coupons_per_upc > 1).sum()}")
print(f"Max products per coupon: {coupons_per_upc.max()}")
print(f"Average products per coupon: {coupons_per_upc.mean():.1f}")
# Let's investigate the data structure more carefully to understand the duplication

print("=== INVESTIGATING COUPON DATA STRUCTURE ===")

# Look at a specific example product with high redemption rate
sample_product = 1029743  # One of the products from your screenshot

print(f"\nAnalyzing Product {sample_product}:")

# Check coupons table for this product
product_coupons = coupons[coupons['product_id'] == sample_product]
print(f"Coupons table entries for product {sample_product}:")
print(product_coupons)
print(f"Total coupon records: {len(product_coupons)}")
print(f"Unique coupon UPCs: {product_coupons['coupon_upc'].nunique()}")

# Check redemptions for this product's coupons
product_coupon_upcs = product_coupons['coupon_upc'].tolist()
product_redemptions = coupon_redemptions[coupon_redemptions['coupon_upc'].isin(product_coupon_upcs)]
print(f"\nRedemptions for product {sample_product} coupons:")
print(f"Total redemption records: {len(product_redemptions)}")
print(f"Unique coupon UPCs redeemed: {product_redemptions['coupon_upc'].nunique()}")
print(f"Unique households redeeming: {product_redemptions['household_id'].nunique()}")

if len(product_redemptions) > 0:
    print("\nSample redemption records:")
    print(product_redemptions.head())

# Check if one coupon UPC can be used for multiple products
print(f"\n=== CHECKING COUPON UPC RELATIONSHIPS ===")
sample_coupon_upc = product_coupons['coupon_upc'].iloc[0] if len(product_coupons) > 0 else None
if sample_coupon_upc:
    coupon_products = coupons[coupons['coupon_upc'] == sample_coupon_upc]
    print(f"Coupon UPC {sample_coupon_upc} is valid for {len(coupon_products)} products:")
    print(coupon_products[['product_id', 'campaign_id']].head(10))

# Check the join we're doing
print(f"\n=== CHECKING OUR JOIN LOGIC ===")
joined_redemptions = (
    coupon_redemptions
    .merge(coupons[['coupon_upc', 'product_id']], on='coupon_upc')
)
print(f"Original redemptions: {len(coupon_redemptions)}")
print(f"After joining with coupons: {len(joined_redemptions)}")
print(f"Duplication factor: {len(joined_redemptions) / len(coupon_redemptions):.2f}x")

# Show the issue
if len(joined_redemptions) > len(coupon_redemptions):
    print("\n⚠️  JOIN IS CREATING DUPLICATES!")
    print("This happens because one coupon UPC can be redeemed for multiple products")
    
# Show correct approach
print(f"\n=== CORRECT APPROACH ===")
print("The issue: One coupon redemption gets counted multiple times")
print("if that coupon is valid for multiple products.")
print("\nWe need to decide:")
print("1. Count redemption per specific product intended?")
print("2. Or analyze overall coupon redemption regardless of product?")

# Let's see the distribution of coupons per UPC
coupons_per_upc = coupons.groupby('coupon_upc')['product_id'].nunique()
print(f"\nCoupon UPC to Product Relationship:")
print(f"Coupons valid for 1 product: {(coupons_per_upc == 1).sum()}")
print(f"Coupons valid for multiple products: {(coupons_per_upc > 1).sum()}")
print(f"Max products per coupon: {coupons_per_upc.max()}")
print(f"Average products per coupon: {coupons_per_upc.mean():.1f}")

=== INVESTIGATING COUPON DATA STRUCTURE ===

Analyzing Product 1029743:
Coupons table entries for product 1029743:
        coupon_upc  product_id  campaign_id
12468  10000085364     1029743            8
18999  10000085426     1029743           13
52826  10000085476     1029743           18
Total coupon records: 3
Unique coupon UPCs: 3

Redemptions for product 1029743 coupons:
Total redemption records: 80
Unique coupon UPCs redeemed: 3
Unique households redeeming: 68

Sample redemption records:
     household_id   coupon_upc  campaign_id redemption_date
138           354  10000085364            8      2017-05-11
140           397  10000085364            8      2017-05-11
142           573  10000085364            8      2017-05-11
163           239  10000085364            8      2017-05-12
167          1098  10000085364            8      2017-05-12

=== CHECKING COUPON UPC RELATIONSHIPS ===
Coupon UPC 10000085364 is valid for 1881 products:
       product_id  campaign_id
12411     1001391            8
12412     1003441            8
12413     1003616            8
12414     1003750            8
12415     1004001            8
12416     1004480            8
12417     1004703            8
12418      100512            8
12419     1005172            8
12420     1005274            8

=== CHECKING OUR JOIN LOGIC ===
Original redemptions: 2102
After joining with coupons: 2265375
Duplication factor: 1077.72x

⚠️  JOIN IS CREATING DUPLICATES!
This happens because one coupon UPC can be redeemed for multiple products

=== CORRECT APPROACH ===
The issue: One coupon redemption gets counted multiple times
if that coupon is valid for multiple products.

We need to decide:
1. Count redemption per specific product intended?
2. Or analyze overall coupon redemption regardless of product?

Coupon UPC to Product Relationship:
Coupons valid for 1 product: 39
Coupons valid for multiple products: 942
Max products per coupon: 14477
Average products per coupon: 107.3

Top 10 Products by Redemption Rate¶

In [5]:

Copied!





# CORRECTED APPROACH: Handle the many-to-many coupon-product relationship properly

print("=== CORRECTED COUPON REDEMPTION ANALYSIS ===")
print("Fixed: Unrealistic redemption rates due to coupon-product duplication")

# The key insight: We need to analyze coupon UPC redemption rates, not product-specific rates
# because a single coupon redemption might be valid for multiple products

print("\nApproach 1: Analyze by Coupon UPC (most accurate)")
print("-" * 50)

# Calculate redemption rate by coupon UPC
coupon_upc_analysis = (
    coupons
    .groupby('coupon_upc')
    .agg({
        'product_id': 'nunique',  # How many products this coupon covers
        'campaign_id': 'first'    # Which campaign
    })
    .reset_index()
    .rename(columns={'product_id': 'products_covered'})
)

# Count redemptions per coupon UPC
redemptions_per_coupon = (
    coupon_redemptions
    .groupby('coupon_upc')
    .size()
    .reset_index(name='total_redemptions')
)

# Calculate coupon UPC redemption rates
coupon_rates = (
    coupon_upc_analysis
    .merge(redemptions_per_coupon, on='coupon_upc', how='left')
    .fillna(0)
)

# For realistic redemption rates, we need to properly understand distribution
# Since we can't determine exact distribution quantities, we'll focus on actual redemption counts
print("Top 10 Coupon UPCs by Total Redemptions:")
top_coupons = coupon_rates.nlargest(10, 'total_redemptions')
n = 0

for i, row in top_coupons.iterrows():
    n += 1
    redemptions = int(row['total_redemptions'])
    products = int(row['products_covered'])
    campaign = int(row['campaign_id'])
    print(f"{n:2d}. Coupon {row['coupon_upc']}: {redemptions} redemptions")
    print(f"    Covers {products} products | Campaign {campaign}")

print(f"\nOverall Coupon Performance:")
print(f"- Total unique coupon UPCs: {len(coupon_rates):,}")
print(f"- Coupons with redemptions: {len(coupon_rates[coupon_rates['total_redemptions'] > 0]):,}")
print(f"- Average redemptions per coupon: {coupon_rates['total_redemptions'].mean():.2f}")
print(f"- Total redemptions: {coupon_rates['total_redemptions'].sum():,}")

# Calculate realistic redemption statistics
redeemed_coupons = len(coupon_rates[coupon_rates['total_redemptions'] > 0])
total_coupons = len(coupon_rates)
coupon_redemption_rate = (redeemed_coupons / total_coupons) * 100

print(f"- Coupon UPC redemption rate: {coupon_redemption_rate:.1f}% ({redeemed_coupons} of {total_coupons} coupon types redeemed)")

print("\n" + "="*70)
print("\nApproach 2: Product-Level Analysis (strategic insights)")
print("-" * 50)

# Get all redeemed coupon UPCs
redeemed_coupon_upcs = coupon_redemptions['coupon_upc'].unique()

# Find products that appear in redeemed coupons
products_in_redeemed_coupons = (
    coupons[coupons['coupon_upc'].isin(redeemed_coupon_upcs)]
    .groupby('product_id')
    .agg({
        'coupon_upc': 'nunique',  # Number of different redeemed coupon types for this product
        'campaign_id': 'nunique'   # Number of different campaigns
    })
    .reset_index()
    .rename(columns={'coupon_upc': 'redeemed_coupon_types', 'campaign_id': 'campaigns_involved'})
)

# Get total coupon types available for each product  
products_total_coupons = (
    coupons
    .groupby('product_id')
    .agg({
        'coupon_upc': 'nunique',
        'campaign_id': 'nunique'
    })
    .reset_index()
    .rename(columns={'coupon_upc': 'total_coupon_types', 'campaign_id': 'total_campaigns'})
)

# Combine data
product_coupon_analysis = (
    products_total_coupons
    .merge(products_in_redeemed_coupons, on='product_id', how='left')
    .fillna(0)
)

# Calculate "coupon success rate" - what percentage of this product's coupons were redeemed
product_coupon_analysis['coupon_success_rate'] = (
    product_coupon_analysis['redeemed_coupon_types'] / 
    product_coupon_analysis['total_coupon_types'] * 100
)

# Filter for products with meaningful coupon presence (at least 3 coupon types)
meaningful_products = product_coupon_analysis[
    product_coupon_analysis['total_coupon_types'] >= 3
]

print(f"Products with coupons: {len(product_coupon_analysis):,}")
print(f"Products with ≥3 coupon types: {len(meaningful_products):,}")

# Analyze the distribution
success_rate_distribution = meaningful_products['coupon_success_rate'].describe()
print(f"\nCoupon Success Rate Distribution:")
print(f"- Mean: {success_rate_distribution['mean']:.1f}%")
print(f"- Median: {success_rate_distribution['50%']:.1f}%")
print(f"- Standard deviation: {success_rate_distribution['std']:.1f}%")
print(f"- 25th percentile: {success_rate_distribution['25%']:.1f}%")
print(f"- 75th percentile: {success_rate_distribution['75%']:.1f}%")

# Categorize products by performance
high_performers = meaningful_products[meaningful_products['coupon_success_rate'] == 100]
good_performers = meaningful_products[
    (meaningful_products['coupon_success_rate'] >= 75) & 
    (meaningful_products['coupon_success_rate'] < 100)
]
average_performers = meaningful_products[
    (meaningful_products['coupon_success_rate'] >= 25) & 
    (meaningful_products['coupon_success_rate'] < 75)
]
underperformers = meaningful_products[meaningful_products['coupon_success_rate'] < 25]

print(f"\nPerformance Categories:")
print(f"- High performers (100% success): {len(high_performers):,} products")
print(f"- Good performers (75-99% success): {len(good_performers):,} products")
print(f"- Average performers (25-74% success): {len(average_performers):,} products")
print(f"- Underperformers (<25% success): {len(underperformers):,} products")

# Show improvement opportunities
if len(underperformers) > 0:
    print(f"\nImprovement Opportunities (Underperforming Products):")
    top_underperformers = underperformers.nlargest(10, 'total_coupon_types')
    for i, row in top_underperformers.iterrows():
        product_id = int(row['product_id'])
        redeemed = int(row['redeemed_coupon_types'])
        total = int(row['total_coupon_types'])
        success_rate = row['coupon_success_rate']
        print(f"- Product {product_id}: {success_rate:.1f}% success ({redeemed}/{total} types)")

# Show products with mixed results for strategic focus
if len(average_performers) > 0:
    print(f"\nStrategic Focus Candidates (Average Performers):")
    top_average = average_performers.nlargest(10, 'total_coupon_types')
    for i, row in top_average.iterrows():
        product_id = int(row['product_id'])
        redeemed = int(row['redeemed_coupon_types'])
        total = int(row['total_coupon_types'])
        success_rate = row['coupon_success_rate']
        print(f"- Product {product_id}: {success_rate:.1f}% success ({redeemed}/{total} types)")

# Show a few high performers for reference
if len(high_performers) > 0:
    print(f"\nTop High Performers (100% Success - Reference):")
    sample_high_performers = high_performers.nlargest(5, 'total_coupon_types')
    for i, row in sample_high_performers.iterrows():
        product_id = int(row['product_id'])
        redeemed = int(row['redeemed_coupon_types'])
        total = int(row['total_coupon_types'])
        success_rate = row['coupon_success_rate']
        print(f"- Product {product_id}: {success_rate:.1f}% success ({redeemed}/{total} types)")

print(f"\nStrategic Recommendation:")
print(f"- Focus on the {len(average_performers + underperformers):,} products with <75% success rates")
print(f"- These represent opportunities to improve coupon campaign effectiveness")
# CORRECTED APPROACH: Handle the many-to-many coupon-product relationship properly

print("=== CORRECTED COUPON REDEMPTION ANALYSIS ===")
print("Fixed: Unrealistic redemption rates due to coupon-product duplication")

# The key insight: We need to analyze coupon UPC redemption rates, not product-specific rates
# because a single coupon redemption might be valid for multiple products

print("\nApproach 1: Analyze by Coupon UPC (most accurate)")
print("-" * 50)

# Calculate redemption rate by coupon UPC
coupon_upc_analysis = (
    coupons
    .groupby('coupon_upc')
    .agg({
        'product_id': 'nunique',  # How many products this coupon covers
        'campaign_id': 'first'    # Which campaign
    })
    .reset_index()
    .rename(columns={'product_id': 'products_covered'})
)

# Count redemptions per coupon UPC
redemptions_per_coupon = (
    coupon_redemptions
    .groupby('coupon_upc')
    .size()
    .reset_index(name='total_redemptions')
)

# Calculate coupon UPC redemption rates
coupon_rates = (
    coupon_upc_analysis
    .merge(redemptions_per_coupon, on='coupon_upc', how='left')
    .fillna(0)
)

# For realistic redemption rates, we need to properly understand distribution
# Since we can't determine exact distribution quantities, we'll focus on actual redemption counts
print("Top 10 Coupon UPCs by Total Redemptions:")
top_coupons = coupon_rates.nlargest(10, 'total_redemptions')
n = 0

for i, row in top_coupons.iterrows():
    n += 1
    redemptions = int(row['total_redemptions'])
    products = int(row['products_covered'])
    campaign = int(row['campaign_id'])
    print(f"{n:2d}. Coupon {row['coupon_upc']}: {redemptions} redemptions")
    print(f"    Covers {products} products | Campaign {campaign}")

print(f"\nOverall Coupon Performance:")
print(f"- Total unique coupon UPCs: {len(coupon_rates):,}")
print(f"- Coupons with redemptions: {len(coupon_rates[coupon_rates['total_redemptions'] > 0]):,}")
print(f"- Average redemptions per coupon: {coupon_rates['total_redemptions'].mean():.2f}")
print(f"- Total redemptions: {coupon_rates['total_redemptions'].sum():,}")

# Calculate realistic redemption statistics
redeemed_coupons = len(coupon_rates[coupon_rates['total_redemptions'] > 0])
total_coupons = len(coupon_rates)
coupon_redemption_rate = (redeemed_coupons / total_coupons) * 100

print(f"- Coupon UPC redemption rate: {coupon_redemption_rate:.1f}% ({redeemed_coupons} of {total_coupons} coupon types redeemed)")

print("\n" + "="*70)
print("\nApproach 2: Product-Level Analysis (strategic insights)")
print("-" * 50)

# Get all redeemed coupon UPCs
redeemed_coupon_upcs = coupon_redemptions['coupon_upc'].unique()

# Find products that appear in redeemed coupons
products_in_redeemed_coupons = (
    coupons[coupons['coupon_upc'].isin(redeemed_coupon_upcs)]
    .groupby('product_id')
    .agg({
        'coupon_upc': 'nunique',  # Number of different redeemed coupon types for this product
        'campaign_id': 'nunique'   # Number of different campaigns
    })
    .reset_index()
    .rename(columns={'coupon_upc': 'redeemed_coupon_types', 'campaign_id': 'campaigns_involved'})
)

# Get total coupon types available for each product  
products_total_coupons = (
    coupons
    .groupby('product_id')
    .agg({
        'coupon_upc': 'nunique',
        'campaign_id': 'nunique'
    })
    .reset_index()
    .rename(columns={'coupon_upc': 'total_coupon_types', 'campaign_id': 'total_campaigns'})
)

# Combine data
product_coupon_analysis = (
    products_total_coupons
    .merge(products_in_redeemed_coupons, on='product_id', how='left')
    .fillna(0)
)

# Calculate "coupon success rate" - what percentage of this product's coupons were redeemed
product_coupon_analysis['coupon_success_rate'] = (
    product_coupon_analysis['redeemed_coupon_types'] / 
    product_coupon_analysis['total_coupon_types'] * 100
)

# Filter for products with meaningful coupon presence (at least 3 coupon types)
meaningful_products = product_coupon_analysis[
    product_coupon_analysis['total_coupon_types'] >= 3
]

print(f"Products with coupons: {len(product_coupon_analysis):,}")
print(f"Products with ≥3 coupon types: {len(meaningful_products):,}")

# Analyze the distribution
success_rate_distribution = meaningful_products['coupon_success_rate'].describe()
print(f"\nCoupon Success Rate Distribution:")
print(f"- Mean: {success_rate_distribution['mean']:.1f}%")
print(f"- Median: {success_rate_distribution['50%']:.1f}%")
print(f"- Standard deviation: {success_rate_distribution['std']:.1f}%")
print(f"- 25th percentile: {success_rate_distribution['25%']:.1f}%")
print(f"- 75th percentile: {success_rate_distribution['75%']:.1f}%")

# Categorize products by performance
high_performers = meaningful_products[meaningful_products['coupon_success_rate'] == 100]
good_performers = meaningful_products[
    (meaningful_products['coupon_success_rate'] >= 75) & 
    (meaningful_products['coupon_success_rate'] < 100)
]
average_performers = meaningful_products[
    (meaningful_products['coupon_success_rate'] >= 25) & 
    (meaningful_products['coupon_success_rate'] < 75)
]
underperformers = meaningful_products[meaningful_products['coupon_success_rate'] < 25]

print(f"\nPerformance Categories:")
print(f"- High performers (100% success): {len(high_performers):,} products")
print(f"- Good performers (75-99% success): {len(good_performers):,} products")
print(f"- Average performers (25-74% success): {len(average_performers):,} products")
print(f"- Underperformers (<25% success): {len(underperformers):,} products")

# Show improvement opportunities
if len(underperformers) > 0:
    print(f"\nImprovement Opportunities (Underperforming Products):")
    top_underperformers = underperformers.nlargest(10, 'total_coupon_types')
    for i, row in top_underperformers.iterrows():
        product_id = int(row['product_id'])
        redeemed = int(row['redeemed_coupon_types'])
        total = int(row['total_coupon_types'])
        success_rate = row['coupon_success_rate']
        print(f"- Product {product_id}: {success_rate:.1f}% success ({redeemed}/{total} types)")

# Show products with mixed results for strategic focus
if len(average_performers) > 0:
    print(f"\nStrategic Focus Candidates (Average Performers):")
    top_average = average_performers.nlargest(10, 'total_coupon_types')
    for i, row in top_average.iterrows():
        product_id = int(row['product_id'])
        redeemed = int(row['redeemed_coupon_types'])
        total = int(row['total_coupon_types'])
        success_rate = row['coupon_success_rate']
        print(f"- Product {product_id}: {success_rate:.1f}% success ({redeemed}/{total} types)")

# Show a few high performers for reference
if len(high_performers) > 0:
    print(f"\nTop High Performers (100% Success - Reference):")
    sample_high_performers = high_performers.nlargest(5, 'total_coupon_types')
    for i, row in sample_high_performers.iterrows():
        product_id = int(row['product_id'])
        redeemed = int(row['redeemed_coupon_types'])
        total = int(row['total_coupon_types'])
        success_rate = row['coupon_success_rate']
        print(f"- Product {product_id}: {success_rate:.1f}% success ({redeemed}/{total} types)")

print(f"\nStrategic Recommendation:")
print(f"- Focus on the {len(average_performers + underperformers):,} products with <75% success rates")
print(f"- These represent opportunities to improve coupon campaign effectiveness")

=== CORRECTED COUPON REDEMPTION ANALYSIS ===
Fixed: Unrealistic redemption rates due to coupon-product duplication

Approach 1: Analyze by Coupon UPC (most accurate)
--------------------------------------------------
Top 10 Coupon UPCs by Total Redemptions:
 1. Coupon 10000085475.0: 63 redemptions
    Covers 4294 products | Campaign 18
 2. Coupon 51600070033.0: 52 redemptions
    Covers 128 products | Campaign 9
 3. Coupon 54154888076.0: 49 redemptions
    Covers 99 products | Campaign 13
 4. Coupon 10000085429.0: 45 redemptions
    Covers 5423 products | Campaign 13
 5. Coupon 10000085363.0: 42 redemptions
    Covers 4538 products | Campaign 8
 6. Coupon 10000085364.0: 39 redemptions
    Covers 1881 products | Campaign 8
 7. Coupon 10000085425.0: 35 redemptions
    Covers 4189 products | Campaign 13
 8. Coupon 10000085479.0: 34 redemptions
    Covers 5474 products | Campaign 18
 9. Coupon 10000085426.0: 28 redemptions
    Covers 3842 products | Campaign 13
10. Coupon 10000085477.0: 28 redemptions
    Covers 5686 products | Campaign 18

Overall Coupon Performance:
- Total unique coupon UPCs: 981
- Coupons with redemptions: 491
- Average redemptions per coupon: 2.14
- Total redemptions: 2,102.0
- Coupon UPC redemption rate: 50.1% (491 of 981 coupon types redeemed)

======================================================================

Approach 2: Product-Level Analysis (strategic insights)
--------------------------------------------------
Products with coupons: 41,857
Products with ≥3 coupon types: 20,270

Coupon Success Rate Distribution:
- Mean: 89.0%
- Median: 100.0%
- Standard deviation: 16.5%
- 25th percentile: 66.7%
- 75th percentile: 100.0%

Performance Categories:
- High performers (100% success): 13,545 products
- Good performers (75-99% success): 1,290 products
- Average performers (25-74% success): 5,384 products
- Underperformers (<25% success): 51 products

Improvement Opportunities (Underperforming Products):
- Product 323018: 0.0% success (0/4 types)
- Product 505074: 0.0% success (0/4 types)
- Product 957736: 0.0% success (0/4 types)
- Product 1389035: 0.0% success (0/4 types)
- Product 1465981: 0.0% success (0/4 types)
- Product 1988684: 0.0% success (0/4 types)
- Product 2182786: 0.0% success (0/4 types)
- Product 8019845: 0.0% success (0/4 types)
- Product 49411: 0.0% success (0/3 types)
- Product 52876: 0.0% success (0/3 types)

Strategic Focus Candidates (Average Performers):
- Product 9368501: 63.6% success (7/11 types)
- Product 66800: 50.0% success (5/10 types)
- Product 73584: 50.0% success (5/10 types)
- Product 85389: 50.0% success (5/10 types)
- Product 90172: 50.0% success (5/10 types)
- Product 92317: 50.0% success (5/10 types)
- Product 98133: 50.0% success (5/10 types)
- Product 101999: 50.0% success (5/10 types)
- Product 134309: 50.0% success (5/10 types)
- Product 764088: 50.0% success (5/10 types)

Top High Performers (100% Success - Reference):
- Product 71938: 100.0% success (7/7 types)
- Product 73540: 100.0% success (7/7 types)
- Product 74503: 100.0% success (7/7 types)
- Product 77341: 100.0% success (7/7 types)
- Product 77360: 100.0% success (7/7 types)

Strategic Recommendation:
- Focus on the 5,435 products with <75% success rates
- These represent opportunities to improve coupon campaign effectiveness

Visualization: Redemption Rates¶

In [6]:

Copied!





# Create strategic visualization focusing on improvement opportunities
if 'meaningful_products' in locals() and len(meaningful_products) > 0:
    # Much larger figure size to take up full screen
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(24, 18))
    
    # 1. Distribution of coupon success rates
    meaningful_products['coupon_success_rate'].hist(bins=20, alpha=0.7, edgecolor='black', ax=ax1)
    ax1.axvline(meaningful_products['coupon_success_rate'].mean(), color='red', linestyle='--', 
               label=f'Mean: {meaningful_products["coupon_success_rate"].mean():.1f}%', linewidth=2)
    ax1.axvline(meaningful_products['coupon_success_rate'].median(), color='orange', linestyle='--', 
               label=f'Median: {meaningful_products["coupon_success_rate"].median():.1f}%', linewidth=2)
    ax1.set_xlabel('Coupon Success Rate (%)', fontsize=14)
    ax1.set_ylabel('Number of Products', fontsize=14)
    ax1.set_title('Distribution of Coupon Success Rates', fontsize=18, fontweight='bold', pad=20)
    ax1.legend(fontsize=12)
    ax1.grid(True, alpha=0.3)
    ax1.tick_params(axis='both', which='major', labelsize=12)
    
    # 2. Performance categories pie chart
    performance_counts = [
        len(high_performers),
        len(good_performers), 
        len(average_performers),
        len(underperformers)
    ]
    performance_labels = ['High (100%)', 'Good (75-99%)', 'Average (25-74%)', 'Under (<25%)']
    colors = ['darkgreen', 'lightgreen', 'orange', 'red']
    
    wedges, texts, autotexts = ax2.pie(performance_counts, labels=performance_labels, autopct='%1.1f%%', 
                                       startangle=90, colors=colors, textprops={'fontsize': 14})
    ax2.set_title('Product Performance Categories', fontsize=18, fontweight='bold', pad=20)
    
    # Make pie chart text larger
    for autotext in autotexts:
        autotext.set_fontsize(14)
        autotext.set_fontweight('bold')
    
    # 3. Top improvement opportunities (underperformers)
    if len(underperformers) > 0:
        top_improvement = underperformers.nlargest(8, 'total_coupon_types')
        labels = [f"Product {int(row['product_id'])}" for _, row in top_improvement.iterrows()]
        bars3 = ax3.barh(range(len(top_improvement)), top_improvement['coupon_success_rate'], 
                        color='red', alpha=0.7, height=0.6)
        
        ax3.set_yticks(range(len(top_improvement)))
        ax3.set_yticklabels(labels, fontsize=12)
        ax3.set_xlabel('Coupon Success Rate (%)', fontsize=14)
        ax3.set_title('Top Improvement Opportunities\n(Underperforming Products)', 
                     fontsize=18, fontweight='bold', pad=20)
        ax3.invert_yaxis()
        ax3.tick_params(axis='both', which='major', labelsize=12)
        
        # Set x-axis limits with proper error handling
        max_rate = top_improvement['coupon_success_rate'].max()
        min_rate = top_improvement['coupon_success_rate'].min()
        if max_rate == min_rate:  # All values are the same
            if max_rate == 0:
                ax3.set_xlim(0, 10)  # Default range for 0% rates
            else:
                ax3.set_xlim(0, max_rate * 1.5)  # Give some padding
        else:
            ax3.set_xlim(0, max_rate * 1.3)  # Normal case with 30% padding
        
        # Add value labels with better positioning
        for i, bar in enumerate(bars3):
            width = bar.get_width()
            coupon_types = top_improvement.iloc[i]['total_coupon_types']
            # Position text appropriately
            if max_rate == min_rate and max_rate == 0:
                text_x = 1  # Fixed position for 0% rates
            else:
                current_xlim = ax3.get_xlim()[1]
                text_x = min(width + current_xlim * 0.02, current_xlim * 0.9)
            ax3.text(text_x, bar.get_y() + bar.get_height()/2, 
                    f'{width:.1f}% ({int(coupon_types)})', ha='left', va='center', 
                    fontsize=10, fontweight='bold')
    else:
        # Show analysis of low performers instead
        low_performers = meaningful_products[meaningful_products['coupon_success_rate'] < 50]
        if len(low_performers) > 0:
            top_low = low_performers.nlargest(8, 'total_coupon_types')
            labels = [f"Product {int(row['product_id'])}" for _, row in top_low.iterrows()]
            bars3 = ax3.barh(range(len(top_low)), top_low['coupon_success_rate'], 
                            color='red', alpha=0.7, height=0.6)
            
            ax3.set_yticks(range(len(top_low)))
            ax3.set_yticklabels(labels, fontsize=12)
            ax3.set_xlabel('Coupon Success Rate (%)', fontsize=14)
            ax3.set_title('Top Improvement Opportunities\n(Products <50% Success Rate)', 
                         fontsize=18, fontweight='bold', pad=20)
            ax3.invert_yaxis()
            ax3.tick_params(axis='both', which='major', labelsize=12)
            
            # Set appropriate x-axis limits
            max_rate = top_low['coupon_success_rate'].max()
            ax3.set_xlim(0, max(max_rate * 1.3, 60))
            
            # Add value labels
            for i, bar in enumerate(bars3):
                width = bar.get_width()
                coupon_types = top_low.iloc[i]['total_coupon_types']
                current_xlim = ax3.get_xlim()[1]
                text_x = min(width + current_xlim * 0.02, current_xlim * 0.9)
                ax3.text(text_x, bar.get_y() + bar.get_height()/2, 
                        f'{width:.1f}% ({int(coupon_types)})', ha='left', va='center', 
                        fontsize=10, fontweight='bold')
        else:
            ax3.text(0.5, 0.5, 'No products found with\\nlow success rates\\n(All products perform well)', 
                    ha='center', va='center', transform=ax3.transAxes, fontsize=16)
            ax3.set_title('Improvement Opportunities\n(Low Success Rate Products)', 
                         fontsize=18, fontweight='bold', pad=20)
            ax3.set_xlim(0, 100)
            ax3.set_ylim(-1, 1)
    
    # 4. Strategic focus candidates (average performers)
    if len(average_performers) > 0:
        top_strategic = average_performers.nlargest(8, 'total_coupon_types')
        labels = [f"Product {int(row['product_id'])}" for _, row in top_strategic.iterrows()]
        bars4 = ax4.barh(range(len(top_strategic)), top_strategic['coupon_success_rate'], 
                        color='orange', alpha=0.7, height=0.6)
        
        ax4.set_yticks(range(len(top_strategic)))
        ax4.set_yticklabels(labels, fontsize=12)
        ax4.set_xlabel('Coupon Success Rate (%)', fontsize=14)
        ax4.set_title('Strategic Focus Candidates\n(Average Performers)', 
                     fontsize=18, fontweight='bold', pad=20)
        ax4.invert_yaxis()
        ax4.tick_params(axis='both', which='major', labelsize=12)
        
        # Set x-axis limits with proper error handling
        max_rate = top_strategic['coupon_success_rate'].max()
        min_rate = top_strategic['coupon_success_rate'].min()
        if max_rate == min_rate:  # All values are the same
            ax4.set_xlim(max(0, max_rate - 10), max_rate + 10)  # Add padding around the value
        else:
            ax4.set_xlim(0, max_rate * 1.3)  # Normal case with 30% padding
        
        # Add value labels with better positioning
        for i, bar in enumerate(bars4):
            width = bar.get_width()
            coupon_types = top_strategic.iloc[i]['total_coupon_types']
            # Position text appropriately
            current_xlim = ax4.get_xlim()[1]
            text_x = min(width + (current_xlim - ax4.get_xlim()[0]) * 0.02, current_xlim * 0.9)
            ax4.text(text_x, bar.get_y() + bar.get_height()/2, 
                    f'{width:.1f}% ({int(coupon_types)})', ha='left', va='center', 
                    fontsize=10, fontweight='bold')
    else:
        # Show products that could be strategic focus
        potential_strategic = meaningful_products[
            (meaningful_products['coupon_success_rate'] >= 50) & 
            (meaningful_products['coupon_success_rate'] < 100)
        ]
        if len(potential_strategic) > 0:
            top_potential = potential_strategic.nlargest(8, 'total_coupon_types')
            labels = [f"Product {int(row['product_id'])}" for _, row in top_potential.iterrows()]
            bars4 = ax4.barh(range(len(top_potential)), top_potential['coupon_success_rate'], 
                            color='orange', alpha=0.7, height=0.6)
            
            ax4.set_yticks(range(len(top_potential)))
            ax4.set_yticklabels(labels, fontsize=12)
            ax4.set_xlabel('Coupon Success Rate (%)', fontsize=14)
            ax4.set_title('Strategic Focus Candidates\n(Good Performers 50-99%)', 
                         fontsize=18, fontweight='bold', pad=20)
            ax4.invert_yaxis()
            ax4.tick_params(axis='both', which='major', labelsize=12)
            
            # Set appropriate x-axis limits
            max_rate = top_potential['coupon_success_rate'].max()
            ax4.set_xlim(0, max_rate * 1.3)
            
            # Add value labels
            for i, bar in enumerate(bars4):
                width = bar.get_width()
                coupon_types = top_potential.iloc[i]['total_coupon_types']
                current_xlim = ax4.get_xlim()[1]
                text_x = min(width + (current_xlim - ax4.get_xlim()[0]) * 0.02, current_xlim * 0.9)
                ax4.text(text_x, bar.get_y() + bar.get_height()/2, 
                        f'{width:.1f}% ({int(coupon_types)})', ha='left', va='center', 
                        fontsize=10, fontweight='bold')
        else:
            ax4.text(0.5, 0.5, 'No strategic candidates found\\n(Most products are either\\nhigh or low performers)', 
                    ha='center', va='center', transform=ax4.transAxes, fontsize=16)
            ax4.set_title('Strategic Focus Candidates\n(Average Performers)', 
                         fontsize=18, fontweight='bold', pad=20)
            ax4.set_xlim(0, 100)
            ax4.set_ylim(-1, 1)
    
    # Use subplots_adjust for optimal spacing with larger figure
    plt.subplots_adjust(left=0.06, bottom=0.06, right=0.96, top=0.94, wspace=0.22, hspace=0.28)
    plt.show()
    
    print("\nStrategic Analysis Results:")
    print("- Distribution shows the spread of coupon success rates across products")
    print("- Performance categories identify improvement opportunities")
    print("- Focus on underperformers and average performers for campaign optimization")
    print("- High performers can serve as benchmarks for best practices")
    
else:
    # Fallback to coupon UPC analysis with much larger size
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(24, 10))
    
    # Top coupon UPCs by redemptions
    top_10_coupons = coupon_rates.nlargest(10, 'total_redemptions')
    bars1 = ax1.barh(range(len(top_10_coupons)), top_10_coupons['total_redemptions'], height=0.6)
    
    ax1.set_yticks(range(len(top_10_coupons)))
    ax1.set_yticklabels([f"Coupon {int(upc)}" for upc in top_10_coupons['coupon_upc']], fontsize=12)
    ax1.set_xlabel('Total Redemptions', fontsize=14)
    ax1.set_title('Top 10 Coupon UPCs by Redemptions', fontsize=18, fontweight='bold', pad=20)
    ax1.invert_yaxis()
    ax1.tick_params(axis='both', which='major', labelsize=12)
    
    # Set x-axis limits with proper error handling
    max_redemptions = top_10_coupons['total_redemptions'].max()
    min_redemptions = top_10_coupons['total_redemptions'].min()
    if max_redemptions == min_redemptions:
        ax1.set_xlim(0, max(max_redemptions * 2, 10))  # Ensure some range
    else:
        ax1.set_xlim(0, max_redemptions * 1.2)
    
    # Add value labels with better positioning
    for i, bar in enumerate(bars1):
        width = bar.get_width()
        current_xlim = ax1.get_xlim()[1]
        text_x = min(width + current_xlim * 0.02, current_xlim * 0.9)
        ax1.text(text_x, bar.get_y() + bar.get_height()/2, 
                f'{int(width)}', ha='left', va='center', fontsize=12, fontweight='bold')
    
    # Distribution of redemptions per coupon
    coupon_rates[coupon_rates['total_redemptions'] > 0]['total_redemptions'].hist(
        bins=15, alpha=0.7, edgecolor='black', ax=ax2)
    ax2.set_xlabel('Redemptions per Coupon UPC', fontsize=14)
    ax2.set_ylabel('Number of Coupon UPCs', fontsize=14)
    ax2.set_title('Distribution of Redemptions\n(Redeemed Coupons Only)', 
                 fontsize=18, fontweight='bold', pad=20)
    ax2.grid(True, alpha=0.3)
    ax2.tick_params(axis='both', which='major', labelsize=12)
    
    # Use subplots_adjust for better spacing
    plt.subplots_adjust(left=0.06, bottom=0.08, right=0.96, top=0.90, wspace=0.25)
    plt.show()
    
    print("\nCoupon UPC Analysis Results:")
    print("- Focus on actual redemption counts rather than rates")
    print("- Shows distribution of coupon performance")
    print("- Provides realistic view of coupon effectiveness")
# Create strategic visualization focusing on improvement opportunities
if 'meaningful_products' in locals() and len(meaningful_products) > 0:
    # Much larger figure size to take up full screen
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(24, 18))
    
    # 1. Distribution of coupon success rates
    meaningful_products['coupon_success_rate'].hist(bins=20, alpha=0.7, edgecolor='black', ax=ax1)
    ax1.axvline(meaningful_products['coupon_success_rate'].mean(), color='red', linestyle='--', 
               label=f'Mean: {meaningful_products["coupon_success_rate"].mean():.1f}%', linewidth=2)
    ax1.axvline(meaningful_products['coupon_success_rate'].median(), color='orange', linestyle='--', 
               label=f'Median: {meaningful_products["coupon_success_rate"].median():.1f}%', linewidth=2)
    ax1.set_xlabel('Coupon Success Rate (%)', fontsize=14)
    ax1.set_ylabel('Number of Products', fontsize=14)
    ax1.set_title('Distribution of Coupon Success Rates', fontsize=18, fontweight='bold', pad=20)
    ax1.legend(fontsize=12)
    ax1.grid(True, alpha=0.3)
    ax1.tick_params(axis='both', which='major', labelsize=12)
    
    # 2. Performance categories pie chart
    performance_counts = [
        len(high_performers),
        len(good_performers), 
        len(average_performers),
        len(underperformers)
    ]
    performance_labels = ['High (100%)', 'Good (75-99%)', 'Average (25-74%)', 'Under (<25%)']
    colors = ['darkgreen', 'lightgreen', 'orange', 'red']
    
    wedges, texts, autotexts = ax2.pie(performance_counts, labels=performance_labels, autopct='%1.1f%%', 
                                       startangle=90, colors=colors, textprops={'fontsize': 14})
    ax2.set_title('Product Performance Categories', fontsize=18, fontweight='bold', pad=20)
    
    # Make pie chart text larger
    for autotext in autotexts:
        autotext.set_fontsize(14)
        autotext.set_fontweight('bold')
    
    # 3. Top improvement opportunities (underperformers)
    if len(underperformers) > 0:
        top_improvement = underperformers.nlargest(8, 'total_coupon_types')
        labels = [f"Product {int(row['product_id'])}" for _, row in top_improvement.iterrows()]
        bars3 = ax3.barh(range(len(top_improvement)), top_improvement['coupon_success_rate'], 
                        color='red', alpha=0.7, height=0.6)
        
        ax3.set_yticks(range(len(top_improvement)))
        ax3.set_yticklabels(labels, fontsize=12)
        ax3.set_xlabel('Coupon Success Rate (%)', fontsize=14)
        ax3.set_title('Top Improvement Opportunities\n(Underperforming Products)', 
                     fontsize=18, fontweight='bold', pad=20)
        ax3.invert_yaxis()
        ax3.tick_params(axis='both', which='major', labelsize=12)
        
        # Set x-axis limits with proper error handling
        max_rate = top_improvement['coupon_success_rate'].max()
        min_rate = top_improvement['coupon_success_rate'].min()
        if max_rate == min_rate:  # All values are the same
            if max_rate == 0:
                ax3.set_xlim(0, 10)  # Default range for 0% rates
            else:
                ax3.set_xlim(0, max_rate * 1.5)  # Give some padding
        else:
            ax3.set_xlim(0, max_rate * 1.3)  # Normal case with 30% padding
        
        # Add value labels with better positioning
        for i, bar in enumerate(bars3):
            width = bar.get_width()
            coupon_types = top_improvement.iloc[i]['total_coupon_types']
            # Position text appropriately
            if max_rate == min_rate and max_rate == 0:
                text_x = 1  # Fixed position for 0% rates
            else:
                current_xlim = ax3.get_xlim()[1]
                text_x = min(width + current_xlim * 0.02, current_xlim * 0.9)
            ax3.text(text_x, bar.get_y() + bar.get_height()/2, 
                    f'{width:.1f}% ({int(coupon_types)})', ha='left', va='center', 
                    fontsize=10, fontweight='bold')
    else:
        # Show analysis of low performers instead
        low_performers = meaningful_products[meaningful_products['coupon_success_rate'] < 50]
        if len(low_performers) > 0:
            top_low = low_performers.nlargest(8, 'total_coupon_types')
            labels = [f"Product {int(row['product_id'])}" for _, row in top_low.iterrows()]
            bars3 = ax3.barh(range(len(top_low)), top_low['coupon_success_rate'], 
                            color='red', alpha=0.7, height=0.6)
            
            ax3.set_yticks(range(len(top_low)))
            ax3.set_yticklabels(labels, fontsize=12)
            ax3.set_xlabel('Coupon Success Rate (%)', fontsize=14)
            ax3.set_title('Top Improvement Opportunities\n(Products <50% Success Rate)', 
                         fontsize=18, fontweight='bold', pad=20)
            ax3.invert_yaxis()
            ax3.tick_params(axis='both', which='major', labelsize=12)
            
            # Set appropriate x-axis limits
            max_rate = top_low['coupon_success_rate'].max()
            ax3.set_xlim(0, max(max_rate * 1.3, 60))
            
            # Add value labels
            for i, bar in enumerate(bars3):
                width = bar.get_width()
                coupon_types = top_low.iloc[i]['total_coupon_types']
                current_xlim = ax3.get_xlim()[1]
                text_x = min(width + current_xlim * 0.02, current_xlim * 0.9)
                ax3.text(text_x, bar.get_y() + bar.get_height()/2, 
                        f'{width:.1f}% ({int(coupon_types)})', ha='left', va='center', 
                        fontsize=10, fontweight='bold')
        else:
            ax3.text(0.5, 0.5, 'No products found with\\nlow success rates\\n(All products perform well)', 
                    ha='center', va='center', transform=ax3.transAxes, fontsize=16)
            ax3.set_title('Improvement Opportunities\n(Low Success Rate Products)', 
                         fontsize=18, fontweight='bold', pad=20)
            ax3.set_xlim(0, 100)
            ax3.set_ylim(-1, 1)
    
    # 4. Strategic focus candidates (average performers)
    if len(average_performers) > 0:
        top_strategic = average_performers.nlargest(8, 'total_coupon_types')
        labels = [f"Product {int(row['product_id'])}" for _, row in top_strategic.iterrows()]
        bars4 = ax4.barh(range(len(top_strategic)), top_strategic['coupon_success_rate'], 
                        color='orange', alpha=0.7, height=0.6)
        
        ax4.set_yticks(range(len(top_strategic)))
        ax4.set_yticklabels(labels, fontsize=12)
        ax4.set_xlabel('Coupon Success Rate (%)', fontsize=14)
        ax4.set_title('Strategic Focus Candidates\n(Average Performers)', 
                     fontsize=18, fontweight='bold', pad=20)
        ax4.invert_yaxis()
        ax4.tick_params(axis='both', which='major', labelsize=12)
        
        # Set x-axis limits with proper error handling
        max_rate = top_strategic['coupon_success_rate'].max()
        min_rate = top_strategic['coupon_success_rate'].min()
        if max_rate == min_rate:  # All values are the same
            ax4.set_xlim(max(0, max_rate - 10), max_rate + 10)  # Add padding around the value
        else:
            ax4.set_xlim(0, max_rate * 1.3)  # Normal case with 30% padding
        
        # Add value labels with better positioning
        for i, bar in enumerate(bars4):
            width = bar.get_width()
            coupon_types = top_strategic.iloc[i]['total_coupon_types']
            # Position text appropriately
            current_xlim = ax4.get_xlim()[1]
            text_x = min(width + (current_xlim - ax4.get_xlim()[0]) * 0.02, current_xlim * 0.9)
            ax4.text(text_x, bar.get_y() + bar.get_height()/2, 
                    f'{width:.1f}% ({int(coupon_types)})', ha='left', va='center', 
                    fontsize=10, fontweight='bold')
    else:
        # Show products that could be strategic focus
        potential_strategic = meaningful_products[
            (meaningful_products['coupon_success_rate'] >= 50) & 
            (meaningful_products['coupon_success_rate'] < 100)
        ]
        if len(potential_strategic) > 0:
            top_potential = potential_strategic.nlargest(8, 'total_coupon_types')
            labels = [f"Product {int(row['product_id'])}" for _, row in top_potential.iterrows()]
            bars4 = ax4.barh(range(len(top_potential)), top_potential['coupon_success_rate'], 
                            color='orange', alpha=0.7, height=0.6)
            
            ax4.set_yticks(range(len(top_potential)))
            ax4.set_yticklabels(labels, fontsize=12)
            ax4.set_xlabel('Coupon Success Rate (%)', fontsize=14)
            ax4.set_title('Strategic Focus Candidates\n(Good Performers 50-99%)', 
                         fontsize=18, fontweight='bold', pad=20)
            ax4.invert_yaxis()
            ax4.tick_params(axis='both', which='major', labelsize=12)
            
            # Set appropriate x-axis limits
            max_rate = top_potential['coupon_success_rate'].max()
            ax4.set_xlim(0, max_rate * 1.3)
            
            # Add value labels
            for i, bar in enumerate(bars4):
                width = bar.get_width()
                coupon_types = top_potential.iloc[i]['total_coupon_types']
                current_xlim = ax4.get_xlim()[1]
                text_x = min(width + (current_xlim - ax4.get_xlim()[0]) * 0.02, current_xlim * 0.9)
                ax4.text(text_x, bar.get_y() + bar.get_height()/2, 
                        f'{width:.1f}% ({int(coupon_types)})', ha='left', va='center', 
                        fontsize=10, fontweight='bold')
        else:
            ax4.text(0.5, 0.5, 'No strategic candidates found\\n(Most products are either\\nhigh or low performers)', 
                    ha='center', va='center', transform=ax4.transAxes, fontsize=16)
            ax4.set_title('Strategic Focus Candidates\n(Average Performers)', 
                         fontsize=18, fontweight='bold', pad=20)
            ax4.set_xlim(0, 100)
            ax4.set_ylim(-1, 1)
    
    # Use subplots_adjust for optimal spacing with larger figure
    plt.subplots_adjust(left=0.06, bottom=0.06, right=0.96, top=0.94, wspace=0.22, hspace=0.28)
    plt.show()
    
    print("\nStrategic Analysis Results:")
    print("- Distribution shows the spread of coupon success rates across products")
    print("- Performance categories identify improvement opportunities")
    print("- Focus on underperformers and average performers for campaign optimization")
    print("- High performers can serve as benchmarks for best practices")
    
else:
    # Fallback to coupon UPC analysis with much larger size
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(24, 10))
    
    # Top coupon UPCs by redemptions
    top_10_coupons = coupon_rates.nlargest(10, 'total_redemptions')
    bars1 = ax1.barh(range(len(top_10_coupons)), top_10_coupons['total_redemptions'], height=0.6)
    
    ax1.set_yticks(range(len(top_10_coupons)))
    ax1.set_yticklabels([f"Coupon {int(upc)}" for upc in top_10_coupons['coupon_upc']], fontsize=12)
    ax1.set_xlabel('Total Redemptions', fontsize=14)
    ax1.set_title('Top 10 Coupon UPCs by Redemptions', fontsize=18, fontweight='bold', pad=20)
    ax1.invert_yaxis()
    ax1.tick_params(axis='both', which='major', labelsize=12)
    
    # Set x-axis limits with proper error handling
    max_redemptions = top_10_coupons['total_redemptions'].max()
    min_redemptions = top_10_coupons['total_redemptions'].min()
    if max_redemptions == min_redemptions:
        ax1.set_xlim(0, max(max_redemptions * 2, 10))  # Ensure some range
    else:
        ax1.set_xlim(0, max_redemptions * 1.2)
    
    # Add value labels with better positioning
    for i, bar in enumerate(bars1):
        width = bar.get_width()
        current_xlim = ax1.get_xlim()[1]
        text_x = min(width + current_xlim * 0.02, current_xlim * 0.9)
        ax1.text(text_x, bar.get_y() + bar.get_height()/2, 
                f'{int(width)}', ha='left', va='center', fontsize=12, fontweight='bold')
    
    # Distribution of redemptions per coupon
    coupon_rates[coupon_rates['total_redemptions'] > 0]['total_redemptions'].hist(
        bins=15, alpha=0.7, edgecolor='black', ax=ax2)
    ax2.set_xlabel('Redemptions per Coupon UPC', fontsize=14)
    ax2.set_ylabel('Number of Coupon UPCs', fontsize=14)
    ax2.set_title('Distribution of Redemptions\n(Redeemed Coupons Only)', 
                 fontsize=18, fontweight='bold', pad=20)
    ax2.grid(True, alpha=0.3)
    ax2.tick_params(axis='both', which='major', labelsize=12)
    
    # Use subplots_adjust for better spacing
    plt.subplots_adjust(left=0.06, bottom=0.08, right=0.96, top=0.90, wspace=0.25)
    plt.show()
    
    print("\nCoupon UPC Analysis Results:")
    print("- Focus on actual redemption counts rather than rates")
    print("- Shows distribution of coupon performance")
    print("- Provides realistic view of coupon effectiveness")

No description has been provided for this image

Strategic Analysis Results:
- Distribution shows the spread of coupon success rates across products
- Performance categories identify improvement opportunities
- Focus on underperformers and average performers for campaign optimization
- High performers can serve as benchmarks for best practices

Coupon Redemption Distribution Analysis¶

In [7]:

Copied!





# Analyze coupon redemption distribution (corrected approach)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Distribution of redemptions per coupon UPC
coupon_rates[coupon_rates['total_redemptions'] > 0]['total_redemptions'].hist(bins=20, alpha=0.7, edgecolor='black', ax=ax1)
ax1.set_xlabel('Redemptions per Coupon UPC')
ax1.set_ylabel('Number of Coupon UPCs')
ax1.set_title('Distribution of Redemptions per Coupon UPC\n(Redeemed Coupons Only)')
ax1.axvline(coupon_rates[coupon_rates['total_redemptions'] > 0]['total_redemptions'].mean(), 
           color='red', linestyle='--', 
           label=f'Mean: {coupon_rates[coupon_rates["total_redemptions"] > 0]["total_redemptions"].mean():.1f}')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Distribution of products covered per coupon
coupon_rates['products_covered'].hist(bins=30, alpha=0.7, edgecolor='black', ax=ax2)
ax2.set_xlabel('Products Covered per Coupon UPC')
ax2.set_ylabel('Number of Coupon UPCs')
ax2.set_title('Distribution of Products Covered per Coupon UPC')
ax2.axvline(coupon_rates['products_covered'].mean(), 
           color='red', linestyle='--', 
           label=f'Mean: {coupon_rates["products_covered"].mean():.0f}')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summary statistics (corrected)
redeemed_coupons_data = coupon_rates[coupon_rates['total_redemptions'] > 0]
print("\nCorrected Coupon Redemption Statistics:")
print(f"- Total coupon UPCs: {len(coupon_rates):,}")
print(f"- Coupon UPCs with redemptions: {len(redeemed_coupons_data):,}")
print(f"- Overall redemption rate: {len(redeemed_coupons_data) / len(coupon_rates) * 100:.1f}%")
print(f"\nAmong redeemed coupons:")
print(f"- Mean redemptions per coupon: {redeemed_coupons_data['total_redemptions'].mean():.2f}")
print(f"- Median redemptions per coupon: {redeemed_coupons_data['total_redemptions'].median():.2f}")
print(f"- Standard deviation: {redeemed_coupons_data['total_redemptions'].std():.2f}")
print(f"- 75th percentile: {redeemed_coupons_data['total_redemptions'].quantile(0.75):.2f}")
print(f"- 95th percentile: {redeemed_coupons_data['total_redemptions'].quantile(0.95):.2f}")
print(f"- Max redemptions for single coupon: {redeemed_coupons_data['total_redemptions'].max():.0f}")

print(f"\nProduct coverage:")
print(f"- Mean products per coupon: {coupon_rates['products_covered'].mean():.0f}")
print(f"- Single-product coupons: {(coupon_rates['products_covered'] == 1).sum():,} ({(coupon_rates['products_covered'] == 1).mean()*100:.1f}%)")
print(f"- Multi-product coupons: {(coupon_rates['products_covered'] > 1).sum():,} ({(coupon_rates['products_covered'] > 1).mean()*100:.1f}%)")

# Check for any remaining issues
high_redemption_coupons = redeemed_coupons_data[redeemed_coupons_data['total_redemptions'] > 50]
if len(high_redemption_coupons) > 0:
    print(f"\nHigh-redemption coupons (>50 redemptions): {len(high_redemption_coupons)}")
    print("Top high-redemption coupons:")
    for _, row in high_redemption_coupons.nlargest(5, 'total_redemptions').iterrows():
        print(f"  Coupon {row['coupon_upc']}: {row['total_redemptions']:.0f} redemptions, {row['products_covered']} products")
else:
    print("\n✅ All redemption counts are within realistic ranges")
# Analyze coupon redemption distribution (corrected approach)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Distribution of redemptions per coupon UPC
coupon_rates[coupon_rates['total_redemptions'] > 0]['total_redemptions'].hist(bins=20, alpha=0.7, edgecolor='black', ax=ax1)
ax1.set_xlabel('Redemptions per Coupon UPC')
ax1.set_ylabel('Number of Coupon UPCs')
ax1.set_title('Distribution of Redemptions per Coupon UPC\n(Redeemed Coupons Only)')
ax1.axvline(coupon_rates[coupon_rates['total_redemptions'] > 0]['total_redemptions'].mean(), 
           color='red', linestyle='--', 
           label=f'Mean: {coupon_rates[coupon_rates["total_redemptions"] > 0]["total_redemptions"].mean():.1f}')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Distribution of products covered per coupon
coupon_rates['products_covered'].hist(bins=30, alpha=0.7, edgecolor='black', ax=ax2)
ax2.set_xlabel('Products Covered per Coupon UPC')
ax2.set_ylabel('Number of Coupon UPCs')
ax2.set_title('Distribution of Products Covered per Coupon UPC')
ax2.axvline(coupon_rates['products_covered'].mean(), 
           color='red', linestyle='--', 
           label=f'Mean: {coupon_rates["products_covered"].mean():.0f}')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summary statistics (corrected)
redeemed_coupons_data = coupon_rates[coupon_rates['total_redemptions'] > 0]
print("\nCorrected Coupon Redemption Statistics:")
print(f"- Total coupon UPCs: {len(coupon_rates):,}")
print(f"- Coupon UPCs with redemptions: {len(redeemed_coupons_data):,}")
print(f"- Overall redemption rate: {len(redeemed_coupons_data) / len(coupon_rates) * 100:.1f}%")
print(f"\nAmong redeemed coupons:")
print(f"- Mean redemptions per coupon: {redeemed_coupons_data['total_redemptions'].mean():.2f}")
print(f"- Median redemptions per coupon: {redeemed_coupons_data['total_redemptions'].median():.2f}")
print(f"- Standard deviation: {redeemed_coupons_data['total_redemptions'].std():.2f}")
print(f"- 75th percentile: {redeemed_coupons_data['total_redemptions'].quantile(0.75):.2f}")
print(f"- 95th percentile: {redeemed_coupons_data['total_redemptions'].quantile(0.95):.2f}")
print(f"- Max redemptions for single coupon: {redeemed_coupons_data['total_redemptions'].max():.0f}")

print(f"\nProduct coverage:")
print(f"- Mean products per coupon: {coupon_rates['products_covered'].mean():.0f}")
print(f"- Single-product coupons: {(coupon_rates['products_covered'] == 1).sum():,} ({(coupon_rates['products_covered'] == 1).mean()*100:.1f}%)")
print(f"- Multi-product coupons: {(coupon_rates['products_covered'] > 1).sum():,} ({(coupon_rates['products_covered'] > 1).mean()*100:.1f}%)")

# Check for any remaining issues
high_redemption_coupons = redeemed_coupons_data[redeemed_coupons_data['total_redemptions'] > 50]
if len(high_redemption_coupons) > 0:
    print(f"\nHigh-redemption coupons (>50 redemptions): {len(high_redemption_coupons)}")
    print("Top high-redemption coupons:")
    for _, row in high_redemption_coupons.nlargest(5, 'total_redemptions').iterrows():
        print(f"  Coupon {row['coupon_upc']}: {row['total_redemptions']:.0f} redemptions, {row['products_covered']} products")
else:
    print("\n✅ All redemption counts are within realistic ranges")

Corrected Coupon Redemption Statistics:
- Total coupon UPCs: 981
- Coupon UPCs with redemptions: 491
- Overall redemption rate: 50.1%

Among redeemed coupons:
- Mean redemptions per coupon: 4.28
- Median redemptions per coupon: 2.00
- Standard deviation: 7.02
- 75th percentile: 4.00
- 95th percentile: 16.00
- Max redemptions for single coupon: 63

Product coverage:
- Mean products per coupon: 107
- Single-product coupons: 39 (4.0%)
- Multi-product coupons: 942 (96.0%)

High-redemption coupons (>50 redemptions): 2
Top high-redemption coupons:
  Coupon 10000085475.0: 63 redemptions, 4294.0 products
  Coupon 51600070033.0: 52 redemptions, 128.0 products

Department-Level Coupon Analysis¶

In [8]:

Copied!





# Analyze coupon performance by department using our corrected approach
if 'meaningful_products' in locals() and len(meaningful_products) > 0:
    # Check if products DataFrame is available and valid
    try:
        # Ensure we have the products dataset available
        if 'products' not in locals() or not hasattr(products, 'merge'):
            # Reload products from the original data if needed
            products = data["products"]
        
        print("Coupon Performance by Department:")
        print("=" * 105)
        
        # Merge with departments - add error checking
        dept_products = meaningful_products.merge(
            products[['product_id', 'department']], 
            on='product_id', 
            how='left'
        )
        
        print(f"Successfully merged {len(dept_products)} products with department information")
        
        # Check for missing department information
        missing_dept = dept_products['department'].isna().sum()
        if missing_dept > 0:
            print(f"Warning: {missing_dept} products have missing department information")
        
        # Calculate department-level statistics separately to avoid MultiIndex complications
        dept_stats = []
        
        for dept in dept_products['department'].dropna().unique():
            dept_data = dept_products[dept_products['department'] == dept]
            
            # Calculate statistics for this department
            avg_success = dept_data['coupon_success_rate'].mean()
            median_success = dept_data['coupon_success_rate'].median()
            product_count = len(dept_data)
            total_redeemed = dept_data['redeemed_coupon_types'].sum()
            total_available = dept_data['total_coupon_types'].sum()
            
            # Calculate department success rate
            dept_success_rate = (total_redeemed / total_available * 100) if total_available > 0 else 0
            
            dept_stats.append({
                'department': dept,
                'avg_success_rate': round(avg_success, 1),
                'median_success_rate': round(median_success, 1),
                'product_count': product_count,
                'total_redeemed_types': int(total_redeemed),
                'total_available_types': int(total_available),
                'dept_success_rate': round(dept_success_rate, 1)
            })
        
        # Sort by department success rate
        dept_stats.sort(key=lambda x: x['dept_success_rate'], reverse=True)
        
        # Display results
        print(f"\n{'Department':<20} {'Dept Rate':<10} {'Avg Rate':<10} {'Median':<8} {'Products':<9} {'Redeemed':<10} {'Available':<10}")
        print("-" * 105)

        for stat in dept_stats:
            print(f"{stat['department']:<20} {stat['dept_success_rate']:>8.1f}% {stat['avg_success_rate']:>8.1f}% {stat['median_success_rate']:>6.1f}% {stat['product_count']:>7d} {stat['total_redeemed_types']:>8d} {stat['total_available_types']:>8d}")

        print("\nKey:")
        print("- Dept Rate: Overall success rate for the department (redeemed/available coupon types)")
        print("- Avg Rate: Average product-level success rate in department") 
        print("- Median: Median product-level success rate in department")
        print("- Redeemed: Total coupon types redeemed in department")
        print("- Available: Total coupon types available in department")
        
        print(f"\nSummary:")
        print(f"- Total departments analyzed: {len(dept_stats)}")
        if dept_stats:
            print(f"- Best performing department: {dept_stats[0]['department']} ({dept_stats[0]['dept_success_rate']:.1f}%)")
            print(f"- Departments with >50% success rate: {sum(1 for d in dept_stats if d['dept_success_rate'] > 50)}")
        else:
            print("- No departments found with valid data")
        
    except Exception as e:
        print(f"Error in department analysis: {str(e)}")
        print(f"Error type: {type(e).__name__}")
        print("\nTroubleshooting:")
        print(f"- products type: {type(products) if 'products' in locals() else 'Not found'}")
        print(f"- meaningful_products columns: {list(meaningful_products.columns) if 'meaningful_products' in locals() else 'Not found'}")
        
        # Try to reload the data
        print("\nAttempting to reload products data...")
        try:
            products = data["products"]
            print(f"Successfully reloaded products data: {len(products)} records")
        except Exception as reload_error:
            print(f"Failed to reload products data: {reload_error}")
    
else:
    print("Department analysis requires meaningful_products data.")
    print("This analysis shows how different departments perform with coupon campaigns.")
    print("Run the previous Product-Level Analysis section first to generate the required data.")
# Analyze coupon performance by department using our corrected approach
if 'meaningful_products' in locals() and len(meaningful_products) > 0:
    # Check if products DataFrame is available and valid
    try:
        # Ensure we have the products dataset available
        if 'products' not in locals() or not hasattr(products, 'merge'):
            # Reload products from the original data if needed
            products = data["products"]
        
        print("Coupon Performance by Department:")
        print("=" * 105)
        
        # Merge with departments - add error checking
        dept_products = meaningful_products.merge(
            products[['product_id', 'department']], 
            on='product_id', 
            how='left'
        )
        
        print(f"Successfully merged {len(dept_products)} products with department information")
        
        # Check for missing department information
        missing_dept = dept_products['department'].isna().sum()
        if missing_dept > 0:
            print(f"Warning: {missing_dept} products have missing department information")
        
        # Calculate department-level statistics separately to avoid MultiIndex complications
        dept_stats = []
        
        for dept in dept_products['department'].dropna().unique():
            dept_data = dept_products[dept_products['department'] == dept]
            
            # Calculate statistics for this department
            avg_success = dept_data['coupon_success_rate'].mean()
            median_success = dept_data['coupon_success_rate'].median()
            product_count = len(dept_data)
            total_redeemed = dept_data['redeemed_coupon_types'].sum()
            total_available = dept_data['total_coupon_types'].sum()
            
            # Calculate department success rate
            dept_success_rate = (total_redeemed / total_available * 100) if total_available > 0 else 0
            
            dept_stats.append({
                'department': dept,
                'avg_success_rate': round(avg_success, 1),
                'median_success_rate': round(median_success, 1),
                'product_count': product_count,
                'total_redeemed_types': int(total_redeemed),
                'total_available_types': int(total_available),
                'dept_success_rate': round(dept_success_rate, 1)
            })
        
        # Sort by department success rate
        dept_stats.sort(key=lambda x: x['dept_success_rate'], reverse=True)
        
        # Display results
        print(f"\n{'Department':<20} {'Dept Rate':<10} {'Avg Rate':<10} {'Median':<8} {'Products':<9} {'Redeemed':<10} {'Available':<10}")
        print("-" * 105)

        for stat in dept_stats:
            print(f"{stat['department']:<20} {stat['dept_success_rate']:>8.1f}% {stat['avg_success_rate']:>8.1f}% {stat['median_success_rate']:>6.1f}% {stat['product_count']:>7d} {stat['total_redeemed_types']:>8d} {stat['total_available_types']:>8d}")

        print("\nKey:")
        print("- Dept Rate: Overall success rate for the department (redeemed/available coupon types)")
        print("- Avg Rate: Average product-level success rate in department") 
        print("- Median: Median product-level success rate in department")
        print("- Redeemed: Total coupon types redeemed in department")
        print("- Available: Total coupon types available in department")
        
        print(f"\nSummary:")
        print(f"- Total departments analyzed: {len(dept_stats)}")
        if dept_stats:
            print(f"- Best performing department: {dept_stats[0]['department']} ({dept_stats[0]['dept_success_rate']:.1f}%)")
            print(f"- Departments with >50% success rate: {sum(1 for d in dept_stats if d['dept_success_rate'] > 50)}")
        else:
            print("- No departments found with valid data")
        
    except Exception as e:
        print(f"Error in department analysis: {str(e)}")
        print(f"Error type: {type(e).__name__}")
        print("\nTroubleshooting:")
        print(f"- products type: {type(products) if 'products' in locals() else 'Not found'}")
        print(f"- meaningful_products columns: {list(meaningful_products.columns) if 'meaningful_products' in locals() else 'Not found'}")
        
        # Try to reload the data
        print("\nAttempting to reload products data...")
        try:
            products = data["products"]
            print(f"Successfully reloaded products data: {len(products)} records")
        except Exception as reload_error:
            print(f"Failed to reload products data: {reload_error}")
    
else:
    print("Department analysis requires meaningful_products data.")
    print("This analysis shows how different departments perform with coupon campaigns.")
    print("Run the previous Product-Level Analysis section first to generate the required data.")

Coupon Performance by Department:
=========================================================================================================
Successfully merged 20270 products with department information
Warning: 4 products have missing department information

Department           Dept Rate  Avg Rate   Median   Products  Redeemed   Available 
---------------------------------------------------------------------------------------------------------
SEAFOOD                 100.0%    100.0%  100.0%      15       55       55
MEAT                     99.9%    100.0%  100.0%    2466     7418     7422
MEAT-PCKGD               98.9%     99.1%  100.0%    2325     7318     7401
COSMETICS                98.9%     99.1%  100.0%     787     2357     2384
DELI                     95.8%     95.8%  100.0%       8       23       24
GROCERY                  92.1%     92.8%  100.0%    7446    25053    27201
NUTRITION                84.0%     84.1%  100.0%     565     1449     1726
SEAFOOD-PCKGD            82.8%     91.7%  100.0%     346     1298     1567
DRUG GM                  82.7%     85.0%  100.0%    3043     9227    11153
MISCELLANEOUS            79.1%     79.0%   66.7%      54      129      163
COUPON                   75.0%     78.3%   75.0%       3        9       12
SALAD BAR                68.6%     68.1%   66.7%      23       48       70
TRAVEL & LEISURE         66.7%     66.7%   66.7%      10       20       30
GARDEN CENTER            66.7%     66.7%   66.7%     112      224      336
CHEF SHOPPE              66.7%     66.7%   66.7%       7       14       21
FROZEN GROCERY           66.7%     66.7%   66.7%       4        8       12
PASTRY                   66.7%     66.7%   66.7%       3        6        9
FLORAL                   66.5%     66.6%   66.7%     852     1704     2561
PRODUCE                  65.8%     66.5%   66.7%    2197     4542     6903

Key:
- Dept Rate: Overall success rate for the department (redeemed/available coupon types)
- Avg Rate: Average product-level success rate in department
- Median: Median product-level success rate in department
- Redeemed: Total coupon types redeemed in department
- Available: Total coupon types available in department

Summary:
- Total departments analyzed: 19
- Best performing department: SEAFOOD (100.0%)
- Departments with >50% success rate: 19

Campaign Effectiveness Analysis¶

In [9]:

Copied!





# Analyze coupon redemption by campaign
campaign_coupon_analysis = (
    coupon_redemptions
    .groupby('campaign_id')
    .size()
    .reset_index(name='redemptions')
    .merge(
        coupons.groupby('campaign_id')['coupon_upc'].nunique().reset_index().rename(columns={'coupon_upc': 'total_coupons'}),
        on='campaign_id'
    )
)

campaign_coupon_analysis['campaign_redemption_rate'] = (
    campaign_coupon_analysis['redemptions'] / campaign_coupon_analysis['total_coupons'] * 100
).round(2)

# Add campaign descriptions
campaign_coupon_detailed = campaign_coupon_analysis.merge(
    campaign_descriptions[['campaign_id', 'campaign_type']],
    on='campaign_id',
    how='left'
)

# Sort by redemption rate
campaign_coupon_detailed = campaign_coupon_detailed.sort_values('campaign_redemption_rate', ascending=False)

print("Top 10 Campaigns by Coupon Redemption Rate:")
campaign_coupon_detailed.head(10)
# Analyze coupon redemption by campaign
campaign_coupon_analysis = (
    coupon_redemptions
    .groupby('campaign_id')
    .size()
    .reset_index(name='redemptions')
    .merge(
        coupons.groupby('campaign_id')['coupon_upc'].nunique().reset_index().rename(columns={'coupon_upc': 'total_coupons'}),
        on='campaign_id'
    )
)

campaign_coupon_analysis['campaign_redemption_rate'] = (
    campaign_coupon_analysis['redemptions'] / campaign_coupon_analysis['total_coupons'] * 100
).round(2)

# Add campaign descriptions
campaign_coupon_detailed = campaign_coupon_analysis.merge(
    campaign_descriptions[['campaign_id', 'campaign_type']],
    on='campaign_id',
    how='left'
)

# Sort by redemption rate
campaign_coupon_detailed = campaign_coupon_detailed.sort_values('campaign_redemption_rate', ascending=False)

print("Top 10 Campaigns by Coupon Redemption Rate:")
campaign_coupon_detailed.head(10)

Top 10 Campaigns by Coupon Redemption Rate:

Out[9]:

	campaign_id	redemptions	total_coupons	campaign_redemption_rate	campaign_type
15	16	43	13	330.77	Type B
17	18	653	209	312.44	Type A
12	13	629	207	303.86	Type A
18	19	29	11	263.64	Type B
8	9	43	18	238.89	Type B
16	17	45	19	236.84	Type B
21	22	43	21	204.76	Type B
7	8	372	209	177.99	Type A
11	12	26	15	173.33	Type B
13	14	34	26	130.77	Type C

Visualization: Campaign Performance¶

In [10]:

Copied!





# Plot campaign redemption rates by campaign type
fig, ax = plt.subplots(figsize=(12, 6))

# Box plot by campaign type
campaign_types = campaign_coupon_detailed['campaign_type'].unique()
campaign_type_data = [campaign_coupon_detailed[campaign_coupon_detailed['campaign_type'] == ct]['campaign_redemption_rate'] 
                     for ct in campaign_types]

# Fix the deprecation warning by using tick_labels instead of labels
bp = ax.boxplot(campaign_type_data, tick_labels=campaign_types, patch_artist=True)

# Color the boxes
colors = sns.color_palette("Set2", len(campaign_types))
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)

ax.set_ylabel('Redemption Rate (%)')
ax.set_xlabel('Campaign Type')
ax.set_title('Coupon Redemption Rates by Campaign Type', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summary by campaign type
campaign_type_summary = (
    campaign_coupon_detailed
    .groupby('campaign_type')['campaign_redemption_rate']
    .agg(['mean', 'median', 'count'])
    .round(2)
)

print("\nCoupon Redemption Summary by Campaign Type:")
print(campaign_type_summary)
# Plot campaign redemption rates by campaign type
fig, ax = plt.subplots(figsize=(12, 6))

# Box plot by campaign type
campaign_types = campaign_coupon_detailed['campaign_type'].unique()
campaign_type_data = [campaign_coupon_detailed[campaign_coupon_detailed['campaign_type'] == ct]['campaign_redemption_rate'] 
                     for ct in campaign_types]

# Fix the deprecation warning by using tick_labels instead of labels
bp = ax.boxplot(campaign_type_data, tick_labels=campaign_types, patch_artist=True)

# Color the boxes
colors = sns.color_palette("Set2", len(campaign_types))
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)

ax.set_ylabel('Redemption Rate (%)')
ax.set_xlabel('Campaign Type')
ax.set_title('Coupon Redemption Rates by Campaign Type', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summary by campaign type
campaign_type_summary = (
    campaign_coupon_detailed
    .groupby('campaign_type')['campaign_redemption_rate']
    .agg(['mean', 'median', 'count'])
    .round(2)
)

print("\nCoupon Redemption Summary by Campaign Type:")
print(campaign_type_summary)

Coupon Redemption Summary by Campaign Type:
                 mean  median  count
campaign_type                       
Type A         207.41  240.92      4
Type B         122.83   91.67     17
Type C          85.66  100.00      5

Key Insights¶

From this corrected coupon redemption analysis, we can observe realistic and actionable patterns based on the actual data results:

Corrected Analysis Methodology¶

Fixed duplication issue: Resolved unrealistic redemption rates (previously >3000%) caused by many-to-many coupon-product relationships where one coupon UPC is valid for multiple products (average 107.3 products per coupon)
Coupon UPC focus: Analyzed actual coupon redemption counts rather than inflated product-level rates
Realistic metrics: All redemption rates and counts are now within believable ranges (50.1% overall coupon UPC redemption rate)

Coupon Performance Insights¶

Overall redemption rate: 50.1% of coupon types are actually redeemed by customers (491 out of 981 unique coupon UPCs)
Redemption patterns: Average of 2.14 redemptions per coupon, with top performers achieving 28-63 redemptions
High performers: Top coupon (10000085475) achieved 63 redemptions covering 4,294 products in Campaign 18
Multi-product advantage: 96% of coupons (942 out of 981) are valid for multiple products, with some covering up to 14,477 products
Product coverage efficiency: Multi-product coupons show higher absolute redemption counts while maintaining operational simplicity

Product-Level Strategic Insights¶

Product coverage: 41,857 products have coupon offers, with 20,270 products having meaningful coupon presence (≥3 coupon types)
Success rate distribution: 89.0% mean success rate, 100.0% median, indicating strong overall performance
Performance categories:
- High performers (100% success): 13,545 products (66.8%)
- Good performers (75-99% success): 1,290 products (6.4%)
- Average performers (25-74% success): 5,384 products (26.6%)
- Underperformers (<25% success): 51 products (0.3%)
Improvement opportunities: 5,435 products with <75% success rates represent strategic optimization targets

Department-Level Performance¶

Top performing departments: SEAFOOD (100.0%), MEAT (99.9%), MEAT-PCKGD (98.9%), COSMETICS (98.9%)
Strong performers: GROCERY (92.1% success rate) with 7,446 products and 25,053 redeemed coupon types
All departments successful: All 19 analyzed departments show >50% success rates
Volume leaders: GROCERY department leads in total coupon activity (27,201 available types), followed by DRUG GM (11,153 types)

Campaign Effectiveness¶

Campaign variation: Redemption rates range from 35.71% to 330.77% across different campaigns
Top performing campaigns: Campaign 16 (330.77%), Campaign 18 (312.44%), Campaign 13 (303.86%)
Type distribution: Type A and Type B campaigns show comparable high performance in top rankings
Campaign focus: Most effective campaigns achieve 200%+ redemption rates, indicating strong customer response

Customer Behavior Patterns¶

Household participation: 410 unique households actively redeem coupons from the dataset
Multi-campaign engagement: Top campaigns (8, 13, 18) generate 372-653 total redemptions each
Redemption concentration: Peak redemption months show clear seasonal patterns with November (25.0%) and August (20.2%) leading

Strategic Business Applications¶

Coupon Design Optimization:

Multi-product strategy: Leverage the 96% multi-product coupon model for broader appeal while maintaining tracking clarity
High-volume targeting: Focus on coupon types similar to top performers (Campaigns 16, 18, 13) that achieve 300%+ redemption rates
Department focus: Prioritize MEAT, COSMETICS, and GROCERY departments for new coupon initiatives

Campaign Planning:

Campaign type strategy: Both Type A and Type B campaigns can achieve exceptional performance (300%+ rates)
Volume planning: Design campaigns to handle 200-650 redemptions for high-performing campaign types
Seasonal timing: Align major campaigns with peak redemption periods (August, November)

Product Strategy:

Performance optimization: Address the 5,435 products with <75% success rates through improved targeting or offer design
Department expansion: Leverage success in SEAFOOD (100% success) and MEAT departments (99.9% success) for broader initiatives
Category focus: Prioritize products in high-performing departments for future coupon campaigns

Operational Planning:

Inventory management: Prepare for campaigns covering 1,000-5,000+ products (top coupon UPCs)
Redemption processing: Plan systems to handle 28-63 redemptions per top-performing coupon type
Department coordination: Ensure adequate staff and inventory in MEAT, GROCERY, and COSMETICS during campaign periods

Performance Measurement:

Primary KPIs: Target 50%+ coupon UPC redemption rates and 200%+ campaign redemption rates for success
Department benchmarks: Aim for 90%+ department success rates following GROCERY (92.1%) and MEAT (99.9%) models
Product optimization: Focus improvement efforts on the 26.6% of products with 25-74% success rates

This analysis provides a data-driven foundation for optimizing coupon strategy, campaign design, and operational planning based on proven performance patterns in the Complete Journey dataset.

Next Steps¶

Consider extending this analysis by:

Examining seasonal patterns in coupon redemption
Analyzing customer segments that are most responsive to coupons
Investigating the relationship between coupon values and redemption rates
Studying the impact of coupon redemption on customer loyalty and repeat purchases