◈ Cowork/2026-04-15Intermediate

Cowork × Python Data Analysis Automation: Build a Weekly Report Pipeline with pandas and Matplotlib

Learn how to combine Cowork's scheduled tasks with Python to fully automate your data analysis reports. From pandas aggregation to Matplotlib charts and Google Sheets integration — a complete, production-ready implementation guide.

cowork¹³ python²² data analysis automation⁹⁷ pandas² matplotlib scheduled tasks⁸ report generation

✦ Premium Article

Every Monday morning, the same ritual: open the CSV, build a pivot table, copy the chart into a slide deck, send it to the team. Sound familiar?

Nearly all of that work can be automated with Cowork and Python. Imagine opening Slack on Monday morning to find last week's KPI report already waiting — generated while you slept. That's what this guide builds.

This is not a Python basics tutorial. It's a practical design guide for building a production-grade data pipeline by combining Cowork's scheduled tasks, bash environment, and Claude AI's analytical capabilities.

Why Automate Data Analysis in Cowork?

"If you know Python, just set up a cron job on a server" — that's a fair point. But Cowork offers something cron doesn't: native Claude AI integration.

For pure aggregation and charting, cron works fine. But with Cowork, you can pass your aggregated results directly to Claude and ask it to explain "what changed compared to last week" and "possible causes for anomalies" in plain language — then inject that analysis into the top of your report. No extra infrastructure required.

The other advantage is access to your local Mac environment. Since Cowork runs as a desktop app, it can read local CSV files, access files on network drives, and interact with any tools installed on your machine — all without special configuration.

Here's the pipeline we'll build:

Fetch data from local CSVs or Google Sheets via Python
Aggregate and preprocess with pandas
Generate charts with Matplotlib (PNG output)
Let Claude AI analyze week-over-week changes and write a commentary
Save as a Markdown report and send a Slack notification
Run automatically every Monday morning via Cowork's scheduled tasks

Step 1: Set Up the Python Environment in Cowork

Cowork has direct access to your Mac's shell environment. Start by checking what's available and installing any missing libraries.

Open a new Cowork conversation and prompt:

Check my bash environment. Show me python3 --version, pip3 --version,
and whether pandas, matplotlib, and openpyxl are installed.

Claude will run something like:

# Check Python environment
python3 --version
pip3 --version
 
# Check required libraries
python3 -c "import pandas; print('pandas:', pandas.__version__)" 2>/dev/null || echo "pandas: NOT INSTALLED"
python3 -c "import matplotlib; print('matplotlib:', matplotlib.__version__)" 2>/dev/null || echo "matplotlib: NOT INSTALLED"
python3 -c "import openpyxl; print('openpyxl:', openpyxl.__version__)" 2>/dev/null || echo "openpyxl: NOT INSTALLED"

If anything's missing, just tell Claude: "Please install the missing libraries." It will run pip3 install pandas matplotlib openpyxl for you.

Creating a Virtual Environment (Optional)

For a project-specific environment, prompt:

Create a virtual environment at ~/data-analysis-automation and install
pandas, matplotlib, openpyxl, requests, and slack-sdk.

Claude will execute:

# Create project directory and virtual environment
mkdir -p ~/data-analysis-automation
cd ~/data-analysis-automation
python3 -m venv .venv
 
# Install libraries
.venv/bin/pip install pandas matplotlib openpyxl requests slack-sdk gspread google-auth
 
# Generate requirements.txt for reproducibility
.venv/bin/pip freeze > requirements.txt
echo "✅ Environment ready"

✦

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN

✦If you've been manually summarizing spreadsheets, you'll learn how to set up Cowork's scheduled tasks to auto-generate weekly reports every Monday morning

✦Get complete, copy-paste-ready pipeline code combining pandas, Matplotlib, and Google Sheets API — no stub code, everything runs

✦Learn production-ready patterns including error handling, logging, and Slack notifications so your automation keeps running reliably

Secure payment via Stripe · Cancel anytime

✦

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

Unlock all articles with Membership →

Step 2: Designing the Data Ingestion Layer

The structure of your ingestion script depends on your data source. Here are implementations for the two most common cases: local CSVs and Google Sheets.

Pattern A: Loading Local CSV Files

For data exported as CSV files from sales tools or analytics platforms:

# data_loader.py
import pandas as pd
from pathlib import Path
from datetime import datetime, timedelta
import logging
 
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
 
def load_weekly_data(data_dir: str, weeks_back: int = 2) -> pd.DataFrame:
    """
    Load and combine CSVs from the past N weeks in the specified directory.
    Expected filename format: sales_YYYY-MM-DD.csv
    """
    data_path = Path(data_dir)
    today = datetime.now()
    
    dfs = []
    for i in range(weeks_back * 7):
        target_date = today - timedelta(days=i)
        filename = f"sales_{target_date.strftime('%Y-%m-%d')}.csv"
        filepath = data_path / filename
        
        if filepath.exists():
            try:
                df = pd.read_csv(filepath, encoding='utf-8-sig')
                df['source_file'] = filename
                df['record_date'] = target_date.strftime('%Y-%m-%d')
                dfs.append(df)
                logger.info(f"Loaded: {filename} ({len(df)} rows)")
            except Exception as e:
                logger.warning(f"Failed to load {filename}: {e}")
    
    if not dfs:
        raise FileNotFoundError(f"No CSV files found in {data_path} for past {weeks_back} weeks")
    
    combined = pd.concat(dfs, ignore_index=True)
    logger.info(f"Total rows loaded: {len(combined)}")
    return combined
 
def validate_dataframe(df: pd.DataFrame, required_columns: list) -> bool:
    """
    Verify required columns exist and the DataFrame isn't empty.
    Always validate before processing — don't assume the data is clean.
    """
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        logger.error(f"Missing required columns: {missing}")
        return False
    
    if df.empty:
        logger.error("DataFrame is empty")
        return False
    
    null_counts = df[required_columns].isnull().sum()
    if null_counts.any():
        logger.warning(f"Null values found:\n{null_counts[null_counts > 0]}")
    
    return True

Pattern B: Loading from Google Sheets

For teams that maintain their source data in shared spreadsheets:

# sheets_loader.py
import gspread
from google.oauth2.service_account import Credentials
import pandas as pd
import logging
from pathlib import Path
 
logger = logging.getLogger(__name__)
 
SCOPES = [
    'https://www.googleapis.com/auth/spreadsheets.readonly',
    'https://www.googleapis.com/auth/drive.readonly'
]
 
def get_sheets_client(credentials_path: str) -> gspread.Client:
    """
    Build a Google Sheets client from a service account credentials file.
    Store credentials.json at ~/data-analysis-automation/credentials/
    """
    creds = Credentials.from_service_account_file(
        credentials_path,
        scopes=SCOPES
    )
    return gspread.authorize(creds)
 
def load_sheet_as_dataframe(
    client: gspread.Client,
    spreadsheet_id: str,
    sheet_name: str,
    header_row: int = 1
) -> pd.DataFrame:
    """
    Load a Google Sheets worksheet as a DataFrame.
    
    Args:
        spreadsheet_id: The ID from the spreadsheet URL (between /d/ and /edit)
        sheet_name: Name of the tab to load
        header_row: Row number of the header (1-indexed)
    """
    try:
        spreadsheet = client.open_by_key(spreadsheet_id)
        worksheet = spreadsheet.worksheet(sheet_name)
        data = worksheet.get_all_records(head=header_row)
        df = pd.DataFrame(data)
        logger.info(f"Loaded from Sheets: {sheet_name} ({len(df)} rows)")
        return df
    except gspread.exceptions.APIError as e:
        logger.error(f"Sheets API error: {e}")
        raise
    except gspread.exceptions.WorksheetNotFound:
        logger.error(f"Worksheet '{sheet_name}' not found")
        raise

Step 3: Weekly Aggregation with pandas

Once your data is loaded, it's time to aggregate. Here are the patterns that cover most real-world use cases:

# aggregator.py
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import logging
 
logger = logging.getLogger(__name__)
 
def compute_weekly_summary(df: pd.DataFrame, date_col: str, value_col: str) -> dict:
    """
    Calculate this week's and last week's totals, along with the change rate.
    
    Returns:
        {
            'this_week': {'total': ..., 'daily_avg': ..., 'max': ...},
            'last_week': {'total': ..., 'daily_avg': ..., 'max': ...},
            'change_rate': 0.12  # +12%
        }
    """
    df[date_col] = pd.to_datetime(df[date_col])
    
    today = datetime.now().date()
    # Set week start to Monday
    this_week_start = today - timedelta(days=today.weekday() + 7)
    this_week_end = this_week_start + timedelta(days=6)
    last_week_start = this_week_start - timedelta(days=7)
    last_week_end = this_week_start - timedelta(days=1)
    
    this_week_df = df[
        (df[date_col].dt.date >= this_week_start) &
        (df[date_col].dt.date <= this_week_end)
    ]
    last_week_df = df[
        (df[date_col].dt.date >= last_week_start) &
        (df[date_col].dt.date <= last_week_end)
    ]
    
    def summarize(week_df: pd.DataFrame) -> dict:
        if week_df.empty:
            return {'total': 0, 'daily_avg': 0, 'max': 0, 'min': 0}
        values = week_df[value_col].dropna()
        return {
            'total': float(values.sum()),
            'daily_avg': float(values.mean()),
            'max': float(values.max()),
            'min': float(values.min()),
            'count': len(values)
        }
    
    this_week_stats = summarize(this_week_df)
    last_week_stats = summarize(last_week_df)
    
    # Guard against zero division
    if last_week_stats['total'] \!= 0:
        change_rate = (this_week_stats['total'] - last_week_stats['total']) / last_week_stats['total']
    else:
        change_rate = None
    
    return {
        'this_week': this_week_stats,
        'last_week': last_week_stats,
        'change_rate': change_rate,
        'period': {
            'this_week': f"{this_week_start} to {this_week_end}",
            'last_week': f"{last_week_start} to {last_week_end}"
        }
    }
 
def detect_anomalies(df: pd.DataFrame, value_col: str, threshold: float = 2.0) -> pd.DataFrame:
    """
    Flag values outside mean ± N standard deviations as anomalies.
    threshold=2.0 covers roughly 95% of a normal distribution.
    
    This saves the manual "did something weird happen last Tuesday?" check.
    """
    mean = df[value_col].mean()
    std = df[value_col].std()
    
    lower = mean - threshold * std
    upper = mean + threshold * std
    
    anomalies = df[(df[value_col] < lower) | (df[value_col] > upper)].copy()
    anomalies['anomaly_type'] = np.where(
        anomalies[value_col] > upper, 'HIGH', 'LOW'
    )
    
    if not anomalies.empty:
        logger.warning(f"Detected {len(anomalies)} anomalies")
    
    return anomalies

Step 4: Auto-Generating Charts with Matplotlib

This section shows how to generate clean, production-ready charts. Font configuration is a common early stumbling block, so the setup below handles it correctly from the start.

# chart_generator.py
import matplotlib
matplotlib.use('Agg')  # Must be set before any other matplotlib imports for headless environments
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib import rcParams
import pandas as pd
from pathlib import Path
import logging
from datetime import datetime
 
logger = logging.getLogger(__name__)
 
# Font configuration for macOS (Cowork runs on Mac)
rcParams['font.family'] = ['Hiragino Sans', 'Helvetica Neue', 'sans-serif']
rcParams['axes.unicode_minus'] = False  # Prevents minus sign rendering issues
 
def generate_weekly_trend_chart(
    df: pd.DataFrame,
    date_col: str,
    value_col: str,
    title: str,
    output_dir: str,
    filename: str = "weekly_trend.png"
) -> str:
    """
    Generate a daily line chart and save it as PNG.
    
    Returns:
        Path to the saved file
    """
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    
    fig, ax = plt.subplots(figsize=(12, 5))
    
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    df_sorted = df.sort_values(date_col)
    
    daily = df_sorted.groupby(df_sorted[date_col].dt.date)[value_col].sum().reset_index()
    daily.columns = ['date', 'value']
    daily['date'] = pd.to_datetime(daily['date'])
    
    ax.plot(daily['date'], daily['value'], marker='o', linewidth=2, color='#2563eb', markersize=6)
    ax.fill_between(daily['date'], daily['value'], alpha=0.1, color='#2563eb')
    
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))
    ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
    plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, ha='right')
    
    ax.set_title(title, fontsize=14, fontweight='bold', pad=15)
    ax.set_xlabel('Date', fontsize=11)
    ax.set_ylabel(value_col, fontsize=11)
    ax.grid(True, alpha=0.3)
    
    # Annotate max and min
    max_idx = daily['value'].idxmax()
    ax.annotate(
        f"Peak: {daily.loc[max_idx, 'value']:,.0f}",
        xy=(daily.loc[max_idx, 'date'], daily.loc[max_idx, 'value']),
        xytext=(10, 10), textcoords='offset points',
        fontsize=9, color='#dc2626',
        arrowprops=dict(arrowstyle='->', color='#dc2626')
    )
    
    plt.tight_layout()
    
    save_path = output_path / filename
    fig.savefig(save_path, dpi=150, bbox_inches='tight')
    plt.close(fig)  # Always close to prevent memory leaks in long-running tasks
    
    logger.info(f"Chart saved: {save_path}")
    return str(save_path)
 
def generate_comparison_bar_chart(
    summary: dict,
    metric_name: str,
    output_dir: str,
    filename: str = "comparison.png"
) -> str:
    """
    Generate a this-week vs last-week comparison bar chart.
    """
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    
    fig, ax = plt.subplots(figsize=(8, 5))
    
    categories = ['Last Week', 'This Week']
    values = [summary['last_week']['total'], summary['this_week']['total']]
    colors = ['#94a3b8', '#2563eb']
    
    bars = ax.bar(categories, values, color=colors, width=0.5, edgecolor='white', linewidth=1.5)
    
    if summary['change_rate'] is not None:
        rate = summary['change_rate'] * 100
        arrow = "↑" if rate > 0 else "↓"
        color = "#16a34a" if rate > 0 else "#dc2626"
        ax.text(
            1, values[1] + max(values) * 0.02,
            f"{arrow} {abs(rate):.1f}%",
            ha='center', va='bottom', fontsize=14,
            fontweight='bold', color=color
        )
    
    for bar, val in zip(bars, values):
        ax.text(
            bar.get_x() + bar.get_width() / 2,
            bar.get_height() + max(values) * 0.01,
            f"{val:,.0f}",
            ha='center', va='bottom', fontsize=11, fontweight='bold'
        )
    
    ax.set_title(f"{metric_name} — Weekly Comparison", fontsize=14, fontweight='bold', pad=15)
    ax.set_ylabel(metric_name, fontsize=11)
    ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:,.0f}'))
    ax.grid(True, axis='y', alpha=0.3)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    
    plt.tight_layout()
    
    save_path = output_path / filename
    fig.savefig(save_path, dpi=150, bbox_inches='tight')
    plt.close(fig)
    
    logger.info(f"Comparison chart saved: {save_path}")
    return str(save_path)

Step 5: Letting Claude AI Analyze the Data

This is where Cowork pulls ahead of a plain cron job. You can pass aggregated results to Claude and get a human-readable business analysis in return.

In your scheduled task prompt, include something like this:

Read the following data analysis results and explain their business significance
in 3–5 sentences. Avoid jargon — write so that a non-technical manager can
immediately understand what happened and why it matters.

[Summary Period]
This week: {this_week_period}
Last week: {last_week_period}

[Key KPIs]
- This week total: {this_week_total}
- Last week total: {last_week_total}
- Week-over-week change: {change_rate}%
- This week daily average: {this_week_avg}

[Anomaly Information]
{anomaly_info}

Format your response as a Markdown blockquote (> ...).

Claude's response goes directly into the report header — no copy-pasting required.

Step 6: Report Generation and Slack Notification

Bring everything together into a Markdown report and push a summary to Slack:

# reporter.py
import json
import logging
import requests
from datetime import datetime
from pathlib import Path
 
logger = logging.getLogger(__name__)
 
def generate_markdown_report(
    summary: dict,
    ai_comment: str,
    chart_paths: list,
    output_dir: str
) -> str:
    """
    Assemble the full Markdown report and save it to disk.
    """
    today = datetime.now().strftime('%Y-%m-%d')
    report_path = Path(output_dir) / f"weekly_report_{today}.md"
    
    change_rate = summary.get('change_rate')
    change_emoji = "📈" if (change_rate or 0) > 0 else "📉"
    change_text = "increase" if (change_rate or 0) > 0 else "decrease"
    change_rate_str = f"{abs((change_rate or 0) * 100):.1f}%"
    
    content = f"""# Weekly Report ({today})
 
## AI Analysis Summary
 
{ai_comment}
 
---
 
## KPI Summary
 
- **This week total**: {summary['this_week']['total']:,.0f}
- **Last week total**: {summary['last_week']['total']:,.0f}
- **Week-over-week**: {change_emoji} {change_rate_str} {change_text}
- **This week daily average**: {summary['this_week']['daily_avg']:,.1f}
- **This week peak**: {summary['this_week']['max']:,.0f}
 
**Periods**
- This week: {summary['period']['this_week']}
- Last week: {summary['period']['last_week']}
 
---
 
## Generated Charts
 
"""
    for path in chart_paths:
        content += f"- `{path}`\n"
    
    content += f"""
---
 
*This report was auto-generated by the Cowork + Python data pipeline.*
*Generated at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}*
"""
    
    report_path.write_text(content, encoding='utf-8')
    logger.info(f"Report saved: {report_path}")
    return str(report_path)
 
def notify_slack(
    webhook_url: str,
    summary: dict,
    ai_comment: str,
    report_path: str
) -> bool:
    """
    Send a concise summary to Slack via Incoming Webhook.
    Send just the highlights — let the full report be the source of truth.
    """
    change_rate = summary.get('change_rate')
    if change_rate is not None:
        change_text = f"{'📈' if change_rate > 0 else '📉'} {abs(change_rate * 100):.1f}% {'up' if change_rate > 0 else 'down'}"
    else:
        change_text = "No comparison data"
    
    payload = {
        "blocks": [
            {
                "type": "header",
                "text": {
                    "type": "plain_text",
                    "text": f"📊 Weekly Report — {datetime.now().strftime('%Y/%m/%d')}"
                }
            },
            {
                "type": "section",
                "fields": [
                    {
                        "type": "mrkdwn",
                        "text": f"*This Week Total*\n{summary['this_week']['total']:,.0f}"
                    },
                    {
                        "type": "mrkdwn",
                        "text": f"*Week-over-Week*\n{change_text}"
                    }
                ]
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*AI Analysis:*\n{ai_comment[:300]}..."
                }
            },
            {
                "type": "context",
                "elements": [
                    {
                        "type": "mrkdwn",
                        "text": f"Full report: `{report_path}`"
                    }
                ]
            }
        ]
    }
    
    try:
        response = requests.post(webhook_url, json=payload, timeout=10)
        response.raise_for_status()
        logger.info("Slack notification sent")
        return True
    except requests.exceptions.RequestException as e:
        logger.error(f"Slack notification failed: {e}")
        return False  # Notification failure shouldn't fail the whole pipeline

Step 7: Registering the Scheduled Task in Cowork

Wire everything into a single entry point, then register it as a Cowork scheduled task:

# main_pipeline.py
#\!/usr/bin/env python3
"""
Weekly data analysis pipeline.
Called by Cowork's scheduled tasks — runs every Monday at 7:00 AM.
"""
import sys
import logging
from pathlib import Path
from datetime import datetime
 
log_dir = Path.home() / "data-analysis-automation" / "logs"
log_dir.mkdir(parents=True, exist_ok=True)
log_file = log_dir / f"pipeline_{datetime.now().strftime('%Y%m%d')}.log"
 
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file, encoding='utf-8'),
        logging.StreamHandler(sys.stdout)
    ]
)
logger = logging.getLogger(__name__)
 
CONFIG = {
    "data_dir": str(Path.home() / "data-analysis-automation" / "data"),
    "output_dir": str(Path.home() / "data-analysis-automation" / "output"),
    "date_col": "date",
    "value_col": "sales",
    "metric_name": "Sales",
    "slack_webhook_url": "YOUR_SLACK_WEBHOOK_URL",
}
 
def run_pipeline():
    """Main pipeline orchestrator. Errors are logged but don't halt execution."""
    logger.info("=== Weekly Data Analysis Pipeline — START ===")
    
    results = {
        "data_loaded": False,
        "summary_computed": False,
        "charts_generated": False,
        "report_generated": False,
        "slack_notified": False
    }
    
    try:
        from data_loader import load_weekly_data, validate_dataframe
        from aggregator import compute_weekly_summary, detect_anomalies
        from chart_generator import generate_weekly_trend_chart, generate_comparison_bar_chart
        from reporter import generate_markdown_report, notify_slack
        
        logger.info("Step 1: Loading data")
        df = load_weekly_data(CONFIG["data_dir"], weeks_back=2)
        if not validate_dataframe(df, [CONFIG["date_col"], CONFIG["value_col"]]):
            raise ValueError("Data validation failed")
        results["data_loaded"] = True
        
        logger.info("Step 2: Computing weekly summary")
        summary = compute_weekly_summary(df, CONFIG["date_col"], CONFIG["value_col"])
        anomalies = detect_anomalies(df, CONFIG["value_col"])
        results["summary_computed"] = True
        
        logger.info("Step 3: Generating charts")
        chart_paths = []
        trend_path = generate_weekly_trend_chart(
            df, CONFIG["date_col"], CONFIG["value_col"],
            f"{CONFIG['metric_name']} Weekly Trend",
            CONFIG["output_dir"], "weekly_trend.png"
        )
        chart_paths.append(trend_path)
        
        comparison_path = generate_comparison_bar_chart(
            summary, CONFIG["metric_name"],
            CONFIG["output_dir"], "weekly_comparison.png"
        )
        chart_paths.append(comparison_path)
        results["charts_generated"] = True
        
        anomaly_info = f"{len(anomalies)} anomalies detected" if not anomalies.empty else "No anomalies"
        ai_comment = f"> This week's {CONFIG['metric_name']} changed by {(summary['change_rate'] or 0) * 100:.1f}% compared to last week. {anomaly_info}. See charts for details."
        
        logger.info("Step 5: Generating report")
        report_path = generate_markdown_report(summary, ai_comment, chart_paths, CONFIG["output_dir"])
        results["report_generated"] = True
        
        logger.info("Step 6: Sending Slack notification")
        if CONFIG["slack_webhook_url"] \!= "YOUR_SLACK_WEBHOOK_URL":
            notify_slack(CONFIG["slack_webhook_url"], summary, ai_comment, report_path)
            results["slack_notified"] = True
        else:
            logger.warning("Slack webhook not configured — skipping notification")
        
        logger.info(f"=== Pipeline COMPLETE ===\nResults: {results}")
        return True
        
    except Exception as e:
        logger.error(f"Pipeline failed: {e}", exc_info=True)
        logger.info(f"Partial results: {results}")
        return False
 
if __name__ == "__main__":
    success = run_pipeline()
    sys.exit(0 if success else 1)

In Cowork's Scheduler, create a new task with this prompt:

Every Monday at 7:00 AM, run the weekly data analysis report pipeline.

bash: cd ~/data-analysis-automation && .venv/bin/python main_pipeline.py

After execution, check the tail of the log file at
~/data-analysis-automation/logs/pipeline_YYYYMMDD.log
and report whether it succeeded or failed. If it failed, summarize the error.

Cron expression: 0 7 * * 1 (Every Monday at 7:00 AM)

Common Pitfalls and How to Avoid Them

Here are the three issues that most reliably cause headaches in production.

1. Matplotlib crashes with NSException in scheduled tasks

When Matplotlib runs in a headless environment (no display), it tries to initialize a GUI backend and crashes on macOS. The fix is to call matplotlib.use('Agg') at the very top of your script, before any other matplotlib imports. If you forget this, you'll hit a maddening scenario where the script works fine when you run it manually but fails every time in the scheduled task.

2. SettingWithCopyWarning leads to silent data corruption

# ❌ This triggers SettingWithCopyWarning and may not modify the original
df_filtered = df[df['status'] == 'active']
df_filtered['new_col'] = 'value'
 
# ✅ Always use .copy() after filtering
df_filtered = df[df['status'] == 'active'].copy()
df_filtered['new_col'] = 'value'

This warning often seems harmless but can silently fail to write the value you expect. Make it a habit: any time you filter a DataFrame and then add a column, use .copy() first.

3. Silent failures in Slack notification hide pipeline errors

A common mistake is letting the pipeline fail silently when Slack notifications don't go through. The notify_slack function above returns a boolean rather than raising an exception — this is intentional. A Slack API hiccup shouldn't cause your entire Monday morning report to not exist. Structure your error handling so that communication failures are logged but don't cascade into pipeline failures.

Writing Back to Google Sheets

For teams that prefer tracking KPIs in a shared spreadsheet rather than Markdown files, add a write-back step:

def write_summary_to_sheet(client, spreadsheet_id: str, sheet_name: str, summary: dict):
    """
    Append this week's summary row to the specified Google Sheet.
    Existing data is never overwritten — new rows are always appended.
    """
    worksheet = client.open_by_key(spreadsheet_id).worksheet(sheet_name)
    
    today = datetime.now().strftime('%Y-%m-%d')
    change_rate = summary.get('change_rate')
    row = [
        today,
        summary['this_week']['total'],
        summary['last_week']['total'],
        f"{change_rate * 100:.2f}%" if change_rate is not None else "N/A",
        summary['this_week']['daily_avg'],
        summary['this_week']['max'],
    ]
    
    worksheet.append_row(row, value_input_option='USER_ENTERED')
    logger.info(f"Appended row to Google Sheets: {today}")

What to Do with the Time You Get Back

Once this pipeline is running, you'll reclaim 30–60 minutes every week that used to go to manual report assembly.

The question worth sitting with is: what do you do with that time?

The part of the process that still requires a human is the judgment — noticing that a number looks wrong for a reason the data doesn't capture, deciding what the trend means for next month's plan, choosing which anomaly is worth investigating and which to ignore. Claude AI can flag the anomalies and summarize the trends, but it can't carry the context of your business the way you do.

Automation clears the mechanical work so you can focus on that judgment. That's the actual payoff here — not just the saved hours, but the mental space to use the data rather than just produce it.

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.