🎯 Situation
A data analyst at a logistics client built a solid Python script over three weeks: it pulled daily shipment data from an API, cleaned it with pandas, and loaded it into Azure SQL. The script worked perfectly. She ran it every morning at 7 AM, manually, from her laptop. When she was on vacation, the data stopped flowing. When her laptop was in for repairs, the Power BI dashboard showed a week of blanks. The script was good. The deployment wasn't.
⚠️ Challenge
The script itself isn't the problem; the deployment is. How do you get a working Python script to run every morning without depending on one person, one laptop, and one routine? The options, from simplest to most robust:
📋 Options 1 & 2: Local scheduling
- Windows Task Scheduler — open Task Scheduler → Create Basic Task → trigger: daily at 7 AM → action: start python.exe with your script's path. Works while the machine is on. Zero cost.
- macOS/Linux cron — run crontab -e and add 0 7 * * * /usr/bin/python3 /path/to/script.py. Same limitation: the machine must be on (and awake) at 7 AM. Command-line setups for both are sketched after this list.
- Right for: personal scripts, development machines, low-stakes data that can tolerate occasional missed runs.
- Not right for: production pipelines where Power BI depends on the data being current every morning.
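For reference, here are minimal command-line setups for both platforms. The paths, task name, and log location are placeholders; adjust them to your machine:

# macOS/Linux: open your crontab with `crontab -e`, then add one line.
# Redirecting output gives you a rudimentary log for free.
0 7 * * * /usr/bin/python3 /path/to/script.py >> /path/to/pipeline.log 2>&1

# Windows: the same daily 7 AM task, created from the command line instead of the GUI
schtasks /Create /TN "DailyPipeline" /TR "python C:\scripts\pipeline.py" /SC DAILY /ST 07:00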
⛅ Options 3–5: Cloud scheduling — production grade
- Azure Functions (Consumption plan) — deploy your Python script as a Function with a Timer trigger (NCRONTAB syntax, six fields with seconds first); it runs in the cloud on schedule regardless of any laptop. Cost: ~$0.20/month for daily runs. Sketched after this list.
- Azure Data Factory (ADF) — pipeline orchestration; a Python step runs via a Custom activity (on Azure Batch) or a Databricks notebook activity. More complex to set up, but dependency management, retry logic, and monitoring come built in. Right for multi-step pipelines.
- GitHub Actions — free for public repos; private repos get 2,000 free minutes/month on the free plan. Schedule a workflow that checks out your repo and runs the script; logs come automatically. A good middle ground between local and full Azure (workflow sketch below).
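Here's what the Azure Functions option looks like in the Python v2 programming model. The function name is illustrative, and run_pipeline stands in for your existing entry point (the wrapper shown under Best Practice below); Timer triggers use six-field NCRONTAB expressions, seconds first:

import azure.functions as func
import logging

app = func.FunctionApp()

# Six-field NCRONTAB, seconds first: "0 0 7 * * *" = 07:00 (UTC by default) daily
@app.schedule(schedule="0 0 7 * * *", arg_name="mytimer", run_on_startup=False)
def daily_pipeline(mytimer: func.TimerRequest) -> None:
    logging.info("Timer fired; running pipeline")
    run_pipeline()  # your extract/transform/load entry point

And the GitHub Actions equivalent, a workflow file at .github/workflows/pipeline.yml. The script name and secret name here are assumptions, and schedule times are UTC:

name: daily-pipeline
on:
  schedule:
    - cron: "0 7 * * *"    # 07:00 UTC daily; adjust for your timezone
  workflow_dispatch:        # lets you trigger a run manually for testing
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python script.py
        env:
          SQL_CONNECTION_STRING: ${{ secrets.SQL_CONNECTION_STRING }}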
🔍 Analysis
The three production requirements for any scheduled script:
- Logging — write every run's output to a file or database: start time, end time, rows processed, errors. If something breaks, you need to know what happened and when.
- Error alerting — if the script fails, an email or Teams notification fires immediately. Not the next morning when someone notices the dashboard is blank.
- Idempotency — if the script runs twice (e.g., a retry after a failure), it should produce the correct result, not double-insert data. Use an UPSERT (MERGE in Azure SQL / T-SQL; INSERT ... ON CONFLICT DO UPDATE in PostgreSQL) or a truncate-and-reload pattern; see the sketch after this list.
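A minimal idempotent load in T-SQL, since the destination here is Azure SQL. Table and column names are hypothetical; the pattern is what matters: stage the day's batch, then MERGE it into the target keyed on a natural ID, so a re-run updates rows instead of duplicating them.

-- Stage the day's batch into dbo.shipments_staging first, then:
MERGE dbo.shipments AS target
USING dbo.shipments_staging AS source
    ON target.shipment_id = source.shipment_id
WHEN MATCHED THEN
    UPDATE SET target.status = source.status,
               target.updated_at = source.updated_at
WHEN NOT MATCHED THEN
    INSERT (shipment_id, status, updated_at)
    VALUES (source.shipment_id, source.status, source.updated_at);

Run it twice and you get the same end state: that's the idempotency test.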
✅ Best Practice
The minimal production wrapper for any Python pipeline script:
import logging
import sys

logging.basicConfig(
    filename='pipeline.log',                           # one append-only log per pipeline
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s'     # timestamp every line
)

def run_pipeline():
    logging.info("Pipeline started")
    try:
        # --- your pipeline code here ---
        rows = extract_from_api()
        cleaned = transform(rows)
        load_to_sql(cleaned)
        logging.info(f"Success: {len(cleaned)} rows loaded")
    except Exception as e:
        logging.exception("Pipeline failed")  # logs the message plus the full traceback
        send_alert(str(e))                    # Teams / email notification (sketch below)
        sys.exit(1)                           # non-zero exit so the scheduler records a failure

if __name__ == "__main__":
    run_pipeline()
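send_alert above is a stub. Here's a minimal implementation, assuming a classic Teams incoming-webhook URL (the URL is a placeholder; an SMTP email would slot in the same way):

import requests  # third-party: pip install requests

# Placeholder: paste your channel's incoming-webhook URL here
TEAMS_WEBHOOK_URL = "https://example.webhook.office.com/..."

def send_alert(message: str) -> None:
    # Incoming webhooks accept a simple JSON payload with a "text" field
    requests.post(
        TEAMS_WEBHOOK_URL,
        json={"text": f"Pipeline failed: {message}"},
        timeout=10,
    )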
💡 Summary
Scheduling is the last 10% of building a pipeline — and the part that determines whether it actually works in production. A manually run script is a dependency: it depends on a person, a laptop, and a routine. A scheduled, logged, alerting script is infrastructure.
👉 The script is done when it runs itself.
Logging, alerting, scheduling — that's the last 10% that makes the other 90% reliable.