🎯 Situation

Every Monday morning, an operations analyst at one of our clients opened four Excel files — one per region — copied the data into a master spreadsheet, cleaned up the column names that never matched, removed the blank rows at the top of each file, and formatted the result before loading it into Power BI manually.

It took about 90 minutes. Every single week. For two years.

👉 The task was repetitive, error-prone, and completely automatable. It was also the kind of task that people do for years without questioning — because it's always been done that way, and it works well enough.

One Python script changed all of that.

⚠️ Challenge

The manual process had three real problems that compounded over time.

📷 The manual process

  • 90 minutes every Monday before any analysis
  • Column names varied by region and by person
  • Blank header rows in two of the four files
  • Revenue sometimes text, sometimes number
  • If the analyst was away — no report

❌ What went wrong

  • 2 copy-paste errors found in Q1 — caught only because a manager noticed the total didn't match
  • One region had changed their column header — the master file broke silently for 3 weeks
  • New analyst joining = 3-hour onboarding just for the Excel consolidation process
  • Power BI refresh was manual — forgot twice in 6 months

None of these were catastrophic individually. Together, they represented a fragile, human-dependent process sitting at the foundation of the company's weekly reporting.

🔍 The Script

Here's the actual code — slightly simplified, but close to what we deployed. It uses pandas, the de facto standard data-manipulation library for Python (reading .xlsx files also requires the openpyxl package), and glob from the standard library to find the files automatically:

import pandas as pd
import glob

# Read every Excel file in the folder (one per region)
files = glob.glob('data/*.xlsx')
df = pd.concat([pd.read_excel(f) for f in files], ignore_index=True)

# Standardize column names
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

# Clean and filter
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce').fillna(0)
df = df[df['date'].notna()]

# Aggregate by month and region
summary = df.groupby([df['date'].dt.to_period('M'), 'region'])['revenue'].sum().reset_index()
summary.columns = ['month', 'region', 'total_revenue']

# Export clean file for Power BI
summary.to_csv('output/monthly_revenue.csv', index=False)
print(f"Done. {len(summary)} rows exported.")

That's about 30 lines. What it does:

  • Finds all Excel files automatically — drop a new file in the folder and it picks it up next run
  • Standardizes column names — lowercase, no spaces, no trailing whitespace — regardless of who created the file
  • Handles dirty data — non-numeric revenue values become 0, unparseable dates are removed cleanly
  • Aggregates and exports — one clean CSV that Power BI reads on a scheduled refresh
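That one-line normalization does more work than it looks like. Applied to the kind of header variants the regional files actually produce (the examples below are illustrative), every spelling collapses to the same key:

```python
import pandas as pd

# Illustrative header variants of the kind different regions produce
messy = pd.Index(['  Total Revenue ', 'REGION', 'date '])

# Same chain as in the script: trim, lowercase, underscore
clean = messy.str.strip().str.lower().str.replace(' ', '_')
print(list(clean))  # → ['total_revenue', 'region', 'date']
```

Because every file passes through the same chain, downstream code can rely on one canonical name per column no matter who created the source file.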

The script runs every Sunday night via Windows Task Scheduler — zero human involvement. Monday morning, Power BI refreshes from the clean CSV automatically. The analyst arrives to a dashboard that's already current. The 90-minute task is gone.
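Registering that Sunday-night schedule is a one-line Task Scheduler command. A sketch — the task name and paths below are placeholders, not the client's actual setup:

```shell
rem Run the consolidation script every Sunday at 23:00
rem (task name and script path are illustrative)
schtasks /Create /TN "WeeklyRevenueConsolidation" ^
    /TR "py C:\scripts\consolidate.py" ^
    /SC WEEKLY /D SUN /ST 23:00
```

On macOS or Linux, a cron entry plays the same role.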

✅ What It Cost and What It Gained

Writing and testing this script took about 3 hours. That's a one-time investment.

  • Time saved: 90 min/week × 52 weeks = 78 hours/year returned to the analyst
  • Error rate: dropped to zero for consolidation errors — the script either runs correctly or fails loudly
  • Resilience: the analyst can be on vacation. The report still runs.
  • Onboarding: a new analyst needs to understand one script, not a 12-step Excel process
  • Extensibility: adding a 5th region means dropping a new file in the folder — nothing else changes

The only ongoing cost: if the source file structure changes significantly (new columns, renamed sheets), the script needs a small update. In two years, that happened once.
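A cheap way to turn that kind of structural change — and the three-week silent break from the Challenge section — into a loud failure is a small column check run on each file before concatenation. A minimal sketch; the helper name and the exact required set are assumptions based on the columns the script above uses:

```python
import pandas as pd

# Columns the consolidation script relies on, after normalization
REQUIRED = {'date', 'region', 'revenue'}

def check_columns(df: pd.DataFrame, path: str) -> None:
    """Raise immediately if a source file is missing a required column."""
    normalized = set(df.columns.str.strip().str.lower().str.replace(' ', '_'))
    missing = REQUIRED - normalized
    if missing:
        raise ValueError(f"{path}: missing expected columns {sorted(missing)}")
```

Called once per file inside the read loop, a renamed header stops the run with a clear message instead of quietly producing a wrong master file.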

💡 Summary

Python doesn't require a data engineering background to be useful. For most data consolidation tasks that happen on a schedule, 20 to 50 lines of pandas is enough to automate the entire process.

The pattern that works:

  • Identify the repetitive task — anything done weekly or monthly with the same files and same steps is a candidate
  • Write the script once — focus on reading, cleaning, transforming, and exporting. Don't over-engineer.
  • Schedule it — Windows Task Scheduler, cron on macOS or Linux, or a cloud trigger. Run it the night before the data is needed.
  • Connect Power BI to the output — a clean CSV or database table, refreshed automatically
  • Add basic error handling — the script should send an alert if something goes wrong, not silently produce an empty file

The question isn't whether Python is worth learning. It's whether 90 minutes of manual work every week — for years — is worth more than 3 hours of automation once.

👉 The best automation is the one you build once and forget about.

30 lines of Python running every Sunday night is exactly that.