🎯 Situation
Every Monday morning, an operations analyst at one of our clients opened four Excel files — one per region — copied the data into a master spreadsheet, cleaned up the column names that never matched, removed the blank rows at the top of each file, and formatted the result before loading it into Power BI manually.
It took about 90 minutes. Every single week. For two years.
One Python script changed all of that.
⚠️ Challenge
The manual process had three real problems that compounded over time.
📷 The manual process
- 90 minutes every Monday before any analysis
- Column names varied by region and by person
- Blank header rows in two of the four files
- Revenue sometimes text, sometimes number
- If the analyst was away — no report
❌ What went wrong
- 2 copy-paste errors found in Q1 — caught only because a manager noticed the total didn't match
- One region had changed their column header — the master file broke silently for 3 weeks
- New analyst joining = 3-hour onboarding just for the Excel consolidation process
- Power BI refresh was manual — forgot twice in 6 months
None of these were catastrophic individually. Together, they represented a fragile, human-dependent process sitting at the foundation of the company's weekly reporting.
🔍 The Script
Here's the actual code — slightly simplified, but close to what we deployed. It uses pandas, the de facto standard data manipulation library for Python, and the built-in glob module to find files automatically:
import pandas as pd
import glob
# Read all 4 Excel files from a folder
files = glob.glob('data/*.xlsx')
df = pd.concat([pd.read_excel(f) for f in files], ignore_index=True)
# Standardize column names
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
# Clean and filter
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce').fillna(0)
df = df[df['date'].notna()]
# Aggregate by month and region
summary = df.groupby([df['date'].dt.to_period('M'), 'region'])['revenue'].sum().reset_index()
summary.columns = ['month', 'region', 'total_revenue']
# Export clean file for Power BI
summary.to_csv('output/monthly_revenue.csv', index=False)
print(f"Done. {len(summary)} rows exported.")
That's under 30 lines. What it does:
- Finds all Excel files automatically — drop a new file in the folder and it picks it up next run
- Standardizes column names — lowercase, no spaces, no trailing whitespace — regardless of who created the file
- Handles dirty data — non-numeric revenue values become 0, unparseable dates are removed cleanly
- Aggregates and exports — one clean CSV that Power BI reads on a scheduled refresh
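The cleaning steps are easy to verify in isolation. Here's a small sketch using hypothetical messy headers and values like the ones the four regional files produced (the sample data is illustrative, not from the client):

```python
import pandas as pd

# Hypothetical messy input resembling one region's file
df = pd.DataFrame({
    ' Revenue ': ['1200', 'N/A', '450.5'],
    'DATE': ['2024-01-15', 'not a date', '2024-02-01'],
})

# Same normalization the script applies: trim, lowercase, underscores
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

# Same cleaning: bad numbers become 0, bad dates become NaT and are dropped
df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce').fillna(0)
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df = df[df['date'].notna()]

print(list(df.columns))        # ['revenue', 'date']
print(df['revenue'].tolist())  # [1200.0, 450.5]
```

The `errors='coerce'` flag is what makes the script tolerant of human-entered data: invalid values become NaN/NaT instead of raising, so you decide explicitly what happens to them.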
✅ What It Cost and What It Gained
Writing and testing this script took about 3 hours. That's a one-time investment.
- Time saved: 90 min/week × 52 weeks = 78 hours/year returned to the analyst
- Error rate: dropped to zero for consolidation errors — the script either runs correctly or fails loudly
- Resilience: the analyst can be on vacation. The report still runs.
- Onboarding: a new analyst needs to understand one script, not a 12-step Excel process
- Extensibility: adding a 5th region means dropping a new file in the folder — nothing else changes
The only ongoing cost: if the source file structure changes significantly (new columns, renamed sheets), the script needs a small update. In two years, that happened once.
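A cheap guard against that failure mode is to validate the columns before processing, so a renamed header fails the run immediately instead of producing a broken master file weeks later. A minimal sketch (the required column set is an assumption based on the script above, not the deployed validation):

```python
import pandas as pd

# Columns the consolidation script relies on (after normalization)
REQUIRED = {'date', 'region', 'revenue'}

def check_columns(df: pd.DataFrame) -> None:
    # Normalize first, then compare against the expected set
    cols = set(df.columns.str.strip().str.lower().str.replace(' ', '_'))
    missing = REQUIRED - cols
    if missing:
        raise ValueError(f'Missing expected columns: {sorted(missing)}')

# Passes: headers differ only in case and whitespace
check_columns(pd.DataFrame(columns=['Date', 'Region', ' Revenue ']))
```

Called once per input file, this turns a silent three-week breakage into a one-line error message on the first run.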
💡 Summary
Python doesn't require a data engineering background to be useful. For most data consolidation tasks that happen on a schedule, 20 to 50 lines of pandas is enough to automate the entire process.
The pattern that works:
- Identify the repetitive task — anything done weekly or monthly with the same files and same steps is a candidate
- Write the script once — focus on reading, cleaning, transforming, and exporting. Don't over-engineer.
- Schedule it — Windows Task Scheduler, cron on macOS or Linux, or a cloud trigger. Run it the night before the data is needed.
- Connect Power BI to the output — a clean CSV or database table, refreshed automatically
- Add basic error handling — the script should send an alert if something goes wrong, not silently produce an empty file
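The error-handling step can be as simple as a top-level try/except that exits non-zero on failure. A minimal sketch — the function body and alert channel are placeholders, not the deployed code:

```python
import glob
import sys

def consolidate(pattern: str = 'data/*.xlsx') -> int:
    """Placeholder for the real pipeline; returns the number of files read."""
    files = glob.glob(pattern)
    if not files:
        # Fail loudly: an empty folder should stop the run, not export nothing
        raise FileNotFoundError(f'No Excel files matched {pattern!r}')
    # ... read, clean, aggregate, export as in the main script ...
    return len(files)

if __name__ == '__main__':
    try:
        n = consolidate()
        print(f'Done. {n} files consolidated.')
    except Exception as exc:
        # Swap this print for an email/Slack/Teams notification in production
        print(f'ALERT: consolidation failed: {exc}', file=sys.stderr)
        sys.exit(1)
```

The non-zero exit code matters: Task Scheduler and cron can both be configured to notify on a failed job, so "fails loudly" doesn't require any extra infrastructure.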
The question isn't whether Python is worth learning. It's whether 90 minutes of manual work every week — for years — is worth more than 3 hours of automation once.
👉 The best automation is the one you build once and forget about.
30 lines of Python running every Sunday night is exactly that.