All projects
Featured ProjectDec 2025 – Jan 2026

Corim Distribution: Route Optimization Pipeline

Built a Python automation pipeline processing 1,916 real-world customer records for a live distribution company, calculating driving distances and travel times via the Google Maps Distance Matrix API. Engineered robust dirty-data handling for malformed and invalid address entries, exponential backoff retry logic with a 3-attempt ceiling on failed geocoding calls, and a checkpoint-resume pattern writing incremental Excel output every 100 records to prevent data loss and allow mid-run restarts across multi-hour processing jobs.

1,916 records processed · checkpoint-resume on multi-hour runs · production-grade error recovery
1,916 Records100-Record CheckpointsExponential BackoffMulti-Hour RunsReal Client Deployment
PythonpandasGoogle Maps APIopenpyxlProduction Pipeline
Private repo · available on request

Customer records flow in · addresses are normalized · Distance Matrix calls go out in batches · failures retry with backoff · output is checkpointed every 100 records · final result is an Excel-ready distance matrix.

Customer CSV1,916 recordsAddress NormalizeDirty-data handlingMaps API BatchDistance MatrixRetry + BackoffExp · 3-attempt capCheckpoint SaveEvery 100 rowsExcel OutputDriving × time matrix

Each stage is explained twice, first for the finance reader, then for the engineer.

1. Dirty-Data Address Normalization

Finance lens

Real-world business data is messy. Distribution customer lists have typos, missing zip codes, abbreviated street names, and entries that look right but aren't. Before any optimization runs, the pipeline cleans every record so downstream API calls don't waste budget on garbage inputs and so unsalvageable records are flagged rather than silently corrupting the result.

Engineering lens

A normalization layer strips typos, expands abbreviations, infers missing fields where possible, and routes unsalvageable records to a separate exception queue rather than failing the whole batch. Validation happens before the first paid API call, not after.

2. Distance Matrix Computation at Scale

Finance lens

For each of the 1,916 customers, the model needs driving distance and travel time to every other relevant location, that's tens of thousands of pairwise queries to Google Maps. The output is the substrate for any route optimization the company runs on top of it (last-mile planning, capacity allocation, contract bid responses).

Engineering lens

Asynchronous batched calls to Google Maps Distance Matrix API. Calls are throttled to stay under quota and batched for cost efficiency, Google charges per element, so packing matters. Results are stored as a sparse matrix in Excel for downstream consumption by non-technical stakeholders.

3. Exponential Backoff & Resilient Retries

Finance lens

APIs fail. Networks flap. Across multi-hour runs, a single transient error shouldn't kill an 8-hour job. The pipeline retries failed calls with progressively longer waits before giving up after 3 attempts, and logs every failure with full context so the operator can review them after the fact instead of losing the data silently.

Engineering lens

Exponential backoff (1s, 2s, 4s) with a hard 3-attempt ceiling. Failures past that are logged with full context (input row, response code, message) to a separate exception table; the run continues. Success/failure rates are tracked across error categories so systemic issues surface quickly.

4. Checkpoint-Resume Pattern

Finance lens

If the run crashes at hour 5, you don't want to start over. Every 100 records, the pipeline saves its progress to disk so it can pick up exactly where it left off, no wasted API calls, no lost work, no second invoice from Google for the same data.

Engineering lens

Incremental Excel writes every 100 records via openpyxl. On restart, the pipeline reads the existing output, resumes from the next unprocessed record, and continues. Idempotent, safe to restart any number of times. The same pattern handles partial-failure runs and operator-induced kills.

Address normalization runs before any paid API call, dirty inputs are routed to an exception queue rather than corrupting the main run

Asynchronous batched calls to Google Maps Distance Matrix API, throttled under quota and packed for per-element cost efficiency

Exponential backoff (1s / 2s / 4s) with a 3-attempt ceiling; failures are logged with full context to a separate exception table while the run continues

Checkpoint-resume pattern: incremental Excel writes every 100 records via openpyxl; the pipeline is idempotent and safe to restart at any point in a multi-hour run