Data Platform

Batch, stream, and lakehouse infrastructure.

These are the systems a data-platform engineer owns end to end. Kafka is the shared backbone; every other component answers to scale, latency, and correctness.

Who this track is for

Same curriculum. Two different interview loops.

New grad: Targeting a first data or platform engineering role at FAANG, or a specialist role at a data-infrastructure company (Snowflake, Databricks, Confluent, dbt Labs). The capstone is the same; the interview loop leans on SQL depth, DSA, and one focused system-design round.
Experienced: Targeting a platform or staff-level move — often from application-side data work into real infrastructure ownership. The capstone becomes the artifact that proves you can own multi-workload tenancy, cost, and freshness SLAs.

How the 12 months line up

Recruiting is seasonal. Your pace is yours.

US tech hiring runs hot in September–November and January–March, and is quiet the rest of the year. Your offer loop is timed to whichever window lands inside your program — we don't walk students into a dead market.

The 12-month arc is a default, not a contract. Students arriving with solid production fundamentals compress Phases 01 and 02 into weeks; students new to distributed systems take the full arc. Either way, Phase 04 starts the moment you're interview-ready — often halfway through the capstone, not after it.

Timeline legend: active recruiting window · quiet build phase

Four focus areas · 12-month default

A plan built entirely around your job-search timeline.

Durations below describe the default arc. Students with stronger foundations move through Phases 01 and 02 faster, and Phase 04 runs in parallel with the capstone as soon as you're interview-ready.

Phase 01
~3 months
Storage and compute primitives
- File formats, partitioning, and lakehouse table primitives.
- Query engines: planning, push-down, vectorized execution.
- Schema evolution, backfills, and data contracts that don't rot.
- Lineage and catalog fundamentals.
Phase 02
~4 months
Batch and stream at production scale
- Batch compute engines and the shuffle-heavy patterns they handle.
- Stream processing: exactly-once, event time, watermarks, stateful operators.
- Orchestration: DAGs, retries, idempotency, and real SLAs.
- Platform observability: freshness, volume, schema drift, cost attribution.
Phase 03
~3 months
Capstone
- Ship a platform-grade pipeline that handles multiple workloads cleanly.
- Own the cost story, the freshness SLA, and the failure playbook.
- Defend schema and architectural decisions at the level you're interviewing for.
Phase 04
Parallel once you're ready · timed to the next window
Offer loop
- Platform system design rehearsed at the level you're targeting.
- Behavioral loop grounded in real ownership stories from the capstone.
- Online assessment (OA) and virtual onsite (VO) drill sets for question patterns specific to data infrastructure.
- Offer negotiation for data-platform roles — leveled and paid on their own curve.
New grad
Target Sep–Nov with data-infra-first companies (Snowflake, Databricks, Confluent), where the funnel is less crowded than for general SWE roles.
Experienced
Align onsites inside Sep–Nov or Jan–Mar. Platform roles move more slowly than product SWE roles, so starting recruiter outreach 2–3 weeks earlier than you would for a product SWE loop is standard.