ABX.
Track 3 · Roadmap

AI Infrastructure

Serving, retrieval, and agents — backend for AI.

The engineering discipline around LLMs in production: serving, retrieval, agents, and evaluation. Backend fundamentals, applied to the roles behind the 2026 hiring boom. This track trains AI infrastructure engineers, not model researchers: no training loops, no gradient math.

Who this track is for

Same curriculum. Two different interview loops.

New grad
L3 / L4 · new-grad premium on AI-infra roles
Targeting a first AI-infrastructure engineer role. Base comp for new-grad AI-infra roles often runs 20–40% above generalist SWE roles. The capstone is the artifact that shows you've actually served inference traffic, not just built a demo.
Experienced
L4 / L5 / L6 · frontier lab or AI-infra lateral
Targeting a move into AI infrastructure at a frontier lab (OpenAI, Anthropic, xAI) or an AI-infra company (Scale, Together, Fireworks). Backend experience translates directly; the capstone fills the AI-specific resume gap.
How the 12 months line up

Recruiting is seasonal. Your pace is yours.

US tech hiring runs hot in September–November and January–March, and is quiet the rest of the year. Your offer loop is timed to whichever window lands inside your program — we don't walk students into a dead market.

The 12-month arc is a default, not a contract. Students arriving with solid production fundamentals compress Phases 01 and 02 into weeks; students new to distributed systems take the full arc. Either way, Phase 04 starts the moment you're interview-ready — often halfway through the capstone, not after it.

Timeline legend: active recruiting window · quiet (build phase)
Four focus areas · 12-month default

What you build, at your pace.

Durations below describe a default arc. Students with stronger foundations move through Phases 01 and 02 faster, and Phase 04 runs in parallel with the capstone once onsites start landing.

  1. Phase 01
    ~3 months

    Inference primitives

    • How modern LLM serving actually works: tokenization, attention, KV cache.
    • GPU scheduling, batching, and quantization — the levers behind latency and cost.
    • Python for production: typed, async, instrumented.
    • Kubernetes for GPU workloads.
  2. Phase 02
    ~4 months

    Agentic systems at production scale

    • Vector search, retrieval ranking, and embedding-refresh discipline.
    • RAG pipelines: ingestion, chunking, re-ranking, answer synthesis.
    • Agent orchestration, tool use, and long-running workflows.
    • Evaluation: offline, online, and human-in-the-loop.
  3. Phase 03
    ~3 months

    Capstone

    • Ship a production LLM application with real traffic characteristics.
    • Tune latency, cost, and quality as three linked knobs.
    • Build the eval harness that catches regressions before users do.
  4. Phase 04
    Parallel once you're ready · timed to the next window

    Offer loop

    • AI-infra system design rehearsed at the level you're targeting.
    • Behavioral loop grounded in what you built for the capstone.
    • Online assessment (OA) and virtual onsite (VO) drill sets tailored to AI-infra interview panels.
    • Negotiation for AI-infra roles, where compensation runs higher than in classical SWE.
    New grad
    Target Sep–Nov with AI-infra companies that open new-grad reqs late — the market for AI-native new grads is still forming, which means less competition and faster loops.
    Experienced
    Frontier labs hire year-round but cluster offers around Sep–Nov or Jan–Mar. Stack your onsites inside one window to maximize negotiation leverage.
Outcome

You walk in as an AI-infra engineer who has served real inference traffic, not a prompt-engineering hobbyist.

Stack recap

Every technology you'll touch.

Core
  • Python
  • Ray
  • vLLM
  • LangChain
  • PyTorch
  • Qdrant
  • Kubernetes
  • NVIDIA
Exposure
  • Hugging Face
  • OpenAI
  • Anthropic
  • Temporal
  • MLflow
  • ONNX
  • Ollama
  • Milvus
Ready for this track?

Apply to the summer cohort.
