AI Infrastructure
Serving, retrieval, and agents — backend for AI.
The engineering discipline around LLMs in production — serving, retrieval, agents, evaluation. Backend fundamentals applied to the 2026 hiring boom. This track trains AI infrastructure engineers, not model researchers — no training loops, no gradient math.
Same curriculum. Two different interview loops.
- New grad
- Targeting a first AI-infrastructure engineer role. Base comp for new-grad AI-infra often runs 20–40% above generalist SWE. The capstone is the artifact that says you've actually served inference traffic, not built a demo.
- Experienced
- Targeting a move into AI infrastructure at a frontier lab (OpenAI, Anthropic, xAI) or an AI-infra company (Scale, Together, Fireworks). Backend experience translates directly; the capstone fills the AI-specific resume gap.
Recruiting is seasonal. Your pace is yours.
US tech hiring runs hot in September–November and January–March, and is quiet the rest of the year. Your offer loop is timed to whichever window lands inside your program — we don't walk students into a dead market.
The 12-month arc is a default, not a contract. Students arriving with solid production fundamentals compress Phases 01 and 02 into weeks; students new to distributed systems take the full arc. Either way, Phase 04 starts the moment you're interview-ready — often halfway through the capstone, not after it.
What you build, at your pace.
Durations below describe a default arc. Students with stronger foundations move through Phases 01 and 02 faster, and Phase 04 runs in parallel with the capstone once onsites start landing.
- Phase 01 · ~3 months
Inference primitives
- How modern LLM serving actually works: tokenization, attention, KV cache.
- GPU scheduling, batching, and quantization — the levers behind latency and cost.
- Python for production: typed, async, instrumented.
- Kubernetes for GPU workloads.
- Phase 02 · ~4 months
Agentic systems at production scale
- Vector search, retrieval ranking, and embedding-refresh discipline.
- RAG pipelines: ingestion, chunking, re-ranking, answer synthesis.
- Agent orchestration, tool use, and long-running workflows.
- Evaluation: offline, online, and human-in-the-loop.
- Phase 03 · ~3 months
Capstone
- Ship a production LLM application with real traffic characteristics.
- Tune latency, cost, and quality as three linked knobs.
- Build the eval harness that catches regressions before users do.
- Phase 04 · Parallel once you're ready · timed to the next window
Offer loop
- AI-infra system design rehearsed at the level you're targeting.
- Behavioral loop grounded in what you built in the capstone.
- Online-assessment (OA) and virtual-onsite (VO) drill sets tailored to AI-infra interview panels.
- Negotiation for AI-infra roles, where the comp curve runs hotter than for classical SWE roles.
- New grad
- Target Sep–Nov with AI-infra companies that open new-grad reqs late. The market for AI-native new grads is still forming, which means less competition and faster loops.
- Experienced
- Frontier labs hire year-round but cluster offers around Sep–Nov or Jan–Mar. Stack your onsites inside one window to maximize negotiation leverage.
You walk in as an AI-infra engineer who has served real inference traffic, not a prompt-engineering hobbyist.
Every technology you'll touch.
Python
Ray
vLLM
LangChain
PyTorch
Qdrant
Kubernetes
NVIDIA
Hugging Face
OpenAI
Anthropic
Temporal
MLflow
ONNX
Ollama
Milvus