AI Hardware & Compute

The silicon and systems that make modern AI possible — and the single biggest practical constraint on what gets built.

Modern AI runs on special computer chips. A regular computer chip (a CPU) is like a single brilliant chef who cooks one dish start-to-finish, very fast. But AI does the same simple sum billions of times over, so it needs the opposite: a huge kitchen crammed with thousands of ordinary cooks, all chopping at once. That kind of chip is called a GPU, and it does thousands of small calculations in parallel instead of a few big ones in sequence. Because there are so many cooks, the real traffic jam is often getting ingredients to them fast enough, that is, moving numbers between memory and the chip's compute units. Chips are expensive, scarce, and hungry for electricity, so hardware is usually the single biggest limit on what AI anyone can actually build.
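To put rough numbers on that traffic jam, here is a minimal sketch in Python. It assumes a simplified model in which each matrix crosses the memory bus exactly once, and the chip figures (100 TFLOP/s of compute, 1 TB/s of memory bandwidth) are made-up round numbers for illustration, not the specs of any real GPU. The function names are hypothetical.

```python
def matmul_intensity(m, n, k, bytes_per_element=2):
    """Calculations per byte moved for C = A @ B, with A (m x k) and B (k x n).

    Simplifying assumption: A and B are each read once and C is written once.
    """
    flops = 2 * m * n * k  # one multiply and one add per term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def bottleneck(m, n, k, peak_flops, mem_bandwidth, bytes_per_element=2):
    """Return which resource limits the multiply under the model above."""
    # The chip can only stay busy if it gets at least this many
    # calculations out of every byte it fetches.
    break_even = peak_flops / mem_bandwidth
    intensity = matmul_intensity(m, n, k, bytes_per_element)
    return "compute-bound" if intensity >= break_even else "memory-bound"


# Hypothetical chip: 100 TFLOP/s and 1 TB/s (illustrative values only).
PEAK_FLOPS = 100e12
MEM_BANDWIDTH = 1e12

# One vector times a big matrix (typical when generating one token at a time).
print(bottleneck(1, 4096, 4096, PEAK_FLOPS, MEM_BANDWIDTH))

# A big matrix times a big matrix (typical in training or large batches).
print(bottleneck(4096, 4096, 4096, PEAK_FLOPS, MEM_BANDWIDTH))
```

Under these assumptions, multiplying a single vector by a matrix does about one calculation per byte moved, so the cooks wait on ingredients. A large matrix-matrix multiply reuses each number many times and keeps them busy.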

The main ideas

GPUs, TPUs & accelerators — Why massively parallel hardware dominates deep learning, and the chips that run it.
The memory wall — HBM, bandwidth, and why memory — not raw FLOPs — is often the real bottleneck.
CUDA & kernels — The software stack that maps math onto hardware; fused kernels like FlashAttention.
Quantization & precision — FP16, BF16, INT8 and 4-bit — trading numerical precision for speed and memory.
Inference optimization — Batching, KV-caching, speculative decoding, and serving models efficiently.
Scaling laws & cost — How compute, data, and model size trade off — and what a training run actually costs.
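As a taste of the precision trade-off listed above, here is a minimal sketch of one simple quantization scheme: symmetric "absmax" rounding to 8-bit integers. It is written in plain Python for readability, the function names are hypothetical, and real libraries use more refined variants (per-channel scales, calibration data, 4-bit formats).

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumption: symmetric absmax scaling, where the largest magnitude in
    the list maps to 127.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0, 0.333]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each stored value now fits in 1 byte instead of 4 (FP32) or 2 (FP16/BF16),
# at the cost of a rounding error of at most about half of `scale`.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, worst_error)
```

The saving is in memory and bandwidth: fewer bytes per number means more of the model fits on the chip and less data has to move for each calculation.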

Related areas

Deep Learning · Data & MLOps · Edge & On-Device AI

