Computer Vision

Teaching machines to interpret images and video — from recognition to generation.

Computer vision is how we teach computers to make sense of pictures and video. A camera can capture an image easily, but the machine only sees a giant grid of numbers describing the brightness and colour of each dot, or pixel. Computer vision is the set of methods that turn those numbers into meaning: this is a cat, there's a stop sign here, this scan looks unusual.
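To make the "grid of numbers" idea concrete, here is a minimal sketch in plain Python. The tiny 4x4 image and the helper names `average_brightness` and `brightest_column` are made up for illustration; real images have millions of pixels and usually three numbers per pixel (red, green, blue).

```python
# A tiny 4x4 greyscale "image": each number is one pixel's brightness,
# from 0 (black) to 255 (white). The values are invented for illustration.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]


def average_brightness(img):
    """Mean of all pixel values in a greyscale image."""
    pixels = [value for row in img for value in row]
    return sum(pixels) / len(pixels)


def brightest_column(img):
    """Index of the first column whose pixels sum to the largest value."""
    width = len(img[0])
    column_sums = [sum(row[x] for row in img) for x in range(width)]
    return column_sums.index(max(column_sums))


print(average_brightness(image))  # 127.5
print(brightest_column(image))    # 2
```

To the computer, "the right half of the picture is white" is nothing more than larger numbers in columns 2 and 3. Everything else in computer vision is built on top of arithmetic like this.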

Think of it like learning to read. At first a page of text is just squiggles of ink; after seeing thousands of examples, your brain recognises letters, words, and whole sentences without effort. Computer-vision systems learn in a similar way: they study millions of labelled images until patterns like edges, shapes, and textures become recognisable. Modern systems can go further and create brand-new images, not just recognise existing ones.
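An edge is simply a place where brightness changes sharply between neighbouring pixels. The sketch below finds vertical edges by hand using a simple difference between neighbours. This is an illustrative assumption, not how a trained system works internally: a learning system is never given this rule and instead discovers similar pattern detectors on its own from labelled examples. The function name `horizontal_edges` and the sample image are hypothetical.

```python
def horizontal_edges(img):
    """For each row, the absolute brightness difference between each pixel
    and its right-hand neighbour. Large values mark a vertical edge."""
    return [
        [abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in img
    ]


# Dark on the left, bright on the right (made-up values).
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

edges = horizontal_edges(image)
for row in edges:
    print(row)  # [0, 190, 0]
```

The large value in the middle of each row marks the boundary between the dark and bright halves; flat regions produce zeros.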

The main ideas

Image classification — Assign labels to images; the task that kicked off the deep-learning era (ImageNet).
Object detection & segmentation — Locate and outline objects (bounding boxes, pixel masks).
Image & video generation — Create visual content with diffusion and transformer models.
3D & scene understanding — Depth, pose, neural radiance fields, and reconstructing the world from images.
Multimodal vision-language — Models that jointly understand images and text (captioning, visual question answering).
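For object detection, a bounding box is just four numbers, and a common way to judge a predicted box is intersection over union (IoU): the area the two boxes share divided by the area they cover together. The sketch below is a minimal version under stated assumptions: boxes are written as `(x_min, y_min, x_max, y_max)`, and the function name and sample boxes are made up for illustration.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes don't touch).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)   # where the model thinks the object is
actual = (5, 0, 15, 10)      # where a human labelled it
print(intersection_over_union(predicted, actual))
```

Here the boxes share half of their width, so the score is one third. A score of 1.0 means a perfect match and 0.0 means no overlap at all.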

Related areas

Deep Learning · Generative AI · Robotics & Embodied AI

