Computer Vision
Teaching machines to interpret images and video, from recognition to generation.
Computer vision is how we teach computers to make sense of pictures and video. A camera can capture an image easily, but the machine only sees a giant grid of numbers describing the brightness and colour of each dot. Computer vision is the set of methods that turn those numbers into meaning: this is a cat, there's a stop sign here, this scan looks unusual.
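To make the "grid of numbers" idea concrete, here is a minimal sketch in plain Python. The 4x4 image and the helper names (`mean_brightness`, `horizontal_differences`) are invented for illustration; real images are far larger, and colour images usually store three numbers per pixel (red, green, blue) rather than one.

```python
# A tiny 4x4 grayscale "image": each number is one pixel's brightness,
# from 0 (black) to 255 (white). The values are made up for illustration.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]


def mean_brightness(img):
    """Average brightness over every pixel in the grid."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def horizontal_differences(img):
    """Difference between each pixel and its right-hand neighbour.

    Large values mark places where brightness changes sharply,
    which is a very crude hint that an edge might be there.
    """
    return [
        [row[x + 1] - row[x] for x in range(len(row) - 1)]
        for row in img
    ]


height, width = len(image), len(image[0])
print(f"{height} x {width} pixels")
print(mean_brightness(image))            # 127.5
print(horizontal_differences(image)[0])  # [0, 255, 0]
```

The single large value in each row of differences lines up with the boundary between the dark left half and the bright right half. That is about as far as raw arithmetic gets; the rest of computer vision is about turning numbers like these into meaning.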
Think of it like learning to read. At first a page of text is just squiggles of ink; after seeing thousands of examples, your brain instantly recognises letters, words, and whole sentences without effort. Computer-vision systems learn in much the same way: by studying millions of labelled images until patterns like edges, shapes, and textures become recognisable. Modern systems can go further and create brand-new images too, not just recognise them.
The main ideas
- Image classification: Assign labels to images; the task that kicked off the deep-learning era (ImageNet).
- Object detection & segmentation: Locate and outline objects (bounding boxes, pixel masks).
- Image & video generation: Create visual content with diffusion and transformer models.
- 3D & scene understanding: Depth, pose, neural radiance fields, and reconstructing the world from images.
- Multimodal vision-language: Models that jointly understand images and text (captioning, visual question answering).
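The bounding boxes and pixel masks mentioned under object detection and segmentation can be sketched in a few lines of plain Python. Everything here is an illustrative assumption rather than any particular library's API: boxes are written as `(x_min, y_min, x_max, y_max)`, a mask is a grid of 0s and 1s, and the helper names are made up. Intersection over union (IoU) is one common way to score how well a predicted box matches a labelled one.

```python
def box_area(box):
    """Area of an (x_min, y_min, x_max, y_max) box; 0 if it is empty."""
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min) * max(0, y_max - y_min)


def box_iou(a, b):
    """Intersection over union: 1.0 for identical boxes, 0.0 for no overlap."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]),
               min(a[2], b[2]), min(a[3], b[3]))
    inter = box_area(overlap)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def mask_to_box(mask):
    """Smallest box that encloses every 1 in a pixel mask, or None if empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


# A pixel mask outlines the object exactly; a box only frames it.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_to_box(mask))                      # (1, 1, 3, 3)
print(box_iou((0, 0, 4, 4), (0, 0, 4, 4)))    # 1.0
print(box_iou((0, 0, 1, 1), (2, 2, 3, 3)))    # 0.0
```

A detector would output boxes like these together with a label and a confidence score, while a segmentation model would output the mask itself.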
Related areas
Deep Learning · Generative AI · Robotics & Embodied AI