Computer Vision

Teaching machines to interpret images and video, from recognition to generation.

Computer vision is how we teach computers to make sense of pictures and video. A camera can capture an image easily, but the machine only sees a giant grid of numbers describing the brightness and colour of each dot. Computer vision is the set of methods that turn those numbers into meaning: this is a cat, there's a stop sign here, this scan looks unusual.
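To make the "grid of numbers" concrete, here is a minimal sketch in plain Python. The tiny 2x3 image, the `to_grayscale` helper, and the unweighted channel average are all illustrative assumptions; real libraries usually store images in arrays and often weight the colour channels differently.

```python
# A tiny 2x3 "image": each pixel is an (R, G, B) triple with values 0-255.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],        # red, green, blue
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],  # black, grey, white
]

def to_grayscale(img):
    """Average the three colour channels of every pixel (a simple, unweighted brightness)."""
    return [[sum(pixel) // 3 for pixel in row] for row in img]

gray = to_grayscale(image)
height, width = len(image), len(image[0])

print(f"{height} rows x {width} columns")
for row in gray:
    print(row)
```

Everything a vision system does starts from numbers like these; the rest of the field is about turning them into labels, boxes, or new images.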

Think of it like learning to read. At first a page of text is just squiggles of ink; after seeing thousands of examples, your brain instantly recognises letters, words, and whole sentences without effort. Computer-vision systems learn the same way: by studying millions of labelled images until patterns like edges, shapes, and textures become recognisable. Modern systems can go further and create brand-new images too, not just recognise them.
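As a rough illustration of what an "edge" looks like in the numbers, the sketch below compares each pixel with its right-hand neighbour. The `horizontal_edges` function is a hypothetical, hand-written filter used only for illustration; a trained system would learn its own filters from data rather than use this rule.

```python
def horizontal_edges(gray):
    """Absolute brightness difference between each pixel and its right-hand neighbour."""
    return [
        [abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in gray
    ]

# A grayscale patch that is dark on the left and bright on the right.
patch = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

edges = horizontal_edges(patch)
for row in edges:
    print(row)  # large values mark where brightness changes sharply
```

The large value in the middle of each output row marks the boundary between the dark and bright halves, which is the kind of simple pattern that more complex shapes and textures are built from.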

The main ideas

  • Image classification: Assign labels to images; the task that kicked off the deep-learning era (ImageNet).
  • Object detection & segmentation: Locate and outline objects (bounding boxes, pixel masks).
  • Image & video generation: Create visual content with diffusion and transformer models.
  • 3D & scene understanding: Depth, pose, neural radiance fields, and reconstructing the world from images.
  • Multimodal vision-language: Models that jointly understand images and text (captioning, visual question answering).
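As one small example from the list above, a bounding box can be written as four numbers, and a common way to compare a predicted box with the correct one is intersection over union (IoU). The sketch below assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates; the `iou` function name and that convention are illustrative choices, not something this page specifies.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0

predicted = (0, 0, 2, 2)
correct = (1, 1, 3, 3)
print(iou(predicted, correct))
```

A score of 1.0 means the two boxes match exactly and 0.0 means they do not overlap at all, so higher values indicate a better-placed box.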

Deep Learning · Generative AI · Robotics & Embodied AI


Want to make things?

Head to AI School: AI camps where kids build their own games.