Speech & Audio AI

Understanding and generating sound — speech, music, and everything in between.

When you talk, your voice pushes tiny waves of pressure through the air. A microphone turns those waves into an electrical signal, and the computer measures how strong that signal is thousands of times every second (16,000 times a second is common for speech, and 44,100 times for CD-quality music). The result is a long list of numbers a computer can study. Speech and audio AI is the set of tools that make sense of those numbers, or invent new ones.

Think of it like a flip-book: each page is a single frozen snapshot, but flip through them fast and you see smooth motion. Audio works the same way — thousands of tiny snapshots per second that, played back in order, become a voice, a song, or a slammed door.
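To make the "long list of numbers" idea concrete, here is a minimal Python sketch. Instead of recording a voice, it calculates the snapshots for a simple beep (a pure tone) and saves them as a WAV file that a media player can play back in order. The rate of 8,000 snapshots per second, the 440 Hz pitch, and the file name `tone.wav` are illustrative choices, not values any real speech system requires.

```python
import math
import struct
import wave

SAMPLE_RATE = 8000  # snapshots per second (illustrative; a low rate is fine for a plain beep)
FREQUENCY = 440.0   # pitch of the beep in hertz (the note A above middle C)
DURATION = 0.5      # length of the sound in seconds


def make_tone(frequency, duration, sample_rate):
    """Return one number between -1.0 and 1.0 for every snapshot of a pure tone."""
    count = int(duration * sample_rate)
    return [math.sin(2 * math.pi * frequency * n / sample_rate) for n in range(count)]


def save_wav(path, samples, sample_rate):
    """Store the numbers as a mono, 16-bit WAV file so they can be played back."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # one channel (mono)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per snapshot
        wav_file.setframerate(sample_rate)
        # Scale each number from the -1..1 range to a 16-bit whole number.
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        wav_file.writeframes(frames)


samples = make_tone(FREQUENCY, DURATION, SAMPLE_RATE)
print(len(samples))  # 4000: half a second at 8,000 snapshots per second
save_wav("tone.wav", samples, SAMPLE_RATE)
```

Each page of the flip-book here is a single number; played back at 8,000 numbers per second, they become a steady beep. A recorded voice is the same kind of list, just with a far messier pattern of numbers.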

With these tools, a computer can listen (write down what you said), speak (read text aloud in a lifelike voice), recognize who is talking, and even compose original music.

The main ideas

Speech recognition (ASR, automatic speech recognition): Turning spoken audio into text.
Text-to-speech (TTS): Generating natural-sounding speech from text.
Voice & speaker tech: Speaker identification (telling who is talking), diarization (working out who spoke when), and voice cloning, along with the ethics of copying someone's voice.
Music & audio generation: Composing and synthesizing music and sound effects.
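Diarization answers the question "who spoke when?" Many systems use a trained model to guess which speaker each short slice of a recording belongs to. The Python sketch below skips that hard part: it assumes the guesses already exist (the `labels` list and the half-second slice length are made up for illustration) and shows only the simple bookkeeping step of merging them into a timeline.

```python
from itertools import groupby

SLICE_SECONDS = 0.5  # hypothetical length of each slice of audio
# Hypothetical speaker guesses, one per slice, as a model might produce them.
labels = ["A", "A", "A", "B", "B", "A", "A", "A", "B"]


def to_segments(labels, slice_seconds):
    """Merge runs of the same speaker label into (speaker, start, end) segments."""
    segments = []
    position = 0  # index of the first slice in the current run
    for speaker, run in groupby(labels):
        length = len(list(run))  # how many slices in a row this speaker holds
        start = position * slice_seconds
        end = (position + length) * slice_seconds
        segments.append((speaker, start, end))
        position += length
    return segments


for speaker, start, end in to_segments(labels, SLICE_SECONDS):
    print(f"Speaker {speaker}: {start:.1f}s to {end:.1f}s")
```

With these made-up labels it prints four turns: Speaker A from 0.0s to 1.5s, Speaker B from 1.5s to 2.5s, Speaker A from 2.5s to 4.0s, and Speaker B from 4.0s to 4.5s.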

Related areas

Deep Learning · Generative AI

Want to make things?

Head to AI School — AI camps where kids build their own games.
