Speech & Audio AI
Understanding and generating sound: speech, music, and everything in between.
When you talk, your voice pushes tiny waves through the air. A microphone measures how strong that wave is thousands of times every second, turning your voice into a long list of numbers a computer can study. Speech and audio AI is the set of tools that make sense of those numbers, or invent new ones.
Think of it like a flip-book: each page is a single frozen snapshot, but flip through them fast and you see smooth motion. Audio works the same way: thousands of tiny snapshots per second that, played back in order, become a voice, a song, or a slammed door.
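To make the "long list of numbers" idea concrete, here is a minimal Python sketch that takes snapshots of a pure tone the way a microphone might. The function name `sample_tone`, the 8000-snapshots-per-second rate, and the 440 Hz tone are all assumptions chosen for illustration; real recordings often use higher rates such as 16000 or 44100.

```python
import math

# Assumed snapshot rate for this illustration (snapshots per second).
SAMPLE_RATE = 8000

def sample_tone(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return the list of numbers a microphone might record for a pure tone.

    Each number is one snapshot of the wave's strength, between -1 and 1.
    """
    n_samples = int(sample_rate * duration_s)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(n_samples)
    ]

# Half a second of a 440 Hz tone (hypothetical example values).
samples = sample_tone(440, 0.5)
print(len(samples))   # number of snapshots taken
print(samples[:5])    # the first few "pages" of the flip-book
```

Half a second at this rate already produces thousands of numbers, which is why audio models work with very long sequences.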
With these tools a computer can listen (write down what you said), speak (read text aloud in a lifelike voice), recognise who is talking, and even compose original music.
The main ideas
- Speech recognition (ASR): Turning spoken audio into text.
- Text-to-speech (TTS): Generating natural-sounding speech from text.
- Voice & speaker tech: Speaker identification, diarization, and voice cloning (and its ethics).
- Music & audio generation: Composing and synthesizing music and sound effects.
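As a very small taste of the last idea, sound can be "invented" by computing the numbers directly instead of recording them. The sketch below is only an illustration, not how modern music models work: it uses Python's standard `wave` and `struct` modules to turn three hand-picked note frequencies into WAV audio. The function name `render_melody`, the note list, and the 16000 sample rate are assumptions made for this example.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000                   # assumed snapshots per second
NOTES_HZ = [261.63, 329.63, 392.00]   # C4, E4, G4 (a hypothetical melody)

def render_melody(notes_hz, note_duration_s=0.25, sample_rate=SAMPLE_RATE):
    """Synthesize the notes one after another and return WAV file bytes."""
    frames = bytearray()
    n_per_note = int(sample_rate * note_duration_s)
    for freq in notes_hz:
        for n in range(n_per_note):
            # One snapshot of a sine wave at half volume.
            value = 0.5 * math.sin(2 * math.pi * freq * n / sample_rate)
            # Store it as a 16-bit little-endian integer.
            frames += struct.pack("<h", int(value * 32767))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes = 16 bits per snapshot
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()

wav_bytes = render_melody(NOTES_HZ)
# To listen, write the bytes to a file, for example:
# open("melody.wav", "wb").write(wav_bytes)
```

Generative audio models produce the same kind of number list, but they learn which numbers to emit from large collections of recordings rather than from a fixed formula.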
Related areas
Deep Learning · Generative AI
Want to make things?
Head to AI School: AI camps where kids build their own games.