Speech

4 minutes

Note

See the Text and images tab for more details!

Speech capabilities in AI applications and agents enable users to interact with them through spoken language.

Speech recognition

Diagram of a user's spoken input being converted to text.

Speech recognition is the ability of AI to "hear" and interpret speech. Usually this capability takes the form of speech-to-text (where the audio signal for the speech is transcribed into text).

Speech synthesis

Diagram of text being converted to audible speech.

Speech synthesis is the ability of AI to vocalize words as spoken language. Usually this capability takes the form of text-to-speech in which information in text format is converted into an audible signal.

AI speech technology is evolving rapidly to handle challenges like ignoring background noise, detecting interruptions, and generating increasingly expressive and human-like voices.

AI speech scenarios

Common uses of AI speech technologies include:

Personal AI assistants in phones, computers, or household devices with which you interact by talking.
AI agents that understand spoken input, perform tasks, and respond with spoken results.
Automated transcription of calls or meetings.
Automating audio descriptions of video or text.
Automated speech translation between languages.

Feedback

Was this page helpful?