Speech recognition in the context of "Deep learning"


⭐ Core Definition: Speech recognition

Speech recognition, also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT), is a sub-field of computational linguistics concerned with methods and technologies that translate spoken language into text or other machine-interpretable forms.

Speech recognition applications include voice user interfaces, in which the user speaks to a device that "listens" and processes the audio. Common uses include interpreting commands for calling, call routing, home automation, and aircraft control; such command-and-control applications are known as direct voice input. Productivity applications include searching audio recordings, creating transcripts, and dictation.
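Before any recognition model runs, a typical ASR front end converts the raw waveform into a sequence of spectral feature vectors. The sketch below is illustrative only: the frame and hop sizes (25 ms / 10 ms at 16 kHz) are common conventions, not any particular system's configuration, and a pure test tone stands in for recorded speech.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a waveform into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def log_spectral_features(signal, frame_len=400, hop=160):
    """Per-frame log power spectrum: the usual front end of an ASR pipeline."""
    frames = frame_signal(signal, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)

# A one-second 16 kHz test tone stands in for recorded speech.
t = np.linspace(0, 1, 16000, endpoint=False)
features = log_spectral_features(np.sin(2 * np.pi * 440 * t))
print(features.shape)  # one feature vector per 10 ms hop
```

A recognizer then maps this feature sequence to words; the feature extraction shown here is the part that is broadly shared across systems.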


👉 Speech recognition in the context of Deep learning

In machine learning, deep learning focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience and revolves around stacking artificial neurons into layers and "training" them to process data. The adjective "deep" refers to the use of multiple layers (ranging from three to several hundred or several thousand) in the network. Training methods can be supervised, semi-supervised, or unsupervised.
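A minimal supervised example of the layered idea: the toy network below (a sketch, not a production architecture; sizes and learning rate are arbitrary choices) stacks two layers of artificial neurons and trains them by gradient descent on XOR, a task a single layer cannot solve.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR is not linearly separable, so a single layer cannot learn it,
# but one hidden layer ("deep" stacking in miniature) can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(5000):
    # Forward pass through two layers of artificial neurons.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of the binary cross-entropy loss.
    grad_p = p - y
    grad_W2 = h.T @ grad_p; grad_b2 = grad_p.sum(0)
    grad_h = grad_p @ W2.T * (1 - h ** 2)
    grad_W1 = X.T @ grad_h; grad_b1 = grad_h.sum(0)
    for param, grad in ((W1, grad_W1), (b1, grad_b1), (W2, grad_W2), (b2, grad_b2)):
        param -= 0.1 * grad

print(np.round(p.ravel(), 2))  # approaches [0, 1, 1, 0]
```

Modern deep networks differ mainly in scale and architecture; the forward pass, loss gradient, and parameter update shown here are the same training loop at heart.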

Some common deep learning network architectures include fully connected networks, deep belief networks, recurrent neural networks, convolutional neural networks, generative adversarial networks, transformers, and neural radiance fields. These architectures have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, climate science, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.

In this Dossier

Speech recognition in the context of Natural language processing

Natural language processing (NLP) is the processing of natural language information by a computer. The study of NLP, a subfield of computer science, is generally associated with artificial intelligence. NLP is related to information retrieval, knowledge representation, computational linguistics, and more broadly with linguistics.

Major processing tasks in an NLP system include speech recognition, text classification, natural language understanding, and natural language generation.
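As a toy illustration of text classification, one of the tasks listed above, the sketch below scores a text by its word overlap with labelled training documents. The vocabulary and labels are invented for the example; real systems use statistical or neural models rather than raw overlap counts.

```python
from collections import Counter

# Tiny bag-of-words text classifier: label a text by counting how many of
# its words appear in each category's training documents.
TRAIN = {
    "sports": ["the team won the match", "a great goal in the game"],
    "weather": ["rain and wind expected today", "a sunny and warm day"],
}

def score(text, label):
    vocab = Counter(w for doc in TRAIN[label] for w in doc.split())
    return sum(vocab[w] for w in text.split())

def classify(text):
    return max(TRAIN, key=lambda label: score(text, label))

print(classify("the team scored a goal"))  # sports
print(classify("warm rain expected"))      # weather
```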


Speech recognition in the context of Machine learning

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalise to unseen data, and thus perform tasks without explicit instructions. Advances in deep learning, a subdiscipline of machine learning, have allowed neural networks, a class of statistical algorithms, to surpass many earlier machine learning approaches in performance.

ML finds application in many fields, including natural language processing, computer vision, speech recognition, email filtering, agriculture, and medicine. The application of ML to business problems is known as predictive analytics.
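The "learn from data and generalise to unseen data" idea can be sketched with ordinary least squares, a deliberately simple model; the data below are synthetic, generated for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Learning from data": fit y ≈ 3x + 2 from noisy samples, then
# generalise to inputs the model never saw during fitting.
x_train = rng.uniform(0, 10, 50)
y_train = 3 * x_train + 2 + rng.normal(0, 0.1, 50)

# Ordinary least squares on [x, 1] columns.
A = np.column_stack([x_train, np.ones_like(x_train)])
slope, intercept = np.linalg.lstsq(A, y_train, rcond=None)[0]

x_unseen = np.array([20.0, 30.0])    # outside the training range
print(slope * x_unseen + intercept)  # close to [62, 92]
```

The fitted rule was never given explicit instructions for x = 20 or x = 30; it generalises from the pattern in the training data, which is the defining property the definition above describes.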


Speech recognition in the context of Whispering

Whispering is an unvoiced mode of phonation in which the vocal cords are abducted so that they do not vibrate; air passes between the arytenoid cartilages to create audible turbulence during speech. Supralaryngeal articulation remains the same as in normal speech.

In normal speech, the vocal cords alternate between voiced and voiceless states. In whispering, only the voiced segments change: the vocal cords alternate between whisper and voicelessness (and the acoustic difference between these two states is minimal). This makes speech recognition for whispered speech more difficult, because the absence of vocal-fold tone removes the characteristic spectral structure needed to detect syllables and words. More advanced techniques such as neural networks can still be applied, however, as Amazon Alexa does.
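The missing periodicity can be illustrated with a crude voicing cue. This is an illustration only, not how Alexa or any real system detects whispering: a periodic voiced signal crosses zero far less often than turbulent, noise-like whisper.

```python
import numpy as np

rng = np.random.default_rng(2)

def zero_crossing_rate(x):
    """Fraction of adjacent samples that change sign; turbulent noise
    crosses zero far more often than periodic voiced phonation."""
    return np.mean(np.sign(x[1:]) != np.sign(x[:-1]))

fs = 16000
t = np.arange(fs) / fs
voiced = np.sin(2 * np.pi * 120 * t)  # periodic: vocal-fold vibration
whisper = rng.normal(0, 1, fs)        # aperiodic: turbulent airflow

print(zero_crossing_rate(voiced) < zero_crossing_rate(whisper))  # True
```

When every segment looks like the noise-like case, cues such as pitch and harmonic structure vanish, which is why whispered input pushes systems toward learned models rather than hand-built spectral rules.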


Speech recognition in the context of Typing

Typing is the process of entering or inputting text by pressing keys on a typewriter, computer keyboard, mobile phone, or calculator. It can be distinguished from other means of text input, such as handwriting and speech recognition; text can be in the form of letters, numbers and other symbols. The world's first typist was Lillian Sholes from Wisconsin in the United States, the daughter of Christopher Latham Sholes, who invented the first practical typewriter.

User interface features such as spell checker and autocomplete serve to facilitate and speed up typing and to prevent or correct errors the typist may make.
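An autocomplete of the kind described can be sketched in a few lines; the word list here is invented for the example, and real systems also rank candidates by frequency or context.

```python
import bisect

# Minimal prefix autocomplete: keep the vocabulary sorted and
# binary-search for the start of the matching range.
WORDS = sorted(["type", "typist", "typewriter", "typing", "text", "keyboard"])

def autocomplete(prefix, limit=3):
    i = bisect.bisect_left(WORDS, prefix)
    matches = []
    while i < len(WORDS) and WORDS[i].startswith(prefix) and len(matches) < limit:
        matches.append(WORDS[i])
        i += 1
    return matches

print(autocomplete("typ"))  # ['type', 'typewriter', 'typing']
```

Because matches for a prefix are contiguous in a sorted list, the lookup cost is one binary search plus the handful of results returned, which is what lets suggestions appear on every keystroke.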


Speech recognition in the context of Speech synthesizer

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.
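Concatenative synthesis can be sketched in miniature. In the sketch below, synthetic tones stand in for recorded speech units, and the unit names are invented; a real system would store thousands of recorded diphones and smooth the joins.

```python
import numpy as np

# Concatenative synthesis in miniature: a "database" of stored units is
# looked up and joined end-to-end to produce the output waveform.
fs = 16000

def tone(freq, dur=0.1):
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2 * np.pi * freq * t)

UNIT_DB = {"he": tone(220), "llo": tone(330)}  # stand-ins for recorded units

def synthesize(units):
    return np.concatenate([UNIT_DB[u] for u in units])

out = synthesize(["he", "llo"])
print(len(out) / fs)  # 0.2 seconds of audio
```

The trade-off described above lives in the unit inventory: small units (phones, diphones) cover any text but join awkwardly, while whole stored words or sentences join cleanly but only cover what was recorded.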


Speech recognition in the context of Apple Vision Pro

The Apple Vision Pro is a mixed-reality headset developed by Apple. It was announced on June 5, 2023, at Apple's Worldwide Developers Conference (WWDC) and was released first in the US, then in global territories throughout 2024. Apple Vision Pro is Apple's first new major product category since the release of the Apple Watch in 2015.

Apple markets Apple Vision Pro as a spatial computer where digital media is integrated with the real world. Physical inputs—such as motion gestures, eye tracking, and speech recognition—can be used to interact with the system. Apple has avoided marketing the device as a virtual reality headset when discussing the product in presentations and marketing.


Speech recognition in the context of Conversational commerce

Conversational commerce is e-commerce conducted via various means of conversation (live support on e-commerce web sites, online chat via messaging apps, chatbots on messaging apps or websites, voice assistants), using technologies such as speech recognition, speaker recognition (voice biometrics), natural language processing, and artificial intelligence.


Speech recognition in the context of Digital signal processor

A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing. DSPs are fabricated on metal–oxide–semiconductor (MOS) integrated circuit chips. They are widely used in audio signal processing, telecommunications, digital image processing, radar, sonar and speech recognition systems, and in common consumer electronic devices such as mobile phones, disk drives and high-definition television (HDTV) products.

The goal of a DSP is usually to measure, filter, or compress continuous real-world analog signals. Most general-purpose microprocessors can also execute digital signal processing algorithms, but they may not be able to sustain such processing continuously in real time. Dedicated DSPs also tend to be more power-efficient, which makes them better suited to portable devices such as mobile phones, where power consumption is constrained. DSPs often use special memory architectures that can fetch multiple data items or instructions at the same time.
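The "measure, filter or compress" role can be sketched with a moving-average FIR low-pass filter, one of the most basic DSP operations. The signal frequencies and tap count below are arbitrary choices for the example.

```python
import numpy as np

# A moving-average FIR low-pass filter: it keeps a slow component of
# interest and attenuates high-frequency interference.
fs = 1000
t = np.arange(fs) / fs
slow = np.sin(2 * np.pi * 2 * t)                  # 2 Hz signal of interest
noisy = slow + 0.5 * np.sin(2 * np.pi * 200 * t)  # 200 Hz interference

taps = np.ones(25) / 25                           # 25-tap moving average
filtered = np.convolve(noisy, taps, mode="same")

# The filtered output should be much closer to the clean signal.
print(np.abs(filtered - slow).mean() < np.abs(noisy - slow).mean())  # True
```

Each output sample is a weighted sum of recent inputs, which is exactly the multiply-accumulate workload that DSP chips optimize their arithmetic units and memory architectures for.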
