Speech coding in the context of GSM (codec)

⭐ Core Definition: Speech coding

Speech coding is the application of data compression to digital audio signals that contain speech. It combines speech-specific parameter estimation, which uses audio signal processing techniques to model the speech signal, with generic data compression algorithms that represent the resulting model parameters in a compact bitstream.

Common applications of speech coding are mobile telephony and voice over IP (VoIP). The most widely used speech coding technique in mobile telephony is linear predictive coding (LPC), while VoIP applications most commonly use LPC and the modified discrete cosine transform (MDCT).

👉 Speech coding in the context of GSM (codec)

Full Rate (FR), also known as GSM-FR or GSM 06.10 (sometimes simply GSM), was the first digital speech coding standard used in the GSM digital mobile phone system. It uses linear predictive coding (LPC). The codec's bit rate is 13 kbit/s, or 1.625 bits per audio sample (often padded out to 33 bytes per 20 ms frame, or 13.2 kbit/s). The quality of the coded speech is quite poor by modern standards, but at the time of development (early 1990s) it was a good compromise between computational complexity and quality, requiring only on the order of a million additions and multiplications per second. The codec is still widely used in networks around the world. FR is gradually being replaced by the Enhanced Full Rate (EFR) and Adaptive Multi-Rate (AMR) standards, which provide much higher speech quality at a lower bit rate.
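
The bit-rate figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the usual GSM-FR framing, which the text does not state explicitly: 8 kHz sampling, 20 ms frames, and 260 coded bits per frame. The constant names are illustrative only.

```python
# Assumed GSM-FR framing (not given in the text above):
# 8 kHz sampling, 20 ms frames, 260 coded bits per frame.
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
BITS_PER_FRAME = 260
PADDED_BYTES_PER_FRAME = 33  # padded frame size quoted in the text

frames_per_second = 1000 // FRAME_MS                      # 50 frames each second
samples_per_frame = SAMPLE_RATE_HZ // frames_per_second   # 160 samples per frame

bit_rate = BITS_PER_FRAME * frames_per_second             # raw codec bit rate, bit/s
bits_per_sample = BITS_PER_FRAME / samples_per_frame      # average bits per sample
padded_bit_rate = PADDED_BYTES_PER_FRAME * 8 * frames_per_second
padding_bits = PADDED_BYTES_PER_FRAME * 8 - BITS_PER_FRAME

print(bit_rate, bits_per_sample, padded_bit_rate, padding_bits)
# prints: 13000 1.625 13200 4
```

Under these assumptions the 33-byte frame carries 4 padding bits, which accounts for the difference between 13 kbit/s and 13.2 kbit/s.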

Speech coding in the context of Opus (audio format)

Opus is a free and open source lossy audio coding format developed by the Xiph.Org Foundation and standardized by the Internet Engineering Task Force, designed for efficient low-latency encoding of both speech and general audio. Because its latency is lower than that of other standard codecs, Opus is suited to real-time interactive communication, including on low-end embedded processors. Opus replaces both Vorbis and Speex for new applications.

Opus combines the speech-oriented LPC-based SILK algorithm and the lower-latency MDCT-based CELT algorithm, switching between or combining them as needed. Bitrate, audio bandwidth, complexity, and algorithm choice can be adjusted for each individual frame. Opus has a low algorithmic delay (26.5 ms by default), ideal for use as part of a real-time communication link, networked music performances, and live lip sync; by trading off quality or bitrate, the delay can be reduced to as little as 5 ms. Its delay is thus significantly lower than that of competing codecs, which require well over 100 ms. Opus remains competitive with these formats in terms of quality per bitrate.
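
The two delay figures can be read as frame duration plus encoder look-ahead. The sketch below assumes the commonly cited breakdown, which is not given in the text: a 20 ms frame with 6.5 ms of look-ahead by default, and a 2.5 ms frame with 2.5 ms of look-ahead in the lowest-delay configuration. The function name is hypothetical and is not part of any Opus API.

```python
def algorithmic_delay_ms(frame_ms, lookahead_ms):
    """Algorithmic delay modeled as one frame of buffering plus look-ahead."""
    return frame_ms + lookahead_ms

# Assumed breakdown of the figures quoted in the text.
default_delay = algorithmic_delay_ms(frame_ms=20.0, lookahead_ms=6.5)
minimum_delay = algorithmic_delay_ms(frame_ms=2.5, lookahead_ms=2.5)

# The same delays expressed in samples at a 48 kHz sampling rate.
default_samples = round(default_delay * 48)
minimum_samples = round(minimum_delay * 48)

print(default_delay, minimum_delay, default_samples, minimum_samples)
# prints: 26.5 5.0 1272 240
```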

Speech coding in the context of Discrete cosine transform

A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. The DCT, first proposed by Nasir Ahmed in 1972, is a widely used transformation technique in signal processing and data compression. It is used in most digital media, including digital images (such as JPEG and HEIF), digital video (such as MPEG and H.26x), digital audio (such as Dolby Digital, MP3 and AAC), digital television (such as SDTV, HDTV and VOD), digital radio (such as AAC+ and DAB+), and speech coding (such as AAC-LD, Siren and Opus). DCTs are also important to numerous other applications in science and engineering, such as digital signal processing, telecommunication devices, reducing network bandwidth usage, and spectral methods for the numerical solution of partial differential equations.

A DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. The DCTs are generally related to Fourier series coefficients of a periodically and symmetrically extended sequence whereas DFTs are related to Fourier series coefficients of only periodically extended sequences. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), whereas in some variants the input or output data are shifted by half a sample.
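
The relationship to a DFT of twice the length can be checked numerically. The sketch below assumes the unnormalized DCT-II (the text does not name a specific variant) and compares a direct evaluation with one that goes through the DFT of an even-symmetric extension, followed by a half-sample phase shift. Function names are illustrative.

```python
import cmath
import math

def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 1/2) * k)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def dct2_via_dft(x):
    """Same transform computed from a length-2N DFT of the mirrored input."""
    n = len(x)
    extended = list(x) + list(reversed(x))  # even-symmetric extension
    result = []
    for k in range(n):
        dft_k = sum(extended[m] * cmath.exp(-2j * math.pi * m * k / (2 * n))
                    for m in range(2 * n))
        # The half-sample shift appears as the phase factor exp(-i*pi*k/(2N));
        # after applying it the value is real up to rounding error.
        result.append(0.5 * (cmath.exp(-1j * math.pi * k / (2 * n)) * dft_k).real)
    return result

signal = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
direct = dct2(signal)
via_dft = dct2_via_dft(signal)
max_diff = max(abs(a - b) for a, b in zip(direct, via_dft))
print(max_diff < 1e-9)  # prints: True
```

Both routes are quadratic-time here for clarity; practical implementations use fast algorithms instead.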

Speech coding in the context of Linear predictive coding

Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.

LPC is the most widely used method in speech coding and speech synthesis. It is a powerful speech analysis technique, and a useful method for encoding good quality speech at a low bit rate.
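
As an illustration of the linear predictive model, the sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, one common way to fit such a model (the text does not prescribe a method). The test signal is a synthetic two-pole resonance with hypothetical coefficients 1.3 and -0.8, chosen so that the correct answer is known in advance.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n - lag] for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for a[1..order] in the predictor
    x[n] ~ sum_k a[k] * x[n - k]; also return the prediction error energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err

def synthesize_resonance(num_samples, a1=1.3, a2=-0.8):
    """Impulse response of the all-pole filter x[n] = a1*x[n-1] + a2*x[n-2] + e[n]."""
    x = []
    for n in range(num_samples):
        excitation = 1.0 if n == 0 else 0.0
        prev1 = x[n - 1] if n >= 1 else 0.0
        prev2 = x[n - 2] if n >= 2 else 0.0
        x.append(a1 * prev1 + a2 * prev2 + excitation)
    return x

signal = synthesize_resonance(400)
coeffs, residual_energy = levinson_durbin(autocorrelation(signal, 2), order=2)
print([round(c, 6) for c in coeffs])  # prints: [1.3, -0.8]
```

Two coefficients stand in for the 400-sample waveform here, which is the sense in which LPC represents the spectral envelope in compressed form. Real speech coders work frame by frame, typically with higher model orders and quantized coefficients.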

Speech coding in the context of Modified discrete cosine transform

The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used lossy compression technique in audio data compression. It is employed in most modern audio coding standards, including MP3, Dolby Digital (AC-3), Vorbis (Ogg), Windows Media Audio (WMA), ATRAC, Cook, Advanced Audio Coding (AAC), High-Definition Coding (HDC), LDAC, Dolby AC-4, and MPEG-H 3D Audio, as well as speech coding standards such as AAC-LD (LD-MDCT), G.722.1, G.729.1, CELT, and Opus.
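
The lapped structure can be demonstrated with a small round trip. The sketch below is a direct, quadratic-time implementation that assumes a sine window (one common choice satisfying w[n]^2 + w[n+N]^2 = 1) and a 2/N scale factor in the inverse transform. Each block of 2N samples yields only N coefficients, yet overlap-adding the inverse-transformed blocks recovers every sample that is covered by two blocks.

```python
import math

def sine_window(n_half):
    """Window of length 2N with w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_half)) for n in range(2 * n_half)]

def mdct(block, window):
    """2N windowed input samples -> N coefficients."""
    n_half = len(block) // 2
    return [sum(window[n] * block[n]
                * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(2 * n_half))
            for k in range(n_half)]

def imdct(coeffs, window):
    """N coefficients -> 2N windowed, time-aliased output samples."""
    n_half = len(coeffs)
    return [window[n] * (2.0 / n_half)
            * sum(coeffs[k]
                  * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                  for k in range(n_half))
            for n in range(2 * n_half)]

def roundtrip(signal, n_half):
    """Transform 50%-overlapped blocks and overlap-add the inverse transforms."""
    window = sine_window(n_half)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = signal[start:start + 2 * n_half]
        for n, value in enumerate(imdct(mdct(block, window), window)):
            output[start + n] += value
    return output

N_HALF = 8
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(6 * N_HALF)]
reconstructed = roundtrip(signal, N_HALF)
# The first and last N samples belong to one block only, so they are skipped.
interior = range(N_HALF, len(signal) - N_HALF)
max_error = max(abs(signal[n] - reconstructed[n]) for n in interior)
print(max_error < 1e-9)  # prints: True
```

A single inverse-transformed block does not match its input on its own; the aliasing terms cancel only when neighboring blocks are added together.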

The discrete cosine transform (DCT) was first proposed by Nasir Ahmed in 1972, and demonstrated by Ahmed with T. Natarajan and K. R. Rao in 1974. The MDCT was later proposed by John P. Princen, A.W. Johnson and Alan B. Bradley at the University of Surrey in 1987, following earlier work by Princen and Bradley (1986) to develop the MDCT's underlying principle of time-domain aliasing cancellation (TDAC). (There also exists an analogous transform, the MDST, based on the discrete sine transform, as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.)

Speech coding in the context of Voice activity detection

Voice activity detection (VAD), also known as speech activity detection or speech detection, is the detection of the presence or absence of human speech, used in speech processing. The main uses of VAD are in speaker diarization, speech coding and speech recognition. It can facilitate speech processing, and can also be used to deactivate some processes during non-speech sections of an audio session: it can avoid unnecessary coding and transmission of silence packets in Voice over Internet Protocol (VoIP) applications, saving on computation and on network bandwidth.

VAD is an important enabling technology for a variety of speech-based applications. Therefore, various VAD algorithms have been developed that provide varying features and compromises between latency, sensitivity, accuracy and computational cost. Some VAD algorithms also provide further analysis, for example whether the speech is voiced, unvoiced or sustained. Voice activity detection is usually independent of language.
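
A toy detector makes the trade-offs concrete. The sketch below is not any standardized VAD algorithm: it simply compares the mean energy of each frame with a fixed threshold and keeps the decision active for a few extra frames (a hangover) so that quiet speech endings are not clipped. The frame length, threshold, and hangover values are arbitrary assumptions.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def energy_vad(samples, frame_len=160, threshold=0.01, hangover_frames=2):
    """Return one True/False speech decision per frame."""
    decisions = []
    hangover = 0
    for energy in frame_energies(samples, frame_len):
        if energy > threshold:
            hangover = hangover_frames   # re-arm the hangover counter
            decisions.append(True)
        elif hangover > 0:
            hangover -= 1                # stay active briefly after activity ends
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions

# Synthetic input: 3 silent frames, 4 frames of a 200 Hz tone, 5 silent frames.
FRAME = 160
silence = [0.0] * FRAME
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(FRAME)]
samples = silence * 3 + tone * 4 + silence * 5

flags = energy_vad(samples)
active_frames = sum(flags)
print(active_frames, len(flags))  # prints: 6 12
```

In this example only 6 of 12 frames would need to be coded and transmitted. A longer hangover lowers the risk of clipping speech but sends more non-speech frames, which is the kind of compromise the text describes.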

Speech coding in the context of Vocoder

A vocoder (/ˈvoʊkoʊdər/, a portmanteau of voice and encoder) is a category of speech coding that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption or voice transformation.

The vocoder was invented in 1938 by Homer Dudley at Bell Labs as a means of synthesizing human speech. This work was developed into the channel vocoder, which was used as a voice codec in telecommunications to conserve transmission bandwidth.
