Character encoding in the context of Telegraph code


⭐ Core Definition: Character encoding

Character encoding is a convention of using a numeric value to represent each character of a writing script. Not only can a character set include natural language symbols, but it can also include codes that have meanings or functions outside of language, such as control characters and whitespace. Character encodings have also been defined for some constructed languages. When encoded, character data can be stored, transmitted, and transformed by a computer. The numerical values that make up a character encoding are known as code points and collectively comprise a code space or a code page.
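
As a minimal sketch in Python (the characters chosen are arbitrary examples), the relationship between characters, code points, and encoded bytes looks like this:

```python
# Characters map to numeric code points and back.
for ch in ["A", "€", "\t"]:
    cp = ord(ch)              # character -> code point
    assert chr(cp) == ch      # code point -> character
    print(f"{ch!r} -> U+{cp:04X} ({cp})")

# An encoding turns code points into concrete bytes for storage or transmission.
print("A€".encode("utf-8"))   # b'A\xe2\x82\xac'
```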

Early character encodings that originated with optical or electrical telegraphy and in early computers could only represent a subset of the characters used in languages, sometimes restricted to upper case letters, numerals and limited punctuation. Over time, encodings capable of representing more characters were created, such as ASCII, ISO/IEC 8859, and Unicode encodings such as UTF-8 and UTF-16.


In this Dossier

Character encoding in the context of The Unicode Standard

Unicode (also known as The Unicode Standard and TUS) is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 17.0 defines 159,801 characters and 172 scripts used in various ordinary, literary, academic, and technical contexts.

Unicode has largely supplanted the previous environment of myriad incompatible character sets used within different locales and on different computer architectures. The entire repertoire of these sets, plus many additional characters, was merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and relevant Unicode support has become a common consideration in contemporary software development. Unicode is ultimately capable of encoding more than 1.1 million characters.
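
The 1.1 million figure follows from the size of the Unicode code space, which spans 17 planes of 65,536 code points each; a quick check (the surrogate count used below is the standard 2,048):

```python
# Size of the Unicode code space: U+0000 through U+10FFFF.
code_space = 17 * 65536
print(code_space)           # 1114112
# 2,048 code points are reserved as UTF-16 surrogates and never
# stand alone as characters, leaving 1,112,064 scalar values.
print(code_space - 2048)    # 1112064
```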

View the full Wikipedia page for The Unicode Standard

Character encoding in the context of Slate and stylus

The slate and stylus are tools used by blind people to write text that they can read without assistance. Invented by Charles Barbier as a tool for writing letters that could be read by touch, the slate and stylus allow for a quick, easy, convenient, and consistent method of embossing braille characters. Prior methods of making raised printing for the blind required a movable-type printing press.

View the full Wikipedia page for Slate and stylus

Character encoding in the context of File format

A file format is the way that information is encoded for storage in a computer file. It may describe the encoding at various levels of abstraction, including low-level bit and byte layout as well as high-level organization such as markup and tabular structure. A file format may be standardized (and either proprietary or open), or it may be an ad hoc convention.

Some file formats are designed for very particular types of data: PNG files, for example, store bitmapped images using lossless data compression. Other file formats are designed to hold several different types of data: the Ogg format can act as a container for different types of multimedia, including any combination of audio and video, with or without text (such as subtitles) and metadata. A text file can contain any stream of characters, including possible control characters, and is encoded in one of various character encoding schemes. Some file formats, such as HTML, Scalable Vector Graphics (SVG), and the source code of computer software, are text files with defined syntaxes that allow them to be used for specific purposes.
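
At the lowest level of abstraction, many binary formats begin with a fixed byte signature (a "magic number"). A minimal sketch, assuming only the standard PNG signature (the helper function here is illustrative, not from any particular library):

```python
# The eight-byte signature fixed by the PNG specification.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def looks_like_png(path: str) -> bool:
    """Return True if the file begins with the PNG signature."""
    with open(path, "rb") as f:
        return f.read(len(PNG_MAGIC)) == PNG_MAGIC
```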

View the full Wikipedia page for File format

Character encoding in the context of String (computer science)

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and its length changed, or it may be fixed (after creation). A string is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. More generally, a string may also denote a sequence (or list) of data other than just characters.

Depending on the programming language and precise data type used, a variable declared to be a string may either cause storage in memory to be statically allocated for a predetermined maximum length or employ dynamic allocation to allow it to hold a variable number of elements.
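
A brief Python illustration of both points: str is an immutable sequence of code points, bytearray is a mutable, dynamically sized sequence of bytes, and a character encoding bridges the two:

```python
s = "héllo"                          # immutable sequence of code points
data = bytearray(s.encode("utf-8"))  # mutable, resizable sequence of bytes

data[0] = ord("H")                   # byte-level mutation is allowed
data.extend(b"!")                    # length can change dynamically
print(data.decode("utf-8"))          # Héllo!

# s[0] = "H" would raise TypeError: Python strings are immutable.
```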

View the full Wikipedia page for String (computer science)

Character encoding in the context of Arabic chat alphabet

The Arabic chat alphabet, also known as Arabizi, Arabeezi, Arabish, Franco-Arabic or simply Franco (from French: franco-arabe), refers to the romanized alphabets for informal Arabic dialects in which Arabic script is transcribed or encoded into a combination of Latin script and Western Arabic numerals. These informal chat alphabets were originally used primarily by youth in the Arab world in very informal settings, especially for communicating over the Internet or for sending messages via cellular phones, though their use is no longer necessarily restricted by age, and these chat alphabets have appeared in other media such as advertising.

These chat alphabets differ from more formal and academic Arabic transliteration systems, in that they use numerals and multigraphs instead of diacritics for letters such as ṭāʾ (ط) or ḍād (ض) that do not exist in the basic Latin script (ASCII), and in that what is being transcribed is an informal dialect and not Standard Arabic. These Arabic chat alphabets also differ from each other, as each is influenced by the particular phonology of the Arabic dialect being transcribed and the orthography of the dominant European language in the area—typically the language of the former colonists, and typically either French or English.
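
As a rough, illustrative sketch (only a few of the most common numeral conventions are shown; real usage varies by dialect and writer, and the helper below is hypothetical):

```python
# Illustrative only: a few widely seen Arabizi numeral conventions.
NUMERALS = {
    "2": "\u0621",  # ء hamza
    "3": "\u0639",  # ع ʿayn
    "7": "\u062D",  # ح ḥāʾ
}

def sketch_to_arabic(text: str) -> str:
    # Replace numeral stand-ins, leaving Latin letters untouched.
    return "".join(NUMERALS.get(ch, ch) for ch in text)

print(sketch_to_arabic("3arabi"))  # the numeral 3 becomes ع
```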

View the full Wikipedia page for Arabic chat alphabet

Character encoding in the context of Character (computing)

In computing and telecommunications, a character is the encoded representation of a natural language character (a letter, numeral, or punctuation mark), of whitespace (a space or tab), or of a control character (one that controls computer hardware consuming character-based data). A sequence of characters is called a string.

Some character encoding systems represent each character using a fixed number of bits, whereas others use varying sizes. Various fixed-length sizes were used by now-obsolete systems, such as the six-bit character code, the five-bit Baudot code, and even 4-bit systems (with only 16 possible values). The more modern ASCII system defines a 7-bit code, with each character typically stored in an 8-bit byte. Today, the Unicode-based UTF-8 encoding uses a varying number of byte-sized code units to represent each code point, which in turn encodes a character.
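
The fixed-width versus variable-width distinction is easy to observe in Python by comparing encoded lengths (UTF-32 always uses four bytes per code point, while UTF-8 uses one to four):

```python
for ch in ["A", "é", "€", "🙂"]:
    fixed = ch.encode("utf-32-le")   # fixed width: always 4 bytes
    varied = ch.encode("utf-8")      # variable width: 1 to 4 bytes
    print(ch, len(fixed), len(varied))
# A 4 1 / é 4 2 / € 4 3 / 🙂 4 4
```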

View the full Wikipedia page for Character (computing)

Character encoding in the context of Morse code

Morse code is a telecommunications method which encodes text characters as standardized sequences of two different signal durations, called dots and dashes, or dits and dahs. It is named after Samuel Morse, one of several developers of the system. Morse's preliminary proposal for a telegraph code was replaced by an alphabet-based code developed by Alfred Vail, the engineer working with Morse. Vail's version was used for commercial telegraphy in North America. Friedrich Gerke simplified Vail's code to produce the code adopted in Europe, and most of the alphabetic part of the ITU's international "Morse" code is copied from Gerke's revision.

The ITU International Morse code encodes the 26 basic Latin letters A to Z, one accented Latin letter (É), the Indo-Arabic numerals 0 to 9, and some punctuation and messaging procedural signals (prosigns). There is no distinction between upper and lower case letters. Each code symbol is formed by a sequence of dits and dahs. The dit duration can vary with signal conditions and operator skill, but within any one message, once the rhythm is established, the dit duration serves as the basic unit of time measurement. The duration of a dah is three times the duration of a dit. Each dit or dah within an encoded character is followed by a period of signal absence, called a space, equal to the dit duration. The letters of a word are separated by a space equal to three dits, and words are separated by a space equal to seven dits.
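
These timing rules make total transmission time easy to compute. A small sketch, assuming only the standard codes for S (dit dit dit) and O (dah dah dah):

```python
# 1 unit per dit, 3 per dah, 1 between elements, 3 between letters.
MORSE = {"S": "...", "O": "---"}

def duration_in_dits(word: str) -> int:
    total = 0
    for i, letter in enumerate(word):
        code = MORSE[letter]
        total += sum(3 if e == "-" else 1 for e in code)  # dits and dahs
        total += len(code) - 1                            # intra-letter gaps
        if i < len(word) - 1:
            total += 3                                    # inter-letter gap
    return total

print(duration_in_dits("SOS"))  # 27 dit units
```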

View the full Wikipedia page for Morse code

Character encoding in the context of Control character

In computing and telecommunications, a control character or non-printing character (NPC) is a code point in a character set that does not represent a written character or symbol. Control characters are used for in-band signaling: they cause effects other than the addition of a symbol to the text. All other characters are mainly graphic characters, also known as printing (or printable) characters, except perhaps for "space" characters. The ASCII standard defines 33 control characters, such as code 7 (BEL), which might ring a bell.
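
In practice, control characters are simply low code points; Python's unicodedata module classifies them in category Cc (the codes shown are standard ASCII values):

```python
import unicodedata

for cp in [7, 9, 10, 13]:  # BEL, HT (tab), LF, CR
    ch = chr(cp)
    print(cp, repr(ch), unicodedata.category(ch))  # 'Cc' means control
```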

View the full Wikipedia page for Control character

Character encoding in the context of ASCII

ASCII (/ˈæski/ ASS-kee), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable and 33 control characters – a total of 128 code points. The set of available punctuation had significant impact on the syntax of computer languages and text markup. ASCII hugely influenced the design of character sets used by modern computers; for example, the first 128 code points of Unicode are the same as ASCII.

ASCII encodes each code point as a value from 0 to 127, storable as a seven-bit integer. Ninety-five code points are printable, including the digits 0 to 9, lowercase letters a to z, uppercase letters A to Z, and commonly used punctuation symbols. For example, the letter i is represented as 105 (decimal). ASCII also specifies 33 non-printing control codes, which originated with Teletype devices; most of these are now obsolete. The control characters still in common use include carriage return, line feed, and tab.
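
One well-known consequence of the ASCII layout: corresponding upper- and lowercase letters differ by exactly one bit (0x20, decimal 32), which is what makes bitwise case conversion possible. Using the letter i from the example above:

```python
print(ord("i"))               # 105
print(ord("I"))               # 73, exactly 32 (0x20) less
print(chr(ord("i") & ~0x20))  # 'I': clearing bit 5 uppercases an ASCII letter
```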

View the full Wikipedia page for ASCII

Character encoding in the context of Six-bit character code

A six-bit character code is a character encoding designed for use on computers whose word length is a multiple of six. Six bits can encode only 64 distinct characters, so these codes generally include only the upper-case letters, the numerals, some punctuation characters, and sometimes control characters. The 7-track magnetic tape format was developed to store data in such codes, along with an additional parity bit.
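
The arithmetic is straightforward: six bits yield 2⁶ = 64 codes, and a 36-bit word holds exactly six such characters. A sketch of the packing (the helper is illustrative; actual character assignments varied by vendor):

```python
print(2 ** 6)   # 64 distinct codes
print(36 // 6)  # 6 characters per 36-bit word

def pack6(codes):
    """Pack a list of 6-bit values into a single integer word."""
    word = 0
    for c in codes:
        assert 0 <= c < 64
        word = (word << 6) | c
    return word

print(bin(pack6([1, 2, 3])))  # 0b1000010000011 (three 6-bit fields)
```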

View the full Wikipedia page for Six-bit character code

Character encoding in the context of Baudot code

The Baudot code (French pronunciation: [bodo]) is an early character encoding for telegraphy invented by Émile Baudot in the 1870s. It was the predecessor to the International Telegraph Alphabet No. 2 (ITA2), the most common teleprinter code in use before ASCII. Each character in the alphabet is represented by a series of five bits, sent over a communication channel such as a telegraph wire or a radio signal by asynchronous serial communication. The unit of symbol rate, the baud, is derived from Baudot's name.
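
Five bits give only 2⁵ = 32 codes, too few for letters and figures together, so ITA2 uses shift codes that switch between a letters table and a figures table. A hedged sketch of that mechanism (the table entries and shift values below are placeholders, not the real ITA2 assignments):

```python
# Shift-state decoding sketch; values are illustrative placeholders.
LTRS, FIGS = 31, 27
LETTERS = {1: "A", 2: "B"}
FIGURES = {1: "1", 2: "2"}

def decode(codes):
    table, out = LETTERS, []
    for c in codes:
        if c == LTRS:
            table = LETTERS   # shift to letters
        elif c == FIGS:
            table = FIGURES   # shift to figures
        else:
            out.append(table.get(c, "?"))
    return "".join(out)

print(decode([1, 2, FIGS, 1, 2]))  # AB12
```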

View the full Wikipedia page for Baudot code

Character encoding in the context of UTF-8

UTF-8 is a character encoding standard, defined by the Unicode Standard, that is used for electronic communication; its name derives from Unicode Transformation Format – 8-bit. As of July 2025, almost every webpage is transmitted as UTF-8.

UTF-8 supports all 1,112,064 valid Unicode code points using a variable-width encoding of one to four one-byte (8-bit) code units.
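
The byte count tracks the magnitude of the code point: up to U+007F takes one byte, up to U+07FF two, up to U+FFFF three, and up to U+10FFFF four. Checking a sample from each range in Python:

```python
for cp in [0x41, 0x3B1, 0x20AC, 0x1F600]:  # A, α, €, 😀
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
# U+0041 -> 1 byte(s): 41
# U+03B1 -> 2 byte(s): ce b1
# U+20AC -> 3 byte(s): e2 82 ac
# U+1F600 -> 4 byte(s): f0 9f 98 80
```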

View the full Wikipedia page for UTF-8

Character encoding in the context of 8-bit computing

In computer architecture, 8-bit integers or other data units are those that are 8 bits wide (1 octet). Likewise, 8-bit central processing unit (CPU) and arithmetic logic unit (ALU) architectures are those based on registers or data buses of that size. Memory addresses (and thus address buses) for 8-bit CPUs are generally wider than 8 bits, usually 16 bits. 8-bit microcomputers are microcomputers that use 8-bit microprocessors.

The term '8-bit' is also applied to the character sets that could be used on computers with 8-bit bytes, the best known being various forms of extended ASCII, including the ISO/IEC 8859 series of national character sets – especially Latin 1 for English and Western European languages.
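
Because each 8-bit set assigns the upper 128 byte values differently, the same byte decodes to different characters under different ISO/IEC 8859 parts. For example, byte 0xE9 in Python:

```python
b = b"\xe9"
print(b.decode("latin-1"))    # 'é' in ISO/IEC 8859-1 (Latin-1)
print(b.decode("iso8859-7"))  # 'ι' in ISO/IEC 8859-7 (Greek)
```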

View the full Wikipedia page for 8-bit computing