Character set in the context of Variable-width encoding

⭐ Core Definition: Character set

Character encoding is a convention of using a numeric value to represent each character of a writing script. Not only can a character set include natural language symbols, but it can also include codes that have meanings or functions outside of language, such as control characters and whitespace. Character encodings have also been defined for some constructed languages. When encoded, character data can be stored, transmitted, and transformed by a computer. The numerical values that make up a character encoding are known as code points and collectively comprise a code space or a code page.

Early character encodings that originated with optical or electrical telegraphy and in early computers could only represent a subset of the characters used in languages, sometimes restricted to upper case letters, numerals and limited punctuation. Over time, encodings capable of representing more characters were created, such as ASCII, ISO/IEC 8859, and Unicode encodings such as UTF-8 and UTF-16.
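As a minimal sketch of the ideas above, the following Python snippet shows that a code point is simply an integer assigned to a character, and that an older, smaller encoding cannot represent every character a newer one can. The helper names `code_point` and `can_encode` are made up for this illustration; `ord`, `chr`, and `str.encode` are Python built-ins.

```python
def code_point(ch: str) -> int:
    """Return the Unicode code point (an integer) of a single character."""
    return ord(ch)


def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(code_point("A"))                  # 65
print(hex(code_point("\u00e9")))        # 0xe9 (the letter e with acute accent)
print(chr(0x20AC))                      # the euro sign

# ASCII covers only 128 code points, ISO/IEC 8859-1 covers 256,
# and UTF-8 can represent every Unicode code point.
print(can_encode("\u00e9", "ascii"))    # False
print(can_encode("\u00e9", "latin-1"))  # True
print(can_encode("\u20ac", "latin-1"))  # False
print(can_encode("\u20ac", "utf-8"))    # True
```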

👉 Character set in the context of Variable-width encoding

A variable-width encoding is a type of character encoding scheme in which codes of differing lengths are used to encode a character set (a repertoire of symbols) for representation in a computer. The closely related notion in coding theory is the variable-length code, which maps each source symbol to a variable number of bits.

Variable-length codes can allow sources to be compressed and decompressed with zero error (lossless data compression) and still be read back symbol by symbol. With a suitable coding strategy, an independent and identically distributed source may be compressed almost arbitrarily close to its entropy. This is in contrast to fixed-length coding methods, for which data compression is only possible for large blocks of data, and any compression beyond the logarithm of the total number of possibilities comes with a finite (though perhaps arbitrarily small) probability of failure.
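To make "codes of differing lengths" concrete, here is a small Python sketch that counts how many bytes each character occupies in a given encoding. The function name `encoded_lengths` and the sample string are assumptions chosen for this illustration. UTF-8 and UTF-16 are variable-width, while UTF-32 is shown as a fixed-width point of comparison.

```python
def encoded_lengths(text: str, encoding: str = "utf-8") -> list[tuple[str, int]]:
    """Pair each character with the number of bytes it needs in the encoding."""
    return [(ch, len(ch.encode(encoding))) for ch in text]


# Sample: Latin letter A, e with acute, euro sign, and an emoji outside the
# Basic Multilingual Plane.
sample = "A\u00e9\u20ac\U0001F600"

# UTF-8 uses 1 to 4 bytes per character.
print([n for _, n in encoded_lengths(sample, "utf-8")])      # [1, 2, 3, 4]

# UTF-16 uses 2 or 4 bytes (the "-le" variant avoids a byte-order mark).
print([n for _, n in encoded_lengths(sample, "utf-16-le")])  # [2, 2, 2, 4]

# UTF-32 is fixed-width: always 4 bytes.
print([n for _, n in encoded_lengths(sample, "utf-32-le")])  # [4, 4, 4, 4]
```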

In this Dossier

⭐ Core Definition: Character set
👉 Character set in the context of Variable-width encoding
Character set in the context of Text mode
Character set in the context of Overstrike
Character set in the context of Precomposed character
Character set in the context of MIME
Character set in the context of Code page 437
Character set in the context of Telegraph code

Character set in the context of Text mode

Text mode is a computer display mode in which content is internally represented in terms of characters rather than individual pixels. Typically, the screen consists of a uniform rectangular grid of character cells, each of which contains one of the characters of a character set. Text mode is contrasted with graphics mode and other kinds of computer graphics modes.

Text mode applications communicate with the user by using command-line interfaces and text user interfaces. Many character sets used in text mode applications also contain a limited set of predefined semi-graphical characters usable for drawing boxes and other rudimentary graphics, which can be used to highlight the content or to simulate widget or control interface objects found in GUI programs. A typical example is the IBM code page 437 character set.

View the full Wikipedia page for Text mode

Character set in the context of Overstrike

In typography, overstrike is a method of printing characters that are missing from the printer's character set. The character is created by placing one character on another one – for example, overstriking ⟨L⟩ with ⟨-⟩ results in printing a ⟨Ł⟩ (L with stroke) character.

The ASCII code supports six different diacritics. These are: grave accent, tilde, acute accent (approximated by the apostrophe), diaeresis (double quote), cedilla (comma), and circumflex accent. Each is typed by typing the base character, then backspace, and then the related character, which is ⟨`⟩, ⟨~⟩, ⟨'⟩, ⟨"⟩, ⟨,⟩, or ⟨^⟩, respectively for the above-mentioned accents.
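The typing sequence just described (base character, backspace, mark) can be sketched in Python as a three-character ASCII string. The table name `ASCII_MARKS` and the function `overstrike` are hypothetical names for this illustration. Whether the result actually appears as an accented letter depends on the output device: an impact printer overprints, while most modern terminals simply replace the first character.

```python
BACKSPACE = "\b"  # ASCII control character 0x08

# The six ASCII characters that stand in for diacritics, as listed above.
ASCII_MARKS = {
    "grave": "`",
    "tilde": "~",
    "acute": "'",
    "diaeresis": '"',
    "cedilla": ",",
    "circumflex": "^",
}


def overstrike(base: str, diacritic: str) -> str:
    """Build the sequence: base character, backspace, then the mark."""
    return base + BACKSPACE + ASCII_MARKS[diacritic]


seq = overstrike("e", "acute")
print(repr(seq))               # "e\x08'"
print([ord(ch) for ch in seq]) # [101, 8, 39]
```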

View the full Wikipedia page for Overstrike

Character set in the context of Precomposed character

A precomposed character (alternatively composite character or decomposable character) is a Unicode entity that can also be defined as a sequence of one or more other characters. A precomposed character may typically represent a letter with a diacritical mark, such as é (Latin small letter e with acute accent). Technically, é (U+00E9) is a character that can be decomposed into an equivalent string of the base letter e (U+0065) and combining acute accent (U+0301). Similarly, ligatures are precompositions of their constituent letters or graphemes.

Precomposed characters are the legacy solution for representing many special letters in various character sets. In Unicode, they are included primarily to aid computer systems with incomplete Unicode support, where equivalent decomposed characters may render incorrectly.
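The equivalence between é (U+00E9) and the sequence e (U+0065) plus combining acute accent (U+0301) can be checked with Python's standard `unicodedata` module. This is a minimal sketch; the variable names are chosen for this illustration. Normalization form NFD decomposes precomposed characters and NFC recomposes them.

```python
import unicodedata

precomposed = "\u00e9"  # single code point: e with acute accent

# NFD splits the precomposed character into base letter + combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)
print([hex(ord(ch)) for ch in decomposed])  # ['0x65', '0x301']

# NFC goes the other way and recombines the pair into one code point.
recomposed = unicodedata.normalize("NFC", "e\u0301")
print(recomposed == precomposed)            # True

# The two forms look identical when rendered but are different strings,
# so comparisons should normalize both sides first.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 1 2
```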

View the full Wikipedia page for Precomposed character

Character set in the context of MIME

Multipurpose Internet Mail Extensions (MIME) is a standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets. Email messages with MIME formatting are typically transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

MIME is an Internet standard, specified in the following Request for Comments (RFC) publications: RFC 2045, RFC 2046, RFC 2047, RFC 4288, RFC 4289 and RFC 2049. The integration with SMTP email is specified in RFC 1521 and RFC 1522.
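As a rough illustration of how header information in a non-ASCII character set is carried in an email, the sketch below uses Python's standard `email.header` module to produce and read back an encoded header value. The helper names `encode_header_value` and `decode_header_value` and the sample text are assumptions made for this example. The exact encoded form (base64 or quoted-printable) is chosen by the library, so it is printed rather than assumed here.

```python
from email.header import Header, decode_header


def encode_header_value(text: str) -> str:
    """Encode text as an ASCII-only header value tagged with its charset."""
    return Header(text, "utf-8").encode()


def decode_header_value(encoded: str) -> str:
    """Decode an encoded header value back into a Python string."""
    parts = []
    for chunk, charset in decode_header(encoded):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


encoded = encode_header_value("Caf\u00e9")
print(encoded)                       # ASCII-only text naming the utf-8 charset
print(decode_header_value(encoded))  # the original text, with its accent
```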

View the full Wikipedia page for MIME

Character set in the context of Code page 437

Code page 437 (CCSID 437) is the character set of the original IBM PC (personal computer). It is also known as CP437, OEM-US, OEM 437, PC-8, or MS-DOS Latin US. The set includes all printable ASCII characters as well as some accented letters (diacritics), Greek letters, icons, and line-drawing symbols. It is sometimes referred to as the "OEM font" or "high ASCII", or as "extended ASCII" (one of many mutually incompatible ASCII extensions).

This character set remains the primary set in the core of any EGA and VGA-compatible graphics card. As such, text shown when a PC reboots, before fonts can be loaded and rendered, is typically rendered using this character set. Many file formats developed at the time of the IBM PC are based on code page 437 as well.
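Python ships a `cp437` codec, so the layout of this character set can be explored directly. The byte values below are a minimal sketch meant to show the three groups mentioned above: printable ASCII kept at the same positions, accented and Greek letters in the upper half, and line-drawing symbols. The variable names are chosen for this illustration.

```python
# Printable ASCII occupies the same positions as in ASCII itself.
print(b"PC".decode("cp437"))             # PC

# Bytes above 0x7F map to accented letters, Greek letters, and symbols.
print(bytes([0x82]).decode("cp437"))     # e with acute accent
print(bytes([0xE0]).decode("cp437"))     # Greek small letter alpha

# Double-line box-drawing characters, as used by text mode interfaces.
top = bytes([0xC9, 0xCD, 0xCD, 0xBB]).decode("cp437")
mid = bytes([0xBA, 0x20, 0x20, 0xBA]).decode("cp437")
bottom = bytes([0xC8, 0xCD, 0xCD, 0xBC]).decode("cp437")
print("\n".join([top, mid, bottom]))

# Every character in the set fits in a single byte.
print(len("\u2554".encode("cp437")))     # 1
```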

View the full Wikipedia page for Code page 437

Character set in the context of Telegraph code

A telegraph code is one of the character encodings used to transmit information by telegraphy. Morse code is the best-known such code. Telegraphy usually refers to the electrical telegraph, but telegraph systems using the optical telegraph were in use before that. A code consists of a number of code points, each corresponding to a letter of the alphabet, a numeral, or some other character. In codes intended for machines rather than humans, code points for control characters, such as carriage return, are required to control the operation of the mechanism. Each code point is made up of a number of elements arranged in a unique way for that character. There are usually two types of element (a binary code), but more element types were employed in some codes not intended for machines. For instance, American Morse code had about five elements, rather than the two (dot and dash) of International Morse Code.

Codes meant for human interpretation were designed so that the characters that occurred most often had the fewest elements in the corresponding code point. For instance, Morse code for E, the most common letter in English, is a single dot ( ▄ ), whereas Q is ▄▄▄ ▄▄▄ ▄ ▄▄▄ . These arrangements meant the message could be sent more quickly and it would take longer for the operator to become fatigued. Telegraphs were always operated by humans until late in the 19th century. When automated telegraph messages came in, codes with variable-length code points were inconvenient for machine design of the period. Instead, codes with a fixed length were used. The first of these was the Baudot code, a five-bit code. Baudot has only enough code points to print in upper case. Later codes had more bits (ASCII has seven) so that both upper and lower case could be printed. Beyond the telegraph age, modern computers require a very large number of code points (Unicode code points need up to 21 bits) so that multiple languages and alphabets (character sets) can be handled without having to change the character encoding. Modern computers can easily handle variable-length codes such as UTF-8 and UTF-16, which have now become ubiquitous.
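The trade-off described above can be sketched in Python by comparing a variable-length code with a fixed-length one. The table holds only a handful of International Morse Code letters, and the names `MORSE`, `morse_elements`, and `fixed_length_bits` are hypothetical. Counting elements is a simplification, since it ignores the different durations of dots and dashes and the gaps between letters.

```python
# A small excerpt of International Morse Code ("." is a dot, "-" is a dash).
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-.", "Q": "--.-"}


def morse_elements(text: str) -> int:
    """Total number of dots and dashes needed to send text in Morse code."""
    return sum(len(MORSE[ch]) for ch in text)


def fixed_length_bits(text: str, bits_per_char: int = 5) -> int:
    """Total bits for a fixed-length code; 5 bits matches the Baudot code."""
    return len(text) * bits_per_char


# Frequent letters are short in Morse, so common text is cheap to send.
print(morse_elements("E"), morse_elements("Q"))  # 1 4
print(morse_elements("TEA"))                     # 4
print(fixed_length_bits("TEA"))                  # 15
```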

View the full Wikipedia page for Telegraph code
