Character encoding in the context of "ASCII"

⭐ Core Definition: Character encoding

Character encoding is a convention of using a numeric value to represent each character of a writing script. Not only can a character set include natural language symbols, but it can also include codes that have meanings or functions outside of language, such as control characters and whitespace. Character encodings have also been defined for some constructed languages. When encoded, character data can be stored, transmitted, and transformed by a computer. The numerical values that make up a character encoding are known as code points and collectively comprise a code space or a code page.

Early character encodings that originated with optical or electrical telegraphy and in early computers could only represent a subset of the characters used in languages, sometimes restricted to upper case letters, numerals and limited punctuation. Over time, encodings capable of representing more characters were created, such as ASCII, ISO/IEC 8859, and Unicode encodings such as UTF-8 and UTF-16.
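To make the mapping concrete, the following Python sketch (illustrative only; the characters chosen are arbitrary examples) prints the code point of a few characters and the byte sequences that UTF-8 and ASCII assign to them.

```python
# Illustrative sketch: code points versus encoded bytes.
# The code point is the numeric value assigned to a character;
# an encoding such as ASCII or UTF-8 determines how that value
# is serialized into bytes.
for ch in ["A", "é", "€"]:
    point = ord(ch)                          # numeric code point
    utf8_bytes = ch.encode("utf-8")          # byte sequence under UTF-8
    print(f"{ch!r}: U+{point:04X} -> {utf8_bytes.hex(' ')}")

# 'A' also fits in the 7-bit ASCII repertoire, whereas 'é' and '€' do not:
print("A".encode("ascii"))                   # b'A'
# "é".encode("ascii") would raise UnicodeEncodeError
```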

In this Dossier

Character encoding in the context of The Unicode Standard

Unicode (also known as The Unicode Standard and TUS) is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 17.0 defines 159,801 characters and 172 scripts used in various ordinary, literary, academic, and technical contexts.

Unicode has largely supplanted the previous environment of myriad incompatible character sets used within different locales and on different computer architectures. The entire repertoire of these sets, plus many additional characters, was merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and relevant Unicode support has become a common consideration in contemporary software development. Unicode is ultimately capable of encoding more than 1.1 million characters.
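The figure of roughly 1.1 million characters follows from the size of the Unicode code space, U+0000 through U+10FFFF. The short Python sketch below (an illustration, not part of any standard) shows a code point outside the Basic Multilingual Plane and how UTF-8 and UTF-16 serialize it differently.

```python
# Illustrative sketch of the Unicode code space and two of its encoding forms.
print(0x10FFFF + 1)                      # 1114112 possible code points in total

ch = "\U0001F600"                        # a code point outside the Basic Multilingual Plane
print(hex(ord(ch)))                      # 0x1f600

print(ch.encode("utf-8").hex(" "))       # f0 9f 98 80  (four UTF-8 code units)
print(ch.encode("utf-16-be").hex(" "))   # d8 3d de 00  (a UTF-16 surrogate pair)
```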


Character encoding in the context of Slate and stylus

The slate and stylus are tools used by blind people to write text that they can read without assistance. Invented by Charles Barbier as a tool for writing letters that could be read by touch, the slate and stylus allow for a quick, easy, convenient and consistent method of producing the embossed printing used for the Braille character encoding. Prior methods of making raised printing for the blind required a movable type printing press.


Character encoding in the context of File format

A file format is the way that information is encoded for storage in a computer file. It may describe the encoding at various levels of abstraction, from low-level bit and byte layout to high-level organization such as markup and tabular structure. A file format may be standardized (whether proprietary or open), or it may be an ad hoc convention.

Some file formats are designed for very particular types of data: PNG files, for example, store bitmapped images using lossless data compression. Other file formats are designed to hold several different types of data: the Ogg format can act as a container for different kinds of multimedia, including any combination of audio and video, with or without text (such as subtitles), and metadata. A text file can contain any stream of characters, possibly including control characters, and is encoded in one of various character encoding schemes. Some file formats, such as HTML, Scalable Vector Graphics (SVG), and the source code of computer software, are text files with defined syntaxes that allow them to be used for specific purposes.
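As a small illustration of these ideas, the sketch below checks for the fixed eight-byte PNG signature and decodes a text file with an explicit character encoding. The function names and parameters are placeholders chosen for this example.

```python
# Illustrative sketch: recognizing a binary format by its signature
# versus interpreting a text file through a character encoding.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"     # the fixed eight-byte PNG header

def looks_like_png(path: str) -> bool:
    """Return True if the file begins with the PNG signature."""
    with open(path, "rb") as f:
        return f.read(8) == PNG_SIGNATURE

def read_text(path: str, encoding: str = "utf-8") -> str:
    """Decode the raw bytes of a text file using the given character encoding."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)
```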


Character encoding in the context of String (computer science)

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and its length changed, or it may be fixed after creation. A string is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. More generally, a string may also denote a sequence (or list) of data other than characters.

Depending on the programming language and precise data type used, a variable declared to be a string may either cause storage in memory to be statically allocated for a predetermined maximum length or employ dynamic allocation to allow it to hold a variable number of elements.
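The following Python sketch (illustrative; other languages differ in the details) shows a string as a sequence of code points whose stored byte length depends on the character encoding, and a mutable, dynamically sized byte buffer holding the same data.

```python
# Character count versus encoded byte count.
s = "naïve"
print(len(s))                        # 5 characters (code points)
print(len(s.encode("utf-8")))        # 6 bytes: 'ï' occupies two bytes in UTF-8

# Python strings are immutable; a mutable, dynamically sized buffer of the
# encoded bytes can be modeled with a bytearray.
buf = bytearray(s.encode("utf-8"))
buf.extend(" text".encode("utf-8"))  # the length can change at run time
print(buf.decode("utf-8"))           # naïve text
```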


Character encoding in the context of Arabic chat alphabet

The Arabic chat alphabet, also known as Arabizi, Arabeezi, Arabish, Franco-Arabic or simply Franco (from French: franco-arabe), refers to the romanized alphabets for informal Arabic dialects in which Arabic script is transcribed or encoded into a combination of Latin script and Western Arabic numerals. These informal chat alphabets were originally used primarily by youth in the Arab world in very informal settings, especially for communicating over the Internet or for sending messages via cellular phones, though their use is no longer necessarily restricted by age, and they have appeared in other media such as advertising.

These chat alphabets differ from more formal and academic Arabic transliteration systems, in that they use numerals and multigraphs instead of diacritics for letters such as ṭāʾ (ط) or ḍād (ض) that do not exist in the basic Latin script (ASCII), and in that what is being transcribed is an informal dialect and not Standard Arabic. These Arabic chat alphabets also differ from each other, as each is influenced by the particular phonology of the Arabic dialect being transcribed and the orthography of the dominant European language in the area—typically the language of the former colonists, and typically either French or English.
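A few of the commonly cited numeral substitutions can be expressed as a simple lookup table, as in the Python sketch below; the mapping shown is only illustrative, since actual usage varies by dialect, region, and writer.

```python
# Illustrative (not authoritative) Arabizi digit substitutions.
ARABIZI_DIGITS = {
    "2": "ء",   # hamza
    "3": "ع",   # ʿayn
    "5": "خ",   # khāʾ
    "7": "ح",   # ḥāʾ
    "9": "ص",   # ṣād, in one common convention
}

def digits_to_arabic(text: str) -> str:
    """Replace Arabizi digit substitutions with the Arabic letters they stand for."""
    return "".join(ARABIZI_DIGITS.get(ch, ch) for ch in text)

print(digits_to_arabic("7abibi"))   # the digit 7 is replaced by ح
```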


Character encoding in the context of Character (computing)

In computing and telecommunications, a character is the encoded representation of a natural language character (including letter, numeral and punctuation), whitespace (space or tab), or a control character (controls computer hardware that consumes character-based data). A sequence of characters is called a string.

Some character encoding systems represent each character using a fixed number of bits, whereas other systems use varying sizes. Various fixed-length sizes were used in now obsolete systems, such as the six-bit character code, the five-bit Baudot code and even 4-bit systems (with only 16 possible values). The more modern ASCII system is a seven-bit code, though each character is typically stored in an 8-bit byte. Today, the Unicode-based UTF-8 encoding uses a varying number of byte-sized code units (one to four) to represent each code point, and code points in turn encode characters.
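The contrast between fixed-width and variable-width encodings can be seen directly: each of the characters in the Python sketch below is a single code point, but UTF-8 represents them with one to four byte-sized code units (the characters are arbitrary examples).

```python
# One code point per character, but a varying number of UTF-8 code units.
for ch in ["A", "é", "€", "\U0001F600"]:
    units = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(units)} code unit(s)")
# U+0041 -> 1, U+00E9 -> 2, U+20AC -> 3, U+1F600 -> 4
```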


Character encoding in the context of Morse code

Morse code is a telecommunications method which encodes text characters as standardized sequences of two different signal durations, called dots and dashes, or dits and dahs. It is named after Samuel Morse, one of several developers of the system. Morse's preliminary proposal for a telegraph code was replaced by an alphabet-based code developed by Alfred Vail, the engineer working with Morse. Vail's version was used for commercial telegraphy in North America. Friedrich Gerke simplified Vail's code to produce the code adopted in Europe, and most of the alphabetic part of the International Telecommunication Union (ITU) "Morse code" standard is copied from Gerke's revision.

The ITU International Morse code encodes the 26 basic Latin letters A to Z, one accented Latin letter (É), the Indo-Arabic numerals 0 to 9, and some punctuation and messaging procedural signals (prosigns). There is no distinction between upper and lower case letters. Each code symbol is formed by a sequence of dits and dahs. The dit duration can vary with signal clarity and operator skill, but for any one message, once the rhythm is established, the dit serves as the basic unit of time measurement. The duration of a dah is three times the duration of a dit. Each dit or dah within an encoded character is followed by a period of signal absence, called a space, equal to the dit duration. The letters of a word are separated by a space of duration equal to three dits, and words are separated by a space equal to seven dits.
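These timing rules can be made concrete with a short sketch. The Python code below is an illustration covering only a handful of letters, with arbitrarily chosen symbols '=' for key-down and '_' for silence; it renders text in one-unit steps using the dit as the basic unit.

```python
# Morse timing sketch: dit = 1 unit, dah = 3 units, 1-unit gap inside a
# character, 3 units between letters, 7 units between words.
# Only a few letters are included here for brevity.
MORSE = {"A": ".-", "E": ".", "N": "-.", "O": "---", "S": "...", "T": "-"}

def to_signal(text: str) -> str:
    """Render text as '=' (key down) and '_' (silence) in one-unit steps."""
    words = []
    for word in text.upper().split():
        letters = []
        for ch in word:
            units = ["=" if symbol == "." else "===" for symbol in MORSE[ch]]
            letters.append("_".join(units))      # one unit of silence within a letter
        words.append("___".join(letters))        # three units between letters
    return "_______".join(words)                 # seven units between words

print(to_signal("SOS"))
# =_=_=___===_===_===___=_=_=
```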
