Text corpus in the context of "Co-occurrence"

⭐ Core Definition: Text corpus

In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset consisting of natively digital language resources and older, digitized ones, either annotated or unannotated.

Annotated corpora have been used in corpus linguistics for statistical hypothesis testing, for checking occurrences, and for validating linguistic rules within a specific language territory.
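To illustrate what checking occurrences (and co-occurrences) in a corpus can look like in practice, here is a minimal Python sketch. It assumes a toy, unannotated corpus of three made-up sentences and naive whitespace tokenization; the function names `occurrences` and `cooccurrences` are hypothetical and are not taken from any corpus-linguistics library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]


def tokenize(sentence):
    """Naive lowercase whitespace tokenizer (an assumption; real corpora need more)."""
    return sentence.lower().split()


def occurrences(sentences):
    """Count how often each token occurs across the whole corpus."""
    counts = Counter()
    for s in sentences:
        counts.update(tokenize(s))
    return counts


def cooccurrences(sentences):
    """Count unordered word pairs that appear together in the same sentence.

    Each pair is counted at most once per sentence, because pairs are built
    from the sorted set of distinct tokens in that sentence.
    """
    pairs = Counter()
    for s in sentences:
        types = sorted(set(tokenize(s)))
        pairs.update(combinations(types, 2))
    return pairs


if __name__ == "__main__":
    occ = occurrences(corpus)
    co = cooccurrences(corpus)
    print(occ["the"])          # 6
    print(co[("cat", "sat")])  # 1
    print(co[("dog", "the")])  # 2
```

Counting each pair once per sentence is only one design choice; co-occurrence is also commonly measured within a fixed-size window of neighboring tokens. Either way, raw counts like these are the kind of input that statistical hypothesis tests on a corpus start from.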

In this Dossier

Text corpus in the context of Cuneiform

Cuneiform is a logo-syllabic writing system that was used to write several languages of the ancient Near East. The script was in active use from the early Bronze Age until the beginning of the Common Era. Cuneiform scripts are marked by and named for the characteristic wedge-shaped impressions (Latin: cuneus) which form their signs. Cuneiform is the earliest known writing system and was originally developed to write the Sumerian language of southern Mesopotamia (modern Iraq).

Over the course of its history, cuneiform was adapted to write a number of languages in addition to Sumerian. Akkadian texts are attested from the 24th century BC onward and make up the bulk of the cuneiform record. Akkadian cuneiform was itself adapted to write the Hittite language in the early 2nd millennium BC. The other languages with significant cuneiform corpora are Eblaite, Elamite, Hurrian, Luwian, and Urartian. The Old Persian and Ugaritic alphabets feature cuneiform-style signs; however, they are unrelated to the cuneiform logo-syllabary proper. The latest known cuneiform tablet, an astronomical almanac from Uruk, dates to AD 79/80.

Text corpus in the context of Linguistic prescription

Linguistic prescription is the establishment of rules defining publicly preferred usage of language, including rules of spelling, pronunciation, vocabulary, grammar, etc. Linguistic prescriptivism may aim to establish a standard language, teach what a particular society or sector of a society perceives as a correct or proper form, or advise on effective and stylistically apt communication. If usage preferences are conservative, prescription might appear resistant to language change; if radical, it may produce neologisms. Such prescriptions may be motivated by consistency (making a language simpler or more logical); rhetorical effectiveness; tradition; aesthetics or personal preferences; linguistic purism or nationalism (i.e. removing foreign influences); or a wish to avoid causing offense (etiquette or political correctness).

Prescriptive approaches to language are often contrasted with the descriptive approach of academic linguistics, which observes and records how language is actually used (while avoiding passing judgment). The basis of linguistic research is text (corpus) analysis and field study, both of which are descriptive activities. Description may also include researchers' observations of their own language usage. In the Eastern European linguistic tradition, the discipline dealing with standard language cultivation and prescription is known as "language culture" or "speech culture".

Text corpus in the context of Name of Italy

The etymology of the name of Italy has been the subject of reconstructions by linguists and historians. Considerations extraneous to the strictly linguistic reconstruction of the name have produced a rich corpus of proposed solutions, which are either tied to legend (such as the existence of a king named Italus) or otherwise highly problematic (such as the connection of the name with the grapevine, vitis in Latin).

One theory holds that the name derives from Italói, a term the ancient Greeks used for a tribe of Sicels who had crossed the Strait of Messina and inhabited the extreme tip of the Italian Peninsula, near present-day Catanzaro. This is evidenced by the fact that the ancient Greeks who colonized present-day Calabria called themselves Italiotes, that is, inhabitants of Italy. This people worshipped the image of a calf (vitulus in Latin), so the name would mean "inhabitants of the land of calves (young bulls)". In any case, it is known that in archaic times the name designated the extreme southern part of the Italian Peninsula.

Text corpus in the context of Egyptian language

The Egyptian language, or Ancient Egyptian (r n kmt; 'speech of Egypt'), is an extinct branch of the Afro-Asiatic language family that was spoken in ancient Egypt. It is known today from a large corpus of surviving texts, which were made accessible to the modern world following the decipherment of the ancient Egyptian scripts in the early 19th century.

Egyptian is one of the earliest known written languages, first recorded in the hieroglyphic script in the late 4th millennium BC. It is also the longest-attested human language, with a written record spanning over 4,000 years. Its classical form, known as "Middle Egyptian," served as the vernacular of the Middle Kingdom of Egypt and remained the literary language of Egypt until the Roman period.

Text corpus in the context of Hippocratic Corpus

The Hippocratic Corpus (Latin: Corpus Hippocraticum), or Hippocratic Collection, is a collection of around 60 early Ancient Greek medical works closely associated with the physician Hippocrates and his teachings. It covers many diverse aspects of medicine, from Hippocrates' medical theories and his views on ethical medical practice to the treatment of various illnesses. Although it is considered a single corpus representing Hippocratic medicine, the works vary, sometimes significantly, in content, age, style, methods, and views; as a result, their authorship is largely unknown. Ancient commentaries on the corpus, by writers such as Attalion and Oribasius, are numerous. Hippocrates began Western society's development of medicine through a careful blending of the art of healing with scientific observation. His works taught not only how to identify the symptoms of disease and follow proper diagnostic practice but, more essentially, a personal form of art: "The art of true living and the art of fine medicine combined." The Hippocratic Corpus became the foundation upon which Western medical practice was built.

Text corpus in the context of Indus script

The Indus script, also known as the Harappan script and the Indus Valley script, is a corpus of symbols produced by the Indus Valley Civilisation. Most inscriptions containing these symbols are extremely short, making it difficult to judge whether or not they constituted a writing system used to record a Harappan language, none of which has yet been identified. Despite many attempts, the "script" has not yet been deciphered. There is no known bilingual inscription to help decipher the script, which shows no significant changes over time. However, some of the syntax (if that is what it may be termed) varies depending on location.

The first publication of a seal with Harappan symbols dates to 1875, in a drawing by Alexander Cunningham. By 1992, an estimated 4,000 inscribed objects had been discovered, some as far afield as Mesopotamia due to existing Indus–Mesopotamia relations, with over 400 distinct signs represented across known inscriptions.

Text corpus in the context of Ancient Egyptian literature

Ancient Egyptian literature was written in the Egyptian language from ancient Egypt's pharaonic period until the end of Roman rule. It represents the oldest corpus of Egyptian literature. Along with Sumerian literature, it is considered the world's earliest literature.

Writing in ancient Egypt—both hieroglyphic and hieratic—first appeared in the late 4th millennium BC during the late phase of predynastic Egypt. By the Old Kingdom (26th century BC to 22nd century BC), literary works included funerary texts, epistles and letters, hymns and poems, and commemorative autobiographical texts recounting the careers of prominent administrative officials. It was not until the early Middle Kingdom (21st century BC to 17th century BC) that a narrative Egyptian literature was created. This was a "media revolution" which, according to Richard B. Parkinson, was the result of the rise of an intellectual class of scribes, new cultural sensibilities about individuality, unprecedented levels of literacy, and mainstream access to written materials. The creation of literature was thus an elite exercise, monopolized by a scribal class attached to government offices and the royal court of the ruling pharaoh. However, there is no full consensus among modern scholars concerning the dependence of ancient Egyptian literature on the sociopolitical order of the royal courts.