Corpus linguistics in the context of "Distributionalism"

⭐ Core Definition: Corpus linguistics

Corpus linguistics is an empirical method for the study of language by way of a text corpus (plural corpora). Corpora are balanced, often stratified collections of authentic, "real-world" text, spoken or written, that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections.

Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field, the natural context ("realia") of that language, with minimal experimental interference. Large collections of text allow linguists to run quantitative analyses on linguistic concepts that may be difficult to test in a qualitative manner, although corpora may also be small in terms of running words.

👉 Corpus linguistics in the context of Distributionalism

Distributionalism is a general theory of language and a discovery procedure for establishing elements and structures of language based on observed usage. The purpose of distributionalism was to provide a scientific basis for syntax as independent of meaning. Zellig Harris defined the 'distribution' of an element in terms of the environments in which it occurs.

Based on this idea, an analysis of immediate constituents could be based on observing the environments in which an element, such as a word, appears in corpora.
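As a rough illustration of observing environments in a corpus, the sketch below records the immediate left and right neighbours of each word in a tiny invented token list. The function name `environments`, the `window` parameter, and the sample sentences are assumptions made for this example; they are not part of Harris's procedure.

```python
from collections import defaultdict

def environments(tokens, window=1):
    """Map each word to the set of (left, right) contexts it occurs in."""
    envs = defaultdict(set)
    for i, word in enumerate(tokens):
        # Up to `window` tokens on each side form one environment.
        left = tuple(tokens[max(0, i - window):i])
        right = tuple(tokens[i + 1:i + 1 + window])
        envs[word].add((left, right))
    return envs

# Invented toy corpus, already tokenized on whitespace.
corpus = "the cat sleeps . the dog sleeps . a cat runs .".split()
envs = environments(corpus)

# Environments that "cat" and "dog" have in common.
shared = envs["cat"] & envs["dog"]
print(shared)
```

Words that share many environments, like "cat" and "dog" here, are candidates for the same distributional class. A real analysis would need a far larger corpus and a principled definition of what counts as an environment.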

Corpus linguistics in the context of Text corpus

In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset consisting of natively digital language resources and older, digitized ones, either annotated or unannotated.

Annotated corpora have been used in corpus linguistics for statistical hypothesis testing, checking occurrences, or validating linguistic rules within a specific language territory.

Corpus linguistics in the context of A Dictionary of Modern English Usage

A Dictionary of Modern English Usage (1926), by H. W. Fowler (1858–1933), is a style guide to British English usage and writing. It covers a wide range of topics that relate to usage, including: plurals, nouns, verbs, punctuation, cases, parentheses, quotation marks, the use of foreign terms, and so on. The dictionary became the standard for other style guides to writing in English. The 1926 first edition remains in print, along with the 1965 second edition, which was edited by Ernest Gowers, and was reprinted in 1983 and 1987. The 1996 third edition, re-titled The New Fowler's Modern English Usage, and revised in 2004, was mostly rewritten by Robert W. Burchfield as a usage dictionary that incorporated corpus linguistics data. The 2015 fourth edition, re-titled Fowler's Dictionary of Modern English Usage, was edited by Jeremy Butterfield as a usage dictionary. Informally, readers refer to the style guide and dictionary as Fowler's Modern English Usage, Fowler, and Fowler's.

Corpus linguistics in the context of Collocation

In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts, and may be completely unrelated.

There are about seven main types of collocations: adjective + noun, noun + noun (such as collective nouns), noun + verb, verb + noun, adverb + adjective, verb + prepositional phrase (phrasal verbs), and verb + adverb.
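One common way to put a number on "more often than would be expected by chance" is pointwise mutual information (PMI), which compares the observed probability of a word pair with the probability expected if the two words were independent. The sketch below is a minimal illustration for adjacent word pairs only. The function name `pmi_bigrams`, the `min_count` threshold, and the sample sentence are assumptions for this example, and PMI is only one of several association measures in use.

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (log base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_independent = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        scores[(w1, w2)] = math.log2(p_pair / p_independent)
    return scores

# Invented toy text; real studies use corpora of many thousands of words.
tokens = "strong tea and strong coffee but weak tea and strong tea".split()
scores = pmi_bigrams(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

A positive score means the pair occurs together more often than independence would predict. On a text this small the numbers carry no linguistic weight; they only show the mechanics.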

Corpus linguistics in the context of Hapax legomenon

In corpus linguistics, a hapax legomenon (/ˈhæpəks lɪˈɡɒmɪnɒn/ also /ˈhæpæks/ or /ˈheɪpæks/; pl. hapax legomena; sometimes abbreviated to hapax, plural hapaxes) is a word or an expression that occurs only once within a context: either in the written record of an entire language, in the works of an author, or in a single text. The term is sometimes incorrectly used to describe a word that occurs in just one of an author's works but more than once in that particular work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "said once".

The related terms dis legomenon, tris legomenon, and tetrakis legomenon (/ˈdɪs/, /ˈtrɪs/, /ˈtɛtrəkɪs/) refer respectively to double, triple, or quadruple occurrences, but are far less commonly used.
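For a single text, these categories can be found by counting word frequencies and keeping the words with a given count. The sketch below assumes naive lowercase, whitespace-based tokenization, and the helper name `words_with_count` is made up for this example.

```python
from collections import Counter

def words_with_count(tokens, n):
    """Return the words that occur exactly n times, in alphabetical order."""
    counts = Counter(tokens)
    return sorted(word for word, count in counts.items() if count == n)

text = "to be or not to be that is the question"
tokens = text.lower().split()

hapaxes = words_with_count(tokens, 1)       # hapax legomena: one occurrence
dis_legomena = words_with_count(tokens, 2)  # dis legomena: two occurrences

print(hapaxes)       # ['is', 'not', 'or', 'question', 'that', 'the']
print(dis_legomena)  # ['be', 'to']
```

Whether a word counts as a hapax depends on the chosen context (one text, one author's works, or a whole language) and on how tokens are normalized, so the same word may be a hapax in one count and not in another.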

Corpus linguistics in the context of New Oxford American Dictionary

The New Oxford American Dictionary (NOAD) is a single-volume dictionary of American English compiled by American editors at the Oxford University Press.

NOAD is based upon the New Oxford Dictionary of English (NODE), published in the United Kingdom in 1998, although with substantial editing, additional entries, and the inclusion of illustrations. It is based on a corpus linguistics analysis of Oxford's 200-million-word database of contemporary American English.
