Lemmatisation in the context of Computational linguistics

⭐ Core Definition: Lemmatisation

Lemmatization (or less commonly lemmatisation) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form.

In computational linguistics, lemmatization is the algorithmic process of determining the lemma of a word based on its intended meaning. Unlike stemming, lemmatization depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence, such as neighbouring sentences or even an entire document. As a result, developing efficient lemmatization algorithms is an open area of research.
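The contrast with stemming can be made concrete with a minimal sketch. The lookup table, the part-of-speech labels, and the function names `lemmatize` and `crude_stem` below are illustrative assumptions, not the API of any real library. A production lemmatizer would use a full lexicon and a part-of-speech tagger rather than a hand-written table.

```python
# Toy lookup table keyed on (surface form, part of speech).
# The entries are illustrative assumptions, not a real lexicon.
LEMMA_TABLE = {
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
    ("better", "ADJ"): "good",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
}


def lemmatize(word: str, pos: str) -> str:
    """Return the lemma for (word, pos); fall back to the lowercased word."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())


def crude_stem(word: str) -> str:
    """Strip a few common English suffixes without looking at context."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    # The same surface form maps to different lemmas depending on its role.
    print(lemmatize("meeting", "VERB"))
    print(lemmatize("meeting", "NOUN"))
    # The stemmer has no access to the part of speech.
    print(crude_stem("meeting"))
    print(crude_stem("better"))
```

In this sketch the stemmer returns the same result for "meeting" regardless of whether it is used as a noun or a verb, and it cannot relate "better" to "good", whereas the lookup keyed on part of speech can do both.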

Lemmatisation in the context of Lemma (morphology)

In morphology and lexicography, a lemma (pl.: lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of word forms. In English, for example, break, breaks, broke, broken and breaking are forms of the same lexeme, with break as the lemma by which they are indexed. Lexeme, in this context, refers to the set of all the inflected or alternating forms in the paradigm of a single word, and lemma refers to the particular form that is chosen by convention to represent the lexeme. Lemmas have special significance in highly inflected languages such as Arabic, Turkish, and Russian. The process of determining the lemma for a given lexeme is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although lemmatisation is at least partly arbitrary.
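The relationship between a lexeme and its lemma can be sketched as a grouping of surface forms under one index key. The mapping below covers only the "break" example from the text, and the names `FORM_TO_LEMMA` and `group_by_lemma` are hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Inflected forms of the lexeme from the text, each indexed by its lemma.
# This table is an illustrative assumption covering a single lexeme.
FORM_TO_LEMMA = {
    "break": "break",
    "breaks": "break",
    "broke": "break",
    "broken": "break",
    "breaking": "break",
}


def group_by_lemma(tokens):
    """Group distinct lowercased tokens under their lemma.

    Tokens missing from the table are treated as their own lemma.
    """
    groups = defaultdict(list)
    for token in tokens:
        form = token.lower()
        lemma = FORM_TO_LEMMA.get(form, form)
        if form not in groups[lemma]:
            groups[lemma].append(form)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_lemma(["Broke", "breaking", "breaks", "broke", "glass"]))
```

Here the forms observed in the input stand in for the lexeme, and the dictionary key plays the role of the lemma by which those forms are indexed.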
