
Lemmatizer
class
Assign the base forms of words.

The Lemmatizer supports simple part-of-speech-sensitive suffix rules and lookup tables.

Lemmatizer.__init__
method

Create a Lemmatizer.

Name | Type | Description
index | dict / None | Inventory of lemmas in the language.
exceptions | dict / None | Mapping of string forms to lemmas that bypass the rules.
rules | dict / None | List of suffix rewrite rules.
lookup | dict / None | Lookup table mapping strings to their lemmas.
returns | Lemmatizer | The newly created object.
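
The parameter descriptions do not spell out how each dictionary is shaped. As a rough illustration only, the sketch below assumes the rule-based tables are keyed by a lowercase part-of-speech name and that the lookup table is flat. The key names, example words and variable names are assumptions made for this example, not something this reference guarantees.

```python
# Hypothetical data for the four constructor arguments (all shapes are assumed).

# Inventory of known lemmas, grouped by part of speech.
index = {"noun": {"duck", "goose"}, "verb": {"walk"}}

# Irregular forms that bypass the suffix rules.
exceptions = {"noun": {"geese": ["goose"]}}

# Suffix rewrite rules as (old suffix, replacement) pairs.
rules = {"noun": [("s", "")], "verb": [("ing", ""), ("ed", "")]}

# Flat table mapping a string directly to its lemma.
lookup = {"geese": "goose", "ducks": "duck"}

# Any of the four may also be None when a language ships no data for it.
constructor_args = {
    "index": index,
    "exceptions": exceptions,
    "rules": rules,
    "lookup": lookup,
}
```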

Lemmatizer.__call__
method

Lemmatize a string.

Name | Type | Description
string | unicode | The string to lemmatize, e.g. the token text.
univ_pos | unicode / int | The token's universal part-of-speech tag.
morphology | dict / None | Morphological features following the Universal Dependencies scheme.
returns | list | The available lemmas for the string.
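
To make the interplay of exceptions, suffix rules and the lemma inventory concrete, here is a minimal standalone sketch. It is not the library's implementation: the `lemmatize` helper, its argument order and the sample data are all hypothetical, and it assumes the tables have already been narrowed down to a single part of speech.

```python
def lemmatize(string, index, exceptions, rules):
    """Return candidate lemmas for `string` (illustrative sketch only)."""
    string = string.lower()
    # Exceptions bypass the suffix rules entirely.
    if string in exceptions:
        return list(exceptions[string])
    forms = []
    for old, new in rules:
        if string.endswith(old):
            form = string[: len(string) - len(old)] + new
            # Keep only non-empty candidates that are known lemmas.
            if form and form in index and form not in forms:
                forms.append(form)
    # Fall back to the (lowercased) input when no rule yields a known lemma.
    return forms or [string]


# Hypothetical noun data for the example.
noun_index = {"duck", "goose"}
noun_exceptions = {"geese": ["goose"]}
noun_rules = [("s", "")]

ducks = lemmatize("ducks", noun_index, noun_exceptions, noun_rules)
geese = lemmatize("geese", noun_index, noun_exceptions, noun_rules)
unknown = lemmatize("xyzs", noun_index, noun_exceptions, noun_rules)
```

In this sketch a list is returned because more than one rule may produce a known lemma, which matches the "available lemmas" wording of the return value.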

Lemmatizer.lookup
method
New in spaCy v2.0.

Look up a lemma in the lookup table, if available. If no lemma is found, the original string is returned. Languages can provide a lookup table via the lemma_lookup variable, set on the individual Language class.

Name | Type | Description
string | unicode | The string to look up.
returns | unicode | The lemma if the string was found, otherwise the original string.
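
The fallback behaviour described here, returning the original string when no lemma is found, can be sketched with a plain dictionary. The `lookup` function and the sample table below are hypothetical stand-ins, not the library's code.

```python
def lookup(lookup_table, string):
    """Return the lemma for `string`, or `string` itself if there is none."""
    # A missing table behaves like an empty one in this sketch.
    if lookup_table is None:
        return string
    return lookup_table.get(string, string)


# Hypothetical lookup table for the example.
table = {"geese": "goose", "ducks": "duck"}

found = lookup(table, "geese")
missing = lookup(table, "swans")
no_table = lookup(None, "geese")
```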

Lemmatizer.is_base_form
method

Check whether we're dealing with an uninflected paradigm, so we can avoid lemmatization entirely.

Name | Type | Description
univ_pos | unicode / int | The token's universal part-of-speech tag.
morphology | dict | The token's morphological features.
returns | bool | Whether the token's part-of-speech tag and morphological features describe a base form.
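
One way to picture this check is a small table of part-of-speech and feature pairs that already denote an uninflected form. The sketch below is an assumption-laden illustration: the `BASE_FORM_FEATURES` table, the chosen Universal Dependencies features (`Number`, `VerbForm`, `Degree`) and the case-insensitive matching are choices made for this example, and it handles only string tags, not integer IDs.

```python
# Hypothetical mapping: part of speech -> (feature, value) that marks a base form.
BASE_FORM_FEATURES = {
    "noun": ("Number", "sing"),
    "verb": ("VerbForm", "inf"),
    "adj": ("Degree", "pos"),
}


def is_base_form(univ_pos, morphology=None):
    """Return True if the tag and features describe an uninflected form."""
    morphology = morphology or {}
    expected = BASE_FORM_FEATURES.get(str(univ_pos).lower())
    if expected is None:
        # No rule for this part of speech, so lemmatization is not skipped.
        return False
    feature, value = expected
    return str(morphology.get(feature, "")).lower() == value


infinitive = is_base_form("VERB", {"VerbForm": "Inf"})
participle = is_base_form("VERB", {"VerbForm": "Part"})
no_features = is_base_form("NOUN")
```

When the check returns True, a caller could return the input string unchanged instead of applying any suffix rules.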

Attributes

Name | Type | Description
index | dict / None | Inventory of lemmas in the language.
exc | dict / None | Mapping of string forms to lemmas that bypass the rules.
rules | dict / None | List of suffix rewrite rules.
lookup_table (new in spaCy v2.0) | dict / None | The lemma lookup table, if available.