Containers

Lexeme

class
An entry in the vocabulary

A Lexeme has no string context – it’s a word type, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse, or lemma (if lemmatization depends on the part-of-speech tag).
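The type/token distinction can be illustrated with a toy model. This is only a sketch: `ToyLexeme`, `ToyToken`, `vocab` and `get_lexeme` are hypothetical names invented for this example and are not part of the library. It shows a single context-free vocabulary entry shared by two tokens that carry different, context-dependent tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyLexeme:
    """Hypothetical context-free vocabulary entry (a word type)."""
    text: str
    is_alpha: bool


@dataclass
class ToyToken:
    """Hypothetical word token: a lexeme plus its position and tag in one text."""
    lexeme: ToyLexeme
    index: int
    pos: str


# Hypothetical vocabulary: maps each distinct string to one shared entry.
vocab = {}


def get_lexeme(text: str) -> ToyLexeme:
    """Return the shared entry for `text`, creating it on first use."""
    if text not in vocab:
        vocab[text] = ToyLexeme(text=text, is_alpha=text.isalpha())
    return vocab[text]


# The tags below are supplied by hand; no tagger is involved in this sketch.
verb_token = ToyToken(get_lexeme("run"), index=1, pos="VERB")  # "I run daily"
noun_token = ToyToken(get_lexeme("run"), index=2, pos="NOUN")  # "a long run"

# Both tokens point at the same vocabulary entry, yet their tags differ.
same_entry = verb_token.lexeme is noun_token.lexeme
```

The part-of-speech tag lives on the token because it depends on the sentence, while `is_alpha` lives on the entry because it depends only on the string.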

Lexeme.__init__ method

Create a Lexeme object.

Name | Description | Type
vocab | The parent vocabulary. | Vocab
orth | The orth ID of the lexeme. | int

Lexeme.set_flag method

Change the value of a boolean flag.

Name | Description | Type
flag_id | The attribute ID of the flag to set. | int
value | The new value of the flag. | bool

Lexeme.check_flag method

Check the value of a boolean flag.

Name | Description | Type
flag_id | The attribute ID of the flag to query. | int
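Since the `flags` attribute is documented as a single int, one plausible reading is that each flag ID selects one bit of that integer. The sketch below works under that assumption only; `ToyFlags` and `IS_GREETING` are hypothetical names, and the library's internal storage may differ.

```python
class ToyFlags:
    """Hypothetical flag container: bit `flag_id` of an int holds one boolean."""

    def __init__(self) -> None:
        self.flags = 0

    def set_flag(self, flag_id: int, value: bool) -> None:
        if value:
            self.flags |= 1 << flag_id     # switch the bit on
        else:
            self.flags &= ~(1 << flag_id)  # switch the bit off

    def check_flag(self, flag_id: int) -> bool:
        return bool(self.flags & (1 << flag_id))


IS_GREETING = 5  # hypothetical flag ID chosen for this example

entry = ToyFlags()
entry.set_flag(IS_GREETING, True)
was_set = entry.check_flag(IS_GREETING)

entry.set_flag(IS_GREETING, False)
was_cleared = not entry.check_flag(IS_GREETING)
```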

Lexeme.similarity method (needs model)

Compute a semantic similarity estimate. Defaults to cosine over vectors.

Name | Description | Type
other | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects. | Union[Doc, Span, Token, Lexeme]
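For reference, cosine similarity is the dot product of two vectors divided by the product of their L2 norms. The following is a minimal pure-Python sketch of that formula, not the library's implementation; the vectors are made up for illustration, and returning 0.0 for an all-zero vector is an assumption of this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))  # L2 norm of a
    norm_b = math.sqrt(sum(y * y for y in b))  # L2 norm of b
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption of this sketch: an all-zero vector has no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors, for illustration only.
apple = [1.0, 2.0, 0.0]
orange = [2.0, 4.0, 0.0]  # points in the same direction as `apple`
car = [0.0, 0.0, 3.0]     # orthogonal to `apple`

same_direction = cosine_similarity(apple, orange)
orthogonal = cosine_similarity(apple, car)
```

Vectors pointing in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0.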

Lexeme.has_vector property (needs model)

A boolean value indicating whether a word vector is associated with the lexeme.

Lexeme.vector property (needs model)

A real-valued meaning representation.

Lexeme.vector_norm property (needs model)

The L2 norm of the lexeme’s vector representation.

Attributes

Name | Description | Type
vocab | The lexeme’s vocabulary. | Vocab
text | Verbatim text content. | str
orth | ID of the verbatim text content. | int
orth_ | Verbatim text content (identical to Lexeme.text). Exists mostly for consistency with the other attributes. | str
rank | Sequential ID of the lexeme’s lexical type, used to index into tables, e.g. for word vectors. | int
flags | Container of the lexeme’s binary flags. | int
norm | The lexeme’s norm, i.e. a normalized form of the lexeme text. | int
norm_ | The lexeme’s norm, i.e. a normalized form of the lexeme text. | str
lower | Lowercase form of the word. | int
lower_ | Lowercase form of the word. | str
shape | Transform of the word’s string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". | int
shape_ | Transform of the word’s string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". | str
prefix | Length-N substring from the start of the word. Defaults to N=1. | int
prefix_ | Length-N substring from the start of the word. Defaults to N=1. | str
suffix | Length-N substring from the end of the word. Defaults to N=3. | int
suffix_ | Length-N substring from the end of the word. Defaults to N=3. | str
is_alpha | Does the lexeme consist of alphabetic characters? Equivalent to lexeme.text.isalpha(). | bool
is_ascii | Does the lexeme consist of ASCII characters? Equivalent to all(ord(c) < 128 for c in lexeme.text). | bool
is_digit | Does the lexeme consist of digits? Equivalent to lexeme.text.isdigit(). | bool
is_lower | Is the lexeme in lowercase? Equivalent to lexeme.text.islower(). | bool
is_upper | Is the lexeme in uppercase? Equivalent to lexeme.text.isupper(). | bool
is_title | Is the lexeme in titlecase? Equivalent to lexeme.text.istitle(). | bool
is_punct | Is the lexeme punctuation? | bool
is_left_punct | Is the lexeme a left punctuation mark, e.g. "("? | bool
is_right_punct | Is the lexeme a right punctuation mark, e.g. ")"? | bool
is_space | Does the lexeme consist of whitespace characters? Equivalent to lexeme.text.isspace(). | bool
is_bracket | Is the lexeme a bracket? | bool
is_quote | Is the lexeme a quotation mark? | bool
is_currency | Is the lexeme a currency symbol? | bool
like_url | Does the lexeme resemble a URL? | bool
like_num | Does the lexeme represent a number, e.g. “10.9”, “10” or “ten”? | bool
like_email | Does the lexeme resemble an email address? | bool
is_oov | Is the lexeme out-of-vocabulary (i.e. does it not have a word vector)? | bool
is_stop | Is the lexeme part of a “stop list”? | bool
lang | Language of the parent vocabulary. | int
lang_ | Language of the parent vocabulary. | str
prob | Smoothed log probability estimate of the lexeme’s word type (context-independent entry in the vocabulary). | float
cluster | Brown cluster ID. | int
sentiment | A scalar value indicating the positivity or negativity of the lexeme. | float
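Several of the string attributes above are simple functions of the lexeme text. The sketch below reimplements a few of them in plain Python, based solely on the descriptions in the table. The function names are hypothetical, and the library's own implementation may treat edge cases differently.

```python
def word_shape(text: str) -> str:
    """Sketch of the shape transform: letters become x or X, digits become d,
    and runs of the same shape character are cut off after four."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            mapped = "X" if char.isupper() else "x"
        elif char.isdigit():
            mapped = "d"
        else:
            mapped = char  # punctuation and other characters are kept as-is
        run = run + 1 if mapped == last else 1
        last = mapped
        if run <= 4:
            shape.append(mapped)
    return "".join(shape)


def is_ascii(text: str) -> bool:
    """True if every character has a code point below 128."""
    return all(ord(c) < 128 for c in text)


def prefix(text: str, n: int = 1) -> str:
    """Length-n substring from the start of the word."""
    return text[:n]


def suffix(text: str, n: int = 3) -> str:
    """Length-n substring from the end of the word."""
    return text[-n:]


shape_of_four = word_shape("Four")
shape_of_42 = word_shape("42")
shape_of_long_word = word_shape("Vocabulary")
```

Truncating runs keeps the number of distinct shapes small, so long words such as "Vocabulary" and "Tokenization" share a single shape value.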