Lexeme
class

An entry in the vocabulary. A Lexeme has no string context – it's a word type, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse, or lemma (if lemmatization depends on the part-of-speech tag).

Lexeme.__init__
method

Create a Lexeme object.

Name     Type    Description
vocab    Vocab   The parent vocabulary.
orth     int     The orth id of the lexeme.
returns  Lexeme  The newly constructed object.

Lexeme.set_flag
method

Change the value of a boolean flag.

Name     Type  Description
flag_id  int   The attribute ID of the flag to set.
value    bool  The new value of the flag.

Lexeme.check_flag
method

Check the value of a boolean flag.

Name     Type  Description
flag_id  int   The attribute ID of the flag to query.
returns  bool  The value of the flag.
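
The two methods above write and read a single boolean flag by its integer ID. As a rough illustration of that idea only, the sketch below packs flags into one integer bitfield. The `FlagStore` class and the `IS_GREETING` ID are hypothetical names invented for this example; they are not part of spaCy, and spaCy's internal storage may differ.

```python
class FlagStore:
    """Hypothetical stand-in for an object exposing set_flag / check_flag."""

    def __init__(self):
        # One integer holds every flag; bit N is the flag with ID N.
        self.flags = 0

    def set_flag(self, flag_id, value):
        mask = 1 << flag_id
        if value:
            self.flags |= mask   # switch the bit on
        else:
            self.flags &= ~mask  # switch the bit off

    def check_flag(self, flag_id):
        return bool(self.flags & (1 << flag_id))


IS_GREETING = 5  # hypothetical flag ID, chosen arbitrarily

store = FlagStore()
store.set_flag(IS_GREETING, True)
was_set = store.check_flag(IS_GREETING)

store.set_flag(IS_GREETING, False)
was_cleared = not store.check_flag(IS_GREETING)
```

Setting a flag to `False` clears only that bit, so other flags in the same integer are left untouched.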

Lexeme.similarity
method
Needs model: this functionality requires an installed spaCy model that supports vectors.

Compute a semantic similarity estimate. Defaults to cosine over vectors.

Name     Type   Description
other    -      The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects.
returns  float  A scalar similarity score. Higher is more similar.
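
The default described above is cosine similarity over vectors. The sketch below shows that calculation in plain Python, without spaCy, on two small lists of floats. The function name `cosine_similarity` is hypothetical, and returning 0.0 for a zero vector is an assumption of this example rather than documented spaCy behaviour.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score close to 1, orthogonal ones score 0.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the score depends only on direction, scaling either vector leaves it unchanged.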

Lexeme.has_vector
property
Needs model: this functionality requires an installed spaCy model that supports vectors.

A boolean value indicating whether a word vector is associated with the lexeme.

Name     Type  Description
returns  bool  Whether the lexeme has vector data attached.

Lexeme.vector
property
Needs model: this functionality requires an installed spaCy model that supports vectors.

A real-valued meaning representation.

Name     Type                                    Description
returns  numpy.ndarray[ndim=1, dtype='float32']  A 1D numpy array representing the lexeme's semantics.

Lexeme.vector_norm
property
Needs model: this functionality requires an installed spaCy model that supports vectors.

The L2 norm of the lexeme's vector representation.

Name     Type   Description
returns  float  The L2 norm of the vector representation.
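
For readers unfamiliar with the term, the L2 norm is the square root of the sum of squared components. The sketch below computes it in plain Python on an ordinary list; `l2_norm` is a hypothetical helper for illustration and not a spaCy function.

```python
import math


def l2_norm(vector):
    """Euclidean (L2) length of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in vector))


components = [3.0, 4.0]
norm = l2_norm(components)

# Dividing by the norm rescales the vector to unit length.
unit = [x / norm for x in components]
```

Dividing a non-zero vector by its norm gives a unit vector, which is a common normalisation step before comparing vectors.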

Attributes

Name            Type     Description
vocab           Vocab    The lexeme's vocabulary.
text            unicode  Verbatim text content.
orth            int      ID of the verbatim text content.
orth_           unicode  Verbatim text content (identical to Lexeme.text). Exists mostly for consistency with the other attributes.
lex_id          int      ID of the lexeme's lexical type.
rank            int      Sequential ID of the lexeme's lexical type, used to index into tables, e.g. for word vectors.
flags           int      Container of the lexeme's binary flags.
norm            int      The lexeme's norm, i.e. a normalised form of the lexeme text.
norm_           unicode  The lexeme's norm, i.e. a normalised form of the lexeme text.
lower           int      Lowercase form of the word.
lower_          unicode  Lowercase form of the word.
shape           int      Transform of the word's string, to show orthographic features.
shape_          unicode  Transform of the word's string, to show orthographic features.
prefix          int      Length-N substring from the start of the word. Defaults to N=1.
prefix_         unicode  Length-N substring from the start of the word. Defaults to N=1.
suffix          int      Length-N substring from the end of the word. Defaults to N=3.
suffix_         unicode  Length-N substring from the end of the word. Defaults to N=3.
is_alpha        bool     Does the lexeme consist of alphabetic characters? Equivalent to lexeme.text.isalpha().
is_ascii        bool     Does the lexeme consist of ASCII characters? Equivalent to not any(ord(c) >= 128 for c in lexeme.text).
is_digit        bool     Does the lexeme consist of digits? Equivalent to lexeme.text.isdigit().
is_lower        bool     Is the lexeme in lowercase? Equivalent to lexeme.text.islower().
is_upper        bool     Is the lexeme in uppercase? Equivalent to lexeme.text.isupper().
is_title        bool     Is the lexeme in titlecase? Equivalent to lexeme.text.istitle().
is_punct        bool     Is the lexeme punctuation?
is_left_punct   bool     Is the lexeme a left punctuation mark, e.g. (?
is_right_punct  bool     Is the lexeme a right punctuation mark, e.g. ]?
is_space        bool     Does the lexeme consist of whitespace characters? Equivalent to lexeme.text.isspace().
is_bracket      bool     Is the lexeme a bracket?
is_quote        bool     Is the lexeme a quotation mark?
is_currency     bool     Is the lexeme a currency symbol? New in spaCy v2.0.8.
like_url        bool     Does the lexeme resemble a URL?
like_num        bool     Does the lexeme represent a number? e.g. "10.9", "10", "ten", etc.
like_email      bool     Does the lexeme resemble an email address?
is_oov          bool     Is the lexeme out-of-vocabulary?
is_stop         bool     Is the lexeme part of a "stop list"?
lang            int      Language of the parent vocabulary.
lang_           unicode  Language of the parent vocabulary.
prob            float    Smoothed log probability estimate of the lexeme's type.
cluster         int      Brown cluster ID.
sentiment       float    A scalar value indicating the positivity or negativity of the lexeme.
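
Several of the attributes above are described in terms of built-in Python string operations. The sketch below collects those into one plain-Python function so the definitions can be tried on an ordinary string without spaCy. The function `lexical_features` and its parameters are hypothetical, and `is_ascii` is modelled here as every character having a code point below 128.

```python
def lexical_features(text, prefix_len=1, suffix_len=3):
    """Approximate a few Lexeme attributes using only str methods."""
    return {
        "lower": text.lower(),
        "prefix": text[:prefix_len],    # length-N substring from the start
        "suffix": text[-suffix_len:],   # length-N substring from the end
        "is_alpha": text.isalpha(),
        "is_ascii": all(ord(c) < 128 for c in text),
        "is_digit": text.isdigit(),
        "is_lower": text.islower(),
        "is_upper": text.isupper(),
        "is_title": text.istitle(),
        "is_space": text.isspace(),
    }


features = lexical_features("Apple")
accented = lexical_features("café")
```

For `"Apple"` the prefix is `"A"` and the suffix is `"ple"`; for `"café"` the `is_ascii` entry is false because `é` lies outside the ASCII range.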