Token attributes

Token attributes are specified using internal IDs in many places.

All methods that accept token attributes automatically convert between the string version of an ID ("DEP") and the internal integer symbol (DEP). The internal IDs can be imported from spacy.attrs or retrieved from the StringStore. A map from string attribute names to internal attribute IDs is stored in spacy.attrs.IDS.
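The conversion described above can be sketched as follows. This is a minimal illustration, not an exhaustive one: it assumes spaCy v3 is installed, uses a blank English pipeline so that no trained model is needed, and picks Doc.to_array as one example of a method that accepts attribute names or IDs. The variable names are only for illustration.

```python
import spacy
from spacy.attrs import IDS, LOWER, ORTH

# A blank pipeline only tokenizes, which is enough for lexical attributes.
nlp = spacy.blank("en")
doc = nlp("Hello world")

# IDS maps the string name of an attribute to its internal integer ID.
lower_id = IDS["LOWER"]

# The same export can be requested by string name or by imported integer ID.
by_name = doc.to_array(["ORTH", "LOWER"])
by_id = doc.to_array([ORTH, LOWER])

print(lower_id == LOWER)          # the two ways of obtaining the ID agree
print((by_name == by_id).all())   # both calls yield the same array
```

Each row of the resulting array corresponds to one token and each column to one requested attribute, with string-valued attributes stored as integer hashes.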

The corresponding Token object attributes can be accessed using the same names in lowercase, e.g. token.orth or token.length. For attributes that represent string values, the internal integer ID is accessed as Token.attr, e.g. token.dep, while the string value can be retrieved by appending _ as in token.dep_.
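As a small sketch of the integer/string pairing, the example below uses the LOWER attribute instead of DEP, because a blank pipeline (assumed here to avoid depending on a trained model) leaves dependency labels unset. It assumes spaCy v3; the sample text is arbitrary.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Hello world")
token = doc[0]

# Without the underscore, the attribute is the internal integer ID (a hash).
lower_id = token.lower

# With the trailing underscore, the attribute is the string value.
lower_text = token.lower_

# The StringStore resolves between the two representations in both directions.
resolved_text = nlp.vocab.strings[lower_id]
resolved_id = nlp.vocab.strings[lower_text]

print(lower_text, resolved_text, lower_id == resolved_id)
```

The same pattern should apply to token.dep and token.dep_ once a pipeline with a parser has set dependency labels.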

DEP: The token’s dependency label. (str)
ENT_ID: The token’s entity ID (ent_id). (str)
ENT_IOB: The IOB part of the token’s entity tag. Uses custom integer values rather than the string store: unset is 0, I is 1, O is 2, and B is 3. (str)
ENT_KB_ID: The token’s entity knowledge base ID. (str)
ENT_TYPE: The token’s entity label. (str)
IS_ALPHA: Token text consists of alphabetic characters. (bool)
IS_ASCII: Token text consists of ASCII characters. (bool)
IS_DIGIT: Token text consists of digits. (bool)
IS_LOWER: Token text is in lowercase. (bool)
IS_PUNCT: Token is punctuation. (bool)
IS_SPACE: Token is whitespace. (bool)
IS_STOP: Token is a stop word. (bool)
IS_TITLE: Token text is in titlecase. (bool)
IS_UPPER: Token text is in uppercase. (bool)
LEMMA: The token’s lemma. (str)
LENGTH: The length of the token text. (int)
LIKE_EMAIL: Token text resembles an email address. (bool)
LIKE_NUM: Token text resembles a number. (bool)
LIKE_URL: Token text resembles a URL. (bool)
LOWER: The lowercase form of the token text. (str)
MORPH: The token’s morphological analysis. (MorphAnalysis)
NORM: The normalized form of the token text. (str)
ORTH: The exact verbatim text of a token. (str)
POS: The token’s universal part of speech (UPOS). (str)
SENT_START: Token is start of sentence. (bool)
SHAPE: The token’s shape. (str)
SPACY: Token has a trailing space. (bool)
TAG: The token’s fine-grained part of speech. (str)
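A few of the boolean attributes listed above can be read directly from a Token under their lowercase names. The following is a rough sketch, assuming spaCy v3 and a blank English pipeline; the sample text and the choice of flags are arbitrary.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Hello 42 !")

# Collect a handful of lexical flags for each token:
# IS_ALPHA -> token.is_alpha, LIKE_NUM -> token.like_num, IS_PUNCT -> token.is_punct
flags = [(t.text, t.is_alpha, t.like_num, t.is_punct) for t in doc]

for text, is_alpha, like_num, is_punct in flags:
    print(f"{text!r}: alpha={is_alpha} num={like_num} punct={is_punct}")
```

These flags are computed from the token text alone, so they are available without any trained components. Attributes such as DEP, POS, or TAG only carry values once the corresponding pipeline components have run.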