Cython

Cython Classes

Doc cdef class

The Doc object holds an array of TokenC structs.

Attributes

Name | Description | Type
mem | A memory pool. Allocated memory will be freed once the Doc object is garbage collected. | cymem.Pool
vocab | A reference to the shared Vocab object. | Vocab
c | A pointer to a TokenC struct. | TokenC*
length | The number of tokens in the document. | int
max_length | The underlying size of the Doc.c array. | int

Doc.push_back method

Append a token to the Doc. The token can be provided as a LexemeC or TokenC pointer, using Cython’s fused types.

Name | Description | Type
lex_or_tok | The word to append to the Doc. | LexemeOrToken
has_space | Whether the word has trailing whitespace. | bint
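
The relationship between length, max_length and push_back can be modeled in plain Python. The sketch below is only an illustration under stated assumptions: ToyDoc, its doubling growth policy, and the (word, has_space) tuple standing in for a TokenC struct are all hypothetical and are not spaCy's implementation.

```python
class ToyDoc:
    """Toy model of a token array with spare capacity (not spaCy's Doc)."""

    def __init__(self, initial_capacity=2):
        # `c` stands in for the TokenC array; None marks unused slots.
        self.c = [None] * initial_capacity
        self.length = 0                     # number of tokens stored
        self.max_length = initial_capacity  # allocated size of `c`

    def push_back(self, word, has_space):
        # Grow the backing array when it is full (doubling is an assumption).
        if self.length == self.max_length:
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        self.c[self.length] = (word, has_space)
        self.length += 1
        return self.length

    @property
    def text(self):
        # Rebuild the text from the words and their trailing-whitespace flags.
        used = self.c[: self.length]
        return "".join(word + (" " if space else "") for word, space in used)


doc = ToyDoc()
for word, has_space in [("hello", True), ("world", False), ("!", False)]:
    doc.push_back(word, has_space)
```

In this toy model, length counts only the slots in use, while max_length tracks how many slots have been allocated, so the two can differ after the array grows.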

Token cdef class

A Cython class providing access and methods for a TokenC struct. Note that the Token object does not own the struct. It only receives a pointer to it.

Attributes

Name | Description | Type
vocab | A reference to the shared Vocab object. | Vocab
c | A pointer to a TokenC struct. | TokenC*
i | The offset of the token within the document. | int
doc | The parent document. | Doc

Token.cinit method

Create a Token object from a TokenC* pointer.

Name | Description | Type
vocab | A reference to the shared Vocab. | Vocab
c | A pointer to a TokenC struct. | TokenC*
offset | The offset of the token within the document. | int
doc | The parent document. | Doc
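
The point that a Token only receives a pointer and does not own the struct can be illustrated with a non-owning view in plain Python. This is a rough sketch, not spaCy code: ToyToken and the dicts standing in for TokenC structs are hypothetical names introduced for illustration.

```python
class ToyToken:
    """Non-owning view of one slot in a shared token array (hypothetical)."""

    def __init__(self, array, offset):
        self._array = array  # borrowed reference; the view never copies the data
        self.i = offset      # offset of the token within the document

    @property
    def orth(self):
        # Always read through to the owner's storage.
        return self._array[self.i]["orth"]


# The "document" owns the storage; dicts stand in for TokenC structs.
tokens = [{"orth": "hello"}, {"orth": "world"}]
view_a = ToyToken(tokens, 1)
view_b = ToyToken(tokens, 1)

# A write through the owner is visible through every view.
tokens[1]["orth"] = "there"
```

Because both views read from the same underlying storage, neither holds a private copy. The analogous consequence in Cython is that the pointer is only valid while the owning Doc is alive.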

Span cdef class

A Cython class providing access and methods for a slice of a Doc object.

Attributes

Name | Description | Type
doc | The parent document. | Doc
start | The index of the first token of the span. | int
end | The index of the first token after the span. | int
start_char | The index of the first character of the span. | int
end_char | The index of the first character after the span. | int
label | A label to attach to the span, e.g. for named entities. | attr_t (uint64_t)

Lexeme cdef class

A Cython class providing access and methods for an entry in the vocabulary.

Attributes

Name | Description | Type
c | A pointer to a LexemeC struct. | LexemeC*
vocab | A reference to the shared Vocab object. | Vocab
orth | ID of the verbatim text content. | attr_t (uint64_t)

Vocab cdef class

A Cython class providing access and methods for a vocabulary and other data shared across a language.

Attributes

Name | Description | Type
mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. | cymem.Pool
strings | A StringStore that maps strings to hash values and vice versa. | StringStore
length | The number of entries in the vocabulary. | int

Vocab.get method

Retrieve a LexemeC* pointer from the vocabulary by the word's string.

Name | Description | Type
mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. | cymem.Pool
string | The string of the word to look up. | str

Vocab.get_by_orth method

Retrieve a LexemeC* pointer from the vocabulary by the word's orth ID.

Name | Description | Type
mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. | cymem.Pool
orth | ID of the verbatim text content. | attr_t (uint64_t)
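
The two lookups differ only in the key they take: a string or the ID of that string. A pure-Python model may make the relationship easier to see. Everything here is an assumption made for illustration: ToyVocab, toy_hash (BLAKE2b truncated to 64 bits as a stand-in for spaCy's own hash), the dict used as a lexeme record, and the choice to create an entry on first lookup. The mem argument is omitted because Python manages the memory itself.

```python
import hashlib


def toy_hash(string):
    # Stand-in 64-bit hash; spaCy uses its own hash function.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyVocab:
    """Toy vocabulary keyed by a 64-bit orth ID (not spaCy's Vocab)."""

    def __init__(self):
        self._by_orth = {}  # orth ID -> lexeme record

    def get(self, string):
        # Look up by string; create the entry on first use (an assumption).
        orth = toy_hash(string)
        if orth not in self._by_orth:
            self._by_orth[orth] = {"orth": orth, "length": len(string)}
        return self._by_orth[orth]

    def get_by_orth(self, orth):
        # Look up by ID; the entry must already exist in this toy model.
        return self._by_orth[orth]

    @property
    def length(self):
        return len(self._by_orth)


vocab = ToyVocab()
lexeme = vocab.get("hello")
same_lexeme = vocab.get_by_orth(lexeme["orth"])
```

In this model both methods return the same stored record, which mirrors the idea that callers receive a reference to a shared vocabulary entry and not a copy.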

StringStore cdef class

A lookup table to retrieve strings by 64-bit hashes.

Attributes

Name | Description | Type
mem | A memory pool. Allocated memory will be freed once the StringStore object is garbage collected. | cymem.Pool
keys | A list of hash values in the StringStore. | vector[hash_t] (vector[uint64_t])
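
A lookup table from 64-bit hashes to strings, together with a list of keys, can be sketched in a few lines of plain Python. This is a hypothetical model and not spaCy's StringStore: ToyStringStore, its add method, and the use of BLAKE2b truncated to 8 bytes as the hash are assumptions made for illustration.

```python
import hashlib


class ToyStringStore:
    """Toy two-way lookup between strings and 64-bit hashes."""

    def __init__(self):
        self._strings = {}  # hash -> string
        self.keys = []      # hashes in insertion order

    @staticmethod
    def hash_string(string):
        # Stand-in 64-bit hash; the real hash function differs.
        digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add(self, string):
        # Store the string once and return its hash.
        key = self.hash_string(string)
        if key not in self._strings:
            self._strings[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, item):
        # String in -> hash out; hash in -> string out.
        if isinstance(item, str):
            return self.hash_string(item)
        return self._strings[item]


store = ToyStringStore()
key = store.add("coffee")
store.add("coffee")  # adding the same string again does not duplicate the key
```

The string-to-hash direction needs no storage because the hash is computed from the string, whereas the hash-to-string direction only works for strings that were added earlier.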