Containers

Span

class

A slice from a Doc object.

Span.__init__ method

Create a Span object from the slice doc[start : end].

Name | Type | Description
doc | Doc | The parent document.
start | int | The index of the first token of the span.
end | int | The index of the first token after the span.
label | int / unicode | A label to attach to the span, e.g. for named entities. As of v2.1, the label can also be a unicode string.
vector | numpy.ndarray[ndim=1, dtype='float32'] | A meaning representation of the span.
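
A minimal sketch of constructing a span directly, assuming spaCy v2.1 or later so that a string label is accepted. The words and the "GPE" label are made up for illustration, and the document is built by hand so no statistical model is needed.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

# Build a small document by hand (illustrative words, empty vocab).
doc = Doc(Vocab(), words=["I", "like", "New", "York", "pizza"])

# Tokens 2 and 3; `end` is exclusive, exactly as in doc[2:4].
span = Span(doc, 2, 4, label="GPE")

print(span.text)    # New York
print(span.label_)  # GPE
```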

Span.__getitem__ method

Get a Token object.

Name | Type | Description
i | int | The index of the token within the span.

Get a Span object.

Name | Type | Description
start_end | tuple | The slice of the span to get.
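
The two forms of indexing can be sketched as follows. The example document is an assumption for illustration; indices and slices are relative to the span, not to the parent document.

```python
from spacy.tokens import Doc, Span, Token
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!"])
span = doc[1:4]   # covers "it back !"

token = span[1]   # integer index -> Token
sub = span[1:3]   # slice -> Span, still backed by the same Doc

print(token.text)  # back
print(sub.text)    # back !
```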

Span.__iter__ method

Iterate over Token objects.

Span.__len__ method

Get the number of tokens in the span.
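
Iteration and length behave like those of any Python sequence. A short sketch with a hand-built example document:

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!"])
span = doc[1:3]

# Iterating yields Token objects in document order.
texts = [token.text for token in span]

print(texts)      # ['it', 'back']
print(len(span))  # 2
```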

Span.set_extension classmethod (v2.0)

Define a custom attribute on the Span which becomes available via Span._. For details, see the documentation on custom attributes.

Name | Type | Description
name | unicode | Name of the attribute to set by the extension. For example, 'my_attr' will be available as span._.my_attr.
default | - | Optional default value of the attribute if no getter or method is defined.
method | callable | Set a custom method on the object, for example span._.compare(other_span).
getter | callable | Getter function that takes the object and returns an attribute value. Is called when the user accesses the ._ attribute.
setter | callable | Setter function that takes the Span and a value, and modifies the object. Is called when the user writes to the Span._ attribute.
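
A hedged sketch of the three most common extension styles: a default value, a getter and a method. The extension names is_reviewed, n_chars and starts_with are hypothetical and exist only for this example.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

# Hypothetical extension names, registered once on the Span class.
Span.set_extension("is_reviewed", default=False)
Span.set_extension("n_chars", getter=lambda span: len(span.text))
Span.set_extension("starts_with", method=lambda span, prefix: span.text.startswith(prefix))

doc = Doc(Vocab(), words=["Give", "it", "back", "!"])
span = doc[1:3]  # "it back"

before = span._.is_reviewed       # falls back to the default
span._.is_reviewed = True         # overwrite the default for this span

print(before)                     # False
print(span._.is_reviewed)         # True
print(span._.n_chars)             # 7
print(span._.starts_with("it"))   # True
```

A getter is recomputed on every access, so it suits values derived from the span itself; a default suits values that are assigned later.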

Span.get_extension classmethod (v2.0)

Look up a previously registered extension by name. Returns a 4-tuple (default, method, getter, setter) if the extension is registered. Raises a KeyError otherwise.

Name | Type | Description
name | unicode | Name of the extension.

Span.has_extension classmethod (v2.0)

Check whether an extension has been registered on the Span class.

Name | Type | Description
name | unicode | Name of the extension to check.

Span.remove_extension classmethod (v2.0.12)

Remove a previously registered extension.

Name | Type | Description
name | unicode | Name of the extension.
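
The register, look up, check and remove cycle can be sketched as below, assuming spaCy v2.0.12 or later for remove_extension. The extension name is_quote is hypothetical.

```python
from spacy.tokens import Span

# Register a hypothetical extension with a default value.
Span.set_extension("is_quote", default=False)
registered = Span.has_extension("is_quote")

# Look up the registered entry: (default, method, getter, setter).
default, method, getter, setter = Span.get_extension("is_quote")

# Remove it again so the name can be reused.
Span.remove_extension("is_quote")
still_registered = Span.has_extension("is_quote")

print(registered)        # True
print(default)           # False
print(still_registered)  # False
```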

Span.similarity method (needs model)

Make a semantic similarity estimate. The default estimate is cosine similarity using an average of word vectors.

Name | Type | Description
other | - | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects.
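
To make the default estimate concrete, here is a NumPy-only sketch of cosine similarity over averaged word vectors. It is not spaCy's implementation: the helper names and the two-dimensional vectors are invented for illustration, and returning 0.0 for a zero vector is an assumption of this sketch.

```python
import numpy as np

def average_vector(token_vectors):
    # Mean of the per-token vectors, one row per token.
    return np.mean(np.asarray(token_vectors, dtype="float32"), axis=0)

def cosine_similarity(a, b):
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an all-zero vector as having no similarity.
        return 0.0
    return float(np.dot(a, b) / (norm_a * norm_b))

# Made-up token vectors for two spans.
span_a = average_vector([[1.0, 0.0], [0.0, 1.0]])  # averages to [0.5, 0.5]
span_b = average_vector([[2.0, 2.0]])              # single token

score = cosine_similarity(span_a, span_b)
orthogonal = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Because cosine similarity ignores vector length, span_a and span_b score close to 1.0 even though their magnitudes differ, while the orthogonal pair scores 0.0.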

Span.get_lca_matrix method

Calculates the lowest common ancestor (LCA) matrix for the span. Returns a matrix containing, for each pair of tokens, the integer index of their lowest common ancestor, or -1 if no common ancestor is found, e.g. if the span excludes a necessary ancestor.
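
The idea can be illustrated with a pure-Python sketch over a list of head indices. This is not spaCy's implementation: the lca_matrix helper and its input convention (span-relative head indices, the root pointing at itself, None for a head outside the span) are assumptions made for this example.

```python
def lca_matrix(heads):
    """Return the lowest-common-ancestor matrix for a span.

    heads[i] is the span-relative index of token i's head, i itself for
    the sentence root, or None if the head lies outside the span.
    """
    n = len(heads)

    def path_to_top(i):
        # Token i followed by its ancestors, for as long as they stay in the span.
        path = [i]
        while heads[path[-1]] is not None and heads[path[-1]] != path[-1]:
            path.append(heads[path[-1]])
        return path

    matrix = [[-1] * n for _ in range(n)]
    for i in range(n):
        on_path_i = set(path_to_top(i))
        for j in range(n):
            # The first node on j's upward path that is also on i's path is the LCA.
            for node in path_to_top(j):
                if node in on_path_i:
                    matrix[i][j] = node
                    break
    return matrix

# "the quick fox jumped": the -> fox, quick -> fox, fox -> jumped (root).
full = lca_matrix([2, 2, 3, 3])
# Span "the quick": the shared head ("fox") falls outside the span.
partial = lca_matrix([None, None])

print(full)     # [[0, 2, 2, 3], [2, 1, 2, 3], [2, 2, 2, 3], [3, 3, 3, 3]]
print(partial)  # [[0, -1], [-1, 1]]
```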

Span.to_array method (v2.0)

Given a list of M attribute IDs, export the tokens to a numpy ndarray of shape (N, M), where N is the number of tokens in the span. The values will be 32-bit integers.

Name | Type | Description
attr_ids | list | A list of attribute ID ints.
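
A short sketch of exporting two attributes, using the ORTH and LENGTH IDs from spacy.attrs. The example document is made up; each row corresponds to one token of the span and each column to one requested attribute.

```python
from spacy.attrs import LENGTH, ORTH
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!"])
span = doc[1:3]  # "it back"

# One row per token in the span, one column per attribute ID.
arr = span.to_array([ORTH, LENGTH])

print(arr.shape)                      # (2, 2)
print([int(v) for v in arr[:, 1]])    # [2, 4]
```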

Span.merge method

Retokenize the document, such that the span is merged into a single token.

Name | Type | Description
**attributes | - | Attributes to assign to the merged token. By default, attributes are inherited from the syntactic root token of the span.

Span.ents property (v2.0.12, needs model)

The named entities in the span. Returns a tuple of named entity Span objects, if the entity recognizer has been applied.

Span.as_doc method

Create a new Doc object corresponding to the Span, with a copy of the data.
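
A minimal sketch with a hand-built example document. The new Doc holds its own copy of the span's tokens, so its token indices start again at 0.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!"])
span = doc[1:3]

# Copy the span's tokens into a standalone document.
new_doc = span.as_doc()

print([token.text for token in new_doc])  # ['it', 'back']
print(new_doc[0].i)                       # 0
```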

Span.root property (needs model)

The token with the shortest path to the root of the sentence (or the root itself). If multiple tokens are equally high in the tree, the first token is taken.

Span.conjuncts property (needs model)

A tuple of tokens coordinated to span.root.

Span.lefts property (needs model)

Tokens that are to the left of the span, whose heads are within the span.

Span.rights property (needs model)

Tokens that are to the right of the span, whose heads are within the span.

Span.n_lefts property (needs model)

The number of tokens that are to the left of the span, whose heads are within the span.

Span.n_rights property (needs model)

The number of tokens that are to the right of the span, whose heads are within the span.

Span.subtree property (needs model)

Tokens within the span and tokens which descend from them.

Span.has_vector property (needs model)

A boolean value indicating whether a word vector is associated with the object.

Span.vector property (needs model)

A real-valued meaning representation. Defaults to an average of the token vectors.

Span.vector_norm property (needs model)

The L2 norm of the span’s vector representation.
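
A hedged sketch of how the norm relates to the vector. To avoid loading a model, it passes a made-up vector through the vector argument of Span.__init__; it assumes that a vector supplied this way is returned unchanged by span.vector and that the norm is computed from it.

```python
import numpy as np
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!"])

# Illustrative 2-dimensional vector; real model vectors are much wider.
custom = np.array([3.0, 4.0], dtype="float32")
span = Span(doc, 1, 3, vector=custom)

# The L2 norm is the square root of the sum of squared components.
expected = float(np.sqrt((custom * custom).sum()))

print(expected)                 # 5.0
print(float(span.vector_norm))
```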

Attributes

Name | Type | Description
doc | Doc | The parent document.
sent | Span | The sentence span that this span is a part of.
start | int | The token offset for the start of the span.
end | int | The token offset for the end of the span.
start_char | int | The character offset for the start of the span.
end_char | int | The character offset for the end of the span.
text | unicode | A unicode representation of the span text.
text_with_ws | unicode | The text content of the span with a trailing whitespace character if the last token has one.
orth | int | ID of the verbatim text content.
orth_ | unicode | Verbatim text content (identical to Span.text). Exists mostly for consistency with the other attributes.
label | int | The hash value of the span’s label.
label_ | unicode | The span’s label as a string.
lemma_ | unicode | The span’s lemma.
ent_id | int | The hash value of the named entity the token is an instance of.
ent_id_ | unicode | The string ID of the named entity the token is an instance of.
sentiment | float | A scalar value indicating the positivity or negativity of the span.
_ | Underscore | User space for adding custom attribute extensions.
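
The offset and text attributes can be sketched on a hand-built document. The words and the spaces flags are assumptions chosen so that the span's last token is followed by whitespace.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# spaces[i] says whether token i is followed by a space.
doc = Doc(
    Vocab(),
    words=["Give", "it", "back", "!"],
    spaces=[True, True, False, False],
)
span = doc[0:2]

print(doc.text)                         # Give it back!
print(span.start, span.end)             # 0 2
print(span.start_char, span.end_char)   # 0 7
print(repr(span.text))                  # 'Give it'
print(repr(span.text_with_ws))          # 'Give it '
```

The token offsets index into the parent Doc, while the character offsets index into doc.text, so doc.text[span.start_char:span.end_char] recovers span.text.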