Span class

A slice from a Doc object.

Span.__init__ method

Create a Span object from the slice doc[start : end].

| Name | Description | Type |
| --- | --- | --- |
| doc | The parent document. | Doc |
| start | The index of the first token of the span. | int |
| end | The index of the first token after the span. | int |
| label | A label to attach to the span, e.g. for named entities. | Union[str, int] |
| vector | A meaning representation of the span. | numpy.ndarray[ndim=1, dtype=float32] |
| vector_norm | The L2 norm of the span’s vector representation. | float |
| kb_id | A knowledge base ID to attach to the span, e.g. for named entities. | Union[str, int] |
| span_id | An ID to associate with the span. | Union[str, int] |
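
A minimal sketch of the constructor, assuming spaCy v3 is installed. The word list and the label "EXAMPLE" are invented for illustration, and the Doc is built by hand so no trained pipeline is needed.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

# Illustrative document; the words are made up for this example.
doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])

# Covers the same tokens as the slice doc[1:4], with a label attached.
span = Span(doc, 1, 4, label="EXAMPLE")
span_words = [token.text for token in span]
print(span_words, span.label_)
```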

Span.__getitem__ method

Get a Token object.

| Name | Description | Type |
| --- | --- | --- |
| i | The index of the token within the span. | int |

Get a Span object.

| Name | Description | Type |
| --- | --- | --- |
| start_end | The slice of the span to get. | Tuple[int, int] |
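
The two forms of indexing can be sketched as follows, assuming a hand-built Doc with made-up words. Indices are relative to the span, not to the parent document.

```python
from spacy.tokens import Doc, Span, Token
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])
span = doc[1:4]  # tokens "it", "back", "!"

token = span[1]       # integer index returns a Token
sub_span = span[1:3]  # slice returns a Span

print(token.text, [t.text for t in sub_span])
```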

Span.__iter__ method

Iterate over Token objects.

Span.__len__ method

Get the number of tokens in the span.
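
A short sketch of iteration and length, again assuming a hand-built Doc with illustrative words:

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Give", "it", "back", "!", "He", "pleaded", "."])
span = doc[1:4]

# Iterating a span yields its Token objects in order.
texts = [token.text for token in span]
n_tokens = len(span)
print(texts, n_tokens)
```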

Span.set_extension classmethod

Define a custom attribute on the Span which becomes available via Span._. For details, see the documentation on custom attributes.

| Name | Description | Type |
| --- | --- | --- |
| name | Name of the attribute to set by the extension. For example, "my_attr" will be available as span._.my_attr. | str |
| default | Optional default value of the attribute if no getter or method is defined. | Optional[Any] |
| method | Set a custom method on the object, for example span._.compare(other_span). | Optional[Callable[[Span, ...], Any]] |
| getter | Getter function that takes the object and returns an attribute value. Is called when the user accesses the ._ attribute. | Optional[Callable[[Span], Any]] |
| setter | Setter function that takes the Span and a value, and modifies the object. Is called when the user writes to the Span._ attribute. | Optional[Callable[[Span, Any], None]] |
| force | Force overwriting existing attribute. | bool |
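
The three kinds of extension (default value, getter, method) might be registered as in the sketch below. The extension names is_reviewed, n_chars and starts_with are hypothetical, and force=True is used only so the snippet can be re-run without a "already exists" error.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

# Hypothetical extension names, chosen for this example only.
Span.set_extension("is_reviewed", default=False, force=True)
Span.set_extension("n_chars", getter=lambda span: len(span.text), force=True)
Span.set_extension(
    "starts_with",
    method=lambda span, prefix: span.text.startswith(prefix),
    force=True,
)

doc = Doc(Vocab(), words=["New", "York", "is", "big"])
span = doc[0:2]  # "New York"

span._.is_reviewed = True  # overrides the default for this span only
print(span._.is_reviewed, span._.n_chars, span._.starts_with("New"))
```

A default-valued attribute can be written to, whereas a getter-only attribute is computed on each access.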

Span.get_extension classmethod

Look up a previously registered extension by name. Returns a 4-tuple (default, method, getter, setter) if the extension is registered. Raises a KeyError otherwise.

| Name | Description | Type |
| --- | --- | --- |
| name | Name of the extension. | str |

Span.has_extension classmethod

Check whether an extension has been registered on the Span class.

| Name | Description | Type |
| --- | --- | --- |
| name | Name of the extension to check. | str |

Span.remove_extension classmethod

Remove a previously registered extension.

| Name | Description | Type |
| --- | --- | --- |
| name | Name of the extension. | str |
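
The lookup, check and removal classmethods can be exercised together, as in this sketch. The extension name demo_flag is hypothetical.

```python
from spacy.tokens import Span

# Register a throwaway extension (hypothetical name) to inspect and remove.
Span.set_extension("demo_flag", default=False, force=True)

registered = Span.has_extension("demo_flag")
extension = Span.get_extension("demo_flag")  # (default, method, getter, setter)
removed = Span.remove_extension("demo_flag")
still_registered = Span.has_extension("demo_flag")

print(registered, extension, still_registered)
```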

Span.char_span method

Create a Span object from the slice span.text[start:end]. Returns None if the character indices don’t map to a valid span.

| Name | Description | Type |
| --- | --- | --- |
| start | The index of the first character of the span. | int |
| end | The index of the first character after the span. | int |
| label | A label to attach to the span, e.g. for named entities. | Union[int, str] |
| kb_id | An ID from a knowledge base to capture the meaning of a named entity. | Union[int, str] |
| vector | A meaning representation of the span. | numpy.ndarray[ndim=1, dtype=float32] |
| id | Unused. | Union[int, str] |
| alignment_mode (v3.5.1) | How character indices snap to token boundaries. Options: "strict" (no snapping), "contract" (span of all tokens completely within the character span), "expand" (span of all tokens at least partially covered by the character span). Defaults to "strict". | str |
| span_id (v3.5.1) | An identifier to associate with the span. | Union[int, str] |
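
A small sketch of char_span, assuming a hand-built Doc in which every token is followed by a space. The character offsets are relative to span.text, and the label "GPE" is only illustrative.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["I", "like", "New", "York"])
span = doc[1:4]  # span.text is "like New York"

# Characters 5..13 of span.text cover "New York" exactly.
city = span.char_span(5, 13, label="GPE")

# Characters 5..10 end in the middle of "York", so with the default
# strict alignment no valid span exists.
misaligned = span.char_span(5, 10)

print(city.text, city.label_, misaligned)
```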

Span.similarity method (needs model)

Make a semantic similarity estimate. The default estimate is cosine similarity using an average of word vectors.

| Name | Description | Type |
| --- | --- | --- |
| other | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects. | Union[Doc, Span, Token, Lexeme] |
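
To illustrate the default cosine estimate without a trained pipeline, the sketch below assigns toy 2-dimensional vectors through Vocab.set_vector. The words and vector values are invented for this example; a real pipeline would supply its own vectors.

```python
import numpy
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()
# Toy vectors, made up for illustration.
vocab.set_vector("apples", numpy.asarray([1.0, 0.0], dtype="float32"))
vocab.set_vector("oranges", numpy.asarray([1.0, 1.0], dtype="float32"))
vocab.set_vector("cars", numpy.asarray([0.0, 1.0], dtype="float32"))

doc = Doc(vocab, words=["apples", "oranges", "cars"])
apples, oranges, cars = doc[0:1], doc[1:2], doc[2:3]

# Cosine of the angle between the (averaged) span vectors.
fruit_similarity = apples.similarity(oranges)
unrelated_similarity = apples.similarity(cars)
print(fruit_similarity, unrelated_similarity)
```

With these vectors, the first pair is at 45 degrees (cosine about 0.707) and the second pair is orthogonal (cosine 0).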

Span.get_lca_matrix method

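Calculates the lowest common ancestor (LCA) matrix for a given Span. Returns a matrix in which each entry holds the integer index, relative to the span, of the lowest common ancestor of the corresponding token pair, or -1 if no common ancestor is found within the span, e.g. if the span excludes a necessary ancestor.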
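
A sketch of the LCA matrix on a hand-annotated Doc. The heads and dependency labels below are written by hand as a stand-in for a trained parser; in spaCy v3, heads passed to Doc are absolute token indices and the root points to itself.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["I", "like", "New", "York", "in", "Autumn"]
# Hand-written parse (an assumption for this example).
heads = [1, 1, 3, 1, 3, 4]
deps = ["nsubj", "ROOT", "compound", "dobj", "prep", "pobj"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

span = doc[1:4]  # "like New York"
lca_matrix = span.get_lca_matrix()
print(lca_matrix)
```

Here "like" (index 0 in the span) governs both other tokens, and "York" (index 2) is the ancestor of "New" (index 1), so the matrix is [[0, 0, 0], [0, 1, 2], [0, 2, 2]].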

Span.to_array method

Given a list of M attribute IDs, export the tokens to a numpy ndarray of shape (N, M), where N is the length of the span. The values will be 64-bit unsigned integers.

| Name | Description | Type |
| --- | --- | --- |
| attr_ids | A list of attributes (int IDs or string names) or a single attribute (int ID or string name). | Union[int, str, List[Union[int, str]]] |
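
A minimal sketch of the export, using the ORTH and LENGTH attribute IDs from spacy.attrs on a hand-built Doc with illustrative words:

```python
from spacy.attrs import LENGTH, ORTH
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["I", "like", "New", "York", "!"])
span = doc[1:4]  # "like New York"

# One row per token in the span, one column per requested attribute.
array = span.to_array([ORTH, LENGTH])
print(array.shape)
```

The first column holds the hash of each token's text and the second its length in characters.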

Span.ents property (needs model)

The named entities that fall completely within the span. Returns a tuple of Span objects.
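
The "completely within" condition can be seen in the sketch below. The entities are set by hand on doc.ents instead of being predicted by a trained pipeline, and the words and labels are illustrative.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

words = ["Mr", "Best", "flew", "to", "New", "York", "on", "Saturday"]
doc = Doc(Vocab(), words=words)
# Hand-assigned entities standing in for a named entity recognizer.
doc.ents = [Span(doc, 1, 2, label="PERSON"), Span(doc, 4, 6, label="GPE")]

covers_both = [ent.text for ent in doc[0:6].ents]
cuts_new_york = [ent.text for ent in doc[0:5].ents]
print(covers_both, cuts_new_york)
```

The second span stops after "New", so the "New York" entity is only partially covered and is left out.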

Span.noun_chunks property (needs model)

Iterate over the base noun phrases in the span. Yields base noun-phrase Span objects, if the document has been syntactically parsed. A base noun phrase, or “NP chunk”, is a noun phrase that does not permit other NPs to be nested within it – so no NP-level coordination, no prepositional phrases, and no relative clauses.

If the noun_chunk syntax iterator has not been implemented for the given language, a NotImplementedError is raised.
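
A sketch of noun chunk iteration, assuming English. The vocab comes from spacy.blank("en") so that the English noun_chunks syntax iterator is available, and the parse and part-of-speech tags are written by hand in place of a trained pipeline.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # provides the English syntax iterator, no model download

words = ["A", "phrase", "with", "another", "phrase", "occurs", "."]
# Hand-written annotations (an assumption for this example).
heads = [1, 5, 1, 4, 2, 5, 5]
deps = ["det", "nsubj", "prep", "det", "pobj", "ROOT", "punct"]
pos = ["DET", "NOUN", "ADP", "DET", "NOUN", "VERB", "PUNCT"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps, pos=pos)

span = doc[3:5]  # "another phrase"
chunk_texts = [chunk.text for chunk in span.noun_chunks]
print(chunk_texts)
```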

Span.as_doc method

Create a new Doc object corresponding to the Span, with a copy of the data.

When calling this on many spans from the same doc, passing in a precomputed array representation of the doc using the array_head and array args can save time.

| Name | Description | Type |
| --- | --- | --- |
| copy_user_data | Whether or not to copy the original doc’s user data. | bool |
| array_head | Precomputed array attributes (headers) of the original doc, as generated by Doc._get_array_attrs(). | Tuple |
| array | Precomputed array version of the original doc as generated by Doc.to_array. | numpy.ndarray |
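
A minimal sketch of as_doc with its default arguments, on a hand-built Doc with illustrative words:

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["I", "like", "New", "York", "in", "Autumn", "."])
span = doc[2:4]  # "New York"

# The result is an independent Doc holding a copy of the span's tokens.
new_doc = span.as_doc()
new_words = [token.text for token in new_doc]
print(type(new_doc).__name__, new_words)
```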

Span.root property (needs model)

The token with the shortest path to the root of the sentence (or the root itself). If multiple tokens are equally high in the tree, the first token is taken.
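
The root of a span can be sketched on a hand-annotated Doc. The heads (absolute token indices, as in spaCy v3) and dependency labels are written by hand in place of a trained parser.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["I", "like", "New", "York", "in", "Autumn", "."]
# Hand-written parse (an assumption for this example).
heads = [1, 1, 3, 1, 3, 4, 1]
deps = ["nsubj", "ROOT", "compound", "dobj", "prep", "pobj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

new_york = doc[2:4]
root = new_york.root
print(root.text, root.head.text)
```

"New" attaches to "York" inside the span, while "York" attaches to "like" outside it, so "York" is the span's root.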

Span.conjuncts property (needs model)

A tuple of tokens coordinated to span.root.
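
A sketch of conjuncts on a hand-annotated coordination. The parse below is an assumption written for this example, not the output of a trained parser.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["I", "like", "apples", "and", "oranges"]
# Hand-written parse: "oranges" is coordinated to "apples" via "conj".
heads = [1, 1, 1, 2, 2]
deps = ["nsubj", "ROOT", "dobj", "cc", "conj"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

apples = doc[2:3]
conjunct_texts = [token.text for token in apples.conjuncts]
print(conjunct_texts)
```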

Span.lefts property (needs model)

Tokens that are to the left of the span, whose heads are within the span.

Span.rights property (needs model)

Tokens that are to the right of the span, whose heads are within the span.

Span.n_lefts property (needs model)

The number of tokens that are to the left of the span, whose heads are within the span.

Span.n_rights property (needs model)

The number of tokens that are to the right of the span, whose heads are within the span.
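
The left and right children of a span, and their counts, can be sketched on a hand-annotated Doc. The parse is written by hand as a stand-in for a trained parser.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["I", "like", "New", "York", "in", "Autumn", "."]
# Hand-written parse (an assumption for this example).
heads = [1, 1, 3, 1, 3, 4, 1]
deps = ["nsubj", "ROOT", "compound", "dobj", "prep", "pobj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# "New" sits left of doc[3:7] and its head "York" is inside that span.
left_texts = [token.text for token in doc[3:7].lefts]
n_lefts = doc[3:7].n_lefts

# "in" sits right of doc[2:4] and its head "York" is inside that span.
right_texts = [token.text for token in doc[2:4].rights]
n_rights = doc[2:4].n_rights

print(left_texts, n_lefts, right_texts, n_rights)
```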

Span.subtree property (needs model)

Tokens within the span and tokens which descend from them.
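
A sketch of the subtree of a span, using the same kind of hand-written parse in place of a trained parser:

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["I", "like", "New", "York", "in", "Autumn", "."]
# Hand-written parse (an assumption for this example).
heads = [1, 1, 3, 1, 3, 4, 1]
deps = ["nsubj", "ROOT", "compound", "dobj", "prep", "pobj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

new_york = doc[2:4]
subtree_texts = [token.text for token in new_york.subtree]
print(subtree_texts)
```

"in" depends on "York" and "Autumn" depends on "in", so both descend from the span and are included after its own tokens.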

Span.has_vector property (needs model)

A boolean value indicating whether a word vector is associated with the object.

Span.vector property (needs model)

A real-valued meaning representation. Defaults to an average of the token vectors.

Span.vector_norm property (needs model)

The L2 norm of the span’s vector representation.
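
The three vector properties can be sketched with toy vectors assigned through Vocab.set_vector. The words and values are invented for this example; a trained pipeline would provide real vectors.

```python
import numpy
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()
# Toy 2-dimensional vectors, made up for illustration.
vocab.set_vector("New", numpy.asarray([1.0, 3.0], dtype="float32"))
vocab.set_vector("York", numpy.asarray([3.0, 1.0], dtype="float32"))

doc = Doc(vocab, words=["New", "York", "rocks"])
span = doc[0:2]

has_vector = span.has_vector           # at least one token has a vector
span_vector = span.vector              # average of the token vectors
span_norm = span.vector_norm           # L2 norm of that average
no_vector = doc[2:3].has_vector        # "rocks" has no vector entry

print(has_vector, span_vector, span_norm, no_vector)
```

The average of [1, 3] and [3, 1] is [2, 2], whose L2 norm is the square root of 8.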

Span.sent property (needs model)

The sentence span that this span is a part of. This property is only available when sentence boundaries have been set on the document by the parser, senter, sentencizer or some custom function. It will raise an error otherwise.

If the span happens to cross sentence boundaries, only the first sentence will be returned. If the sentence must always include the full span, the result can be extended to the span’s end, for example doc[sent.start : max(sent.end, span.end)].

Span.sents property (v3.2.1, needs model)

Returns a generator over the sentences the span belongs to. This property is only available when sentence boundaries have been set on the document by the parser, senter, sentencizer or some custom function. It will raise an error otherwise.

If the span happens to cross sentence boundaries, all sentences the span overlaps with will be returned.
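
A sketch of sent and sents, assuming spaCy v3.2.1 or later. Sentence boundaries are set by hand through Token.is_sent_start, which plays the role of the "custom function" mentioned above; the text is illustrative.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["Give", "it", "back", "!", "He", "pleaded", "."]
spaces = [True, True, False, True, True, False, False]
doc = Doc(Vocab(), words=words, spaces=spaces)

# Hand-set boundaries: sentences start at tokens 0 and 4.
for i, token in enumerate(doc):
    token.is_sent_start = i in (0, 4)

inside = doc[1:3]  # "it back", within the first sentence
sentence = inside.sent

crossing = doc[2:6]  # "back! He pleaded", spans both sentences
first = crossing.sent
# Extend the sentence so that it always covers the full span.
extended = doc[first.start : max(first.end, crossing.end)]

overlapping = list(crossing.sents)
print(sentence.text, len(overlapping))
```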

Attributes

| Name | Description | Type |
| --- | --- | --- |
| doc | The parent document. | Doc |
| tensor | The span’s slice of the parent Doc’s tensor. | numpy.ndarray |
| start | The token offset for the start of the span. | int |
| end | The token offset for the end of the span. | int |
| start_char | The character offset for the start of the span. | int |
| end_char | The character offset for the end of the span. | int |
| text | A string representation of the span text. | str |
| text_with_ws | The text content of the span with a trailing whitespace character if the last token has one. | str |
| orth | ID of the verbatim text content. | int |
| orth_ | Verbatim text content (identical to Span.text). Exists mostly for consistency with the other attributes. | str |
| label | The hash value of the span’s label. | int |
| label_ | The span’s label. | str |
| lemma_ | The span’s lemma. Equivalent to "".join(token.lemma_ + token.whitespace_ for token in span).strip(). | str |
| kb_id | The hash value of the knowledge base ID referred to by the span. | int |
| kb_id_ | The knowledge base ID referred to by the span. | str |
| ent_id | The hash value of the named entity the root token is an instance of. | int |
| ent_id_ | The string ID of the named entity the root token is an instance of. | str |
| id | The hash value of the span’s ID. | int |
| id_ | The span’s ID. | str |
| sentiment | A scalar value indicating the positivity or negativity of the span. | float |
| _ | User space for adding custom attribute extensions. | Underscore |
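
A few of these attributes can be inspected on a hand-built Doc, as in the sketch below. The text and the label "GPE" are illustrative, and explicit spaces are passed so the character offsets are easy to follow.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

words = ["I", "like", "New", "York", "in", "Autumn", "."]
spaces = [True, True, True, True, True, False, False]
doc = Doc(Vocab(), words=words, spaces=spaces)  # "I like New York in Autumn."

span = Span(doc, 2, 4, label="GPE")

print(span.start, span.end)            # token offsets
print(span.start_char, span.end_char)  # character offsets
print(repr(span.text), repr(span.text_with_ws))
print(span.label_, span.label == doc.vocab.strings["GPE"])
```

String attributes such as label_ have an integer counterpart (label) holding the hash of the same string in the vocab's string store.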