Span

A slice from a Doc object.

Attributes

Name | Type | Description
doc | Doc | The parent document.
start | int | The token offset for the start of the span.
end | int | The token offset for the end of the span.
start_char | int | The character offset for the start of the span.
end_char | int | The character offset for the end of the span.
label | int | The span's label, as an integer ID.
label_ | unicode | The span's label, as a string.
lemma_ | unicode | The span's lemma.
ent_id | int | The integer ID of the named entity the span is an instance of.
ent_id_ | unicode | The string ID of the named entity the span is an instance of.

Span.__init__

Create a Span object from the slice doc[start : end].

Name | Type | Description
doc | Doc | The parent document.
start | int | The index of the first token of the span.
end | int | The index of the first token after the span.
label | int | A label to attach to the span, e.g. for named entities.
vector | numpy.ndarray[ndim=1, dtype='float32'] | A meaning representation of the span.
return | Span | The newly constructed object.
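The relationship between the token offsets (start, end) and the character offsets (start_char, end_char) can be sketched in plain Python. This is an illustrative stand-in, not spaCy's implementation: the token list, the helper name span_offsets and the sample text are all assumptions made for the example. It shows that end is exclusive, so the span covers tokens start up to end - 1.

```python
# Toy tokens: (text, character offset of the token's first character).
# Illustrative stand-in only; this is not spaCy's data model.
text = "Give it back! He pleaded."
tokens = [("Give", 0), ("it", 5), ("back", 8), ("!", 12),
          ("He", 14), ("pleaded", 17), (".", 24)]


def span_offsets(tokens, start, end):
    """Return (start_char, end_char) for the half-open token slice [start, end)."""
    _, first_idx = tokens[start]
    last_text, last_idx = tokens[end - 1]
    # The span ends right after the last character of its last token.
    return first_idx, last_idx + len(last_text)


start, end = 1, 3  # tokens "it" and "back"; token 3 ("!") is excluded
start_char, end_char = span_offsets(tokens, start, end)
span_text = text[start_char:end_char]
```

Under these assumptions, slicing the original text with the character offsets recovers the same words that the token slice selects.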

Span.__getitem__

Get a Token object.

Name | Type | Description
i | int | The index of the token within the span.
return | Token | The token at span[i].

Get a Span object.

Name | Type | Description
start_end | tuple | The slice of the span to get.
return | Span | The span at span[start : end].
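The two forms of indexing can be illustrated with a small pure-Python class. ToySpan is a hypothetical stand-in written for this example, not spaCy's Span: an integer key returns a single token, while a slice returns a new span whose offsets are shifted into document coordinates. Negative indices are assumed to follow Python list conventions.

```python
class ToySpan:
    """Illustrative stand-in for a span over a list of token strings."""

    def __init__(self, doc_tokens, start, end):
        self.doc_tokens = doc_tokens
        self.start = start  # index of the first token of the span
        self.end = end      # index of the first token after the span

    def __len__(self):
        return self.end - self.start

    def __iter__(self):
        for i in range(self.start, self.end):
            yield self.doc_tokens[i]

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Resolve the slice relative to the span, then shift it
            # into document indices.
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise ValueError("this sketch only supports contiguous slices")
            stop = max(start, stop)
            return ToySpan(self.doc_tokens, self.start + start, self.start + stop)
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError("span index out of range")
        return self.doc_tokens[self.start + key]


doc_tokens = ["Give", "it", "back", "!", "He", "pleaded", "."]
span = ToySpan(doc_tokens, 1, 4)  # covers "it", "back", "!"
sub = span[1:3]                   # covers "back", "!"
```

Here span[0] is the first token of the span rather than of the document, and sub keeps document-level offsets (start 2, end 4).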

Span.__iter__

Iterate over Token objects.

Name | Type | Description
yield | Token | A Token object.

Span.__len__

Get the number of tokens in the span.

Name | Type | Description
return | int | The number of tokens in the span.

Span.similarity

Make a semantic similarity estimate. The default estimate is cosine similarity using an average of word vectors.

Name | Type | Description
other | - | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects.
return | float | A scalar similarity score. Higher is more similar.
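As a rough illustration of the default estimate, the sketch below averages the word vectors of each span and takes the cosine of the two averages. It is plain Python, not spaCy's code: the helper names and the 2-d vectors are made up for the example, and returning 0.0 for an all-zero vector is an assumption of this sketch.

```python
import math


def average(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector has no similarity
    return dot / (norm_a * norm_b)


# Hypothetical 2-d word vectors, one inner list per token.
span_a = [[1.0, 0.0], [0.0, 1.0]]  # averages to [0.5, 0.5]
span_b = [[2.0, 2.0]]              # averages to [2.0, 2.0]
span_c = [[1.0, -1.0]]             # orthogonal to the average of span_a

sim_ab = cosine(average(span_a), average(span_b))
sim_ac = cosine(average(span_a), average(span_c))
```

Because cosine similarity ignores vector length, span_a and span_b score as nearly identical even though their averaged vectors differ in magnitude, while span_a and span_c score close to zero.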

Span.merge

Retokenize the document, such that the span is merged into a single token.

Name | Type | Description
**attributes | - | Attributes to assign to the merged token. By default, attributes are inherited from the syntactic root token of the span.
return | Token | The newly merged token.

Span.text

A unicode representation of the span text.

Name | Type | Description
return | unicode | The original verbatim text of the span.

Span.text_with_ws

The text content of the span with a trailing whitespace character if the last token has one.

Name | Type | Description
return | unicode | The text content of the span (with trailing whitespace).
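The difference between Span.text and Span.text_with_ws can be sketched with toy tokens that each carry their own trailing whitespace. The token representation and both helper functions are assumptions made for illustration, not spaCy's internals.

```python
# Toy tokens: (text, trailing whitespace). Illustrative only.
tokens = [("Give", " "), ("it", " "), ("back", ""), ("!", " "),
          ("He", " "), ("pleaded", ""), (".", "")]


def span_text_with_ws(tokens, start, end):
    """Join each token with its trailing whitespace, including the last one."""
    return "".join(text + ws for text, ws in tokens[start:end])


def span_text(tokens, start, end):
    """Same as above, but without the last token's trailing whitespace."""
    full = span_text_with_ws(tokens, start, end)
    last_ws = tokens[end - 1][1] if end > start else ""
    return full[: len(full) - len(last_ws)]


first_sentence = span_text(tokens, 0, 4)
first_sentence_ws = span_text_with_ws(tokens, 0, 4)
```

The two values differ only when the last token of the span is followed by whitespace, as "!" is here.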

Span.sent

The sentence span that this span is a part of.

Name | Type | Description
return | Span | The sentence this is part of.

Span.root

The token within the span that's highest in the parse tree. If there's a tie, the earliest is preferred.

Name | Type | Description
return | Token | The root token.
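One way to picture the rule is to measure each token's depth in the parse tree and pick the shallowest token in the span, breaking ties by position. The sketch below is plain Python under stated assumptions: the parse is encoded as a list of head indices, the sentence root points to itself, and the head values are a hand-written example parse rather than parser output. It is not how spaCy computes the root internally.

```python
words = ["I", "like", "New", "York", "in", "Autumn", "."]
# heads[i] is the index of token i's syntactic head; the sentence root
# ("like") points to itself. Hand-written example parse.
heads = [1, 1, 3, 1, 1, 4, 1]


def depth(heads, i):
    """Number of head links between token i and the sentence root."""
    d = 0
    while heads[i] != i:
        i = heads[i]
        d += 1
    return d


def span_root(heads, start, end):
    """Index of the shallowest token in [start, end); ties go to the earliest."""
    return min(range(start, end), key=lambda i: (depth(heads, i), i))


root_new_york = span_root(heads, 2, 4)  # span "New York"
root_york_in = span_root(heads, 3, 5)   # span "York in": both at depth 1
```

In "New York", the token "York" is closer to the sentence root, so it is selected. In "York in", both tokens attach directly to "like", and the earlier one is chosen.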

Span.lefts

Tokens that are to the left of the span, whose head is within the span.

Name | Type | Description
yield | Token | A left-child of a token of the span.

Span.rights

Tokens that are to the right of the span, whose head is within the span.

Name | Type | Description
yield | Token | A right-child of a token of the span.
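The definitions of Span.lefts and Span.rights can be sketched over a list of head indices. As before, this is an illustrative plain-Python model with a hand-written parse and hypothetical helper names, not spaCy's implementation: a token qualifies if it lies outside the span on the given side and its head lies inside the span.

```python
words = ["I", "like", "New", "York", "in", "Autumn", "."]
# heads[i] is the index of token i's syntactic head; the sentence root
# points to itself. Hand-written example parse.
heads = [1, 1, 3, 1, 1, 4, 1]


def span_lefts(heads, start, end):
    """Tokens before the span whose head is inside [start, end)."""
    return [i for i in range(0, start) if start <= heads[i] < end]


def span_rights(heads, start, end):
    """Tokens after the span whose head is inside [start, end)."""
    return [i for i in range(end, len(heads)) if start <= heads[i] < end]


# Span covering only "like" (token 1).
lefts_of_like = [words[i] for i in span_lefts(heads, 1, 2)]
rights_of_like = [words[i] for i in span_rights(heads, 1, 2)]
```

Note that "New" and "Autumn" are not returned for the span "like": they sit to its right, but their heads ("York" and "in") are outside the span.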

Span.subtree

Tokens that descend from tokens in the span, but fall outside it.

Name | Type | Description
yield | Token | A descendant of a token within the span.
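Following the description above literally, the subtree can be modelled as every token outside the span that reaches a span token by following head links. The sketch below is plain Python with a hand-written parse and hypothetical helper names. Whether a given spaCy version also yields the span's own tokens is not covered by this sketch, which excludes them as the description states.

```python
words = ["I", "like", "New", "York", "in", "Autumn", "."]
# heads[i] is the index of token i's syntactic head; the sentence root
# points to itself. Hand-written example parse.
heads = [1, 1, 3, 1, 1, 4, 1]


def is_descendant(heads, i, start, end):
    """True if following head links from token i reaches a token in [start, end)."""
    while heads[i] != i:
        i = heads[i]
        if start <= i < end:
            return True
    return False


def span_subtree(heads, start, end):
    """Tokens outside [start, end) that descend from a token inside it."""
    return [
        i
        for i in range(len(heads))
        if not (start <= i < end) and is_descendant(heads, i, start, end)
    ]


subtree_of_in = [words[i] for i in span_subtree(heads, 4, 5)]      # span "in"
subtree_of_like = [words[i] for i in span_subtree(heads, 1, 2)]    # span "like"
```

Unlike lefts and rights, indirect descendants count here: for the span "like", both "New" (via "York") and "Autumn" (via "in") are included.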