Span
class

A slice from a Doc object.

Span.__init__
method

Create a Span object from the slice doc[start : end].

Name | Type | Description
--- | --- | ---
doc | Doc | The parent document.
start | int | The index of the first token of the span.
end | int | The index of the first token after the span.
label | int | A label to attach to the span, e.g. for named entities.
vector | numpy.ndarray[ndim=1, dtype='float32'] | A meaning representation of the span.
returns | Span | The newly constructed object.
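Because `end` is the index of the first token *after* the span, a span covers the half-open range `[start, end)`, just like a Python slice, and contains `end - start` tokens. The minimal sketch below models that convention with a plain list of strings standing in for a `Doc`. `ToySpan` is a hypothetical class for illustration, not spaCy's implementation. In spaCy itself the same span is created with `doc[1:4]`.

```python
from typing import Iterator, List


class ToySpan:
    """Hypothetical stand-in for Span: a view of doc tokens in [start, end)."""

    def __init__(self, doc: List[str], start: int, end: int) -> None:
        if not 0 <= start <= end <= len(doc):
            raise IndexError("span bounds out of range")
        self.doc = doc
        self.start = start  # index of the first token in the span
        self.end = end      # index of the first token *after* the span

    def __len__(self) -> int:
        return self.end - self.start

    def __iter__(self) -> Iterator[str]:
        for i in range(self.start, self.end):
            yield self.doc[i]


doc = ["Give", "it", "back", "!", "He", "pleaded", "."]
span = ToySpan(doc, 1, 4)
print(list(span))              # ['it', 'back', '!']
print(len(span))               # 3
print(list(span) == doc[1:4])  # True
```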

Span.__getitem__
method

Get a Token object.

Name | Type | Description
--- | --- | ---
i | int | The index of the token within the span.
returns | Token | The token at span[i].

Get a Span object.

Name | Type | Description
--- | --- | ---
start_end | tuple | The slice of the span to get.
returns | Span | The span at span[start : end].
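Indices passed to `Span.__getitem__` are relative to the span, not to the parent document: `span[i]` is the document token at `span.start + i`, and a slice `span[a:b]` corresponds to `doc[span.start + a : span.start + b]`. The sketch below shows that translation with plain integers. `span_token_index` and `span_slice_bounds` are hypothetical helpers, and clamping out-of-range slice bounds the way Python lists do is an assumption of this sketch.

```python
from typing import Tuple


def span_token_index(start: int, end: int, i: int) -> int:
    """Translate a span-relative index (as in span[i]) into a doc index."""
    length = end - start
    if i < 0:
        i += length  # negative indices count back from the end of the span
    if not 0 <= i < length:
        raise IndexError("token index out of range")
    return start + i


def span_slice_bounds(start: int, end: int, a: int, b: int) -> Tuple[int, int]:
    """Translate span[a:b] into (doc_start, doc_end) bounds."""
    # slice.indices() clamps and normalises bounds the way list slicing does
    lo, hi, _ = slice(a, b).indices(end - start)
    return start + lo, start + max(lo, hi)


doc = ["Give", "it", "back", "!", "He", "pleaded", "."]
start, end = 1, 4                               # the span "it back !"
print(doc[span_token_index(start, end, 0)])     # it
print(doc[span_token_index(start, end, -1)])    # !
print(span_slice_bounds(start, end, 1, 3))      # (2, 4)
```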

Span.__iter__
method

Iterate over Token objects.

Name | Type | Description
--- | --- | ---
yields | Token | A Token object.

Span.__len__
method

Get the number of tokens in the span.

Name | Type | Description
--- | --- | ---
returns | int | The number of tokens in the span.

Span.set_extension
classmethod
v2.0: This feature is new and was introduced in spaCy v2.0.

Define a custom attribute on the Span which becomes available via Span._. For details, see the documentation on custom attributes.

Name | Type | Description
--- | --- | ---
name | unicode | Name of the attribute to set by the extension. For example, 'my_attr' will be available as span._.my_attr.
default | - | Optional default value of the attribute if no getter or method is defined.
method | callable | Set a custom method on the object, for example span._.compare(other_span).
getter | callable | Getter function that takes the object and returns an attribute value. It is called when the user accesses the ._ attribute.
setter | callable | Setter function that takes the Span and a value, and modifies the object. It is called when the user writes to the Span._ attribute.
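The resolution rules in the table can be modeled in plain Python: a getter is called on every read, a method is bound to the owning object, a setter intercepts writes, and otherwise the attribute behaves like a stored value that falls back to `default`. The sketch below is a hypothetical toy (`Underscore`, `ToySpan`) written only to illustrate those rules; it is not how spaCy stores extension values internally. With spaCy installed, registration looks like `Span.set_extension("is_city", default=False)`, after which the value is read and written as `span._.is_city`.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# (default, method, getter, setter), in the order get_extension reports them
Extension = Tuple[Any, Optional[Callable], Optional[Callable], Optional[Callable]]


class Underscore:
    """Toy model of the `._` namespace; not spaCy's implementation."""

    def __init__(self, extensions: Dict[str, Extension], obj: Any) -> None:
        # object.__setattr__ bypasses our own __setattr__ during setup
        object.__setattr__(self, "_extensions", extensions)
        object.__setattr__(self, "_obj", obj)
        object.__setattr__(self, "_values", {})

    def __getattr__(self, name: str) -> Any:
        if name not in self._extensions:
            raise AttributeError(name)
        default, method, getter, _ = self._extensions[name]
        if getter is not None:
            return getter(self._obj)  # computed on every access
        if method is not None:
            # bind the owning object as the first argument
            return lambda *args, **kwargs: method(self._obj, *args, **kwargs)
        return self._values.get(name, default)

    def __setattr__(self, name: str, value: Any) -> None:
        if name not in self._extensions:
            raise AttributeError(name)
        _, _, getter, setter = self._extensions[name]
        if setter is not None:
            setter(self._obj, value)
        elif getter is None:
            self._values[name] = value  # plain attribute with a default
        else:
            raise AttributeError(f"'{name}' has a getter but no setter")


class ToySpan:
    """Hypothetical span class that owns a class-level extension table."""

    extensions: Dict[str, Extension] = {}

    def __init__(self, text: str) -> None:
        self.text = text
        self._ = Underscore(ToySpan.extensions, self)

    @classmethod
    def set_extension(cls, name: str, default: Any = None, method=None,
                      getter=None, setter=None) -> None:
        cls.extensions[name] = (default, method, getter, setter)


ToySpan.set_extension("is_city", default=False)
ToySpan.set_extension("shout", getter=lambda span: span.text.upper())
ToySpan.set_extension("same_text", method=lambda span, other: span.text == other.text)

a, b = ToySpan("New York"), ToySpan("New York")
a._.is_city = True
print(a._.is_city, b._.is_city)  # True False
print(a._.shout)                 # NEW YORK
print(a._.same_text(b))          # True
```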

Span.get_extension
classmethod
v2.0: This feature is new and was introduced in spaCy v2.0.

Look up a previously registered extension by name. Returns a 4-tuple (default, method, getter, setter) if the extension is registered. Raises a KeyError otherwise.

Name | Type | Description
--- | --- | ---
name | unicode | Name of the extension.
returns | tuple | A (default, method, getter, setter) tuple of the extension.

Span.has_extension
classmethod
v2.0: This feature is new and was introduced in spaCy v2.0.

Check whether an extension has been registered on the Span class.

Name | Type | Description
--- | --- | ---
name | unicode | Name of the extension to check.
returns | bool | Whether the extension has been registered.

Span.remove_extension
classmethod
v2.0.12: This feature is new and was introduced in spaCy v2.0.12.

Remove a previously registered extension.

Name | Type | Description
--- | --- | ---
name | unicode | Name of the extension.
returns | tuple | A (default, method, getter, setter) tuple of the removed extension.
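`get_extension`, `has_extension` and `remove_extension` all operate on the same name-to-tuple registry that `set_extension` fills. The hypothetical `ExtensionRegistry` below models the contract described in these sections with a plain dictionary: lookups return the `(default, method, getter, setter)` tuple, an unknown name raises `KeyError` on lookup, and removal hands back the tuple it deleted. Raising `KeyError` from `remove_extension` for an unknown name is simply what `dict.pop` does here; it is an assumption of this sketch, not documented spaCy behavior.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Extension = Tuple[Any, Optional[Callable], Optional[Callable], Optional[Callable]]


class ExtensionRegistry:
    """Hypothetical registry mirroring the documented get/has/remove contract."""

    def __init__(self) -> None:
        self._extensions: Dict[str, Extension] = {}

    def set_extension(self, name: str, default: Any = None, method=None,
                      getter=None, setter=None) -> None:
        self._extensions[name] = (default, method, getter, setter)

    def has_extension(self, name: str) -> bool:
        return name in self._extensions

    def get_extension(self, name: str) -> Extension:
        return self._extensions[name]  # KeyError if the name is unknown

    def remove_extension(self, name: str) -> Extension:
        # dict.pop returns the removed tuple (and raises KeyError if missing)
        return self._extensions.pop(name)


registry = ExtensionRegistry()
registry.set_extension("is_city", default=False)
print(registry.has_extension("is_city"))     # True
print(registry.get_extension("is_city"))     # (False, None, None, None)
print(registry.remove_extension("is_city"))  # (False, None, None, None)
print(registry.has_extension("is_city"))     # False
```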

Span.similarity
method
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: vectors.

Make a semantic similarity estimate. The default estimate is cosine similarity using an average of word vectors.

Name | Type | Description
--- | --- | ---
other | - | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects.
returns | float | A scalar similarity score. Higher is more similar.
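The default estimate described above can be written out directly: average each object's word vectors, then take the cosine of the angle between the two averages. The sketch below does this in plain Python with made-up three-dimensional vectors. Returning `0.0` when either average is the zero vector is an assumption of this sketch (the cosine is undefined there); in spaCy the vectors come from the installed model.

```python
import math
from typing import List, Sequence

Vector = List[float]


def average(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length word vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up word vectors for two short spans and a single token
span_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
span_b = [[1.0, 1.0, 0.0]]
token_c = [[0.0, 0.0, 1.0]]

print(round(cosine(average(span_a), average(span_b)), 6))   # 1.0
print(round(cosine(average(span_a), average(token_c)), 6))  # 0.0
```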

Span.get_lca_matrix
method

Calculate the lowest common ancestor (LCA) matrix for the span. Returns an LCA matrix containing, for each pair of tokens, the integer index of their lowest common ancestor, or -1 if no common ancestor is found, e.g. if the span excludes a necessary ancestor.

Name | Type | Description
--- | --- | ---
returns | numpy.ndarray[ndim=2, dtype='int32'] | The lowest common ancestor matrix of the Span.
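One way to compute such a matrix from dependency heads is to walk each token's path to the root and take the first node both paths share. In the sketch below, `heads[i]` is the document index of token `i`'s head, with the root pointing to itself. The head layout, the span-relative indices and the helper names are assumptions for illustration rather than spaCy's internals. The example sentence is "She ate fresh bread ." with "ate" as the root.

```python
from typing import List


def path_to_root(heads: List[int], i: int) -> List[int]:
    """Token i followed by its ancestors; the root is its own head."""
    path = [i]
    while heads[i] != i:
        i = heads[i]
        path.append(i)
    return path


def lca_matrix(heads: List[int], start: int, end: int) -> List[List[int]]:
    """Span-relative LCA index for every token pair, or -1 if outside the span."""
    n = end - start
    matrix = [[-1] * n for _ in range(n)]
    for a in range(n):
        path_a = path_to_root(heads, start + a)
        for b in range(n):
            ancestors_b = set(path_to_root(heads, start + b))
            # first node on a's path that b also passes through (a single
            # tree always shares at least the root)
            lca = next(t for t in path_a if t in ancestors_b)
            if start <= lca < end:
                matrix[a][b] = lca - start
    return matrix


# "She ate fresh bread ." with "ate" (index 1) as the root
heads = [1, 1, 3, 1, 1]
print(lca_matrix(heads, 2, 4))  # [[0, 1], [1, 1]]    span "fresh bread"
print(lca_matrix(heads, 3, 5))  # [[0, -1], [-1, 1]]  LCA "ate" is outside "bread ."
```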

Span.to_array
method
v2.0: This feature is new and was introduced in spaCy v2.0.

Given a list of M attribute IDs, export the tokens to a numpy ndarray of shape (N, M), where N is the length of the span. The values will be 64-bit integers.

Name | Type | Description
--- | --- | ---
attr_ids | list | A list of attribute ID ints.
returns | numpy.ndarray[long, ndim=2] | A feature matrix, with one row per word, and one column per attribute indicated in the input attr_ids.
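Conceptually, `to_array` builds a table with one row per token in the span and one column per requested attribute, in the order the IDs were given. The sketch below models that with plain lists. The integer constants `IS_TITLE` and `LENGTH` and the per-token feature dicts are hypothetical stand-ins for spaCy's attribute IDs (such as those in `spacy.attrs`) and its token data.

```python
from typing import Dict, List

# Hypothetical attribute IDs, standing in for constants from spacy.attrs
IS_TITLE, LENGTH = 1, 2


def features(word: str) -> Dict[int, int]:
    """Integer-valued attributes for one token (toy feature set)."""
    return {IS_TITLE: int(word.istitle()), LENGTH: len(word)}


def span_to_array(words: List[str], start: int, end: int,
                  attr_ids: List[int]) -> List[List[int]]:
    """Return an N x M matrix: N tokens in [start, end), M attributes."""
    return [[features(words[i])[attr] for attr in attr_ids]
            for i in range(start, end)]


words = ["Give", "it", "back", "!"]
print(span_to_array(words, 0, 2, [IS_TITLE, LENGTH]))  # [[1, 4], [0, 2]]
print(span_to_array(words, 1, 3, [LENGTH]))            # [[2], [4]]
```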

Span.merge
method

Retokenize the document, such that the span is merged into a single token.

Name | Type | Description
--- | --- | ---
**attributes | - | Attributes to assign to the merged token. By default, attributes are inherited from the syntactic root token of the span.
returns | Token | The newly merged token.
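At the text level, merging replaces the span's tokens with one token whose text is the span text, keeping the internal whitespace and taking the trailing whitespace of the span's last token. The hypothetical `merge_tokens` helper below models only that text bookkeeping on `(text, trailing_whitespace)` pairs; it does not model attribute inheritance from the root token or the index updates that real retokenization performs.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (text, trailing whitespace)


def merge_tokens(tokens: List[Token], start: int, end: int) -> List[Token]:
    """Replace tokens[start:end] with a single token (toy retokenization)."""
    if end <= start:
        raise ValueError("cannot merge an empty span")
    pieces = tokens[start:end]
    # keep internal whitespace; the trailing whitespace comes from the last token
    text = "".join(t + ws for t, ws in pieces[:-1]) + pieces[-1][0]
    merged = (text, pieces[-1][1])
    return tokens[:start] + [merged] + tokens[end:]


tokens = [("I", " "), ("live", " "), ("in", " "), ("New", " "), ("York", ""), (".", "")]
merged = merge_tokens(tokens, 3, 5)
print([t for t, _ in merged])               # ['I', 'live', 'in', 'New York', '.']
print("".join(t + ws for t, ws in merged))  # I live in New York.
```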

Span.ents
property
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: NER.

Iterate over the entities in the span. Yields named-entity Span objects, if the entity recognizer has been applied to the parent document.

Name | Type | Description
--- | --- | ---
yields | Span | Entities in the span.
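If the parent document's entities are known, the span's entities can be found by keeping those that lie entirely inside the span's token range. The sketch below assumes that containment rule and represents entities as hypothetical `(start, end, label)` tuples using the same half-open token offsets as `Span.start` and `Span.end`; entities that only partly overlap the span are left out.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (start token, end token, label), end exclusive


def ents_in_span(doc_ents: List[Entity], start: int, end: int) -> List[Entity]:
    """Entities whose token range lies fully within [start, end)."""
    return [ent for ent in doc_ents if ent[0] >= start and ent[1] <= end]


# "Ada Lovelace visited New York City in 1842 ."
doc_ents = [(0, 2, "PERSON"), (3, 6, "GPE"), (7, 8, "DATE")]
print(ents_in_span(doc_ents, 2, 8))  # [(3, 6, 'GPE'), (7, 8, 'DATE')]
print(ents_in_span(doc_ents, 1, 5))  # []
```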

Span.as_doc
method

Create a Doc object view of the Span's data. Mostly useful for C-typed interfaces.

Name | Type | Description
--- | --- | ---
returns | Doc | A Doc object of the Span's content.

Span.root
property
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: parse.

The token within the span that's highest in the parse tree. If there's a tie, the earliest is preferred.

Name | Type | Description
--- | --- | ---
returns | Token | The root token.
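"Highest in the parse tree" can be read as "fewest steps from the sentence root". The sketch below picks the span token with the smallest depth and breaks ties by position, using a plain list of head indices in which the root is its own head. It illustrates the rule stated above under that assumed representation; it is not spaCy's own algorithm, and the helper names are hypothetical.

```python
from typing import List


def depth(heads: List[int], i: int) -> int:
    """Number of arcs between token i and the root (the root is its own head)."""
    steps = 0
    while heads[i] != i:
        i = heads[i]
        steps += 1
    return steps


def span_root(heads: List[int], start: int, end: int) -> int:
    """Doc index of the shallowest token in [start, end); earliest wins ties."""
    return min(range(start, end), key=lambda i: (depth(heads, i), i))


# "She ate fresh bread ." with "ate" (index 1) as the root
heads = [1, 1, 3, 1, 1]
print(span_root(heads, 2, 4))  # 3  "bread" heads "fresh"
print(span_root(heads, 3, 5))  # 3  "bread" and "." tie; the earlier one wins
print(span_root(heads, 0, 5))  # 1  "ate"
```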

Span.lefts
property
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: parse.

Tokens that are to the left of the span, whose heads are within the span.

Name | Type | Description
--- | --- | ---
yields | Token | A left-child of a token of the span.

Span.rights
property
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: parse.

Tokens that are to the right of the span, whose heads are within the span.

Name | Type | Description
--- | --- | ---
yields | Token | A right-child of a token of the span.

Span.n_lefts
property
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: parse.

The number of tokens that are to the left of the span, whose heads are within the span.

Name | Type | Description
--- | --- | ---
returns | int | The number of left-child tokens.

Span.n_rights
property
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: parse.

The number of tokens that are to the right of the span, whose heads are within the span.

Name | Type | Description
--- | --- | ---
returns | int | The number of right-child tokens.
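Since a token's left children always precede it, the left children of span tokens that fall outside the span are exactly the tokens before `start` whose head lies inside the span, and symmetrically for the right side from `end` onwards. The sketch below computes `lefts`, `rights` and their counts from a plain list of head indices (the root is its own head). The helper names are hypothetical, and the sketch returns tokens in document order, which may differ from the order spaCy yields them in.

```python
from typing import List


def span_lefts(heads: List[int], start: int, end: int) -> List[int]:
    """Tokens left of the span whose head is inside the span."""
    return [i for i in range(start) if start <= heads[i] < end]


def span_rights(heads: List[int], start: int, end: int) -> List[int]:
    """Tokens right of the span whose head is inside the span."""
    return [i for i in range(end, len(heads)) if start <= heads[i] < end]


# "She ate fresh bread ." with "ate" (index 1) as the root
heads = [1, 1, 3, 1, 1]
lefts, rights = span_lefts(heads, 1, 2), span_rights(heads, 1, 2)  # span "ate"
print(lefts, len(lefts))    # [0] 1       (n_lefts)
print(rights, len(rights))  # [3, 4] 2    (n_rights)
print(span_lefts(heads, 2, 4), span_rights(heads, 2, 4))  # [] []
```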

Span.subtree
property
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: parse.

Tokens within the span and tokens that descend from them.

Name | Type | Description
--- | --- | ---
yields | Token | A token within the span, or a descendant of one.
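The sketch below collects every token whose chain of heads reaches a token in the span, which yields the span's own tokens together with all of their descendants; filtering out indices in `[start, end)` leaves only the descendants that fall outside the span. It assumes a plain list of head indices (the root is its own head), returns tokens in document order, and uses a hypothetical helper name.

```python
from typing import List


def span_subtree(heads: List[int], start: int, end: int) -> List[int]:
    """Doc indices of span tokens and all tokens that descend from them."""

    def reaches_span(i: int) -> bool:
        # climb from token i towards the root, stopping at the first span token
        while True:
            if start <= i < end:
                return True
            if heads[i] == i:  # reached the root without entering the span
                return False
            i = heads[i]

    return [i for i in range(len(heads)) if reaches_span(i)]


# "She ate fresh bread ." with "ate" (index 1) as the root
heads = [1, 1, 3, 1, 1]
print(span_subtree(heads, 3, 4))  # [2, 3]  "fresh bread"
print(span_subtree(heads, 1, 2))  # [0, 1, 2, 3, 4]
outside = [i for i in span_subtree(heads, 3, 4) if not 3 <= i < 4]
print(outside)                    # [2]
```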

Span.has_vector
property
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: vectors.

A boolean value indicating whether a word vector is associated with the object.

Name | Type | Description
--- | --- | ---
returns | bool | Whether the span has vector data attached.

Span.vector
property
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: vectors.

A real-valued meaning representation. Defaults to an average of the token vectors.

Name | Type | Description
--- | --- | ---
returns | numpy.ndarray[ndim=1, dtype='float32'] | A 1D numpy array representing the span's semantics.

Span.vector_norm
property
Needs model: To use this functionality, spaCy needs a model to be installed that supports the following capabilities: vectors.

The L2 norm of the span's vector representation.

Name | Type | Description
--- | --- | ---
returns | float | The L2 norm of the vector representation.
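One way to see how the three vector properties relate, consistent with the descriptions above: `has_vector` reports whether any token has a vector, `vector` averages the token vectors, and `vector_norm` is the Euclidean (L2) length of that average. The sketch below uses a tiny made-up lookup table in place of a model's vectors; treating words missing from the table as zero vectors is an assumption of this sketch, and `span_vector_stats` is a hypothetical helper.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def span_vector_stats(words: List[str], table: Dict[str, Vector],
                      dim: int) -> Tuple[bool, Vector, float]:
    """Return (has_vector, mean vector, L2 norm) for a list of words."""
    # assumption: out-of-table words contribute an all-zero vector
    vectors = [table.get(word, [0.0] * dim) for word in words]
    has_vector = any(word in table for word in words)
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in mean))  # L2 norm of the average
    return has_vector, mean, norm


table = {"apple": [3.0, 0.0], "pie": [1.0, 4.0]}  # made-up 2-d vectors
has_vec, mean, norm = span_vector_stats(["apple", "pie"], table, 2)
print(has_vec, mean, round(norm, 3))               # True [2.0, 2.0] 2.828
print(span_vector_stats(["apple", "xyz"], table, 2))  # (True, [1.5, 0.0], 1.5)
print(span_vector_stats(["xyz"], table, 2))           # (False, [0.0, 0.0], 0.0)
```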

Attributes

Name | Type | Description
--- | --- | ---
doc | Doc | The parent document.
sent | Span | The sentence span that this span is a part of.
start | int | The token offset for the start of the span.
end | int | The token offset for the end of the span (the index of the first token after it).
start_char | int | The character offset for the start of the span.
end_char | int | The character offset for the end of the span.
text | unicode | A unicode representation of the span text.
text_with_ws | unicode | The text content of the span with a trailing whitespace character if the last token has one.
orth | int | ID of the verbatim text content.
orth_ | unicode | Verbatim text content (identical to Span.text). Exists mostly for consistency with the other attributes.
label | int | The span's label, as a hash value.
label_ | unicode | The span's label, as a string.
lemma_ | unicode | The span's lemma.
ent_id | int | The hash value of the named entity the span is an instance of.
ent_id_ | unicode | The string ID of the named entity the span is an instance of.
sentiment | float | A scalar value indicating the positivity or negativity of the span.
_ | Underscore | User space for adding custom attribute extensions.
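The character offsets and text attributes fit together: `start_char` is where the span's first token begins in the document text and `end_char` is where its last token ends (exclusive), so slicing the document text with them gives `text`, and `text_with_ws` adds the last token's trailing whitespace. The sketch below derives these from hypothetical `(text, trailing_whitespace)` token pairs; the helper name is illustrative, not part of spaCy.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (text, trailing whitespace)


def span_text_attrs(tokens: List[Token], start: int, end: int) -> Dict[str, object]:
    """Character offsets and text of the span covering tokens[start:end]."""
    offsets, pos = [], 0
    for text, ws in tokens:
        offsets.append(pos)  # character index where each token starts
        pos += len(text) + len(ws)
    last_text, last_ws = tokens[end - 1]
    text = "".join(t + ws for t, ws in tokens[start:end - 1]) + last_text
    return {
        "start_char": offsets[start],
        "end_char": offsets[end - 1] + len(last_text),
        "text": text,
        "text_with_ws": text + last_ws,
    }


tokens = [("Give", " "), ("it", " "), ("back", ""), ("!", " "),
          ("He", " "), ("pleaded", ""), (".", "")]
doc_text = "".join(t + ws for t, ws in tokens)  # "Give it back! He pleaded."
attrs = span_text_attrs(tokens, 1, 4)
print(attrs)  # {'start_char': 5, 'end_char': 13, 'text': 'it back!', 'text_with_ws': 'it back! '}
print(doc_text[attrs["start_char"]:attrs["end_char"]] == attrs["text"])  # True
```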