Containers

Example

class (new in v3)
A training instance

An Example holds the information for one training instance. It stores two Doc objects: one for holding the gold-standard reference data, and one for holding the predictions of the pipeline. An Alignment object stores the alignment between these two documents, as they can differ in tokenization.

Example.__init__ method

Construct an Example object from the predicted document and the reference document. If alignment is None, it will be initialized from the words in both documents.

Name | Description | Type
predicted | The document containing (partial) predictions. Cannot be None. | Doc
reference | The document containing gold-standard annotations. Cannot be None. | Doc
alignment (keyword-only) | An object holding the alignment between the tokens of the predicted and reference documents. | Optional[Alignment]

Example.from_dict classmethod

Construct an Example object from the predicted document and the reference annotations provided as a dictionary. For more details on the required format, see the training format documentation.

Name | Description | Type
predicted | The document containing (partial) predictions. Cannot be None. | Doc
example_dict | The gold-standard annotations as a dictionary. Cannot be None. | Dict[str, Any]
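As a hedged illustration of what an example_dict can look like, the snippet below uses the "words" and "entities" keys, with entities given as (start_char, end_char, label) character offsets. The sample sentence and labels are made up for illustration; consult the training format documentation for the full set of supported keys. The snippet only checks the data in plain Python and does not import spaCy.

```python
# Hypothetical sample data: "words" lists the tokens and "entities" gives
# (start_char, end_char, label) offsets into the original text.
text = "Apple is looking at buying U.K. startup"
example_dict = {
    "words": ["Apple", "is", "looking", "at", "buying", "U.K.", "startup"],
    "entities": [(0, 5, "ORG"), (27, 31, "GPE")],
}

# Sanity-check the annotations before building an Example from them:
# this text uses single spaces, so joining the words should rebuild it,
# and each offset pair should slice out the intended entity string.
assert " ".join(example_dict["words"]) == text
for start, end, label in example_dict["entities"]:
    print(label, repr(text[start:end]))
# ORG 'Apple'
# GPE 'U.K.'
```

With spaCy installed, a dict like this would typically be passed as Example.from_dict(nlp.make_doc(text), example_dict), where nlp is an already loaded pipeline (an assumption of this example).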

Example.text property

The text of the predicted document in this Example.

Name | Description | Type
RETURNS | The text of the predicted document. | str

Example.predicted property

The Doc holding the predictions. Occasionally also referred to as example.x.

Name | Description | Type
RETURNS | The document containing (partial) predictions. | Doc

Example.reference property

The Doc holding the gold-standard annotations. Occasionally also referred to as example.y.

Name | Description | Type
RETURNS | The document containing gold-standard annotations. | Doc

Example.alignment property

The Alignment object mapping the tokens of the predicted document to those of the reference document.

Name | Description | Type
RETURNS | The alignment between the predicted and reference documents. | Alignment

Example.get_aligned method

Get the values of a token attribute from the reference document, aligned to the tokens of the predicted document. The attribute is denoted by its integer ID or string name.

Name | Description | Type
field | Attribute ID or string name. | Union[int, str]
as_string | Whether or not to return the list of values as strings. Defaults to False. | bool
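To make the idea of an "aligned view" concrete, here is a standalone sketch in plain Python (no spaCy). It assumes the alignment is given as a hypothetical list x2y holding, for each predicted token, the indices of the reference tokens it overlaps, and it keeps a value only when all of those reference tokens agree. The function name and the agreement rule are assumptions of this sketch, not a description of spaCy's internals.

```python
from typing import List, Optional, Sequence


def get_aligned_values(
    x2y: Sequence[Sequence[int]], gold_values: Sequence[str]
) -> List[Optional[str]]:
    """Project reference (gold) values onto predicted tokens.

    Hypothetical helper: for each predicted token, collect the values of the
    reference tokens it aligns to, and keep the value only if they all agree.
    """
    output: List[Optional[str]] = []
    for y_indices in x2y:
        values = {gold_values[j] for j in y_indices}
        # Exactly one distinct value: unambiguous. Zero or several: None.
        output.append(values.pop() if len(values) == 1 else None)
    return output


# Reference tokens: "I ca n't go"; predicted tokens: "I can't go".
gold_tags = ["PRP", "MD", "RB", "VB"]
x2y = [[0], [1, 2], [3]]
print(get_aligned_values(x2y, gold_tags))  # ['PRP', None, 'VB']
```

The predicted token "can't" spans two reference tokens with different tags, so the sketch cannot pick one and reports None.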

Example.get_aligned_parse method

Get the aligned view of the dependency parse. If projectivize is set to True, non-projective dependency trees are made projective through the Pseudo-Projective Dependency Parsing algorithm by Nivre and Nilsson (2005).

Name | Description | Type
projectivize | Whether or not to projectivize the dependency trees. Defaults to True. | bool
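The pseudo-projective transformation itself is more involved than fits here. As a hedged illustration of what "non-projective" means, the standalone sketch below (not spaCy's implementation) checks whether any two arcs of a tree cross when drawn above the sentence. The heads encoding, where the root token is its own head, is an assumption of the sketch.

```python
from typing import List, Sequence, Tuple


def is_projective(heads: Sequence[int]) -> bool:
    """Return True if the dependency tree has no crossing arcs.

    heads[i] is the index of token i's head; the root token is its own head.
    """
    arcs: List[Tuple[int, int]] = []
    for child, head in enumerate(heads):
        if child == head:
            # Attach the root to an artificial ROOT node at position -1, so
            # that an arc spanning over the root token counts as crossing.
            arcs.append((-1, child))
        else:
            arcs.append((min(child, head), max(child, head)))
    for a_start, a_end in arcs:
        for b_start, b_end in arcs:
            # Arc b starts strictly inside arc a and ends strictly after it.
            # Checking every ordered pair covers both directions.
            if a_start < b_start < a_end < b_end:
                return False
    return True


# "She ate fish": every arc nests cleanly.
print(is_projective([1, 1, 1]))  # True
# "A hearing is scheduled on the issue today": the arc hearing -> on
# crosses scheduled -> today, a classic non-projective example.
print(is_projective([1, 2, 2, 2, 1, 6, 4, 3]))  # False
```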

Example.get_aligned_ner method

Get the aligned view of the NER BILUO tags.

Name | Description | Type
RETURNS | The aligned NER labels. | List[str]
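BILUO tags mark each token as the Begin, Inside or Last token of a multi-token entity, a Unit (single-token) entity, or Out of any entity. The plain-Python sketch below shows that encoding for entity spans given as hypothetical (start, end, label) token offsets; it ignores alignment entirely and is not spaCy's code.

```python
from typing import List, Sequence, Tuple


def spans_to_biluo(n_tokens: int, spans: Sequence[Tuple[int, int, str]]) -> List[str]:
    """Encode non-overlapping (start, end, label) token spans as BILUO tags.

    `end` is exclusive. Tokens outside every span are tagged "O".
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # Unit: single-token entity
        else:
            tags[start] = f"B-{label}"  # Begin
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # Inside
            tags[end - 1] = f"L-{label}"  # Last
    return tags


words = ["I", "visited", "New", "York", "City", "with", "Ann"]
print(spans_to_biluo(len(words), [(2, 5, "GPE"), (6, 7, "PERSON")]))
# ['O', 'O', 'B-GPE', 'I-GPE', 'L-GPE', 'O', 'U-PERSON']
```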

Example.get_aligned_spans_y2x method

Get the aligned view of any set of Span objects defined over Example.reference. The resulting span indices will align to the tokenization in Example.predicted.

Name | Description | Type
y_spans | Span objects aligned to the tokenization of reference. | Iterable[Span]
allow_overlap | Whether the resulting Span objects may overlap or not. Set to False by default. | bool
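The mapping from reference token indices to predicted token indices can be sketched in plain Python. This is a minimal illustration under stated assumptions, not spaCy's implementation: y2x is a hypothetical list giving, for each reference token, the overlapping predicted token indices, and the sketch keeps a mapped span only when the predicted tokens cover exactly the same characters. That last rule is a design choice made here, and spaCy's exact rules may differ.

```python
from typing import Optional, Sequence, Tuple


def map_span(
    y_words: Sequence[str],
    x_words: Sequence[str],
    y2x: Sequence[Sequence[int]],
    start: int,
    end: int,
) -> Optional[Tuple[int, int]]:
    """Map the reference span y_words[start:end] onto predicted token indices."""
    x_indices = sorted({x for y in range(start, end) for x in y2x[y]})
    if not x_indices:
        return None  # the span's tokens have no counterpart in x
    x_start, x_end = x_indices[0], x_indices[-1] + 1
    # Keep the span only if the predicted tokens cover exactly the same
    # characters; otherwise its boundary cannot be expressed in x.
    if "".join(x_words[x_start:x_end]) != "".join(y_words[start:end]):
        return None
    return x_start, x_end


y_words = ["I", "ca", "n't", "go"]  # reference tokenization
x_words = ["I", "can't", "go"]      # predicted tokenization
y2x = [[0], [1], [1], [2]]
print(map_span(y_words, x_words, y2x, 1, 3))  # (1, 2): "ca n't" -> "can't"
print(map_span(y_words, x_words, y2x, 2, 3))  # None: "n't" alone has no boundary in x
```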

Example.get_aligned_spans_x2y method

Get the aligned view of any set of Span objects defined over Example.predicted. The resulting span indices will align to the tokenization in Example.reference. This method is particularly useful to assess the accuracy of predicted entities against the original gold-standard annotation.

Name | Description | Type
x_spans | Span objects aligned to the tokenization of predicted. | Iterable[Span]
allow_overlap | Whether the resulting Span objects may overlap or not. Set to False by default. | bool
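Once predicted entities have been mapped onto the reference tokenization, scoring them against the gold-standard annotation reduces to comparing sets. The sketch below is a generic exact-match precision/recall/F1 computation, not spaCy's Scorer; it assumes spans are represented as hypothetical (start, end, label) tuples in reference token indices.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start, end, label) in reference token indices.
EntSpan = Tuple[int, int, str]


def precision_recall_f1(
    predicted: Set[EntSpan], gold: Set[EntSpan]
) -> Tuple[float, float, float]:
    """Exact-match scoring: a predicted span counts only if start, end and label all match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {(0, 1, "ORG"), (5, 6, "GPE")}
predicted = {(0, 1, "ORG"), (5, 6, "LOC")}  # right boundaries, wrong second label
print(precision_recall_f1(predicted, gold))  # (0.5, 0.5, 0.5)
```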

Example.to_dict method

Return a dictionary representation of the reference annotation contained in this Example.

Name | Description | Type
RETURNS | Dictionary representation of the reference annotation. | Dict[str, Any]

Example.split_sents method

Split one Example into multiple Example objects, one for each sentence.

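Name | Description | Type
RETURNS | List of Example objects, one for each original sentence in the reference document. | List[Example]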
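The core step of splitting by sentence can be sketched on a single token list with sentence-start flags. This plain-Python sketch uses a hypothetical helper and only groups one tokenization; an Example also has to keep its predicted and reference documents in sync, which the sketch ignores.

```python
from typing import List, Sequence


def split_by_sent_starts(words: Sequence[str], sent_starts: Sequence[bool]) -> List[List[str]]:
    """Group tokens into sentences, starting a new group at each True flag."""
    sentences: List[List[str]] = []
    for word, is_start in zip(words, sent_starts):
        # Also open a group for the very first token, even if its flag is False.
        if is_start or not sentences:
            sentences.append([])
        sentences[-1].append(word)
    return sentences


words = ["Hello", "there", ".", "Bye", "now", "."]
starts = [True, False, False, True, False, False]
print(split_by_sent_starts(words, starts))
# [['Hello', 'there', '.'], ['Bye', 'now', '.']]
```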

Alignment (new in v3.0)

Calculate alignment tables between two tokenizations.

Alignment attributes

Alignment attributes are managed using AlignmentArray, which is a simplified version of Thinc’s Ragged type that only supports the data and length attributes.

Name | Description | Type
x2y | The AlignmentArray object holding the alignment from x to y. | AlignmentArray
y2x | The AlignmentArray object holding the alignment from y to x. | AlignmentArray
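A ragged array stores variable-length rows as one flat data array plus a list of row lengths. The class below is a hypothetical stand-in written for illustration, not spaCy's AlignmentArray (whose attribute names may differ); it shows how the rows of an x2y or y2x table can be recovered from flat data and lengths.

```python
from itertools import accumulate
from typing import List, Sequence


class RaggedSketch:
    """Hypothetical ragged array: flat `data` split into rows by `lengths`."""

    def __init__(self, data: Sequence[int], lengths: Sequence[int]):
        if sum(lengths) != len(data):
            raise ValueError("lengths must sum to len(data)")
        self.data = list(data)
        self.lengths = list(lengths)
        # starts[i] is the offset in `data` where row i begins.
        self.starts = [0] + list(accumulate(self.lengths))[:-1]

    def __getitem__(self, i: int) -> List[int]:
        start = self.starts[i]
        return self.data[start : start + self.lengths[i]]

    def __len__(self) -> int:
        return len(self.lengths)


# x = ["I", "can't", "go"], y = ["I", "ca", "n't", "go"]
x2y = RaggedSketch(data=[0, 1, 2, 3], lengths=[1, 2, 1])
print(x2y[1])  # [1, 2]: "can't" covers y tokens "ca" and "n't"
y2x = RaggedSketch(data=[0, 1, 1, 2], lengths=[1, 1, 1, 1])
print(y2x[2])  # [1]: "n't" lies inside x token "can't"
```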

Alignment.from_strings function

Name | Description | Type
A | String values of candidate tokens to align. | List[str]
B | String values of reference tokens to align. | List[str]
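One way to align two tokenizations of the same text is to assign every non-whitespace character to the token that contains it and then link tokens that share characters. The sketch below does exactly that in plain Python; the function names are hypothetical, the parameters a and b play the roles of A and B above, and spaCy's own implementation may handle cases this sketch does not.

```python
from typing import List, Sequence, Tuple


def _char_owners(tokens: Sequence[str]) -> Tuple[List[int], str]:
    """Return, for every non-whitespace character, the index of its token."""
    owners: List[int] = []
    chars: List[str] = []
    for i, token in enumerate(tokens):
        for ch in token:
            if not ch.isspace():
                owners.append(i)
                chars.append(ch)
    return owners, "".join(chars)


def align_tokens(
    a: Sequence[str], b: Sequence[str]
) -> Tuple[List[List[int]], List[List[int]]]:
    """Align two tokenizations of the same text via shared character positions.

    Returns (a2b, b2a): for each token, the indices of the overlapping tokens
    in the other tokenization.
    """
    a_owners, a_text = _char_owners(a)
    b_owners, b_text = _char_owners(b)
    if a_text != b_text:
        raise ValueError("the two tokenizations do not cover the same text")
    a2b: List[List[int]] = [[] for _ in a]
    b2a: List[List[int]] = [[] for _ in b]
    for i, j in zip(a_owners, b_owners):
        # Owners only increase along the text, so a repeat is always the last entry.
        if not a2b[i] or a2b[i][-1] != j:
            a2b[i].append(j)
        if not b2a[j] or b2a[j][-1] != i:
            b2a[j].append(i)
    return a2b, b2a


a2b, b2a = align_tokens(["I", "can't", "go"], ["I", "ca", "n't", "go"])
print(a2b)  # [[0], [1, 2], [3]]
print(b2a)  # [[0], [1], [1], [2]]
```

Comparing characters rather than whole tokens is what lets the sketch handle splits and merges in either direction, which matches the situation described above where the predicted and reference documents differ in tokenization.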