Example
An Example holds the information for one training instance. It stores two
Doc objects: one for holding the gold-standard reference data, and one for
holding the predictions of the pipeline. An
Alignment object stores the alignment between
these two documents, as they can differ in tokenization.
Example.__init__ method
Construct an Example object from the predicted document and the reference
document. If alignment is None, it will be initialized from the words in
both documents.
| Name | Description | Type |
|---|---|---|
| predicted | The document containing (partial) predictions. Cannot be None. | Doc |
| reference | The document containing gold-standard annotations. Cannot be None. | Doc |
| keyword-only | | |
| alignment | An object holding the alignment between the tokens of the predicted and reference documents. | Optional[Alignment] |
Example.from_dict classmethod
Construct an Example object from the predicted document and the reference
annotations provided as a dictionary. For more details on the required format,
see the training format documentation.
| Name | Description | Type |
|---|---|---|
| predicted | The document containing (partial) predictions. Cannot be None. | Doc |
| example_dict | The gold-standard annotations as a dictionary. Cannot be None. | Dict[str, Any] |
| RETURNS | The newly constructed object. | Example |
Example.text property
The text of the predicted document in this Example.
| Name | Description | Type |
|---|---|---|
| RETURNS | The text of the predicted document. | str |
Example.predicted property
The Doc holding the predictions. Occasionally also referred to as example.x.
| Name | Description | Type |
|---|---|---|
| RETURNS | The document containing (partial) predictions. | Doc |
Example.reference property
The Doc holding the gold-standard annotations. Occasionally also referred to
as example.y.
| Name | Description | Type |
|---|---|---|
| RETURNS | The document containing gold-standard annotations. | Doc |
Example.alignment property
The Alignment object mapping the tokens of
the predicted document to those of the reference document.
| Name | Description | Type |
|---|---|---|
| RETURNS | The alignment between the tokens of the predicted and reference documents. | Alignment |
Example.get_aligned method
Get the aligned view of a certain token attribute, denoted by its int ID or string name.
| Name | Description | Type |
|---|---|---|
| field | Attribute ID or string name. | Union[int, str] |
| as_string | Whether or not to return the list of values as strings. Defaults to False. | bool |
| RETURNS | List of integer values, or string values if as_string is True. | Union[List[int], List[str]] |
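To make the "aligned view" concrete, here is a minimal pure-Python sketch. It assumes the alignment is already available as a list of reference-token indices per predicted token (the shape of x2y), and that a predicted token keeps a value only when all of its aligned reference tokens agree. `project_values` is a hypothetical helper, not part of spaCy, and the agreement rule is an assumption rather than a documented guarantee.

```python
from typing import Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def project_values(
    x2y: Sequence[Sequence[int]], ref_values: Sequence[T]
) -> List[Optional[T]]:
    """Hypothetical helper: give each predicted token the value of its aligned reference tokens.

    x2y[i] lists the reference-token indices aligned to predicted token i.
    """
    output: List[Optional[T]] = []
    for aligned in x2y:
        values = {ref_values[j] for j in aligned}
        # Assumed rule: keep a value only if it is unambiguous (exactly one distinct value);
        # unaligned tokens and conflicting values both yield None.
        output.append(values.pop() if len(values) == 1 else None)
    return output


# Predicted tokens: ["New York", "is", "big"]; reference tokens: ["New", "York", "is", "big"]
x2y = [[0, 1], [2], [3]]
print(project_values(x2y, ["PROPN", "PROPN", "AUX", "ADJ"]))  # ['PROPN', 'AUX', 'ADJ']
print(project_values([[0, 1]], ["NOUN", "VERB"]))  # [None]
```

The merged token "New York" inherits PROPN because both reference tokens agree; with conflicting labels there is no single sensible value, so the sketch returns None.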
Example.get_aligned_parse method
Get the aligned view of the dependency parse. If projectivize is set to
True, non-projective dependency trees are made projective through the
Pseudo-Projective Dependency Parsing algorithm by Nivre and Nilsson (2005).
| Name | Description | Type |
|---|---|---|
| projectivize | Whether or not to projectivize the dependency trees. Defaults to True. | bool |
| RETURNS | A tuple of two lists, the head indices and the dependency labels, aligned to the tokens of the predicted document. | Tuple[List[Optional[int]], List[Optional[str]]] |
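The projectivize flag concerns a structural property of the tree: a dependency tree is projective when none of its arcs cross when drawn above the sentence. The sketch below only checks that property for a list of head indices, assuming the convention that the root token is its own head. It does not implement Nivre and Nilsson's pseudo-projective transformation (lifting arcs and encoding the lift in labels), and `is_projective` is a hypothetical helper.

```python
from typing import List, Tuple


def arcs(heads: List[int]) -> List[Tuple[int, int]]:
    """Return each dependency arc as an ordered (left, right) token-index pair."""
    # Assumed convention: the root token is its own head, so it contributes no arc.
    return [(min(i, h), max(i, h)) for i, h in enumerate(heads) if i != h]


def is_projective(heads: List[int]) -> bool:
    """Return True if no two dependency arcs cross."""
    spans = arcs(heads)
    for a, b in spans:
        for c, d in spans:
            # Arcs cross when the second starts strictly inside the first and ends outside it.
            # Checking every ordered pair also covers the mirrored case c < a < d < b.
            if a < c < b < d:
                return False
    return True


print(is_projective([1, 1, 1]))     # True: arcs (0, 1) and (1, 2) do not cross
print(is_projective([0, 3, 0, 0]))  # False: arc (1, 3) crosses arc (0, 2)
```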
Example.get_aligned_ner method
Get the aligned view of the NER BILUO tags.
| Name | Description | Type |
|---|---|---|
| RETURNS | List of BILUO values, denoting whether tokens are part of an NER annotation or not. | List[str] |
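BILUO tags mark each token as the Beginning, Inside or Last token of a multi-token entity, as a Unit (single-token) entity, or as Outside any entity. A minimal sketch of the encoding, assuming non-overlapping entities given as token offsets with an exclusive end. It ignores how missing or unaligned tokens are marked; spaCy provides its own converters in spacy.training (for example offsets_to_biluo_tags, which takes character offsets), and `spans_to_biluo` is a hypothetical helper.

```python
from typing import List, Tuple

# Hypothetical entity format for this sketch: (start token, end token exclusive, label).
Entity = Tuple[int, int, str]


def spans_to_biluo(n_tokens: int, entities: List[Entity]) -> List[str]:
    """Encode non-overlapping token-offset entities as BILUO tags."""
    tags = ["O"] * n_tokens
    for start, end, label in entities:
        if end - start == 1:
            tags[start] = f"U-{label}"  # single-token entity
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"L-{label}"
    return tags


print(spans_to_biluo(5, [(0, 2, "GPE"), (4, 5, "DATE")]))
# ['B-GPE', 'L-GPE', 'O', 'O', 'U-DATE']
```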
Example.get_aligned_spans_y2x method
Get the aligned view of any set of Span objects defined over
Example.reference. The resulting span indices will
align to the tokenization in Example.predicted.
| Name | Description | Type |
|---|---|---|
| y_spans | Span objects aligned to the tokenization of reference. | Iterable[Span] |
| allow_overlap | Whether the resulting Span objects may overlap or not. Set to False by default. | bool |
| RETURNS | Span objects aligned to the tokenization of predicted. | List[Span] |
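The idea behind mapping a span through an alignment fits in a few lines, assuming y2x is given as a list of predicted-token indices per reference token: collect every predicted token aligned to the span and take the smallest range that covers them. `align_span` and the (start, end, label) tuple are hypothetical stand-ins for Span objects. The sketch silently widens partially aligned spans and ignores allow_overlap, both of which the library may handle differently.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical span format for this sketch: (start token, end token exclusive, label).
SpanTuple = Tuple[int, int, str]


def align_span(span: SpanTuple, y2x: Sequence[Sequence[int]]) -> Optional[SpanTuple]:
    """Map a span over reference tokens onto predicted-token indices."""
    start, end, label = span
    # Every predicted-token index aligned to any reference token inside the span.
    targets: List[int] = [i for j in range(start, end) for i in y2x[j]]
    if not targets:
        return None  # no predicted token covers this span
    return (min(targets), max(targets) + 1, label)


# Reference tokens: ["New", "York", "is", "big"]; predicted tokens: ["New York", "is", "big"]
y2x = [[0], [0], [1], [2]]
print(align_span((0, 2, "GPE"), y2x))  # (0, 1, 'GPE')
```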
Example.get_aligned_spans_x2y method
Get the aligned view of any set of Span objects defined over
Example.predicted. The resulting span indices will
align to the tokenization in Example.reference. This
method is particularly useful to assess the accuracy of predicted entities
against the original gold-standard annotation.
| Name | Description | Type |
|---|---|---|
| x_spans | Span objects aligned to the tokenization of predicted. | Iterable[Span] |
| allow_overlap | Whether the resulting Span objects may overlap or not. Set to False by default. | bool |
| RETURNS | Span objects aligned to the tokenization of reference. | List[Span] |
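Once predicted spans are expressed in the reference tokenization, comparing them with the gold-standard spans reduces to set arithmetic. A minimal sketch of exact-match precision, recall and F-score over (start, end, label) triples; `span_prf` is a hypothetical helper, and in practice spaCy's Scorer class computes such scores for you.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical span format for this sketch: (start token, end token exclusive, label).
SpanTuple = Tuple[int, int, str]


def span_prf(predicted: Iterable[SpanTuple], gold: Iterable[SpanTuple]) -> Dict[str, float]:
    """Exact-match precision, recall and F-score; boundaries and label must all agree."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


pred = [(0, 2, "GPE"), (3, 4, "DATE")]
gold = [(0, 2, "GPE"), (3, 4, "TIME")]
print(span_prf(pred, gold))  # {'p': 0.5, 'r': 0.5, 'f': 0.5}
```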
Example.to_dict method
Return a dictionary representation of the
reference annotation contained in this Example.
| Name | Description | Type |
|---|---|---|
| RETURNS | Dictionary representation of the reference annotation. | Dict[str, Any] |
Example.split_sents method
Split one Example into multiple Example objects, one for each sentence.
| Name | Description | Type |
|---|---|---|
| RETURNS | List of Example objects, one for each original sentence. | List[Example] |
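At its core, sentence splitting cuts a token sequence at sentence starts. The sketch below does only that for a single list, assuming one boolean sentence-start flag per token; the real method has to split both documents consistently and keep them aligned. `split_by_sent_starts` is a hypothetical helper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_by_sent_starts(tokens: Sequence[T], sent_starts: Sequence[bool]) -> List[List[T]]:
    """Group tokens into sentences, opening a new group at each sentence-start flag."""
    if len(tokens) != len(sent_starts):
        raise ValueError("need exactly one sentence-start flag per token")
    sentences: List[List[T]] = []
    for tok, is_start in zip(tokens, sent_starts):
        # The first token always opens a sentence, even if its flag is False.
        if is_start or not sentences:
            sentences.append([])
        sentences[-1].append(tok)
    return sentences


print(split_by_sent_starts(["Hi", "!", "Bye", "."], [True, False, True, False]))
# [['Hi', '!'], ['Bye', '.']]
```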
Alignment v3.0
Calculate alignment tables between two tokenizations.
Alignment attributes
Alignment attributes are managed using AlignmentArray, which is a simplified
version of Thinc’s Ragged type that
only supports the data and length attributes.
| Name | Description | Type |
|---|---|---|
| x2y | The AlignmentArray object holding the alignment from x to y. | AlignmentArray |
| y2x | The AlignmentArray object holding the alignment from y to x. | AlignmentArray |
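The data-plus-lengths layout stores every row back to back in one flat sequence and recovers row i from the running sum of the lengths before it. A toy version, assuming plain Python lists instead of arrays; `ToyAlignmentArray` and `from_nested` are hypothetical and only illustrate the layout, not spaCy's class.

```python
from dataclasses import dataclass, field
from itertools import accumulate
from typing import List


@dataclass
class ToyAlignmentArray:
    """Hypothetical stand-in: a flat `data` list plus the length of each row."""

    data: List[int]
    lengths: List[int]
    _starts: List[int] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        if sum(self.lengths) != len(self.data):
            raise ValueError("lengths must sum to len(data)")
        # Offset of each row's first element in `data`: [0, l0, l0 + l1, ...].
        self._starts = [0, *accumulate(self.lengths)][:-1]

    def __getitem__(self, i: int) -> List[int]:
        start = self._starts[i]
        return self.data[start : start + self.lengths[i]]

    @classmethod
    def from_nested(cls, rows: List[List[int]]) -> "ToyAlignmentArray":
        """Flatten a list of rows into the data/lengths representation."""
        return cls([x for row in rows for x in row], [len(row) for row in rows])


# x2y for predicted ["New York", "is", "big"] against reference ["New", "York", "is", "big"]
x2y = ToyAlignmentArray.from_nested([[0, 1], [2], [3]])
print(x2y.data, x2y.lengths)  # [0, 1, 2, 3] [2, 1, 1]
print(x2y[0], x2y[2])  # [0, 1] [3]
```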
Alignment.from_strings function
Construct an Alignment object from the string values of two tokenizations.
| Name | Description | Type |
|---|---|---|
| A | String values of candidate tokens to align. | List[str] |
| B | String values of reference tokens to align. | List[str] |
| RETURNS | An Alignment object describing the alignment. | Alignment |
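As a rough illustration of how two tokenizations can be aligned, the sketch below gives each token its character range in the whitespace-free text and aligns tokens whose ranges overlap. It assumes both lists spell the same text once whitespace is removed and raises ValueError otherwise. `align_strings` is hypothetical and returns plain nested lists rather than AlignmentArray objects; spaCy's own implementation may normalize text differently.

```python
from typing import List, Tuple

Range = Tuple[int, int]


def char_spans(tokens: List[str]) -> List[Range]:
    """Half-open character range of each token in the whitespace-free concatenation."""
    spans: List[Range] = []
    pos = 0
    for tok in tokens:
        n = len("".join(tok.split()))  # whitespace inside a token is ignored
        spans.append((pos, pos + n))
        pos += n
    return spans


def overlaps(a: Range, b: Range) -> bool:
    # Non-empty half-open ranges overlap when each starts before the other ends.
    return a[0] < a[1] and b[0] < b[1] and a[0] < b[1] and b[0] < a[1]


def align_strings(A: List[str], B: List[str]) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper returning (x2y, y2x) as lists of aligned token indices."""
    if "".join("".join(A).split()) != "".join("".join(B).split()):
        raise ValueError("token lists do not spell the same text")
    a_spans, b_spans = char_spans(A), char_spans(B)
    x2y = [[j for j, b in enumerate(b_spans) if overlaps(a, b)] for a in a_spans]
    y2x = [[i for i, a in enumerate(a_spans) if overlaps(a, b)] for b in b_spans]
    return x2y, y2x


x2y, y2x = align_strings(["New York", "is", "big"], ["New", "York", "is", "big"])
print(x2y)  # [[0, 1], [2], [3]]
print(y2x)  # [[0], [0], [1], [2]]
```

Tokens that consist only of whitespace get an empty range and therefore stay unaligned in this sketch.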