Example
An Example holds the information for one training instance. It stores two Doc objects: one for holding the gold-standard reference data, and one for holding the predictions of the pipeline. An Alignment object stores the alignment between these two documents, as they can differ in tokenization.
Example.__init__ method
Construct an Example object from the predicted document and the reference document. If alignment is None, it will be initialized from the words in both documents.
Name | Description | Type |
---|---|---|
predicted | The document containing (partial) predictions. Cannot be None. | Doc |
reference | The document containing gold-standard annotations. Cannot be None. | Doc |
keyword-only | | |
alignment | An object holding the alignment between the tokens of the predicted and reference documents. | Optional[Alignment] |
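A minimal usage sketch, assuming a recent spaCy v3 release is installed. It uses spacy.blank("en") so no trained pipeline is needed; the sentence and the GPE entity are made up for illustration. The reference document carries an entity annotation that the predicted document lacks, and since no alignment is passed, it is derived from the words of both documents.

```python
import spacy
from spacy.tokens import Span
from spacy.training import Example

nlp = spacy.blank("en")

# Predicted doc: only tokenized, no annotations yet.
predicted = nlp.make_doc("I flew to New York")

# Reference doc: same text, plus a gold-standard entity over "New York".
reference = nlp.make_doc("I flew to New York")
reference.ents = [Span(reference, 3, 5, label="GPE")]

# No alignment argument, so it is derived from the words of both docs.
example = Example(predicted, reference)

print(len(example.predicted.ents))                           # 0
print([(e.text, e.label_) for e in example.reference.ents])  # [('New York', 'GPE')]
```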
Example.from_dict classmethod
Construct an Example object from the predicted document and the reference annotations provided as a dictionary. For more details on the required format, see the training format documentation.
Name | Description | Type |
---|---|---|
predicted | The document containing (partial) predictions. Cannot be None. | Doc |
example_dict | The gold-standard annotations as a dictionary. Cannot be None. | Dict[str, Any] |
RETURNS | The newly constructed object. | Example |
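A short sketch of building the reference side from a dictionary, assuming a recent spaCy v3 release. The tags and the character-offset entity are illustrative values. Because the dictionary has no "words" key, the predicted tokenization is reused for the reference document.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
predicted = nlp.make_doc("I like London")

# Gold annotations: one tag per token, plus an entity given as
# (start_char, end_char, label) over the text "I like London".
annotations = {
    "tags": ["PRP", "VBP", "NNP"],
    "entities": [(7, 13, "GPE")],
}
example = Example.from_dict(predicted, annotations)

print([t.tag_ for t in example.reference])                   # ['PRP', 'VBP', 'NNP']
print([(e.text, e.label_) for e in example.reference.ents])  # [('London', 'GPE')]
# The predicted doc itself is left untouched.
print([t.tag_ for t in example.predicted])                   # ['', '', '']
```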
Example.text property
The text of the predicted document in this Example.
Name | Description | Type |
---|---|---|
RETURNS | The text of the predicted document. | str |
Example.predicted property
The Doc holding the predictions. Occasionally also referred to as example.x.
Name | Description | Type |
---|---|---|
RETURNS | The document containing (partial) predictions. | Doc |
Example.reference property
The Doc holding the gold-standard annotations. Occasionally also referred to as example.y.
Name | Description | Type |
---|---|---|
RETURNS | The document containing gold-standard annotations. | Doc |
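The text, predicted and reference properties, and the x and y aliases, can be checked directly. A minimal sketch, assuming a recent spaCy v3 release; the tags are illustrative.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
predicted = nlp.make_doc("Hello world")
example = Example.from_dict(predicted, {"tags": ["UH", "NN"]})

print(example.text)                    # Hello world
print(example.predicted is predicted)  # True
# x and y are short aliases for predicted and reference.
print(example.x is example.predicted)  # True
print(example.y is example.reference)  # True
```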
Example.alignment property
The Alignment object mapping the tokens of the predicted document to those of the reference document.
Name | Description | Type |
---|---|---|
RETURNS | An object holding the alignment between the tokens of the predicted and reference documents. | Alignment |
Example.get_aligned method
Get the aligned view of a certain token attribute, denoted by its int ID or string name.
Name | Description | Type |
---|---|---|
field | Attribute ID or string name. | Union[int, str] |
as_string | Whether or not to return the list of values as strings. Defaults to False. | bool |
RETURNS | List of integer values, or string values if as_string is True. | Union[List[int], List[str]] |
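A sketch of reading gold tags through the alignment when the two tokenizations differ, assuming a recent spaCy v3 release; the words and tags are illustrative. The reference splits "I'm" into two tokens with different tags, so the single predicted token has no unambiguous gold value and, in current v3 releases, comes back as None, while "happy" aligns one-to-one.

```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

nlp = spacy.blank("en")
# Predicted tokenization keeps "I'm" as a single token.
predicted = Doc(nlp.vocab, words=["I'm", "happy"])
# Reference tokenization splits it, with a different tag per piece.
example = Example.from_dict(
    predicted,
    {"words": ["I", "'m", "happy"], "tags": ["PRP", "VBP", "JJ"]},
)

tags = example.get_aligned("TAG", as_string=True)
print(tags)  # [None, 'JJ']
```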
Example.get_aligned_parse method
Get the aligned view of the dependency parse. If projectivize is set to True, non-projective dependency trees are made projective through the Pseudo-Projective Dependency Parsing algorithm by Nivre and Nilsson (2005).
Name | Description | Type |
---|---|---|
projectivize | Whether or not to projectivize the dependency trees. Defaults to True. | bool |
RETURNS | The aligned head indices and dependency labels, as a tuple of two lists. Entries are None where a token could not be aligned. | Tuple[List[Optional[int]], List[Optional[str]]] |
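A minimal sketch of unpacking the aligned parse, assuming a recent spaCy v3 release. The heads and labels are made up; heads are given as absolute token indices, and the root ("ate") is its own head. The tree is already projective, so projectivize=True leaves it unchanged.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
predicted = nlp.make_doc("She ate pizza")
example = Example.from_dict(
    predicted,
    {"heads": [1, 1, 1], "deps": ["nsubj", "ROOT", "dobj"]},
)

# The method returns two parallel lists: heads and dependency labels.
heads, deps = example.get_aligned_parse(projectivize=True)
print(list(heads), deps)
```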
Example.get_aligned_ner method
Get the aligned view of the NER BILUO tags.
Name | Description | Type |
---|---|---|
RETURNS | List of BILUO values, denoting whether tokens are part of an NER annotation or not. | List[str] |
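A short sketch of the BILUO output, assuming a recent spaCy v3 release; the entity offsets are illustrative. "New York" spans two tokens, so it is tagged B (begin) then L (last), and the remaining tokens are O (outside).

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
predicted = nlp.make_doc("I like New York")
# "New York" occupies characters 7 to 15 of the text.
example = Example.from_dict(predicted, {"entities": [(7, 15, "GPE")]})

print(example.get_aligned_ner())  # ['O', 'O', 'B-GPE', 'L-GPE']
```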
Example.get_aligned_spans_y2x method
Get the aligned view of any set of Span objects defined over Example.reference. The resulting span indices will align to the tokenization in Example.predicted.
Name | Description | Type |
---|---|---|
y_spans | Span objects aligned to the tokenization of reference. | Iterable[Span] |
allow_overlap | Whether the resulting Span objects may overlap or not. Set to False by default. | bool |
RETURNS | Span objects aligned to the tokenization of predicted. | List[Span] |
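A sketch of moving a gold entity onto a coarser predicted tokenization, assuming a recent spaCy v3 release; the words and label are illustrative. The reference entity covers tokens 3 to 5 ("New", "York"), and both tokens map to the single predicted token "New York", so the aligned span covers predicted tokens 3 to 4.

```python
import spacy
from spacy.tokens import Doc, Span
from spacy.training import Example

nlp = spacy.blank("en")
# Predicted tokenization keeps "New York" as one token.
predicted = Doc(nlp.vocab, words=["I", "flew", "to", "New York"],
                spaces=[True, True, True, False])
# Reference tokenization splits it and marks it as an entity.
reference = Doc(nlp.vocab, words=["I", "flew", "to", "New", "York"],
                spaces=[True, True, True, True, False])
reference.ents = [Span(reference, 3, 5, label="GPE")]

example = Example(predicted, reference)
aligned = example.get_aligned_spans_y2x(example.reference.ents)
print([(s.start, s.end, s.text, s.label_) for s in aligned])
# [(3, 4, 'New York', 'GPE')]
```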
Example.get_aligned_spans_x2y method
Get the aligned view of any set of Span objects defined over Example.predicted. The resulting span indices will align to the tokenization in Example.reference. This method is particularly useful to assess the accuracy of predicted entities against the original gold-standard annotation.
Name | Description | Type |
---|---|---|
x_spans | Span objects aligned to the tokenization of predicted. | Iterable[Span] |
allow_overlap | Whether the resulting Span objects may overlap or not. Set to False by default. | bool |
RETURNS | Span objects aligned to the tokenization of reference. | List[Span] |
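The accuracy use case can be sketched by mapping predicted entities onto the reference tokenization and comparing (start, end, label) triples. This assumes a recent spaCy v3 release; the documents, the entity, and the exact-match comparison are illustrative choices, not spaCy's own scorer.

```python
import spacy
from spacy.tokens import Doc, Span
from spacy.training import Example

nlp = spacy.blank("en")
# Predicted doc: five tokens, with a predicted entity over "New York".
predicted = nlp.make_doc("I flew to New York")
predicted.ents = [Span(predicted, 3, 5, label="GPE")]
# Reference doc: "New York" is one token, annotated as the gold entity.
reference = Doc(nlp.vocab, words=["I", "flew", "to", "New York"],
                spaces=[True, True, True, False])
reference.ents = [Span(reference, 3, 4, label="GPE")]

example = Example(predicted, reference)
aligned = example.get_aligned_spans_x2y(example.predicted.ents)

# Exact-match comparison in the reference's token space.
pred = {(s.start, s.end, s.label_) for s in aligned}
gold = {(e.start, e.end, e.label_) for e in example.reference.ents}
print(pred == gold)  # True
```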
Example.to_dict method
Return a dictionary representation of the reference annotation contained in this Example.
Name | Description | Type |
---|---|---|
RETURNS | Dictionary representation of the reference annotation. | Dict[str, Any] |
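A sketch of inspecting the dictionary, assuming a recent spaCy v3 release, where the output is split into "doc_annotation" and "token_annotation" sections; treat the exact layout as version-dependent. The annotations are illustrative.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
predicted = nlp.make_doc("I like London")
example = Example.from_dict(
    predicted, {"tags": ["PRP", "VBP", "NNP"], "entities": [(7, 13, "GPE")]}
)
data = example.to_dict()

print(sorted(data))                        # ['doc_annotation', 'token_annotation']
print(data["token_annotation"]["ORTH"])    # ['I', 'like', 'London']
print(data["token_annotation"]["TAG"])     # ['PRP', 'VBP', 'NNP']
# Entities are stored as BILUO tags over the reference tokens.
print(data["doc_annotation"]["entities"])  # ['O', 'O', 'U-GPE']
```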
Example.split_sents method
Split one Example into multiple Example objects, one for each sentence.
Name | Description | Type |
---|---|---|
RETURNS | List of Example objects, one for each original sentence. | List[Example] |
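A sketch of splitting on gold sentence boundaries, assuming a recent spaCy v3 release. The words and the sent_starts values are illustrative: True marks the first token of a sentence, False marks a token that does not start one.

```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

nlp = spacy.blank("en")
predicted = Doc(nlp.vocab, words=["Hi", "there", ".", "Bye", "now", "."],
                spaces=[True, False, True, True, False, False])
# Gold sentence boundaries: a second sentence starts at "Bye".
example = Example.from_dict(
    predicted, {"sent_starts": [True, False, False, True, False, False]}
)

sent_examples = example.split_sents()
for eg in sent_examples:
    print([t.text for t in eg.predicted])
# ['Hi', 'there', '.']
# ['Bye', 'now', '.']
```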
Alignment v3.0
Calculate alignment tables between two tokenizations.
Alignment attributes
Alignment attributes are managed using AlignmentArray, which is a simplified version of Thinc’s Ragged type that only supports the data and lengths attributes.
Name | Description | Type |
---|---|---|
x2y | The AlignmentArray object holding the alignment from x to y. | AlignmentArray |
y2x | The AlignmentArray object holding the alignment from y to x. | AlignmentArray |
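The x2y and y2x tables can be pictured as one list of aligned indices per token, flattened into a data array plus a lengths array. The following stdlib-only sketch computes such tables by matching characters, assuming the two tokenizations cover the same characters once whitespace and letter case are ignored. The helpers align_tokens and to_ragged are hypothetical and this is not spaCy's implementation.

```python
from typing import List, Tuple


def align_tokens(a: List[str], b: List[str]) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: align two tokenizations of the same text.

    Returns (x2y, y2x): for each token, the sorted indices of the tokens
    in the other tokenization that share at least one character with it.
    """

    def char_owners(tokens: List[str]) -> List[Tuple[str, int]]:
        # Pair each non-whitespace, lowercased character with its token index.
        return [
            (ch, i)
            for i, tok in enumerate(tokens)
            for ch in tok.lower()
            if not ch.isspace()
        ]

    chars_a, chars_b = char_owners(a), char_owners(b)
    if [c for c, _ in chars_a] != [c for c, _ in chars_b]:
        raise ValueError("tokenizations differ beyond whitespace and case")

    x2y = [set() for _ in a]
    y2x = [set() for _ in b]
    # Characters at the same position link their owning tokens.
    for (_, i), (_, j) in zip(chars_a, chars_b):
        x2y[i].add(j)
        y2x[j].add(i)
    return [sorted(s) for s in x2y], [sorted(s) for s in y2x]


def to_ragged(table: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Flatten a per-token table into (data, lengths), Ragged-style."""
    data = [idx for row in table for idx in row]
    lengths = [len(row) for row in table]
    return data, lengths


x2y, y2x = align_tokens(["I", "flew", "to", "New York"],
                        ["I", "flew", "to", "New", "York"])
print(to_ragged(x2y))  # ([0, 1, 2, 3, 4], [1, 1, 1, 2])
print(to_ragged(y2x))  # ([0, 1, 2, 3, 3], [1, 1, 1, 1, 1])
```

The lengths array records how many indices belong to each token, so the indices for token i are the slice of data starting at the sum of the preceding lengths.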
Alignment.from_strings function
Name | Description | Type |
---|---|---|
A | String values of candidate tokens to align. | List[str] |
B | String values of reference tokens to align. | List[str] |
RETURNS | An Alignment object describing the alignment. | Alignment |
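A minimal usage sketch, assuming a recent spaCy v3 release; the token lists are illustrative. The candidate keeps "New York" as one token, so its entry in x2y has length 2, and both reference tokens "New" and "York" point back to candidate token 3.

```python
from spacy.training import Alignment

candidate = ["I", "flew", "to", "New York"]
reference = ["I", "flew", "to", "New", "York"]

align = Alignment.from_strings(candidate, reference)
print(align.x2y.data.tolist())     # [0, 1, 2, 3, 4]
print(align.x2y.lengths.tolist())  # [1, 1, 1, 2]
print(align.y2x.data.tolist())     # [0, 1, 2, 3, 3]
```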