The default config is defined by the pipeline component factory and describes
how the component should be configured. You can override its settings via the
config argument on nlp.add_pipe or in your
config.cfg for training. See the
model architectures documentation for details on the
architectures and their arguments and hyperparameters.
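
As a minimal sketch of overriding the default config through `nlp.add_pipe`, the snippet below passes the component's default model config explicitly. It assumes spaCy v3 is installed and that `DEFAULT_MORPH_MODEL` can be imported from `spacy.pipeline.morphologizer`. The blank English pipeline is only an illustration.

```python
import spacy
from spacy.pipeline.morphologizer import DEFAULT_MORPH_MODEL

# A blank English pipeline, used here purely for illustration.
nlp = spacy.blank("en")

# Settings passed via `config` override the factory defaults.
config = {"model": DEFAULT_MORPH_MODEL}
morphologizer = nlp.add_pipe("morphologizer", config=config)
```

Any setting not listed in `config` keeps its default value from the component factory.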
Apply the pipe to one document. The document is modified in place and returned.
This usually happens under the hood when the nlp object is called on a text
and all pipeline components are applied to the Doc in order. Both
__call__ and pipe delegate to the predict and set_annotations methods.
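
A small sketch of calling the component directly on a single Doc might look as follows. The sentence and its annotations are made up for illustration, and the model is only initialized, not trained, so the predicted features carry no meaning here.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
morphologizer = nlp.add_pipe("morphologizer")

# Hypothetical one-sentence sample, used only to set up labels and shapes.
annots = {
    "morphs": ["Case=Nom|Number=Sing", "Tense=Pres", "Number=Plur"],
    "pos": ["PRON", "VERB", "NOUN"],
}
example = Example.from_dict(nlp.make_doc("I like cats"), annots)
nlp.initialize(lambda: [example])

# Calling the component modifies the Doc in place and returns it.
doc = nlp.make_doc("I like cats")
processed = morphologizer(doc)
```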
Apply the pipe to a stream of documents. This usually happens under the hood
when the nlp object is called on a text and all pipeline components are
applied to the Doc in order. Both __call__ and
pipe delegate to the predict and set_annotations methods.
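
Processing a stream of documents can be sketched in the same way. The texts, annotations and the `batch_size` value below are arbitrary choices for illustration, and the untrained weights mean the output annotations are not meaningful.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
morphologizer = nlp.add_pipe("morphologizer")

# Hypothetical sample used only for initialization.
annots = {
    "morphs": ["Case=Nom|Number=Sing", "Tense=Pres", "Number=Plur"],
    "pos": ["PRON", "VERB", "NOUN"],
}
example = Example.from_dict(nlp.make_doc("I like cats"), annots)
nlp.initialize(lambda: [example])

# pipe() consumes an iterable of Doc objects and yields them in order.
docs = [nlp.make_doc(text) for text in ["I like cats", "I like dogs"]]
processed = list(morphologizer.pipe(docs, batch_size=2))
```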
Initialize the component for training. get_examples should be a function that
returns an iterable of Example objects. The data examples are
used to initialize the model of the component and can either be the full
training data or a representative sample. Initialization includes validating the
network, inferring missing shapes and
setting up the label scheme based on the data. This method is typically called
by Language.initialize and lets you customize
arguments it receives via the
[initialize.components] block in the config.
get_examples (Callable[[], Iterable[Example]]): Function that returns gold-standard annotations in the form of Example objects.
nlp (Optional[Language]): The current nlp object. Defaults to None.
labels (Optional[dict]): The label information to add to the component, as provided by the label_data property after initialization. To generate a reusable JSON file from your data, you should run the init labels command. If no labels are provided, the get_examples callback is used to extract the labels from the data, which may be a lot slower.
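
The initialization step described above can be sketched by calling `initialize` on the component directly. The `get_examples` callback and its single annotated sentence are hypothetical. In practice, the callback would return the training data or a representative sample.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
morphologizer = nlp.add_pipe("morphologizer")

def get_examples():
    # Hypothetical gold-standard sample; real data would come from a corpus.
    doc = nlp.make_doc("I like cats")
    annots = {
        "morphs": ["Case=Nom|Number=Sing", "Tense=Pres", "Number=Plur"],
        "pos": ["PRON", "VERB", "NOUN"],
    }
    return [Example.from_dict(doc, annots)]

# Labels are extracted from the examples because no `labels` are passed.
morphologizer.initialize(get_examples, nlp=nlp)
labels = morphologizer.labels
```

Because both `morphs` and `pos` are annotated, each extracted label combines the morphological features with the coarse-grained tag as the `POS` feature.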
Add a new label to the pipe. If the Morphologizer should set annotations for
both pos and morph, the label should include the UPOS as the feature POS.
Raises an error if the output dimension is already set, or if the model has
already been fully initialized. Note that you don’t have to call
this method if you provide a representative data sample to the
initialize method. In this case, all labels found in the sample
will be automatically added to the model, and the output dimension will be
inferred automatically.
label (str): The label to add.
Returns (int): 0 if the label is already present, otherwise 1.
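
A short sketch of adding a label by hand, before the model has been initialized. The label string is an arbitrary illustration that includes the UPOS as the `POS` feature.

```python
import spacy

nlp = spacy.blank("en")
morphologizer = nlp.add_pipe("morphologizer")

# The label combines morphological features with the UPOS as the POS feature.
label = "Number=Sing|POS=NOUN"
first = morphologizer.add_label(label)   # newly added
second = morphologizer.add_label(label)  # already present
```

The return value makes it possible to tell whether the call changed the label set.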
The labels currently added to the component in the Universal Dependencies FEATS
format. Note that even for a blank component, this will always include the
internal empty label _. If POS features are used, the labels will include the
coarse-grained POS as the feature POS.
The labels currently added to the component and their internal meta information.
This is the data generated by init labels and used by
Morphologizer.initialize to initialize the
model with a pre-defined label set.
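
One way to reuse a pre-defined label set is sketched below: the label data of an initialized component is passed to a second component via `labels`. The component name `morphologizer_reuse` and the sample sentence are hypothetical.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
first = nlp.add_pipe("morphologizer")

def get_examples():
    # Hypothetical gold-standard sample used for initialization.
    doc = nlp.make_doc("I like cats")
    annots = {
        "morphs": ["Case=Nom|Number=Sing", "Tense=Pres", "Number=Plur"],
        "pos": ["PRON", "VERB", "NOUN"],
    }
    return [Example.from_dict(doc, annots)]

first.initialize(get_examples, nlp=nlp)
label_data = first.label_data

# A second component is initialized from the stored label data,
# so the labels do not have to be extracted from the examples again.
second = nlp.add_pipe("morphologizer", name="morphologizer_reuse")
second.initialize(get_examples, nlp=nlp, labels=label_data)
```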
During serialization, spaCy will export several data fields used to restore
different aspects of the object. If needed, you can exclude them from
serialization by passing in the string names via the exclude argument.
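
Excluding a field can be sketched with `to_bytes`. The example assumes `vocab` is one of the serialized field names and uses a made-up sample sentence for initialization.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
morphologizer = nlp.add_pipe("morphologizer")

# Hypothetical sample used only for initialization.
annots = {
    "morphs": ["Case=Nom|Number=Sing", "Tense=Pres", "Number=Plur"],
    "pos": ["PRON", "VERB", "NOUN"],
}
example = Example.from_dict(nlp.make_doc("I like cats"), annots)
nlp.initialize(lambda: [example])

# Serialize everything, then serialize again without the shared vocab.
full = morphologizer.to_bytes()
without_vocab = morphologizer.to_bytes(exclude=["vocab"])
```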