Language
Usually you’ll load this once per process as nlp
and pass the instance around
your application. The Language
class is created when you call
spacy.load
and contains the shared vocabulary and
language data, optional binary
weights, e.g. provided by a trained pipeline, and the
processing pipeline containing components like
the tagger or parser that are called on a document in order. You can also add
your own processing pipeline components that take a Doc
object, modify it and
return it.
Language.__init__ method
Initialize a Language
object. Note that the meta
is only used for meta
information in Language.meta
and not to configure the
nlp
object or to override the config. To initialize from a config, use
Language.from_config
instead.
Name | Description |
---|---|
vocab | A Vocab object. If True , a vocab is created using the default language data settings. Vocab |
keyword-only | |
max_length | Maximum number of characters allowed in a single text. Defaults to 10 ** 6 . int |
meta | Meta data overrides. Dict[str, Any] |
create_tokenizer | Optional function that receives the nlp object and returns a tokenizer. Callable[[Language], Callable[[str],Doc]] |
batch_size | Default batch size for pipe and evaluate . Defaults to 1000 . int |
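As a rough sketch of the construction routes described above, assuming spaCy v3 is installed: a language subclass sets up its own language data, while the base class can be given an explicit Vocab. The pipeline name `demo_pipeline` is made up for illustration.

```python
from spacy.lang.en import English
from spacy.language import Language
from spacy.vocab import Vocab

# Construct from a language subclass: English language data is set up.
nlp_en = English()

# Construct from the base class with an explicit, empty vocab.
nlp_base = Language(Vocab())

# The meta argument only feeds nlp.meta; it does not configure the pipeline.
nlp_meta = English(meta={"name": "demo_pipeline"})  # hypothetical name
print(nlp_en.lang, nlp_en.max_length, nlp_meta.meta["name"])
```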
Language.from_config classmethod v3.0
Create a Language
object from a loaded config. Will set up the tokenizer and
language data and add the pipeline components listed in the pipeline, based on
the definitions specified in the config. If no config is
provided, the default config of the given language is used. This is also how
spaCy loads a model under the hood based on its
config.cfg
.
Name | Description |
---|---|
config | The loaded config. Union[Dict[str, Any],Config] |
keyword-only | |
vocab | A Vocab object. If True , a vocab is created using the default language data settings. Vocab |
disable | Name(s) of pipeline component(s) to disable. Disabled pipes will be loaded but they won’t be run unless you explicitly enable them by calling nlp.enable_pipe. Is merged with the config entry nlp.disabled . Union[str, Iterable[str]] |
enable v3.4 | Name(s) of pipeline component(s) to enable. All other pipes will be disabled, but can be enabled again using nlp.enable_pipe. Union[str, Iterable[str]] |
exclude | Name(s) of pipeline component(s) to exclude. Excluded components won’t be loaded. Union[str, Iterable[str]] |
meta | Meta data overrides. Dict[str, Any] |
auto_fill | Whether to automatically fill in missing values in the config, based on defaults and function argument annotations. Defaults to True . bool |
validate | Whether to validate the component config and arguments against the types expected by the factory. Defaults to True . bool |
RETURNS | The initialized object. Language |
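A minimal sketch of building a pipeline from a config, assuming a small in-memory dict rather than a `config.cfg` loaded from disk. The method is called on the `English` subclass so that the class matches the `lang` setting in the config.

```python
from spacy.lang.en import English

# A tiny config: one component, created from the built-in "sentencizer" factory.
config = {
    "nlp": {"lang": "en", "pipeline": ["sentencizer"]},
    "components": {"sentencizer": {"factory": "sentencizer"}},
}

# Missing values are filled in from the defaults (auto_fill=True).
nlp = English.from_config(config)

# The same config, with the component loaded but disabled.
nlp_disabled = English.from_config(config, disable=["sentencizer"])
print(nlp.pipe_names, nlp_disabled.pipe_names, nlp_disabled.disabled)
```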
Language.component classmethod v3.0
Register a custom pipeline component under a given name. This allows
initializing the component by name using
Language.add_pipe
and referring to it in
config files. This classmethod and decorator is
intended for simple stateless functions that take a Doc
and return it. For
more complex stateful components that allow settings and need access to the
shared nlp
object, use the Language.factory
decorator. For more details and examples, see the
usage documentation.
Name | Description |
---|---|
name | The name of the component factory. str |
keyword-only | |
assigns | Doc or Token attributes assigned by this component, e.g. ["token.ent_id"] . Used for pipe analysis. Iterable[str] |
requires | Doc or Token attributes required by this component, e.g. ["token.ent_id"] . Used for pipe analysis. Iterable[str] |
retokenizes | Whether the component changes tokenization. Used for pipe analysis. bool |
func | Optional function if not used as a decorator. Optional[Callable[[Doc],Doc]] |
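The following sketch registers a stateless function component and adds it by name. The component name `token_counter` and the `user_data` key are hypothetical and chosen only for illustration.

```python
import spacy
from spacy.language import Language


@Language.component("token_counter")  # hypothetical component name
def token_counter(doc):
    # Stateless: takes a Doc, records its length and returns the same Doc.
    doc.user_data["n_tokens"] = len(doc)
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("token_counter")
doc = nlp("Hello world")
print(doc.user_data["n_tokens"])
```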
Language.factory classmethod
Register a custom pipeline component factory under a given name. This allows
initializing the component by name using
Language.add_pipe
and referring to it in
config files. The registered factory function needs to
take at least two named arguments which spaCy fills in automatically: nlp
for the current nlp
object and name
for the component instance name. This
can be useful to distinguish multiple instances of the same component and allows
trainable components to add custom losses using the component instance name. The
default_config
defines the default values of the remaining factory arguments.
It’s merged into the nlp.config
. For more details and
examples, see the
usage documentation.
Name | Description |
---|---|
name | The name of the component factory. str |
keyword-only | |
default_config | The default config, describing the default values of the factory arguments. Dict[str, Any] |
assigns | Doc or Token attributes assigned by this component, e.g. ["token.ent_id"] . Used for pipe analysis. Iterable[str] |
requires | Doc or Token attributes required by this component, e.g. ["token.ent_id"] . Used for pipe analysis. Iterable[str] |
retokenizes | Whether the component changes tokenization. Used for pipe analysis. bool |
default_score_weights | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to 1.0 per component and will be combined and normalized for the whole pipeline. If a weight is set to None , the score will not be logged or weighted. Dict[str, Optional[float]] |
func | Optional function if not used as a decorator. Optional[Callable[[…], Callable[[Doc],Doc]]] |
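A minimal sketch of a factory with one configurable setting. The factory name `length_flagger`, the setting `min_tokens` and the instance name `flag_long` are assumptions made for this example. The factory receives `nlp` and `name` automatically, and the remaining argument comes from `default_config` unless overridden in `add_pipe`.

```python
import spacy
from spacy.language import Language


@Language.factory("length_flagger", default_config={"min_tokens": 3})
def create_length_flagger(nlp, name, min_tokens: int):
    # The returned callable is the actual pipeline component.
    def length_flagger(doc):
        # Store the result under the instance name to tell instances apart.
        doc.user_data[name] = len(doc) >= min_tokens
        return doc

    return length_flagger


nlp = spacy.blank("en")
nlp.add_pipe("length_flagger", name="flag_long", config={"min_tokens": 2})
doc = nlp("Hello world")
print(doc.user_data["flag_long"])
```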
Language.__call__ method
Apply the pipeline to some text. The text can span multiple sentences, and can contain arbitrary whitespace. Alignment into the original string is preserved.
Instead of text, a Doc
can be passed as input, in which case tokenization is
skipped, but the rest of the pipeline is run.
Name | Description |
---|---|
text | The text to be processed, or a Doc. Union[str,Doc] |
keyword-only | |
disable | Names of pipeline components to disable. List[str] |
component_cfg | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to None . Optional[Dict[str, Dict[str, Any]]] |
RETURNS | A container for accessing the annotations. Doc |
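For illustration, the sketch below applies a blank English pipeline with the rule-based `sentencizer` to a text, once normally and once with the component disabled for that call. A blank pipeline is assumed so that no trained package is required.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

text = "This is a sentence. Here is another one."
doc = nlp(text)
sentences = [sent.text for sent in doc.sents]

# Skip the sentencizer for this call only.
doc_no_sents = nlp(text, disable=["sentencizer"])
print(sentences, doc_no_sents.has_annotation("SENT_START"))
```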
Language.pipe method
Process texts as a stream, and yield Doc
objects in order. This is usually
more efficient than processing texts one-by-one.
Instead of text, a Doc
object can be passed as input. In this case
tokenization is skipped but the rest of the pipeline is run.
Name | Description |
---|---|
texts | A sequence of strings (or Doc objects). Iterable[Union[str,Doc]] |
keyword-only | |
as_tuples | If set to True , inputs should be a sequence of (text, context) tuples. Output will then be a sequence of (doc, context) tuples. Defaults to False . bool |
batch_size | The number of texts to buffer. Optional[int] |
disable | Names of pipeline components to disable. List[str] |
component_cfg | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to None . Optional[Dict[str, Dict[str, Any]]] |
n_process | Number of processors to use. Defaults to 1 . int |
YIELDS | Documents in the order of the original text. Doc |
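A short sketch of streaming texts through the pipeline, with and without `as_tuples`. The context dictionaries and their `id` key are invented for the example.

```python
import spacy

nlp = spacy.blank("en")

texts = ["One document.", "Another document.", "A third one."]
# Docs are yielded in the order of the input texts.
docs = list(nlp.pipe(texts, batch_size=2))

# With as_tuples=True, each input is a (text, context) pair and each
# output is a (doc, context) pair.
data = [("Text A", {"id": 1}), ("Text B", {"id": 2})]
pairs = [(doc.text, ctx["id"]) for doc, ctx in nlp.pipe(data, as_tuples=True)]
print([doc.text for doc in docs], pairs)
```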
Language.set_error_handler method v3.0
Define a callback that will be invoked when an error is thrown during processing
of one or more documents. Specifically, this function will call
set_error_handler
on all the pipeline
components that define that function. The error handler will be invoked with the
original component’s name, the component itself, the list of documents that was
being processed, and the original error.
Name | Description |
---|---|
error_handler | A function that performs custom error handling. Callable[[str, Callable[[Doc],Doc], List[Doc], Exception], NoReturn] |
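As a sketch of how a custom handler might be used, the example below registers a deliberately failing component (the name `fail_on_short` is hypothetical) and a handler that records the error instead of raising it. With this handler, documents that fail are dropped from the stream returned by `nlp.pipe`.

```python
import spacy
from spacy.language import Language


@Language.component("fail_on_short")  # hypothetical component name
def fail_on_short(doc):
    if len(doc) < 2:
        raise ValueError("too short")
    return doc


seen = []


def log_error(proc_name, proc, docs, e):
    # Record the failure instead of re-raising it.
    seen.append((proc_name, [d.text for d in docs], str(e)))


nlp = spacy.blank("en")
nlp.add_pipe("fail_on_short")
nlp.set_error_handler(log_error)
docs = list(nlp.pipe(["Hi", "Hello there"]))
print([doc.text for doc in docs], seen)
```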
Language.initialize method v3.0
Initialize the pipeline for training and return an
Optimizer
. Under the hood, it uses the
settings defined in the [initialize]
config block to set up the vocabulary, load in vectors and tok2vec weights and
pass optional arguments to the initialize
methods implemented by pipeline
components or the tokenizer. This method is typically called automatically when
you run spacy train
. See the usage guide on the
config lifecycle and
initialization for details.
get_examples
should be a function that returns an iterable of
Example
objects. The data examples can either be the full
training data or a representative sample. They are used to initialize the
models of trainable pipeline components and are passed to each component’s
initialize
method, if available. Initialization
includes validating the network,
inferring missing shapes
and setting up the label scheme based on the data.
If no get_examples
function is provided when calling nlp.initialize
, the
pipeline components will be initialized with generic data. In this case, it is
crucial that the output dimension of each component has already been defined
either in the config, or by calling
pipe.add_label
for each possible output label (e.g. for
the tagger or textcat).
Name | Description |
---|---|
get_examples | Optional function that returns gold-standard annotations in the form of Example objects. Optional[Callable[[], Iterable[Example]]] |
keyword-only | |
sgd | An optimizer. Will be created via create_optimizer if not set. Optional[Optimizer] |
RETURNS | The optimizer. Optimizer |
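A minimal sketch of initializing a trainable component from examples, assuming a blank pipeline with a text classifier and two toy training texts. The labels `POS` and `NEG` are made up, and the label scheme is inferred from the data.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
nlp.add_pipe("textcat")

# Toy data for illustration only.
train_data = [
    ("I love this", {"cats": {"POS": 1.0, "NEG": 0.0}}),
    ("I hate this", {"cats": {"POS": 0.0, "NEG": 1.0}}),
]
examples = [Example.from_dict(nlp.make_doc(text), annots) for text, annots in train_data]

# get_examples is a function returning an iterable of Example objects.
optimizer = nlp.initialize(lambda: examples)
print(nlp.get_pipe("textcat").labels)
```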
Language.resume_training method experimental v3.0
Continue training a trained pipeline. Create and return an optimizer, and
initialize “rehearsal” for any pipeline component that has a rehearse
method.
Rehearsal is used to prevent models from “forgetting” their initialized
“knowledge”. To perform rehearsal, collect samples of text you want the models
to retain performance on, and call nlp.rehearse
with
a batch of Example objects.
Name | Description |
---|---|
keyword-only | |
sgd | An optimizer. Will be created via create_optimizer if not set. Optional[Optimizer] |
RETURNS | The optimizer. Optimizer |
Language.update method
Update the models in the pipeline.
Name | Description |
---|---|
examples | A batch of Example objects to learn from. Iterable[Example] |
keyword-only | |
drop | The dropout rate. float |
sgd | An optimizer. Will be created via create_optimizer if not set. Optional[Optimizer] |
losses | Dictionary to update with the loss, keyed by pipeline component. Optional[Dict[str, float]] |
component_cfg | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to None . Optional[Dict[str, Dict[str, Any]]] |
RETURNS | The updated losses dictionary. Dict[str, float] |
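The sketch below shows one plausible training loop around `nlp.update`, assuming the same kind of toy text classification data as a stand-in for real training data. The number of epochs and the dropout rate are arbitrary.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

nlp = spacy.blank("en")
nlp.add_pipe("textcat")
train_data = [
    ("I love this", {"cats": {"POS": 1.0, "NEG": 0.0}}),
    ("I hate this", {"cats": {"POS": 0.0, "NEG": 1.0}}),
]
examples = [Example.from_dict(nlp.make_doc(text), annots) for text, annots in train_data]
optimizer = nlp.initialize(lambda: examples)

losses = {}
for _ in range(3):  # arbitrary number of epochs
    random.shuffle(examples)
    for batch in minibatch(examples, size=2):
        # The losses dict is updated in place, keyed by component name.
        nlp.update(batch, drop=0.2, sgd=optimizer, losses=losses)
print(losses)
```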
Language.rehearse method experimental v3.0
Perform a “rehearsal” update from a batch of data. Rehearsal updates teach the current model to make predictions similar to an initial model, to try to address the “catastrophic forgetting” problem. This feature is experimental.
Name | Description |
---|---|
examples | A batch of Example objects to learn from. Iterable[Example] |
keyword-only | |
drop | The dropout rate. float |
sgd | An optimizer. Will be created via create_optimizer if not set. Optional[Optimizer] |
losses | Dictionary to update with the loss, keyed by pipeline component. Optional[Dict[str, float]] |
RETURNS | The updated losses dictionary. Dict[str, float] |
Language.evaluate method
Evaluate a pipeline’s components.
Name | Description |
---|---|
examples | A batch of Example objects to evaluate the pipeline on. Iterable[Example] |
keyword-only | |
batch_size | The batch size to use. Optional[int] |
scorer | Optional Scorer to use. If not passed in, a new one will be created. Optional[Scorer] |
component_cfg | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to None . Optional[Dict[str, Dict[str, Any]]] |
scorer_cfg | Optional dictionary of keyword arguments for the Scorer . Defaults to None . Optional[Dict[str, Any]] |
per_component v3.6 | Whether to return the scores keyed by component name. Defaults to False . bool |
RETURNS | A dictionary of evaluation scores. Dict[str, Union[float, Dict[str, float]]] |
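A small sketch of evaluation, assuming a rule-based `sentencizer` and a single hand-annotated example. The gold-standard sentence boundaries are given per token via `sent_starts`.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

text = "This is one. This is two."
# One value per token: True marks the first token of a sentence.
sent_starts = [True, False, False, False, True, False, False, False]
example = Example.from_dict(nlp.make_doc(text), {"sent_starts": sent_starts})

scores = nlp.evaluate([example])
print(scores["sents_f"])
```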
Language.use_params contextmanager method
Replace weights of models in the pipeline with those provided in the params dictionary. Can be used as a context manager, in which case, models go back to their original weights after the block.
Name | Description |
---|---|
params | A dictionary of parameters keyed by model ID. dict |
Language.add_pipe method
Add a component to the processing pipeline. Expects a name that maps to a
component factory registered using
@Language.component
or
@Language.factory
. Components should be callables
that take a Doc
object, modify it and return it. Only one of before
,
after
, first
or last
can be set. Default behavior is last=True
.
Name | Description |
---|---|
factory_name | Name of the registered component factory. str |
name | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. Optional[str] |
keyword-only | |
before | Component name or index to insert component directly before. Optional[Union[str, int]] |
after | Component name or index to insert component directly after. Optional[Union[str, int]] |
first | Insert component first / not first in the pipeline. Optional[bool] |
last | Insert component last / not last in the pipeline. Optional[bool] |
config v3.0 | Optional config parameters to use for this component. Will be merged with the default_config specified by the component factory. Dict[str, Any] |
source v3.0 | Optional source pipeline to copy component from. If a source is provided, the factory_name is interpreted as the name of the component in the source pipeline. Make sure that the vocab, vectors and settings of the source pipeline match the target pipeline. Optional[Language] |
validate v3.0 | Whether to validate the component config and arguments against the types expected by the factory. Defaults to True . bool |
RETURNS | The pipeline component. Callable[[Doc],Doc] |
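To illustrate the positioning arguments, the sketch below adds two built-in components to a blank pipeline and inserts the second one before the first. The pattern and example text are invented.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # default: last=True

# add_pipe returns the component, so it can be configured further.
ruler = nlp.add_pipe("entity_ruler", before="sentencizer")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion"}])

doc = nlp("Explosion makes spaCy. It is open source.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(nlp.pipe_names, entities)
```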
Language.create_pipe method
Create a pipeline component from a factory.
Name | Description |
---|---|
factory_name | Name of the registered component factory. str |
name | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. Optional[str] |
keyword-only | |
config v3.0 | Optional config parameters to use for this component. Will be merged with the default_config specified by the component factory. Dict[str, Any] |
validate v3.0 | Whether to validate the component config and arguments against the types expected by the factory. Defaults to True . bool |
RETURNS | The pipeline component. Callable[[Doc],Doc] |
Language.has_factory classmethod v3.0
Check whether a factory name is registered on the Language
class or subclass.
Will check for
language-specific factories
registered on the subclass, as well as general-purpose factories registered on
the Language
base class, available to all subclasses.
Name | Description |
---|---|
name | Name of the pipeline factory to check. str |
RETURNS | Whether a factory of that name is registered on the class. bool |
Language.has_pipe method
Check whether a component is present in the pipeline. Equivalent to
name in nlp.pipe_names
.
Name | Description |
---|---|
name | Name of the pipeline component to check. str |
RETURNS | Whether a component of that name exists in the pipeline. bool |
Language.get_pipe method
Get a pipeline component for a given component name.
Name | Description |
---|---|
name | Name of the pipeline component to get. str |
RETURNS | The pipeline component. Callable[[Doc],Doc] |
Language.replace_pipe method
Replace a component in the pipeline and return the new component.
Name | Description |
---|---|
name | Name of the component to replace. str |
component | The factory name of the component to insert. str |
keyword-only | |
config v3.0 | Optional config parameters to use for the new component. Will be merged with the default_config specified by the component factory. Optional[Dict[str, Any]] |
validate v3.0 | Whether to validate the component config and arguments against the types expected by the factory. Defaults to True . bool |
RETURNS | The new pipeline component. Callable[[Doc],Doc] |
Language.rename_pipe method
Rename a component in the pipeline. Useful to create custom names for
pre-defined and pre-loaded components. To change the default name of a component
added to the pipeline, you can also use the name
argument on
add_pipe
.
Name | Description |
---|---|
old_name | Name of the component to rename. str |
new_name | New name of the component. str |
Language.remove_pipe method
Remove a component from the pipeline. Returns the removed component name and component function.
Name | Description |
---|---|
name | Name of the component to remove. str |
RETURNS | A (name, component) tuple of the removed component. Tuple[str, Callable[[Doc],Doc]] |
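The pipeline management methods above can be combined as in the following sketch. The instance name `splitter` is arbitrary, and the `punct_chars` setting is used only to show that a config can be passed when replacing a component.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
has_before = nlp.has_pipe("sentencizer")

# Give the component a custom name.
nlp.rename_pipe("sentencizer", "splitter")
names_after_rename = list(nlp.pipe_names)

# Replace it with a freshly created component from the same factory.
new_component = nlp.replace_pipe("splitter", "sentencizer", config={"punct_chars": ["!"]})
same_object = nlp.get_pipe("splitter") is new_component

# Remove it again; the name and the component are returned.
removed_name, removed_component = nlp.remove_pipe("splitter")
print(names_after_rename, removed_name, nlp.pipe_names)
```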
Language.disable_pipe method v3.0
Temporarily disable a pipeline component so it’s not run as part of the
pipeline. Disabled components are listed in
nlp.disabled
and included in
nlp.components
, but not in
nlp.pipeline
, so they’re not run when you process a
Doc
with the nlp
object. If the component is already disabled, this method
does nothing.
Name | Description |
---|---|
name | Name of the component to disable. str |
Language.enable_pipe method v3.0
Enable a previously disabled component (e.g. via
Language.disable_pipe
) so it’s run as part of
the pipeline, nlp.pipeline
. If the component is
already enabled, this method does nothing.
Name | Description |
---|---|
name | Name of the component to enable. str |
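A brief sketch of disabling and re-enabling a component, showing how the different attributes reflect its state:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

nlp.disable_pipe("sentencizer")
# Disabled components stay in component_names but leave pipe_names.
state_disabled = (list(nlp.pipe_names), list(nlp.disabled), list(nlp.component_names))

nlp.enable_pipe("sentencizer")
state_enabled = (list(nlp.pipe_names), list(nlp.disabled))
print(state_disabled, state_enabled)
```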
Language.select_pipes contextmanager method v3.0
Disable one or more pipeline components. If used as a context manager, the
pipeline will be restored to the initial state at the end of the block.
Otherwise, a DisabledPipes
object is returned, that has a .restore()
method
you can use to undo your changes. You can specify either disable
(as a list or
string), or enable
. In the latter case, all components not in the enable
list will be disabled. Under the hood, this method calls into
disable_pipe
and
enable_pipe
.
Name | Description |
---|---|
keyword-only | |
disable | Name(s) of pipeline component(s) to disable. Optional[Union[str, Iterable[str]]] |
enable | Name(s) of pipeline component(s) that will not be disabled. Optional[Union[str, Iterable[str]]] |
RETURNS | The disabled pipes that can be restored by calling the object’s .restore() method. DisabledPipes |
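Both usage styles described above might look like this, assuming a blank pipeline with two built-in components:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe("entity_ruler")

# As a context manager: everything except "sentencizer" is disabled in the block.
with nlp.select_pipes(enable="sentencizer"):
    inside = list(nlp.pipe_names)
after_block = list(nlp.pipe_names)

# Without a context manager: restore manually.
disabled = nlp.select_pipes(disable=["entity_ruler"])
while_disabled = list(nlp.pipe_names)
disabled.restore()
print(inside, after_block, while_disabled, nlp.pipe_names)
```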
Language.get_factory_meta classmethod v3.0
Get the factory meta information for a given pipeline component name. Expects
the name of the component factory. The factory meta is an instance of the
FactoryMeta
dataclass and contains the
information about the component and its defaults provided by the
@Language.component
or
@Language.factory
decorator.
Name | Description |
---|---|
name | The factory name. str |
RETURNS | The factory meta. FactoryMeta |
Language.get_pipe_meta method v3.0
Get the factory meta information for a given pipeline component name. Expects
the name of the component instance in the pipeline. The factory meta is an
instance of the FactoryMeta
dataclass and
contains the information about the component and its defaults provided by the
@Language.component
or
@Language.factory
decorator.
Name | Description |
---|---|
name | The pipeline component name. str |
RETURNS | The factory meta. FactoryMeta |
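The difference between the two lookups is the name they expect: `get_factory_meta` takes the factory name, `get_pipe_meta` takes the name of the component instance in the pipeline. A small sketch, with the instance name `my_sentencizer` chosen for illustration:

```python
import spacy
from spacy.language import Language

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer", name="my_sentencizer")

# Look up by factory name (classmethod).
factory_meta = Language.get_factory_meta("sentencizer")

# Look up by the instance name used in this pipeline.
pipe_meta = nlp.get_pipe_meta("my_sentencizer")
print(factory_meta.factory, pipe_meta.factory, factory_meta.assigns)
```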
Language.analyze_pipes method v3.0
Analyze the current pipeline components and show a summary of the attributes
they assign and require, and the scores they set. The data is based on the
information provided in the @Language.component
and
@Language.factory
decorator. If requirements aren’t
met, e.g. if a component specifies a required property that is not set by a
previous component, a warning is shown.
Name | Description |
---|---|
keyword-only | |
keys | The values to display in the table. Corresponds to attributes of the FactoryMeta . Defaults to ["assigns", "requires", "scores", "retokenizes"] . List[str] |
pretty | Pretty-print the results as a table. Defaults to False . bool |
RETURNS | Dictionary containing the pipe analysis, keyed by "summary" (component meta by pipe), "problems" (attribute names by pipe) and "attrs" (pipes that assign and require an attribute, keyed by attribute). Optional[Dict[str, Any]] |
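As a sketch of the structured output, the example below registers a component (hypothetically named `needs_pos`) that declares a requirement no earlier component fulfils, so the attribute shows up under `"problems"`.

```python
import spacy
from spacy.language import Language


@Language.component("needs_pos", requires=["token.pos"])  # hypothetical component
def needs_pos(doc):
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("needs_pos")

# pretty=False (the default) returns the analysis without printing a table.
analysis = nlp.analyze_pipes()
print(analysis["problems"])
```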
Language.replace_listeners method v3.0
Find listener layers
(connecting to a shared token-to-vector embedding component) of a given pipeline
component model and replace them with a standalone copy of the token-to-vector
layer. The listener layer allows other components to connect to a shared
token-to-vector embedding component like Tok2Vec
or
Transformer
. Replacing listeners can be useful when
training a pipeline with components sourced from an existing pipeline: if
multiple components (e.g. tagger, parser, NER) listen to the same
token-to-vector component, but some of them are frozen and not updated, their
performance may degrade significantly as the token-to-vector component is updated
with new data. To prevent this, listeners can be replaced with a standalone
token-to-vector layer that is owned by the component and doesn’t change if the
component isn’t updated.
This method is typically not called directly and only executed under the hood
when loading a config with
sourced components that define
replace_listeners
.
Name | Description |
---|---|
tok2vec_name | Name of the token-to-vector component, typically "tok2vec" or "transformer" . str |
pipe_name | Name of pipeline component to replace listeners for. str |
listeners | The paths to the listeners, relative to the component config, e.g. ["model.tok2vec"] . Typically, implementations will only connect to one tok2vec component, model.tok2vec , but in theory, custom models can use multiple listeners. The value here can either be an empty list to not replace any listeners, or a complete list of the paths to all listener layers used by the model that should be replaced. Iterable[str] |
Language.memory_zone contextmanager v3.8
Begin a block where all resources allocated during the block will be freed at the end of it. If a resource was created within the memory zone block, accessing it outside the block is invalid. Behavior of this invalid access is undefined. Memory zones should not be nested. The memory zone is helpful for services that need to process large volumes of text with a defined memory budget.
Name | Description |
---|---|
mem | Optional cymem.Pool object to own allocations (created if not provided). This argument is not required for ordinary usage. Defaults to None . Optional[cymem.Pool] |
RETURNS | The memory pool that owns the allocations. This object is not required for ordinary usage. Iterator[cymem.Pool] |
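A minimal sketch of the intended usage pattern, assuming spaCy v3.8 or later: process texts inside the zone and copy out only plain Python data, since Doc and Token objects created inside the block must not be accessed afterwards.

```python
from collections import Counter

import spacy

nlp = spacy.blank("en")
texts = ["the cat sat", "the dog sat"]

counts = Counter()
with nlp.memory_zone():
    for doc in nlp.pipe(texts):
        # Keep plain strings only; do not hold on to Doc or Token objects.
        counts.update(token.text for token in doc)
print(counts)
```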
Language.meta property
Meta data for the Language
class, including name, version, data sources,
license, author information and more. If a trained pipeline is loaded, this
contains meta data of the pipeline. The Language.meta
is also what’s
serialized as the meta.json
when you save an nlp
object to disk. See the
meta data format for more details.
Name | Description |
---|---|
RETURNS | The meta data. Dict[str, Any] |
Language.config property v3.0
Export a trainable config.cfg
for the current
nlp
object. Includes the current pipeline, all configs used to create the
currently active pipeline components, as well as the default training config
that can be used with spacy train
. Language.config
returns
a Thinc Config
object, which is a
subclass of the built-in dict
. It supports the additional methods to_disk
(serialize the config to a file) and to_str
(output the config as a string).
Name | Description |
---|---|
RETURNS | The config. Config |
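For example, the config of a small blank pipeline can be inspected as a dict or exported as a string. This is a sketch; the exact contents of the exported config depend on the installed spaCy version.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# The config behaves like a nested dict.
pipeline = nlp.config["nlp"]["pipeline"]

# Export it as a string in config.cfg format.
config_str = nlp.config.to_str()
print(pipeline)
```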
Language.to_disk method
Save the current state to a directory. Under the hood, this method delegates to
the to_disk
methods of the individual pipeline components, if available. This
means that if a trained pipeline is loaded, all components and their weights
will be saved to disk.
Name | Description |
---|---|
path | A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path -like objects. Union[str,Path] |
keyword-only | |
exclude | Names of pipeline components or serialization fields to exclude. Iterable[str] |
Language.from_disk method
Loads state from a directory, including all data that was saved with the
Language
object. Modifies the object in place and returns it.
Name | Description |
---|---|
path | A path to a directory. Paths may be either strings or Path -like objects. Union[str,Path] |
keyword-only | |
exclude | Names of pipeline components or serialization fields to exclude. Iterable[str] |
RETURNS | The modified Language object. Language |
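A sketch of a save and reload round trip using a temporary directory. Two reload routes are shown: `spacy.load`, which rebuilds the pipeline from the saved config, and `from_disk` on an object whose pipeline has already been set up to match.

```python
import tempfile

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

with tempfile.TemporaryDirectory() as tmp_dir:
    nlp.to_disk(tmp_dir)

    # Route 1: let spaCy rebuild the pipeline from the saved config.cfg.
    reloaded = spacy.load(tmp_dir)

    # Route 2: set up a matching pipeline, then load the state in place.
    nlp2 = spacy.blank("en")
    nlp2.add_pipe("sentencizer")
    nlp2.from_disk(tmp_dir)
print(reloaded.pipe_names, nlp2.pipe_names)
```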
Language.to_bytes method
Serialize the current state to a binary string.
Name | Description |
---|---|
keyword-only | |
exclude | Names of pipeline components or serialization fields to exclude. Iterable[str] |
RETURNS | The serialized form of the Language object. bytes |
Language.from_bytes method
Load state from a binary string. Note that this method is commonly used via the
subclasses like English
or German
to make language-specific functionality
like the lexical attribute getters
available to the loaded object.
Note that if you want to serialize and reload a whole pipeline, using this method alone won’t work: you also need to handle the config. See “Serializing the pipeline” for details.
Name | Description |
---|---|
bytes_data | The data to load from. bytes |
keyword-only | |
exclude | Names of pipeline components or serialization fields to exclude. Iterable[str] |
RETURNS | The Language object. Language |
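One way to handle the config alongside the bytes, sketched here under the assumption that both are kept in memory: recreate the pipeline from the config using the matching language class, then load the binary data into it.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Serialize both the config and the binary state.
config = nlp.config
bytes_data = nlp.to_bytes()

# Recreate the pipeline structure from the config, then restore the state.
lang_cls = spacy.util.get_lang_class(config["nlp"]["lang"])
nlp2 = lang_cls.from_config(config)
nlp2.from_bytes(bytes_data)
print(nlp2.lang, nlp2.pipe_names)
```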
Attributes
Name | Description |
---|---|
vocab | A container for the lexical types. Vocab |
tokenizer | The tokenizer. Tokenizer |
make_doc | Callable that takes a string and returns a Doc . Callable[[str],Doc] |
pipeline | List of (name, component) tuples describing the current processing pipeline, in order. List[Tuple[str, Callable[[Doc],Doc]]] |
pipe_names | List of pipeline component names, in order. List[str] |
pipe_labels | List of labels set by the pipeline components, if available, keyed by component name. Dict[str, List[str]] |
pipe_factories | Dictionary of pipeline component names, mapped to their factory names. Dict[str, str] |
factories | All available factory functions, keyed by name. Dict[str, Callable[[…], Callable[[Doc],Doc]]] |
factory_names v3.0 | List of all available factory names. List[str] |
components v3.0 | List of all available (name, component) tuples, including components that are currently disabled. List[Tuple[str, Callable[[Doc],Doc]]] |
component_names v3.0 | List of all available component names, including components that are currently disabled. List[str] |
disabled v3.0 | Names of components that are currently disabled and don’t run as part of the pipeline. List[str] |
path | Path to the pipeline data directory, if a pipeline is loaded from a path or package. Otherwise None . Optional[Path] |
Class attributes
Name | Description |
---|---|
Defaults | Settings, data and factory methods for creating the nlp object and processing pipeline. Defaults |
lang | IETF language tag, such as ‘en’ for English. str |
default_config | Base config to use for Language.config. Defaults to default_config.cfg . Config |
Defaults
The following attributes can be set on the Language.Defaults
class to
customize the default language data:
Name | Description |
---|---|
stop_words | List of stop words, used for Token.is_stop . Example: stop_words.py Set[str] |
tokenizer_exceptions | Tokenizer exception rules, string mapped to list of token attributes. Example: de/tokenizer_exceptions.py Dict[str, List[dict]] |
prefixes , suffixes , infixes | Prefix, suffix and infix rules for the default tokenizer. Example: punctuation.py Optional[Sequence[Union[str,Pattern]]] |
token_match | Optional regex for matching strings that should never be split, overriding the infix rules. Example: fr/tokenizer_exceptions.py Optional[Callable] |
url_match | Regular expression for matching URLs. Prefixes and suffixes are removed before applying the match. Example: tokenizer_exceptions.py Optional[Callable] |
lex_attr_getters | Custom functions for setting lexical attributes on tokens, e.g. like_num . Example: lex_attrs.py Dict[int, Callable[[str], Any]] |
syntax_iterators | Functions that compute views of a Doc object based on its syntax. At the moment, only used for noun chunks. Example: syntax_iterators.py . Dict[str, Callable[[Union[Doc,Span]], Iterator[Span]]] |
writing_system | Information about the language’s writing system, available via Vocab.writing_system . Defaults to: {"direction": "ltr", "has_case": True, "has_letters": True} . Example: zh/__init__.py Dict[str, Any] |
config | Default config added to nlp.config . This can include references to custom tokenizers or lemmatizers. Example: zh/__init__.py Config |
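As a sketch of customizing the defaults, the example below subclasses `English` and overrides `stop_words`. The class names, the language code `custom_en` and the stop word list are all made up for illustration.

```python
from spacy.lang.en import English


class CustomEnglishDefaults(English.Defaults):
    # Replace the default stop word list with a custom one.
    stop_words = {"custom", "stop"}


class CustomEnglish(English):
    lang = "custom_en"  # hypothetical language code
    Defaults = CustomEnglishDefaults


nlp = CustomEnglish()
flags = [token.is_stop for token in nlp("custom stop")]
print(nlp.lang, flags)
```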
Serialization fields
During serialization, spaCy will export several data fields used to restore
different aspects of the object. If needed, you can exclude them from
serialization by passing in the string names via the exclude
argument.
Name | Description |
---|---|
vocab | The shared Vocab . |
tokenizer | Tokenization rules and exceptions. |
meta | The meta data, available as Language.meta . |
… | String names of pipeline components, e.g. "ner" . |
FactoryMeta dataclass v3.0
The FactoryMeta
contains the information about the component and its defaults
provided by the @Language.component
or
@Language.factory
decorator. It’s created whenever a
component is defined and stored on the Language
class for each component
instance and factory instance.
Name | Description |
---|---|
factory | The name of the registered component factory. str |
default_config | The default config, describing the default values of the factory arguments. Dict[str, Any] |
assigns | Doc or Token attributes assigned by this component, e.g. ["token.ent_id"] . Used for pipe analysis. Iterable[str] |
requires | Doc or Token attributes required by this component, e.g. ["token.ent_id"] . Used for pipe analysis. Iterable[str] |
retokenizes | Whether the component changes tokenization. Used for pipe analysis. bool |
default_score_weights | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to 1.0 per component and will be combined and normalized for the whole pipeline. If a weight is set to None , the score will not be logged or weighted. Dict[str, Optional[float]] |
scores | All scores set by the components if it’s trainable, e.g. ["ents_f", "ents_r", "ents_p"] . Based on the default_score_weights and used for pipe analysis. Iterable[str] |