Pipe
class
New in spaCy v2.0.
Abstract base class defining the API for pipeline components.

This class is not instantiated directly. Components inherit from it, and it defines the interface they must follow to work in a spaCy processing pipeline.
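As a rough illustration of the inheritance pattern, here is a heavily simplified sketch in plain Python. `BasePipe` and `UppercaseFlagger` are hypothetical names, docs are plain dicts, and the base methods raising `NotImplementedError` is an assumption of this sketch rather than a description of spaCy's code.

```python
class BasePipe:
    """Hypothetical stand-in for an abstract pipeline-component base class."""

    name = "base"

    def predict(self, docs):
        # Subclasses must compute scores for a batch of docs.
        raise NotImplementedError

    def set_annotations(self, docs, scores):
        # Subclasses must write pre-computed scores onto the docs.
        raise NotImplementedError


class UppercaseFlagger(BasePipe):
    """Toy component that flags whether each word is uppercase."""

    name = "uppercase_flagger"

    def predict(self, docs):
        # One list of booleans per doc; the docs themselves are untouched.
        return [[word.isupper() for word in doc["words"]] for doc in docs]

    def set_annotations(self, docs, scores):
        # Attach the pre-computed flags to each doc.
        for doc, flags in zip(docs, scores):
            doc["is_upper"] = flags


flagger = UppercaseFlagger()
docs = [{"words": ["NASA", "launched", "it"]}]
flagger.set_annotations(docs, flagger.predict(docs))
print(docs[0]["is_upper"])  # [True, False, False]
```

The subclass only fills in the methods the base class leaves abstract; everything else about how it is called stays the same for every component.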

Pipe.Model
classmethod

Initialise a model for the pipe. The model should implement the thinc.neural.Model API. Wrappers are under development for most major machine learning libraries.

Name | Type | Description
--- | --- | ---
**kwargs | - | Parameters for initialising the model.
returns | object | The initialised model.

Pipe.__init__
method

Create a new pipeline component instance.

Name | Type | Description
--- | --- | ---
vocab | Vocab | The shared vocabulary.
model | thinc.neural.Model or True | The model powering the pipeline component. If no model is supplied, the model is created when you call begin_training, from_disk or from_bytes.
**cfg | - | Configuration parameters.
returns | Pipe | The newly constructed object.
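The interplay between the Model classmethod and a deferred model can be sketched as follows. `ToyModel` and `LazyPipe` are hypothetical; `True` is used as a placeholder meaning "build the model later", and only the lazy-creation step of begin_training is shown, with optimizer handling left out of this sketch.

```python
class ToyModel:
    """Hypothetical model object standing in for a thinc.neural.Model."""

    def __init__(self, width=8):
        self.width = width


class LazyPipe:
    """Hypothetical component illustrating deferred model creation."""

    def __init__(self, vocab, model=True, **cfg):
        self.vocab = vocab
        self.model = model  # True is a placeholder: no model built yet
        self.cfg = dict(cfg)

    @classmethod
    def Model(cls, **cfg):
        # Factory for the underlying model, configured from keyword args.
        return ToyModel(**cfg)

    def begin_training(self, **kwargs):
        # Simplified: only the lazy model creation is shown here.
        if self.model is True:
            self.model = self.Model(**self.cfg)


pipe = LazyPipe(vocab=None, width=16)
print(pipe.model)        # True
pipe.begin_training()
print(pipe.model.width)  # 16
```

Deferring construction this way lets the configuration be stored at init time while the (possibly expensive) model is only built once it is actually needed.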

Pipe.__call__
method

Apply the pipe to one document. The document is modified in place, and returned. Both Pipe.__call__ and Pipe.pipe should delegate to the Pipe.predict and Pipe.set_annotations methods.

Name | Type | Description
--- | --- | ---
doc | Doc | The document to process.
returns | Doc | The processed document.
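A minimal sketch of the delegation described above, assuming a hypothetical `WordCounter` component and dict-based docs: `__call__` wraps the single document in a batch of one, runs predict and set_annotations, and returns the same object.

```python
class WordCounter:
    """Hypothetical component whose __call__ reuses the batch methods."""

    def predict(self, docs):
        # Scores only; the docs are not modified here.
        return [len(doc["text"].split()) for doc in docs]

    def set_annotations(self, docs, scores):
        # Write the pre-computed word counts onto the docs.
        for doc, n_words in zip(docs, scores):
            doc["n_words"] = n_words

    def __call__(self, doc):
        # Treat the single doc as a batch of one.
        scores = self.predict([doc])
        self.set_annotations([doc], scores)
        return doc  # the same object, modified in place


doc = {"text": "Apply the pipe to one document"}
result = WordCounter()(doc)
print(result is doc, doc["n_words"])  # True 6
```

Routing the single-document path through the batch methods means the component's logic lives in one place.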

Pipe.pipe
method

Apply the pipe to a stream of documents. Both Pipe.__call__ and Pipe.pipe should delegate to the Pipe.predict and Pipe.set_annotations methods.

Name | Type | Description
--- | --- | ---
stream | iterable | A stream of documents.
batch_size | int | The number of documents to buffer. Defaults to 128.
n_threads | int | The number of worker threads to use. If -1, OpenMP will decide how many to use at run time. Defaults to -1.
yields | Doc | Processed documents, in the same order as the input stream.
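The batching behaviour can be sketched with a small local `minibatch` helper (a hypothetical name defined here, not imported from spaCy). The stream is consumed in buffers of `batch_size`, each buffer goes through predict and set_annotations, and documents are yielded in input order. `n_threads` is accepted only for signature compatibility and is ignored in this sketch.

```python
from itertools import islice


def minibatch(items, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


class WordCounter:
    """Hypothetical component with a batched pipe() method."""

    def predict(self, docs):
        return [len(doc["text"].split()) for doc in docs]

    def set_annotations(self, docs, scores):
        for doc, n_words in zip(docs, scores):
            doc["n_words"] = n_words

    def pipe(self, stream, batch_size=128, n_threads=-1):
        # n_threads is accepted but unused in this sketch.
        for docs in minibatch(stream, batch_size):
            scores = self.predict(docs)
            self.set_annotations(docs, scores)
            # Yield one doc at a time, preserving input order.
            yield from docs


texts = ["one", "two words", "now three words"]
stream = ({"text": t} for t in texts)
counts = [doc["n_words"] for doc in WordCounter().pipe(stream, batch_size=2)]
print(counts)  # [1, 2, 3]
```

Because `pipe` is a generator, documents are only read from the stream as fast as the consumer asks for them.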

Pipe.predict
method

Apply the pipe's model to a batch of docs without modifying them.

Name | Type | Description
--- | --- | ---
docs | iterable | The documents to predict.
returns | - | Scores from the model.

Pipe.set_annotations
method

Modify a batch of documents using pre-computed scores.

Name | Type | Description
--- | --- | ---
docs | iterable | The documents to modify.
scores | - | The scores to set, produced by Pipe.predict.
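The split between predict and set_annotations means scores can be computed first and applied later. The sketch below uses a hypothetical rule-based `SuffixTagger` (tagging words ending in "ed" as "VERB") purely to make that separation visible; it is not how any real spaCy tagger works.

```python
import copy


class SuffixTagger:
    """Hypothetical tagger: guesses 'VERB' for words ending in 'ed'."""

    def predict(self, docs):
        # Pure function of the input: one tag list per doc.
        return [["VERB" if w.endswith("ed") else "X" for w in doc["words"]]
                for doc in docs]

    def set_annotations(self, docs, scores):
        # Apply the pre-computed tags; no model call happens here.
        for doc, tags in zip(docs, scores):
            doc["tags"] = tags


docs = [{"words": ["she", "walked"]}, {"words": ["it", "rained"]}]
before = copy.deepcopy(docs)
tagger = SuffixTagger()
scores = tagger.predict(docs)
assert docs == before  # predict() left the docs untouched
tagger.set_annotations(docs, scores)
print([d["tags"] for d in docs])  # [['X', 'VERB'], ['X', 'VERB']]
```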

Pipe.update
method

Learn from a batch of documents and gold-standard information, updating the pipe's model. Delegates to Pipe.predict and Pipe.get_loss.

Name | Type | Description
--- | --- | ---
docs | iterable | A batch of documents to learn from.
golds | iterable | The gold-standard data. Must have the same length as docs.
drop | float | The dropout rate.
sgd | callable | The optimizer. Should take two arguments, weights and gradient, and an optional ID.
losses | dict | Optional record of the loss during training. The value keyed by the component's name is updated.
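To show how update ties predict, get_loss and the optimizer together, here is a one-weight regression sketch. `ToyRegressor` and `plain_sgd` are hypothetical; the weights live in a list so the optimizer can change them in place, the optional ID is passed as a `key` keyword (an assumption of this sketch), and `drop` is accepted but no dropout is applied.

```python
class ToyRegressor:
    """Hypothetical one-weight component illustrating update()."""

    name = "toy_regressor"

    def __init__(self):
        self.weights = [0.0]  # a list so the optimizer can update it in place

    def predict(self, docs):
        return [self.weights[0] * doc["x"] for doc in docs]

    def get_loss(self, docs, golds, scores):
        # Squared-error loss; the gradient w.r.t. the scores is (score - gold).
        d_scores = [s - g for s, g in zip(scores, golds)]
        loss = sum(d * d for d in d_scores) / 2
        return loss, d_scores

    def update(self, docs, golds, drop=0.0, sgd=None, losses=None):
        # drop is accepted but dropout is not applied in this sketch.
        scores = self.predict(docs)
        loss, d_scores = self.get_loss(docs, golds, scores)
        # Chain rule: d(loss)/dw = sum(d_score * x).
        gradient = [sum(d * doc["x"] for d, doc in zip(d_scores, docs))]
        if sgd is not None:
            sgd(self.weights, gradient, key=self.name)
        if losses is not None:
            losses.setdefault(self.name, 0.0)
            losses[self.name] += loss


def plain_sgd(weights, gradient, key=None, learn_rate=0.1):
    # Vanilla gradient descent, modifying the weights in place.
    for i, g in enumerate(gradient):
        weights[i] -= learn_rate * g


model = ToyRegressor()
docs, golds = [{"x": 1.0}, {"x": 2.0}], [2.0, 4.0]
losses = {}
for _ in range(50):
    model.update(docs, golds, sgd=plain_sgd, losses=losses)
print(round(model.weights[0], 3))  # 2.0
```

The `losses` dict accumulates across calls, which is why it is keyed by the component's name: several components can share one dict during training.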

Pipe.get_loss
method

Compute the loss, and the gradient of the loss, for a batch of documents and their predicted scores.

Name | Type | Description
--- | --- | ---
docs | iterable | The batch of documents.
golds | iterable | The gold-standard data. Must have the same length as docs.
scores | - | Scores representing the model's predictions.
returns | tuple | The loss and the gradient, i.e. (loss, gradient).
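For a component that outputs class probabilities per token, one common choice (assumed here, not taken from this page) is a gradient of `scores - one_hot(gold)` with the loss as the sum of its squares. The function below is a hypothetical free-standing version with simplified arguments: one probability row per token and one gold class index per token.

```python
def get_loss(scores, gold_ids):
    """Hypothetical get_loss for per-token class probabilities.

    scores: list of probability rows, one per token.
    gold_ids: the gold class index for each token.
    Returns (loss, gradient), where gradient = scores - one_hot(gold).
    """
    gradient = []
    for row, gold in zip(scores, gold_ids):
        gradient.append([p - (1.0 if i == gold else 0.0) for i, p in enumerate(row)])
    # Sum of squared differences over every token and class.
    loss = sum(d * d for row in gradient for d in row)
    return loss, gradient


scores = [[0.75, 0.25], [0.5, 0.5]]
loss, gradient = get_loss(scores, gold_ids=[0, 1])
print(loss)      # 0.625
print(gradient)  # [[-0.25, 0.25], [0.5, -0.5]]
```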

Pipe.begin_training
method

Initialise the pipe for training, using data examples if available. If no model has been initialised yet, one is created.

Name | Type | Description
--- | --- | ---
gold_tuples | iterable | Optional gold-standard annotations from which to construct GoldParse objects.
pipeline | list | Optional list of Pipe components that this component is part of.
sgd | callable | An optional optimizer. Should take two arguments, weights and gradient, and an optional ID. Will be created via create_optimizer if not set.
returns | callable | An optimizer.
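The optimizer fallback described above can be sketched like this. `TrainablePipe`, its dict-based model and the fixed 0.01 learning rate are all hypothetical; the point is only that a missing `sgd` is replaced by the result of create_optimizer, and a supplied one is returned unchanged.

```python
class TrainablePipe:
    """Hypothetical component showing begin_training's optimizer handling."""

    def __init__(self):
        self.model = True  # placeholder: no model yet

    def create_optimizer(self):
        # Stand-in optimizer: plain SGD with a fixed learning rate.
        def sgd(weights, gradient, key=None):
            for i, g in enumerate(gradient):
                weights[i] -= 0.01 * g
        return sgd

    def begin_training(self, gold_tuples=tuple(), pipeline=None, sgd=None):
        if self.model is True:
            self.model = {"weights": [0.0, 0.0]}  # hypothetical model
        if sgd is None:
            sgd = self.create_optimizer()
        return sgd


pipe = TrainablePipe()
optimizer = pipe.begin_training()
optimizer(pipe.model["weights"], [1.0, -1.0])
print(pipe.model["weights"])  # [-0.01, 0.01]
```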

Pipe.create_optimizer
method

Create an optimizer for the pipeline component.

Name | Type | Description
--- | --- | ---
returns | callable | The optimizer.
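One reason an optimizer callable may accept an optional ID is to keep per-parameter state between calls. The sketch below is a hypothetical free-standing factory returning SGD with momentum, with state keyed by that ID (passed as `key`, an assumption of this sketch); it falls back to `id(weights)` when no key is given.

```python
def create_optimizer(learn_rate=0.1, momentum=0.9):
    """Hypothetical factory returning an SGD-with-momentum optimizer."""
    velocities = {}  # per-parameter state, keyed by the optional ID

    def optimizer(weights, gradient, key=None):
        # Fall back to the weights object's identity when no key is given.
        key = key if key is not None else id(weights)
        velocity = velocities.setdefault(key, [0.0] * len(weights))
        for i, g in enumerate(gradient):
            velocity[i] = momentum * velocity[i] - learn_rate * g
            weights[i] += velocity[i]

    return optimizer


sgd = create_optimizer()
weights = [1.0]
sgd(weights, [1.0], key="layer0")  # velocity -0.1, weight 0.9
sgd(weights, [1.0], key="layer0")  # velocity -0.19, weight about 0.71
print(weights)
```

Because the second call reuses the velocity stored under `"layer0"`, it takes a larger step than the first.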

Pipe.use_params
method
contextmanager

Temporarily modify the pipe's model to use the given parameter values.

Name | Type | Description
--- | --- | ---
params | - | The parameter values to use in the model. At the end of the context, the original parameters are restored.
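A context manager of this kind can be sketched with `contextlib.contextmanager`. `ParamModel` and its dict of named parameters are hypothetical; swapping in a set of alternative values (for example, averaged weights for evaluation) is only an illustrative use. The `finally` clause restores the originals even if the `with` body raises.

```python
from contextlib import contextmanager


class ParamModel:
    """Hypothetical model holding a dict of named parameter values."""

    def __init__(self, params):
        self.params = dict(params)

    @contextmanager
    def use_params(self, params):
        # Swap in the given values, then restore the originals on exit.
        backup = dict(self.params)
        self.params.update(params)
        try:
            yield
        finally:
            self.params = backup


model = ParamModel({"W": 0.5, "b": 0.0})
averaged = {"W": 0.45, "b": 0.01}
with model.use_params(averaged):
    print(model.params["W"])  # 0.45
print(model.params["W"])      # 0.5
```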

Pipe.add_label
method

Add a new label to the pipe.

Name | Type | Description
--- | --- | ---
label | unicode | The label to add.
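For a classifier-style component, adding a label usually also means growing the output layer. The sketch below assumes a hypothetical `LabelledPipe` with one weight row per label; ignoring duplicate labels and zero-initialising the new row are choices of this sketch, and real components may handle resizing differently.

```python
class LabelledPipe:
    """Hypothetical component with one output weight row per label."""

    def __init__(self, n_features=3):
        self.labels = []
        self.n_features = n_features
        self.weights = []  # one row of weights per label

    def add_label(self, label):
        if label in self.labels:
            return  # already known: nothing to do
        self.labels.append(label)
        # Grow the output layer with a zero-initialised row for the new label.
        self.weights.append([0.0] * self.n_features)


pipe = LabelledPipe()
for label in ["PERSON", "ORG", "PERSON"]:
    pipe.add_label(label)
print(pipe.labels, len(pipe.weights))  # ['PERSON', 'ORG'] 2
```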

Pipe.to_disk
method

Serialize the pipe to disk.

Name | Type | Description
--- | --- | ---
path | unicode or Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects.

Pipe.from_disk
method

Load the pipe from disk. Modifies the object in place and returns it.

Name | Type | Description
--- | --- | ---
path | unicode or Path | A path to a directory. Paths may be either strings or Path-like objects.
returns | Pipe | The modified Pipe object.
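A disk round trip that follows the behaviour above (accept strings or Path-like objects, create the directory if missing, load in place and return self) might look like the sketch below. `SerializablePipe` and the file names `cfg.json` and `labels.json` are hypothetical; spaCy's actual on-disk layout is not described on this page.

```python
import json
import tempfile
from pathlib import Path


class SerializablePipe:
    """Hypothetical component whose state is a config dict and a label list."""

    def __init__(self, **cfg):
        self.cfg = dict(cfg)
        self.labels = []

    def to_disk(self, path):
        path = Path(path)  # accept str or Path-like
        path.mkdir(parents=True, exist_ok=True)  # create the directory if needed
        (path / "cfg.json").write_text(json.dumps(self.cfg), encoding="utf8")
        (path / "labels.json").write_text(json.dumps(self.labels), encoding="utf8")

    def from_disk(self, path):
        path = Path(path)
        self.cfg = json.loads((path / "cfg.json").read_text(encoding="utf8"))
        self.labels = json.loads((path / "labels.json").read_text(encoding="utf8"))
        return self  # modified in place and returned


with tempfile.TemporaryDirectory() as tmp:
    original = SerializablePipe(width=64)
    original.labels = ["NOUN", "VERB"]
    target = str(Path(tmp) / "my_pipe")  # a plain string works too
    original.to_disk(target)
    restored = SerializablePipe().from_disk(target)

print(restored.cfg, restored.labels)  # {'width': 64} ['NOUN', 'VERB']
```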

Pipe.to_bytes
method

Serialize the pipe to a bytestring.

Name | Type | Description
--- | --- | ---
**exclude | - | Named attributes to prevent from being serialized.
returns | bytes | The serialized form of the Pipe object.

Pipe.from_bytes
method

Load the pipe from a bytestring. Modifies the object in place and returns it.

Name | Type | Description
--- | --- | ---
bytes_data | bytes | The data to load from.
**exclude | - | Named attributes to prevent from being loaded.
returns | Pipe | The Pipe object.
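The `**exclude` mechanism can be sketched as a byte round trip over named fields. `BytesPipe`, its JSON encoding and the `cfg`/`labels` field names are hypothetical. In this sketch, any keyword naming a field excludes it, whatever value it carries.

```python
import json


class BytesPipe:
    """Hypothetical component serialised as a JSON dict of named fields."""

    FIELDS = ("cfg", "labels")

    def __init__(self):
        self.cfg = {}
        self.labels = []

    def to_bytes(self, **exclude):
        # Skip every field named in exclude; its value is ignored here.
        data = {key: getattr(self, key) for key in self.FIELDS if key not in exclude}
        return json.dumps(data).encode("utf8")

    def from_bytes(self, bytes_data, **exclude):
        data = json.loads(bytes_data.decode("utf8"))
        for key in self.FIELDS:
            if key in data and key not in exclude:
                setattr(self, key, data[key])
        return self  # modified in place and returned


source = BytesPipe()
source.cfg, source.labels = {"width": 64}, ["NOUN"]
data = source.to_bytes(labels=True)  # leave the labels out
restored = BytesPipe().from_bytes(data)
print(restored.cfg, restored.labels)  # {'width': 64} []
```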