
EntityLinker

Class, new in v2.2
Functionality to disambiguate a named entity in text to a unique knowledge base identifier.

This class is a subclass of Pipe and follows the same API. The pipeline component is available in the processing pipeline via the ID "entity_linker".

EntityLinker.Model classmethod

Initialize a model for the pipe. The model should implement the thinc.neural.Model API, and should contain a field tok2vec that contains the context encoder. Wrappers are under development for most major machine learning libraries.

Name | Type | Description
**kwargs | - | Parameters for initializing the model.

EntityLinker.__init__ method

Create a new pipeline instance. In your application, you would normally use a shortcut for this and instantiate the component using its string name and nlp.create_pipe.

Name | Type | Description
vocab | Vocab | The shared vocabulary.
model | thinc.neural.Model / True | The model powering the pipeline component. If no model is supplied, the model is created when you call begin_training, from_disk or from_bytes.
hidden_width | int | Width of the hidden layer of the entity linking model. Defaults to 128.
incl_prior | bool | Whether or not to include prior probabilities in the model. Defaults to True.
incl_context | bool | Whether or not to include the local context in the model (if not, only prior probabilities are used). Defaults to True.
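As a rough illustration of what incl_prior and incl_context control, the sketch below scores the candidates for a single mention in plain Python. It assumes each candidate carries a prior probability and an entity vector, that the local context has already been encoded as a vector, that context fit is measured by cosine similarity, and that the two signals are combined as p + s - p*s. Candidate, cosine and score_candidates are hypothetical names, and the combination rule is an assumption for illustration rather than a statement of spaCy's exact internals.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    # Hypothetical KB candidate: an ID, a prior P(entity | alias), and an entity vector.
    kb_id: str
    prior_prob: float
    entity_vector: List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_candidates(candidates: Sequence[Candidate], context: Sequence[float],
                     incl_prior: bool = True, incl_context: bool = True) -> str:
    """Return the KB ID of the best candidate, or "" if there are none."""
    if not candidates:
        return ""
    scores = []
    for cand in candidates:
        prior = cand.prior_prob if incl_prior else 0.0
        if incl_context:
            sim = cosine(cand.entity_vector, context)
            scores.append(prior + sim - prior * sim)  # combine both signals (assumed rule)
        else:
            scores.append(prior)  # prior probabilities only
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best].kb_id


cands = [Candidate("Q1", 0.7, [1.0, 0.0]), Candidate("Q2", 0.3, [0.0, 1.0])]
context = [0.0, 1.0]  # a context encoding that points towards Q2
print(score_candidates(cands, context, incl_context=False))  # Q1
print(score_candidates(cands, context))                      # Q2
```

With only the prior, the more common entity wins; once the context is taken into account, the candidate whose vector matches the context can overtake it.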

EntityLinker.__call__ method

Apply the pipe to one document. The document is modified in place, and returned. This usually happens under the hood when the nlp object is called on a text and all pipeline components are applied to the Doc in order. Both __call__ and pipe delegate to the predict and set_annotations methods.

Name | Type | Description
doc | Doc | The document to process.
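The delegation described above can be sketched with a toy component. ToyLinker, its lookup table, the "NIL" placeholder and the dict-based "documents" are hypothetical stand-ins, not spaCy objects. The sketch also assumes that predict returns one ID per entity, flattened across the batch in document order. The point is only that __call__ wraps a single document in a batch, calls predict, then set_annotations, and returns the same object it was given.

```python
from typing import Dict, List


class ToyLinker:
    """Hypothetical component mirroring the predict / set_annotations split.

    Docs are plain dicts with an "ents" list; each entity is a dict with "text".
    """

    NIL = "NIL"  # placeholder ID for mentions with no known entity (assumed)

    def __init__(self, lookup: Dict[str, str]):
        self.lookup = lookup  # mention text -> KB ID; stands in for a trained model

    def predict(self, docs: List[dict]) -> List[str]:
        # One ID per entity, in document order, flattened across the batch.
        # The docs themselves are not modified here.
        return [self.lookup.get(ent["text"], self.NIL) for doc in docs for ent in doc["ents"]]

    def set_annotations(self, docs: List[dict], kb_ids: List[str]) -> None:
        # Walk the entities in the same order predict() did and write the IDs back.
        ids = iter(kb_ids)
        for doc in docs:
            for ent in doc["ents"]:
                ent["kb_id"] = next(ids)

    def __call__(self, doc: dict) -> dict:
        kb_ids = self.predict([doc])
        self.set_annotations([doc], kb_ids)
        return doc  # the same object, modified in place


linker = ToyLinker({"Douglas Adams": "Q42"})
doc = {"ents": [{"text": "Douglas Adams"}, {"text": "Arthur Dent"}]}
result = linker(doc)
print(result is doc, [ent["kb_id"] for ent in doc["ents"]])  # True ['Q42', 'NIL']
```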

EntityLinker.pipe method

Apply the pipe to a stream of documents. This usually happens under the hood when nlp.pipe is called on a stream of texts and all pipeline components are applied to each Doc in order. Both __call__ and pipe delegate to the predict and set_annotations methods.

Name | Type | Description
stream | iterable | A stream of documents.
batch_size | int | The number of texts to buffer. Defaults to 128.
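The batching behind pipe can be sketched as follows: documents are buffered into lists of at most batch_size items, each list goes through predict and then set_annotations, and the documents are yielded back in their original order. minibatch and pipe here are illustrative helpers (spaCy v2 ships a similarly named spacy.util.minibatch, but this is not its code), and predict and set_annotations are passed in as plain callables so the example stays self-contained.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def minibatch(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `stream`."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def pipe(stream: Iterable[T], predict: Callable, set_annotations: Callable,
         batch_size: int = 128) -> Iterator[T]:
    """Annotate `stream` one buffered batch at a time, yielding docs in order."""
    for docs in minibatch(stream, batch_size):
        predictions = predict(docs)         # no docs are modified here
        set_annotations(docs, predictions)  # docs are modified in place here
        yield from docs


seen_sizes = []


def toy_predict(docs):
    seen_sizes.append(len(docs))
    return [len(doc["text"]) for doc in docs]


def toy_set_annotations(docs, preds):
    for doc, pred in zip(docs, preds):
        doc["length"] = pred


docs = [{"text": t} for t in ["a", "bb", "ccc", "dddd", "eeeee"]]
out = list(pipe(docs, toy_predict, toy_set_annotations, batch_size=2))
print(seen_sizes)  # [2, 2, 1]
```

Because pipe is a generator, no work happens until the result is iterated, which is what lets a large stream be processed without holding every document in memory at once.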

EntityLinker.predict method

Apply the pipeline’s model to a batch of docs, without modifying them.

Name | Type | Description
docs | iterable | The documents to predict.

EntityLinker.set_annotations method

Modify a batch of documents, using pre-computed entity IDs for a list of named entities.

Name | Type | Description
docs | iterable | The documents to modify.
kb_ids | iterable | The knowledge base identifiers for the entities in the docs, predicted by EntityLinker.predict.
tensors | iterable | The token representations used to predict the identifiers.

EntityLinker.update method

Learn from a batch of documents and gold-standard information, updating both the pipe’s entity linking model and context encoder. Delegates to predict and get_loss.

Name | Type | Description
docs | iterable | A batch of documents to learn from.
golds | iterable | The gold-standard data. Must have the same length as docs.
drop | float | The dropout rate, used both for the EL model and the context encoder.
sgd | callable | The optimizer for the EL model. Should take two arguments, weights and gradient, and an optional ID.
losses | dict | Optional record of the loss during training. The value keyed by the model’s name is updated.
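The roles of drop, sgd and losses in a training step can be illustrated with a deliberately tiny stand-in: a linear scorer trained with squared error in pure Python. make_sgd and update are hypothetical helpers. The optimizer follows the (weights, gradient, key) call shape described for sgd, and the default key "entity_linker" for losses assumes the component keeps its default name. The real EntityLinker.update trains a neural context encoder and EL model, not this toy.

```python
import random
from typing import Callable, Dict, List, Optional


def make_sgd(learn_rate: float) -> Callable:
    """Hypothetical optimizer with the (weights, gradient, key=None) call shape."""
    def sgd(weights: List[float], gradient: List[float], key=None) -> None:
        for i, g in enumerate(gradient):
            weights[i] -= learn_rate * g  # plain gradient descent, in place
            gradient[i] = 0.0             # clear the gradient once it is applied
    return sgd


def update(weights: List[float], inputs: List[List[float]], targets: List[float],
           sgd: Callable, drop: float = 0.0, losses: Optional[Dict[str, float]] = None,
           name: str = "entity_linker") -> float:
    """One step for a toy linear scorer (score = w . x) trained with squared error."""
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in [0, 1)")
    gradient = [0.0] * len(weights)
    total = 0.0
    for x, y in zip(inputs, targets):
        # Inverted dropout: zero each feature with probability `drop`, rescale the rest.
        x = [0.0 if random.random() < drop else xi / (1.0 - drop) for xi in x]
        diff = sum(w * xi for w, xi in zip(weights, x)) - y
        total += diff * diff
        for i, xi in enumerate(x):
            gradient[i] += 2.0 * diff * xi
    sgd(weights, gradient, key=name)
    if losses is not None:
        # Accumulate under the component's name, as the `losses` argument describes.
        losses[name] = losses.get(name, 0.0) + total
    return total


weights = [0.0]
losses = {}
sgd = make_sgd(0.1)
for _ in range(3):
    update(weights, [[1.0]], [2.0], sgd, losses=losses)
print(weights, losses)
```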

EntityLinker.get_loss method

Find the loss and gradient of loss for the entities in a batch of documents and their predicted scores.

Name | Type | Description
docs | iterable | The batch of documents.
golds | iterable | The gold-standard data. Must have the same length as docs.
kb_ids | iterable | KB identifiers representing the model’s predictions.
tensors | iterable | The token representations used to predict the identifiers.
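To illustrate what "loss and gradient of loss" means here, the sketch below assumes a cosine-distance loss between a predicted context encoding and the gold entity's vector, returning both the loss and its gradient with respect to the prediction. This choice of loss is an assumption for illustration; cosine_loss and batch_loss are hypothetical names, and both vectors are assumed to be non-zero.

```python
import math
from typing import List, Sequence, Tuple


def cosine_loss(pred: Sequence[float], gold: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (1 - cos(pred, gold), d loss / d pred). Assumes non-zero vectors."""
    dot = sum(p * g for p, g in zip(pred, gold))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_g = math.sqrt(sum(g * g for g in gold))
    cos = dot / (norm_p * norm_g)
    # d cos / d p_i = g_i / (|p||g|) - cos * p_i / |p|^2, and the loss is 1 - cos.
    grad = [-(g / (norm_p * norm_g) - cos * p / norm_p ** 2) for p, g in zip(pred, gold)]
    return 1.0 - cos, grad


def batch_loss(preds: Sequence[Sequence[float]],
               golds: Sequence[Sequence[float]]) -> Tuple[float, List[List[float]]]:
    """Sum the per-entity losses and collect one gradient per entity."""
    total = 0.0
    grads = []
    for pred, gold in zip(preds, golds):
        loss, grad = cosine_loss(pred, gold)
        total += loss
        grads.append(grad)
    return total, grads
```

Stepping each prediction against its gradient moves it towards the gold entity vector, which is the signal update then passes back to the model.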

EntityLinker.set_kb method

Define the knowledge base (KB) used for disambiguating named entities to KB identifiers.

Name | Type | Description
kb | KnowledgeBase | The KnowledgeBase.
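For intuition about what set_kb hands to the component, here is a hypothetical in-memory stand-in for a knowledge base: each entity has a fixed-length vector, and each alias (a surface string) maps to candidate entities with prior probabilities. ToyKB and its methods are illustrative only and loosely mirror the kind of information spaCy's KnowledgeBase holds; check that class's own documentation for its actual method signatures.

```python
from typing import Dict, List, Tuple


class ToyKB:
    """Hypothetical in-memory KB: entity vectors plus alias -> candidate lists."""

    def __init__(self, entity_vector_length: int):
        self.entity_vector_length = entity_vector_length
        self.vectors: Dict[str, List[float]] = {}
        self.aliases: Dict[str, List[Tuple[str, float]]] = {}

    def add_entity(self, kb_id: str, vector: List[float]) -> None:
        if len(vector) != self.entity_vector_length:
            raise ValueError(f"expected a vector of length {self.entity_vector_length}")
        self.vectors[kb_id] = vector

    def add_alias(self, alias: str, kb_ids: List[str], priors: List[float]) -> None:
        # Every candidate must be a known entity, and the priors P(entity | alias)
        # may not sum to more than 1.
        if len(kb_ids) != len(priors):
            raise ValueError("kb_ids and priors must have the same length")
        missing = [k for k in kb_ids if k not in self.vectors]
        if missing:
            raise ValueError(f"unknown entities: {missing}")
        if sum(priors) > 1.0 + 1e-9:
            raise ValueError("prior probabilities sum to more than 1")
        self.aliases[alias] = list(zip(kb_ids, priors))

    def get_candidates(self, alias: str) -> List[Tuple[str, float]]:
        # Unknown aliases simply have no candidates.
        return self.aliases.get(alias, [])


kb = ToyKB(entity_vector_length=3)
kb.add_entity("E1", [0.1, 0.2, 0.3])
kb.add_entity("E2", [0.4, 0.1, 0.0])
kb.add_alias("Douglas", ["E1", "E2"], [0.8, 0.1])
print(kb.get_candidates("Douglas"))  # [('E1', 0.8), ('E2', 0.1)]
```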

EntityLinker.begin_training method

Initialize the pipe for training, using data examples if available. If no model has been initialized yet, the model is added. Before calling this method, a knowledge base should have been defined with set_kb.

Name | Type | Description
gold_tuples | iterable | Optional gold-standard annotations from which to construct GoldParse objects.
pipeline | list | Optional list of pipeline components that this component is part of.
sgd | callable | An optional optimizer. Should take two arguments, weights and gradient, and an optional ID. Will be created via create_optimizer if not set.
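The ordering requirement can be sketched with a hypothetical component that refuses to start training until a KB is set. The assumption that the model's entity width is read from the KB's entity_vector_length is made for illustration, as are the placeholder model and the toy optimizer returned by create_optimizer.

```python
from types import SimpleNamespace
from typing import Callable, Optional


class ToyLinker:
    """Hypothetical component showing why set_kb must come before begin_training."""

    def __init__(self):
        self.kb = None
        self.model = True  # True = "no model yet; build it on first use"
        self.cfg = {}

    def set_kb(self, kb) -> None:
        self.kb = kb

    def create_optimizer(self) -> Callable:
        def sgd(weights, gradient, key=None):
            for i, g in enumerate(gradient):
                weights[i] -= 0.001 * g  # toy gradient descent step
        return sgd

    def begin_training(self, sgd: Optional[Callable] = None) -> Callable:
        if self.kb is None:
            raise ValueError("call set_kb() before begin_training()")
        # Assumed: the model's entity width depends on the KB, so it is only known now.
        self.cfg["entity_width"] = self.kb.entity_vector_length
        if self.model is True:
            self.model = {"entity_width": self.cfg["entity_width"]}  # placeholder model
        if sgd is None:
            sgd = self.create_optimizer()
        return sgd


linker = ToyLinker()
try:
    linker.begin_training()
except ValueError as err:
    print(err)  # call set_kb() before begin_training()
linker.set_kb(SimpleNamespace(entity_vector_length=64))  # stand-in for a real KB
sgd = linker.begin_training()
print(linker.model)  # {'entity_width': 64}
```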

EntityLinker.create_optimizer method

Create an optimizer for the pipeline component.


EntityLinker.use_params method, contextmanager

Modify the pipe’s EL model to use the given parameter values.

Name | Type | Description
params | dict | The parameter values to use in the model. At the end of the context, the original parameters are restored.
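The save-and-restore behaviour of use_params maps naturally onto contextlib.contextmanager. ToyModel below is hypothetical and stores its parameters as a plain dict; the try/finally ensures the original values come back even if the body of the with block raises.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class ToyModel:
    """Hypothetical model whose parameters are a dict of named floats."""

    def __init__(self, params: Dict[str, float]):
        self.params = dict(params)

    @contextmanager
    def use_params(self, params: Dict[str, float]) -> Iterator[None]:
        backup = dict(self.params)  # remember the current values
        self.params.update(params)  # temporarily swap in the given ones
        try:
            yield
        finally:
            self.params = backup    # restore, even if the block raised


model = ToyModel({"w": 1.0, "b": 0.0})
with model.use_params({"w": 0.5}):
    print(model.params)  # {'w': 0.5, 'b': 0.0}
print(model.params)      # {'w': 1.0, 'b': 0.0}
```

A typical use in spaCy v2 training scripts is evaluating with averaged weights, as in with nlp.use_params(optimizer.averages):, so that training can continue afterwards from the un-averaged parameters.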

EntityLinker.to_disk method

Serialize the pipe to disk.

Name | Type | Description
path | unicode / Path | A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects.
exclude | list | String names of serialization fields to exclude.

EntityLinker.from_disk method

Load the pipe from disk. Modifies the object in place and returns it.

Name | Type | Description
path | unicode / Path | A path to a directory. Paths may be either strings or Path-like objects.
exclude | list | String names of serialization fields to exclude.
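A minimal round trip through to_disk and from_disk, assuming each serialization field is written to its own JSON file inside the target directory. ToyPipe, the JSON format and the per-field file names are assumptions for illustration; spaCy's actual on-disk layout differs. The sketch shows the documented behaviour: the directory is created if missing, str and Path arguments both work, fields named in exclude are skipped, and from_disk returns the object.

```python
import json
import tempfile
from pathlib import Path
from typing import Collection, Union


class ToyPipe:
    """Hypothetical component with three serializable fields (no vocab)."""

    FIELDS = ("cfg", "model", "kb")

    def __init__(self):
        self.cfg: dict = {}
        self.model: dict = {}
        self.kb: dict = {}

    def to_disk(self, path: Union[str, Path], exclude: Collection[str] = ()) -> None:
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)  # create the directory if needed
        for name in self.FIELDS:
            if name not in exclude:
                (path / f"{name}.json").write_text(json.dumps(getattr(self, name)))

    def from_disk(self, path: Union[str, Path], exclude: Collection[str] = ()) -> "ToyPipe":
        path = Path(path)
        for name in self.FIELDS:
            field_path = path / f"{name}.json"
            if name not in exclude and field_path.exists():
                setattr(self, name, json.loads(field_path.read_text()))
        return self  # modified in place and returned


pipe = ToyPipe()
pipe.cfg = {"hidden_width": 128}
pipe.kb = {"Douglas": ["E1"]}
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "entity_linker"
    pipe.to_disk(target, exclude=["kb"])         # kb is not written
    restored = ToyPipe().from_disk(str(target))  # a plain string path works too
print(restored.cfg, restored.kb)  # {'hidden_width': 128} {}
```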

Serialization fields

During serialization, spaCy will export several data fields used to restore different aspects of the object. If needed, you can exclude them from serialization by passing in the string names via the exclude argument.

Name | Description
vocab | The shared Vocab.
cfg | The config file. You usually don’t want to exclude this.
model | The binary model data. You usually don’t want to exclude this.
kb | The knowledge base. You usually don’t want to exclude this.