KnowledgeBase
The KnowledgeBase object is an abstract class providing a method to generate
Candidate objects, which are plausible external identifiers given a certain
textual mention. Each such Candidate holds information from the relevant KB
entity, such as its frequency in text and possible aliases. Each entity in the
knowledge base also has a pretrained entity vector of a fixed size.
Beyond that, KnowledgeBase classes have to implement a number of utility
functions called by the EntityLinker component.
KnowledgeBase.__init__ method
KnowledgeBase is an abstract class and cannot be instantiated. Its child
classes should call __init__() to set up some necessary attributes.
Name | Description |
---|---|
vocab | The shared vocabulary. Vocab |
entity_vector_length | Length of the fixed-size entity vectors. int |
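The pattern described here (an abstract parent that cannot be instantiated, and
child classes that call the parent __init__() to set up shared attributes) can
be sketched with the standard library alone. The classes SketchKnowledgeBase
and DictKB below are hypothetical stand-ins, not spaCy classes, and the vocab
is assumed to be a plain dict purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SketchKnowledgeBase(ABC):
    """Hypothetical stand-in for an abstract KB (not the spaCy class)."""

    def __init__(self, vocab: Dict[str, int], entity_vector_length: int) -> None:
        # Shared attributes that every child class relies on.
        self.vocab = vocab
        self.entity_vector_length = entity_vector_length

    @abstractmethod
    def get_candidates(self, mention: str) -> List[str]:
        """Return candidate entity IDs for a mention."""


class DictKB(SketchKnowledgeBase):
    """Hypothetical child class backed by a plain dict."""

    def __init__(self, vocab: Dict[str, int], entity_vector_length: int) -> None:
        # Delegate to the parent so the shared attributes are set up.
        super().__init__(vocab, entity_vector_length)
        self.alias_to_entities: Dict[str, List[str]] = {}

    def get_candidates(self, mention: str) -> List[str]:
        return list(self.alias_to_entities.get(mention, []))


kb = DictKB(vocab={}, entity_vector_length=64)

try:
    SketchKnowledgeBase(vocab={}, entity_vector_length=64)
    base_instantiated = True
except TypeError:
    # The abstract parent refuses to be instantiated directly.
    base_instantiated = False
```

In this sketch the child class only adds its own storage after the parent has
set vocab and entity_vector_length, so code that reads those attributes works
for any subclass.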
KnowledgeBase.entity_vector_length property
The length of the fixed-size entity vectors in the knowledge base.
Name | Description |
---|---|
RETURNS | Length of the fixed-size entity vectors. int |
KnowledgeBase.get_candidates method
Given a certain textual mention as input, retrieve a list of candidate entities
of type Candidate.
Name | Description |
---|---|
mention | The textual mention or alias. Span |
RETURNS | An iterable of relevant Candidate objects. Iterable[Candidate] |
KnowledgeBase.get_candidates_batch method
Same as get_candidates(), but for an arbitrary number of mentions. The
EntityLinker component will call get_candidates_batch() instead of
get_candidates() if the config parameter candidates_batch_size is greater than
or equal to 1.
The default implementation of get_candidates_batch() executes get_candidates()
in a loop. We recommend implementing a more efficient way to retrieve
candidates for multiple mentions at once if performance is a concern.
Name | Description |
---|---|
mentions | The textual mentions or aliases. Iterable[Span] |
RETURNS | An iterable of iterables of relevant Candidate objects, one inner iterable per mention. Iterable[Iterable[Candidate]] |
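To illustrate the difference between the looping default and a batched
override, here is a minimal stdlib-only sketch. LoopingKB and BatchedKB are
hypothetical stand-ins, mentions are plain strings rather than Span objects,
and the lookups counter merely simulates round trips to a backing store. The
entity IDs are made up.

```python
from typing import Dict, Iterable, List


class LoopingKB:
    """Hypothetical stand-in KB (not the spaCy class)."""

    def __init__(self, alias_to_entities: Dict[str, List[str]]) -> None:
        self.alias_to_entities = alias_to_entities
        self.lookups = 0  # counts simulated round trips to the backing store

    def get_candidates(self, mention: str) -> List[str]:
        self.lookups += 1
        return list(self.alias_to_entities.get(mention, []))

    def get_candidates_batch(self, mentions: Iterable[str]) -> List[List[str]]:
        # Looping behaviour: one get_candidates() call per mention.
        return [self.get_candidates(mention) for mention in mentions]


class BatchedKB(LoopingKB):
    def get_candidates_batch(self, mentions: Iterable[str]) -> List[List[str]]:
        # Resolve each distinct mention once, then fan results back out
        # in the original order.
        mentions = list(mentions)
        self.lookups += 1  # a single simulated round trip for the whole batch
        unique = {m: self.alias_to_entities.get(m, []) for m in set(mentions)}
        return [list(unique[m]) for m in mentions]


aliases = {"Douglas": ["ENT_A", "ENT_B"], "Adams": ["ENT_A"]}
mentions = ["Douglas", "Adams", "Douglas"]

looping = LoopingKB(aliases)
batched = BatchedKB(aliases)
loop_result = looping.get_candidates_batch(mentions)
batch_result = batched.get_candidates_batch(mentions)
```

Both variants return one list of candidates per mention, in input order. The
batched variant pays off when each individual lookup is expensive, for example
a database or network call.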
KnowledgeBase.get_alias_candidates method
From spaCy 3.5 on, KnowledgeBase is an abstract class (with InMemoryLookupKB
being a drop-in replacement) to allow more flexibility in customizing knowledge
bases. Some of its methods were moved to InMemoryLookupKB during this
refactoring, one of those being get_alias_candidates(). This method is now
available as InMemoryLookupKB.get_alias_candidates().
Note: InMemoryLookupKB.get_candidates() defaults to
InMemoryLookupKB.get_alias_candidates().
KnowledgeBase.get_vector method
Given a certain entity ID, retrieve its pretrained entity vector.
Name | Description |
---|---|
entity | The entity ID. str |
RETURNS | The entity vector. Iterable[float] |
KnowledgeBase.get_vectors method
Same as get_vector(), but for an arbitrary number of entity IDs.
The default implementation of get_vectors() executes get_vector() in a loop.
We recommend implementing a more efficient way to retrieve vectors for multiple
entities at once if performance is a concern.
Name | Description |
---|---|
entities | The entity IDs. Iterable[str] |
RETURNS | The entity vectors, one per entity ID. Iterable[Iterable[float]] |
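The relationship between get_vector(), get_vectors(), and the fixed
entity_vector_length can be sketched as follows. VectorStore and its
add_entity() helper are hypothetical names used only for this illustration, and
vectors are assumed to be plain lists of floats.

```python
from typing import Dict, Iterable, List


class VectorStore:
    """Hypothetical stand-in for the vector side of a KB (not the spaCy class)."""

    def __init__(self, entity_vector_length: int) -> None:
        self.entity_vector_length = entity_vector_length
        self._vectors: Dict[str, List[float]] = {}

    def add_entity(self, entity: str, vector: List[float]) -> None:
        # Every entity vector must have the same, fixed size.
        if len(vector) != self.entity_vector_length:
            raise ValueError(
                f"expected {self.entity_vector_length} dimensions, got {len(vector)}"
            )
        self._vectors[entity] = list(vector)

    def get_vector(self, entity: str) -> List[float]:
        return self._vectors[entity]

    def get_vectors(self, entities: Iterable[str]) -> List[List[float]]:
        # Looping behaviour: one get_vector() call per entity ID.
        return [self.get_vector(entity) for entity in entities]


store = VectorStore(entity_vector_length=3)
store.add_entity("ENT_A", [1.0, 0.0, 0.0])
store.add_entity("ENT_B", [0.0, 1.0, 0.0])
vectors = store.get_vectors(["ENT_B", "ENT_A"])
```

A faster override would typically fetch all requested vectors in one operation,
for example by indexing into a single matrix, instead of looping.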
KnowledgeBase.to_disk method
Save the current state of the knowledge base to a directory.
Name | Description |
---|---|
path | A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects. Union[str,Path] |
exclude | List of components to exclude. Iterable[str] |
KnowledgeBase.from_disk method
Restore the state of the knowledge base from a given directory. Note that the
Vocab should also be the same as the one used to create the KB.
Name | Description |
---|---|
loc | A path to a directory. Paths may be either strings or Path-like objects. Union[str,Path] |
exclude | List of components to exclude. Iterable[str] |
RETURNS | The modified KnowledgeBase object. KnowledgeBase |
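The to_disk() and from_disk() contract (create the directory if needed, restore
state in place, return the modified object) could be sketched like this. JsonKB
is a hypothetical stand-in, and both the JSON format and the kb.json file name
are assumptions of this example, not spaCy's actual serialization. The exclude
argument is left out for brevity.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Union


class JsonKB:
    """Hypothetical stand-in KB that serializes to JSON (not the spaCy class)."""

    def __init__(self, entity_vector_length: int) -> None:
        self.entity_vector_length = entity_vector_length
        self.vectors: Dict[str, List[float]] = {}

    def to_disk(self, path: Union[str, Path]) -> None:
        path = Path(path)
        # Create the target directory if it doesn't exist yet.
        path.mkdir(parents=True, exist_ok=True)
        payload = {
            "entity_vector_length": self.entity_vector_length,
            "vectors": self.vectors,
        }
        (path / "kb.json").write_text(json.dumps(payload), encoding="utf8")

    def from_disk(self, path: Union[str, Path]) -> "JsonKB":
        raw = (Path(path) / "kb.json").read_text(encoding="utf8")
        payload = json.loads(raw)
        self.entity_vector_length = payload["entity_vector_length"]
        self.vectors = payload["vectors"]
        return self  # the modified object, so calls can be chained


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_kb"  # does not exist before to_disk()
    original = JsonKB(entity_vector_length=3)
    original.vectors["ENT_A"] = [1.0, 0.0, 0.0]
    original.to_disk(target)
    directory_created = target.is_dir()
    restored = JsonKB(entity_vector_length=3).from_disk(target)
```

Because from_disk() returns the object it modified, a new instance can be
constructed and loaded in a single expression, as in the last line.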
Candidate class
A Candidate object refers to a textual mention (alias) that may or may not be
resolved to a specific entity from a KnowledgeBase. This will be used as input
for the entity linking algorithm, which will disambiguate the various
candidates to the correct one. Each candidate (alias, entity) pair is assigned
a certain prior probability.
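As a rough illustration of how prior probabilities can inform disambiguation,
the sketch below picks the candidate with the highest prior for an alias. This
is a naive baseline assumed for this example only. It is not the algorithm used
by the entity linking component. SketchCandidate and pick_by_prior() are
hypothetical names, and the entity IDs and numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SketchCandidate:
    """Hypothetical stand-in for a candidate (not the spaCy class)."""

    entity_: str  # KB identifier of the entity
    alias_: str  # textual mention
    prior_prob: float  # prior probability of the alias referring to the entity
    entity_freq: float  # frequency of the entity in a corpus


def pick_by_prior(candidates: List[SketchCandidate]) -> Optional[SketchCandidate]:
    """Return the candidate with the highest prior, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.prior_prob)


candidates = [
    SketchCandidate("ENT_A", "Douglas", prior_prob=0.7, entity_freq=120.0),
    SketchCandidate("ENT_B", "Douglas", prior_prob=0.2, entity_freq=900.0),
]
best = pick_by_prior(candidates)
```

Note that the most frequent entity (ENT_B) is not selected here, because the
prior is specific to the (alias, entity) pair rather than to the entity alone.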
Candidate.__init__ method
Construct a Candidate object. Usually this constructor is not called directly.
Instead, these objects are returned by the get_candidates() method of a
KnowledgeBase, which the entity_linker pipe calls.
Name | Description |
---|---|
kb | The knowledge base that defined this candidate. KnowledgeBase |
entity_hash | The hash of the entity’s KB ID. int |
entity_freq | The entity frequency as recorded in the KB. float |
alias_hash | The hash of the textual mention or alias. int |
prior_prob | The prior probability of the alias referring to the entity. float |
Candidate attributes
Name | Description |
---|---|
entity | The hash of the entity’s unique KB identifier. int |
entity_ | The entity’s unique KB identifier. str |
alias | The hash of the alias or textual mention. int |
alias_ | The alias or textual mention. str |
prior_prob | The prior probability of the alias referring to the entity. float |
entity_freq | The frequency of the entity in a typical corpus. float |
entity_vector | The pretrained vector of the entity. numpy.ndarray |