InMemoryLookupKB
The InMemoryLookupKB class inherits from KnowledgeBase and implements all of its methods. It stores all KB data in memory and generates Candidate objects by exactly matching mentions with entity names. It’s highly optimized for both a low memory footprint and speed of retrieval.
InMemoryLookupKB.__init__ method
Create the knowledge base.
Name | Description |
---|---|
vocab | The shared vocabulary. Vocab |
entity_vector_length | Length of the fixed-size entity vectors. int |
InMemoryLookupKB.entity_vector_length property
The length of the fixed-size entity vectors in the knowledge base.
Name | Description |
---|---|
RETURNS | Length of the fixed-size entity vectors. int |
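As a minimal sketch of construction, the snippet below builds an empty knowledge base and reads back its vector length. The standalone Vocab and the length of 64 are illustrative assumptions; in a real pipeline the vocabulary would usually be the one shared with the pipeline.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

# Assumption: a standalone vocabulary, used here only for illustration.
vocab = Vocab()

# Every entity vector stored in this KB must have exactly this length.
kb = InMemoryLookupKB(vocab=vocab, entity_vector_length=64)

# The property reports the length that was fixed at construction time.
vector_length = kb.entity_vector_length
print(vector_length)
```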
InMemoryLookupKB.add_entity method
Add an entity to the knowledge base, specifying its corpus frequency and entity vector, which should be of length entity_vector_length.
Name | Description |
---|---|
entity | The unique entity identifier. str |
freq | The frequency of the entity in a typical corpus. float |
entity_vector | The pretrained vector of the entity. numpy.ndarray |
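A short sketch of adding entities one at a time. The identifiers, frequencies and vectors are made-up illustrative values, and plain lists of floats stand in for pretrained vectors to keep the example small.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)

# Illustrative values: each vector has exactly entity_vector_length (3) entries.
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q1004791", freq=111, entity_vector=[4.0, 5.0, 6.0])

print(len(kb))
```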
InMemoryLookupKB.set_entities method
Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity.
Name | Description |
---|---|
entity_list | List of unique entity identifiers. Iterable[Union[str, int]] |
freq_list | List of entity frequencies. Iterable[int] |
vector_list | List of entity vectors. Iterable[numpy.ndarray] |
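The same entities can be defined in a single call, sketched below. The three lists are parallel, so position i in each list describes the same entity; all values are illustrative assumptions.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)

# Parallel lists: one identifier, one frequency and one vector per entity.
kb.set_entities(
    entity_list=["Q42", "Q1004791"],
    freq_list=[32, 111],
    vector_list=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
)

print(sorted(kb.get_entity_strings()))
```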
InMemoryLookupKB.add_alias method
Add an alias or mention to the knowledge base, specifying its potential KB identifiers and their prior probabilities. The entity identifiers should refer to entities previously added with add_entity or set_entities. The sum of the prior probabilities should not exceed 1. Note that an empty string cannot be used as an alias.
Name | Description |
---|---|
alias | The textual mention or alias. Cannot be the empty string. str |
entities | The potential entities that the alias may refer to. Iterable[Union[str, int]] |
probabilities | The prior probabilities of each entity. Iterable[float] |
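The constraints above can be sketched as follows. The entities are added first, then an alias whose priors sum to 0.9. The second call violates the sum constraint; this sketch assumes the violation surfaces as a ValueError, which is not stated in the text above. All names and numbers are illustrative.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q1004791", freq=111, entity_vector=[4.0, 5.0, 6.0])

# Valid: both entities exist and the priors sum to 0.9, which is <= 1.
kb.add_alias(alias="Douglas", entities=["Q42", "Q1004791"], probabilities=[0.6, 0.3])

# Invalid: the priors sum to 1.2.
# Assumption: the KB rejects this with a ValueError.
try:
    kb.add_alias(alias="Adams", entities=["Q42", "Q1004791"], probabilities=[0.8, 0.4])
    rejected = False
except ValueError:
    rejected = True

print(rejected, kb.get_size_aliases())
```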
InMemoryLookupKB.__len__ method
Get the total number of entities in the knowledge base.
Name | Description |
---|---|
RETURNS | The number of entities in the knowledge base. int |
InMemoryLookupKB.get_entity_strings method
Get a list of all entity IDs in the knowledge base.
Name | Description |
---|---|
RETURNS | The list of entities in the knowledge base. List[str] |
InMemoryLookupKB.get_size_aliases method
Get the total number of aliases in the knowledge base.
Name | Description |
---|---|
RETURNS | The number of aliases in the knowledge base. int |
InMemoryLookupKB.get_alias_strings method
Get a list of all aliases in the knowledge base.
Name | Description |
---|---|
RETURNS | The list of aliases in the knowledge base. List[str] |
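The four inspection methods (\_\_len\_\_, get_entity_strings, get_size_aliases and get_alias_strings) can be sketched together. The results are sorted because this sketch makes no assumption about the order in which the KB returns its strings; the entities and aliases are illustrative.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q1004791", freq=111, entity_vector=[4.0, 5.0, 6.0])
kb.add_alias(alias="Douglas", entities=["Q42", "Q1004791"], probabilities=[0.6, 0.3])
kb.add_alias(alias="Douglas Adams", entities=["Q42"], probabilities=[0.9])

n_entities = len(kb)                          # number of entities
entity_ids = sorted(kb.get_entity_strings())  # entity identifiers
n_aliases = kb.get_size_aliases()             # number of aliases
aliases = sorted(kb.get_alias_strings())      # alias strings

print(n_entities, entity_ids, n_aliases, aliases)
```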
InMemoryLookupKB.get_candidates method
Given a certain textual mention as input, retrieve a list of candidate entities of type Candidate. Wraps get_alias_candidates().
Name | Description |
---|---|
mention | The textual mention or alias. Span |
RETURNS | An iterable of relevant Candidate objects. Iterable[Candidate] |
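A minimal sketch of candidate retrieval from a Span. The Doc is built by hand from a word list so that no trained pipeline is needed, and it shares the KB's vocabulary. The sentence and the entity are illustrative assumptions.

```python
from spacy.kb import InMemoryLookupKB
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()
kb = InMemoryLookupKB(vocab=vocab, entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.9])

# A hand-built Doc that shares the KB's vocabulary.
doc = Doc(vocab, words=["Douglas", "wrote", "books"])

# The span text "Douglas" matches the alias exactly.
matched = [c.entity_ for c in kb.get_candidates(doc[0:1])]

# "wrote" is not a known alias, so no candidates come back.
unmatched = [c.entity_ for c in kb.get_candidates(doc[1:2])]

print(matched, unmatched)
```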
InMemoryLookupKB.get_candidates_batch method
Same as get_candidates(), but for an arbitrary number of mentions. The EntityLinker component will call get_candidates_batch() instead of get_candidates() if the config parameter candidates_batch_size is greater than or equal to 1.
The default implementation of get_candidates_batch() executes get_candidates() in a loop. We recommend implementing a more efficient way to retrieve candidates for multiple mentions at once if performance is a concern.
Name | Description |
---|---|
mentions | The textual mentions or aliases. Iterable[Span] |
RETURNS | An iterable of iterables of relevant Candidate objects. Iterable[Iterable[Candidate]] |
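The batch variant can be sketched as below: one inner collection of candidates is produced per input mention, in the same order as the mentions. The Doc is built by hand and all values are illustrative.

```python
from spacy.kb import InMemoryLookupKB
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()
kb = InMemoryLookupKB(vocab=vocab, entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.9])

doc = Doc(vocab, words=["Douglas", "wrote", "books"])
mentions = [doc[0:1], doc[2:3]]  # "Douglas" and "books"

# Collect the entity IDs proposed for each mention.
batch_ids = [
    [candidate.entity_ for candidate in candidates]
    for candidates in kb.get_candidates_batch(mentions)
]

print(batch_ids)
```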
InMemoryLookupKB.get_alias_candidates method
Given a certain textual mention as input, retrieve a list of candidate entities of type Candidate.
Name | Description |
---|---|
alias | The textual mention or alias. str |
RETURNS | The list of relevant Candidate objects. List[Candidate] |
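A sketch of string-based lookup that ranks the returned candidates by prior probability. It relies on the entity_ and prior_prob attributes of Candidate. Matching is exact, so a lower-cased variant of the alias yields nothing. Values are illustrative.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q1004791", freq=111, entity_vector=[4.0, 5.0, 6.0])
kb.add_alias(alias="Douglas", entities=["Q42", "Q1004791"], probabilities=[0.6, 0.3])

candidates = kb.get_alias_candidates("Douglas")

# Rank by prior probability, highest first.
ranked = sorted(candidates, key=lambda c: c.prior_prob, reverse=True)
ranked_ids = [c.entity_ for c in ranked]

# Exact matching: a differently cased string is a different alias.
no_match = kb.get_alias_candidates("douglas")

print(ranked_ids, len(no_match))
```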
InMemoryLookupKB.get_vector method
Given a certain entity ID, retrieve its pretrained entity vector.
Name | Description |
---|---|
entity | The entity ID. str |
RETURNS | The entity vector. numpy.ndarray |
InMemoryLookupKB.get_vectors method
Same as get_vector(), but for an arbitrary number of entity IDs.
The default implementation of get_vectors() executes get_vector() in a loop. We recommend implementing a more efficient way to retrieve vectors for multiple entities at once if performance is a concern.
Name | Description |
---|---|
entities | The entity IDs. Iterable[str] |
RETURNS | The entity vectors. Iterable[Iterable[numpy.ndarray]] |
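A small sketch of get_vector() and get_vectors() side by side. The results are wrapped in list() so the comparison does not depend on the exact container type returned. The vectors use values that are exactly representable as 32-bit floats; all values are illustrative.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q1004791", freq=111, entity_vector=[4.0, 5.0, 6.0])

# One vector for one entity ID.
single = list(kb.get_vector("Q42"))

# One vector per entity ID, in the order the IDs were given.
many = [list(vector) for vector in kb.get_vectors(["Q1004791", "Q42"])]

print(single, many)
```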
InMemoryLookupKB.get_prior_prob method
Given a certain entity ID and a certain textual mention, retrieve the prior probability that the mention links to the entity ID.
Name | Description |
---|---|
entity | The entity ID. str |
alias | The textual mention or alias. str |
RETURNS | The prior probability of the alias referring to the entity. float |
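A sketch of looking up prior probabilities. The comparison uses a tolerance because the stored value may be kept at reduced floating-point precision. The second lookup assumes that an entity not listed for the alias yields a prior of 0.0. Values are illustrative.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q1004791", freq=111, entity_vector=[4.0, 5.0, 6.0])
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.6])

# Entity is listed for the alias: returns the stored prior.
p_linked = kb.get_prior_prob("Q42", "Douglas")

# Entity exists in the KB but was not listed for this alias.
# Assumption: this yields 0.0 rather than raising.
p_unlinked = kb.get_prior_prob("Q1004791", "Douglas")

print(round(p_linked, 3), p_unlinked)
```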
InMemoryLookupKB.to_disk method
Save the current state of the knowledge base to a directory.
Name | Description |
---|---|
path | A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects. Union[str,Path] |
exclude | List of components to exclude. Iterable[str] |
InMemoryLookupKB.from_disk method
Restore the state of the knowledge base from a given directory. Note that the Vocab should also be the same as the one used to create the KB.
Name | Description |
---|---|
loc | A path to a directory. Paths may be either strings or Path-like objects. Union[str,Path] |
exclude | List of components to exclude. Iterable[str] |
RETURNS | The modified KnowledgeBase object. KnowledgeBase |
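A round trip through to_disk() and from_disk() can be sketched as follows. The second KB is created with the same Vocab and the same vector length, then populated in place. The temporary directory and the "kb" subdirectory name are illustrative assumptions.

```python
import tempfile
from pathlib import Path

from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

vocab = Vocab()
kb = InMemoryLookupKB(vocab=vocab, entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.9])

with tempfile.TemporaryDirectory() as tmp:
    kb_dir = Path(tmp) / "kb"  # does not exist yet; to_disk() creates it
    kb.to_disk(kb_dir)

    # Same vocabulary and vector length as the KB that was saved.
    restored = InMemoryLookupKB(vocab=vocab, entity_vector_length=3)
    restored.from_disk(kb_dir)

restored_ids = restored.get_entity_strings()
restored_aliases = restored.get_alias_strings()
print(restored_ids, restored_aliases)
```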