KnowledgeBase
The KnowledgeBase object provides a method to generate Candidate objects, which are plausible external identifiers given a certain textual mention. Each Candidate holds information from the relevant KB entity, such as its frequency in text and possible aliases. Each entity in the knowledge base also has a pretrained entity vector of a fixed size.
KnowledgeBase.__init__ method
Create the knowledge base.
Name | Description | Type |
---|---|---|
vocab | The shared vocabulary. | Vocab |
entity_vector_length | Length of the fixed-size entity vectors. | int |
KnowledgeBase.entity_vector_length property
The length of the fixed-size entity vectors in the knowledge base.
Name | Description | Type |
---|---|---|
RETURNS | Length of the fixed-size entity vectors. | int |
KnowledgeBase.add_entity method
Add an entity to the knowledge base, specifying its corpus frequency and entity vector, which should be of length entity_vector_length.
Name | Description | Type |
---|---|---|
entity | The unique entity identifier. | str |
freq | The frequency of the entity in a typical corpus. | float |
entity_vector | The pretrained vector of the entity. | numpy.ndarray |
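
The fixed-length requirement can be pictured with a small plain-Python model. This is only a sketch: `ToyEntityStore` is a hypothetical stand-in written for illustration, it uses lists of floats instead of numpy.ndarray, and raising `ValueError` on a length mismatch is an assumption rather than documented behavior.

```python
from typing import Dict, List, Tuple


class ToyEntityStore:
    """Hypothetical stand-in for a knowledge base; not the library's code."""

    def __init__(self, entity_vector_length: int) -> None:
        self.entity_vector_length = entity_vector_length
        # entity ID -> (corpus frequency, entity vector)
        self._entities: Dict[str, Tuple[float, List[float]]] = {}

    def add_entity(self, entity: str, freq: float, entity_vector: List[float]) -> None:
        # Every vector must match the fixed length chosen at construction time.
        if len(entity_vector) != self.entity_vector_length:
            raise ValueError(
                f"expected a vector of length {self.entity_vector_length}, "
                f"got {len(entity_vector)}"
            )
        self._entities[entity] = (freq, list(entity_vector))

    def get_vector(self, entity: str) -> List[float]:
        return self._entities[entity][1]

    def __len__(self) -> int:
        return len(self._entities)


store = ToyEntityStore(entity_vector_length=3)
store.add_entity("Q42", freq=32.0, entity_vector=[0.1, 0.2, 0.3])
```

Rejecting a mismatched vector at insertion time keeps every stored vector comparable with every other one, which is the point of fixing the size up front.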
KnowledgeBase.set_entities method
Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity.
Name | Description | Type |
---|---|---|
entity_list | List of unique entity identifiers. | Iterable[Union[str, int]] |
freq_list | List of entity frequencies. | Iterable[int] |
vector_list | List of entity vectors. | Iterable[numpy.ndarray] |
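
The three arguments are parallel lists: position i in each list describes the same entity. A minimal sketch of that pairing follows, in plain Python with lists of floats instead of numpy.ndarray. The helper `build_entity_table` is hypothetical, and the length check is an assumption about what a sensible implementation would enforce.

```python
from typing import Dict, List, Sequence, Tuple


def build_entity_table(
    entity_list: Sequence[str],
    freq_list: Sequence[int],
    vector_list: Sequence[List[float]],
) -> Dict[str, Tuple[int, List[float]]]:
    """Pair up three parallel lists into one entity ID -> (freq, vector) table."""
    if not (len(entity_list) == len(freq_list) == len(vector_list)):
        raise ValueError(
            "entity_list, freq_list and vector_list must be the same length"
        )
    return {
        entity: (freq, vector)
        for entity, freq, vector in zip(entity_list, freq_list, vector_list)
    }


# Example values are made up for illustration.
table = build_entity_table(
    ["Q1004791", "Q42", "Q5301561"],
    [6, 342, 12],
    [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]],
)
```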
KnowledgeBase.add_alias method
Add an alias or mention to the knowledge base, specifying its potential KB identifiers and their prior probabilities. The entity identifiers should refer to entities previously added with add_entity or set_entities. The sum of the prior probabilities should not exceed 1. Note that an empty string cannot be used as an alias.
Name | Description | Type |
---|---|---|
alias | The textual mention or alias. Cannot be the empty string. | str |
entities | The potential entities that the alias may refer to. | Iterable[Union[str, int]] |
probabilities | The prior probabilities of each entity. | Iterable[float] |
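
The constraints on an alias entry (non-empty alias, known entities, priors summing to at most 1) can be expressed as a small validation routine. The sketch below is plain Python written for illustration. `check_alias` and its `known_entities` parameter are hypothetical, and the equal-length check and the rounding tolerance are assumptions.

```python
from typing import Iterable, List, Set, Tuple


def check_alias(
    alias: str,
    entities: Iterable[str],
    probabilities: Iterable[float],
    known_entities: Set[str],
) -> List[Tuple[str, float]]:
    """Validate one alias entry and return its (entity, prior probability) pairs."""
    entities = list(entities)
    probabilities = list(probabilities)
    if alias == "":
        raise ValueError("alias cannot be the empty string")
    if len(entities) != len(probabilities):
        raise ValueError("entities and probabilities must have the same length")
    unknown = [e for e in entities if e not in known_entities]
    if unknown:
        raise ValueError(f"entities not in the knowledge base: {unknown}")
    # Allow a tiny tolerance for floating-point rounding.
    if sum(probabilities) > 1.0 + 1e-9:
        raise ValueError("prior probabilities sum to more than 1")
    return list(zip(entities, probabilities))


# Example values are made up for illustration.
known = {"Q1004791", "Q42", "Q5301561"}
pairs = check_alias(
    "Douglas", ["Q1004791", "Q42", "Q5301561"], [0.6, 0.1, 0.2], known
)
```

Here the priors sum to 0.9, which is allowed: the rule is an upper bound, not a requirement that the values sum to exactly 1.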
KnowledgeBase.__len__ method
Get the total number of entities in the knowledge base.
Name | Description | Type |
---|---|---|
RETURNS | The number of entities in the knowledge base. | int |
KnowledgeBase.get_entity_strings method
Get a list of all entity IDs in the knowledge base.
Name | Description | Type |
---|---|---|
RETURNS | The list of entities in the knowledge base. | List[str] |
KnowledgeBase.get_size_aliases method
Get the total number of aliases in the knowledge base.
Name | Description | Type |
---|---|---|
RETURNS | The number of aliases in the knowledge base. | int |
KnowledgeBase.get_alias_strings method
Get a list of all aliases in the knowledge base.
Name | Description | Type |
---|---|---|
RETURNS | The list of aliases in the knowledge base. | List[str] |
KnowledgeBase.get_alias_candidates method
Given a textual mention as input, retrieve a list of candidate entities of type Candidate.
Name | Description | Type |
---|---|---|
alias | The textual mention or alias. | str |
RETURNS | The list of relevant Candidate objects. | List[Candidate] |
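
Conceptually, candidate generation is a lookup from an alias to the entities registered for it. The following plain-Python sketch models that lookup. `ToyCandidate`, `ALIAS_TABLE` and `toy_alias_candidates` are hypothetical names, the table contents are made up, and returning an empty list for an unseen alias is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToyCandidate:
    """Hypothetical, simplified stand-in for a Candidate."""

    alias_: str
    entity_: str
    prior_prob: float


# alias -> {entity ID: prior probability}
ALIAS_TABLE: Dict[str, Dict[str, float]] = {
    "Douglas": {"Q1004791": 0.6, "Q42": 0.1, "Q5301561": 0.2},
}


def toy_alias_candidates(alias: str) -> List[ToyCandidate]:
    # An alias the table has never seen simply yields no candidates.
    return [
        ToyCandidate(alias_=alias, entity_=entity, prior_prob=prob)
        for entity, prob in ALIAS_TABLE.get(alias, {}).items()
    ]


candidates = toy_alias_candidates("Douglas")
```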
KnowledgeBase.get_vector method
Given a certain entity ID, retrieve its pretrained entity vector.
Name | Description | Type |
---|---|---|
entity | The entity ID. | str |
RETURNS | The entity vector. | numpy.ndarray |
KnowledgeBase.get_prior_prob method
Given an entity ID and a textual mention, retrieve the prior probability that the mention links to that entity.
Name | Description | Type |
---|---|---|
entity | The entity ID. | str |
alias | The textual mention or alias. | str |
RETURNS | The prior probability of the alias referring to the entity. | float |
KnowledgeBase.to_disk method
Save the current state of the knowledge base to a directory.
Name | Description | Type |
---|---|---|
loc | A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects. | Union[str, Path] |
KnowledgeBase.from_disk method
Restore the state of the knowledge base from a given directory. Note that the Vocab should also be the same as the one used to create the KB.
Name | Description | Type |
---|---|---|
loc | A path to a directory. Paths may be either strings or Path-like objects. | Union[str, Path] |
RETURNS | The modified KnowledgeBase object. | KnowledgeBase |
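
The path handling described for to_disk and from_disk (accept a string or a Path, create the directory when saving) follows a common pattern that can be sketched with the standard library. Everything below is illustrative: `save_state`, `load_state`, the `entities.json` file name and the JSON format are assumptions, not the on-disk layout the library uses, and the shared-Vocab requirement is not modeled.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Union


def save_state(state: Dict[str, float], loc: Union[str, Path]) -> None:
    loc = Path(loc)  # accept both plain strings and Path-like objects
    loc.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    (loc / "entities.json").write_text(json.dumps(state), encoding="utf8")


def load_state(loc: Union[str, Path]) -> Dict[str, float]:
    text = (Path(loc) / "entities.json").read_text(encoding="utf8")
    return json.loads(text)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_kb"  # does not exist yet
    save_state({"Q42": 342.0}, str(target))  # location passed as a string
    restored = load_state(target)  # location passed as a Path
```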
Candidate class
A Candidate object refers to a textual mention (alias) that may or may not be resolved to a specific entity from a KnowledgeBase. Candidates are used as input for the entity linking algorithm, which disambiguates between the various candidates to select the correct one. Each candidate (alias, entity) pair is assigned a prior probability.
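
To make the role of the prior probability concrete, here is a deliberately naive baseline that picks the candidate with the highest prior and ignores context entirely. It is a sketch for illustration, not the entity linking algorithm itself. `pick_by_prior` is a hypothetical helper, and candidates are modeled as plain (alias, entity, prior probability) tuples with made-up values.

```python
from typing import List, Optional, Tuple


def pick_by_prior(candidates: List[Tuple[str, str, float]]) -> Optional[str]:
    """Return the entity ID with the highest prior, or None without candidates."""
    if not candidates:
        return None
    _alias, entity, _prior = max(candidates, key=lambda c: c[2])
    return entity


candidates = [
    ("Douglas", "Q1004791", 0.6),
    ("Douglas", "Q42", 0.1),
    ("Douglas", "Q5301561", 0.2),
]
best = pick_by_prior(candidates)
```

A real linker would typically combine the prior with other evidence, such as the sentence around the mention, which is why the candidates are only the input to disambiguation rather than its result.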
Candidate.__init__ method
Construct a Candidate object. Usually this constructor is not called directly; instead, these objects are returned by the get_candidates method of the entity_linker pipe.
Name | Description | Type |
---|---|---|
kb | The knowledge base that defined this candidate. | KnowledgeBase |
entity_hash | The hash of the entity’s KB ID. | int |
entity_freq | The entity frequency as recorded in the KB. | float |
alias_hash | The hash of the textual mention or alias. | int |
prior_prob | The prior probability of the alias referring to the entity. | float |
Candidate attributes
Name | Description | Type |
---|---|---|
entity | The hash of the entity’s unique KB identifier. | int |
entity_ | The entity’s unique KB identifier as a string. | str |
alias | The hash of the alias or textual mention. | int |
alias_ | The alias or textual mention as a string. | str |
prior_prob | The prior probability of the alias referring to the entity. | float |
entity_freq | The frequency of the entity in a typical corpus. | float |
entity_vector | The pretrained vector of the entity. | numpy.ndarray |
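
The paired attributes (entity and entity_, alias and alias_) expose the same value twice: once as an integer hash and once as the string it stands for. The sketch below models that pairing in plain Python. `ToyStringStore` and `HashedCandidate` are hypothetical, and the SHA-256-based hash is an arbitrary choice for illustration, not the hash function the library uses.

```python
import hashlib
from typing import Dict


class ToyStringStore:
    """Hypothetical two-way lookup between strings and integer hashes."""

    def __init__(self) -> None:
        self._by_hash: Dict[int, str] = {}

    def add(self, text: str) -> int:
        # Assumption: any stable 64-bit hash is good enough for the illustration.
        digest = hashlib.sha256(text.encode("utf8")).digest()
        key = int.from_bytes(digest[:8], "big")
        self._by_hash[key] = text
        return key

    def __getitem__(self, key: int) -> str:
        return self._by_hash[key]


class HashedCandidate:
    """Stores hashes only; string forms are resolved on access."""

    def __init__(self, strings: ToyStringStore, entity_hash: int, alias_hash: int) -> None:
        self._strings = strings
        self.entity = entity_hash  # integer form
        self.alias = alias_hash  # integer form

    @property
    def entity_(self) -> str:
        return self._strings[self.entity]

    @property
    def alias_(self) -> str:
        return self._strings[self.alias]


strings = ToyStringStore()
candidate = HashedCandidate(strings, strings.add("Q42"), strings.add("Douglas"))
```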