KnowledgeBase class
A storage class for the entities and aliases of a specific knowledge base (ontology).

The KnowledgeBase object provides a method to generate Candidate objects, which are plausible external identifiers for a given textual mention. Each Candidate holds information about the relevant KB entity, such as its frequency in text and its possible aliases. Each entity in the knowledge base also has a pretrained entity vector of a fixed size.

KnowledgeBase.__init__ method

Create the knowledge base.

Name | Description | Type
vocab | The shared vocabulary. | Vocab
entity_vector_length | Length of the fixed-size entity vectors. | int
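
As a minimal sketch of the constructor, the snippet below builds an empty knowledge base from a blank Vocab. The vector length of 3 is an arbitrary illustration value. The import fallback is an assumption for spaCy 3.5 and later, where KnowledgeBase is an abstract base class and the in-memory implementation is exposed as InMemoryLookupKB.

```python
from spacy.vocab import Vocab

try:
    # Assumption: spaCy 3.5+, where the concrete class is InMemoryLookupKB.
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ImportError:
    # Earlier v3 releases: KnowledgeBase can be instantiated directly.
    from spacy.kb import KnowledgeBase

vocab = Vocab()
kb = KnowledgeBase(vocab=vocab, entity_vector_length=3)

# The configured vector length is exposed as a property.
print(kb.entity_vector_length)  # 3
# A freshly created knowledge base holds no entities.
print(len(kb))                  # 0
```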

KnowledgeBase.entity_vector_length property

The length of the fixed-size entity vectors in the knowledge base.

KnowledgeBase.add_entity method

Add an entity to the knowledge base, specifying its corpus frequency and entity vector, which should be of length entity_vector_length.

Name | Description | Type
entity | The unique entity identifier. | str
freq | The frequency of the entity in a typical corpus. | float
entity_vector | The pretrained vector of the entity. | numpy.ndarray
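
The following sketch adds two entities one at a time. The identifiers, frequencies and vectors are made-up illustration values, and plain lists of floats stand in for pretrained vectors. It also assumes that a vector of the wrong length is rejected with a ValueError, and the import fallback is an assumption for spaCy 3.5 and later.

```python
from spacy.vocab import Vocab

try:
    # Assumption: spaCy 3.5+, where the concrete class is InMemoryLookupKB.
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ImportError:
    from spacy.kb import KnowledgeBase

kb = KnowledgeBase(vocab=Vocab(), entity_vector_length=3)

# Each vector must have exactly entity_vector_length (here 3) values.
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q1004791", freq=111, entity_vector=[0.5, 0.0, -1.0])

# A vector of the wrong length is expected to be refused.
wrong_length_rejected = False
try:
    kb.add_entity(entity="Q463035", freq=5, entity_vector=[1.0, 2.0])
except ValueError:
    wrong_length_rejected = True

print(len(kb))                 # 2
print(wrong_length_rejected)   # True
```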

KnowledgeBase.set_entities method

Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity.

Name | Description | Type
entity_list | List of unique entity identifiers. | Iterable[Union[str, int]]
freq_list | List of entity frequencies. | Iterable[int]
vector_list | List of entity vectors. | Iterable[numpy.ndarray]
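
For bulk loading, the three lists are matched by position. The sketch below uses made-up identifiers, frequencies and vectors, and the import fallback is an assumption for spaCy 3.5 and later.

```python
from spacy.vocab import Vocab

try:
    # Assumption: spaCy 3.5+, where the concrete class is InMemoryLookupKB.
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ImportError:
    from spacy.kb import KnowledgeBase

kb = KnowledgeBase(vocab=Vocab(), entity_vector_length=3)

# Position i in each list describes the same entity.
kb.set_entities(
    entity_list=["Q42", "Q1004791"],
    freq_list=[32, 111],
    vector_list=[[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]],
)

print(len(kb))  # 2
```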

KnowledgeBase.add_alias method

Add an alias or mention to the knowledge base, specifying its potential KB identifiers and their prior probabilities. The entity identifiers should refer to entities previously added with add_entity or set_entities. The sum of the prior probabilities should not exceed 1. Note that an empty string cannot be used as an alias.

Name | Description | Type
alias | The textual mention or alias. Cannot be the empty string. | str
entities | The potential entities that the alias may refer to. | Iterable[Union[str, int]]
probabilities | The prior probabilities of each entity. | Iterable[float]
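
The constraints on aliases can be sketched as follows. The entity IDs, aliases and probabilities are illustration values, is_rejected is a hypothetical helper written for this example, and the sketch assumes that violations of the two rules surface as a ValueError. The import fallback is an assumption for spaCy 3.5 and later.

```python
from spacy.vocab import Vocab

try:
    # Assumption: spaCy 3.5+, where the concrete class is InMemoryLookupKB.
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ImportError:
    from spacy.kb import KnowledgeBase

kb = KnowledgeBase(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q1004791", freq=111, entity_vector=[0.5, 0.0, -1.0])

# Valid: both entities exist and the priors sum to 0.9 (not above 1).
kb.add_alias(alias="Douglas", entities=["Q42", "Q1004791"], probabilities=[0.6, 0.3])
kb.add_alias(alias="Adams", entities=["Q42"], probabilities=[0.9])


def is_rejected(alias, entities, probabilities):
    """Hypothetical helper: True if add_alias raises ValueError."""
    try:
        kb.add_alias(alias=alias, entities=entities, probabilities=probabilities)
    except ValueError:
        return True
    return False


# Priors summing to 1.2 exceed the allowed total of 1.
sum_too_large = is_rejected("Hitchhiker", ["Q42", "Q1004791"], [0.7, 0.5])
# The empty string is not a valid alias.
empty_alias = is_rejected("", ["Q42"], [0.5])

print(kb.get_size_aliases())       # 2
print(sum_too_large, empty_alias)  # True True
```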

KnowledgeBase.__len__ method

Get the total number of entities in the knowledge base.

KnowledgeBase.get_entity_strings method

Get a list of all entity IDs in the knowledge base.

KnowledgeBase.get_size_aliases method

Get the total number of aliases in the knowledge base.

KnowledgeBase.get_alias_strings method

Get a list of all aliases in the knowledge base.
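
The four inspection methods (__len__, get_entity_strings, get_size_aliases and get_alias_strings) can be tried together on a small knowledge base. The data is made up, the entity IDs are sorted before printing because no ordering is documented, and the import fallback is an assumption for spaCy 3.5 and later.

```python
from spacy.vocab import Vocab

try:
    # Assumption: spaCy 3.5+, where the concrete class is InMemoryLookupKB.
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ImportError:
    from spacy.kb import KnowledgeBase

kb = KnowledgeBase(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q1004791", freq=111, entity_vector=[0.5, 0.0, -1.0])
kb.add_alias(alias="Douglas", entities=["Q42", "Q1004791"], probabilities=[0.6, 0.3])

print(len(kb))                          # 2 entities
print(kb.get_size_aliases())            # 1 alias
print(sorted(kb.get_entity_strings()))  # ['Q1004791', 'Q42']
print(kb.get_alias_strings())           # ['Douglas']
```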

KnowledgeBase.get_alias_candidates method

Given a certain textual mention as input, retrieve a list of candidate entities of type Candidate.

Name | Description | Type
alias | The textual mention or alias. | str
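
A short sketch of candidate retrieval, using made-up data. It assumes that a mention unknown to the knowledge base yields an empty list, which this page does not state, and the import fallback is an assumption for spaCy 3.5 and later.

```python
from spacy.vocab import Vocab

try:
    # Assumption: spaCy 3.5+, where the concrete class is InMemoryLookupKB.
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ImportError:
    from spacy.kb import KnowledgeBase

kb = KnowledgeBase(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q1004791", freq=111, entity_vector=[0.5, 0.0, -1.0])
kb.add_alias(alias="Douglas", entities=["Q42", "Q1004791"], probabilities=[0.6, 0.3])

# One Candidate per entity registered for the alias.
candidates = kb.get_alias_candidates("Douglas")
for cand in candidates:
    print(cand.entity_, cand.alias_, round(cand.prior_prob, 2))

# Assumption: a mention with no registered alias produces no candidates.
unknown = kb.get_alias_candidates("Nobody")
print(len(unknown))
```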

KnowledgeBase.get_vector method

Given a certain entity ID, retrieve its pretrained entity vector.

Name | Description | Type
entity | The entity ID. | str

KnowledgeBase.get_prior_prob method

Given an entity ID and a textual mention, retrieve the prior probability that the mention links to that entity ID.

Name | Description | Type
entity | The entity ID. | str
alias | The textual mention or alias. | str
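
The two lookups, get_vector and get_prior_prob, can be sketched on made-up data as follows. The prior is rounded before printing because the stored value may not be bit-identical to the Python float that was passed in, and the import fallback is an assumption for spaCy 3.5 and later.

```python
from spacy.vocab import Vocab

try:
    # Assumption: spaCy 3.5+, where the concrete class is InMemoryLookupKB.
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ImportError:
    from spacy.kb import KnowledgeBase

kb = KnowledgeBase(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.6])

# Entity ID -> pretrained vector.
vec = kb.get_vector("Q42")
print(list(vec))        # [1.0, 2.0, 3.0]

# (entity ID, mention) -> prior probability; note the argument order.
prob = kb.get_prior_prob("Q42", "Douglas")
print(round(prob, 2))   # 0.6
```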

KnowledgeBase.to_disk method

Save the current state of the knowledge base to a directory.

Name | Description | Type
loc | A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects. | Union[str, Path]

KnowledgeBase.from_disk method

Restore the state of the knowledge base from a given directory. Note that the Vocab should also be the same as the one used to create the KB.

Name | Description | Type
loc | A path to a directory. Paths may be either strings or Path-like objects. | Union[str, Path]
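
A round trip through to_disk and from_disk might look like the sketch below. The directory name my_kb and the data are made up, the path is passed positionally, and the restored knowledge base is created with the same Vocab and vector length as the original. The import fallback is an assumption for spaCy 3.5 and later.

```python
import tempfile
from pathlib import Path

from spacy.vocab import Vocab

try:
    # Assumption: spaCy 3.5+, where the concrete class is InMemoryLookupKB.
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ImportError:
    from spacy.kb import KnowledgeBase

vocab = Vocab()
kb = KnowledgeBase(vocab=vocab, entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.8])

with tempfile.TemporaryDirectory() as tmp:
    kb_dir = Path(tmp) / "my_kb"  # does not exist yet; to_disk creates it
    kb.to_disk(kb_dir)

    # Restore into a fresh KB that shares the original Vocab.
    restored = KnowledgeBase(vocab=vocab, entity_vector_length=3)
    restored.from_disk(kb_dir)

print(len(restored))                 # 1
print(restored.get_alias_strings())  # ['Douglas']
```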

Candidate class

A Candidate object refers to a textual mention (alias) that may or may not be resolved to a specific entity from a KnowledgeBase. Candidates are used as input for the entity linking algorithm, which disambiguates them to select the correct one. Each candidate (alias, entity) pair is assigned a prior probability.

Candidate.__init__ method

Construct a Candidate object. Usually this constructor is not called directly, but instead these objects are returned by the get_candidates method of the entity_linker pipe.

Name | Description | Type
kb | The knowledge base that defined this candidate. | KnowledgeBase
entity_hash | The hash of the entity’s KB ID. | int
entity_freq | The entity frequency as recorded in the KB. | float
alias_hash | The hash of the textual mention or alias. | int
prior_prob | The prior probability of the alias referring to the entity. | float

Candidate attributes

Name | Description | Type
entity | The hash of the entity’s unique KB identifier. | int
entity_ | The entity’s unique KB identifier. | str
alias | The hash of the alias or textual mention. | int
alias_ | The alias or textual mention. | str
prior_prob | The prior probability of the alias referring to the entity. | float
entity_freq | The frequency of the entity in a typical corpus. | float
entity_vector | The pretrained vector of the entity. | numpy.ndarray
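
To illustrate how the integer and string attributes relate, the sketch below reads every attribute of a single candidate. The data is made up. It assumes that the integer attributes are hashes that the shared Vocab's string store can resolve back to text, and the import fallback is an assumption for spaCy 3.5 and later.

```python
from spacy.vocab import Vocab

try:
    # Assumption: spaCy 3.5+, where the concrete class is InMemoryLookupKB.
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ImportError:
    from spacy.kb import KnowledgeBase

vocab = Vocab()
kb = KnowledgeBase(vocab=vocab, entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Adams", entities=["Q42"], probabilities=[0.9])

cand = kb.get_alias_candidates("Adams")[0]

# String forms of the identifier and the mention.
print(cand.entity_, cand.alias_)        # Q42 Adams

# Integer forms are hashes; the string store maps them back to text.
print(vocab.strings[cand.entity])       # Q42
print(vocab.strings[cand.alias])        # Adams

# Numeric information copied from the knowledge base entry.
print(round(cand.prior_prob, 2))        # 0.9
print(cand.entity_freq)                 # 32.0
print(list(cand.entity_vector))         # [1.0, 2.0, 3.0]
```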