Lookups class
A container for large lookup tables and dictionaries

This class provides convenient access to large lookup tables and dictionaries, such as lemmatization data or tokenizer exception lists, and uses Bloom filters to speed up lookups. Lookups are available via the Vocab as vocab.lookups, so they can be accessed before the pipeline components are applied (e.g. in the tokenizer and lemmatizer), as well as within the pipeline components via doc.vocab.lookups.
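As a minimal sketch of the access pattern described above, the following adds a table to the shared vocab and reads it back through a Doc. It assumes spaCy v3 is installed; the table name "my_exceptions" and its contents are hypothetical and only serve the illustration.

```python
import spacy
from spacy.lookups import Lookups

# A blank English pipeline; no trained model needs to be downloaded.
nlp = spacy.blank("en")

# The lookups live on the shared vocab, so they exist before any text is processed.
lookups = nlp.vocab.lookups
is_lookups = isinstance(lookups, Lookups)

# "my_exceptions" is a hypothetical table name used only for this example.
if not lookups.has_table("my_exceptions"):
    lookups.add_table("my_exceptions", {"dont": "do not"})

# The same tables are reachable from any Doc through doc.vocab.lookups.
doc = nlp("I dont know")
table = doc.vocab.lookups.get_table("my_exceptions")
expansion = table["dont"]
```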

Lookups.__init__ method

Create a Lookups object.

Lookups.__len__ method

Get the current number of tables in the lookups.

Lookups.__contains__ method

Check if the lookups contain a table of a given name. Delegates to Lookups.has_table.

Name | Description
name | Name of the table. Type: str

Lookups.tables property

Get the names of all tables in the lookups.

Lookups.add_table method

Add a new table with optional data to the lookups. Raises an error if the table exists.

Name | Description
name | Unique name of the table. Type: str
data | Optional data to add to the table. Type: dict
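The methods covered so far can be combined as in the sketch below. The table names and entries are made up for illustration, and because this section does not name the exception type raised for a duplicate table, the example catches a generic Exception.

```python
from spacy.lookups import Lookups

lookups = Lookups()
initial_count = len(lookups)  # no tables yet

# Table names and contents are hypothetical examples.
lookups.add_table("lemma_lookup", {"going": "go", "went": "go"})
lookups.add_table("empty_table")  # the data argument is optional

num_tables = len(lookups)                # Lookups.__len__
has_lemmas = "lemma_lookup" in lookups   # Lookups.__contains__
table_names = lookups.tables             # names of all tables

# Adding a table under an existing name raises an error.
try:
    lookups.add_table("lemma_lookup")
    duplicate_rejected = False
except Exception:  # exact exception type is not stated in this section
    duplicate_rejected = True
```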

Lookups.get_table method

Get a table from the lookups. Raises an error if the table doesn’t exist.

Name | Description
name | Name of the table. Type: str

Lookups.remove_table method

Remove a table from the lookups. Raises an error if the table doesn’t exist.

Name | Description
name | Name of the table to remove. Type: str

Lookups.has_table method

Check if the lookups contain a table of a given name. Equivalent to Lookups.__contains__.

Name | Description
name | Name of the table. Type: str
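A short sketch of retrieving, checking and removing a table follows. The table name "lemma_lookup" is hypothetical, and the broad except clause is an assumption made because the exact exception type is not given here.

```python
from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("lemma_lookup", {"went": "go"})  # hypothetical table

# Fetch the table and read a value by its string key.
table = lookups.get_table("lemma_lookup")
lemma = table["went"]

# Checking with has_table first avoids the error raised for missing tables.
if lookups.has_table("lemma_lookup"):
    lookups.remove_table("lemma_lookup")

still_there = lookups.has_table("lemma_lookup")

# Getting a table that no longer exists raises an error.
try:
    lookups.get_table("lemma_lookup")
    lookup_failed = False
except Exception:  # exact exception type is not stated in this section
    lookup_failed = True
```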

Lookups.to_bytes method

Serialize the lookups to a bytestring.

Lookups.from_bytes method

Load the lookups from a bytestring.

Name | Description
bytes_data | The data to load from. Type: bytes
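A byte-level round trip might look like the following sketch, which serializes one Lookups object and loads the result into a fresh one. The table name and its data are hypothetical.

```python
from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("lemma_lookup", {"went": "go"})  # hypothetical table

# Serialize all tables to a single bytestring.
lookups_bytes = lookups.to_bytes()

# Load the bytestring into a new, empty Lookups object.
restored = Lookups()
restored.from_bytes(lookups_bytes)
restored_lemma = restored.get_table("lemma_lookup")["went"]
```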

Lookups.to_disk method

Save the lookups to a directory as lookups.bin. Expects a path to a directory, which will be created if it doesn’t exist.

Name | Description
path | A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects. Type: Union[str, Path]

Lookups.from_disk method

Load lookups from a directory containing a lookups.bin. Will skip loading if the file doesn’t exist.

Name | Description
path | A path to a directory. Paths may be either strings or Path-like objects. Type: Union[str, Path]
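Saving to and loading from disk can be sketched as below. The example uses a temporary directory so it leaves nothing behind; the directory name "my_lookups" and the table contents are hypothetical.

```python
import tempfile
from pathlib import Path

from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("lemma_lookup", {"went": "go"})  # hypothetical table

with tempfile.TemporaryDirectory() as tmp:
    # This subdirectory does not exist yet; to_disk is expected to create it.
    out_dir = Path(tmp) / "my_lookups"
    lookups.to_disk(out_dir)
    wrote_file = (out_dir / "lookups.bin").exists()

    # Load the saved tables into a fresh Lookups object.
    restored = Lookups()
    restored.from_disk(out_dir)
    restored_lemma = restored.get_table("lemma_lookup")["went"]
```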

Table class (OrderedDict)

A table in the lookups. Subclass of OrderedDict that implements a slightly more consistent and unified API and includes a Bloom filter to speed up missed lookups. Supports all other methods and attributes of OrderedDict / dict, and the customized methods listed here. Methods that get or set keys accept both integers and strings (which will be hashed before being added to the table).

Table.__init__ method

Initialize a new table.

Name | Description
name | Optional table name for reference. Type: str

Table.from_dict classmethod

Initialize a new table from a dict.

Name | Description
data | The dictionary. Type: dict
name | Optional table name for reference. Type: str

Table.set method

Set a new key / value pair. String keys will be hashed. Same as table[key] = value.

Name | Description
key | The key. Type: Union[str, int]
value | The value.
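The sketch below shows the three ways of filling a table described so far, plus a lookup by integer key. It assumes that get_string_id from spacy.strings, which this section does not mention, is the helper that produces the hash used for string keys; the table name and entries are hypothetical.

```python
from spacy.lookups import Table
from spacy.strings import get_string_id  # assumption: the hashing helper used for string keys

# Build a table step by step.
table = Table(name="my_table")  # hypothetical name
table.set("foo", "bar")         # same as table["foo"] = "bar"
table["baz"] = 100

# Build an equivalent table from a plain dict in one call.
other = Table.from_dict({"foo": "bar", "baz": 100}, name="my_table")

# String keys are hashed on the way in, so the integer hash of a string
# retrieves the same entry as the string itself.
value_by_hash = table[get_string_id("foo")]

# Regular dict-style helpers also accept string keys.
has_foo = "foo" in table
missing = table.get("not_there")
```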

Table.to_bytes method

Serialize the table to a bytestring.

Table.from_bytes method

Load a table from a bytestring.

Name | Description
bytes_data | The data to load. Type: bytes
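A single table can be serialized on its own, as in this sketch. The table name and contents are hypothetical; the example also checks that the name survives the round trip.

```python
from spacy.lookups import Table

table = Table.from_dict({"went": "go"}, name="lemma_lookup")  # hypothetical table

# Serialize the table to a bytestring.
table_bytes = table.to_bytes()

# Load the bytestring into a new, empty table.
restored = Table()
restored.from_bytes(table_bytes)
restored_name = restored.name
restored_lemma = restored["went"]
```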

Attributes

Name | Description
name | Table name. Type: str
default_size | Default size of Bloom filters if no data is provided. Type: int
bloom | The Bloom filters. Type: preshed.BloomFilter