Lookups

class v2.2
A container for large lookup tables and dictionaries

This class provides convenient access to large lookup tables and dictionaries, such as lemmatization data or tokenizer exception lists, and uses Bloom filters to speed up lookups. Lookups are available via the Vocab as vocab.lookups, so they can be accessed before the pipeline components are applied (e.g. in the tokenizer and lemmatizer), as well as within the pipeline components via doc.vocab.lookups.
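As a rough illustration of the two access paths described above, the following sketch assumes spaCy v2.2 or later is installed. The table name "my_table" and its contents are made up for the example.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# A fresh Vocab carries its own (initially empty) Lookups container.
vocab = Vocab()

# "my_table" and its data are hypothetical illustration values.
vocab.lookups.add_table("my_table", {"hello": "world"})

# Inside a pipeline component, the same lookups are reachable through the Doc.
doc = Doc(vocab, words=["hello", "there"])
table = doc.vocab.lookups.get_table("my_table")
value = table["hello"]
print(value)
```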

Lookups.__init__ method

Create a Lookups object.

Name | Type | Description

Lookups.__len__ method

Get the current number of tables in the lookups.

Name | Type | Description

Lookups.__contains__ method

Check if the lookups contain a table of a given name. Delegates to Lookups.has_table.

Name | Type | Description
name | unicode | Name of the table.

Lookups.tables property

Get the names of all tables in the lookups.

Name | Type | Description

Lookups.add_table method

Add a new table with optional data to the lookups. Raises an error if the table exists.

Name | Type | Description
name | unicode | Unique name of the table.
data | dict | Optional data to add to the table.
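The methods documented so far can be combined as in the sketch below. It assumes spaCy v2.2 or later is installed; the table name "token_norms" and its data are invented for illustration, and catching ValueError reflects the exception type I would expect for a duplicate table rather than something this page states.

```python
from spacy.lookups import Lookups

lookups = Lookups()

# "token_norms" and its contents are hypothetical illustration data.
table = lookups.add_table("token_norms", {"gonna": "going to"})

print(len(lookups))              # number of tables
print("token_norms" in lookups)  # delegates to has_table
print(lookups.tables)            # names of all tables

# Adding a table under an existing name raises an error
# (assumed here to be a ValueError).
duplicate_rejected = False
try:
    lookups.add_table("token_norms")
except ValueError:
    duplicate_rejected = True
```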

Lookups.get_table method

Get a table from the lookups. Raises an error if the table doesn’t exist.

Name | Type | Description
name | unicode | Name of the table.

Lookups.remove_table method

Remove a table from the lookups. Raises an error if the table doesn’t exist.

Name | Type | Description
name | unicode | Name of the table to remove.

Lookups.has_table method

Check if the lookups contain a table of a given name. Equivalent to Lookups.__contains__.

Name | Type | Description
name | unicode | Name of the table.
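A minimal sketch of checking for, retrieving, and removing a table, assuming spaCy v2.2 or later is installed. The table name "lemma_exc" and its data are hypothetical, and KeyError is my assumption about the error type raised for a missing table.

```python
from spacy.lookups import Lookups

lookups = Lookups()
# "lemma_exc" and its contents are hypothetical illustration data.
lookups.add_table("lemma_exc", {"mice": "mouse"})

# Guard the access so a missing table does not raise.
lemma = None
if lookups.has_table("lemma_exc"):
    table = lookups.get_table("lemma_exc")
    lemma = table["mice"]

lookups.remove_table("lemma_exc")
still_present = "lemma_exc" in lookups

# Getting a table that no longer exists raises an error
# (assumed here to be a KeyError).
missing_raised = False
try:
    lookups.get_table("lemma_exc")
except KeyError:
    missing_raised = True
```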

Lookups.to_bytes method

Serialize the lookups to a bytestring.

Name | Type | Description

Lookups.from_bytes method

Load the lookups from a bytestring.

Name | Type | Description
bytes_data | bytes | The data to load from.
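A byte-level round trip might look like the following sketch, assuming spaCy v2.2 or later is installed. The table name and entries are made up for the example.

```python
from spacy.lookups import Lookups

lookups = Lookups()
# Hypothetical illustration data.
lookups.add_table("lemma_exc", {"mice": "mouse", "geese": "goose"})

# Serialize all tables to a single bytestring.
lookups_bytes = lookups.to_bytes()

# Load the bytestring into a fresh, empty container.
restored = Lookups()
restored.from_bytes(lookups_bytes)

print(restored.tables)
print(restored.get_table("lemma_exc")["geese"])
```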

Lookups.to_disk method

Save the lookups to a directory as lookups.bin. Expects a path to a directory, which will be created if it doesn’t exist.

Name | Type | Description
path | unicode / Path | A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects.

Lookups.from_disk method

Load lookups from a directory containing a lookups.bin. Will skip loading if the file doesn’t exist.

Name | Type | Description
path | unicode / Path | A path to a directory. Paths may be either strings or Path-like objects.
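The disk round trip can be sketched as follows, assuming spaCy v2.2 or later is installed. The directory names and table data are hypothetical; a temporary directory is used so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

from spacy.lookups import Lookups

lookups = Lookups()
# Hypothetical illustration data.
lookups.add_table("lemma_exc", {"mice": "mouse"})

with tempfile.TemporaryDirectory() as tmp:
    # The target directory does not exist yet; to_disk creates it.
    out_dir = Path(tmp) / "lookups_dir"
    lookups.to_disk(out_dir)
    written = sorted(p.name for p in out_dir.iterdir())

    restored = Lookups()
    restored.from_disk(out_dir)

    # Loading from a directory without lookups.bin is skipped silently.
    empty = Lookups()
    empty.from_disk(Path(tmp) / "no_such_dir")

print(written)
print(restored.tables, len(empty))
```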

Table class (OrderedDict)

A table in the lookups. Subclass of OrderedDict that implements a slightly more consistent and unified API and includes a Bloom filter to speed up missed lookups. Supports all other methods and attributes of OrderedDict / dict, and the customized methods listed here. Methods that get or set keys accept both integers and strings (which will be hashed before being added to the table).
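To make the role of the Bloom filter concrete, here is a simplified, pure-Python sketch of the idea: a compact bit array can answer "definitely absent" without touching the dictionary. ToyBloomFilter and ToyTable are hypothetical names invented for this illustration; this is not preshed's or spaCy's actual implementation, and the bit-array size and hash scheme are arbitrary choices.

```python
import hashlib


class ToyBloomFilter:
    """Hypothetical, simplified Bloom filter (not preshed's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        # False means "definitely absent"; True only means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyTable(dict):
    """Hypothetical dict subclass that consults the filter before the dict."""

    def __init__(self):
        super().__init__()
        self.bloom = ToyBloomFilter()

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.bloom.add(key)

    def __contains__(self, key):
        if key not in self.bloom:
            return False  # cheap early exit for most misses
        # The filter can give false positives, so confirm against the dict.
        return super().__contains__(key)


toy = ToyTable()
toy["mice"] = "mouse"
print("mice" in toy, "geese" in toy)
```

The filter never produces false negatives, so a key that was added is always found; false positives are possible, which is why the dictionary is still checked when the filter says "possibly present".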

Table.__init__ method

Initialize a new table.

Name | Type | Description
name | unicode | Optional table name for reference.

Table.from_dict classmethod

Initialize a new table from a dict.

Name | Type | Description
data | dict | The dictionary.
name | unicode | Optional table name for reference.

Table.set method

Set a new key / value pair. String keys will be hashed. Same as table[key] = value.

Name | Type | Description
key | unicode / int | The key.
value | - | The value.
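The sketch below shows table creation and key handling, assuming spaCy v2.2 or later is installed. The keys and values are made up, and the use of spacy.strings.get_string_id to reproduce the hash of a string key is an assumption about how the hashing can be inspected, not something this page documents.

```python
from spacy.lookups import Table
from spacy.strings import get_string_id

# Hypothetical illustration data: one string key and one integer key.
table = Table.from_dict({"foo": "bar", 123: 456}, name="demo")

# Both forms store a key / value pair; string keys are hashed first.
table.set("baz", 100)
table["qux"] = 200

# String keys are stored as integer hashes, so either form retrieves the value.
hashed_key = get_string_id("foo")
print(table["foo"], table[hashed_key])
print("baz" in table, table.get("missing"))
```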

Table.to_bytes method

Serialize the table to a bytestring.

Name | Type | Description

Table.from_bytes method

Load a table from a bytestring.

Name | Type | Description
bytes_data | bytes | The data to load.
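A single table can be serialized on its own in the same way. This sketch assumes spaCy v2.2 or later is installed; the table name and data are hypothetical.

```python
from spacy.lookups import Table

# Hypothetical illustration data.
table = Table.from_dict({"foo": "bar"}, name="demo")
table_bytes = table.to_bytes()

# Load into a new, unnamed table; the name is restored from the bytes.
restored = Table()
restored.from_bytes(table_bytes)

print(restored.name, restored["foo"])
```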

Attributes

Name | Type | Description
name | unicode | Table name.
default_size | int | Default size of the Bloom filter if no data is provided.
bloom | preshed.bloom.BloomFilter | The Bloom filter.