Lookups
This class provides convenient access to large lookup tables and dictionaries,
e.g. lemmatization data or tokenizer exception lists, and uses Bloom filters to speed up lookups.
Lookups are available via the Vocab as vocab.lookups, so they
can be accessed before the pipeline components are applied (e.g. in the
tokenizer and lemmatizer), as well as within the pipeline components via
doc.vocab.lookups.
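As a minimal sketch of the two access paths described above, the following adds a table through the pipeline's vocab and reads it back through a `Doc`. It assumes spaCy v3 is installed; `demo_table` and its contents are hypothetical names chosen for illustration.

```python
import spacy
from spacy.lookups import Lookups

# A blank English pipeline; no trained model is needed for this sketch.
nlp = spacy.blank("en")

# "demo_table" is a hypothetical table name used only for illustration.
nlp.vocab.lookups.add_table("demo_table", {"colour": "color"})

doc = nlp("A colour test")
# The Doc shares the pipeline's Vocab, so the same tables are reachable from it.
demo_table = doc.vocab.lookups.get_table("demo_table")
print(demo_table["colour"])
```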
Lookups.__init__ method
Create a Lookups object.
Lookups.__len__ method
Get the current number of tables in the lookups.
| Name | Description |
|---|---|
| RETURNS | The number of tables in the lookups. int |
Lookups.__contains__ method
Check if the lookups contain a table of a given name. Delegates to
Lookups.has_table.
| Name | Description |
|---|---|
| name | Name of the table. str |
| RETURNS | Whether a table of that name is in the lookups. bool |
Lookups.tables property
Get the names of all tables in the lookups.
| Name | Description |
|---|---|
| RETURNS | Names of the tables in the lookups. List[str] |
Lookups.add_table method
Add a new table with optional data to the lookups. Raises an error if the table exists.
| Name | Description |
|---|---|
| name | Unique name of the table. str |
| data | Optional data to add to the table. dict |
| RETURNS | The newly added table. Table |
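A short sketch of creating a `Lookups` object, adding a table and inspecting it with `len`, `in` and `tables`. The table name and data are placeholders. The exception type caught for a duplicate name (`ValueError`) reflects spaCy v3 behavior as far as I know and is not stated in this reference, so treat it as an assumption.

```python
from spacy.lookups import Lookups

lookups = Lookups()

# Placeholder table name and data, for illustration only.
table = lookups.add_table("some_table", {"foo": "bar"})

print(len(lookups))             # number of tables
print("some_table" in lookups)  # delegates to has_table
print(lookups.tables)           # names of all tables

# Adding a table under an existing name raises an error.
# Assumption: spaCy v3 raises ValueError here.
duplicate_rejected = False
try:
    lookups.add_table("some_table")
except ValueError:
    duplicate_rejected = True
```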
Lookups.get_table method
Get a table from the lookups. Raises an error if the table doesn’t exist.
| Name | Description |
|---|---|
| name | Name of the table. str |
| RETURNS | The table. Table |
Lookups.remove_table method
Remove a table from the lookups. Raises an error if the table doesn’t exist.
| Name | Description |
|---|---|
| name | Name of the table to remove. str |
| RETURNS | The removed table. Table |
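The following sketch retrieves a table and then removes it. The names are placeholders. Catching `KeyError` for a missing table is an assumption based on spaCy v3 behavior; this reference only says that an error is raised.

```python
from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})  # placeholder data

# Fetch the table by name and read a value from it.
fetched = lookups.get_table("some_table")
print(fetched["foo"])

# remove_table returns the table that was removed.
removed = lookups.remove_table("some_table")
print(removed.name)

# Asking for a table that no longer exists raises an error.
# Assumption: spaCy v3 raises KeyError here.
missing_rejected = False
try:
    lookups.get_table("some_table")
except KeyError:
    missing_rejected = True
```

Checking `"some_table" in lookups` first is an alternative to handling the exception.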
Lookups.has_table method
Check if the lookups contain a table of a given name. Equivalent to
Lookups.__contains__.
| Name | Description |
|---|---|
| name | Name of the table. str |
| RETURNS | Whether a table of that name is in the lookups. bool |
Lookups.to_bytes method
Serialize the lookups to a bytestring.
| Name | Description |
|---|---|
| RETURNS | The serialized lookups. bytes |
Lookups.from_bytes method
Load the lookups from a bytestring.
| Name | Description |
|---|---|
| bytes_data | The data to load from. bytes |
| RETURNS | The loaded lookups. Lookups |
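A minimal round-trip sketch for `to_bytes` and `from_bytes`, using placeholder table names and data. The bytestring is loaded into a fresh `Lookups` object.

```python
from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})  # placeholder data

# Serialize all tables to a single bytestring.
lookups_bytes = lookups.to_bytes()

# Load the bytestring into a new, empty Lookups object.
restored_lookups = Lookups().from_bytes(lookups_bytes)
print(restored_lookups.tables)
print(restored_lookups.get_table("some_table")["foo"])
```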
Lookups.to_disk method
Save the lookups to a directory as lookups.bin. Expects a path to a directory,
which will be created if it doesn’t exist.
| Name | Description |
|---|---|
| path | A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects. Union[str,Path] |
Lookups.from_disk method
Load lookups from a directory containing a lookups.bin. Will skip loading if
the file doesn’t exist.
| Name | Description |
|---|---|
| path | A path to a directory. Paths may be either strings or Path-like objects. Union[str,Path] |
| RETURNS | The loaded lookups. Lookups |
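The sketch below saves lookups to a directory and loads them back. It writes into a temporary directory so that it can run anywhere; the subdirectory name `lookups_dir` and the table contents are placeholders.

```python
import tempfile
from pathlib import Path

from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})  # placeholder data

with tempfile.TemporaryDirectory() as tmp_dir:
    # The target directory does not exist yet; to_disk creates it.
    target = Path(tmp_dir) / "lookups_dir"
    lookups.to_disk(target)
    file_written = (target / "lookups.bin").exists()

    # Load the saved file into a fresh Lookups object.
    loaded_lookups = Lookups().from_disk(target)

print(file_written)
print(loaded_lookups.get_table("some_table")["foo"])
```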
Table class, OrderedDict
A table in the lookups. Subclass of OrderedDict that implements a slightly
more consistent and unified API and includes a Bloom filter to speed up missed
lookups. Supports all other methods and attributes of OrderedDict /
dict, as well as the customized methods listed here. Methods that get or set keys
accept both integers and strings (strings are hashed before being added to the
table).
Table.__init__ method
Initialize a new table.
| Name | Description |
|---|---|
| name | Optional table name for reference. str |
Table.from_dict classmethod
Initialize a new table from a dict.
| Name | Description |
|---|---|
| data | The dictionary. dict |
| name | Optional table name for reference. str |
| RETURNS | The newly constructed object. Table |
Table.set method
Set a new key / value pair. String keys will be hashed. Same as
table[key] = value.
| Name | Description |
|---|---|
| key | The key. Union[str, int] |
| value | The value. |
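A short sketch of building a table with `Table.from_dict`, setting keys, and reading them back by string or by hash. The keys and values are placeholders. `get_string_id` is imported from `spacy.strings` on the assumption that it is the same hashing helper the table uses internally; this reference does not name it.

```python
from spacy.lookups import Table
from spacy.strings import get_string_id

# Placeholder data and table name, for illustration only.
table = Table.from_dict({"foo": "bar", "baz": 100}, name="some_table")

# Both forms set a key / value pair; string keys are hashed on the way in.
table.set("qux", "quux")
table["hello"] = "world"

print("foo" in table)  # membership test with a string key
print(table["foo"])    # lookup with a string key

# Assumption: get_string_id gives the integer hash the table stores for "foo",
# so the integer key reaches the same entry.
foo_hash = get_string_id("foo")
print(table[foo_hash])

# Other dict methods remain available, e.g. get with a default.
print(table.get("missing", "fallback"))
```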
Table.to_bytes method
Serialize the table to a bytestring.
| Name | Description |
|---|---|
| RETURNS | The serialized table. bytes |
Table.from_bytes method
Load a table from a bytestring.
| Name | Description |
|---|---|
| bytes_data | The data to load. bytes |
| RETURNS | The loaded table. Table |
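A minimal round-trip sketch for a single table, using placeholder data. The serialized bytes are loaded into a fresh `Table`, which is expected to carry over the name and entries.

```python
from spacy.lookups import Table

# Placeholder data and table name, for illustration only.
table = Table.from_dict({"foo": "bar"}, name="some_table")

# Serialize the table to a bytestring.
table_bytes = table.to_bytes()

# Load the bytestring into a new, empty table.
restored_table = Table().from_bytes(table_bytes)
print(restored_table.name)
print(restored_table["foo"])
```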
Attributes
| Name | Description |
|---|---|
| name | Table name. str |
| default_size | Default size of the Bloom filter if no data is provided. int |
| bloom | The Bloom filter. preshed.BloomFilter |