BaseVectors

class, v3.7
Abstract class for word vectors

BaseVectors is an abstract class that supports the development of custom vector implementations.

To use custom vectors with StaticVectors during training, get_batch must be implemented. For better performance, batch efficiently in get_batch and implement to_ops to copy the vector data to the current device. See an example custom implementation for BPEmb subword embeddings.
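The requirements above can be sketched roughly as follows. This is a plain-Python stand-in that mirrors the documented method names; it does not subclass the real BaseVectors, and the class name `ToyVectors`, the `key2row` mapping and the list-based storage are assumptions made for illustration. A real implementation would subclass BaseVectors and return array data rather than lists.

```python
from typing import Dict, Iterable, List, Union

Key = Union[int, str]


class ToyVectors:
    """Stand-in that mirrors the documented interface; not the spaCy class."""

    def __init__(self, *, strings=None):
        # Hypothetical storage: a key-to-row index plus a list of rows.
        self.strings = strings
        self.key2row: Dict[Key, int] = {"cat": 0, "dog": 1}
        self.data: List[List[float]] = [[1.0, 0.0], [0.0, 1.0]]

    def get_batch(self, keys: Iterable[Key]) -> List[List[float]]:
        # Resolve every key to a row index first, then gather the rows
        # in a single pass instead of looking up vectors one call at a time.
        rows = [self.key2row[key] for key in keys]
        return [self.data[row] for row in rows]

    def to_ops(self, ops) -> None:
        # Assumption: a real implementation would move self.data to the
        # device owned by ops here. This sketch leaves the data unchanged.
        pass


vectors = ToyVectors()
batch = vectors.get_batch(["dog", "cat", "dog"])
```

Repeated keys simply gather the same row more than once, so the batch has one row per requested key.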

BaseVectors.__init__ method

Create a new vector store.

| Name | Description |
| --- | --- |
| keyword-only | |
| strings | The string store. A new string store is created if one is not provided. Defaults to None. Type: Optional[StringStore] |

BaseVectors.__getitem__ method

Get a vector by key. If the key is not found in the table, a KeyError should be raised.

| Name | Description |
| --- | --- |
| key | The key to get the vector for. Type: Union[int, str] |

BaseVectors.__len__ method

Return the number of vectors in the table.

BaseVectors.__contains__ method

Check whether there is a vector entry for the given key.

| Name | Description |
| --- | --- |
| key | The key to check. Type: int |
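As an illustration of the lookup contract described for `__getitem__`, `__len__` and `__contains__`, here is a minimal sketch. `LookupTable` and its dict-backed storage are hypothetical and stand in for a real BaseVectors subclass.

```python
from typing import Dict, List


class LookupTable:
    """Hypothetical stand-in showing the lookup contract only."""

    def __init__(self) -> None:
        self.rows: Dict[int, List[float]] = {10: [0.5, 0.5], 20: [1.0, 0.0]}

    def __getitem__(self, key: int) -> List[float]:
        # A plain dict lookup already raises KeyError for unknown keys.
        return self.rows[key]

    def __len__(self) -> int:
        # Number of vectors in the table.
        return len(self.rows)

    def __contains__(self, key: int) -> bool:
        return key in self.rows


table = LookupTable()
found = 10 in table
missing = 99 in table

try:
    table[99]
    raised = False
except KeyError:
    raised = True
```

Checking membership with `in` before indexing avoids the KeyError for callers that cannot guarantee the key exists.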

BaseVectors.add method

Add a key to the table, if possible. If no keys can be added, return -1.

| Name | Description |
| --- | --- |
| key | The key to add. Type: Union[str, int] |
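One way the "return -1 when nothing can be added" behavior might look in a fixed-capacity table is sketched below. `FixedTable` and `capacity` are hypothetical, and returning the row index on success (or the existing row for a known key) is an assumption, not something stated above.

```python
from typing import Dict, Union

Key = Union[str, int]


class FixedTable:
    """Hypothetical fixed-capacity table; not the spaCy class."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.key2row: Dict[Key, int] = {}

    @property
    def is_full(self) -> bool:
        # No free slots remain once every row has a key.
        return len(self.key2row) >= self.capacity

    def add(self, key: Key) -> int:
        # Assumption: a key that is already present keeps its existing row.
        if key in self.key2row:
            return self.key2row[key]
        # Documented behavior: signal failure with -1 when no key can be added.
        if self.is_full:
            return -1
        row = len(self.key2row)
        self.key2row[key] = row
        return row


table = FixedTable(capacity=2)
results = [table.add("a"), table.add("b"), table.add("c")]
```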

BaseVectors.shape property

Get a (rows, dims) tuple giving the number of rows and the number of dimensions in the vector table.

BaseVectors.size property

The vector size, i.e. rows * dims.
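The relationship between shape and size can be shown with a small worked example. The helper functions and the list-of-lists table are hypothetical; in a subclass these would be properties computed from the underlying array.

```python
from typing import List, Tuple

# Hypothetical table with 3 rows and 4 dimensions per row.
data: List[List[float]] = [[0.0] * 4 for _ in range(3)]


def shape(table: List[List[float]]) -> Tuple[int, int]:
    # (rows, dims): number of vectors and the width of each vector.
    rows = len(table)
    dims = len(table[0]) if table else 0
    return rows, dims


def size(table: List[List[float]]) -> int:
    # Total number of stored values, i.e. rows * dims.
    rows, dims = shape(table)
    return rows * dims


table_shape = shape(data)
table_size = size(data)
```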

BaseVectors.is_full property

Whether the vectors table is full and no slots are available for new keys.

BaseVectors.get_batch method, v3.2

Get the vectors for the provided keys efficiently as a batch. Required to use the vectors with StaticVectors for training.

| Name | Description |
| --- | --- |
| keys | The keys. Type: Iterable[Union[int, str]] |

BaseVectors.to_ops method

Dummy method. Implement this to change the embedding matrix to use different Thinc ops.

| Name | Description |
| --- | --- |
| ops | The Thinc ops to switch the embedding matrix to. Type: Ops |

BaseVectors.to_disk method

Dummy method to allow serialization. Implement to save vector data with the pipeline.

| Name | Description |
| --- | --- |
| path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. Type: Union[str, Path] |

BaseVectors.from_disk method

Dummy method to allow serialization. Implement to load vector data from a saved pipeline.

| Name | Description |
| --- | --- |
| path | A path to a directory. Paths may be either strings or Path-like objects. Type: Union[str, Path] |

BaseVectors.to_bytes method

Dummy method to allow serialization. Implement to serialize vector data to a binary string.

BaseVectors.from_bytes method

Dummy method to allow serialization. Implement to load vector data from a binary string.

| Name | Description |
| --- | --- |
| data | The data to load from. Type: bytes |
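The four serialization methods could be filled in along the following lines. This is only a sketch: `SerializableTable`, the JSON encoding and the `vectors.json` file name are assumptions chosen to keep the example dependency-free, and returning `self` from the loading methods is likewise an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional, Union


class SerializableTable:
    """Hypothetical stand-in; the JSON format and file name are assumptions."""

    def __init__(self, data: Optional[List[List[float]]] = None) -> None:
        self.data: List[List[float]] = data if data is not None else []

    def to_bytes(self) -> bytes:
        # Serialize the vector data to a binary string.
        return json.dumps(self.data).encode("utf8")

    def from_bytes(self, data: bytes) -> "SerializableTable":
        # Load vector data from a binary string.
        self.data = json.loads(data.decode("utf8"))
        return self

    def to_disk(self, path: Union[str, Path]) -> None:
        # Create the directory if it does not exist, then write one file.
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        (path / "vectors.json").write_bytes(self.to_bytes())

    def from_disk(self, path: Union[str, Path]) -> "SerializableTable":
        # Read the file written by to_disk and reuse from_bytes.
        return self.from_bytes((Path(path) / "vectors.json").read_bytes())


original = SerializableTable([[1.0, 2.0], [3.0, 4.0]])
from_memory = SerializableTable().from_bytes(original.to_bytes())

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "vectors_dir"
    original.to_disk(target)
    from_file = SerializableTable().from_disk(target)
```

Routing to_disk and from_disk through to_bytes and from_bytes keeps a single encoding path, so the on-disk and in-memory formats cannot drift apart.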