
StringStore
class

Look up strings by 64-bit hashes. As of v2.0, spaCy uses hash values instead of integer IDs. This ensures that strings always map to the same ID, even from different StringStores.
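The guarantee above can be sketched as follows, assuming spaCy is installed and `StringStore` is importable from `spacy.strings`. The store names and example strings are illustrative only.

```python
from spacy.strings import StringStore

# Two independent stores that share no state.
store_a = StringStore(["apple"])
store_b = StringStore(["orange", "apple"])

# The hash depends only on the string, not on the store or insertion order.
hash_a = store_a["apple"]
hash_b = store_b["apple"]
print(hash_a == hash_b)
```

Because the ID is derived from the string itself, hashes produced by one store can be compared with hashes produced by another.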

StringStore.__init__
method

Create the StringStore.

| Name | Type | Description |
| --- | --- | --- |
| strings | iterable | A sequence of unicode strings to add to the store. |
| returns | StringStore | The newly constructed object. |
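A minimal construction sketch, assuming spaCy is installed. The example strings are arbitrary, and calling the constructor without arguments is assumed to be valid because `strings` is optional in the releases I am aware of.

```python
from spacy.strings import StringStore

# Seed the store with an initial sequence of strings.
stringstore = StringStore(["apple", "orange"])

# Without arguments, the store starts with no user-supplied strings.
empty_store = StringStore()
```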

StringStore.__len__
method

Get the number of strings in the store.

| Name | Type | Description |
| --- | --- | --- |
| returns | int | The number of strings in the store. |
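As a short illustration, assuming spaCy is installed, `len()` can be applied directly to a store. The variable names are illustrative.

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

# len() delegates to StringStore.__len__.
n_strings = len(stringstore)
print(n_strings)
```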

StringStore.__getitem__
method

Retrieve a string from a given hash, or vice versa.

| Name | Type | Description |
| --- | --- | --- |
| string_or_id | bytes, unicode or uint64 | The value to encode. |
| returns | unicode or int | The value to be retrieved. |
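The two lookup directions might be used like this, assuming spaCy is installed. The example strings are arbitrary.

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

# String in, hash out.
apple_hash = stringstore["apple"]

# Hash in, string out. The hash must belong to a string already in the store.
apple_text = stringstore[apple_hash]
print(apple_hash, apple_text)
```

Looking up a hash whose string was never added cannot succeed, since the store has no text to return for it.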

StringStore.__contains__
method

Check whether a string is in the store.

| Name | Type | Description |
| --- | --- | --- |
| string | unicode | The string to check. |
| returns | bool | Whether the store contains the string. |
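A membership check could look like the following sketch, assuming spaCy is installed. "cherry" is an arbitrary string that was never added.

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

# The `in` operator delegates to StringStore.__contains__.
has_apple = "apple" in stringstore
has_cherry = "cherry" in stringstore
print(has_apple, has_cherry)
```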

StringStore.__iter__
method

Iterate over the strings in the store, in order. Note that a newly initialised store will always include an empty string '' at position 0.

| Name | Type | Description |
| --- | --- | --- |
| yields | unicode | A string in the store. |
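A sketch of iterating over a store, assuming spaCy is installed. The `if s` filter is a defensive choice made for this example: it skips an empty string if the store yields one.

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

# Collect the strings in order, skipping an empty string if one is yielded.
all_strings = [s for s in stringstore if s]
print(all_strings)
```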

StringStore.add
method
New in spaCy v2.0.

Add a string to the StringStore.

| Name | Type | Description |
| --- | --- | --- |
| string | unicode | The string to add. |
| returns | uint64 | The string's hash value. |
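Adding a string after construction might look like this, assuming spaCy is installed. The sketch also adds the same string twice to show that, as far as I am aware, this returns the same hash and does not grow the store.

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])
size_before = len(stringstore)

# add() returns the hash of the newly stored string.
banana_hash = stringstore.add("banana")

# Adding the same string again returns the same hash.
banana_hash_again = stringstore.add("banana")
size_after = len(stringstore)
print(size_before, size_after)
```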

StringStore.to_disk
method
New in spaCy v2.0.

Save the current state to a directory.

| Name | Type | Description |
| --- | --- | --- |
| path | unicode or Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |

StringStore.from_disk
method
New in spaCy v2.0.

Load state from a directory. Modifies the object in place and returns it.

| Name | Type | Description |
| --- | --- | --- |
| path | unicode or Path | A path to a directory. Paths may be either strings or Path-like objects. |
| returns | StringStore | The modified StringStore object. |
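A round trip through `to_disk` and `from_disk` can be sketched as follows, assuming spaCy is installed. The temporary directory and the `strings` path name are hypothetical choices for this example, so that nothing is left behind on disk.

```python
import tempfile
from pathlib import Path

from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical target path inside a temporary directory that already exists.
    path = Path(tmp_dir) / "strings"
    stringstore.to_disk(path)

    # from_disk modifies the new store in place and returns it.
    loaded_store = StringStore().from_disk(path)

print("apple" in loaded_store)
```

Because `from_disk` returns the store it modified, construction and loading can be chained in one expression.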

StringStore.to_bytes
method

Serialize the current state to a binary string.

| Name | Type | Description |
| --- | --- | --- |
| **exclude | - | Named attributes to prevent from being serialized. |
| returns | bytes | The serialized form of the StringStore object. |

StringStore.from_bytes
method

Load state from a binary string.

| Name | Type | Description |
| --- | --- | --- |
| bytes_data | bytes | The data to load from. |
| **exclude | - | Named attributes to prevent from being loaded. |
| returns | StringStore | The StringStore object. |
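An in-memory round trip through `to_bytes` and `from_bytes` could look like this sketch, assuming spaCy is installed. The variable names are illustrative.

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

# Serialize the current state.
store_bytes = stringstore.to_bytes()

# Load the serialized data into a fresh store; from_bytes returns the store.
restored_store = StringStore().from_bytes(store_bytes)
print("apple" in restored_store)
```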

Utilities

strings.hash_string
function

Get a 64-bit hash for a given string.

| Name | Type | Description |
| --- | --- | --- |
| string | unicode | The string to hash. |
| returns | uint64 | The hash. |
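A brief sketch of the standalone helper, assuming spaCy is installed and `hash_string` is importable from `spacy.strings`. It compares the result with the hash a `StringStore` reports for the same string, on the assumption that both use the same hashing scheme.

```python
from spacy.strings import StringStore, hash_string

apple_hash = hash_string("apple")

# The standalone function should agree with the hash a StringStore reports.
stringstore = StringStore(["apple"])
same_as_store = apple_hash == stringstore["apple"]

# A 64-bit unsigned hash fits in the range [0, 2**64).
fits_in_64_bits = 0 <= apple_hash < 2**64
print(same_as_store, fits_in_64_bits)
```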