# Using word vectors and semantic similarities

Dense, real-valued vectors representing distributional similarity information are now a cornerstone of practical NLP. The most common way to train these vectors is the word2vec family of algorithms.

spaCy makes using word vectors very easy. The `Lexeme`, `Token`, `Span` and `Doc` classes all have a `.vector` property, which is a 1-dimensional numpy array of 32-bit floats:

```
import spacy

nlp = spacy.load('en')

apples, and_, oranges = nlp(u'apples and oranges')
print(apples.vector.shape)
# (300,)
print(apples.similarity(oranges))
```

By default, `Token.vector` returns the vector for its underlying lexeme, while `Doc.vector` and `Span.vector` return an average of the vectors of their tokens. You can customize these behaviours by modifying the `doc.user_hooks`, `doc.user_span_hooks` and `doc.user_token_hooks` dictionaries.
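
For example, you could make `Doc.vector` return the vector of the document's first noun instead of the token average. This is only a minimal sketch; the `first_noun_vector` function and its fallback behaviour are invented for illustration:

```
import numpy
import spacy

nlp = spacy.load('en')

def first_noun_vector(doc):
    # Return the first noun's vector; fall back to averaging all tokens.
    for token in doc:
        if token.pos_ == 'NOUN':
            return token.vector
    return numpy.mean([token.vector for token in doc], axis=0)

doc = nlp(u'apples and oranges')
doc.user_hooks['vector'] = first_noun_vector
print(doc.vector.shape)
# (300,)
```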

The default English model installs vectors for one million vocabulary entries, using 300-dimensional vectors trained on the Common Crawl corpus with the GloVe algorithm. The GloVe Common Crawl vectors have become a de facto standard for practical NLP.

You can load new word vectors from a file-like buffer using the `vocab.load_vectors()` method. The file should be a whitespace-delimited text file, where the word is in the first column, and subsequent columns provide the vector data. For faster loading, you can use the `vocab.vectors_from_bin_loc()` method, which accepts a path to a binary file written by `vocab.dump_vectors()`.
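
As a rough sketch, assuming a text file `my_vectors.txt` in the format described above (the file name is invented for illustration), loading could look like this:

```
import io
import spacy

nlp = spacy.load('en')

# Each line of the file: the word, then its vector values,
# all separated by whitespace.
with io.open('my_vectors.txt', 'r', encoding='utf8') as file_:
    nlp.vocab.load_vectors(file_)
```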

You can also load vectors from memory, by writing to the `lexeme.vector` property. If the vectors you are writing are of a different dimensionality from the ones currently loaded, you should first call `vocab.resize_vectors(new_size)`.
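
For instance, here is a minimal sketch that assigns a custom 128-dimensional vector to a single word, resizing the table first since 128 differs from the default 300. The random values are placeholders, not a real trained vector:

```
import numpy
import spacy

nlp = spacy.load('en')

# The new vectors are 128-dimensional, so resize the table first.
nlp.vocab.resize_vectors(128)

apples = nlp.vocab[u'apples']
# Placeholder values; in practice these would come from your own model.
apples.vector = numpy.random.uniform(-1, 1, (128,)).astype('float32')
```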