Models

As of v1.7.0, models for spaCy can be installed as Python packages. This means that they're a component of your application, just like any other module. They're versioned and can be defined as a dependency in your requirements.txt. Models can be installed from a download URL or a local directory, manually or via pip. Their data can be located anywhere on your file system. To make a model available to spaCy, all you need to do is create a "shortcut link", an internal alias that tells spaCy where to find the data files for a specific model name.

Available models

Name                 Size      Description
en_core_web_sm       50 MB     Vocab, syntax, entities, word vectors
en_core_web_md       1 GB      Vocab, syntax, entities, word vectors
en_depent_web_md     328 MB    Vocab, syntax, entities
en_vectors_glove_md  727 MB    GloVe Common Crawl vectors
de_core_news_md      645 MB    Vocab, syntax, entities, word vectors

Models are now available as .tar.gz archives from GitHub, attached to individual releases. They can be downloaded and loaded manually, or using spaCy's download and link commands. All models follow the naming convention of [language]_[type]_[genre]_[size].
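The naming convention can be made concrete with a small helper. This is only a sketch and not part of spaCy: parse_model_name and ModelName are hypothetical names, and the optional "-version" suffix is assumed from archive names such as en_core_web_md-1.2.0.

```python
from collections import namedtuple

# Hypothetical container for the parts of a model name (not a spaCy API).
ModelName = namedtuple("ModelName", ["language", "type", "genre", "size", "version"])


def parse_model_name(name):
    """Split a name like 'en_core_web_md-1.2.0' into its components."""
    # Everything after the first hyphen is treated as the version, if present.
    base, _, version = name.partition("-")
    parts = base.split("_")
    if len(parts) != 4:
        raise ValueError("expected [language]_[type]_[genre]_[size], got %r" % name)
    return ModelName(*parts, version=version or None)


print(parse_model_name("en_core_web_md-1.2.0"))
print(parse_model_name("de_core_news_md"))
```

A name without a version suffix simply yields version=None.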


Downloading models

The easiest way to download a model is via spaCy's download command. It takes care of finding the best-matching model compatible with your spaCy installation.

# out-of-the-box: download best-matching default model
python -m spacy download en
python -m spacy download de

# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_md

# download exact model version (doesn't create shortcut link)
python -m spacy download en_core_web_md-1.2.0 --direct

The download command will install the model via pip, place the package in your site-packages directory and create a shortcut link that lets you load the model by name. The shortcut link will be the same as the model name used in spacy.download.

pip install spacy
python -m spacy download en
import spacy
nlp = spacy.load('en')
doc = nlp(u'This is a sentence.')

Installation via pip

To download a model directly using pip, simply point pip install to the URL or local path of the archive file. To find the direct link to a model, head over to the model releases, right click on the archive link and copy it to your clipboard.

# with external URL
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-1.2.0/en_core_web_md-1.2.0.tar.gz

# with local file
pip install /Users/you/en_core_web_md-1.2.0.tar.gz

By default, this will install the model into your site-packages directory. You can then create a shortcut link for your model to load it via spacy.load(), or import it as a Python module.
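If you script your installs, the archive URL can be assembled from the model name and version. The pattern below is inferred from the single example URL in this section, so treat it as an assumption rather than a guaranteed scheme. The helper model_archive_url is hypothetical.

```python
# Base URL taken from the pip example in this section.
BASE_URL = "https://github.com/explosion/spacy-models/releases/download"


def model_archive_url(name, version):
    """Build the release archive URL, assuming the <name>-<version> pattern."""
    filename = "{}-{}".format(name, version)
    return "{base}/{f}/{f}.tar.gz".format(base=BASE_URL, f=filename)


print(model_archive_url("en_core_web_md", "1.2.0"))
```

The resulting string can be passed to pip install in the same way as the hand-copied link.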

Manual download and installation

In some cases, you might prefer downloading the data manually, for example to place it into a custom directory. You can download the model via your browser from the latest releases, or configure your own download script using the URL of the archive file. The archive consists of a model directory that contains another directory with the model data.

Directory structure

└── en_core_web_md-1.2.0.tar.gz       # downloaded archive
    ├── meta.json                     # model meta data
    ├── setup.py                      # setup file for pip installation
    └── en_core_web_md                # model directory
        ├── __init__.py               # init for pip installation
        ├── meta.json                 # model meta data
        └── en_core_web_md-1.2.0      # model data

You can place the model data directory anywhere on your local file system. To use it with spaCy, simply assign it a name by creating a shortcut link for the data directory.
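Given the directory structure above, the path of the data directory inside an unpacked archive can be derived from the name of the unpacked root folder. This is a rough sketch under the assumption that the archive unpacks to a folder named like the archive itself (for example en_core_web_md-1.2.0). The function expected_data_dir is hypothetical.

```python
import os


def expected_data_dir(extracted_root):
    """Return the path where the model data should sit after unpacking."""
    # e.g. "en_core_web_md-1.2.0"
    full_name = os.path.basename(os.path.normpath(extracted_root))
    # Strip the version suffix to get the package directory, e.g. "en_core_web_md".
    package = full_name.rsplit("-", 1)[0]
    return os.path.join(extracted_root, package, full_name)


print(expected_data_dir("/Users/you/en_core_web_md-1.2.0"))
```

The returned path is the one you would pass to the link command.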

Using models with spaCy

While previous versions of spaCy required you to maintain a data directory containing the models for each installation, you can now choose how and where you want to keep your data files. To load the models conveniently from within spaCy, you can use the spacy.link command to create a symlink. This lets you set up custom shortcut links for models so you can load them by name.

python -m spacy link [package name or path] [shortcut] [--force]

The first argument is the package name (if the model was installed via pip), or a local path to the data directory. The second argument is the internal name you want to use for the model. Setting the --force flag will overwrite any existing links.

Examples

# set up shortcut link to load installed package as "en_default"
python -m spacy link en_core_web_md en_default

# set up shortcut link to load local model as "my_amazing_model"
python -m spacy link /Users/you/model my_amazing_model
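If you want to create links from a setup script instead of typing the command, one option is to assemble the same command line in Python. This is a sketch only. The helper link_command is hypothetical, and it uses nothing beyond the arguments documented above.

```python
import sys


def link_command(origin, shortcut, force=False):
    """Build the argument list for 'python -m spacy link'."""
    # sys.executable ensures the link is created for the current Python environment.
    cmd = [sys.executable, "-m", "spacy", "link", origin, shortcut]
    if force:
        cmd.append("--force")  # overwrite an existing link
    return cmd


print(link_command("en_core_web_md", "en_default", force=True))
```

The list can then be run with subprocess.check_call(), which raises an error if the command fails.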

Loading models

To load a model, use spacy.load() with the model's shortcut link.

import spacy
nlp = spacy.load('en_default')
doc = nlp(u'This is a sentence.')

You can also use the info command or the info() method to print a model's meta data before loading it. Each Language object returned by spacy.load() also exposes the model's meta data as the attribute meta.

python -m spacy info en
# model meta data
import spacy
spacy.info('en_default')
# model meta data

nlp = spacy.load('en_default')
print(nlp.meta['version'])
# 1.2.0

Importing models as modules

If you've installed a model via pip, you can also import it directly and then call its load() method with no arguments:

import spacy
import en_core_web_md

nlp = en_core_web_md.load()
doc = nlp(u'This is a sentence.')
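Importing a model only works if its package is installed in the current environment. A minimal way to check this up front, assuming Python 3 and using only the standard library (the helper name is hypothetical):

```python
import importlib.util


def is_model_package_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


if is_model_package_installed("en_core_web_md"):
    print("en_core_web_md can be imported as a module")
else:
    print("en_core_web_md is not installed, use a shortcut link instead")
```

This checks for the package only. It says nothing about whether a shortcut link exists for the model.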

Using your own models

If you've trained your own model, for example for additional languages, you can create a shortcut link for it by pointing spacy.link to the model's data directory. To allow your model to be downloaded and installed via pip, you'll also need to generate a package for it.

The model directory should look like this:

Directory structure

└── /
    ├── MANIFEST.in                   # to include meta.json
    ├── meta.json                     # model meta data
    ├── setup.py                      # setup file for pip installation
    └── en_core_web_md                # model directory
        ├── __init__.py               # init for pip installation
        └── en_core_web_md-1.2.0      # model data

You can find templates for all files in our spaCy dev resources. Unless you want to customise installation and loading, the only file you'll need to modify is meta.json, which includes the model's meta data. It will later be copied into the package and data directory.

meta.json

{
    "name": "core_web_md",
    "lang": "en",
    "version": "1.2.0",
    "spacy_version": "1.7.0",
    "description": "English model for spaCy",
    "author": "Explosion AI",
    "email": "contact@explosion.ai",
    "license": "MIT"
}

Keep in mind that the directories need to be named according to the naming conventions. The lang setting is also used to create the respective Language class in spaCy, which will later be returned by the model's load() method.
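Judging from the example above, the directory names combine the lang, name and version fields of meta.json. The sketch below derives them so you can check your layout before packaging. The rule is inferred from the en_core_web_md example rather than taken from spaCy's code, and package_names is a hypothetical helper.

```python
import json


def package_names(meta):
    """Return (model directory, data directory) names derived from meta.json."""
    package = "{}_{}".format(meta["lang"], meta["name"])
    data_dir = "{}-{}".format(package, meta["version"])
    return package, data_dir


# Inline stand-in for the contents of a meta.json file.
meta = json.loads('{"name": "core_web_md", "lang": "en", "version": "1.2.0"}')
print(package_names(meta))
```

For your own model, load the real file with json.load() and compare the result with the directories on disk.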

To generate the package, run the following command from within the directory. This will create a .tar.gz archive in a directory /dist.

python setup.py sdist