As of v1.7.0, models for spaCy can be installed as Python packages. This means that they're a component of your application, just like any
other module. They're versioned and can be defined as a dependency in your
requirements.txt. Models can be installed from a download URL or a local directory, manually or via pip. Their data can be located anywhere on your file system. To make a model
available to spaCy, all you need to do is create a "shortcut link", an
internal alias that tells spaCy where to find the data files for a specific model.
Model differences are mostly statistical. In general, we do expect larger models to be "better" and more accurate overall. Ultimately, it depends on your use case and requirements, and we recommend starting with the default models (marked with a star below).
| Language | Size | License |
| --- | --- | --- |
| English | 50 MB | CC BY-SA |
| English | 1 GB | CC BY-SA |
| English | 328 MB | CC BY-SA |
| English | 727 MB | CC BY-SA |
| German | 645 MB | CC BY-SA |
| French | 1.33 GB | CC BY-NC |
| Spanish | 377 MB | CC BY-SA |
The easiest way to download a model is via spaCy's
download command. It takes care of finding the best-matching model compatible with
your spaCy installation.
```
# out-of-the-box: download best-matching default model
python -m spacy download en
python -m spacy download de
python -m spacy download fr

# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_md

# download exact model version (doesn't create shortcut link)
python -m spacy download en_core_web_md-1.2.1 --direct
```
The download command will install the model via pip, place the package in your
site-packages directory and create a shortcut link that lets you load the model by name. The shortcut link will be the same as the model name used in the download command.
```
pip install spacy
python -m spacy download en
```
```python
import spacy
nlp = spacy.load('en')
doc = nlp(u'This is a sentence.')
```
Installation via pip
To download a model directly using pip, simply point
pip install to the URL or local path of the archive file. To find the direct link to a model, head over to the model releases, right click on the archive link and copy it to your clipboard.
```
# with external URL
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-1.2.1/en_core_web_md-1.2.1.tar.gz

# with local file
pip install /Users/you/en_core_web_md-1.2.1.tar.gz
```
By default, this will install the model into your
site-packages directory. You can then create a shortcut link for your model to load it via
spacy.load(), or import it as a Python module.
Manual download and installation
In some cases, you might prefer downloading the data manually, for example to place it into a custom directory. You can download the model via your browser from the latest releases, or configure your own download script using the URL of the archive file. The archive consists of a model directory that contains another directory with the model data.
```
└── en_core_web_md-1.2.0.tar.gz   # downloaded archive
    ├── meta.json                 # model meta data
    ├── setup.py                  # setup file for pip installation
    └── en_core_web_md            # model directory
        ├── __init__.py           # init for pip installation
        ├── meta.json             # model meta data
        └── en_core_web_md-1.2.0  # model data
```
You can place the model data directory anywhere on your local file system. To use it with spaCy, simply assign it a name by creating a shortcut link for the data directory.
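Under the hood, a shortcut link boils down to a symlink inside spaCy's data directory pointing at your model data. A minimal stand-alone sketch of that idea, using temporary directories as hypothetical stand-ins for your model path and spaCy's data path (the `spacy link` command, covered below, does this for you):

```python
from pathlib import Path
import tempfile

# Hypothetical stand-in for a model data directory on your file system
model_dir = Path(tempfile.mkdtemp()) / "en_core_web_md-1.2.0"
model_dir.mkdir()

# Hypothetical stand-in for spaCy's internal data directory
spacy_data = Path(tempfile.mkdtemp())

# The "shortcut link" is just a named symlink to the model data
link = spacy_data / "my_amazing_model"
link.symlink_to(model_dir, target_is_directory=True)

print(link.resolve() == model_dir.resolve())
```

Because the link is only an alias, the model data itself never moves and several shortcut names can point at the same directory.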
Using models with spaCy
While previous versions of spaCy required you to maintain a data directory
containing the models for each installation, you can now choose how and
where you want to keep your data files. To load the models conveniently from within spaCy, you can use the
spacy link command to create a symlink. This lets you set up custom shortcut links for models so you can
load them by name.
python -m spacy link [package name or path] [shortcut] [--force]
The first argument is the package name (if the model was installed via
pip), or a local path to the data directory. The second argument is the internal name you want to use for the model. Setting the
--force flag will overwrite any existing links.
```
# set up shortcut link to load installed package as "en_default"
python -m spacy link en_core_web_md en_default

# set up shortcut link to load local model as "my_amazing_model"
python -m spacy link /Users/you/model my_amazing_model
```
To load a model, use
spacy.load() with the model's shortcut link.
```python
import spacy
nlp = spacy.load('en_default')
doc = nlp(u'This is a sentence.')
```
You can also use the
info() method to print a model's meta data before loading it. Each
Language object returned by
spacy.load() also exposes the model's meta data as the attribute meta.
```
python -m spacy info en   # model meta data
```
```python
import spacy
spacy.info('en_default')        # model meta data
nlp = spacy.load('en_default')
print(nlp.meta['version'])      # 1.2.0
```
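This meta data lives in the model's meta.json (see the directory layout above). A minimal sketch of reading it directly with the standard library, using illustrative field names and values modeled on the examples in this guide:

```python
import json
from pathlib import Path
import tempfile

# Illustrative meta data, written to a temporary stand-in model directory
meta = {"lang": "en", "name": "core_web_md", "version": "1.2.0"}
model_dir = Path(tempfile.mkdtemp())
(model_dir / "meta.json").write_text(json.dumps(meta))

# Reading it back gives you the same fields exposed via nlp.meta
loaded = json.loads((model_dir / "meta.json").read_text())
print(loaded["version"])   # 1.2.0
```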
Importing models as modules
If you've installed a model via pip, you can also
import it directly and then call its
load() method with no arguments:
```python
import spacy
import en_core_web_md

nlp = en_core_web_md.load()
doc = nlp(u'This is a sentence.')
```
Downloading and requiring model dependencies
The download command is mostly intended as a convenient, interactive wrapper. It performs
compatibility checks and prints detailed error messages and warnings.
However, if you're downloading models as part of an automated build
process, this only adds an unnecessary layer of complexity. If you know
which models your application needs, you should be specifying them directly.
Because all models are valid Python packages, you can add them to your application's
requirements.txt. If you're running your own internal PyPI installation, you can simply upload the models there. pip's requirements file format supports both package names to download via a PyPI server, as well as direct URLs.
```
spacy>=1.8.0,<2.0.0
-e https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-1.2.0/en_core_web_sm-1.2.0.tar.gz#egg=en_core_web_sm-1.2.0
```
Using your own models
If you've trained your own model, for example for additional languages or custom named entities, you can save its state using the
Language.save_to_directory() method. To make the model more convenient to deploy, we recommend wrapping it as a Python