
Models Overview
Downloadable statistical models for spaCy to predict and assign linguistic features.

spaCy v2.0 features new neural models for tagging, parsing and entity recognition. The models have been designed and implemented from scratch specifically for spaCy, to give you an unmatched balance of speed, size and accuracy. A novel bloom embedding strategy with subword features is used to support huge vocabularies in tiny tables. Convolutional layers with residual connections, layer normalization and maxout non-linearity are used, giving much better efficiency than the standard BiLSTM solution. For more details, see the notes on the model architecture.
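The bloom embedding idea can be illustrated with a toy sketch. Each feature of a word is hashed with several seeds into one small shared table, and the selected rows are summed. Here the features are the lowercase form, a one-character prefix and a three-character suffix, a simplified stand-in for spaCy's subword features. The table size, the use of `hashlib.blake2b` with the seed as its salt, and the fixed initialisation are assumptions chosen for readability. This is not spaCy's or Thinc's implementation, which uses its own hash function and learned tables.

```python
import hashlib

# Toy sizes (assumed): a tiny table shared by the entire vocabulary.
N_ROWS = 16
WIDTH = 4
SEEDS = (0, 1, 2, 3)  # one independent hash function per seed


def make_table(n_rows, width):
    """Deterministic stand-in for a learned embedding table."""
    return [[((r * width + c) % 7 - 3) / 10.0 for c in range(width)]
            for r in range(n_rows)]


def subword_features(word):
    """Lowercase form, prefix and suffix: a simplified set of subword features."""
    lower = word.lower()
    return [lower, "P=" + lower[:1], "S=" + lower[-3:]]


def row_ids(feature, seeds=SEEDS, n_rows=N_ROWS):
    """Hash one feature once per seed and map each hash to a table row."""
    data = feature.encode("utf8")
    ids = []
    for seed in seeds:
        # A different salt per seed gives an independent hash of the same string.
        digest = hashlib.blake2b(data, digest_size=8,
                                 salt=seed.to_bytes(16, "little")).digest()
        ids.append(int.from_bytes(digest, "little") % n_rows)
    return ids


def embed(word, table):
    """Sum the rows selected by every (feature, seed) pair."""
    vec = [0.0] * len(table[0])
    for feature in subword_features(word):
        for r in row_ids(feature, n_rows=len(table)):
            for i, value in enumerate(table[r]):
                vec[i] += value
    return vec


table = make_table(N_ROWS, WIDTH)
print(embed("spaCy", table))
```

Because rows are picked by hashing rather than by a vocabulary index, any string, including one never seen in training, gets a vector from the same 16 rows, and a collision under one seed is diluted by the rows chosen under the other seeds.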

The parser and NER use an imitation learning objective to deliver accuracy in line with the latest research systems, even when evaluated from raw text. With these innovations, spaCy v2.0's models are 10× smaller, 20% more accurate, and even cheaper to run than the previous generation.

Quickstart

Install a default model, load it from within spaCy and test it on an example sentence. For more options, see the section on available models below.

# English
python -m spacy download en
import spacy
import en_core_web_sm
nlp = en_core_web_sm.load()  # or: nlp = spacy.load('en')
doc = nlp(u"This is a sentence.")
print([(w.text, w.pos_) for w in doc])

# German
python -m spacy download de
import spacy
import de_core_news_sm
nlp = de_core_news_sm.load()  # or: nlp = spacy.load('de')
doc = nlp(u"Dies ist ein Satz.")
print([(w.text, w.pos_) for w in doc])

# Spanish
python -m spacy download es
import spacy
import es_core_news_sm
nlp = es_core_news_sm.load()  # or: nlp = spacy.load('es')
doc = nlp(u"Esto es una frase.")
print([(w.text, w.pos_) for w in doc])

# Portuguese
python -m spacy download pt
import spacy
import pt_core_news_sm
nlp = pt_core_news_sm.load()  # or: nlp = spacy.load('pt')
doc = nlp(u"Esta é uma frase.")
print([(w.text, w.pos_) for w in doc])

# French
python -m spacy download fr
import spacy
import fr_core_news_sm
nlp = fr_core_news_sm.load()  # or: nlp = spacy.load('fr')
doc = nlp(u"C'est une phrase.")
print([(w.text, w.pos_) for w in doc])

# Italian
python -m spacy download it
import spacy
import it_core_news_sm
nlp = it_core_news_sm.load()  # or: nlp = spacy.load('it')
doc = nlp(u"Questa è una frase.")
print([(w.text, w.pos_) for w in doc])

# Dutch
python -m spacy download nl
import spacy
import nl_core_news_sm
nlp = nl_core_news_sm.load()  # or: nlp = spacy.load('nl')
doc = nlp(u"Dit is een zin.")
print([(w.text, w.pos_) for w in doc])

# Multi-language (named entities only)
python -m spacy download xx
import spacy
import xx_ent_wiki_sm
nlp = xx_ent_wiki_sm.load()  # or: nlp = spacy.load('xx')
doc = nlp(u"This is a sentence about Facebook.")
print([(ent.text, ent.label_) for ent in doc.ents])  # label_ is the string label

Installation & Usage

The easiest way to download a model is via spaCy's download command. It takes care of finding the best-matching model compatible with your spaCy installation.

# out-of-the-box: download best-matching default model
python -m spacy download en
python -m spacy download de
python -m spacy download es
python -m spacy download pt
python -m spacy download fr
python -m spacy download it
python -m spacy download nl
python -m spacy download xx

# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm

# download exact model version (doesn't create shortcut link)
python -m spacy download en_core_web_sm-2.0.0 --direct

The download command installs the model via pip, places the package in your site-packages directory and creates a shortcut link that lets you load the model by a custom name. The shortcut link uses the same name you passed to spacy download, e.g. en after running python -m spacy download en.
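How a name passed to spacy.load() is resolved can be sketched roughly as follows, assuming spaCy v2.x behaviour: a shortcut link is tried first, then an installed model package of that name, then a path to a model directory. The SHORTCUT_LINKS and INSTALLED_PACKAGES dictionaries and the resolve_model helper are hypothetical in-memory stand-ins; spaCy itself inspects its data directory and the installed Python packages.

```python
from pathlib import Path

# Hypothetical stand-ins for what is installed on this machine.
SHORTCUT_LINKS = {"en": "en_core_web_sm", "de": "de_core_news_sm"}
INSTALLED_PACKAGES = {"en_core_web_sm", "de_core_news_sm"}


def resolve_model(name, links=SHORTCUT_LINKS, packages=INSTALLED_PACKAGES):
    """Return (kind, target) describing where `name` would be loaded from."""
    if name in links:           # 1. shortcut link created by `spacy download`
        return ("link", links[name])
    if name in packages:        # 2. model installed as a Python package
        return ("package", name)
    if Path(name).exists():     # 3. path to a model data directory
        return ("path", name)
    raise IOError(f"Can't find model '{name}'")


print(resolve_model("en"))  # ('link', 'en_core_web_sm')
```

This is why a model downloaded with --direct, which creates no shortcut link, has to be loaded by its full package name.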

pip install spacy
python -m spacy download en
import spacy
nlp = spacy.load('en')
doc = nlp(u'This is a sentence.')

Available models

Model differences are mostly statistical. In general, we do expect larger models to be "better" and more accurate overall. Ultimately, it depends on your use case and requirements. We recommend starting with the default models (marked with a star below).

Name                  Language         Type
en_core_web_sm *      English          Vocabulary, syntax, entities
en_core_web_md        English          Vocabulary, syntax, entities, vectors
en_core_web_lg        English          Vocabulary, syntax, entities, vectors
en_vectors_web_lg     English          Word vectors
de_core_news_sm *     German           Vocabulary, syntax, entities
es_core_news_sm *     Spanish          Vocabulary, syntax, entities
es_core_news_md       Spanish          Vocabulary, syntax, entities, vectors
pt_core_news_sm *     Portuguese       Vocabulary, syntax, entities
fr_core_news_sm *     French           Vocabulary, syntax, entities
fr_core_news_md       French           Vocabulary, syntax, entities, vectors
it_core_news_sm *     Italian          Vocabulary, syntax, entities
nl_core_news_sm *     Dutch            Vocabulary, syntax, entities
xx_ent_wiki_sm *      Multi-language   Named entities

Model naming conventions

In general, spaCy expects all model packages to follow the naming convention of [lang]_[name]. For spaCy's models, we also chose to divide the name into three components:

Type: Model capabilities (e.g. core for a general-purpose model with vocabulary, syntax, entities and word vectors, or depent for only vocabulary, syntax and entities).
Genre: Type of text the model is trained on, e.g. web or news.
Size: Model size indicator, sm, md or lg.

For example, en_core_web_sm is a small English model trained on written web text (blogs, news, comments) that includes vocabulary, syntax and entities.
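Because the convention is mechanical, a name can be split back into its parts. The ModelName type and parse_model_name helper below are hypothetical, not part of spaCy, and the sketch assumes every name has exactly four underscore-separated parts (language, type, genre, size), which holds for the models listed above but is not enforced by spaCy. It also strips an optional "-a.b.c" suffix as used by spacy download en_core_web_sm-2.0.0 --direct.

```python
from typing import NamedTuple

SIZES = {"sm", "md", "lg"}


class ModelName(NamedTuple):
    lang: str
    type: str
    genre: str
    size: str


def parse_model_name(name):
    """Split e.g. 'en_core_web_sm' or 'en_core_web_sm-2.0.0' into its components."""
    base = name.split("-", 1)[0]  # drop an optional '-a.b.c' version suffix
    parts = base.split("_")
    if len(parts) != 4 or parts[3] not in SIZES:
        raise ValueError(f"expected lang_type_genre_size, got {name!r}")
    return ModelName(*parts)


print(parse_model_name("en_core_web_sm"))
# ModelName(lang='en', type='core', genre='web', size='sm')
```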

Model versioning

Additionally, the model version reflects both compatibility with spaCy and the model's own major and minor version. A model version a.b.c translates to:

a: spaCy major version. For example, 2 for spaCy v2.x.
b: Model major version. Models with a different major version can't be loaded by the same code. For example, changing the width of the model, adding hidden layers or changing the activation changes the model major version.
c: Model minor version. Same model structure, but different parameter values, e.g. from being trained on different data, for different numbers of iterations, etc.
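The a.b.c scheme can be expressed as a small comparison. The helpers below are a hypothetical sketch that assumes plain numeric versions without pre-release suffixes; spaCy's own check relies on the compatibility table rather than on logic like this.

```python
def parse_version(version):
    """Split a plain numeric 'a.b.c' string into three integers."""
    a, b, c = (int(part) for part in version.split("."))
    return a, b, c


def describe_change(old, new):
    """Classify the difference between two model versions under the a.b.c scheme."""
    old_a, old_b, old_c = parse_version(old)
    new_a, new_b, new_c = parse_version(new)
    if old_a != new_a:
        return "targets a different spaCy major version"
    if old_b != new_b:
        return "new model structure, needs matching code"
    if old_c != new_c:
        return "same structure, new parameter values"
    return "same version"


def matches_spacy(spacy_version, model_version):
    """True if the model's 'a' equals spaCy's major version."""
    return parse_version(spacy_version)[0] == parse_version(model_version)[0]


print(describe_change("2.0.0", "2.0.1"))  # same structure, new parameter values
```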

For a detailed compatibility overview, see the compatibility.json in the models repository. This is also the source of spaCy's internal compatibility check, performed when you run the download command.
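The lookup the download command performs can be sketched as follows. The sketch assumes the compatibility.json layout read by spaCy v2's download command: a top-level "spacy" key mapping each spaCy version to a dict of model names and their compatible versions, with the first listed version taken as the best match. The SAMPLE data is made up purely to show that shape and is not real release information; best_model_version and download_spec are hypothetical helpers.

```python
import json

# Made-up sample in the assumed compatibility.json layout; not real release data.
SAMPLE = json.loads("""
{
  "spacy": {
    "2.0.0": {"en_core_web_sm": ["2.0.0"], "de_core_news_sm": ["2.0.0"]}
  }
}
""")


def best_model_version(table, spacy_version, model):
    """Return the model version to install for the given spaCy version."""
    per_spacy = table["spacy"].get(spacy_version)
    if per_spacy is None:
        raise KeyError(f"no models listed for spaCy {spacy_version}")
    versions = per_spacy.get(model)
    if not versions:
        raise KeyError(f"no version of {model} listed for spaCy {spacy_version}")
    return versions[0]  # first entry is treated as the best match


def download_spec(table, spacy_version, model):
    """Build the 'name-version' string used with `spacy download ... --direct`."""
    return f"{model}-{best_model_version(table, spacy_version, model)}"


print(download_spec(SAMPLE, "2.0.0", "en_core_web_sm"))  # en_core_web_sm-2.0.0
```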