spaCy currently supports the following languages and capabilities:
The differences between models are mostly statistical. In general, we expect larger models to be more accurate overall, but the right choice depends on your use case and requirements. We recommend starting with the default models (marked with a star below).
|Language||Size||License|
|English||50 MB||CC BY-SA|
|English||1 GB||CC BY-SA|
|English||328 MB||CC BY-SA|
|English||727 MB||CC BY-SA|
|German||645 MB||CC BY-SA|
|French||1.33 GB||CC BY-NC|
Work has started on the following languages. You can help by improving the existing language data and extending the tokenization patterns.
Chinese tokenization requires the Jieba library. Statistical models are coming soon.
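
To give a sense of what the Jieba dependency provides, here is a minimal sketch that segments a Chinese sentence into words using `jieba.lcut`. This is an illustration only, not spaCy's internal tokenizer code: the helper name `tokenize_chinese` and the character-level fallback used when Jieba is not installed are assumptions made for this example.

```python
def tokenize_chinese(text):
    """Segment Chinese text into a list of word strings.

    Uses Jieba when it is available. The fallback below is a
    hypothetical choice for this sketch, not spaCy behavior.
    """
    try:
        import jieba  # third-party package: pip install jieba
    except ImportError:
        # Assumed fallback: treat every character as its own token.
        return list(text)
    # lcut returns the segments as a list; joined together they
    # reproduce the input text.
    return jieba.lcut(text)


sentence = "我来到北京清华大学"
tokens = tokenize_chinese(sentence)
print(tokens)
```

Chinese text has no whitespace between words, so a segmenter like Jieba is needed before any word-level processing can happen. The exact segmentation depends on the Jieba version and dictionary in use.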