Language models

spaCy currently supports the following languages and capabilities:

Language    Token  SBD  Lemma  POS  NER  Dep  Vector  Sentiment
English en
German de
French fr

Available models

The models differ mainly in their statistical components. In general, we expect larger models to be more accurate overall, but the right choice depends on your use case and requirements. We recommend starting with the default models (marked with a star below).

Name                 Language  Voc  Dep  Ent  Vec  Size     License
en_core_web_sm       English                       50 MB    CC BY-SA
en_core_web_md       English                       1 GB     CC BY-SA
en_depent_web_md     English                       328 MB   CC BY-SA
en_vectors_glove_md  English                       727 MB   CC BY-SA
de_core_news_md      German                        645 MB   CC BY-SA
fr_depvec_web_lg     French                        1.33 GB  CC BY-NC
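
Since the choice of model depends on your requirements, it can help to make constraints such as download size and license explicit. The following is a minimal sketch, not part of spaCy: `ModelInfo`, `MODELS` and `pick_model` are hypothetical names introduced for illustration, the sizes and licenses are copied from the table above, and 1 GB is assumed to be 1000 MB. Size is only a crude proxy for quality, so the capability columns of the table should still be checked for whichever model the helper returns.

```python
from typing import NamedTuple, Optional


class ModelInfo(NamedTuple):
    """One row of the model table (hypothetical helper type, not a spaCy class)."""
    name: str
    language: str   # language code: "en", "de" or "fr"
    size_mb: float  # download size; assumes 1 GB = 1000 MB
    license: str


# Names, sizes and licenses as listed in the table above.
MODELS = [
    ModelInfo("en_core_web_sm", "en", 50, "CC BY-SA"),
    ModelInfo("en_core_web_md", "en", 1000, "CC BY-SA"),
    ModelInfo("en_depent_web_md", "en", 328, "CC BY-SA"),
    ModelInfo("en_vectors_glove_md", "en", 727, "CC BY-SA"),
    ModelInfo("de_core_news_md", "de", 645, "CC BY-SA"),
    ModelInfo("fr_depvec_web_lg", "fr", 1330, "CC BY-NC"),
]


def pick_model(language: str, max_size_mb: float,
               allow_non_commercial: bool = True) -> Optional[ModelInfo]:
    """Return the largest listed model for `language` that fits the size budget.

    Returns None if no listed model satisfies the constraints.
    """
    candidates = [
        m for m in MODELS
        if m.language == language
        and m.size_mb <= max_size_mb
        # "NC" marks a non-commercial license such as CC BY-NC.
        and (allow_non_commercial or "NC" not in m.license)
    ]
    return max(candidates, key=lambda m: m.size_mb, default=None)


if __name__ == "__main__":
    choice = pick_model("en", max_size_mb=100)
    print(choice.name if choice else "no model fits")
```

Assuming the chosen package has been installed, its name can then be passed to `spacy.load()`.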

Alpha support

Work has started on the following languages. You can help by improving the existing language data and extending the tokenization patterns.

Language             Source
Spanish es           spacy/es
Italian it           spacy/it
Portuguese pt        spacy/pt
Dutch nl             spacy/nl
Swedish sv           spacy/sv
Finnish fi           spacy/fi
Norwegian Bokmål nb  spacy/nb
Hungarian hu         spacy/hu
Bengali bn           spacy/bn
Hebrew he            spacy/he
Chinese zh           spacy/zh
Japanese ja          spacy/ja