
Facts & Figures

The hard numbers for spaCy and how it compares to other tools

Feature comparison

Here’s a quick comparison of the functionalities offered by spaCy, NLTK and CoreNLP; the code sketch after the table shows several of them in spaCy.

|                          | spaCy           | NLTK   | CoreNLP       |
| ------------------------ | --------------- | ------ | ------------- |
| Programming language     | Python          | Python | Java / Python |
| Neural network models    | ✅              | ❌     | ✅            |
| Integrated word vectors  | ✅              | ❌     | ❌            |
| Multi-language support   | ✅              | ✅     | ✅            |
| Tokenization             | ✅              | ✅     | ✅            |
| Part-of-speech tagging   | ✅              | ✅     | ✅            |
| Sentence segmentation    | ✅              | ✅     | ✅            |
| Dependency parsing       | ✅              | ❌     | ✅            |
| Entity recognition       | ✅              | ✅     | ✅            |
| Entity linking           | ❌              | ❌     | ❌            |
| Coreference resolution   | ❌              | ❌     | ✅            |
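
As a minimal sketch of several of these features in spaCy (it assumes the small English model has been installed, e.g. with `python -m spacy download en_core_web_sm`):

```python
import spacy

# Load a small English pipeline (tokenizer, tagger, parser, NER).
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion. "
          "It would be a big deal.")

# Sentence segmentation
for sent in doc.sents:
    print(sent.text)

# Tokenization, part-of-speech tagging and dependency parsing
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

# Entity recognition
for ent in doc.ents:
    print(ent.text, ent.label_)
```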

When should I use what?

Natural Language Understanding is an active area of research and development, so there are many different tools and technologies catering to different use cases. The table below summarizes a few libraries (spaCy, NLTK, AllenNLP, StanfordNLP and TensorFlow) to help you get a feel for how things fit together.

|                                                                    | spaCy | NLTK | AllenNLP | StanfordNLP | TensorFlow |
| ------------------------------------------------------------------ | ----- | ---- | -------- | ----------- | ---------- |
| I’m a beginner and just getting started with NLP.                  | ✅    | ✅   |          | ✅          |            |
| I want to build an end-to-end production application.              | ✅    |      |          |             | ✅         |
| I want to try out different neural network architectures for NLP.  |       |      | ✅       |             | ✅         |
| I want to try the latest models with state-of-the-art accuracy.    |       |      | ✅       | ✅          | ✅         |
| I want to train models from my own data.                           | ✅    |      | ✅       | ✅          | ✅         |
| I want my application to be efficient on CPU.                      | ✅    | ✅   |          |             |            |

Benchmarks

Two peer-reviewed papers in 2015 confirmed that spaCy offers the fastest syntactic parser in the world and that its accuracy is within 1% of the best available. The few systems that are more accurate are 20× slower or more.

| System     | Year | Language        | Accuracy | Speed (wps) |
| ---------- | ---- | --------------- | -------- | ----------- |
| spaCy v2.x | 2017 | Python / Cython | 92.6     | n/a         |
| spaCy v1.x | 2015 | Python / Cython | 91.8     | 13,963      |
| ClearNLP   | 2015 | Java            | 91.7     | 10,271      |
| CoreNLP    | 2015 | Java            | 89.6     | 8,602       |
| MATE       | 2015 | Java            | 92.5     | 550         |
| Turbo      | 2015 | C++             | 92.4     | 349         |

Algorithm comparison

In this section, we compare spaCy’s algorithms to recently published systems, using some of the most popular benchmarks. These benchmarks are designed to help isolate the contributions of specific algorithmic decisions, so they promote slightly “idealized” conditions. Specifically, the text comes pre-processed with “gold standard” token and sentence boundaries. The data sets also tend to be fairly small, to help researchers iterate quickly. These conditions mean the models trained on these data sets are not always useful for practical purposes.
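
These idealized conditions are easy to reproduce in spaCy itself: a Doc can be constructed directly from pre-split words, bypassing the tokenizer, before the statistical components run. A minimal sketch, with an invented example sentence:

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_sm")

# Pre-tokenized input, as a treebank would provide it ("gold" boundaries).
words = ["Ms.", "Haag", "plays", "Elianti", "."]
doc = Doc(nlp.vocab, words=words)

# Apply the statistical pipeline components to the pre-tokenized Doc.
for name, component in nlp.pipeline:
    doc = component(doc)

print([(token.text, token.pos_, token.dep_) for token in doc])
```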

Parse accuracy (Penn Treebank / Wall Street Journal)

This is the “classic” evaluation, so it’s the number parsing researchers are most easily able to put in context. However, it’s quite far removed from actual usage: it uses sentences with gold-standard segmentation and tokenization, from a pretty specific type of text (articles from a single newspaper, 1984-1989).
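
The accuracy figure reported here is the unlabeled attachment score (UAS), i.e. the percentage of tokens whose syntactic head is predicted correctly. A toy sketch of the metric, with invented head indices:

```python
def uas(gold_heads, predicted_heads):
    """Unlabeled attachment score: share of tokens with the correct head."""
    assert len(gold_heads) == len(predicted_heads)
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return 100.0 * correct / len(gold_heads)

# Token index -> index of its syntactic head (hypothetical 5-token sentence).
gold = [2, 2, 2, 4, 2]
pred = [2, 2, 2, 2, 2]
print(uas(gold, pred))  # 80.0
```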

| System                       | Year | Type   | Accuracy |
| ---------------------------- | ---- | ------ | -------- |
| spaCy v2.0.0                 | 2017 | neural | 94.48    |
| spaCy v1.1.0                 | 2016 | linear | 92.80    |
| Dozat and Manning            | 2017 | neural | 95.75    |
| Andor et al.                 | 2016 | neural | 94.44    |
| SyntaxNet Parsey McParseface | 2016 | neural | 94.15    |
| Weiss et al.                 | 2015 | neural | 93.91    |
| Zhang and McDonald           | 2014 | linear | 93.32    |
| Martins et al.               | 2013 | linear | 93.10    |

NER accuracy (OntoNotes 5, no pre-process)

This is the evaluation we use to tune spaCy’s parameters to decide which algorithms are better than the others. It’s reasonably close to actual usage, because it requires the parses to be produced from raw text, without any pre-processing.
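
The reported accuracy is the entity-level F-score: precision and recall over exact matches of (start, end, label) spans between the predicted and gold entities. A toy sketch of the computation, with invented spans:

```python
def ner_f_score(gold, predicted):
    """Entity-level F1 over exact (start, end, label) span matches."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# (start_char, end_char, label) triples for one document (hypothetical).
gold = {(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")}
pred = {(0, 5, "ORG"), (27, 31, "PERSON"), (44, 54, "MONEY")}
print(ner_f_score(gold, pred))  # 0.666...
```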

| System                        | Year | Type   | Accuracy |
| ----------------------------- | ---- | ------ | -------- |
| spaCy en_core_web_lg v2.0.0a3 | 2017 | neural | 85.85    |
| Strubell et al.               | 2017 | neural | 86.81    |
| Chiu and Nichols              | 2016 | neural | 86.19    |
| Durrett and Klein             | 2014 | neural | 84.04    |
| Ratinov and Roth              | 2009 | linear | 83.45    |

Model comparison

In this section, we provide benchmark accuracies for the pretrained model pipelines we distribute with spaCy. Evaluations are conducted end-to-end from raw text, with no “gold standard” pre-processing, over text from a mix of genres where possible.
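
Each pretrained pipeline ships with metadata that spaCy exposes as nlp.meta, so figures like these can also be inspected programmatically. A sketch, assuming the models are installed; the exact keys inside the accuracy block are an assumption and vary between model versions:

```python
import spacy

for model_name in ("en_core_web_sm", "en_core_web_lg"):
    nlp = spacy.load(model_name)
    meta = nlp.meta
    # The "accuracy" block and its key names reflect the model's meta.json
    # and may differ by version -- treat them as assumptions.
    accuracy = meta.get("accuracy", {})
    print(meta["name"], meta["version"],
          accuracy.get("uas"), accuracy.get("ents_f"))
    # Shape of the bundled word-vector table: (number of vectors, dimensions).
    print("vectors:", nlp.vocab.vectors.shape)
```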

English

| Model                | spaCy | Type   | UAS  | NER F | POS  | WPS   | Size  |
| -------------------- | ----- | ------ | ---- | ----- | ---- | ----- | ----- |
| en_core_web_sm 2.0.0 | 2.x   | neural | 91.7 | 85.3  | 97.0 | 10.1k | 35MB  |
| en_core_web_md 2.0.0 | 2.x   | neural | 91.7 | 85.9  | 97.1 | 10.0k | 115MB |
| en_core_web_lg 2.0.0 | 2.x   | neural | 91.9 | 85.9  | 97.2 | 10.0k | 812MB |
| en_core_web_sm 1.2.0 | 1.x   | linear | 86.6 | 78.5  | 96.6 | 25.7k | 50MB  |
| en_core_web_md 1.2.1 | 1.x   | linear | 90.6 | 81.4  | 96.7 | 18.8k | 1GB   |

Spanish

| Model                 | spaCy | Type   | UAS  | NER F | POS  | WPS | Size  |
| --------------------- | ----- | ------ | ---- | ----- | ---- | --- | ----- |
| es_core_news_sm 2.0.0 | 2.x   | neural | 89.8 | 88.7  | 96.9 | n/a | 35MB  |
| es_core_news_md 2.0.0 | 2.x   | neural | 90.2 | 89.0  | 97.8 | n/a | 93MB  |
| es_core_web_md 1.1.0  | 1.x   | linear | 87.5 | 94.2  | 96.7 | n/a | 377MB |

Detailed speed comparison

Here we compare the per-document processing time of various spaCy functionalities against other NLP libraries. We show both absolute timings (in ms) and relative performance (normalized to spaCy). Lower is better.

| System  | Tokenize (ms) | Tag (ms) | Parse (ms) | Tokenize (rel) | Tag (rel) | Parse (rel) |
| ------- | ------------- | -------- | ---------- | -------------- | --------- | ----------- |
| spaCy   | 0.2           | 1        | 19         | 1x             | 1x        | 1x          |
| CoreNLP | 0.18          | 10       | 49         | 0.9x           | 10x       | 2.6x        |
| ZPar    | 1             | 8        | 850        | 5x             | 8x        | 44.7x       |
| NLTK    | 4             | 443      | n/a        | 20x            | 443x      | n/a         |
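
Timings like these depend heavily on hardware, batch size and document length, so it’s worth measuring on your own texts. A rough sketch for spaCy, using nlp.pipe to process documents as a batched stream; the corpus here is a stand-in, and results will differ from the table above:

```python
import time
import spacy

nlp = spacy.load("en_core_web_sm")
texts = ["This is a sample document."] * 1000  # substitute your own corpus

start = time.perf_counter()
# nlp.pipe streams texts through the pipeline in batches, which is much
# faster than calling nlp() on each document individually.
docs = list(nlp.pipe(texts))
elapsed = time.perf_counter() - start

n_words = sum(len(doc) for doc in docs)
print("ms per doc:", 1000 * elapsed / len(docs))
print("words per second:", n_words / elapsed)
```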