Facts & Figures
Feature comparison
Here's a quick comparison of the functionalities offered by spaCy, SyntaxNet, NLTK and CoreNLP.
| | spaCy | SyntaxNet | NLTK | CoreNLP |
|---|---|---|---|---|
| Programming language | Python | C++ | Python | Java |
| Neural network models | yes | yes | no | yes |
| Integrated word vectors | yes | no | no | no |
| Multi-language support | yes | yes | yes | yes |
| Tokenization | yes | yes | yes | yes |
| Part-of-speech tagging | yes | yes | yes | yes |
| Sentence segmentation | yes | yes | yes | yes |
| Dependency parsing | yes | yes | no | yes |
| Entity recognition | yes | no | yes | yes |
| Coreference resolution | no | no | no | yes |
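To see how these pieces fit together in code, here's a minimal sketch using spaCy v2.x. It assumes you've downloaded the en_core_web_sm model via `python -m spacy download en_core_web_sm`.

```python
import spacy

# Load a pre-trained English pipeline (tokenizer, tagger, parser, NER).
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")

# Tokenization, part-of-speech tagging and dependency parsing all
# happen in a single call to the pipeline.
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

# Named entities are available on the same Doc object.
for ent in doc.ents:
    print(ent.text, ent.label_)

# Sentence boundaries are derived from the dependency parse.
for sent in doc.sents:
    print(sent.text)
```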
Benchmarks
Two peer-reviewed papers in 2015 confirm that spaCy offers the fastest syntactic parser in the world and that its accuracy is within 1% of the best available. The few systems that are more accurate are 20× slower or more.
| System | Year | Language | Accuracy | Speed (wps) |
|---|---|---|---|---|
| spaCy v2.x | 2017 | Python / Cython | 92.6 | n/a* |
| spaCy v1.x | 2015 | Python / Cython | 91.8 | 13,963 |
| ClearNLP | 2015 | Java | 91.7 | 10,271 |
| CoreNLP | 2015 | Java | 89.6 | 8,602 |
| MATE | 2015 | Java | 92.5 | 550 |
| Turbo | 2015 | C++ | 92.4 | 349 |

\* Speed in this table is as benchmarked by Choi et al. (2015). We therefore can't provide a comparable figure for spaCy v2.x, as we'd be running the benchmark on different hardware.
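Because parser speed depends heavily on hardware, the most meaningful throughput number is one you measure on your own machine. Here's a rough sketch of a words-per-second benchmark; `load_my_corpus` is a hypothetical helper standing in for however you load your texts.

```python
import time
import spacy

nlp = spacy.load("en_core_web_sm")
texts = load_my_corpus()  # hypothetical: returns a list of strings

start = time.time()
n_words = 0
# nlp.pipe streams documents through the pipeline in batches, which
# is how throughput benchmarks are normally run.
for doc in nlp.pipe(texts, batch_size=50):
    n_words += len(doc)
elapsed = time.time() - start

print("{:.0f} words per second".format(n_words / elapsed))
```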
Algorithm comparison
In this section, we compare spaCy's algorithms to recently published systems, using some of the most popular benchmarks. These benchmarks are designed to help isolate the contributions of specific algorithmic decisions, so they promote slightly "idealised" conditions. Specifically, the text comes pre-processed with "gold standard" token and sentence boundaries. The data sets also tend to be fairly small, to help researchers iterate quickly. These conditions mean the models trained on these data sets are not always useful for practical purposes.
Parse accuracy (Penn Treebank / Wall Street Journal)
This is the "classic" evaluation, so it's the number parsing researchers are most easily able to put in context. However, it's quite far removed from actual usage: it uses sentences with gold-standard segmentation and tokenization, from a pretty specific type of text (articles from a single newspaper, 1984-1989).
| System | Year | Type | Accuracy |
|---|---|---|---|
| spaCy v2.0.0 | 2017 | neural | 94.48 |
| spaCy v1.1.0 | 2016 | linear | 92.80 |
| Dozat and Manning | 2017 | neural | 95.75 |
| Andor et al. | 2016 | neural | 94.44 |
| SyntaxNet Parsey McParseface | 2016 | neural | 94.15 |
| Weiss et al. | 2015 | neural | 93.91 |
| Zhang and McDonald | 2014 | linear | 93.32 |
| Martins et al. | 2013 | linear | 93.10 |
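The accuracy figures here are attachment scores: the percentage of tokens that receive the correct syntactic head. As a minimal illustration, here's the unlabelled variant (UAS) computed over hypothetical head indices:

```python
def uas(gold_heads, pred_heads):
    """Unlabelled attachment score: the percentage of tokens whose
    predicted head index matches the gold-standard head index."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)

# Hypothetical 4-token sentence: three heads correct, one wrong.
print(uas([1, 1, 3, 1], [1, 1, 3, 2]))  # 75.0
```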
NER accuracy (OntoNotes 5, no pre-process)
This is the evaluation we use to tune spaCy's parameters and decide which algorithms are better than others. It's reasonably close to actual usage, because it requires the parses to be produced from raw text, without any pre-processing.
| System | Year | Type | Accuracy |
|---|---|---|---|
| spaCy en_core_web_lg v2.0.0a3 | 2017 | neural | 85.85 |
| Strubell et al. | 2017 | neural | 86.81 |
| Chiu and Nichols | 2016 | neural | 86.19 |
| Durrett and Klein | 2014 | neural | 84.04 |
| Ratinov and Roth | 2009 | linear | 83.45 |
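The NER scores are entity-level F-measures: a predicted entity only counts as correct if both its boundaries and its label exactly match a gold entity. A minimal sketch over hypothetical (start, end, label) spans:

```python
def ner_f1(gold_spans, pred_spans):
    """Entity-level F1 over exact (start, end, label) matches."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical spans: one exact match, one label mismatch.
gold = [(0, 5, "ORG"), (27, 31, "GPE")]
pred = [(0, 5, "ORG"), (27, 31, "LOC")]
print(ner_f1(gold, pred))  # 0.5
```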
Model comparison
In this section, we provide benchmark accuracies for the pre-trained model pipelines we distribute with spaCy. Evaluations are conducted end-to-end from raw text, with no "gold standard" pre-processing, over text from a mix of genres where possible.
English
| Model | spaCy | Type | UAS | NER F | POS | WPS | Size |
|---|---|---|---|---|---|---|---|
| en_core_web_sm 2.0.0 | 2.x | neural | 91.7 | 85.3 | 97.0 | 10.1k | 35MB |
| en_core_web_md 2.0.0 | 2.x | neural | 91.7 | 85.9 | 97.1 | 10.0k | 115MB |
| en_core_web_lg 2.0.0 | 2.x | neural | 91.9 | 85.9 | 97.2 | 10.0k | 812MB |
| en_core_web_sm 1.2.0 | 1.x | linear | 86.6 | 78.5 | 96.6 | 25.7k | 50MB |
| en_core_web_md 1.2.1 | 1.x | linear | 90.6 | 81.4 | 96.7 | 18.8k | 1GB |
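Most of the size difference between the sm, md and lg packages comes down to the word vectors they ship with. A quick sketch showing how to check, assuming en_core_web_lg has been downloaded:

```python
import spacy

# Assumes: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

doc = nlp("The trophy didn't fit in the suitcase.")

# The lg model includes pre-trained word vectors, so similarity
# queries are backed by real vectors rather than fallbacks.
print(doc[1].has_vector, doc[1].vector_norm)  # token "trophy"
print(doc[1].similarity(doc[7]))              # trophy vs. suitcase
```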
Spanish
| Model | spaCy | Type | UAS | NER F | POS | WPS | Size |
|---|---|---|---|---|---|---|---|
| es_core_news_sm 2.0.0 | 2.x | neural | 89.8 | 88.7 | 96.9 | n/a | 35MB |
| es_core_news_md 2.0.0 | 2.x | neural | 90.2 | 89.0 | 97.8 | n/a | 93MB |
| es_core_web_md 1.1.0 | 1.x | linear | 87.5 | 94.2 | 96.7 | n/a | 377MB |
Detailed speed comparison
Here we compare the per-document processing time of various spaCy functionalities against other NLP libraries. We show both absolute timings (in ms) and relative performance (normalized to spaCy). Lower is better.
| System | Tokenize (ms) | Tag (ms) | Parse (ms) | Tokenize (rel.) | Tag (rel.) | Parse (rel.) |
|---|---|---|---|---|---|---|
| spaCy | 0.2 | 1 | 19 | 1x | 1x | 1x |
| CoreNLP | 0.18 | 10 | 49 | 0.9x | 10x | 2.6x |
| ZPar | 1 | 8 | 850 | 5x | 8x | 44.7x |
| NLTK | 4 | 443 | n/a | 20x | 443x | n/a |
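If you want comparable per-document numbers for your own workload, you can time the tokenizer and the full pipeline separately. A rough sketch; the file name is a hypothetical stand-in for your benchmark document.

```python
import time
import spacy

nlp = spacy.load("en_core_web_sm")
text = open("my_benchmark_doc.txt").read()  # hypothetical input file

def time_ms(func, n=100):
    """Average wall-clock time of func() in milliseconds over n runs."""
    start = time.time()
    for _ in range(n):
        func()
    return (time.time() - start) / n * 1000.0

# The tokenizer can be run on its own, without the rest of the pipeline.
print("tokenize: {:.2f} ms per doc".format(time_ms(lambda: nlp.tokenizer(text))))

# The full pipeline adds the tagger, parser and entity recognizer.
print("pipeline: {:.2f} ms per doc".format(time_ms(lambda: nlp(text))))
```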
Powered by spaCy
Here's an overview of other tools and libraries that use spaCy behind the scenes:

- torchtext: data loading and pre-processing utilities for PyTorch, which can use spaCy for tokenization.
- allennlp: an open-source NLP research library built on PyTorch that uses spaCy for pre-processing.
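For example, torchtext can hand tokenization off to spaCy. A small sketch, assuming the torchtext 0.x `data.Field` API from around the same era:

```python
from torchtext import data

# Passing tokenize="spacy" makes torchtext call spaCy's English
# tokenizer under the hood when building the dataset.
TEXT = data.Field(tokenize="spacy", lower=True)
LABEL = data.Field(sequential=False)
```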
spaCy and other libraries
Data scientists, researchers and machine learning engineers have converged on Python as the language for AI. This gives developers a rich ecosystem of NLP libraries to work with. Here's how we think the pieces fit together.