Facts & Figures
Here's a quick comparison of the features offered by spaCy, SyntaxNet, NLTK and CoreNLP.
|Integrated word vectors|
Two peer-reviewed papers in 2015 confirm that spaCy offers the fastest syntactic parser in the world and that its accuracy is within 1% of the best available. The few systems that are more accurate are 20× slower or more.
In 2016, Google released its SyntaxNet library, setting a new state of the art for syntactic dependency parsing accuracy. SyntaxNet's algorithm is very similar to spaCy's. The main difference is that SyntaxNet uses a neural network while spaCy uses a sparse linear model.
|System||News||Web||QA|
|Martins et al. (2013)||93.1||88.23||94.21|
|Zhang and McDonald (2014)||93.32||88.65||93.37|
|Weiss et al. (2015)||93.91||89.29||94.17|
|Andor et al. (2016)||94.44||90.17||95.4|
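To make "sparse linear model" concrete: in a transition-based parser, each parser state yields a handful of active string features, each feature carries one weight per transition, and a transition's score is the sum of those weights. The sketch below assumes that setup. The transition names, the `extract_features` helper and the simple perceptron update are hypothetical illustrations chosen for brevity, not spaCy's actual feature set or training procedure.

```python
from collections import defaultdict

# Hypothetical transition names for an arc-eager-style parser.
TRANSITIONS = ("SHIFT", "REDUCE", "LEFT-ARC", "RIGHT-ARC")


class SparseLinearScorer:
    """Scores transitions by summing the weights of the active sparse features."""

    def __init__(self):
        # weights[feature][transition] -> float; absent entries count as zero
        self.weights = defaultdict(lambda: defaultdict(float))

    def score(self, features):
        scores = {t: 0.0 for t in TRANSITIONS}
        for feat in features:
            # .get avoids creating empty entries for unseen features
            for t, w in self.weights.get(feat, {}).items():
                scores[t] += w
        return scores

    def predict(self, features):
        scores = self.score(features)
        # On ties, max() returns the first transition in TRANSITIONS order
        return max(TRANSITIONS, key=lambda t: scores[t])

    def update(self, features, gold, guess):
        # Perceptron-style update (an assumption for this sketch):
        # reward the correct transition, penalise the wrong guess.
        if gold == guess:
            return
        for feat in features:
            self.weights[feat][gold] += 1.0
            self.weights[feat][guess] -= 1.0


def extract_features(stack, buffer, words):
    """Hypothetical feature template: top of stack, front of buffer, and their pair."""
    s0 = words[stack[-1]] if stack else "<none>"
    b0 = words[buffer[0]] if buffer else "<none>"
    return [f"s0={s0}", f"b0={b0}", f"s0b0={s0}|{b0}"]


scorer = SparseLinearScorer()
words = ["She", "reads", "books"]
feats = extract_features(stack=[0], buffer=[1, 2], words=words)
guess = scorer.predict(feats)  # all weights are zero, so "SHIFT" wins the tie
scorer.update(feats, gold="LEFT-ARC", guess=guess)
print(scorer.predict(feats))  # LEFT-ARC
```

Only the three active features are touched per decision, which is what keeps this kind of model cheap to evaluate; a neural scorer would instead compute dense hidden representations from embeddings of the same parser state.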
Detailed speed comparison
Here we compare the per-document processing time of various spaCy functionalities against other NLP libraries. We show both absolute timings (in ms) and relative performance (normalized to spaCy). Lower is better.
|Absolute (ms per doc)||Relative (to spaCy)|
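The relative column is simply each system's absolute time divided by spaCy's time on the same task, so spaCy itself scores 1.0 and a value of 3.0 means three times slower. A minimal sketch of that normalization, using hypothetical system names and timings rather than the figures from this comparison:

```python
def relative_to_baseline(timings_ms, baseline="spaCy"):
    """Divide each system's ms-per-document time by the baseline system's time."""
    base = timings_ms[baseline]
    if base <= 0:
        raise ValueError("baseline timing must be positive")
    return {system: ms / base for system, ms in timings_ms.items()}


# Hypothetical timings in ms per document, not measured results.
timings = {"spaCy": 2.0, "SystemA": 6.0, "SystemB": 50.0}
print(relative_to_baseline(timings))
# {'spaCy': 1.0, 'SystemA': 3.0, 'SystemB': 25.0}
```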
Named entity comparison
Jiang et al. (2016) present several detailed comparisons of the named entity recognition models provided by spaCy, CoreNLP, NLTK and LingPipe. Here we show their evaluation of person, location and organization accuracy on Wikipedia.
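Per-type NER evaluations like this are commonly scored with exact-match precision, recall and F1: a predicted entity counts as correct only if its span and label both match a gold entity. Jiang et al.'s exact protocol isn't described here, so the sketch below is one common convention, not necessarily theirs; the `(start, end, label)` span format and the `PER`/`LOC`/`ORG` labels are assumptions.

```python
from collections import Counter


def per_type_prf(gold, predicted):
    """Exact-match precision, recall and F1 for each entity label.

    gold and predicted are collections of (start, end, label) spans.
    """
    gold, predicted = set(gold), set(predicted)
    tp = Counter(label for (_, _, label) in gold & predicted)
    n_gold = Counter(label for (_, _, label) in gold)
    n_pred = Counter(label for (_, _, label) in predicted)
    results = {}
    for label in sorted(set(n_gold) | set(n_pred)):
        p = tp[label] / n_pred[label] if n_pred[label] else 0.0
        r = tp[label] / n_gold[label] if n_gold[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        results[label] = (p, r, f)
    return results


# Hypothetical gold and predicted spans for one document.
gold = [(0, 2, "PER"), (5, 6, "LOC"), (8, 10, "ORG")]
pred = [(0, 2, "PER"), (5, 6, "ORG"), (8, 10, "ORG")]
for label, (p, r, f) in per_type_prf(gold, pred).items():
    print(f"{label}: P={p:.2f} R={r:.2f} F={f:.2f}")
# LOC: P=0.00 R=0.00 F=0.00
# ORG: P=0.50 R=1.00 F=0.67
# PER: P=1.00 R=1.00 F=1.00
```

Note how a single mislabelled span (a location tagged as an organization) costs recall for one type and precision for another.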