Industrial-Strength Natural Language Processing in Python

Get things done

spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. We like to think of spaCy as the Ruby on Rails of Natural Language Processing.


Blazing fast

spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. Independent research in 2015 found spaCy's syntactic parser to be the fastest in the world. If your application needs to process entire web dumps, spaCy is the library you want to be using.
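For large corpora, the usual pattern is to stream texts through nlp.pipe instead of calling nlp() once per text. The sketch below assumes spaCy is installed and uses a blank English pipeline (tokenizer only) in place of a trained model so it runs without a download; read_texts is a hypothetical stand-in for reading a real web dump.

```python
# Sketch: streaming many texts through spaCy in batches with nlp.pipe.
# Assumption: spacy.blank("en") (tokenizer only) stands in for a full trained
# pipeline so the example runs without downloading a model.
import spacy

nlp = spacy.blank("en")


def read_texts():
    # Hypothetical corpus reader, e.g. yielding lines from a web dump.
    for i in range(1000):
        yield f"Document number {i} mentions Google and Berlin."


total_tokens = 0
# nlp.pipe buffers incoming texts into batches and yields Doc objects lazily,
# so the whole corpus never has to sit in memory at once.
for doc in nlp.pipe(read_texts(), batch_size=256):
    total_tokens += len(doc)

print(total_tokens)
```

With a trained pipeline, the same loop works unchanged; only the spacy.load call differs.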


Deep learning

spaCy is the best way to prepare text for deep learning. It interoperates seamlessly with TensorFlow, PyTorch, scikit-learn, Gensim and the rest of Python's awesome AI ecosystem. With spaCy, you can easily construct linguistically sophisticated statistical models for a variety of NLP problems.
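A common first step before handing text to a neural network is turning each document into integer features. This sketch assumes spaCy and NumPy are installed and again uses a blank English pipeline; Doc.to_array returns one row per token and one column per requested attribute. The padding scheme (index 0 reserved for padding, contiguous IDs for each lowercase form) is an illustrative assumption, not a spaCy feature.

```python
# Sketch: converting spaCy Docs into a padded integer matrix for a deep
# learning framework. Assumes spaCy and NumPy are installed.
import numpy as np
import spacy
from spacy.attrs import IS_ALPHA, LOWER

nlp = spacy.blank("en")
texts = ["spaCy prepares text for deep learning.", "It exports NumPy arrays."]
docs = list(nlp.pipe(texts))

# One row per token, one column per attribute: LOWER is the 64-bit hash of
# the lowercased token text, IS_ALPHA is 0 or 1.
arrays = [doc.to_array([LOWER, IS_ALPHA]) for doc in docs]

# Hypothetical preprocessing step: map each LOWER hash to a small contiguous
# index (0 is reserved for padding) so it can index an embedding table.
index = {}
max_len = max(len(doc) for doc in docs)
batch = np.zeros((len(docs), max_len), dtype="int64")
for i, arr in enumerate(arrays):
    for j, lower_hash in enumerate(arr[:, 0]):
        batch[i, j] = index.setdefault(int(lower_hash), len(index) + 1)

print(batch)
# Hashes resolve back to strings through the shared StringStore.
print([nlp.vocab.strings[int(h)] for h in arrays[0][:, 0]])
```

The resulting batch array can be passed to PyTorch, TensorFlow or any other library that accepts NumPy input.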


Try spaCy

```python
# pip install spacy
# python -m spacy download en_core_web_sm
import spacy

# Load the small English pipeline: tokenizer, tagger, parser and NER
# (the small model ships without static word vectors)
nlp = spacy.load("en_core_web_sm")

# Process whole documents
text = ("When Sebastian Thrun started working on self-driving cars at "
        "Google in 2007, few people outside of the company took him "
        "seriously. “I can tell you very senior CEOs of major American "
        "car companies would shake my hand and turn away because I wasn’t "
        "worth talking to,” said Thrun, in an interview with Recode earlier "
        "this week.")
doc = nlp(text)

# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
```

Features

  • Non-destructive tokenization
  • Named entity recognition
  • Support for 52+ languages
  • 19 statistical models for 9 languages
  • Pre-trained word vectors
  • State-of-the-art speed
  • Easy deep learning integration
  • Part-of-speech tagging
  • Labelled dependency parsing
  • Syntax-driven sentence segmentation
  • Built-in visualizers for syntax and NER
  • Convenient string-to-hash mapping
  • Export to numpy data arrays
  • Efficient binary serialization
  • Easy model packaging and deployment
  • Robust, rigorously evaluated accuracy
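Three of the features above (non-destructive tokenization, string-to-hash mapping and binary serialization) can be seen with nothing more than the tokenizer. This sketch assumes spaCy is installed and uses a blank English pipeline; the example strings are arbitrary.

```python
# Sketch: non-destructive tokenization, string-to-hash mapping and binary
# serialization, using a blank English pipeline (tokenizer only).
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
text = "Let's go to N.Y.!"
doc = nlp(text)

# Non-destructive tokenization: every token keeps its trailing whitespace,
# so the original string can be rebuilt exactly from the tokens.
rebuilt = "".join(token.text_with_ws for token in doc)
print([token.text for token in doc])
print(rebuilt == text)

# String-to-hash mapping: the vocabulary's StringStore maps each string to a
# 64-bit hash ID and back, so tokens can be stored as plain integers.
first = doc[0]
print(first.orth, nlp.vocab.strings[first.orth])
coffee_id = nlp.vocab.strings.add("coffee")
print(coffee_id, nlp.vocab.strings[coffee_id])

# Binary serialization: a Doc round-trips through bytes with the same vocab.
data = doc.to_bytes()
restored = Doc(nlp.vocab).from_bytes(data)
print(restored.text == doc.text)
```

Because documents store hashes rather than strings, the bytes stay compact; the shared vocabulary is what turns them back into text.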

spaCy IRL 2019: Two days of NLP

We were pleased to invite the spaCy community and other folks working on Natural Language Processing to Berlin this summer for a small and intimate event on July 6, 2019. We booked a beautiful venue, hand-picked an awesome lineup of speakers and scheduled plenty of social time to get to know each other and exchange ideas. The YouTube playlist includes 12 talks about NLP research, development and applications, with keynotes by Sebastian Ruder (DeepMind) and Yoav Goldberg (Allen AI).

From the makers of spaCy
Prodigy: Radically efficient machine teaching

Prodigy is an annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. Whether you're working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster. Stream in your own examples or real-world data from live APIs, update your model in real-time and chain models together to build more complex systems.


New in v2.1
BERT-style language model pretraining

Learn more from small training corpora by initializing your models with knowledge from raw text. The new pretrain command teaches spaCy's CNN model to predict words based on their context, producing representations of words in contexts. If you've seen Google's BERT system or fast.ai's ULMFiT, spaCy's pretraining is similar – but much more efficient. It's still experimental, but users are already reporting good results, so give it a try!

Benchmarks

In 2015, independent researchers from Emory University and Yahoo! Labs showed that spaCy offered the fastest syntactic parser in the world and that its accuracy was within 1% of the best available (Choi et al., 2015). spaCy v2.0, released in 2017, is more accurate than any of the systems Choi et al. evaluated.


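| System | Year | Language | Accuracy | Speed (words/sec) |
|---|---|---|---|---|
| spaCy v2.x | 2017 | Python / Cython | 92.6 | n/a |
| spaCy v1.x | 2015 | Python / Cython | 91.8 | 13,963 |
| ClearNLP | 2015 | Java | 91.7 | 10,271 |
| CoreNLP | 2015 | Java | 89.6 | 8,602 |
| MATE | 2015 | Java | 92.5 | 550 |
| Turbo | 2015 | C++ | 92.4 | 349 |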
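The claims above can be checked with a little arithmetic on the table. This sketch uses only the figures listed there; spaCy v2.x is left out of the speed comparison because its speed is given as n/a.

```python
# Figures copied from the benchmark table: name -> (accuracy, words per second).
systems = {
    "spaCy v1.x": (91.8, 13963),
    "ClearNLP": (91.7, 10271),
    "CoreNLP": (89.6, 8602),
    "MATE": (92.5, 550),
    "Turbo": (92.4, 349),
}
spacy_v2_accuracy = 92.6  # speed not reported, so only accuracy is compared

spacy_acc, spacy_wps = systems["spaCy v1.x"]
others = {name: vals for name, vals in systems.items() if name != "spaCy v1.x"}

# How many times more words per second spaCy v1.x parsed than each system.
speedups = {name: spacy_wps / wps for name, (_, wps) in others.items()}
for name, ratio in speedups.items():
    print(f"{name:<9} spaCy v1.x is {ratio:.1f}x faster")

# Accuracy gap between spaCy v1.x and the most accurate 2015 system.
best_2015 = max(acc for acc, _ in others.values())
gap = best_2015 - spacy_acc
print(f"spaCy v1.x trails the best 2015 system by {gap:.1f} points")

# spaCy v2.x against that same system.
print(f"spaCy v2.x leads it by {spacy_v2_accuracy - best_2015:.1f} points")
```

spaCy v1.x comes out roughly 1.4x faster than ClearNLP and about 40x faster than Turbo, while trailing MATE's accuracy by 0.7 points, consistent with the "within 1%" result.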