Industrial-Strength Natural Language Processing in Python

Fastest in the world

spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. Independent research has confirmed that spaCy is the fastest in the world. If your application needs to process entire web dumps, spaCy is the library you want to be using.
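
Processing a web dump means streaming text through the pipeline rather than loading it all at once. The sketch below shows that streaming shape using only the standard library. `toy_nlp` is a hypothetical whitespace tokenizer standing in for a loaded spaCy pipeline, and `batched` and `process_stream` are illustrative helpers, not spaCy functions. In spaCy 1.x the corresponding call is `nlp.pipe(texts, batch_size=..., n_threads=...)`, which works through the texts in batches and yields `Doc` objects lazily.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def toy_nlp(text: str) -> List[str]:
    """Hypothetical stand-in for a spaCy pipeline: whitespace tokenization only."""
    return text.split()


def batched(items: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of at most batch_size items, reading the input lazily."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_stream(texts: Iterable[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield one result per text; only one batch is held in memory at a time."""
    for batch in batched(texts, batch_size):
        # A real pipeline could hand the whole batch to worker threads here.
        for text in batch:
            yield toy_nlp(text)
```

For example, iterating over `process_stream(open('web_dump.txt', encoding='utf8'), batch_size=500)` would read a (hypothetical) one-document-per-line file without ever holding the whole dump in memory.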

Get things done

spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. I like to think of spaCy as the Ruby on Rails of Natural Language Processing.

Deep learning

spaCy is the best way to prepare text for deep learning. It interoperates seamlessly with TensorFlow, Keras, scikit-learn, Gensim and the rest of Python's awesome AI ecosystem. spaCy helps you connect the statistical models trained by these libraries to the rest of your application.
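
Preparing text for a neural network usually ends with turning tokens into fixed-length rows of integer ids that an embedding layer can consume. Here is a minimal sketch in plain Python, assuming the token lists have already been produced (with spaCy that could be `[t.lower_ for t in doc]`). The reserved ids `PAD_ID` and `UNK_ID` and the helper names are illustrative choices, not part of spaCy's API.

```python
from typing import Dict, List, Sequence

PAD_ID = 0  # reserved id for padding positions
UNK_ID = 1  # reserved id for tokens missing from the vocabulary


def build_vocab(docs: Sequence[Sequence[str]], min_count: int = 1) -> Dict[str, int]:
    """Give every token seen at least min_count times its own integer id."""
    counts: Dict[str, int] = {}
    for doc in docs:
        for tok in doc:
            counts[tok] = counts.get(tok, 0) + 1
    vocab: Dict[str, int] = {}
    for tok in sorted(counts):  # sorted so ids are deterministic
        if counts[tok] >= min_count:
            vocab[tok] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def to_padded_ids(docs: Sequence[Sequence[str]], vocab: Dict[str, int],
                  max_len: int) -> List[List[int]]:
    """Map each token list to exactly max_len ids, truncating or padding as needed."""
    rows: List[List[int]] = []
    for doc in docs:
        ids = [vocab.get(tok, UNK_ID) for tok in doc[:max_len]]
        ids += [PAD_ID] * (max_len - len(ids))
        rows.append(ids)
    return rows
```

With `docs = [['worst', 'fries', 'ever'], ['the', 'fries', 'were', 'gross']]`, the call `to_padded_ids(docs, build_vocab(docs), 4)` returns `[[7, 3, 2, 0], [5, 3, 6, 4]]`, a rectangular batch ready to be converted to an array for Keras or TensorFlow.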

Latest release: v1.2
The results of the spaCy user survey

lightning_tour.py
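# Install: pip install spacy && python -m spacy.en.download
import io

import spacy

# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load('en')

# Process a document of any size (read as Unicode text)
with io.open('war_and_peace.txt', encoding='utf8') as f:
    text = f.read()
doc = nlp(text)

# Hook in your own deep learning models.
# load_my_neural_network() is a hypothetical placeholder for your own code:
# it should return a callable that takes two Doc objects and returns a float.
similarity_model = load_my_neural_network()

def install_similarity(doc):
    # Make doc.similarity() call your model for every Doc this pipeline produces
    doc.user_hooks['similarity'] = similarity_model

nlp.pipeline.append(install_similarity)

doc1 = nlp(u'the fries were gross')
doc2 = nlp(u'worst fries ever')
print(doc1.similarity(doc2))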
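
The hook in this example relies on two things: the pipeline is a list of callables applied in order to each new `Doc`, and, assuming spaCy 1.x behaviour, `Doc.similarity()` checks `doc.user_hooks` for a `'similarity'` entry and calls it with both documents instead of its built-in method. The self-contained sketch below imitates that dispatch with hypothetical `ToyDoc` and `ToyPipeline` classes (simplified stand-ins, not spaCy's code), and uses bag-of-words cosine similarity as a placeholder for the model returned by `load_my_neural_network()`.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


class ToyDoc:
    """Hypothetical, simplified stand-in for a processed document."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.words: List[str] = text.lower().split()
        # Callables registered here override the default methods.
        self.user_hooks: Dict[str, Callable] = {}

    def similarity(self, other: "ToyDoc") -> float:
        if 'similarity' in self.user_hooks:
            return self.user_hooks['similarity'](self, other)
        return float(self.words == other.words)  # crude built-in default


def bag_of_words_cosine(doc: ToyDoc, other: ToyDoc) -> float:
    """Placeholder 'model': cosine similarity of word-count vectors."""
    a, b = Counter(doc.words), Counter(other.words)
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyPipeline:
    """Creates a ToyDoc and applies each component in self.pipeline to it, in order."""

    def __init__(self) -> None:
        self.pipeline: List[Callable[[ToyDoc], None]] = []

    def __call__(self, text: str) -> ToyDoc:
        doc = ToyDoc(text)
        for proc in self.pipeline:
            proc(doc)
        return doc


def install_similarity(doc: ToyDoc) -> None:
    doc.user_hooks['similarity'] = bag_of_words_cosine


toy_nlp = ToyPipeline()
toy_nlp.pipeline.append(install_similarity)
doc1 = toy_nlp('the fries were gross')
doc2 = toy_nlp('worst fries ever')
score = doc1.similarity(doc2)  # dispatched to bag_of_words_cosine
```

Because the hook is attached by a pipeline component rather than by subclassing, the model can be swapped without touching the document class or the calling code.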

Features

  • Non-destructive tokenization
  • Syntax-driven sentence segmentation
  • Pre-trained word vectors
  • Part-of-speech tagging
  • Named entity recognition
  • Labelled dependency parsing
  • Convenient string-to-int mapping
  • Export to numpy data arrays
  • GIL-free multi-threading
  • Efficient binary serialization
  • Easy deep learning integration
  • Statistical models for English and German
  • State-of-the-art speed
  • Robust, rigorously evaluated accuracy
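
Two of these features are easy to picture in miniature. Non-destructive tokenization means each token keeps its trailing whitespace, so the original text can be rebuilt exactly. String-to-int mapping means each distinct string is stored once and referred to by an integer. The toy below shows both ideas in plain Python; `tokenize` and `ToyStringTable` are hypothetical illustrations, not spaCy's tokenizer or its string store (spaCy keeps its own mapping in `nlp.vocab.strings`).

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into (token, trailing_whitespace) pairs without losing characters."""
    pairs: List[Tuple[str, str]] = []
    lead = re.match(r"\s*", text).group()
    if lead:
        # Leading whitespace becomes its own token so nothing is dropped.
        pairs.append((lead, ""))
    # A token is a run of word characters or a single punctuation character,
    # followed by whatever whitespace comes after it.
    for match in re.finditer(r"(\w+|[^\w\s])(\s*)", text):
        pairs.append((match.group(1), match.group(2)))
    return pairs


class ToyStringTable:
    """Two-way mapping between strings and small integer ids."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._strings: List[str] = []

    def add(self, s: str) -> int:
        """Return the id for s, assigning the next free id on first sight."""
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def __getitem__(self, i: int) -> str:
        return self._strings[i]
```

For `text = "Hello, world!  "`, joining `tok + ws` over `tokenize(text)` gives back `text` unchanged, and adding the four tokens to a fresh `ToyStringTable` yields the ids `[0, 1, 2, 3]`.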

spaCy is trusted by

Quora, Chartbeat, DueDil, Stitch Fix, WayBlazer, Indico, Chattermill, Turi, Kip, Socrata, Cytora, SignalN, Wonderflow, Synapsify

What's spaCy all about?

By 2014, I'd been publishing NLP research for about 10 years. During that time, I saw a huge gap open between the technology that Google-sized companies could take to market and what was available to everyone else. This was especially clear when companies started trying to use my research. Like most researchers' work, mine was free to read but expensive to apply. You could run my code, but its requirements were narrow. My code's mission in life was to print results tables for my papers. It was good at this job, and bad at all others.

spaCy's mission is to make cutting-edge NLP practical and commonly available. That's why I left academia in 2014 to build a production-quality open-source NLP library. It's why Ines joined the project in 2015, to build visualisations, demos and annotation tools that make NLP technologies less abstract and easier to use. Together, we founded Explosion AI to develop data packs you can drop into spaCy to extend its capabilities. If you're processing Hindi insurance claims, you need a model for that. We can build it for you.