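```python
# Install: pip install spacy && python -m spacy download en
# (spaCy 1.x API; later releases add components with nlp.add_pipe()
# and load models by full package name, e.g. en_core_web_sm)
import spacy

# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load('en')

# Process a document, of any size
with open('war_and_peace.txt') as f:
    text = f.read()
doc = nlp(text)

# Hook in your own deep learning models
# (load_my_neural_network() is a placeholder for your own model loader)
similarity_model = load_my_neural_network()

def install_similarity(doc):
    # Doc.similarity() will now delegate to similarity_model
    doc.user_hooks['similarity'] = similarity_model
    return doc

nlp.pipeline.append(install_similarity)

doc1 = nlp(u'the fries were gross')
doc2 = nlp(u'worst fries ever')
doc1.similarity(doc2)
```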
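The similarity hook only has to be a callable that takes two documents and returns a score. Assuming, as in spaCy's `Doc.similarity`, that the hook is invoked as `hook(doc1, doc2)`, a toy stand-in for `load_my_neural_network()` could be a bag-of-words Jaccard overlap. The function name `bag_of_words_similarity` is hypothetical, and this is a swapped-in lexical measure, not a neural model.

```python
def bag_of_words_similarity(doc1, doc2):
    """Toy similarity hook: Jaccard overlap of lowercased token strings.

    Hypothetical stand-in for a neural model. Accepts a spaCy Doc or any
    iterable of tokens/strings, assuming str(token) yields the token text.
    """
    words1 = {str(tok).lower() for tok in doc1}
    words2 = {str(tok).lower() for tok in doc2}
    union = words1 | words2
    if not union:
        return 0.0  # two empty documents: define similarity as 0
    return len(words1 & words2) / len(union)


# Usage with plain token lists (no model needed):
score = bag_of_words_similarity("the fries were gross".split(),
                                "worst fries ever".split())
print(score)  # 1 shared word out of 6 distinct words
```

To plug it in, you would assign `similarity_model = bag_of_words_similarity` in place of the placeholder loader.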
- Non-destructive tokenization
- Syntax-driven sentence segmentation
- Pre-trained word vectors
- Part-of-speech tagging
- Named entity recognition
- Labelled dependency parsing
- Convenient string-to-int mapping
- Export to numpy data arrays
- GIL-free multi-threading
- Efficient binary serialization
- Easy deep learning integration
- Statistical models for English and German
- State-of-the-art speed
- Robust, rigorously evaluated accuracy
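Two of the listed ideas, non-destructive tokenization and string-to-int mapping, can be illustrated in plain Python. This is a minimal sketch of the concepts, not spaCy's implementation: each token keeps its trailing whitespace so the input can be rebuilt exactly, and a lookup table interns each string once so tokens can be stored and compared as integers. The names `Token`, `tokenize` and `StringTable`, and the regex splitting rule, are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    whitespace: str  # trailing whitespace, kept so no character is lost


# A word, a single punctuation mark, or (only at the very start) a whitespace
# run, each followed by whatever whitespace trails it.
_TOKEN_RE = re.compile(r"(\w+|[^\w\s]|\s+)(\s*)")


def tokenize(text):
    """Split text into tokens without discarding any characters."""
    return [Token(m.group(1), m.group(2)) for m in _TOKEN_RE.finditer(text)]


class StringTable:
    """Bidirectional string <-> int mapping; each string gets one id."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, s):
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def __getitem__(self, key):
        # int -> string, string -> int
        return self._strings[key] if isinstance(key, int) else self._ids[key]


text = "Hello,  world! Hello again."
tokens = tokenize(text)
# Non-destructive: concatenating text + whitespace restores the input exactly.
assert "".join(t.text + t.whitespace for t in tokens) == text

strings = StringTable()
ids = [strings.add(t.text) for t in tokens]
print(ids)  # [0, 1, 2, 3, 0, 4, 5]: both "Hello" tokens share id 0
print(strings[2], strings["again"])  # world 4
```

Storing integer ids rather than strings is what makes compact array exports and cheap comparisons possible.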
What's spaCy all about?
By 2014, I'd been publishing NLP research for about 10 years. During that time, I saw a huge gap open between the technology that Google-sized companies could take to market and what was available to everyone else. This was especially clear when companies started trying to use my research. Like most researchers' work, mine was free to read but expensive to apply. You could run my code, but its requirements were narrow. My code's mission in life was to print results tables for my papers; it was good at this job, and bad at all others.
spaCy's mission is to make cutting-edge NLP practical and commonly available. That's why I left academia in 2014, to build a production-quality open-source NLP library. It's why Ines joined the project in 2015, to build visualisations, demos and annotation tools that make NLP technologies less abstract and easier to use. Together, we've founded Explosion AI, to develop data packs you can drop into spaCy to extend its capabilities. If you're processing Hindi insurance claims, you need a model for that. We can build it for you.