Loading a language processing pipeline
The standard entry point into spaCy is the spacy.load() function, which constructs a language processing pipeline. The standard variable name for the pipeline is nlp, for Natural Language Processing. The nlp variable is usually an instance of the class spacy.language.Language. For English, the spacy.en.English class is the default.

You'll use the nlp instance to produce Doc objects. You'll then use the Doc object to access linguistic annotations to help you with whatever text processing task you're trying to do.
import spacy                                # See "Installing spaCy"
nlp = spacy.load('en')                      # You are here.
doc = nlp(u'Hello, spacy!')                 # See "Using the pipeline"
print([(w.text, w.pos_) for w in doc])      # See "Doc, Span and Token"
The spacy.load() function takes the following positional arguments:

| Argument | Description |
| --- | --- |
| name | An ID that is resolved to a class or factory function by spacy.util.get_lang_class(). |
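To make the resolution step concrete, here is a toy sketch of the idea — not spaCy's actual code. The FACTORIES dict and load() function below are invented for illustration; they stand in for the registry that maps an ID string to the factory that builds the pipeline.

```python
# Toy illustration (NOT spaCy's implementation) of resolving an ID
# string to a factory function, then calling the factory.
FACTORIES = {
    'en': lambda: 'English pipeline',
    'de': lambda: 'German pipeline',
}

def load(name):
    # Look the ID up in the registry and invoke the matching factory.
    return FACTORIES[name]()

print(load('en'))  # 'English pipeline'
```

In real spaCy the factory constructs a Language subclass rather than a string, but the lookup-then-call shape is the same.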
All keyword arguments are passed forward to the pipeline factory, and no keyword arguments are required. The built-in factories (e.g. spacy.de.German), which are subclasses of Language, respond to the following keyword arguments:
| Argument | Description |
| --- | --- |
| path | Where to load the data from. If None, the default data path is fetched via spacy.util.get_data_path(). |
| pipeline | A sequence of functions that take the Doc object and modify it in-place. See Customizing the pipeline. |
| create_pipeline | Callback to construct the pipeline sequence. It should accept the nlp instance as its only argument, and return a sequence of callables. |
| make_doc | A function that takes the input and returns a document object. |
| create_make_doc | Callback to construct the make_doc function. It should accept the nlp instance as its only argument. |
| vocab | Supply a pre-built Vocab instance, instead of constructing one. |
| add_vectors | Callback that installs word vectors into the Vocab instance. It should accept the Vocab instance as its only argument. |
| tagger | Supply a pre-built tagger, instead of creating one. |
| parser | Supply a pre-built parser, instead of creating one. |
| entity | Supply a pre-built entity recognizer, instead of creating one. |
| matcher | Supply a pre-built matcher, instead of creating one. |
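The contract for functions passed via the pipeline keyword argument is simple: each one takes the Doc and modifies it in-place. The sketch below shows that contract without loading a real model — the FakeDoc class and mark_shouting component are invented stand-ins, not part of spaCy's API.

```python
# Sketch of the pipeline-component contract described above: a component
# receives the document and mutates it in place, returning nothing.
# FakeDoc is a minimal stand-in so the contract can run without a model.

class FakeDoc(object):
    def __init__(self, words):
        self.words = words
        self.is_shouting = False

def mark_shouting(doc):
    # A pipeline component: set a custom attribute in place.
    doc.is_shouting = all(w.isupper() for w in doc.words)

# With a real spaCy 1.x install this would be wired up as, e.g.:
#   nlp = spacy.load('en', pipeline=[mark_shouting])
doc = FakeDoc(['HELLO', 'WORLD'])
mark_shouting(doc)
print(doc.is_shouting)  # True
```

Because components run in sequence over the same Doc, each one can rely on the annotations set by the components before it.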