Customizing the pipeline

spaCy provides several linguistic annotation functions by default. Each function takes a Doc object, and modifies it in-place. The default pipeline is [nlp.tagger, nlp.entity, nlp.parser]. spaCy 1.0 introduced the ability to customise this pipeline with arbitrary functions.

def arbitrary_fixup_rules(doc):
    for token in doc:
        if token.text == u'bill' and token.tag_ == u'NNP':
            token.tag_ = u'NN'

def custom_pipeline(nlp):
    return (nlp.tagger, arbitrary_fixup_rules, nlp.parser, nlp.entity)

nlp = spacy.load('en', create_pipeline=custom_pipeline)

The easiest way to customise the pipeline is to pass a create_pipeline callback to the spacy.load() function.

The callback you pass to create_pipeline should take a single argument, and return a sequence of callables. Each callable in the sequence should accept a Doc object and modify it in place.

Instead of passing a callback, you can also write to the .pipeline attribute directly.

nlp = spacy.load('en')
nlp.pipeline = [nlp.tagger]