Use the latest Stanza (StanfordNLP) research models directly in spaCy

This package wraps the Stanza (formerly StanfordNLP) library, so you can use Stanford's models as a spaCy pipeline. Using this wrapper, you'll be able to use the following annotations, computed by your pretrained stanza model:

  • Statistical tokenization (reflected in the Doc and its tokens)
  • Lemmatization (token.lemma and token.lemma_)
  • Part-of-speech tagging (token.tag, token.tag_, token.pos, token.pos_)
  • Dependency parsing (token.dep, token.dep_, token.head)
  • Named entity recognition (doc.ents, token.ent_type, token.ent_type_, token.ent_iob, token.ent_iob_)
  • Sentence segmentation (doc.sents)


import stanza from spacy_stanza import StanzaLanguage snlp = stanza.Pipeline(lang="en") nlp = StanzaLanguage(snlp) doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.") for token in doc: print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_) print(doc.ents)
Author info



Categories pipeline standalone models research

Submit your project

If you have a project that you want the spaCy community to make use of, you can suggest it by submitting a pull request to the spaCy website repository. The Universe database is open-source and collected in a simple JSON file. For more details on the formats and available fields, see the documentation. Looking for inspiration your own spaCy plugin or extension? Check out the project idea label on the issue tracker.

Read the docsJSON source