Fuzzy matching and more for spaCy.

Spaczz provides fuzzy matching and multi-token regex matching functionality for spaCy. Spaczz's components have APIs similar to their spaCy counterparts, and spaczz pipeline components can be added to spaCy pipelines, where they can be saved and loaded as part of a model.
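To illustrate the idea behind fuzzy matching, here is a minimal sketch that slides a window the length of the pattern over the tokens and keeps the most similar window above a threshold. It is not spaczz's implementation: it swaps in the standard library's `difflib.SequenceMatcher` as the similarity measure, and `tokenize`, `fuzzy_find` and `min_ratio` are hypothetical names chosen for this example.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    # Hypothetical, very simple tokenizer: runs of word characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def fuzzy_find(tokens, pattern, min_ratio=0.8):
    """Return (start, end, ratio) of the most similar token window, or None if none passes min_ratio."""
    pattern_tokens = tokenize(pattern)
    n = len(pattern_tokens)
    target = " ".join(pattern_tokens).lower()
    best = None
    # Compare every window of n consecutive tokens against the pattern text.
    for start in range(len(tokens) - n + 1):
        window = " ".join(tokens[start:start + n]).lower()
        ratio = SequenceMatcher(None, target, window).ratio()
        if ratio >= min_ratio and (best is None or ratio > best[2]):
            best = (start, start + n, ratio)
    return best


tokens = tokenize("Oops, I spelled Bill Gatez wrong.")
match = fuzzy_find(tokens, "Bill Gates")
if match:
    start, end, ratio = match
    print(" ".join(tokens[start:end]), start, end, round(ratio, 2))
```

The `start`/`end` values are token indices with an exclusive end, the same convention as spaCy's `ent.start` and `ent.end`. A real matcher would also handle overlapping matches and windows whose length differs from the pattern's.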


import spacy
from spaczz.pipeline import SpaczzRuler

nlp = spacy.blank('en')
ruler = SpaczzRuler(nlp)
# A 'fuzzy' pattern also matches near-miss spellings of the pattern text.
ruler.add_patterns([{'label': 'PERSON', 'pattern': 'Bill Gates', 'type': 'fuzzy'}])
# spaCy v2-style call: the component object itself is passed to add_pipe.
nlp.add_pipe(ruler)

doc = nlp('Oops, I spelled Bill Gatez wrong.')
print([(ent.text, ent.start, ent.end, ent.label_) for ent in doc.ents])
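The example above reports entities as token spans (`ent.start`, `ent.end`). Multi-token regex matching raises the question of how a character-level regex match becomes such a span. The stdlib-only sketch below shows one possible approach, assumed for illustration rather than taken from spaczz: it runs the regex over the raw text and expands each match to the tokens it overlaps. `tokenize_with_offsets` and `regex_token_spans` are hypothetical helpers.

```python
import re


def tokenize_with_offsets(text):
    # Hypothetical simple tokenizer returning (token, start_char, end_char) triples.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def regex_token_spans(text, pattern, label):
    """Map each regex match to the smallest token span covering it (exclusive end index)."""
    tokens = tokenize_with_offsets(text)
    spans = []
    for m in re.finditer(pattern, text):
        # Indices of tokens overlapping the character range [m.start(), m.end()).
        covered = [i for i, (_, s, e) in enumerate(tokens) if s < m.end() and e > m.start()]
        if covered:
            start, end = covered[0], covered[-1] + 1
            # Report the text of the whole covered tokens, not just the raw match.
            span_text = text[tokens[start][1]:tokens[end - 1][2]]
            spans.append((span_text, start, end, label))
    return spans


text = "Call me at 555 123 4567 tomorrow."
print(regex_token_spans(text, r"\d{3} \d{3} \d{4}", "PHONE"))
```

Expanding partial-token matches to whole tokens is one design choice; discarding them or shrinking to fully covered tokens are others. Inside a real spaCy pipeline, `Doc.char_span` with its `alignment_mode` argument (spaCy v3) performs this character-to-token alignment.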
Author info

Grant Andersen


Categories: pipeline

Submit your project

If you have a project that you want the spaCy community to make use of, you can suggest it by submitting a pull request to the spaCy website repository. The Universe database is open-source and collected in a simple JSON file. For more details on the formats and available fields, see the documentation. Looking for inspiration for your own spaCy plugin or extension? Check out the "project idea" label on the issue tracker.
