SpanMarker
The SpanMarker integration with spaCy allows you to seamlessly replace the default spaCy "ner" pipeline component with any SpanMarker model available on the Hugging Face Hub. Through this, you can take advantage of the advanced Named Entity Recognition capabilities of SpanMarker within the familiar and powerful spaCy framework.
By default, the span_marker pipeline component uses a SpanMarker model based on RoBERTa-large and trained on OntoNotes v5.0. This model reaches a competitive 91.54 F1, notably higher than the 85.5 and 89.8 F1 from en_core_web_lg and en_core_web_trf, respectively. A short head-to-head between this SpanMarker model and the trf spaCy model has been posted here.
Additionally, see here for documentation on using SpanMarker with spaCy.
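To put the F1 scores quoted above into perspective, the short sketch below computes the absolute gain in F1 points and the relative error reduction of the default SpanMarker model over the two spaCy models. The numbers are taken from the text; the helper names (absolute_gain, relative_error_reduction) are illustrative and not part of spaCy or SpanMarker, and treating 100 - F1 as the "error" is an assumption made only for this comparison.

```python
# F1 scores (in percent) as quoted in the text above.
F1_SCORES = {
    "span_marker": 91.54,
    "en_core_web_lg": 85.5,
    "en_core_web_trf": 89.8,
}


def absolute_gain(f1_new: float, f1_old: float) -> float:
    """Difference in F1 points between two models."""
    return f1_new - f1_old


def relative_error_reduction(f1_new: float, f1_old: float) -> float:
    """Fraction of the old model's F1 error (100 - F1) removed by the new model."""
    return (f1_new - f1_old) / (100.0 - f1_old)


span_marker_f1 = F1_SCORES["span_marker"]
for name in ("en_core_web_lg", "en_core_web_trf"):
    baseline = F1_SCORES[name]
    gain = absolute_gain(span_marker_f1, baseline)
    reduction = relative_error_reduction(span_marker_f1, baseline)
    print(f"vs {name}: +{gain:.2f} F1 points, {reduction:.1%} relative error reduction")
```

Under these assumptions the gap works out to roughly 6.04 F1 points (about 42% relative) over en_core_web_lg and 1.74 F1 points (about 17% relative) over en_core_web_trf.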
Example
import spacy

nlp = spacy.load("en_core_web_sm", exclude=["ner"])
nlp.add_pipe("span_marker", config={"model": "tomaarsen/span-marker-roberta-large-ontonotes5"})

text = """Cleopatra VII, also known as Cleopatra the Great, was the last active ruler of the \
Ptolemaic Kingdom of Egypt. She was born in 69 BCE and ruled Egypt from 51 BCE until her \
death in 30 BCE."""
doc = nlp(text)
print([(entity, entity.label_) for entity in doc.ents])
# [(Cleopatra VII, "PERSON"), (Cleopatra the Great, "PERSON"), (the Ptolemaic Kingdom of Egypt, "GPE"),
# (69 BCE, "DATE"), (Egypt, "GPE"), (51 BCE, "DATE"), (30 BCE, "DATE")]
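Once entities have been extracted, a common next step is to group them by label. The sketch below does this in plain Python on the (text, label) pairs shown in the example output, so it runs without spaCy or SpanMarker installed. With a real pipeline, the pairs could be built as [(ent.text, ent.label_) for ent in doc.ents]; the helper name group_by_label is hypothetical and not part of either library.

```python
from collections import defaultdict

# (text, label) pairs copied from the example output above. With a real
# pipeline these would come from [(ent.text, ent.label_) for ent in doc.ents].
entities = [
    ("Cleopatra VII", "PERSON"),
    ("Cleopatra the Great", "PERSON"),
    ("the Ptolemaic Kingdom of Egypt", "GPE"),
    ("69 BCE", "DATE"),
    ("Egypt", "GPE"),
    ("51 BCE", "DATE"),
    ("30 BCE", "DATE"),
]


def group_by_label(pairs):
    """Group entity texts by label, keeping the order of first appearance."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


by_label = group_by_label(entities)
for label, texts in by_label.items():
    print(f"{label}: {texts}")
```

Grouping this way keeps the entity order within each label, which makes it easy to see, for instance, all DATE mentions in reading order.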
GitHub: tomaarsen/SpanMarkerNER
Categories: pipeline, standalone, scientific