
SpanMarker

Effortless state-of-the-art NER in spaCy


The SpanMarker integration with spaCy allows you to seamlessly replace the default spaCy "ner" pipeline component with any SpanMarker model available on the Hugging Face Hub. Through this, you can take advantage of the advanced Named Entity Recognition capabilities of SpanMarker within the familiar and powerful spaCy framework.

By default, the span_marker pipeline component uses a SpanMarker model based on RoBERTa-large and trained on OntoNotes v5.0. This model reaches a competitive 91.54 F1, notably higher than the 85.5 and 89.8 F1 of en_core_web_lg and en_core_web_trf, respectively. A short head-to-head comparison between this SpanMarker model and the en_core_web_trf spaCy model has been posted here.
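To make the size of these gaps concrete, the snippet below works through the arithmetic on the three F1 scores quoted above. It is a minimal sketch that assumes the scores are directly comparable; the helper name `compare` and the "relative reduction of (100 - F1)" measure are illustrative choices, not metrics reported by SpanMarker or spaCy.

```python
# F1 scores quoted above; this comparison is an illustrative sketch only.
scores = {
    "span-marker-roberta-large-ontonotes5": 91.54,
    "en_core_web_lg": 85.5,
    "en_core_web_trf": 89.8,
}


def compare(ours: float, baseline: float) -> tuple[float, float]:
    """Return (absolute F1 gain in points, relative reduction of the 100 - F1 gap).

    Hypothetical helper: assumes both scores were measured on the same benchmark.
    """
    gain = ours - baseline
    reduction = 1 - (100 - ours) / (100 - baseline)
    return round(gain, 2), round(reduction, 3)


ours = scores["span-marker-roberta-large-ontonotes5"]
for name in ("en_core_web_lg", "en_core_web_trf"):
    gain, reduction = compare(ours, scores[name])
    print(f"vs {name}: +{gain} F1 points, {reduction:.1%} smaller (100 - F1) gap")
# vs en_core_web_lg: +6.04 F1 points, 41.7% smaller (100 - F1) gap
# vs en_core_web_trf: +1.74 F1 points, 17.1% smaller (100 - F1) gap
```

Looking at the remaining gap to a perfect score, rather than only the raw difference, shows why a 1.74-point gain over en_core_web_trf is still meaningful at this level of accuracy.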

Additionally, see here for documentation on using SpanMarker with spaCy.

Example

import spacy

# Load the small English pipeline without its default "ner" component
nlp = spacy.load("en_core_web_sm", exclude=["ner"])
# Add SpanMarker as the NER component, using a model from the Hugging Face Hub
nlp.add_pipe("span_marker", config={"model": "tomaarsen/span-marker-roberta-large-ontonotes5"})

text = """Cleopatra VII, also known as Cleopatra the Great, was the last active ruler of the \
Ptolemaic Kingdom of Egypt. She was born in 69 BCE and ruled Egypt from 51 BCE until her \
death in 30 BCE."""
doc = nlp(text)
print([(entity, entity.label_) for entity in doc.ents])
# [(Cleopatra VII, 'PERSON'), (Cleopatra the Great, 'PERSON'), (the Ptolemaic Kingdom of Egypt, 'GPE'),
#  (69 BCE, 'DATE'), (Egypt, 'GPE'), (51 BCE, 'DATE'), (30 BCE, 'DATE')]
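Downstream code often consumes `doc.ents` as (text, label) pairs. The sketch below groups the pairs from the example's expected output by label using only the standard library, so it runs without downloading a model. The hard-coded list stands in for `[(ent.text, ent.label_) for ent in doc.ents]`, and `group_by_label` is a hypothetical helper, not part of spaCy or SpanMarker.

```python
from collections import defaultdict

# (text, label) pairs copied from the expected output of the example above;
# with a real pipeline they would come from [(ent.text, ent.label_) for ent in doc.ents].
entities = [
    ("Cleopatra VII", "PERSON"),
    ("Cleopatra the Great", "PERSON"),
    ("the Ptolemaic Kingdom of Egypt", "GPE"),
    ("69 BCE", "DATE"),
    ("Egypt", "GPE"),
    ("51 BCE", "DATE"),
    ("30 BCE", "DATE"),
]


def group_by_label(pairs):
    """Hypothetical helper: map each label to its entity texts, in document order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


for label, texts in group_by_label(entities).items():
    print(f"{label}: {', '.join(texts)}")
# PERSON: Cleopatra VII, Cleopatra the Great
# GPE: the Ptolemaic Kingdom of Egypt, Egypt
# DATE: 69 BCE, 51 BCE, 30 BCE
```

Because Python dictionaries preserve insertion order, labels are listed in the order they first appear in the text.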
Author info

Tom Aarsen

GitHub: tomaarsen/SpanMarkerNER

Categories: pipeline, standalone, scientific

Found a mistake or something isn't working?

If you've come across a Universe project that isn't working or is incompatible with the reported spaCy version, let us know by opening a discussion thread.


Submit your project

If you have a project that you want the spaCy community to make use of, you can suggest it by submitting a pull request to the spaCy website repository. The Universe database is open-source and collected in a simple JSON file. For more details on the formats and available fields, see the documentation. Looking for inspiration for your own spaCy plugin or extension? Check out the project idea section in Discussions.
