RONEC - Romanian Named Entity Corpus

Named Entity Recognition corpus for the Romanian language.

The corpus holds 5127 sentences, annotated with 16 classes, with a total of 26376 annotated entities. The corpus comes in two formats: BRAT and CONLLUP.


# to train a new model on ronec
python3 convert_spacy.py ronec/conllup/ronec.conllup output
python3 -m spacy train ro models output/train_ronec.json output/train_ronec.json -p ent

# download the Romanian NER model
python -m spacy download ro_ner

# load the model and print entities for a simple sentence
import spacy

nlp = spacy.load("ro_ner")
doc = nlp("Popescu Ion a fost la Cluj")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
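
Once entities are extracted, it is often convenient to group them by class. The following is a minimal sketch, not part of RONEC itself: the helper name group_entities_by_label and the stand-in FakeEnt records are hypothetical, and the labels shown are illustrative rather than actual model output. The helper only relies on each entity having .text and .label_ attributes, so it should also accept doc.ents from a loaded spaCy model.

```python
from collections import defaultdict
from typing import NamedTuple


class FakeEnt(NamedTuple):
    """Hypothetical stand-in for a spaCy entity span (text + label only)."""
    text: str
    label_: str


def group_entities_by_label(ents):
    """Map each entity label to the list of entity texts carrying it."""
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Illustrative data only; with the real model, pass doc.ents instead.
sample = [FakeEnt("Popescu Ion", "PERSON"), FakeEnt("Cluj", "GPE")]
grouped = group_entities_by_label(sample)
print(grouped)
```

With the model loaded as in the snippet above, the same call would be group_entities_by_label(doc.ents).
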
Author info

Stefan Daniel Dumitrescu, Andrei-Marius Avram


Categories: standalone, models

