
A spaCy Package for Romanian Legal Document Processing
This is a spaCy language model for the Romanian legal domain, trained with floret embeddings (character 4-grams and 5-grams) and
LEGAL entity recognition. It is useful for processing noisy legal documents produced by OCR.
Example
import spacy

nlp = spacy.load("ro_legal_fl")
doc = nlp("Titlul III din LEGEA nr. 255 din 19 iulie 2013, publicată în MONITORUL OFICIAL")

# legal entity identification
for entity in doc.ents:
    print('entity: ', entity, '; entity type: ', entity.label_)

# floret n-gram embeddings robust to typos
print(nlp('achizit1e public@').similarity(nlp('achiziții publice')))
# 0.7393895566928835
print(nlp('achizitii publice').similarity(nlp('achiziții publice')))
# 0.8996480808279399
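The similarity scores above stay high despite typos because floret builds word vectors from character n-grams, so a misspelled word still shares most of its subword pieces with the correct form. The sketch below is only an illustration of that intuition, not the model's actual computation: it compares raw sets of 4-gram and 5-gram character pieces with a Jaccard overlap, whereas floret hashes the n-grams into a table of learned vectors. The helper names `char_ngrams` and `ngram_overlap` are hypothetical, and the `<`/`>` word-boundary markers are assumed to follow the fastText convention.

```python
def char_ngrams(text, n_min=4, n_max=5):
    """Collect the character n-grams of every whitespace-separated word.

    Each word is wrapped in '<' and '>' boundary markers (an assumption,
    following the fastText convention) before n-grams are extracted.
    """
    grams = set()
    for word in text.split():
        padded = f"<{word}>"
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                grams.add(padded[i:i + n])
    return grams


def ngram_overlap(a, b):
    """Jaccard overlap of the two texts' n-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


clean = "achiziții publice"
missing_diacritics = ngram_overlap("achizitii publice", clean)
ocr_noise = ngram_overlap("achizit1e public@", clean)
unrelated = ngram_overlap("monitorul oficial", clean)

print("missing diacritics:", missing_diacritics)
print("OCR noise:", ocr_noise)
print("unrelated phrase:", unrelated)
```

The ordering mirrors the model's scores: the variant that only lacks diacritics overlaps most with the clean phrase, the OCR-damaged variant overlaps less, and an unrelated phrase shares no n-grams at all.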
GitHub: senisioi/rolegal