A spaCy Package for Romanian Legal Document Processing

rolegal: a spaCy Package for Noisy Romanian Legal Document Processing

This is a spaCy language model for the Romanian legal domain, trained with floret 4-gram to 5-gram embeddings and LEGAL entity recognition. It is useful for processing noisy legal documents produced by OCR.

Example

import spacy

nlp = spacy.load("ro_legal_fl")
doc = nlp("Titlul III din LEGEA nr. 255 din 19 iulie 2013, publicată în MONITORUL OFICIAL")

# legal entity identification
for entity in doc.ents:
    print('entity: ', entity, '; entity type: ', entity.label_)

# floret n-gram embeddings robust to typos
print(nlp('achizit1e public@').similarity(nlp('achiziții publice')))
# 0.7393895566928835
print(nlp('achizitii publice').similarity(nlp('achiziții publice')))
# 0.8996480808279399

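The similarity scores in the example stay high for misspelled input because subword embeddings are built from character n-grams, and a typo changes only some of them. The sketch below illustrates that intuition with plain Python. It is not floret's implementation: the helper names `char_ngrams` and `ngram_overlap`, the `<` and `>` word-boundary markers, and the use of Jaccard overlap are all assumptions chosen for illustration, and the numbers it prints are not comparable to the similarity scores above.

```python
def char_ngrams(word, n_min=4, n_max=5):
    """Return the set of character n-grams of a word (hypothetical helper).

    The word is wrapped in '<' and '>' boundary markers, an assumption
    borrowed from common subword-embedding descriptions.
    """
    marked = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams


def ngram_overlap(text_a, text_b, n_min=4, n_max=5):
    """Jaccard overlap between the n-gram sets of two whitespace-split texts."""
    grams_a = set()
    for token in text_a.split():
        grams_a |= char_ngrams(token, n_min, n_max)
    grams_b = set()
    for token in text_b.split():
        grams_b |= char_ngrams(token, n_min, n_max)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


clean = "achiziții publice"
# OCR-style noise: fewer shared 4- and 5-grams, but still some
print(ngram_overlap("achizit1e public@", clean))
# missing diacritics only: more shared n-grams
print(ngram_overlap("achizitii publice", clean))
```

A whole-word lookup would treat `achizit1e` as an unknown token with nothing in common with `achiziții`, whereas the n-gram sets still share the prefix fragments, which is the property that makes subword vectors useful on OCR output.
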
Author info

Sergiu Nisioi

GitHub: senisioi/rolegal

Categories: pipeline, training, models
