presque
A normalizer for French focused on online and informal communication: peùUUUt-èTRE becomes peut-être, and voilaaaa becomes voilà. It also harmonizes inclusive language (the user can choose how): by default, auteur-rice-s-x et relecteur.xrices becomes auteur·ricexs et relecteur·ricexs.
Example
import spacy
import presque

@spacy.Language.factory('presque_normalizer')
def create_presque_normalizer(nlp, name='presque_normalizer'):
    return presque.Normalizer(nlp=nlp)

nlp = spacy.load('fr_core_news_lg')
nlp.add_pipe('presque_normalizer', first=True)
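To make the kind of normalization described above more concrete, here is a toy, standard-library-only sketch. It is not presque's implementation and does not use its API: the names `LEXICON`, `canonical_key`, and `toy_normalize` are hypothetical, and the approach (matching words by a lowercased, accent-stripped, repetition-collapsed key) is only an assumption about how such a normalizer could work.

```python
import re
import unicodedata

# Hypothetical mini-lexicon; a real normalizer would rely on a full word list.
LEXICON = ["peut-être", "voilà", "presque"]


def strip_accents(text: str) -> str:
    """Remove combining diacritics, e.g. 'é' -> 'e' and 'ù' -> 'u'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def canonical_key(word: str) -> str:
    """Lowercase, strip accents, and collapse runs of a repeated character."""
    key = strip_accents(word.lower())
    return re.sub(r"(.)\1+", r"\1", key)


# Map each canonical key to its standard spelling.
KEY_TO_WORD = {canonical_key(w): w for w in LEXICON}


def toy_normalize(word: str) -> str:
    """Return the lexicon spelling if the key matches, else the lowercased word."""
    return KEY_TO_WORD.get(canonical_key(word), word.lower())


for noisy in ["peùUUUt-èTRE", "voilaaaa", "Bonjour"]:
    print(noisy, "->", toy_normalize(noisy))
```

Because the same key function is applied to both the lexicon and the input, stray accents, shouting capitals, and stretched letters all fall away before the lookup. The sketch ignores ambiguity (two distinct words can share a key) and does not touch inclusive-language forms, both of which a real normalizer has to handle.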
GitHub: thjbdvlt/presque
Categories: pipeline