A Danish lemmatizer

Lemmy is a lemmatizer for Danish 🇩🇰. It comes pre-trained on Dansk Sprognævn's (DSN) word list (‘fuldformliste’) and the Danish Universal Dependencies, and is ready to use. Lemmy also supports training on your own dataset. The model currently included in Lemmy was evaluated on the Danish Universal Dependencies dev dataset and scored an accuracy above 99%.
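As a rough illustration of what such an accuracy figure measures, the sketch below assumes accuracy is the share of tokens whose predicted lemma exactly matches the gold lemma. The source does not spell out the evaluation procedure, so this is an assumption; the `lemma_accuracy` helper and the tiny word lists are hypothetical and are not the actual evaluation data.

```python
def lemma_accuracy(predicted, gold):
    """Fraction of positions where the predicted lemma equals the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical toy data, for illustration only (not real evaluation results).
gold_lemmas = ["bil", "løbe", "hus", "akvarie"]
predicted_lemmas = ["bil", "løbe", "hus", "akvarier"]  # last prediction is wrong

score = lemma_accuracy(predicted_lemmas, gold_lemmas)
print(f"accuracy: {score:.2f}")
```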

You can use Lemmy as a spaCy extension, more specifically as a spaCy pipeline component. This is highly recommended, since it makes the lemmas easily accessible from the spaCy tokens. Lemmy uses POS tags to predict the lemmas, so when it is wired into the spaCy pipeline it benefits from spaCy’s built-in POS tagger.


    import da_custom_model as da  # name of your spaCy model
    import lemmy.pipe

    nlp = da.load()

    # create an instance of Lemmy's pipeline component for spaCy
    pipe = lemmy.pipe.load()

    # add the component to the spaCy pipeline, after the POS tagger
    nlp.add_pipe(pipe, after='tagger')

    # lemmas can now be accessed using the `._.lemma` attribute on the tokens
    nlp("akvariernes")[0]._.lemma

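Because Lemmy relies on POS tags, the same surface form can yield different lemmas depending on its tag. The toy sketch below illustrates the idea with a hand-written lookup table. The table and the `toy_lemmatize` helper are hypothetical and stand in for, rather than reproduce, Lemmy's trained model.

```python
# Toy illustration only: a hand-written lookup table, not Lemmy's trained model.
TOY_LEMMAS = {
    ("NOUN", "løber"): "løber",  # "en løber" (a runner)
    ("VERB", "løber"): "løbe",   # present tense of "løbe" (to run)
    ("NOUN", "biler"): "bil",    # plural of "bil" (car)
}


def toy_lemmatize(pos, form):
    """Return the lemma for (pos, form), falling back to the form itself."""
    return TOY_LEMMAS.get((pos, form.lower()), form)


noun_lemma = toy_lemmatize("NOUN", "løber")
verb_lemma = toy_lemmatize("VERB", "løber")
print(noun_lemma, verb_lemma)
```

Without the POS tag, the form "løber" alone would not tell the lemmatizer which of the two lemmas to pick, which is why running the component after a tagger is useful.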
Author info

Søren Lind Kristiansen


Categories pipeline
