spaCy .NET Wrapper

SpacyDotNet is a .NET Core compatible wrapper for spaCy, based on Python.NET.

This project relies on Python.NET to interoperate with spaCy. It is not meant to be a complete and exhaustive implementation of all spaCy features and APIs. It should be enough for basic tasks, but treat it as a starting point if you need to build a complex project using spaCy in .NET.

Most of the basic features covered in spaCy 101 are available. All container classes (Doc, Token, Span and Lexeme) are present with their basic properties and methods working, and Vocab and StringStore are available in a limited form. Any developer should be able to add missing properties or classes in a straightforward manner.


    // Load a small English pipeline and print linguistic annotations per token.
    var spacy = new Spacy();
    var nlp = spacy.Load("en_core_web_sm");
    var doc = nlp.GetDocument("Apple is looking at buying U.K. startup for $1 billion");

    foreach (Token token in doc.Tokens)
        Console.WriteLine($"{token.Text} {token.Lemma} {token.PoS} {token.Tag} {token.Dep} {token.Shape} {token.IsAlpha} {token.IsStop}");

    // Print the named entities found in the same document.
    Console.WriteLine("");
    foreach (Span ent in doc.Ents)
        Console.WriteLine($"{ent.Text} {ent.StartChar} {ent.EndChar} {ent.Label}");

    // Switch to a pipeline with word vectors and inspect vector information.
    nlp = spacy.Load("en_core_web_md");
    var tokens = nlp.GetDocument("dog cat banana afskfsd");

    Console.WriteLine("");
    foreach (Token token in tokens.Tokens)
        Console.WriteLine($"{token.Text} {token.HasVector} {token.VectorNorm}, {token.IsOov}");

    // Compare every token with every other token.
    tokens = nlp.GetDocument("dog cat banana");
    Console.WriteLine("");
    foreach (Token token1 in tokens.Tokens)
    {
        foreach (Token token2 in tokens.Tokens)
            Console.WriteLine($"{token1.Text} {token2.Text} {token1.Similarity(token2)}");
    }

    // Look up a string in the StringStore by text and by hash.
    doc = nlp.GetDocument("I love coffee");
    Console.WriteLine("");
    Console.WriteLine(doc.Vocab.Strings["coffee"]);
    Console.WriteLine(doc.Vocab.Strings[3197928453018144401]);

    // Print lexeme attributes for each word in the document.
    Console.WriteLine("");
    foreach (Token word in doc.Tokens)
    {
        var lexeme = doc.Vocab[word.Text];
        Console.WriteLine($@"{lexeme.Text} {lexeme.Orth} {lexeme.Shape} {lexeme.Prefix} {lexeme.Suffix} {lexeme.IsAlpha} {lexeme.IsDigit} {lexeme.IsTitle} {lexeme.Lang}");
    }

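The description above says that missing properties can be added in a straightforward manner. The sketch below shows roughly what that could look like with Python.NET. It is an illustration under assumptions, not SpacyDotNet's actual code: the class name `TokenSketch`, its constructor and the `LikeNum` property are hypothetical, and it assumes the Python engine has already been initialized and that the wrapper holds the underlying spaCy token as a `PyObject`. The only spaCy detail relied on is the `like_num` attribute of `Token`.

```csharp
using Python.Runtime;

// Hypothetical wrapper class for illustration only;
// this is not SpacyDotNet's actual Token type.
public sealed class TokenSketch
{
    // Handle to the underlying spaCy Token object (assumed to be supplied by the caller).
    private readonly PyObject _pyToken;

    public TokenSketch(PyObject pyToken)
    {
        _pyToken = pyToken;
    }

    // Forwards to spaCy's Token.like_num attribute and converts it to a .NET bool.
    public bool LikeNum
    {
        get
        {
            // Python objects may only be accessed while holding the GIL.
            using (Py.GIL())
            using (PyObject value = _pyToken.GetAttr("like_num"))
            {
                return value.As<bool>();
            }
        }
    }
}
```

How the `PyObject` reaches the wrapper depends on SpacyDotNet's internals and is not shown here. The same pattern (acquire the GIL, read the Python attribute, convert to a .NET type) would apply to other simple attributes.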
Author info

Antonio Miras


Categories nonpython

