pySBD is a 'real-world' sentence segmenter that extracts reasonable sentences when the format and domain of the input text are unknown. It is a rule-based algorithm built on The Golden Rules, a set of tests developed by the TM-Town dev team to check a segmenter's accuracy on edge cases. pySBD is a Python port of the Ruby gem Pragmatic Segmenter.
import spacy
from pysbd.util import PySBDFactory

# Passing a component object to add_pipe is the spaCy v2 style.
nlp = spacy.blank('en')
nlp.add_pipe(PySBDFactory(nlp))

doc = nlp('My name is Jonas E. Smith. Please turn to p. 55.')
print(list(doc.sents))
# [My name is Jonas E. Smith., Please turn to p. 55.]
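To give a feel for why a rule-based approach handles cases like "Jonas E. Smith" and "p. 55", here is a minimal standard-library sketch. It is not pySBD's implementation: the `toy_segment` function, the `ABBREVIATIONS` set, and the `BOUNDARY` pattern are all hypothetical names invented for illustration, and the two rules shown are only a tiny stand-in for the much larger rule set that The Golden Rules exercise.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (illustration only).
ABBREVIATIONS = {"p", "pp", "mr", "mrs", "dr", "etc"}

# Candidate boundary: sentence-ending punctuation followed by whitespace.
BOUNDARY = re.compile(r"[.!?]\s+")


def toy_segment(text):
    """Split text into sentences, ignoring periods that follow a known
    abbreviation or a single-letter uppercase initial."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        if text[match.start()] == ".":
            # Look at the word immediately before the period.
            words = text[start:match.start()].split()
            last = words[-1] if words else ""
            is_initial = len(last) == 1 and last.isupper()
            if is_initial or last.lower() in ABBREVIATIONS:
                continue  # not a sentence boundary under these toy rules
        # Keep the punctuation mark, drop the trailing whitespace.
        sentences.append(text[start:match.start() + 1])
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(toy_segment("My name is Jonas E. Smith. Please turn to p. 55."))
# ['My name is Jonas E. Smith.', 'Please turn to p. 55.']
```

A naive split on every period would break this input after "E." and "p."; the two rules above avoid that, at the cost of mistakes elsewhere (for example, a sentence that genuinely ends in a single capital letter), which is why a real segmenter needs many more rules.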
Submit your project
If you have a project that you want the spaCy community to make use of, you can suggest it by submitting a pull request to the spaCy website repository. The Universe database is open-source and collected in a simple JSON file. For more details on the formats and available fields, see the documentation. Looking for inspiration for your own spaCy plugin or extension? Check out the "project idea" label on the issue tracker.