pySBD - Python Sentence Boundary Disambiguation

Rule-based sentence boundary detection that works out-of-the-box

pySBD is a 'real-world' sentence segmenter that extracts reasonable sentences when the format and domain of the input text are unknown. It is a rule-based algorithm built on The Golden Rules, a set of tests developed by the TM-Town dev team to check a segmenter's accuracy on edge-case scenarios. pySBD is a Python port of the Ruby gem Pragmatic Segmenter.
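
To give a feel for what "rule-based" means here, the following is a minimal sketch of the general idea, not pySBD's actual implementation. The abbreviation list, the single-letter-initial heuristic, and the function name split_sentences are all hypothetical and chosen only for illustration; the real rule set is far more extensive.

```python
import re

# Hypothetical, tiny abbreviation list for illustration only.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "p", "pp", "vs", "etc"}


def split_sentences(text):
    """Split on ., ! or ? followed by whitespace, skipping periods that
    end a listed abbreviation or a single uppercase initial."""
    sentences = []
    start = 0
    # Candidate boundaries: terminal punctuation followed by whitespace.
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.start() + 1]
        words = candidate[:-1].split()
        last_word = words[-1] if words else ""
        if text[match.start()] == ".":
            # Crude heuristic: a lone capital letter is treated as an initial.
            is_initial = len(last_word) == 1 and last_word.isupper()
            if is_initial or last_word.lower() in ABBREVIATIONS:
                continue  # not a sentence end, keep scanning
        sentences.append(candidate.strip())
        start = match.end()
    # Whatever remains after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("My name is Jonas E. Smith. Please turn to p. 55."))
# ['My name is Jonas E. Smith.', 'Please turn to p. 55.']
```

A naive split on every period would break this input after "E." and "p."; the two exception rules are what keep the sentences intact. Real text needs many more such rules, which is the gap a Golden Rules style test suite is meant to expose.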


import spacy
from pysbd.utils import PySBDFactory

# Blank English pipeline with pySBD as the sentence segmenter
nlp = spacy.blank('en')
nlp.add_pipe(PySBDFactory(nlp))

doc = nlp('My name is Jonas E. Smith. Please turn to p. 55.')
print(list(doc.sents))
# [My name is Jonas E. Smith., Please turn to p. 55.]

Author info

Nipun Sadvilkar


Categories: scientific

Submit your project

If you have a project that you want the spaCy community to make use of, you can suggest it by submitting a pull request to the spaCy website repository. The Universe database is open-source and collected in a simple JSON file. For more details on the formats and available fields, see the documentation. Looking for inspiration for your own spaCy plugin or extension? Check out the project idea label on the issue tracker.
