Pipeline

Sentencizer

class

A simple pipeline component that allows custom sentence boundary detection logic without requiring the dependency parse. By default, sentence segmentation is performed by the DependencyParser; the Sentencizer lets you implement a simpler, rule-based strategy that doesn’t require a statistical model to be loaded. The component is also available via the string name "sentencizer". After initialization, it is typically added to the processing pipeline using nlp.add_pipe.

Sentencizer.__init__ method

Initialize the sentencizer.

Name | Type | Description
punct_chars | list | Optional custom list of punctuation characters that mark sentence ends. Defaults to [".", "!", "?"].

Sentencizer.__call__ method

Apply the sentencizer on a Doc. Typically, this happens automatically after the component has been added to the pipeline using nlp.add_pipe.

Name | Type | Description
doc | Doc | The Doc object to process, e.g. the Doc in the pipeline.
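
The rule-based strategy described above can be illustrated with a minimal, library-independent sketch. It is not the Sentencizer's actual implementation: the function name mark_sentence_starts and the plain list of token strings are assumptions made for illustration. The idea shown is that a token opens a new sentence when it is the first non-punctuation token after one of the configured punctuation characters.

```python
# Illustrative sketch only; names here are hypothetical, not part of the library.
DEFAULT_PUNCT_CHARS = [".", "!", "?"]


def mark_sentence_starts(tokens, punct_chars=None):
    """Return one boolean per token: True where a new sentence begins."""
    punct = set(DEFAULT_PUNCT_CHARS if punct_chars is None else punct_chars)
    starts = []
    seen_end = True  # the first non-punctuation token always opens a sentence
    for token in tokens:
        is_punct = token in punct
        if seen_end and not is_punct:
            starts.append(True)
            seen_end = False
        else:
            starts.append(False)
        if is_punct:
            # Consecutive punctuation marks ("?!") stay in the current sentence.
            seen_end = True
    return starts


default_starts = mark_sentence_starts(["Hello", "world", ".", "How", "are", "you", "?", "!"])
custom_starts = mark_sentence_starts(["One", ";", "two"], punct_chars=[";"])
```

Passing a custom list replaces the defaults entirely, which mirrors how the punct_chars argument is described for Sentencizer.__init__.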

Sentencizer.to_disk method

Save the sentencizer settings (punctuation characters) to a directory. Will create a file sentencizer.json. This also happens automatically when you save an nlp object with a sentencizer added to its pipeline.

Name | Type | Description
path | unicode / Path | A path to a file, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects.

Sentencizer.from_disk method

Load the sentencizer settings from a file. Expects a JSON file. This also happens automatically when you load an nlp object or model with a sentencizer added to its pipeline.

Name | Type | Description
path | unicode / Path | A path to a JSON file. Paths may be either strings or Path-like objects.
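
As a rough picture of what saving and loading the settings involves, the following sketch writes a list of punctuation characters to a JSON file and reads it back using only the standard library. The helper names save_settings and load_settings and the "punct_chars" key are assumptions for illustration, not the library's documented file layout.

```python
import json
import tempfile
from pathlib import Path


def save_settings(path, punct_chars):
    """Write the punctuation characters to a JSON file, creating it if needed."""
    path = Path(path)  # accepts both strings and Path-like objects
    with path.open("w", encoding="utf8") as file_:
        # The "punct_chars" key is an assumed layout for this sketch.
        json.dump({"punct_chars": list(punct_chars)}, file_)


def load_settings(path):
    """Read the punctuation characters back from a JSON file."""
    path = Path(path)
    with path.open("r", encoding="utf8") as file_:
        return json.load(file_)["punct_chars"]


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "sentencizer.json"
    save_settings(str(target), [".", "!", "?", ";"])  # string path
    restored = load_settings(target)  # Path object
```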

Sentencizer.to_bytes method

Serialize the sentencizer settings to a bytestring.

Name | Type | Description
RETURNS | bytes | The serialized settings.

Sentencizer.from_bytes method

Load the sentencizer settings from a bytestring. Modifies the object in place and returns it.

Name | Type | Description
bytes_data | bytes | The bytestring to load.
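
The in-place contract of from_bytes (modify the object, then return it) can be sketched with a small stand-in class. PunctSettings is hypothetical, and UTF-8 encoded JSON is used here only as an assumed byte format; the source does not state which encoding the library itself uses.

```python
import json


class PunctSettings:
    """Hypothetical stand-in for a component that stores punctuation settings."""

    def __init__(self, punct_chars=None):
        self.punct_chars = [".", "!", "?"] if punct_chars is None else list(punct_chars)

    def to_bytes(self):
        """Serialize the settings to a bytestring (JSON is an assumed format)."""
        return json.dumps({"punct_chars": self.punct_chars}).encode("utf8")

    def from_bytes(self, bytes_data):
        """Load settings from a bytestring, modifying this object in place."""
        self.punct_chars = json.loads(bytes_data.decode("utf8"))["punct_chars"]
        return self  # returning self allows chained calls


source = PunctSettings(punct_chars=[".", ";"])
data = source.to_bytes()

target = PunctSettings()
returned = target.from_bytes(data)
```

Because the method returns the same object it modified, a call such as PunctSettings().from_bytes(data) yields a ready-to-use instance in one expression.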