Sentencizer class
A simple pipeline component that allows custom sentence boundary detection logic without requiring the dependency parse. By default, sentence segmentation is performed by the DependencyParser, so the Sentencizer lets you implement a simpler, rule-based strategy that doesn’t require a statistical model to be loaded. The component is also available via the string name "sentencizer". After initialization, it is typically added to the processing pipeline using nlp.add_pipe.
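To show what a punctuation-only strategy of this kind looks like, here is a minimal pure-Python sketch that groups already tokenized text into sentences. It is not spaCy's implementation. The function name split_sentences is hypothetical, it works on plain strings rather than Doc objects, and it assumes the default punctuation characters documented for Sentencizer.__init__.

```python
from typing import List, Sequence

# Assumption: mirrors the default punct_chars documented for __init__.
DEFAULT_PUNCT_CHARS = (".", "!", "?")


def split_sentences(
    tokens: Sequence[str], punct_chars: Sequence[str] = DEFAULT_PUNCT_CHARS
) -> List[List[str]]:
    """Group pre-tokenized text into sentences using punctuation rules only."""
    sentences: List[List[str]] = []
    current: List[str] = []
    seen_end = False
    for tok in tokens:
        # A new sentence starts at the first token that follows a run of
        # sentence-final punctuation (so "?!" stays in one sentence).
        if seen_end and tok not in punct_chars:
            sentences.append(current)
            current = []
            seen_end = False
        current.append(tok)
        if tok in punct_chars:
            seen_end = True
    if current:
        sentences.append(current)
    return sentences


tokens = ["Hello", "world", ".", "How", "are", "you", "?", "!", "Fine"]
sentences = split_sentences(tokens)
```

Because the sketch only looks at punct_chars, a closing quote after a full stop would begin a new sentence here; a real rule set would likely need to treat such trailing punctuation specially.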
Sentencizer.__init__ method
Initialize the sentencizer.
Name | Type | Description |
---|---|---|
punct_chars | list | Optional custom list of punctuation characters that mark sentence ends. Defaults to [".", "!", "?"]. |
RETURNS | Sentencizer | The newly constructed object. |
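The optional punct_chars argument can be pictured with the hypothetical class below. It is a sketch, not spaCy's constructor: it assumes only that omitting the argument falls back to [".", "!", "?"], as the table states. Using None as the default and copying the input keeps separate instances from sharing one mutable list.

```python
from typing import Iterable, List, Optional


class SimpleSentencizer:
    """Hypothetical stand-in illustrating the punct_chars setting."""

    # Default from the table above; a tuple so it cannot be mutated.
    default_punct_chars = (".", "!", "?")

    def __init__(self, punct_chars: Optional[Iterable[str]] = None) -> None:
        if punct_chars is None:
            punct_chars = self.default_punct_chars
        # Copy into a fresh list so each instance owns its settings.
        self.punct_chars: List[str] = list(punct_chars)


default = SimpleSentencizer()
custom = SimpleSentencizer(punct_chars=[".", ";"])
```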
Sentencizer.__call__ method
Apply the sentencizer to a Doc. Typically, this happens automatically after the component has been added to the pipeline using nlp.add_pipe.
Name | Type | Description |
---|---|---|
doc | Doc | The Doc object to process, e.g. the Doc in the pipeline. |
RETURNS | Doc | The modified Doc with added sentence boundaries. |
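As a rough picture of "the modified Doc with added sentence boundaries", the sketch below sets a per-token sentence-start flag in place and returns the same object it received. The Token dataclass and apply_sentencizer function are hypothetical simplifications: a plain list of tokens stands in for a Doc, and only the is_sent_start flag is modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Token:
    # Hypothetical minimal token; a real spaCy Token has many more attributes.
    text: str
    is_sent_start: bool = False


def apply_sentencizer(
    doc: List[Token], punct_chars: Sequence[str] = (".", "!", "?")
) -> List[Token]:
    """Set is_sent_start on every token in place and return the same list."""
    seen_end = False
    for i, token in enumerate(doc):
        # The first token always starts a sentence; any later token starts one
        # if it follows sentence-final punctuation and is not itself one.
        token.is_sent_start = i == 0 or (seen_end and token.text not in punct_chars)
        if token.is_sent_start:
            seen_end = False
        if token.text in punct_chars:
            seen_end = True
    return doc


doc = [Token(t) for t in ["Hi", "!", "Bye", "."]]
result = apply_sentencizer(doc)
```

Returning the same object, rather than a copy, matches how pipeline components are chained: each one receives the Doc produced by the previous component.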
Sentencizer.to_disk method
Save the sentencizer settings (punctuation characters) to a directory. This will create a file sentencizer.json. It also happens automatically when you save an nlp object with a sentencizer added to its pipeline.
Name | Type | Description |
---|---|---|
path | unicode / Path | A path to a file, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects. |
Sentencizer.from_disk method
Load the sentencizer settings from a file. Expects a JSON file. This also happens automatically when you load an nlp object or model with a sentencizer added to its pipeline.
Name | Type | Description |
---|---|---|
path | unicode / Path | A path to a JSON file. Paths may be either strings or Path-like objects. |
RETURNS | Sentencizer | The modified Sentencizer object. |
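The disk round trip described for to_disk and from_disk can be sketched with the standard json and pathlib modules. This is an illustration under assumptions, not spaCy's code: the helper names settings_to_disk and settings_from_disk are hypothetical, and the JSON key "punct_chars" is an assumed layout for sentencizer.json. Like the to_disk description, the sketch treats its path as a directory and creates it if needed.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Union


def settings_to_disk(punct_chars: List[str], path: Union[str, Path]) -> Path:
    """Write the punctuation settings to sentencizer.json inside a directory."""
    directory = Path(path)  # accepts both strings and Path objects
    directory.mkdir(parents=True, exist_ok=True)  # create it if it doesn't exist
    out_file = directory / "sentencizer.json"
    # Assumption: the settings are stored under a "punct_chars" key.
    out_file.write_text(json.dumps({"punct_chars": punct_chars}), encoding="utf8")
    return out_file


def settings_from_disk(path: Union[str, Path]) -> List[str]:
    """Read the punctuation settings back from a JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf8"))
    return data["punct_chars"]


with tempfile.TemporaryDirectory() as tmp:
    saved = settings_to_disk([".", "!", "?", ";"], Path(tmp) / "model")
    loaded = settings_from_disk(saved)
```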
Sentencizer.to_bytes method
Serialize the sentencizer settings to a bytestring.
Name | Type | Description |
---|---|---|
RETURNS | bytes | The serialized data. |
Sentencizer.from_bytes method
Load the pipe from a bytestring. Modifies the object in place and returns it.
Name | Type | Description |
---|---|---|
bytes_data | bytes | The bytestring to load. |
RETURNS | Sentencizer | The modified Sentencizer object. |
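The bytes methods follow the same pattern in memory. In the hypothetical SentencizerSettings class below, from_bytes modifies the object in place and returns it, as documented above. The encoding is an assumption: this sketch uses UTF-8 JSON for readability, and the format spaCy actually writes may differ.

```python
import json
from typing import List, Optional


class SentencizerSettings:
    """Hypothetical holder for the sentencizer's punctuation setting."""

    def __init__(self, punct_chars: Optional[List[str]] = None) -> None:
        self.punct_chars = list(punct_chars) if punct_chars is not None else [".", "!", "?"]

    def to_bytes(self) -> bytes:
        # Assumption: UTF-8 encoded JSON stands in for the real wire format.
        return json.dumps({"punct_chars": self.punct_chars}).encode("utf8")

    def from_bytes(self, bytes_data: bytes) -> "SentencizerSettings":
        # Update this object in place and return it for chaining.
        self.punct_chars = json.loads(bytes_data.decode("utf8"))["punct_chars"]
        return self


original = SentencizerSettings(punct_chars=[".", "?", "!", ";"])
restored = SentencizerSettings().from_bytes(original.to_bytes())
```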