Matcher

Match sequences of tokens, based on pattern rules.

Matcher.load

Load the matcher and patterns from a file path.

Name | Type | Description
path | Path | Path to a JSON-formatted patterns file.
vocab | Vocab | The vocabulary that the documents to match over will refer to.
return | Matcher | The newly constructed object.

Matcher.__init__

Create the Matcher.

Name | Type | Description
vocab | Vocab | The vocabulary object, which must be shared with the documents the matcher will operate on.
patterns | dict | Patterns to add to the matcher.
return | Matcher | The newly constructed object.

Matcher.__call__

Find all token sequences matching the supplied patterns on the Doc.

Name | Type | Description
doc | Doc | The document to match over.
return | list | A list of (entity_key, label_id, start, end) tuples, describing the matches. A match tuple describes a span doc[start:end]. The label_id and entity_key are both integers.
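
To make the shape of the return value concrete, here is a minimal pure-Python sketch of the tuple layout only. It does not call the Matcher: the helper `find_matches`, the list of token strings standing in for a Doc, and the integer IDs are all hypothetical, chosen purely for illustration.

```python
def find_matches(tokens, patterns):
    """Return (entity_key, label_id, start, end) tuples for exact token runs.

    `tokens` is a list of strings standing in for a Doc; `patterns` maps
    (entity_key, label_id) pairs to the token sequence to look for.
    Both are illustrative assumptions, not the library's data structures.
    """
    matches = []
    for (entity_key, label_id), words in patterns.items():
        n = len(words)
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] == words:
                # end is exclusive, so tokens[start:end] is the matched span
                matches.append((entity_key, label_id, start, start + n))
    return matches


tokens = ["I", "use", "Google", "Now", "daily"]
patterns = {(1, 7): ["Google", "Now"]}  # hypothetical integer IDs
matches = find_matches(tokens, patterns)
for entity_key, label_id, start, end in matches:
    print(entity_key, label_id, tokens[start:end])
```

Because `end` is exclusive, slicing with `[start:end]` recovers exactly the matched tokens, which mirrors the doc[start:end] convention described in the table.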

Matcher.pipe

Match a stream of documents, yielding them in turn.

Name | Type | Description
docs | - | A stream of documents.
batch_size | int | The number of documents to accumulate into a working set.
n_threads | int | The number of threads with which to work on the buffer in parallel, if the Matcher implementation supports multi-threading.
yield | Doc | Documents, in order.
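
The batching behaviour described above can be sketched in plain Python. This is an assumption-laden illustration, not the library's implementation: the generator `pipe_sketch` and its `match_fn` callback are hypothetical names, documents are represented as token lists, and multi-threading (`n_threads`) is left out entirely.

```python
from itertools import islice


def pipe_sketch(docs, match_fn, batch_size):
    """Buffer `docs` in batches of `batch_size`, process each, yield in order."""
    iterator = iter(docs)
    while True:
        # Accumulate up to batch_size documents into a working set.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        for doc in batch:
            match_fn(doc)  # stand-in for running the matcher on the doc
            yield doc


seen = []
stream = (text.split() for text in ["a b", "c d", "e f"])
result = list(pipe_sketch(stream, seen.append, batch_size=2))
print(result)
```

The input is consumed lazily, one batch at a time, so the stream never has to fit in memory at once, and documents come out in the same order they went in.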

Matcher.add_entity

Add an entity to the matcher.

Name | Type | Description
entity_key | unicode / int | An ID for the entity.
attrs | - | Attributes to associate with the Matcher.
if_exists | unicode | 'raise', 'ignore' or 'update'. Controls what happens if the entity ID already exists. Defaults to 'raise'.
acceptor | - | Callback function to filter matches of the entity.
on_match | - | Callback function to act on matches of the entity.
return | None | -
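
The three if_exists policies can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: the function `add_entity_sketch` operates on a plain dict rather than a Matcher, the choice of KeyError for 'raise' is a guess, and 'update' is taken to mean that the stored attributes are overwritten.

```python
def add_entity_sketch(registry, entity_key, attrs=None, if_exists="raise"):
    """Register `entity_key` in `registry` (a plain dict) under a policy."""
    if if_exists not in ("raise", "ignore", "update"):
        raise ValueError("unexpected if_exists value: %r" % (if_exists,))
    if entity_key in registry:
        if if_exists == "raise":
            raise KeyError("entity %r already exists" % (entity_key,))
        if if_exists == "ignore":
            return  # keep the existing entry untouched
        # "update": fall through and overwrite (assumed behaviour)
    registry[entity_key] = dict(attrs or {})


registry = {}
add_entity_sketch(registry, "GoogleNow", {"kind": "product"})

add_entity_sketch(registry, "GoogleNow", {"kind": "app"}, if_exists="ignore")
after_ignore = dict(registry["GoogleNow"])

add_entity_sketch(registry, "GoogleNow", {"kind": "app"}, if_exists="update")
after_update = dict(registry["GoogleNow"])

try:
    add_entity_sketch(registry, "GoogleNow")  # default policy is "raise"
    raised = False
except KeyError:
    raised = True
```

With the default policy, adding the same entity ID twice is treated as an error, which guards against silently clobbering an entity that was registered earlier.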

Matcher.add_pattern

Add a pattern to the matcher.

Name | Type | Description
entity_key | unicode / int | An ID for the entity.
token_specs | - | Description of the pattern to be matched.
label | unicode / int | Label to assign to the matched pattern. Defaults to "".
return | None | -

Matcher.has_entity

Check whether the matcher has an entity.

Name | Type | Description
entity_key | unicode / int | The entity key to check.
return | bool | Whether the matcher has the entity.