
Matcher class

Match sequences of tokens based on pattern rules.

Matcher.__init__ method

Create the rule-based Matcher. If validate=True is set, all patterns added to the matcher are validated against a JSON schema, and a MatchPatternError is raised if problems are found. These can include incorrect types (e.g. a string where an integer is expected) or unexpected property names.

Name | Type | Description
vocab | Vocab | The vocabulary object, which must be shared with the documents the matcher will operate on.
validate (v2.1) | bool | Validate all patterns added to this matcher.
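The kind of check that validate=True enables can be sketched in plain Python. This is a toy illustration, not spaCy's implementation. The mini-schema TOKEN_SCHEMA, the helpers validate_pattern and check_pattern, and the locally defined MatchPatternError class are all hypothetical, and only a handful of token attributes are covered.

```python
class MatchPatternError(ValueError):
    """Local stand-in for the error described above (not imported from spaCy)."""


# Hypothetical mini-schema: token attribute name -> expected Python type.
TOKEN_SCHEMA = {"ORTH": str, "LOWER": str, "LENGTH": int, "IS_PUNCT": bool}


def _type_ok(value, expected):
    # bool is a subclass of int in Python, so reject it explicitly for ints.
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)


def validate_pattern(pattern):
    """Return a list of problems found in a pattern (a list of token dicts)."""
    errors = []
    for i, token_spec in enumerate(pattern):
        for attr, value in token_spec.items():
            if attr not in TOKEN_SCHEMA:
                errors.append(f"token {i}: unexpected property {attr!r}")
            elif not _type_ok(value, TOKEN_SCHEMA[attr]):
                expected = TOKEN_SCHEMA[attr].__name__
                errors.append(
                    f"token {i}: {attr} should be {expected}, "
                    f"got {type(value).__name__}"
                )
    return errors


def check_pattern(pattern):
    """Raise MatchPatternError if validation finds any problems."""
    errors = validate_pattern(pattern)
    if errors:
        raise MatchPatternError("; ".join(errors))
```

With this sketch, check_pattern([{"LOWER": "hello"}, {"LENGTH": 5}]) passes silently. check_pattern([{"LENGTH": "5"}, {"LOWR": "x"}]) raises and reports both the wrong type and the unknown property name.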

Matcher.__call__ method

Find all token sequences matching the supplied patterns on the Doc.

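Name | Type | Description
doc | Doc | The document to match over.
RETURNS | list | A list of (match_id, start, end) tuples describing the matches. Each tuple describes the span doc[start:end]; match_id is the ID of the added match pattern.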
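A toy matcher over plain token strings can illustrate how start and end index into the document. This is a simplified sketch under stated assumptions, not how spaCy matches:

- Tokens are str.
- Only the exact-match attributes ORTH and LOWER are honoured, and other keys are ignored.
- Operators such as "+" or "?" are not supported.
- match_id is the string key itself, whereas spaCy reports an integer ID.

The function names token_matches and find_matches are hypothetical.

```python
def token_matches(token, spec):
    """Check one token string against one token-description dict.

    Only ORTH (exact text) and LOWER (lowercased text) are checked;
    any other attribute in the dict is ignored by this toy.
    """
    for attr, value in spec.items():
        if attr == "ORTH" and token != value:
            return False
        if attr == "LOWER" and token.lower() != value:
            return False
    return True


def find_matches(tokens, rules):
    """Return (match_id, start, end) tuples; each covers tokens[start:end].

    rules maps a match ID to a list of patterns (lists of token dicts).
    """
    matches = []
    for match_id, patterns in rules.items():
        for pattern in patterns:
            n = len(pattern)
            # Slide a fixed-width window over the tokens.
            for start in range(len(tokens) - n + 1):
                window = tokens[start:start + n]
                if all(token_matches(t, s) for t, s in zip(window, pattern)):
                    matches.append((match_id, start, start + n))
    # Order by position in the document.
    return sorted(matches, key=lambda m: (m[1], m[2]))
```

Take tokens ["Hello", "world", "!", "hello", "World"] and the rule {"HelloWorld": [[{"LOWER": "hello"}, {"LOWER": "world"}]]}. For these, find_matches returns [("HelloWorld", 0, 2), ("HelloWorld", 3, 5)], and each (start, end) pair slices the matched tokens out of the list.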

Matcher.pipe method

Match a stream of documents, yielding them in turn.

Name | Type | Description
docs | iterable | A stream of documents.
batch_size | int | The number of documents to accumulate into a working set.
return_matches (v2.1) | bool | Yield the match lists along with the docs, making results (doc, matches) tuples.
as_tuples | bool | Interpret the input stream as (doc, context) tuples, and yield (result, context) tuples out. If both return_matches and as_tuples are True, the output will be a sequence of ((doc, matches), context) tuples.
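A hypothetical generator, toy_pipe, can show how return_matches and as_tuples change what comes out of the stream. It takes any match function (here find_lib, also hypothetical) and treats a doc as a list of strings. It mirrors only the output shapes in the table above. The batching is illustrative and says nothing about how spaCy processes a batch internally.

```python
from itertools import islice


def toy_pipe(docs, match_fn, batch_size=1000, return_matches=False, as_tuples=False):
    """Yield results in input order, using the output shapes in the table."""
    stream = iter(docs)
    while True:
        # Pull up to batch_size items into a working set.
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        for item in batch:
            # With as_tuples=True, each input item is a (doc, context) pair.
            doc, context = item if as_tuples else (item, None)
            result = (doc, match_fn(doc)) if return_matches else doc
            yield (result, context) if as_tuples else result


def find_lib(doc):
    """Toy match function: flag every token that is exactly "spaCy"."""
    return [("LIB", i, i + 1) for i, token in enumerate(doc) if token == "spaCy"]
```

Suppose return_matches=True and as_tuples=True, and the input is zip(docs, contexts). Each yielded item is then a ((doc, matches), context) tuple, matching the combined case in the table.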

Matcher.__len__ method (v2.0)

Get the number of rules added to the matcher. Note that this returns only the number of rules (identical to the number of match IDs), not the number of individual patterns.

Name | Type | Description
RETURNS | int | The number of rules.
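A plain dictionary can stand in for the matcher's internal store to show the difference between rules and patterns. This is a hypothetical model, not spaCy's actual data structure. Each match ID is one rule, however many patterns it holds.

```python
# Hypothetical stand-in for a matcher's rule store: match ID -> list of patterns.
rules = {
    "GREETING": [
        [{"LOWER": "hello"}],
        [{"LOWER": "hi"}],
        [{"LOWER": "good"}, {"LOWER": "morning"}],
    ],
    "FAREWELL": [[{"LOWER": "bye"}]],
}

# len() on the matcher corresponds to counting rules (match IDs) ...
num_rules = len(rules)
# ... not to counting the individual patterns stored under them.
num_patterns = sum(len(patterns) for patterns in rules.values())

print(num_rules, num_patterns)  # 2 4
```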

Matcher.__contains__ method (v2.0)

Check whether the matcher contains rules for a match ID.

Name | Type | Description
key | unicode | The match ID.
RETURNS | bool | Whether the matcher contains rules for this match ID.

Matcher.add method (v2.0)

Add a rule to the matcher, consisting of an ID key, one or more patterns, and a callback function to act on the matches. The callback function will receive the arguments matcher, doc, i and matches. If patterns already exist for the given ID, the new patterns are added to them. Any existing on_match callback for that ID is overwritten.

Name | Type | Description
match_id | unicode | An ID for the thing you’re matching.
on_match | callable or None | Callback function to act on matches. Takes the arguments matcher, doc, i and matches.
*patterns | list | Match pattern. A pattern consists of a list of dicts, where each dict describes a token.
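A small hypothetical class, ToyMatcher, can model the bookkeeping described for add:

- Patterns accumulate per ID.
- The callback is replaced on each call.
- Callbacks receive matcher, doc, i and matches.

It is a sketch of that documented behaviour only, and run_callbacks is an invented helper. The toy also assumes that i is the index of the triggering match within matches.

```python
class ToyMatcher:
    """Hypothetical stand-in that mimics the documented add() bookkeeping."""

    def __init__(self):
        self._patterns = {}   # match_id -> list of patterns
        self._callbacks = {}  # match_id -> on_match callable or None

    def add(self, match_id, on_match, *patterns):
        # Existing patterns for this ID are extended, never replaced ...
        self._patterns.setdefault(match_id, []).extend(patterns)
        # ... but the callback is simply overwritten by the latest call.
        self._callbacks[match_id] = on_match

    def run_callbacks(self, doc, matches):
        # Assumption: i is the position of the triggering match in matches.
        for i, (match_id, start, end) in enumerate(matches):
            callback = self._callbacks.get(match_id)
            if callback is not None:
                callback(self, doc, i, matches)
```

Suppose you call add("GREETING", first, p1) and then add("GREETING", second, p2). Two patterns are then stored under "GREETING", but only second fires when run_callbacks is given a GREETING match.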

Matcher.remove method (v2.0)

Remove a rule from the matcher. A KeyError is raised if the match ID does not exist.

Name | Type | Description
key | unicode | The ID of the match rule.

Matcher.get method (v2.0)

Retrieve the pattern stored for a key. Returns the rule as an (on_match, patterns) tuple containing the callback and available patterns.

Name | Type | Description
key | unicode | The ID of the match rule.
RETURNS | tuple | The rule, as an (on_match, patterns) tuple.
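A hypothetical ToyRuleStore can illustrate the documented contracts of get, remove and the in check together. get hands back an (on_match, patterns) tuple, and remove raises KeyError for an unknown ID. This is a sketch, not spaCy's code. In particular, the behaviour of the real get for an unknown key is not covered here; the toy simply lets the dictionary lookup raise KeyError.

```python
class ToyRuleStore:
    """Hypothetical rule store mirroring the documented get()/remove() behaviour."""

    def __init__(self):
        self._patterns = {}   # key -> list of patterns
        self._callbacks = {}  # key -> on_match callable or None

    def add(self, key, on_match, *patterns):
        self._patterns.setdefault(key, []).extend(patterns)
        self._callbacks[key] = on_match

    def get(self, key):
        # Return the rule as an (on_match, patterns) tuple.
        # Unknown keys raise KeyError here (a toy choice, see lead-in).
        return (self._callbacks[key], self._patterns[key])

    def remove(self, key):
        # Unknown IDs raise KeyError, as documented for Matcher.remove.
        if key not in self._patterns:
            raise KeyError(key)
        del self._patterns[key]
        del self._callbacks[key]

    def __contains__(self, key):
        # Mirrors Matcher.__contains__: is there a rule for this ID?
        return key in self._patterns
```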