Python implementation of TextRank for lightweight phrase extraction

An implementation of TextRank in Python for use in spaCy pipelines. It provides fast, effective phrase extraction from texts, along with extractive summarization. The graph algorithm works independently of any specific natural language and does not require domain knowledge. See (Mihalcea 2004): https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf


    import spacy
    import pytextrank

    nlp = spacy.load('en_core_web_sm')

    tr = pytextrank.TextRank()
    nlp.add_pipe(tr.PipelineComponent, name='textrank', last=True)

    text = 'Compatibility of systems of linear constraints over the set of natural numbers. Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered.'
    doc = nlp(text)

    # examine the top-ranked phrases in the document
    for p in doc._.phrases:
        print('{:.4f} {:5d} {}'.format(p.rank, p.count, p.text))
        print(p.chunks)
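For readers curious about the graph algorithm mentioned above, here is a minimal, dependency-free sketch of the core TextRank idea: build a word co-occurrence graph, then score each word with a PageRank-style iteration. This is not the code used inside pytextrank. The function name `textrank_keywords`, its parameters, and the tiny stopword list are all hypothetical choices made for illustration, and the stopword filter stands in for the part-of-speech filtering described in the paper.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=3, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score over a co-occurrence graph.

    Illustrative sketch only; all names and defaults here are assumptions.
    """
    words = re.findall(r"[a-z]+", text.lower())

    # Assumption: a tiny hand-written stopword list replaces the
    # part-of-speech filter used in the TextRank paper.
    stopwords = {"the", "and", "are", "over", "for", "with"}
    tokens = [w for w in words if len(w) > 2 and w not in stopwords]

    # Undirected graph: connect each token to the distinct tokens that
    # follow it within the sliding window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate S(v) = (1 - d) + d * sum(S(u) / degree(u)) over neighbors u.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }

    # Highest-scoring words first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = ("Compatibility of systems of linear constraints over the set "
              "of natural numbers.")
    for word, score in textrank_keywords(sample):
        print("{:.4f} {}".format(score, word))
```

The sketch ranks single words only. Collapsing adjacent high-ranking words into multi-word phrases, as a phrase extractor would, is left out to keep the example short.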

Paco Nathan


