Python Automated Term Extraction

PyATE is a term extraction library written in Python. It relies on spaCy part-of-speech tagging and implements the Basic, Combo Basic, C-Value, TermExtractor, and Weirdness algorithms.
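To show what one of these measures computes, here is a minimal sketch of the textbook C-value score (Frantzi, Ananiadou and Mima). It is not PyATE's implementation. Candidate terms are assumed to be given as token tuples mapped to corpus frequencies, and candidate extraction itself is skipped. For a candidate a of length |a| with frequency f(a), C-value is log2|a| · f(a) when a is not nested in any longer candidate. Otherwise it is log2|a| · (f(a) − (1/P(T_a)) · Σ f(b)), where T_a is the set of longer candidates containing a. The helper names `is_nested` and `c_value` are hypothetical.

```python
import math


def is_nested(short, long):
    """True if `short` is a contiguous, strictly shorter sub-sequence of `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freqs):
    """Textbook C-value for multi-word candidates (token tuple -> frequency).

    Hypothetical helper for illustration; single-word candidates always score 0
    because log2(1) == 0, as in the original C-value formulation.
    """
    scores = {}
    for term, f in freqs.items():
        # T_a: longer candidates that contain this term
        containers = [other for other in freqs if is_nested(term, other)]
        weight = math.log2(len(term))
        if containers:
            # Discount occurrences explained by the longer, containing terms
            nested_total = sum(freqs[b] for b in containers)
            scores[term] = weight * (f - nested_total / len(containers))
        else:
            scores[term] = weight * f
    return scores
```

With toy frequencies such as `{("tumor", "suppressors"): 1, ("dysfunctional", "tumor", "suppressors"): 1}`, the shorter term drops to 0 because every one of its occurrences sits inside the longer candidate. This is the nesting penalty that distinguishes C-value from raw frequency.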


```python
import spacy
from pyate.term_extraction_pipeline import TermExtractionPipeline

nlp = spacy.load('en_core_web_sm')
# spaCy v2-style call: the component instance is passed to add_pipe directly
# (spaCy v3's add_pipe expects a registered component name instead)
nlp.add_pipe(TermExtractionPipeline())

# source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994795/
string = (
    'Central to the development of cancer are genetic changes that endow these “cancer cells” with many of the '
    'hallmarks of cancer, such as self-sufficient growth and resistance to anti-growth and pro-death signals. '
    'However, while the genetic changes that occur within cancer cells themselves, such as activated oncogenes or '
    'dysfunctional tumor suppressors, are responsible for many aspects of cancer development, they are not sufficient. '
    'Tumor promotion and progression are dependent on ancillary processes provided by cells of the tumor environment '
    'but that are not necessarily cancerous themselves. '
    'Inflammation has long been associated with the development of cancer. '
    'This review will discuss the reflexive relationship between cancer and inflammation with particular focus on how '
    'considering the role of inflammation in physiologic processes such as the maintenance of tissue homeostasis and '
    'repair may provide a logical framework for understanding the connection between the inflammatory response and cancer.'
)

doc = nlp(string)
# combo_basic holds a pandas Series of candidate terms and scores; print the top 5
print(doc._.combo_basic.sort_values(ascending=False).head(5))

# Output:
# dysfunctional tumor               1.443147
# tumor suppressors                 1.443147
# genetic changes                   1.386294
# cancer cells                      1.386294
# dysfunctional tumor suppressors   1.298612
```
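The ranking printed above can be illustrated without spaCy or pandas. This sketch assumes the Combo Basic definition from the term-extraction literature: |t| · log f(t) + α · e_t + β · e'_t. Here e_t counts the candidates that contain t, and e'_t counts the candidates contained in t. Several pieces are assumptions rather than PyATE internals and will not reproduce its scores: the natural log, the weights `alpha=0.75` and `beta=0.1`, the hypothetical helpers `contains` and `combo_basic`, and the hand-supplied frequencies (no candidate extraction).

```python
import math


def contains(longer, shorter):
    """True if `shorter` is a contiguous, strictly shorter sub-sequence of `longer`."""
    n, m = len(shorter), len(longer)
    return n < m and any(longer[i:i + n] == shorter for i in range(m - n + 1))


def combo_basic(freqs, alpha=0.75, beta=0.1):
    """Combo Basic-style scores for token-tuple candidates (hypothetical helper).

    alpha and beta are placeholder weights, not PyATE's defaults.
    Frequencies must be >= 1 so that math.log is defined.
    """
    scores = {}
    for term, f in freqs.items():
        # e_t: how many longer candidates contain this term
        e_t = sum(1 for other in freqs if contains(other, term))
        # e'_t: how many shorter candidates this term contains
        e_prime_t = sum(1 for other in freqs if contains(term, other))
        scores[term] = len(term) * math.log(f) + alpha * e_t + beta * e_prime_t
    return scores


def top_k(scores, k=5):
    """Rank candidates by descending score, like sort_values(ascending=False).head(k)."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

The first term rewards long, frequent candidates. The α term rewards candidates that reappear inside longer ones, and the β term gives a smaller bonus to candidates that themselves contain other terms.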
Author info

Kevin Lu


Categories: pipeline, research
