
Top-level Functions

spacy.load
function
Needs model: to use this functionality, spaCy needs a model to be installed.

Load a model via its shortcut link, the name of an installed model package, a unicode path or a Path-like object. spaCy will try resolving the load argument in this order. If a model is loaded from a shortcut link or package name, spaCy will assume it's a Python package, import it and call the model's own load() method. If a model is loaded from a path, spaCy will assume it's a data directory, read the language and pipeline settings from the meta.json and initialise the Language class. The data will then be loaded in via Language.from_disk().

Name | Type | Description
--- | --- | ---
name | unicode or Path | Model to load, i.e. shortcut link, package name or path.
disable | list | Names of pipeline components to disable.
returns | Language | A Language object with the loaded model.
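The resolution order described above (shortcut link, then package name, then path) can be sketched in plain Python. This is a simplified illustration, not spaCy's own code: `resolve_model_source`, the `links_dir` argument and the `installed_packages` set are hypothetical stand-ins for spaCy's internal checks.

```python
from pathlib import Path


def resolve_model_source(name, links_dir, installed_packages):
    """Classify a load argument following the documented order.

    Hypothetical helper: returns "shortcut link", "package" or "path".
    """
    # Only bare names (no directory parts) can be shortcut links or packages
    is_plain_name = isinstance(name, str) and Path(name).name == name
    if is_plain_name and (Path(links_dir) / name).exists():
        return "shortcut link"  # would load the model the link points to
    if is_plain_name and name in installed_packages:
        return "package"  # would import the package and call its load()
    if Path(name).exists():
        return "path"  # would read meta.json and call Language.from_disk()
    raise IOError("Can't find model '{}'".format(name))
```

Checking the shortcut link first means a link named like an installed package shadows that package, which matches the order stated in the description.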

Essentially, spacy.load() is a convenience wrapper that reads the language ID and pipeline components from a model's meta.json, initialises the Language class, loads in the model data and returns it.

Abstract example

    cls = util.get_lang_class(lang)        # get language for ID, e.g. 'en'
    nlp = cls()                            # initialise the language
    for name in pipeline:
        component = nlp.create_pipe(name)  # create each pipeline component
        nlp.add_pipe(component)            # add component to pipeline
    nlp.from_disk(model_data_path)         # load in model data

spacy.blank
function
New in v2.0.

Create a blank model of a given language class. This function is the counterpart of spacy.load().

Name | Type | Description
--- | --- | ---
name | unicode | ISO code of the language class to load.
disable | list | Names of pipeline components to disable.
returns | Language | An empty Language object of the appropriate subclass.

spacy.info
function

The same as the info command. Pretty-print information about your installation, models and local setup from within spaCy. To get the model meta data as a dictionary instead, you can use the meta attribute on your nlp object with a loaded model, e.g. nlp.meta.

Name | Type | Description
--- | --- | ---
model | unicode | A model, i.e. shortcut link, package name or path (optional).
markdown | bool | Print information as Markdown.

spacy.explain
function

Get a description for a given POS tag, dependency label or entity type. For a list of available terms, see glossary.py.

Name | Type | Description
--- | --- | ---
term | unicode | Term to explain.
returns | unicode | The explanation, or None if not found in the glossary.
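The lookup behaves like a dictionary access that falls back to None. In the sketch below, GLOSSARY is a small hypothetical excerpt written for illustration; the real term list lives in glossary.py, and with spaCy installed you would simply call spacy.explain("nsubj").

```python
# Hypothetical excerpt of a term glossary (not the full spaCy list).
GLOSSARY = {
    "NOUN": "noun",
    "VERB": "verb",
    "nsubj": "nominal subject",
    "GPE": "Countries, cities, states",
}


def explain(term):
    """Return the description for a tag, label or entity type, or None."""
    return GLOSSARY.get(term)  # unknown terms yield None rather than raising
```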

spacy.prefer_gpu
function
New in v2.0.14.

Allocate data and perform operations on GPU, if available. If data has already been allocated on CPU, it will not be moved. Ideally, this function should be called right after importing spaCy and before loading any models.

Name | Type | Description
--- | --- | ---
returns | bool | Whether the GPU was activated.

spacy.require_gpu
function
New in v2.0.14.

Allocate data and perform operations on GPU. Will raise an error if no GPU is available. If data has already been allocated on CPU, it will not be moved. Ideally, this function should be called right after importing spaCy and before loading any models.

Name | Type | Description
--- | --- | ---
returns | bool | True. If no GPU is available, an error is raised instead.
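The two GPU helpers differ mainly in how they handle a missing GPU. The sketch below models that contract with a hypothetical `gpu_available` flag and a hypothetical `GPUUnavailableError`; it does not detect devices or touch any GPU library.

```python
class GPUUnavailableError(RuntimeError):
    """Hypothetical error type used for illustration."""


def prefer_gpu(gpu_available):
    """Activate the GPU if possible and report whether it was activated."""
    if gpu_available:
        # real code would switch array allocation to the GPU here
        return True
    return False  # silently keep using the CPU


def require_gpu(gpu_available):
    """Activate the GPU or raise; returns True on success."""
    if not prefer_gpu(gpu_available):
        raise GPUUnavailableError("No GPU available")
    return True
```

In an application, either spaCy helper would be called right after `import spacy` and before any `spacy.load(...)`, so that model data is allocated on the GPU from the start.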

displaCy

As of v2.0, spaCy comes with a built-in visualization suite. For more info and examples, see the usage guide on visualizing spaCy.

displacy.serve
method
New in v2.0.

Serve a dependency parse tree or named entity visualization to view it in your browser. Will run a simple web server.

Name | Type | Description | Default
--- | --- | --- | ---
docs | list, Doc, Span | Document(s) to visualize. |
style | unicode | Visualization style, 'dep' or 'ent'. | 'dep'
page | bool | Render markup as full HTML page. | True
minify | bool | Minify HTML markup. | False
options | dict | Visualizer-specific options, e.g. colors. | {}
manual | bool | Don't parse Doc and instead expect a dict or list of dicts. See the usage guide on visualizing spaCy for formats and examples. | False
port | int | Port to serve visualization. | 5000

displacy.render
method
New in v2.0.

Render a dependency parse tree or named entity visualization.

Name | Type | Description | Default
--- | --- | --- | ---
docs | list, Doc, Span | Document(s) to visualize. |
style | unicode | Visualization style, 'dep' or 'ent'. | 'dep'
page | bool | Render markup as full HTML page. | False
minify | bool | Minify HTML markup. | False
jupyter | bool | Explicitly enable "Jupyter mode" to return markup ready to be rendered in a notebook. | detected automatically
options | dict | Visualizer-specific options, e.g. colors. | {}
manual | bool | Don't parse Doc and instead expect a dict or list of dicts. See the usage guide on visualizing spaCy for formats and examples. | False
returns | unicode | Rendered HTML markup. |

Visualizer options

The options argument lets you specify additional settings for each visualizer. If a setting is not present in the options, the default value will be used.

Dependency Visualizer options

Name | Type | Description | Default
--- | --- | --- | ---
collapse_punct | bool | Attach punctuation to tokens. Can make the parse more readable, as it prevents long arcs from attaching punctuation. | True
collapse_phrases | bool | Merge noun phrases into one token. | False
compact | bool | "Compact mode" with square arrows that takes up less space. | False
color | unicode | Text color (HEX, RGB or color names). | '#000000'
bg | unicode | Background color (HEX, RGB or color names). | '#ffffff'
font | unicode | Font name or font family for all text. | 'Arial'
offset_x | int | Spacing on left side of the SVG in px. | 50
arrow_stroke | int | Width of arrow path in px. | 2
arrow_width | int | Width of arrow head in px. | 10 / 8 (compact)
arrow_spacing | int | Spacing between arrows in px to avoid overlaps. | 20 / 12 (compact)
word_spacing | int | Vertical spacing between words and arcs in px. | 45
distance | int | Distance between words in px. | 175 / 85 (compact)
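Because missing settings fall back to their defaults, an options dict only needs the keys you want to change. The helper below is a hypothetical sketch of that merge, not displaCy's internal code; it uses a subset of the default values listed in the table and ignores the compact-mode variants (such as a distance of 85).

```python
# Subset of the dependency visualizer defaults listed in the table.
# Compact-mode defaults (e.g. distance 85) are not modelled here.
DEP_DEFAULTS = {
    "collapse_punct": True,
    "compact": False,
    "color": "#000000",
    "bg": "#ffffff",
    "font": "Arial",
    "distance": 175,
}


def resolve_options(options, defaults=DEP_DEFAULTS):
    """Return the defaults overridden by any user-supplied settings."""
    merged = dict(defaults)       # copy so the defaults stay untouched
    merged.update(options or {})  # user settings win over defaults
    return merged


user_options = {"compact": True, "bg": "#09a3d5", "color": "white"}
```

With spaCy installed, the same dict would be passed directly, e.g. `displacy.render(doc, style='dep', options=user_options)`.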

Named Entity Visualizer options

Name | Type | Description | Default
--- | --- | --- | ---
ents | list | Entity types to highlight (None for all types). | None
colors | dict | Color overrides. Entity types in uppercase should be mapped to color names or values. | {}

By default, displaCy comes with colours for all entity types supported by spaCy. If you're using custom entity types, you can use the colors setting to add your own colours for them.
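For a custom entity type, `ents` and `colors` are combined in one options dict. The lookup helpers are a hypothetical sketch of how an override could take precedence over built-in colours; DEFAULT_COLORS and FALLBACK_COLOR are placeholders, not displaCy's actual palette.

```python
# Placeholder colours for illustration only (not displaCy's real palette).
DEFAULT_COLORS = {"ORG": "#7aecec", "PERSON": "#aa9cfc"}
FALLBACK_COLOR = "#dddddd"

options = {
    "ents": ["ORG", "FRUIT"],  # only highlight these types
    "colors": {"FRUIT": "linear-gradient(90deg, #aa9cfc, #fc9ce7)"},
}


def entity_color(label, options):
    """Pick an override first, then a default, then a fallback colour."""
    overrides = {k.upper(): v for k, v in options.get("colors", {}).items()}
    label = label.upper()
    return overrides.get(label, DEFAULT_COLORS.get(label, FALLBACK_COLOR))


def should_render(label, options):
    """An ents value of None means all entity types are highlighted."""
    ents = options.get("ents")
    return ents is None or label.upper() in ents
```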

Utility functions

spaCy comes with a small collection of utility functions located in spacy/util.py. Because utility functions are mostly intended for internal use within spaCy, their behaviour may change with future releases. The functions documented on this page should be safe to use and we'll try to ensure backwards compatibility. However, we recommend having additional tests in place if your application depends on any of spaCy's utilities.

util.get_data_path
function

Get path to the data directory where spaCy looks for models. Defaults to spacy/data.

Name | Type | Description
--- | --- | ---
require_exists | bool | Only return path if it exists, otherwise return None.
returns | Path / None | Data path or None.

util.set_data_path
function

Set custom path to the data directory where spaCy looks for models.

Name | Type | Description
--- | --- | ---
path | unicode or Path | Path to new data directory.
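The get/set pair behaves like a module-level setting. A minimal sketch, assuming a module-global path and the `require_exists` behaviour from the table; the default location and the `require_exists=True` default are placeholders chosen for illustration, not spaCy's actual spacy/data setup.

```python
from pathlib import Path

# Hypothetical default; spaCy's own default is its spacy/data directory
_data_path = Path("spacy_data_placeholder")


def set_data_path(path):
    """Store a new data directory (str or Path)."""
    global _data_path
    _data_path = Path(path)


def get_data_path(require_exists=True):
    """Return the data path, or None if it must exist but doesn't."""
    if require_exists and not _data_path.exists():
        return None
    return _data_path
```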

util.get_lang_class
function

Import and load a Language class. Allows lazy-loading language data and importing languages using the two-letter language code. To add a language code for a custom language class, you can use the set_lang_class helper.

Name | Type | Description
--- | --- | ---
lang | unicode | Two-letter language code, e.g. 'en'.
returns | Language | Language class.

util.set_lang_class
function

Set a custom Language class name that can be loaded via get_lang_class. If your model uses a custom language, this is required so that spaCy can load the correct class from the two-letter language code.

Name | Type | Description
--- | --- | ---
name | unicode | Two-letter language code, e.g. 'en'.
cls | Language | The language class, e.g. English.
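The lazy loading and registration described for get_lang_class and set_lang_class can be sketched as a small registry. All names below (`LANGUAGES`, `_LAZY_IMPORTS` and the `Language`/`English` stand-in classes) are hypothetical; spaCy itself imports the class from its language subpackages on first use.

```python
class Language:
    """Stand-in base class for illustration."""
    lang = None


class English(Language):
    lang = "en"


LANGUAGES = {}  # cache of already-loaded classes, keyed by language code

# Hypothetical lazy loaders: one callable per code, only run on first use
_LAZY_IMPORTS = {"en": lambda: English}


def set_lang_class(name, cls):
    """Register a (custom) Language class under a language code."""
    LANGUAGES[name] = cls


def get_lang_class(lang):
    """Return the Language class for a code, loading it lazily if needed."""
    if lang not in LANGUAGES:
        if lang not in _LAZY_IMPORTS:
            raise ImportError("Can't import language {}".format(lang))
        LANGUAGES[lang] = _LAZY_IMPORTS[lang]()
    return LANGUAGES[lang]
```

Registered classes take priority over lazy loaders because the cache is checked first, which is what lets a custom language be resolved from its code.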

util.load_model
function
New in v2.0.

Load a model from a shortcut link, package or data path. If called with a shortcut link or package name, spaCy will assume the model is a Python package and import and call its load() method. If called with a path, spaCy will assume it's a data directory, read the language and pipeline settings from the meta.json and initialise a Language class. The model data will then be loaded in via Language.from_disk() .

Name | Type | Description
--- | --- | ---
name | unicode | Package name, shortcut link or model path.
**overrides | - | Specific overrides, like pipeline components to disable.
returns | Language | Language object with the loaded model.

util.load_model_from_path
function
New in v2.0.

Load a model from a data directory path. Creates the Language class and pipeline based on the directory's meta.json and then calls from_disk() with the path. This function also makes it easy to test a new model that you haven't packaged yet.

Name | Type | Description
--- | --- | ---
model_path | unicode | Path to model data directory.
meta | dict | Model meta data. If False, spaCy will try to load the meta from a meta.json in the same directory.
**overrides | - | Specific overrides, like pipeline components to disable.
returns | Language | Language object with the loaded model.
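How the meta.json drives the pipeline, and how a `disable` override prunes it, can be sketched without spaCy. `pipeline_from_meta` is a hypothetical helper that only computes the language code and the component names that would be created; it does not build a Language object or load any model data.

```python
import json
from pathlib import Path


def pipeline_from_meta(model_path, meta=False, disable=()):
    """Return (lang, component names) for a model directory.

    If meta is False, it is read from meta.json in model_path,
    mirroring the behaviour described for load_model_from_path.
    """
    if not meta:
        meta_file = Path(model_path) / "meta.json"
        meta = json.loads(meta_file.read_text(encoding="utf8"))
    # Skip any component named in the disable override
    names = [name for name in meta.get("pipeline", []) if name not in disable]
    return meta["lang"], names
```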

util.load_model_from_init_py
function
New in v2.0.

A helper function to use in the load() method of a model package's __init__.py.

Name | Type | Description
--- | --- | ---
init_file | unicode | Path to model's __init__.py, i.e. __file__.
**overrides | - | Specific overrides, like pipeline components to disable.
returns | Language | Language object with the loaded model.

util.get_model_meta
function
New in v2.0.

Get a model's meta.json from a directory path and validate its contents.

Name | Type | Description
--- | --- | ---
path | unicode or Path | Path to model directory.
returns | dict | The model's meta data.
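A minimal sketch of reading and validating a meta.json, assuming that "validate" means checking that a few required keys are present and non-empty. The required set (`lang`, `name`, `version`) is an assumption chosen for illustration.

```python
import json
from pathlib import Path

REQUIRED_META_KEYS = ("lang", "name", "version")  # assumed required settings


def get_model_meta(path):
    """Load meta.json from a model directory and check required keys."""
    meta_path = Path(path) / "meta.json"
    if not meta_path.exists():
        raise IOError("Could not read meta.json from {}".format(meta_path))
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    for key in REQUIRED_META_KEYS:
        if not meta.get(key):
            raise ValueError("No valid '{}' setting found in meta.json".format(key))
    return meta
```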

util.is_package
function

Check if a string maps to a package installed via pip. Mainly used to validate model packages.

Name | Type | Description
--- | --- | ---
name | unicode | Name of package.
returns | bool | True if installed package, False if not.
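One way to perform this check with the standard library is to ask `importlib.metadata` (Python 3.8+) for an installed distribution. This is an alternative sketch, not spaCy's implementation, and treating hyphens and underscores in package names as equivalent is an assumption.

```python
from importlib import metadata


def is_package(name):
    """Return True if a distribution with this name is installed."""
    # Model packages are often named with underscores but published with hyphens
    candidates = {name, name.replace("_", "-"), name.replace("-", "_")}
    for candidate in candidates:
        try:
            metadata.distribution(candidate)
            return True
        except metadata.PackageNotFoundError:
            continue
    return False
```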

util.get_package_path
function
New in v2.0.

Get path to an installed package. Mainly used to resolve the location of model packages. Currently imports the package to find its path.

Name | Type | Description
--- | --- | ---
package_name | unicode | Name of installed package.
returns | Path | Path to model package directory.
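Since the package is imported to find its location, the lookup can be sketched with `importlib`. This simplified stand-in assumes a regular package with an `__init__.py`; namespace packages have no `__file__` and are not handled.

```python
import importlib
from pathlib import Path


def get_package_path(package_name):
    """Import a package and return the directory containing it."""
    package = importlib.import_module(package_name)
    # __file__ points at the package's __init__.py; its parent is the package dir
    return Path(package.__file__).parent
```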

util.is_in_jupyter
function
New in v2.0.

Check if user is running spaCy from a Jupyter notebook by detecting the IPython kernel. Mainly used for the displacy visualizer.

Name | Type | Description
--- | --- | ---
returns | bool | True if in Jupyter, False if not.
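One possible heuristic for detecting a notebook kernel is sketched below. It looks up IPython's `get_ipython` function, which IPython injects into builtins, and checks the shell's class name. Treating `ZMQInteractiveShell` as "Jupyter" is an assumption of this sketch, not necessarily the check spaCy performs.

```python
import builtins


def is_in_jupyter():
    """Best-effort check for a Jupyter (IPython kernel) session."""
    get_ipython = getattr(builtins, "get_ipython", None)
    if get_ipython is None:
        return False  # plain Python interpreter: IPython is not running
    shell = get_ipython()
    # Notebook kernels use ZMQInteractiveShell; terminal IPython does not
    return type(shell).__name__ == "ZMQInteractiveShell"
```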

util.update_exc
function

Update, validate and overwrite tokenizer exceptions. Used to combine global exceptions with custom, language-specific exceptions. Will raise an error if an exception's key doesn't match the combined ORTH values of its tokens.

Name | Type | Description
--- | --- | ---
base_exceptions | dict | Base tokenizer exceptions.
*addition_dicts | dicts | Exception dictionaries to add to the base exceptions, in order.
returns | dict | Combined tokenizer exceptions.
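The ORTH check can be illustrated with plain dictionaries. This sketch uses the string key "ORTH" in place of spaCy's attribute ID and leaves out any further processing spaCy may apply to the combined exceptions.

```python
ORTH = "ORTH"  # stand-in for spaCy's ORTH attribute ID


def update_exc(base_exceptions, *addition_dicts):
    """Merge tokenizer exceptions, checking that each key matches its tokens."""
    exc = dict(base_exceptions)  # don't mutate the caller's base dict
    for additions in addition_dicts:
        for key, tokens in additions.items():
            described = "".join(token[ORTH] for token in tokens)
            if described != key:
                raise ValueError(
                    "Exception key {!r} doesn't match ORTH values {!r}".format(key, described)
                )
        exc.update(additions)  # later dicts overwrite earlier entries
    return exc
```

For example, the key "don't" is valid for the tokens "do" and "n't", because joining their ORTH values reproduces the key exactly.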

util.prints
function
New in v2.0.

Print a formatted, text-wrapped message with an optional title. If a text argument is a Path, it's converted to a string. Should only be used for interactive components like the command-line interface.

Name | Type | Description
--- | --- | ---
*texts | unicode | Texts to print. Each argument is rendered as a paragraph.
**kwargs | - | title is rendered as a coloured headline. exits performs a system exit after printing, using the value of the argument as the exit code, e.g. exits=1.
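A rough sketch of the described behaviour using the standard library's `textwrap`. The wrapping width, the uncoloured title and the blank-line layout are assumptions, so this only approximates spaCy's formatting.

```python
import sys
import textwrap
from pathlib import Path


def prints(*texts, **kwargs):
    """Print each text as a wrapped paragraph, with an optional title.

    Supported kwargs: title (headline text) and exits (exit code).
    """
    title = kwargs.get("title")
    exits = kwargs.get("exits")
    # Path arguments are converted to strings before wrapping
    paragraphs = [str(t) if isinstance(t, Path) else t for t in texts]
    if title:
        print("\n" + title)  # spaCy's version colours the headline
    for paragraph in paragraphs:
        print(textwrap.fill(paragraph, width=80) + "\n")
    if exits is not None:
        sys.exit(exits)  # exit code taken from the exits argument
```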

util.minibatch
function
New in v2.0.

Iterate over batches of items. size may be an iterator, so that the batch size can vary on each step.

Name | Type | Description
--- | --- | ---
items | iterable | The items to batch up.
size | int / iterable | The batch size(s). Use util.compounding or util.decaying for an infinite series of compounding or decaying values.
yields | list | The batches.
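A minimal sketch of the batching logic, assuming `size` is either an int or an iterator of sizes such as the generators from util.compounding. This is a simplified reimplementation for illustration, not spaCy's source.

```python
import itertools


def minibatch(items, size=8):
    """Yield lists of items; size is an int or an iterator of batch sizes."""
    sizes = itertools.repeat(size) if isinstance(size, int) else iter(size)
    items = iter(items)
    for batch_size in sizes:
        # int() allows float sizes, e.g. from a compounding schedule
        batch = list(itertools.islice(items, int(batch_size)))
        if not batch:
            return  # input exhausted
        yield batch
```

If a finite iterator of sizes runs out before the items do, this sketch stops early. The infinite generators from util.compounding and util.decaying avoid that.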

util.compounding
function
New in v2.0.

Yield an infinite series of compounding values. Each time the generator is called, a value is produced by multiplying the previous value by the compound rate.

Name | Type | Description
--- | --- | ---
start | int / float | The first value.
stop | int / float | The maximum value.
compound | int / float | The compounding factor.
yields | float | Compounding values.
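A sketch of the compounding schedule as described: each value is the previous one multiplied by `compound`, clipped so the series never overshoots `stop`. It is written to mirror the description rather than copied from spaCy.

```python
def compounding(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... clipped at stop."""
    def clip(value):
        # stop is an upper bound when growing and a lower bound when shrinking
        return max(value, stop) if start > stop else min(value, stop)

    current = float(start)
    while True:
        yield clip(current)
        current *= compound
```

A typical use with spaCy installed is to feed such a schedule into minibatch as a growing batch size, e.g. `minibatch(train_data, size=compounding(4., 32., 1.001))`.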

util.decaying
function
New in v2.0.

Yield an infinite series of linearly decaying values.

Name | Type | Description
--- | --- | ---
start | int / float | The first value.
end | int / float | The final (minimum) value.
decay | int / float | The decaying factor.
yields | float | The decaying values.
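Read literally, "linearly decaying" means subtracting a fixed amount at each step. The sketch below implements that reading, treating `decay` as the per-step decrement and `end` as a floor. Both interpretations are assumptions, and spaCy's own schedule may compute its values differently.

```python
def decaying(start, end, decay):
    """Yield start, start - decay, start - 2*decay, ... never below end."""
    step = 0
    while True:
        yield max(float(start) - decay * step, float(end))
        step += 1
```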

util.itershuffle
function
New in v2.0.

Shuffle an iterator. This works by holding bufsize items back and yielding them sometime later. This is not unbiased, but it should be good enough for batching. A larger bufsize means less bias.

Name | Type | Description
--- | --- | ---
iterable | iterable | Iterator to shuffle.
bufsize | int | Items to hold back.
yields | iterable | The shuffled iterator.
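A buffered shuffle can be sketched as follows. Once the buffer holds `bufsize` items, one random buffered item is emitted for each new item read, so an item can appear at most `bufsize - 1` positions earlier than in the input; that limit is the source of the bias mentioned above. This is a simplified variant, not spaCy's exact algorithm, and the optional `rng` parameter is an addition for reproducibility.

```python
import random


def itershuffle(iterable, bufsize=1000, rng=None):
    """Approximately shuffle a (possibly unbounded) iterable using a buffer."""
    if bufsize < 1:
        raise ValueError("bufsize must be at least 1")
    rng = rng or random.Random()
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= bufsize:
            # move a randomly chosen buffered item to the end and emit it
            i = rng.randrange(len(buf))
            buf[i], buf[-1] = buf[-1], buf[i]
            yield buf.pop()
    rng.shuffle(buf)  # flush whatever is left once the input is exhausted
    yield from buf
```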

Compatibility functions

All Python code is written in the intersection of Python 2 and Python 3. This is easy in Cython, but somewhat ugly in Python. Logic that deals with Python or platform compatibility lives only in spacy.compat. To distinguish them from the built-in functions, replacement functions are suffixed with an underscore, e.g. unicode_.

Name | Python 2 | Python 3
--- | --- | ---
compat.bytes_ | str | bytes
compat.unicode_ | unicode | str
compat.basestring_ | basestring | str
compat.input_ | raw_input | input
compat.json_dumps | ujson.dumps with .decode('utf8') | ujson.dumps
compat.path2str | str(path) with .decode('utf8') | str(path)
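The aliasing pattern behind this table can be sketched as a version check at import time. The sketch covers only the builtin replacements and path2str; json_dumps depends on ujson and is left out, and the layout is illustrative rather than a copy of spacy.compat.

```python
import sys

is_python2 = sys.version_info[0] == 2

if is_python2:
    # Python 2 builtins are only looked up when this branch actually runs
    bytes_ = str
    unicode_ = unicode  # noqa: F821
    basestring_ = basestring  # noqa: F821
    input_ = raw_input  # noqa: F821
else:
    bytes_ = bytes
    unicode_ = str
    basestring_ = str
    input_ = input


def path2str(path):
    """Convert a path to text; on Python 2, str() gives bytes, so decode."""
    text = str(path)
    return text.decode("utf8") if is_python2 else text
```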

compat.is_config
function

Check if a specific configuration of Python version and operating system matches the user's setup. Mostly used to display targeted error messages.

Name | Type | Description
--- | --- | ---
python2 | bool | spaCy is executed with Python 2.x.
python3 | bool | spaCy is executed with Python 3.x.
windows | bool | spaCy is executed on Windows.
linux | bool | spaCy is executed on Linux.
osx | bool | spaCy is executed on OS X or macOS.
returns | bool | Whether the specified configuration matches the user's platform.
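The parameters act as filters: any flag left unset is ignored, and every flag that is set must match the current platform. A sketch of that logic using the standard library's `sys` module; the default of None for each flag is an assumption of this sketch.

```python
import sys

is_python2 = sys.version_info[0] == 2
is_python3 = sys.version_info[0] == 3
is_windows = sys.platform.startswith("win")
is_linux = sys.platform.startswith("linux")
is_osx = sys.platform == "darwin"


def is_config(python2=None, python3=None, windows=None, linux=None, osx=None):
    """Return True if every specified (non-None) flag matches this platform."""
    checks = [
        (python2, is_python2),
        (python3, is_python3),
        (windows, is_windows),
        (linux, is_linux),
        (osx, is_osx),
    ]
    return all(wanted is None or wanted == actual for wanted, actual in checks)
```

A targeted error message would then be guarded with something like `if is_config(python2=True, windows=True): ...`.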