Command Line Interface
Download, train and package models, and debug spaCy.

As of v1.7.0, spaCy comes with new command line helpers to download and link models and show useful debugging information. For a list of available commands, type spacy --help.

Download

Download models for spaCy. The downloader finds the best-matching compatible version, uses pip to download the model as a package and automatically creates a shortcut link to load the model by name. Direct downloads don't perform any compatibility checks and require the model name to be specified with its version (e.g., en_core_web_sm-1.2.0).

python -m spacy download [model] [--direct]
Argument | Type | Description
model | positional | Model name or shortcut (en, de, vectors).
--direct, -d | flag | Force direct download of exact model version.
--help, -h | flag | Show help message and available arguments.
creates | directory, symlink | The installed model package in your site-packages directory and a shortcut link as a symlink in spacy/data.
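
When model installation is scripted, the two download modes can be wrapped in a small helper. The following is a minimal sketch and not part of spaCy: build_download_command is a hypothetical name, and the check assumes that a versioned model name ends in -MAJOR.MINOR.PATCH, as in the en_core_web_sm-1.2.0 example above. Only the command layout itself is taken from the usage line.

```python
import re
import sys

# Assumption: a versioned model name ends in "-MAJOR.MINOR.PATCH",
# as in the example en_core_web_sm-1.2.0.
VERSIONED_NAME = re.compile(r".+-\d+\.\d+\.\d+$")


def build_download_command(model, direct=False):
    """Return the argv list for `python -m spacy download` (hypothetical helper)."""
    command = [sys.executable, "-m", "spacy", "download", model]
    if direct:
        # Direct downloads skip compatibility checks, so the exact
        # version has to be part of the model name.
        if not VERSIONED_NAME.match(model):
            raise ValueError("direct download needs a versioned name, got %r" % model)
        command.append("--direct")
    return command
```

The returned list can be handed to subprocess.check_call to run the download with the same interpreter that runs the script.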

Link

Create a shortcut link for a model, either a Python package or a local directory. This will let you load models from any location using a custom name via spacy.load().

python -m spacy link [origin] [link_name] [--force]
Argument | Type | Description
origin | positional | Model name if package, or path to local directory.
link_name | positional | Name of the shortcut link to create.
--force, -f | flag | Force overwriting of existing link.
--help, -h | flag | Show help message and available arguments.
creates | symlink | A shortcut link of the given name as a symlink in spacy/data.
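
To make the effect of --force concrete, the sketch below mimics the described behaviour with the standard library only. It is an illustration under stated assumptions, not spaCy's implementation: create_shortcut_link is a hypothetical function, and the link location is passed in explicitly instead of being resolved to spacy/data.

```python
import os


def create_shortcut_link(origin, link_path, force=False):
    """Create a directory symlink at link_path pointing to origin.

    Illustrative only: this mimics the behaviour described above and is
    not spaCy's implementation.
    """
    if os.path.lexists(link_path):
        if not force:
            raise FileExistsError("link already exists: %s" % link_path)
        # With force=True, drop the old link before creating the new one.
        os.unlink(link_path)
    os.symlink(os.path.abspath(origin), link_path, target_is_directory=True)
    return link_path
```

Creating symlinks may require extra privileges on some platforms, so treat the sketch as a description of the link semantics and not as a replacement for the command.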

Info

Print information about your spaCy installation, models and local setup, and generate Markdown-formatted markup to copy-paste into GitHub issues.

python -m spacy info [--markdown]
python -m spacy info [model] [--markdown]
Argument | Type | Description
model | positional | A model, i.e. shortcut link, package name or path (optional).
--markdown, -md | flag | Print information as Markdown.
--help, -h | flag | Show help message and available arguments.
prints | stdout | Information about your spaCy installation.

Validate
New in spaCy v2.0.

Find all models installed in the current environment (both packages and shortcut links) and check whether they are compatible with the currently installed version of spaCy. Run this command after upgrading spaCy via pip install -U spacy to ensure that all installed models can be used with the new version. The command also helps detect out-of-sync model links resulting from links created in different virtual environments. It prints a list of models, the installed versions, the latest compatible version (if out of date) and the commands for updating.

python -m spacy validate
Argument | Type | Description
prints | stdout | Details about the compatibility of your installed models.

Convert

Convert files into spaCy's JSON format for use with the train command and other experiment management functions. The converter can be specified on the command line, or chosen based on the file extension of the input file.

python -m spacy convert [input_file] [output_dir] [--converter] [--n-sents]
[--morphology]
Argument | Type | Description
input_file | positional | Input file.
output_dir | positional | Output directory for converted JSON file.
--converter, -c | option | Name of converter to use (see below). New in spaCy v2.0.
--n-sents, -n | option | Number of sentences per document.
--morphology, -m | option | Enable appending morphology to tags.
--help, -h | flag | Show help message and available arguments.
creates | JSON | Data in spaCy's JSON format.

The following converters are available:

ID | Description
auto | Automatically pick converter based on file extension (default).
conllu, conll | Universal Dependencies .conllu or .conll format.
ner | Tab-based named entity recognition format.
iob | IOB named entity recognition format.
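
The auto setting can be pictured as a lookup from file extension to converter ID. The sketch below is a hypothetical helper (pick_converter), not the converter's actual code. The extension table is an assumption: the text above only ties .conllu and .conll to the Universal Dependencies converter, and the .iob entry is a guess made for illustration.

```python
import os

# Assumption: this extension table is illustrative. The source only states
# that .conllu and .conll files use the Universal Dependencies converter;
# the ".iob" entry is a guess.
CONVERTER_BY_EXTENSION = {
    ".conllu": "conllu",
    ".conll": "conll",
    ".iob": "iob",
}


def pick_converter(input_file, converter="auto"):
    """Resolve the converter ID for input_file (hypothetical helper)."""
    if converter != "auto":
        # An explicit --converter value always wins over the extension.
        return converter
    extension = os.path.splitext(input_file)[1].lower()
    try:
        return CONVERTER_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError("no converter known for extension %r" % extension)
```

Passing the converter explicitly is the safer choice when a file's extension does not match its format.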

Train

Train a model. Expects data in spaCy's JSON format. On each epoch, a model will be saved out to the directory. Accuracy scores and model details will be added to a meta.json to allow packaging the model using the package command.

python -m spacy train [lang] [output_dir] [train_data] [dev_data] [--n-iter]
[--n-sents] [--use-gpu] [--meta-path] [--vectors] [--no-tagger] [--no-parser]
[--no-entities] [--gold-preproc]
Argument | Type | Description
lang | positional | Model language.
output_dir | positional | Directory to store model in.
train_data | positional | Location of JSON-formatted training data.
dev_data | positional | Location of JSON-formatted development data for evaluation.
--n-iter, -n | option | Number of iterations (default: 20).
--n-sents, -ns | option | Number of sentences (default: 0).
--use-gpu, -g | option | Use GPU.
--vectors, -v | option | Model to load vectors from.
--meta-path, -m | option | Optional path to model meta.json. All relevant properties like lang, pipeline and spacy_version will be overwritten. New in spaCy v2.0.
--version, -V | option | Model version. Will be written out to the model's meta.json after training.
--no-tagger, -T | flag | Don't train tagger.
--no-parser, -P | flag | Don't train parser.
--no-entities, -N | flag | Don't train NER.
--gold-preproc, -G | flag | Use gold preprocessing.
--help, -h | flag | Show help message and available arguments.
creates | model, pickle | A spaCy model on each epoch, and a final .pickle file.

Environment variables for hyperparameters
New in spaCy v2.0.

spaCy lets you set hyperparameters for training via environment variables. This is useful because it keeps the command simple and lets you create an alias for your custom train command while still being able to tweak the hyperparameters easily. For example:

parser_hidden_depth=2 parser_maxout_pieces=1 train-parser
Name | Description | Default
dropout_from | Initial dropout rate. | 0.2
dropout_to | Final dropout rate. | 0.2
dropout_decay | Rate of dropout change. | 0.0
batch_from | Initial batch size. | 1
batch_to | Final batch size. | 64
batch_compound | Rate of batch size acceleration. | 1.001
token_vector_width | Width of embedding tables and convolutional layers. | 128
embed_size | Number of rows in embedding tables. | 7500
hidden_width | Size of the parser's and NER's hidden layers. | 128
learn_rate | Learning rate. | 0.001
optimizer_B1 | Momentum for the Adam solver. | 0.9
optimizer_B2 | Adagrad-momentum for the Adam solver. | 0.999
optimizer_eps | Epsilon value for the Adam solver. | 1e-08
L2_penalty | L2 regularisation penalty. | 1e-06
grad_norm_clip | Gradient L2 norm constraint. | 1.0
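
The general pattern of reading a setting from the environment and falling back to a default can be sketched as follows. This is a minimal illustration and not spaCy's own code: env_hyperparameter and DEFAULTS are hypothetical names, and the defaults are a subset copied from the table above.

```python
import os

# Defaults copied from the table above (a subset, for brevity).
DEFAULTS = {
    "dropout_from": 0.2,
    "dropout_to": 0.2,
    "dropout_decay": 0.0,
    "batch_from": 1,
    "batch_to": 64,
    "batch_compound": 1.001,
    "learn_rate": 0.001,
}


def env_hyperparameter(name, environ=None):
    """Return the value for name from the environment, else its default."""
    environ = os.environ if environ is None else environ
    default = DEFAULTS[name]
    raw = environ.get(name)
    if raw is None:
        return default
    # Environment values are strings, so convert to the default's type.
    return type(default)(raw)
```

To override a value for a single run started from Python, pass a modified copy of the environment to the child process, for example via the env argument of subprocess.check_call with dict(os.environ, dropout_from="0.3").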

Vocab
New in spaCy v2.0.

Compile a vocabulary from a lexicon JSONL file and optional word vectors. Will save out a valid spaCy model that you can load via spacy.load or package using the package command.

python -m spacy vocab [lang] [output_dir] [lexemes_loc] [vectors_loc]
Argument | Type | Description
lang | positional | Model language ISO code, e.g. en.
output_dir | positional | Model output directory. Will be created if it doesn't exist.
lexemes_loc | positional | Location of lexical data in spaCy's JSONL format.
vectors_loc | positional | Optional location of vectors data as numpy .npz file.
creates | model | A spaCy model containing the vocab and vectors.

Evaluate
New in spaCy v2.0.

Evaluate a model's accuracy and speed on JSON-formatted annotated data. Will print the results and optionally export displaCy visualizations of a sample set of parses to .html files. Visualizations for the dependency parse and NER will be exported as separate files if the respective component is present in the model's pipeline.

python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit] [--gpu-id] [--gold-preproc]
Argument | Type | Description
model | positional | Model to evaluate. Can be a package or shortcut link name, or a path to a model data directory.
data_path | positional | Location of JSON-formatted evaluation data.
--displacy-path, -dp | option | Directory to output rendered parses as HTML. If not set, no visualizations will be generated.
--displacy-limit, -dl | option | Number of parses to generate per file. Defaults to 25. Keep in mind that a significantly higher number might cause the .html files to render slowly.
--gpu-id, -g | option | GPU to use, if any. Defaults to -1 for CPU.
--gold-preproc, -G | flag | Use gold preprocessing.
prints / creates | stdout, HTML | Evaluation results and optional displaCy visualizations.

Package

Generate a model Python package from an existing model data directory. All data files are copied over. If the path to a meta.json is supplied, or a meta.json is found in the input directory, this file is used. Otherwise, the data can be entered directly from the command line. After packaging, you can run python setup.py sdist from the newly created directory to turn your model into an installable archive file.

python -m spacy package [input_dir] [output_dir] [--meta-path] [--create-meta] [--force]
Argument | Type | Description
input_dir | positional | Path to directory containing model data.
output_dir | positional | Directory to create package folder in.
--meta-path, -m | option | Path to meta.json file (optional). New in spaCy v2.0.
--create-meta, -c | flag | Create a meta.json file on the command line, even if one already exists in the directory. If an existing file is found, its entries will be shown as the defaults in the command line prompt. New in spaCy v2.0.
--force, -f | flag | Force overwriting of existing folder in output directory.
--help, -h | flag | Show help message and available arguments.
creates | directory | A Python package containing the spaCy model.