
Training spaCy’s Statistical Models

This guide describes how to train new statistical models for spaCy's part-of-speech tagger, named entity recognizer and dependency parser. Once the model is trained, you can then save and load it.

Training basics

spaCy's models are statistical and every "decision" they make – for example, which part-of-speech tag to assign, or whether a word is a named entity – is a prediction. This prediction is based on the examples the model has seen during training. To train a model, you first need training data – examples of text, and the labels you want the model to predict. This could be a part-of-speech tag, a named entity or any other information.

The model is then shown the unlabelled text and will make a prediction. Because we know the correct answer, we can give the model feedback on its prediction in the form of an error gradient of the loss function, which calculates the difference between the model's prediction and the expected output. The greater the difference, the more significant the gradient and the updates to our model.

[Diagram: the training loop – text and label from the training data go into the model, which PREDICTs; the GRADIENT of the error updates the model, and the updated model is SAVEd.]

When training a model, we don't just want it to memorise our examples – we want it to come up with a theory that can be generalised across other examples. After all, we don't just want the model to learn that this one instance of "Amazon" right here is a company – we want it to learn that "Amazon", in contexts like this, is most likely a company. That's why the training data should always be representative of the data we want to process. A model trained on Wikipedia, where sentences in the first person are extremely rare, will likely perform badly on Twitter. Similarly, a model trained on romantic novels will likely perform badly on legal text.

This also means that in order to know how the model is performing, and whether it's learning the right things, you don't only need training data – you'll also need evaluation data. If you only test the model with the data it was trained on, you'll have no idea how well it's generalising. If you want to train a model from scratch, you usually need at least a few hundred examples for both training and evaluation. To update an existing model, you can already achieve decent results with very few examples – as long as they're representative.

How do I get training data?

Collecting training data may sound incredibly painful – and it can be, if you're planning a large-scale annotation project. However, if your main goal is to update an existing model's predictions – for example, spaCy's named entity recognition – the hard part is usually not creating the actual annotations. It's finding representative examples and extracting potential candidates. The good news is, if you've been noticing bad performance on your data, you likely already have some relevant text, and you can use spaCy to bootstrap a first set of training examples. For example, after processing a few sentences, you may end up with the following entities, some correct, some incorrect.

Text | Entity | Start | End | Label | Correct?
Uber blew through $1 million a week | Uber | 0 | 4 | ORG | right
Android Pay expands to Canada | Android | 0 | 7 | PERSON | wrong
Android Pay expands to Canada | Canada | 23 | 29 | GPE | right
Spotify steps up Asia expansion | Spotify | 0 | 7 | ORG | right
Spotify steps up Asia expansion | Asia | 17 | 21 | NORP | wrong

Alternatively, the rule-based matcher can be a useful tool to extract tokens or combinations of tokens, as well as their start and end index in a document. In this case, we'll extract mentions of Google and assume they're an ORG.

Text | Entity | Start | End | Label | Correct?
let me google this for you | google | 7 | 13 | ORG | wrong
Google Maps launches location sharing | Google | 0 | 6 | ORG | wrong
Google rebrands its business apps | Google | 0 | 6 | ORG | right
look what i found on google! 😂 | google | 21 | 27 | ORG | wrong
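
For example, here's a minimal sketch of this matcher-based bootstrapping, producing the character offsets shown above – the blank English model and the single-token pattern are assumptions for illustration:

import spacy
from spacy.matcher import Matcher

nlp = spacy.blank('en')                              # tokenisation is all we need here
matcher = Matcher(nlp.vocab)
matcher.add('GOOGLE', None, [{'LOWER': 'google'}])   # match any casing of "google"

doc = nlp(u"let me google this for you")
for match_id, start, end in matcher(doc):
    span = doc[start:end]
    # character offsets and a provisional label, as in the table above
    print(span.text, span.start_char, span.end_char, 'ORG')
# google 7 13 ORG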

Based on the few examples above, you can already create six training sentences with eight entities in total. Of course, what you consider a "correct annotation" will always depend on what you want the model to learn. While there are some entity annotations that are more or less universally correct – like Canada being a geopolitical entity – your application may have its very own definition of the NER annotation scheme.

train_data = [
    ("Uber blew through $1 million a week", [(0, 4, 'ORG')]),
    ("Android Pay expands to Canada", [(0, 11, 'PRODUCT'), (23, 29, 'GPE')]),
    ("Spotify steps up Asia expansion", [(0, 7, 'ORG'), (17, 21, 'LOC')]),
    ("Google Maps launches location sharing", [(0, 11, 'PRODUCT')]),
    ("Google rebrands its business apps", [(0, 6, 'ORG')]),
    ("look what i found on google! 😂", [(21, 27, 'PRODUCT')])]

Training with annotations

The GoldParse object collects the annotated training examples, also called the gold standard. It's initialised with the Doc object it refers to, and keyword arguments specifying the annotations, like tags or entities. Its job is to encode the annotations, keep them aligned and create the C-level data structures required for efficient access. Here's an example of a simple GoldParse for part-of-speech tags:

from spacy.vocab import Vocab
from spacy.tokens import Doc
from spacy.gold import GoldParse

vocab = Vocab(tag_map={'N': {'pos': 'NOUN'}, 'V': {'pos': 'VERB'}})
doc = Doc(vocab, words=['I', 'like', 'stuff'])
gold = GoldParse(doc, tags=['N', 'V', 'N'])

Using the Doc and its gold-standard annotations, the model can be updated to learn a sentence of three words with their assigned part-of-speech tags. The tag map is part of the vocabulary and defines the annotation scheme. If you're training a new language model, this will let you map the tags present in the treebank you train on to spaCy's tag scheme.

doc = Doc(Vocab(), words=['Facebook', 'released', 'React', 'in', '2014'])
gold = GoldParse(doc, entities=['U-ORG', 'O', 'U-TECHNOLOGY', 'O', 'U-DATE'])

The same goes for named entities. The letters prefixed to the labels refer to the tags of the BILUO scheme: O is a token outside an entity, U a single-token entity unit, B the beginning of an entity, I a token inside an entity and L the last token of an entity.
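
If you're starting from character offsets, you don't have to write out the BILUO tags by hand. Here's a quick sketch using the biluo_tags_from_offsets helper from spacy.gold:

from spacy.vocab import Vocab
from spacy.tokens import Doc
from spacy.gold import biluo_tags_from_offsets

doc = Doc(Vocab(), words=['Facebook', 'released', 'React', 'in', '2014'])
entities = [(0, 8, 'ORG'), (18, 23, 'TECHNOLOGY'), (27, 31, 'DATE')]
print(biluo_tags_from_offsets(doc, entities))
# ['U-ORG', 'O', 'U-TECHNOLOGY', 'O', 'U-DATE']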

[Diagram: training with annotations – the text and label from the training data become a Doc and a GoldParse, which nlp.update consumes together with the optimizer to update the model.]

Of course, it's not enough to only show a model a single example once. Especially if you only have a few examples, you'll want to train for a number of iterations. At each iteration, the training data is shuffled to ensure the model doesn't make any generalisations based on the order of examples. Another technique to improve the learning results is to set a dropout rate, a rate at which to randomly "drop" individual features and representations. This makes it harder for the model to memorise the training data. For example, a 0.25 dropout means that each feature or internal representation has a 1/4 likelihood of being dropped.

Example training loop

import random
from spacy.gold import GoldParse

# assumes `nlp`, `train_data` and `get_data` are already set up
optimizer = nlp.begin_training(get_data)
for itn in range(100):
    random.shuffle(train_data)
    for raw_text, entity_offsets in train_data:
        doc = nlp.make_doc(raw_text)
        gold = GoldParse(doc, entities=entity_offsets)
        nlp.update([doc], [gold], drop=0.5, sgd=optimizer)
nlp.to_disk('/model')

The nlp.update method takes the following arguments:

Name | Description
docs | Doc objects. The update method takes a sequence of them, so you can batch up your training examples. Alternatively, you can also pass in a sequence of raw texts.
golds | GoldParse objects. The update method takes a sequence of them, so you can batch up your training examples. Alternatively, you can also pass in a dictionary containing the annotations.
drop | Dropout rate. Makes it harder for the model to just memorise the data.
sgd | An optimizer, i.e. a callable to update the model's weights. If not set, spaCy will create a new one and save it for further use.

Instead of writing your own training loop, you can also use the built-in train command, which expects data in spaCy's JSON format. On each epoch, a model will be saved out to the directory. After training, you can use the package command to generate an installable Python package from your model.

python -m spacy convert /tmp/train.conllu /tmp/data
python -m spacy train en /tmp/model /tmp/data/train.json -n 5

Simple training style
New: this feature was introduced in spaCy v2.0.

Instead of sequences of Doc and GoldParse objects, you can also use the "simple training style" and pass raw texts and dictionaries of annotations to nlp.update. The dictionaries can have the keys entities, heads, deps, tags and cats. This is generally recommended, as it removes one layer of abstraction, and avoids unnecessary imports. It also makes it easier to structure and load your training data.

Simple training loop

import random
import spacy

TRAIN_DATA = [
    ("Uber blew through $1 million a week", {'entities': [(0, 4, 'ORG')]}),
    ("Google rebrands its business apps", {'entities': [(0, 6, 'ORG')]})]

nlp = spacy.blank('en')
optimizer = nlp.begin_training()
for i in range(20):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations], sgd=optimizer)
nlp.to_disk('/model')

The above training loop leaves out a few details that can really improve accuracy – but the principle really is that simple. Once you've got your pipeline together and you want to tune the accuracy, you usually want to process your training examples in batches, and experiment with minibatch sizes and dropout rates, set via the drop keyword argument. See the Language and Pipe API docs for available options.
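
For example, here's a sketch of such a batched loop – the batch sizes and dropout rate are assumptions you'd want to tune, and nlp, optimizer and TRAIN_DATA are reused from the simple loop above:

import random
from spacy.util import minibatch, compounding

for i in range(20):
    random.shuffle(TRAIN_DATA)
    # batch size compounds from 4 up to 32 by a factor of 1.001 per batch
    batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
    for batch in batches:
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, drop=0.35, sgd=optimizer)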

Training the named entity recognizer

All spaCy models support online learning, so you can update a pre-trained model with new examples. You'll usually need to provide many examples to meaningfully improve the system — a few hundred is a good start, although more is better.

You should avoid iterating over the same few examples multiple times, or the model is likely to "forget" how to annotate other examples. If you iterate over the same few examples, you're effectively changing the loss function. The optimizer will find a way to minimize the loss on your examples, without regard for the consequences on the examples it's no longer paying attention to. One way to avoid this "catastrophic forgetting" problem is to "remind" the model of other examples by augmenting your annotations with sentences annotated with entities automatically recognised by the original model. Ultimately, this is an empirical process: you'll need to experiment on your data to find a solution that works best for you.
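
Here's a rough sketch of that "reminding" idea – the unlabelled_texts and new_examples variables, and the simple concatenation, are assumptions for illustration:

# assumes `nlp` is the original pre-trained model and `unlabelled_texts`
# is a list of raw texts from your domain
revision_data = []
for doc in nlp.pipe(unlabelled_texts):
    entities = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
    revision_data.append((doc.text, {'entities': entities}))

# mix the model's own predictions in with your new annotations
train_data = new_examples + revision_data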

Updating the Named Entity Recognizer

This example shows how to update spaCy's entity recognizer with your own examples, starting off with an existing, pre-trained model, or from scratch using a blank Language class. To do this, you'll need example texts and the character offsets and labels of each entity contained in the texts.

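The full script lives in the examples/training directory of the spaCy repository. As a rough sketch of the steps below – the example sentences, labels and iteration count are placeholders, not a definitive recipe:

Updating the entity recognizer

import random
import spacy

TRAIN_DATA = [
    ("Who is Shaka Khan?", {'entities': [(7, 17, 'PERSON')]}),
    ("I like London and Berlin.", {'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]})]

nlp = spacy.blank('en')                      # or spacy.load('en') to update a model
if 'ner' not in nlp.pipe_names:
    nlp.add_pipe(nlp.create_pipe('ner'))     # blank models need the component
ner = nlp.get_pipe('ner')
for _, annotations in TRAIN_DATA:
    for ent in annotations.get('entities'):
        ner.add_label(ent[2])                # make sure all labels are known

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):        # only train the entity recognizer
    optimizer = nlp.begin_training()
    for itn in range(100):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], drop=0.5, sgd=optimizer, losses=losses)
        print(losses)
nlp.to_disk('/model')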

Step by step guide

  1. Load the model you want to start with, or create an empty model using spacy.blank with the ID of your language. If you're using a blank model, don't forget to add the entity recognizer to the pipeline. If you're using an existing model, make sure to disable all other pipeline components during training using nlp.disable_pipes. This way, you'll only be training the entity recognizer.
  2. Shuffle and loop over the examples. For each example, update the model by calling nlp.update, which steps through the words of the input. At each word, it makes a prediction. It then consults the annotations to see whether it was right. If it was wrong, it adjusts its weights so that the correct action will score higher next time.
  3. Save the trained model using nlp.to_disk.
  4. Test the model to make sure the entities in the training data are recognised correctly.

Training an additional entity type

This script shows how to add a new entity type ANIMAL to an existing pre-trained NER model, or an empty Language class. To keep the example short and simple, only a few sentences are provided as examples. In practice, you'll need many more — a few hundred would be a good start. You will also likely need to mix in examples of other entity types, which might be obtained by running the entity recognizer over unlabelled sentences, and adding their annotations to the training set.

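Again, the full script is in the examples/training directory of the spaCy repository. A condensed sketch of the steps below – the sentences are placeholders and far too few for real training:

Adding a new entity type

import random
import spacy

LABEL = 'ANIMAL'
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings",
     {'entities': [(0, 6, LABEL)]}),
    ("horses pretend to care about your feelings",
     {'entities': [(0, 6, LABEL)]})]

nlp = spacy.blank('en')                      # or an existing pre-trained model
if 'ner' not in nlp.pipe_names:
    nlp.add_pipe(nlp.create_pipe('ner'))
ner = nlp.get_pipe('ner')
ner.add_label(LABEL)                         # add the new entity type

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    for itn in range(20):
        random.shuffle(TRAIN_DATA)
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], drop=0.35, sgd=optimizer)

doc = nlp(u"Do you like horses?")            # quick sanity check
print([(ent.text, ent.label_) for ent in doc.ents])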

Step by step guide

  1. Load the model you want to start with, or create an empty model using spacy.blank with the ID of your language. If you're using a blank model, don't forget to add the entity recognizer to the pipeline. If you're using an existing model, make sure to disable all other pipeline components during training using nlp.disable_pipes. This way, you'll only be training the entity recognizer.
  2. Add the new entity label to the entity recognizer using the add_label method. You can access the entity recognizer in the pipeline via nlp.get_pipe('ner').
  3. Loop over the examples and call nlp.update, which steps through the words of the input. At each word, it makes a prediction. It then consults the annotations to see whether it was right. If it was wrong, it adjusts its weights so that the correct action will score higher next time.
  4. Save the trained model using nlp.to_disk.
  5. Test the model to make sure the new entity is recognised correctly.

Training the tagger and parser

Updating the Dependency Parser

This example shows how to train spaCy's dependency parser, starting off with an existing model or a blank model. You'll need a set of training examples and the respective heads and dependency label for each token of the example texts.

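The full script is in the examples/training directory of the spaCy repository. A condensed sketch of the steps below, with a single placeholder example:

Training the parser

import random
import spacy

# heads are the indices of the tokens each word is attached to
TRAIN_DATA = [
    ("They trade mortgage-backed securities.", {
        'heads': [1, 1, 4, 4, 5, 1, 1],
        'deps': ['nsubj', 'ROOT', 'compound', 'punct', 'nmod', 'dobj', 'punct']})]

nlp = spacy.blank('en')
if 'parser' not in nlp.pipe_names:
    nlp.add_pipe(nlp.create_pipe('parser'), first=True)
parser = nlp.get_pipe('parser')
for _, annotations in TRAIN_DATA:
    for dep in annotations.get('deps', []):
        parser.add_label(dep)

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'parser']
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    for itn in range(10):
        random.shuffle(TRAIN_DATA)
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], sgd=optimizer)
nlp.to_disk('/model')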

Step by step guide

  1. Load the model you want to start with, or create an empty model using spacy.blank with the ID of your language. If you're using a blank model, don't forget to add the parser to the pipeline. If you're using an existing model, make sure to disable all other pipeline components during training using nlp.disable_pipes. This way, you'll only be training the parser.
  2. Add the dependency labels to the parser using the add_label method. If you're starting off with a pre-trained spaCy model, this is usually not necessary – but it doesn't hurt either, just to be safe.
  3. Shuffle and loop over the examples. For each example, update the model by calling nlp.update, which steps through the words of the input. At each word, it makes a prediction. It then consults the annotations to see whether it was right. If it was wrong, it adjusts its weights so that the correct action will score higher next time.
  4. Save the trained model using nlp.to_disk.
  5. Test the model to make sure the parser works as expected.

Updating the Part-of-speech Tagger

In this example, we're training spaCy's part-of-speech tagger with a custom tag map. We start off with a blank Language class, update its defaults with our custom tags and then train the tagger. You'll need a set of training examples and the respective custom tags, as well as a dictionary mapping those tags to the Universal Dependencies scheme.

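The full script is in the examples/training directory of the spaCy repository. A condensed sketch of the steps below – the tag map and toy sentences are placeholders:

Training the tagger

import random
import spacy

# map the custom tags to the Universal Dependencies scheme
TAG_MAP = {'N': {'pos': 'NOUN'}, 'V': {'pos': 'VERB'}, 'J': {'pos': 'ADJ'}}
TRAIN_DATA = [
    ("I like green eggs", {'tags': ['N', 'V', 'J', 'N']}),
    ("Eat blue ham", {'tags': ['V', 'J', 'N']})]

nlp = spacy.blank('en')
tagger = nlp.create_pipe('tagger')
for tag, values in TAG_MAP.items():
    tagger.add_label(tag, values)            # tag name plus its mapping
nlp.add_pipe(tagger)

optimizer = nlp.begin_training()
for itn in range(25):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations], sgd=optimizer)

doc = nlp(u"I like blue eggs")               # quick sanity check
print([(t.text, t.tag_, t.pos_) for t in doc])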

Step by step guide

  1. Load the model you want to start with, or create an empty model using spacy.blank with the ID of your language. If you're using a blank model, don't forget to add the tagger to the pipeline. If you're using an existing model, make sure to disable all other pipeline components during training using nlp.disable_pipes. This way, you'll only be training the tagger.
  2. Add the tag map to the tagger using the add_label method. The first argument is the new tag name, the second the mapping to spaCy's coarse-grained tags, e.g. {'pos': 'NOUN'}.
  3. Shuffle and loop over the examples. For each example, update the model by calling nlp.update, which steps through the words of the input. At each word, it makes a prediction. It then consults the annotations to see whether it was right. If it was wrong, it adjusts its weights so that the correct action will score higher next time.
  4. Save the trained model using nlp.to_disk.
  5. Test the model to make sure the tagger works as expected.

Training a parser for custom semantics

spaCy's parser component can be trained to predict any type of tree structure over your input text – including semantic relations that are not syntactic dependencies. This can be useful for conversational applications, which need to predict trees over whole documents or chat logs, with connections between the sentence roots used to annotate discourse structure. For example, you can train spaCy's parser to label intents and their targets, like attributes, quality, time and locations. The result could look like this:

doc = nlp(u"find a hotel with good wifi")
print([(t.text, t.dep_, t.head.text) for t in doc if t.dep_ != '-'])
# [('find', 'ROOT', 'find'), ('hotel', 'PLACE', 'find'),
#  ('good', 'QUALITY', 'wifi'), ('wifi', 'ATTRIBUTE', 'hotel')]

The above tree attaches "wifi" to "hotel" and assigns the dependency label ATTRIBUTE. This may not be a correct syntactic dependency – but in this case, it expresses exactly what we need: the user is looking for a hotel with the attribute "wifi" of the quality "good". This query can then be processed by your application and used to trigger the respective action – e.g. search the database for hotels with high ratings for their wifi offerings.

The following example shows a full implementation of a training loop for a custom message parser for a common "chat intent": finding local businesses. Our message semantics will have the following types of relations: ROOT, PLACE, QUALITY, ATTRIBUTE, TIME and LOCATION.

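The full script is in the examples/training directory of the spaCy repository. A condensed sketch of the steps below, with one placeholder example using the relations above:

Training the intent parser

import random
import spacy

# heads express semantic relations; '-' marks tokens without a label
TRAIN_DATA = [
    ("find a cafe with great wifi", {
        'heads': [0, 2, 0, 5, 5, 2],
        'deps': ['ROOT', '-', 'PLACE', '-', 'QUALITY', 'ATTRIBUTE']})]

nlp = spacy.blank('en')
parser = nlp.create_pipe('parser')           # a fresh parser for custom semantics
nlp.add_pipe(parser, first=True)
for _, annotations in TRAIN_DATA:
    for dep in annotations.get('deps', []):
        parser.add_label(dep)

optimizer = nlp.begin_training()
for itn in range(15):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations], sgd=optimizer)

doc = nlp(u"find a hotel with good wifi")
print([(t.text, t.dep_, t.head.text) for t in doc if t.dep_ != '-'])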

Step by step guide

  1. Create the training data consisting of words, their heads and their dependency labels in order. A token's head is the index of the token it is attached to. The heads don't need to be syntactically correct – they should express the semantic relations you want the parser to learn. For words that shouldn't receive a label, you can choose an arbitrary placeholder, for example -.
  2. Load the model you want to start with, or create an empty model using spacy.blank with the ID of your language. If you're using a blank model, don't forget to add the custom parser to the pipeline. If you're using an existing model, make sure to remove the old parser from the pipeline, and disable all other pipeline components during training using nlp.disable_pipes. This way, you'll only be training the parser.
  3. Add the dependency labels to the parser using the add_label method.
  4. Shuffle and loop over the examples. For each example, update the model by calling nlp.update, which steps through the words of the input. At each word, it makes a prediction. It then consults the annotations to see whether it was right. If it was wrong, it adjusts its weights so that the correct action will score higher next time.
  5. Save the trained model using nlp.to_disk.
  6. Test the model to make sure the parser works as expected.

Training a text classification model

Adding a text classifier to a spaCy model
New: this feature was introduced in spaCy v2.0.

This example shows how to train a multi-label convolutional neural network text classifier on IMDB movie reviews, using spaCy's new TextCategorizer component. The dataset will be loaded automatically via Thinc's built-in dataset loader. Predictions are available via Doc.cats.

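The full script is in the examples/training directory of the spaCy repository. A condensed sketch of the steps below – the data limit, split and hyper-parameters are assumptions to adjust:

Training the text classifier

import random
import spacy
import thinc.extra.datasets
from spacy.util import minibatch, compounding

nlp = spacy.blank('en')
textcat = nlp.create_pipe('textcat')
nlp.add_pipe(textcat, last=True)
textcat.add_label('POSITIVE')

# load the IMDB data via Thinc's dataset loader and split off a dev set
data, _ = thinc.extra.datasets.imdb()
random.shuffle(data)
texts, labels = zip(*data[:2000])
cats = [{'POSITIVE': bool(label)} for label in labels]
split = int(len(texts) * 0.8)
train_set = list(zip(texts[:split], [{'cats': c} for c in cats[:split]]))
dev_texts, dev_cats = texts[split:], cats[split:]   # held back for evaluation

optimizer = nlp.begin_training()
for i in range(10):
    random.shuffle(train_set)
    losses = {}
    for batch in minibatch(train_set, size=compounding(4.0, 32.0, 1.001)):
        batch_texts, annotations = zip(*batch)
        nlp.update(batch_texts, annotations, drop=0.2, sgd=optimizer, losses=losses)
    print(losses)                            # evaluate on dev_texts/dev_cats here
nlp.to_disk('/model')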

Step by step guide

  1. Load the model you want to start with, or create an empty model using spacy.blank with the ID of your language. If you're using an existing model, make sure to disable all other pipeline components during training using nlp.disable_pipes. This way, you'll only be training the text classifier.
  2. Add the text classifier to the pipeline, and add the labels you want to train – for example, POSITIVE.
  3. Load and pre-process the dataset, shuffle the data and split off a part of it to hold back for evaluation. This way, you'll be able to see results on each training iteration.
  4. Loop over the training examples and partition them into batches using spaCy's minibatch and compounding helpers.
  5. Update the model by calling nlp.update, which steps through the examples and makes a prediction. It then consults the annotations to see whether it was right. If it was wrong, it adjusts its weights so that the correct prediction will score higher next time.
  6. Optionally, you can also evaluate the text classifier on each iteration, by checking how it performs on the development data held back from the dataset. This lets you print the precision, recall and F-score.
  7. Save the trained model using nlp.to_disk.
  8. Test the model to make sure the text classifier works as expected.

Optimization tips and advice

There are lots of conflicting "recipes" for training deep neural networks at the moment. The cutting-edge models take a very long time to train, so most researchers can't run enough experiments to figure out what's really going on. For what it's worth, here's a recipe that seems to work well on a lot of NLP problems:

  1. Initialise with batch size 1, and compound to a maximum determined by your data size and problem type.
  2. Use the Adam solver with a fixed learning rate.
  3. Use averaged parameters.
  4. Use L2 regularization.
  5. Clip gradients by their L2 norm to a maximum of 1.
  6. On small data sizes, start at a high dropout rate, with linear decay.

This recipe has been cobbled together experimentally. Here's why the various elements of the recipe made enough sense to try initially, and what you might try changing, depending on your problem.

Compounding batch size

The trick of increasing the batch size is starting to become quite popular (see Smith et al., 2017). Their recipe is quite different from how spaCy's models are being trained, but there are some similarities. In training the various spaCy models, we haven't found much advantage from decaying the learning rate – but starting with a low batch size has definitely helped. You should try it out on your data, and see how you go. Here's our current strategy:

Batch heuristic

from spacy.util import minibatch, compounding

def get_batches(train_data, model_type):
    max_batch_sizes = {'tagger': 32, 'parser': 16, 'ner': 16, 'textcat': 64}
    max_batch_size = max_batch_sizes[model_type]
    # halve the maximum batch size for smaller datasets
    if len(train_data) < 1000:
        max_batch_size /= 2
    if len(train_data) < 500:
        max_batch_size /= 2
    batch_size = compounding(1, max_batch_size, 1.001)
    batches = minibatch(train_data, size=batch_size)
    return batches

This will set the batch size to start at 1, and increase each batch until it reaches a maximum size. The tagger, parser and entity recognizer all take whole sentences as input, so they're learning a lot of labels in a single example. You therefore need smaller batches for them. The batch size for the text categorizer should be somewhat larger, especially if your documents are long.

Learning rate, regularization and gradient clipping

By default spaCy uses the Adam solver, with default settings (learning rate 0.001, beta1=0.9, beta2=0.999). Some researchers have said they found these settings terrible on their problems – but they've always performed very well in training spaCy's models, in combination with the rest of our recipe. You can change these settings directly, by modifying the corresponding attributes on the optimizer object. You can also set environment variables to adjust the defaults.

There are two other key hyper-parameters of the solver: L2 regularization, and gradient clipping (max_grad_norm). Gradient clipping is a hack that's not discussed often, but everybody seems to be using it. It's quite important in helping to ensure the network doesn't diverge, which is a fancy way of saying "fall over during training". The effect is sort of similar to setting the learning rate low. It can also compensate for a large batch size (this is a good example of how the choices of all these hyper-parameters intersect).
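
For example, here's a sketch of adjusting the solver directly – the attribute names follow Thinc's Adam optimizer and should be treated as assumptions to verify against your installed version:

optimizer = nlp.begin_training()
optimizer.alpha = 0.001                      # learning rate
optimizer.b1 = 0.9                           # Adam beta1
optimizer.b2 = 0.999                         # Adam beta2
optimizer.L2 = 1e-6                          # L2 regularization
optimizer.max_grad_norm = 1.0                # clip gradients by L2 norm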

Dropout rate

For small datasets, it's useful to set a high dropout rate at first, and decay it down towards a more reasonable value. This helps avoid the network immediately overfitting, while still encouraging it to learn some of the more interesting things in your data. spaCy comes with a decaying utility function to facilitate this. You might try setting:

from spacy.util import decaying
dropout = decaying(0.6, 0.2, 1e-4)

You can then draw values from the iterator with next(dropout), which you would pass to the drop keyword argument of nlp.update. It's pretty much always a good idea to use at least some dropout. All of the models currently use Bernoulli dropout, for no particularly principled reason – we just haven't experimented with another scheme like Gaussian dropout yet.

Parameter averaging

The last part of our optimisation recipe is parameter averaging, an old trick introduced by Freund and Schapire (1999), popularised in the NLP community by Collins (2002), and explained in more detail by Leon Bottou. Just about the only other people who seem to be using this for neural network training are the SyntaxNet team (one of whom is Michael Collins) – but it really seems to work great on every problem.

The trick is to store the moving average of the weights during training. We don't optimise this average – we just track it. Then when we want to actually use the model, we use the averages, not the most recent value. In spaCy (and Thinc) this is done by using a context manager, use_params, to temporarily replace the weights:

with nlp.use_params(optimizer.averages):
    nlp.to_disk('/model')

The context manager is handy because you naturally want to evaluate and save the model at various points during training (e.g. after each epoch). After evaluating and saving, the context manager will exit and the weights will be restored, so you resume training from the most recent value, rather than the average. By evaluating the model after each epoch, you can remove one hyper-parameter from consideration (the number of epochs). Having one less magic number to guess is extremely nice – so having the averaging under a context manager is very convenient.

Transfer learning

Finally, if you're training from a small data set, it's very useful to start off with some knowledge already in the model. Word vectors are an easy and reliable way to do that, but depending on the application, you may also be able to start with useful knowledge from one of spaCy's pre-trained models, such as the parser, entity recogniser and tagger. If you're adapting a pre-trained model and you want it to retain accuracy on the tasks it was originally trained for, you should consider the "catastrophic forgetting" problem. See this blog post to read more about the problem and our suggested solution, pseudo-rehearsal.

Saving and loading models

After training your model, you'll usually want to save its state, and load it back later. You can do this with the Language.to_disk() method:

nlp.to_disk('/home/me/data/en_example_model')

The directory will be created if it doesn't exist, and the whole pipeline will be written out. To make the model more convenient to deploy, we recommend wrapping it as a Python package.

Generating a model package

spaCy comes with a handy CLI command that will create all required files, and walk you through generating the meta data. You can also create the meta.json manually and place it in the model data directory, or supply a path to it using the --meta flag. For more info on this, see the package docs.

python -m spacy package /home/me/data/en_example_model /home/me/my_models

This command will create a model package directory that should look like this:

Directory structure

└── /
    ├── MANIFEST.in                     # to include meta.json
    ├── meta.json                       # model meta data
    ├── setup.py                        # setup file for pip installation
    └── en_example_model                # model directory
        ├── __init__.py                 # init for pip installation
        └── en_example_model-1.0.0      # model data

You can also find templates for all files on GitHub. If you're creating the package manually, keep in mind that the directories need to be named according to the naming conventions of lang_name and lang_name-version.

Customising the model setup

The meta.json includes the model details, like name, requirements and license, and lets you customise how the model should be initialised and loaded. You can define the language data to be loaded and the processing pipeline to execute.

Setting | Type | Description
lang | unicode | ID of the language class to initialise.
pipeline | list | A list of strings mapping to the IDs of pipeline factories to apply in that order. If not set, spaCy's default pipeline will be used.

The load() method that comes with our model package templates will take care of putting all this together and returning a Language object with the loaded pipeline and data. If your model requires custom pipeline components or a custom language class, you can also ship the code with your model. For examples of this, check out the implementations of spaCy's load_model_from_init_py and load_model_from_path utility functions.

Building the model package

To build the package, run the following command from within the directory. For more information on building Python packages, see the docs on Python's Setuptools.

python setup.py sdist

This will create a .tar.gz archive in a dist/ subdirectory. The model can be installed by pointing pip to the path of the archive:

pip install /path/to/en_example_model-1.0.0.tar.gz

You can then load the model via its name, en_example_model, or import it directly as a module and then call its load() method.
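
For example, assuming the package built above is installed:

import en_example_model
nlp = en_example_model.load()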

Loading a custom model package

To load a model from a data directory, you can use spacy.load() with the local path. This will look for a meta.json in the directory and use the lang and pipeline settings to initialise a Language class with a processing pipeline and load in the model data.

nlp = spacy.load('/path/to/model')

If you want to load only the binary data, you'll have to create a Language class and call from_disk instead.

nlp = spacy.blank('en').from_disk('/path/to/data')

Example: How we're training and packaging models for spaCy

Publishing a new version of spaCy often means re-training all available models – currently, that's 13 models for 8 languages. To make this run smoothly, we're using an automated build process and a spacy train template that looks like this:

python -m spacy train {lang} {models_dir}/{name} {train_data} {dev_data} -m meta/{name}.json -V {version} -g {gpu_id} -n {n_epoch} -ns {n_sents}

In a directory meta, we keep meta.json templates for the individual models, containing all relevant information that doesn't change across versions, like the name, description, author info and training data sources. When we train the model, we pass in the path to the meta template as the --meta argument, and specify the current model version as the --version argument.

On each epoch, the model is saved out with a meta.json using our template and added properties, like the pipeline, accuracy scores and the spacy_version used to train the model. After training completion, the best model is selected automatically and packaged using the package command. Since a full meta file is already present on the trained model, no further setup is required to build a valid model package.

python -m spacy package -f {best_model} dist/
cd dist/{model_name}
python setup.py sdist

This process allows us to quickly trigger the model training and build process for all available models and languages, and generate the correct meta data automatically.