spaCy v3.6 adds the new
SpanFinder component to the core
spaCy library and new trained pipelines for Slovenian.
The SpanFinder component identifies potentially
overlapping, unlabeled spans by identifying span start and end tokens. It is
intended for use in combination with a component like
SpanCategorizer that may further filter or label the
spans. See our
Spancat blog post for a more
detailed introduction to the span finder.
To train a pipeline with span_finder and spancat, remember to add
span_finder (and its tok2vec or transformer if required) to
[training.annotating_components] so that the spancat component can be
trained directly from its predictions.
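The setting described above can be sketched as follows. This is a minimal illustration, assuming a pipeline made up of tok2vec, span_finder and spancat. The config fragment is embedded as a Python string and read with the standard library rather than with spaCy's own config loader, and check_annotating_components is a hypothetical helper, not part of spaCy.

```python
import json
from configparser import ConfigParser
from typing import List

# Fragment of a training config. spaCy configs use an INI-like syntax
# with JSON-formatted values. The pipeline layout here is an assumption
# made for illustration.
CONFIG_FRAGMENT = """
[nlp]
pipeline = ["tok2vec", "span_finder", "spancat"]

[training]
annotating_components = ["tok2vec", "span_finder"]
"""


def check_annotating_components(config_text: str) -> List[str]:
    """Return a list of problems found in the annotating_components setting."""
    parser = ConfigParser()
    parser.read_string(config_text)
    pipeline = json.loads(parser["nlp"]["pipeline"])
    annotating = json.loads(parser["training"]["annotating_components"])

    problems = []
    # Every annotating component has to be part of the pipeline.
    for name in annotating:
        if name not in pipeline:
            problems.append(f"{name!r} is not in the pipeline")
    # spancat can only train on span_finder predictions if span_finder annotates.
    if "spancat" in pipeline and "span_finder" not in annotating:
        problems.append("'span_finder' is missing from annotating_components")
    return problems


print(check_annotating_components(CONFIG_FRAGMENT))
```

An empty list means that every annotating component is part of the pipeline and that span_finder is set to annotate docs for spancat during training.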
In practice it can be helpful to initially train the span_finder separately
before sourcing it (along with its tok2vec) into the
spancat pipeline for further training. Otherwise the
memory usage can spike for spancat in the first few training steps if the
span_finder makes a large number of predictions.
- Language updates:
  - Add initial support for Malay.
  - Update Latin defaults to support noun chunks, update lexical/tokenizer settings and add example sentences.
- spacy debug data CLI.
- spacy evaluate CLI displaCy output.
- Support custom token/lexeme attribute for vectors.
- Add option to return scores separately keyed by component name with spacy evaluate --per-component and Scorer.score(per_component=True). This is useful when the pipeline contains more than one component of the same type, like textcat, that may have overlapping score keys.
- Typing updates for
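To illustrate why keying scores by component name helps, here is a plain-Python sketch of the key collision mentioned in the list above. It does not call spaCy and is not the scorer's implementation: the component names, the score values and both helper functions are hypothetical, and the key names are only modeled on typical text classification score keys.

```python
from typing import Dict

# Hypothetical score dicts from two text classifiers in one pipeline.
# Names and values are made up for illustration only.
component_scores: Dict[str, Dict[str, float]] = {
    "textcat_topic": {"cats_score": 0.91, "cats_macro_f": 0.88},
    "textcat_sentiment": {"cats_score": 0.74, "cats_macro_f": 0.70},
}


def merge_flat(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Merge all score dicts into one; later components overwrite shared keys."""
    merged: Dict[str, float] = {}
    for component_result in scores.values():
        merged.update(component_result)
    return merged


def merge_per_component(
    scores: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Keep one score dict per component name so that no key is overwritten."""
    return {name: dict(result) for name, result in scores.items()}


flat = merge_flat(component_scores)
nested = merge_per_component(component_scores)

# The flat result only keeps the scores of the last component.
print(flat["cats_score"])
# The per-component result keeps both sets of scores.
print(nested["textcat_topic"]["cats_score"], nested["textcat_sentiment"]["cats_score"])
```

In the flat result the scores of the first classifier are lost, while the per-component layout keeps both.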
The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize “get” as a passive auxiliary.
The Danish pipeline
da_core_news_trf has been updated, with
performance improvements across the board.
When initializing a
SpanGroup, there is a new check to verify that all added
spans refer to the current doc. Without this check, it was possible to run into
string store or other errors.
One place this may crop up is when creating
Example objects for training with
When you’re loading a pipeline package trained with an earlier version of spaCy v3, you will see a warning telling you that the pipeline may be incompatible. This doesn’t necessarily have to be true, but we recommend running your pipelines against your test suite or evaluation data to make sure there are no unexpected results.
If you’re using one of the trained pipelines we provide, you should run
spacy download to update to the latest version. To
see an overview of all installed packages and their compatibility, you can run
If you’ve trained your own custom pipeline and you’ve confirmed that it’s still
working as expected, you can update the spaCy version requirements in the
Updating v3.5 configs
To update a config from spaCy v3.5 with the new v3.6 settings, run