A pipeline to build Qiime2 taxonomy classifiers for the UNITE database.
If you are interested in Fungi, you could use their genomic fingerprint to identify them. Affordable PCR amplification and sequencing of the ITS region gives you these nucleic acid fingerprints, and the UNITE team provides a database to give these sequences a name.
We can predict the taxonomy of our fungal fingerprints using an old-school machine learning method: a supervised k-mer naive Bayes classifier. But first, we need to prepare our database in a process called “training.”
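To make "training" a little more concrete, here is a minimal sketch of a k-mer naive Bayes classifier in plain Python. It is an illustration only, not the scikit-learn based implementation that Qiime2 runs, and the function names, the toy sequences, the taxon labels, and the choice of k=3 are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """Return every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(labeled_seqs, k=3):
    """Count k-mers per taxon; this counting is the 'training' step."""
    counts = defaultdict(Counter)  # taxon -> k-mer counts
    n_seqs = Counter()             # taxon -> number of reference sequences
    for taxon, seq in labeled_seqs:
        counts[taxon].update(kmers(seq, k))
        n_seqs[taxon] += 1
    vocab = {km for c in counts.values() for km in c}
    return {"counts": counts, "n_seqs": n_seqs, "vocab": vocab, "k": k}


def classify(model, seq, alpha=1.0):
    """Return the taxon with the highest log posterior for a query sequence."""
    total_seqs = sum(model["n_seqs"].values())
    vocab_size = len(model["vocab"])
    best_taxon, best_score = None, -math.inf
    for taxon, kmer_counts in model["counts"].items():
        total = sum(kmer_counts.values())
        # Log prior: fraction of reference sequences that belong to this taxon.
        score = math.log(model["n_seqs"][taxon] / total_seqs)
        for km in kmers(seq, model["k"]):
            # Laplace smoothing so an unseen k-mer does not zero out the score.
            score += math.log(
                (kmer_counts[km] + alpha) / (total + alpha * vocab_size)
            )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy reference "database" (made-up sequences and labels).
reference = [
    ("Fungus_A", "ACGTACGTACGTACGT"),
    ("Fungus_A", "ACGTACGAACGTACGT"),
    ("Fungus_B", "TTGGCCAATTGGCCAA"),
    ("Fungus_B", "TTGGCCTATTGGCCAA"),
]
model = train(reference)
print(classify(model, "ACGTACGTAC"))
print(classify(model, "GGCCAATTGG"))
```

The real classifiers are trained on far more sequences and longer k-mers, which is why training takes hours instead of milliseconds.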
This pipeline trains the UNITE ITS taxonomy database for use with Qiime2. You can run this pipeline yourself, but you don’t have to! I’ve provided ready-to-use pre-trained classifiers so you can simply run qiime feature-classifier classify-sklearn.
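As a rough sketch of what that call can look like, the snippet below assembles the classify-sklearn command in Python so it can be printed, checked, and then run. The three file names are hypothetical placeholders; substitute the pre-trained classifier you downloaded and your own sequence artifact. The helper function is also just an assumption for this example, not part of the pipeline.

```python
import shlex


def build_classify_command(classifier, reads, output):
    """Build the argument list for qiime feature-classifier classify-sklearn."""
    return [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", classifier,  # pre-trained classifier artifact
        "--i-reads", reads,            # your representative sequences
        "--o-classification", output,  # taxonomy assignments are written here
    ]


# Hypothetical file names; replace them with your own artifacts.
cmd = build_classify_command(
    "unite-classifier.qza", "rep-seqs.qza", "taxonomy.qza"
)
print(shlex.join(cmd))

# To actually run it inside an activated Qiime2 environment:
# import subprocess
# subprocess.run(cmd, check=True)
```

You can also paste the printed command straight into a shell where your Qiime2 environment is active.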
If you have questions about using Qiime2, ask on the Qiime2 forums.
If you have questions about the UNITE ITS database, contact the UNITE team.
If you have questions about this pipeline, please open a new issue!
Set up:
This pipeline uses conda environments. (You can use conda with mamba.)
Configure:
Open config/config.yaml and configure it to your liking. (For example, you may need to update the name of your Qiime2 environment.)
Run:
snakemake --cores 8 --use-conda --resources mem_mb=10000
Training one classifier takes 1-9 hours on an AMD EPYC 75F3 Milan, depending on the size and complexity of the data.
Reports:
snakemake --report results/report.html
snakemake --forceall --dag --dryrun | dot -Tpdf > results/dag.pdf