Usage

Samplesheet

Prepare a comma-separated samplesheet describing your input genomes:

Column	Required	Description
`sample`	Yes	Unique sample identifier (used as the output directory name)
`genome`	Yes	Path to a genome file (GenBank `.gbff`/`.gbk`, EMBL `.embl`, or FASTA `.fna`/`.fa`/`.fasta`). Gzip-compressed files are accepted.
`annotation`	No	Path to a GFF3 annotation file. When provided, antiSMASH skips its built-in gene-finding step.

samplesheet.csv

sample,genome,annotation
strain_A,data/strain_A.gbff.gz,
strain_B,data/strain_B.fna,
strain_C,data/strain_C.fna.gz,data/strain_C.gff3
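Before launching a long run, it can help to sanity-check the samplesheet locally. The following is a minimal sketch, not part of clystere: `check_samplesheet` is a hypothetical helper that only applies the rules stated in the table above (required columns, unique `sample` values, recognised genome extensions with an optional `.gz` suffix). It does not check that the files exist, and it is independent of whatever validation the pipeline itself performs.

```python
import csv
import io

# Extensions taken from the samplesheet table above; a trailing ".gz" is stripped first.
GENOME_EXTENSIONS = (".gbff", ".gbk", ".embl", ".fna", ".fa", ".fasta")


def check_samplesheet(text):
    """Return a list of problems found in samplesheet CSV text (empty if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    missing = {"sample", "genome"} - set(reader.fieldnames or [])
    if missing:
        return [f"missing required column(s): {', '.join(sorted(missing))}"]

    seen = set()
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        sample = (row.get("sample") or "").strip()
        genome = (row.get("genome") or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sample")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sample '{sample}'")
        seen.add(sample)
        # Compare the extension with any gzip suffix removed.
        name = genome[:-3] if genome.endswith(".gz") else genome
        if not name.endswith(GENOME_EXTENSIONS):
            problems.append(f"line {line_no}: unrecognised genome extension '{genome}'")
    return problems


EXAMPLE = """sample,genome,annotation
strain_A,data/strain_A.gbff.gz,
strain_B,data/strain_B.fna,
strain_C,data/strain_C.fna.gz,data/strain_C.gff3
"""

if __name__ == "__main__":
    print(check_samplesheet(EXAMPLE) or "samplesheet looks fine")
```

The check is case-sensitive on purpose, since the table lists the extensions in lower case; relax it if your files use upper-case suffixes.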

Basic run

nextflow run exterex/clystere \
    --input samplesheet.csv \
    --outdir results \
    -profile docker

By default, clystere runs antiSMASH, GECCO, and DeepBGC for every sample. To disable an individual predictor:

--gecco_run false
--deepbgc_run false

Profile combinations

Profiles are combined with commas. The first group selects the software environment; the second selects the execution backend.

# Singularity on a local machine
-profile singularity

# Docker on SLURM (elevated resource ceiling)
-profile docker,slurm

# Conda on an HPC without a scheduler integration
-profile conda,hpc

Profile	Type	Description
`docker`	Container	Run all processes in Docker containers
`apptainer/singularity`	Container	Run all processes in Apptainer/Singularity containers
`podman`	Container	Run all processes in Podman containers
`conda`	Environment	Use per-process Conda environments
`hpc`	Resource	Raise default CPU/memory/time ceilings for cluster nodes
`slurm`	Executor	Submit jobs to a SLURM scheduler (combine with `hpc`)
`test`	Test	Preset params for the bundled example data

Enabling optional analyses

BiG-SCAPE

nextflow run exterex/clystere \
    --input samplesheet.csv \
    --outdir results \
    --bigscape_run \
    -profile docker

BiG-SLiCE

nextflow run exterex/clystere \
    --input samplesheet.csv \
    --outdir results \
    --bigslice_run \
    -profile docker

--bigscape_run and --bigslice_run are mutually exclusive.
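If you launch clystere from a script, the exclusivity rule can be enforced before Nextflow starts. This is a minimal sketch under stated assumptions: `build_command` is a hypothetical wrapper, not something the pipeline ships, and it uses only the flags documented on this page.

```python
import shlex


def build_command(samplesheet, outdir, profiles, bigscape=False, bigslice=False):
    """Assemble a clystere command line as a list of arguments."""
    if bigscape and bigslice:
        raise ValueError("--bigscape_run and --bigslice_run are mutually exclusive")
    cmd = ["nextflow", "run", "exterex/clystere",
           "--input", samplesheet, "--outdir", outdir]
    if bigscape:
        cmd.append("--bigscape_run")
    if bigslice:
        cmd.append("--bigslice_run")
    # Profiles are combined with commas, e.g. ["docker", "slurm"] -> "docker,slurm".
    cmd += ["-profile", ",".join(profiles)]
    return cmd


if __name__ == "__main__":
    args = build_command("samplesheet.csv", "results", ["docker", "slurm"], bigscape=True)
    print(shlex.join(args))
```

Returning a list rather than a string means the result can be passed straight to `subprocess.run` without shell quoting concerns.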

When --bigscape_run is enabled, clystere runs bigscape dereplicate by default before clustering to collapse redundant regions found by multiple predictors. You can disable or tune this behavior with:

--bigscape_dereplicate false
--bigscape_dereplicate_cutoff 0.8

The first time the BiG-SLiCE task runs, the pipeline downloads the BiG-SLiCE HMM model bundle into the task work directory; later runs launched with -resume reuse the cached download. Make sure outbound network access is available for this initial download.

KnownClusterBlast annotation

Enabling KnownClusterBlast adds the knownclusterblast_hit, knownclusterblast_accession, and knownclusterblast_similarity columns to all_regions.tsv:

--antismash_cb_knownclusters

Per-contig BGC counts

--count_per_contig        # one row per contig instead of per assembly
--split_hybrids           # count each BGC type in a hybrid region individually
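To make the effect of the two flags concrete, here is an illustrative sketch of the counting logic they describe. It is not clystere's implementation: the `(sample, contig, types)` input tuples, the `count_bgcs` function, and the `+`-joined label used for unsplit hybrid regions are all assumptions made for the example.

```python
from collections import Counter


def count_bgcs(regions, per_contig=False, split_hybrids=False):
    """Count BGC regions per assembly or per contig.

    regions: iterable of (sample, contig, types), where types is a list of
    BGC type names and a hybrid region has more than one entry.
    """
    counts = Counter()
    for sample, contig, types in regions:
        key = (sample, contig) if per_contig else (sample,)
        if split_hybrids:
            labels = types  # each type in a hybrid is counted on its own
        else:
            labels = ["+".join(types)]  # assumed label for an unsplit hybrid
        for label in labels:
            counts[key + (label,)] += 1
    return counts


if __name__ == "__main__":
    regions = [
        ("strain_A", "contig_1", ["NRPS", "T1PKS"]),  # hybrid region
        ("strain_A", "contig_2", ["terpene"]),
        ("strain_A", "contig_2", ["NRPS"]),
    ]
    for key, n in sorted(count_bgcs(regions, split_hybrids=True).items()):
        print(*key, n, sep="\t")
```

With `split_hybrids=True`, the hybrid region above contributes one count to NRPS and one to T1PKS; with `per_contig=True`, the contig name becomes part of each row's key.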

Disabling minimal mode

antiSMASH's minimal mode is enabled by default. To run the full detection suite, disable it and selectively enable modules:

--antismash_minimal false \
--antismash_enable_nrps_pks \
--antismash_enable_terpene \
--antismash_clusterhmmer

Resuming runs

Nextflow caches intermediate results in the work/ directory. Use -resume to skip already completed steps after a pipeline modification or failure:

nextflow run exterex/clystere --input samplesheet.csv --outdir results -profile docker -resume

Reusing existing antiSMASH results

If antiSMASH has already been run, set --antismash_reuse_results to skip antiSMASH re-annotation and go directly to tabulation:

--antismash_reuse_results   # skip antiSMASH; use results already in <outdir>/antismash/

When BiG-SCAPE or BiG-SLiCE is enabled, GECCO and DeepBGC must also run (or their results must already exist in the expected published layout), because clustering uses the unified comBGC-filtered region set.