Output
All results are written to --outdir (default: results). The directory structure is:
results/
├── antismash/
│   ├── <sample>/                  # one directory per input genome
│   │   ├── <sample>.json          # antiSMASH JSON (parsed by tabulation scripts)
│   │   ├── <sample>.gbk           # annotated GenBank output
│   │   └── ...
│   └── ...
├── gecco/
│   ├── <sample>/                  # GECCO output directory per genome
│   │   ├── *.clusters.tsv
│   │   ├── *.features.tsv
│   │   └── *.region*.gbk          # generated when BiG-SCAPE or BiG-SLiCE is enabled
│   └── ...
├── deepbgc/
│   ├── <sample>/                  # deepBGC output directory per genome
│   │   ├── *.bgc.tsv
│   │   ├── *.full.gbk
│   │   └── *.region*.gbk          # converted to antiSMASH-like region files
│   └── ...
├── combgc/
│   ├── <sample>/
│   │   ├── combgc_summary.tsv
│   │   └── combined_regions/      # representative non-redundant region GBKs used for clustering
│   └── ...
├── bigscape/                      # only when --bigscape_run
│   └── output_files/
│       ├── *.network              # GCF network files (one per class + mix)
│       ├── *.tsv                  # cluster annotation tables
│       └── ...
├── bigslice/                      # only when --bigslice_run
│   ├── result/                    # SQLite database and analysis outputs
│   └── ...
├── summary/
│   ├── all_regions.tsv            # per-BGC-region table
│   └── region_counts.tsv          # per-genome BGC count table
└── pipeline_info/
    ├── execution_timeline_<timestamp>.html
    ├── execution_report_<timestamp>.html
    ├── execution_trace_<timestamp>.txt
    └── pipeline_dag_<timestamp>.html
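As a quick sanity check after a run, the layout above can be walked with Python's pathlib. The following is a minimal sketch and not part of clystere: the function name find_antismash_json is hypothetical, and it assumes the default results directory and the <sample>/<sample>.json naming shown in the tree.

```python
from pathlib import Path


def find_antismash_json(outdir="results"):
    """Map each sample name to its antiSMASH JSON path, if present.

    Assumes the documented layout: <outdir>/antismash/<sample>/<sample>.json
    (hypothetical helper, not shipped with the pipeline).
    """
    base = Path(outdir) / "antismash"
    found = {}
    if not base.is_dir():
        return found
    for sample_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        candidate = sample_dir / f"{sample_dir.name}.json"
        if candidate.is_file():
            found[sample_dir.name] = candidate
    return found


# Example usage against a finished run:
# for sample, path in find_antismash_json("results").items():
#     print(sample, path)
```

Samples whose directory exists but lacks the JSON file are simply absent from the returned mapping, which makes incomplete runs easy to spot.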
summary/all_regions.tsv
One row per biosynthetic region across all input genomes. Produced by TABULATE_REGIONS.
| Column | Description |
|---|---|
| file | Input genome identifier (stem of the antiSMASH JSON filename) |
| record_id | Sequence record name (e.g. NCBI accession) |
| region | Region number within the record |
| start | Start coordinate (bp, 0-based) |
| end | End coordinate (bp) |
| contig_edge | True if the region extends to the edge of the contig (potentially truncated) |
| product | BGC product class(es), separated by / for hybrids |
| knownclusterblast_hit | Top KnownClusterBlast hit name (only present when --antismash_cb_knownclusters) |
| knownclusterblast_accession | MIBiG accession of the top hit |
| knownclusterblast_similarity | Similarity category: high (>75 %), medium (>50 %), or low (>15 %) |
| record_desc | Full sequence record description |
!!! note
    The three knownclusterblast_* columns are only included when --antismash_cb_knownclusters is set.
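Because all_regions.tsv is a plain tab-separated table, it can be read with the standard csv module. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that contig_edge is written as the text True or False and that hybrid products are joined with / as described in the table.

```python
import csv


def load_regions(path):
    """Read all_regions.tsv into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def complete_regions(rows):
    """Keep regions that do not touch a contig edge.

    Assumption: contig_edge holds the text "True" or "False".
    """
    return [r for r in rows if r["contig_edge"].strip().lower() != "true"]


def products(row):
    """Split a hybrid product string such as "NRPS/T1PKS" into its classes."""
    return [p.strip() for p in row["product"].split("/") if p.strip()]


# Example usage:
# rows = load_regions("results/summary/all_regions.tsv")
# for row in complete_regions(rows):
#     print(row["file"], row["record_id"], row["region"], products(row))
```

Filtering on contig_edge first is a common choice when only untruncated regions should feed downstream comparisons.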
summary/region_counts.tsv
One row per genome assembly (or per contig when --count_per_contig is set). Produced by COUNT_REGIONS.
| Column | Description |
|---|---|
| record | Genome assembly name (or genome\|contig when --count_per_contig) |
| total_count | Total number of BGC regions regardless of class |
| <bgc_type> | Count of regions of that product type (one column per distinct type observed across all genomes) |
| hybrid | Count of multi-product regions (absent when --split_hybrids) |
| description | Sequence description; includes [N total records] suffix for multi-contig assemblies |
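Since the set of <bgc_type> columns varies between runs, a reader has to discover them from the header rather than hard-code them. The following sketch shows one way to do that; the helper names are hypothetical, and it assumes that counts are integers, that empty cells mean zero, and that every column other than record, total_count, and description is a count column (including hybrid when present).

```python
import csv

# Columns that are always present and are not per-type counts.
FIXED_COLUMNS = {"record", "total_count", "description"}


def type_totals(path):
    """Sum each per-type count column across all rows of region_counts.tsv."""
    totals = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for column, value in row.items():
                # Skip fixed columns and any surplus unnamed fields.
                if column is None or column in FIXED_COLUMNS:
                    continue
                totals[column] = totals.get(column, 0) + int(value or 0)
    return totals


def split_record(record):
    """Split "genome|contig" into (genome, contig); contig is None otherwise."""
    genome, sep, contig = record.partition("|")
    return genome, (contig if sep else None)


# Example usage:
# print(type_totals("results/summary/region_counts.tsv"))
```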
antismash/<sample>/
Raw antiSMASH output for each genome. The key file is <sample>.json, which is read by the tabulation scripts. Other
files (HTML, GenBank, SVG plots) are present depending on the antiSMASH flags used.
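For readers who want to inspect <sample>.json directly, the sketch below lists the regions it contains. It is not the code used by the tabulation scripts. It assumes the JSON layout written by recent antiSMASH versions, in which each entry of the top-level records list carries an areas list with start, end, and products keys; check this against the antiSMASH version in use. The function name iter_regions is hypothetical.

```python
import json


def iter_regions(json_path):
    """Yield (record_id, region_number, start, end, products) per region.

    Assumption: records[].areas[] holds "start", "end" and "products",
    and regions are numbered from 1 in the order they are listed.
    """
    with open(json_path) as handle:
        data = json.load(handle)
    for record in data.get("records", []):
        for number, area in enumerate(record.get("areas", []), start=1):
            yield (
                record.get("id"),
                number,
                area.get("start"),
                area.get("end"),
                area.get("products", []),
            )


# Example usage:
# for region in iter_regions("results/antismash/sampleA/sampleA.json"):
#     print(region)
```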
gecco/<sample>/
Raw GECCO output for each genome. When --bigscape_run or --bigslice_run is enabled, clystere also runs
gecco convert gbk --format bigslice and publishes GECCO region GenBank files compatible with clustering tools.
deepbgc/<sample>/
Raw deepBGC output for each genome (*.bgc.tsv and *.full.gbk). clystere also generates antiSMASH-like
*.regionNNN.gbk files using BiG-SLiCE's conversion script to ensure compatibility with BiG-SCAPE/BiG-SLiCE.
combgc/<sample>/
Unified per-sample BGC selection generated from antiSMASH, GECCO, and deepBGC predictions. The combined_regions/
directory contains representative region GenBank files used as clustering input.
bigscape/output_files/
Standard BiG-SCAPE output. The .network files are tab-separated edge lists suitable for import into Cytoscape or
the Python networkx library. One network is generated per BGC class, plus a mix network when --bigscape_mix is enabled (the default).
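A .network file can also be explored without extra dependencies. The sketch below reads an edge list and groups BGCs into connected components using only the standard library. It assumes the file has a header row and that the first two columns name the two BGCs of each edge; the remaining columns are ignored. Note that connected components of the network are not necessarily identical to the GCFs that BiG-SCAPE itself reports. The function names are hypothetical.

```python
import csv
from collections import defaultdict


def read_edges(path):
    """Read (node_a, node_b) pairs from a tab-separated edge list.

    Assumption: one header row, then the two BGC names in columns 1 and 2.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        return [(row[0], row[1]) for row in reader if len(row) >= 2]


def connected_components(edges):
    """Group nodes into connected components with an iterative depth-first search."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Example usage (the file name depends on the BiG-SCAPE run):
# comps = connected_components(read_edges("path/to/file.network"))
```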
bigslice/
Standard BiG-SLiCE output generated from the unified combgc/*/combined_regions/ folders. The directory includes the
processed analysis results and SQLite-backed data used for downstream inspection.
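The SQLite data can be inspected with Python's built-in sqlite3 module. Since this page does not state the database file name or its schema, the sketch below only lists the tables found in a database file whose path you supply; list_tables is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names defined in an SQLite database file."""
    # Open read-only via a file URI so a mistyped path raises an error
    # instead of silently creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Example usage (replace with the actual database path under results/bigslice/):
# print(list_tables("path/to/database.db"))
```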
pipeline_info/
Nextflow execution metadata generated per run:
| File | Description |
|---|---|
| execution_timeline_*.html | Gantt chart of process execution times |
| execution_report_*.html | Resource usage report (CPU, memory, I/O per process) |
| execution_trace_*.txt | Raw per-task resource trace (parseable TSV) |
| pipeline_dag_*.html | Directed acyclic graph of the pipeline |
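The trace file is convenient for a quick post-run check. The sketch below counts tasks per status and lists the names of failed tasks. It assumes the trace still contains Nextflow's default name and status columns (a pipeline can be configured with a different field set), and the helper names are hypothetical.

```python
import csv
from collections import Counter


def read_trace(trace_path):
    """Read a Nextflow trace TSV into a list of dicts keyed by column name."""
    with open(trace_path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def status_counts(rows):
    """Count tasks per value of the status column (e.g. COMPLETED, FAILED)."""
    return Counter(row["status"] for row in rows)


def failed_tasks(rows):
    """Return the names of tasks whose status is FAILED."""
    return [row["name"] for row in rows if row["status"] == "FAILED"]


# Example usage (substitute the timestamp of your run):
# rows = read_trace("results/pipeline_info/execution_trace_<timestamp>.txt")
# print(status_counts(rows), failed_tasks(rows))
```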