Frequently asked questions

General

When should I use the gene-level pipeline vs. the variant-level pipeline?

Use the gene-level pipeline (interpret_phase + make_pseudo_vcf) when you want to:

Use the variant-level pipeline (recode) when you want to:


What is the difference between dominance and recessive mode?


Can I use arcade with unphased data?

Yes. Use interpret_phase --unphased for the gene-level pipeline. In unphased mode, the tool performs a het/hom burden collapse and cannot distinguish compound heterozygotes (variants in trans) from pairs of variants in cis. The variant-level pipeline (recode) works with both phased and unphased VCFs.


What is the difference between --scale-per-variant, --scale-globally, and --scale-by-group?

These options control how the non-additive dosages are scaled in recode:


Input / Output

What VCF fields does arcade use?


Why does the non-additive VCF have fewer variants than the additive VCF?

The recode tool applies filters (--min-hom-count, --min-het-count, --max-maf) when writing the non-additive VCF, so variants with insufficient genotype counts are dropped. Non-additive encoding requires both heterozygous and homozygous individuals at a variant, so any variant failing --min-het-count (default: 1) or --min-hom-count (default: 1) is excluded.
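To get a rough idea of which variants might fall below these thresholds, one can tally heterozygous and homozygous-alternate carriers per variant. The sketch below is an approximation only, not how recode counts genotypes internally. It assumes biallelic, diploid calls in a space-separated "SAMPLE VARIANT GT" listing (the layout described under "What format does interpret_phase expect for input genotypes?"). The function name count_het_hom is hypothetical and not part of arcade.

```shell
# Hypothetical helper (not part of arcade): per-variant het / hom-alt tallies.
# stdin:  "SAMPLE VARIANT GT" lines, e.g. "S1 chr1:100:A:G 0|1"
# stdout: "VARIANT HET_COUNT HOM_COUNT", sorted by variant ID
count_het_hom() {
    awk '
        {
            # Split the genotype on "|" (phased) or "/" (unphased).
            n = split($3, a, "[|/]")
            if (n != 2) next                  # skip non-diploid calls
            seen[$2] = 1
            if (a[1] == "1" && a[2] == "1")
                hom[$2]++
            else if ((a[1] == "1" && a[2] == "0") || (a[1] == "0" && a[2] == "1"))
                het[$2]++
        }
        END {
            for (v in seen) print v, het[v] + 0, hom[v] + 0
        }
    ' | sort
}
```

For example, gzip -dc genotypes.txt.gz | count_het_hom lists each variant with its two counts; a variant with a zero in either column would be a candidate for exclusion under the default thresholds.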


What format does interpret_phase expect for input genotypes?

The genotype file should be produced by bcftools query:

bcftools query -i'GT="alt"' \
    -f'[%SAMPLE %CHROM:%POS:%REF:%ALT %GT\n]' \
    input.vcf.gz | gzip > genotypes.txt.gz

This produces space-separated lines: SAMPLE_ID CHROM:POS:REF:ALT GENOTYPE
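A quick sanity check before passing the file to interpret_phase can catch malformed lines early. The sketch below assumes the three-column layout just described, a four-part CHROM:POS:REF:ALT identifier, and diploid genotypes; haploid calls or alleles containing ":" would be reported as malformed. The function name check_genotype_lines is hypothetical and not part of arcade.

```shell
# Hypothetical helper (not part of arcade): print the number of lines that do
# not look like "SAMPLE_ID CHROM:POS:REF:ALT GENOTYPE".
check_genotype_lines() {
    # $1: gzip-compressed genotype file
    gzip -dc "$1" | awk '
        # A well-formed line has exactly 3 fields, a 4-part variant ID, and a
        # diploid genotype separated by "|" (phased) or "/" (unphased).
        NF != 3 || split($2, id, ":") != 4 || $3 !~ "^[0-9.]+[|/][0-9.]+$" { bad++ }
        END { print bad + 0 }
    '
}
```

Running check_genotype_lines genotypes.txt.gz should print 0 for a file produced by the bcftools command above.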


GWAS integration

How do I run a joint additive + non-additive test?

Produce both additive and non-additive VCFs, then include both as predictors in your association model. For example:

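The orthogonalized non-additive encoding is designed so that the additive and non-additive components are orthogonal (uncorrelated), which lets their effects be estimated jointly without one absorbing the other.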
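One way to write such a joint model is given here as a sketch, not as the exact model fitted by any particular association tool. All symbols are introduced for illustration: y_i is the phenotype of individual i, a_i and d_i are that individual's additive and non-additive dosages taken from the two VCFs, and c_i is a covariate vector.

```latex
y_i = \beta_0 + \beta_a a_i + \beta_d d_i + \gamma^\top c_i + \varepsilon_i,
\qquad
\sum_{i=1}^{n} (a_i - \bar{a})(d_i - \bar{d}) = 0
```

The second condition expresses orthogonality under the assumption that it means zero sample covariance between the two dosages. When it holds, and in a regression without further covariates, adding d_i to the model leaves the least-squares estimate of \beta_a unchanged.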


Should I use --scale-per-variant or --scale-globally for REGENIE?

For variant-level testing with REGENIE, use --scale-per-variant. This scales each variant's non-additive dosage independently, which is appropriate when each variant is tested separately.

For set-based analysis, use --scale-globally so that dosages are comparable across variants within a set.


Docker

How do I run arcade via Docker?

docker pull fhlassen/arcade:latest

# Run any tool
docker run -v "$PWD":/data fhlassen/arcade \
    interpret_phase --help

# Process files in current directory
docker run -v "$PWD":/data fhlassen/arcade \
    recode \
        --input /data/variants.vcf.gz \
        --mode dominance \
        --scale-per-variant \
    | bgzip > dominance.vcf.gz
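
If you run the container often, a small shell function can save retyping the mount options. This is a convenience sketch only: the function name arcade is hypothetical and nothing like it ships with the image. It adds the standard docker flags --rm (remove the container on exit) and -i (forward stdin, so the tools can sit in a pipeline).

```shell
# Hypothetical wrapper (not shipped with arcade): run any arcade tool in the
# container with the current directory mounted at /data.
arcade() {
    docker run --rm -i -v "$PWD":/data fhlassen/arcade "$@"
}

# Example usage (commented out so that sourcing this file has no side effects):
# arcade recode --input /data/variants.vcf.gz --mode dominance --scale-per-variant \
#     | bgzip > dominance.vcf.gz
```

Paths passed to the tools still need the /data prefix, because they are resolved inside the container.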

I get "bcftools not found" when running the examples.

The gene-level pipeline requires bcftools for the genotype extraction step. Install it via your package manager:

# macOS
brew install bcftools

# Ubuntu/Debian
sudo apt-get install bcftools
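
To confirm that the installation worked, or to fail early in a script, you can check whether a tool is on your PATH before starting the pipeline. A minimal sketch; the function name require_tool is hypothetical and not part of arcade.

```shell
# Hypothetical helper (not part of arcade): succeed if the named tool is on
# PATH, otherwise print a hint to stderr and return a non-zero status.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH; install it before running the pipeline" >&2
    return 1
}
```

For example, require_tool bcftools && bcftools --version prints the installed version only when bcftools is available.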