REGENIE integration

This page demonstrates variant-level genetic association testing using REGENIE with both additive and non-additive encodings produced by arcade.

The full example is in examples/regenie-variant-based/.

Note: With arcade's non-additive encodings, REGENIE currently supports variant-level testing only. For set-based (gene-level) burden testing, see the SAIGE integration.

Overview

REGENIE uses a two-step approach:

  1. Step 1 (Null Model): Fits a whole-genome regression model using common variants to capture population structure and polygenic effects
  2. Step 2 (Variant Testing): Tests individual variants for association, using predictions from Step 1 to control for confounding

This example tests two genetic encodings, additive and non-additive, against the same phenotype.

Quick start

cd examples/regenie-variant-based
./run_all.sh
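
Before running the scripts, it may be worth confirming that the command-line tools used on this page are on your PATH. The following is a minimal sketch: missing_tools is a hypothetical helper (not part of arcade), and the tool list is simply the set of commands that appear in the snippets on this page.

```shell
# Print each named tool that cannot be found on PATH, one per line.
# Hypothetical helper for illustration; not shipped with arcade.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Commands used in the snippets on this page.
missing=$(missing_tools recode bgzip plink2 regenie)
if [ -n "$missing" ]; then
    # Word splitting on $missing is intentional: it joins the names on one line.
    echo "Missing tools:" $missing >&2
fi
```

The check only reports what is missing; it does not stop the shell, so it can be pasted into an interactive session safely.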

Or step by step:

./00_prepare_phenotypes.sh     # Format phenotypes for REGENIE
./01_regenie_step1.sh          # Step 1: fit null model
./01b_convert_vcf_to_bgen.sh   # Convert VCFs to BGEN format
./02_regenie_step2.sh          # Step 2: test both encodings
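
REGENIE reads phenotypes from a whitespace-delimited file whose header starts with FID and IID, with missing values written as NA. As a rough illustration of that layout (this is not the content of 00_prepare_phenotypes.sh), the sketch below converts a hypothetical two-column TSV of sample ID and value. The function name, the input layout, and the choice of FID equal to IID are all assumptions; use whatever IDs your genotype files actually carry.

```shell
# Convert a hypothetical TSV (header line, then sample_id<TAB>value)
# into a whitespace-delimited file with a "FID IID <name>" header.
to_regenie_pheno() {
    # $1 = input TSV, $2 = phenotype name to write in the header
    awk -F'\t' -v name="$2" '
        NR == 1 { print "FID", "IID", name; next }   # replace the header
        $1 != "" {
            value = ($2 == "" ? "NA" : $2)           # empty cell -> NA
            print $1, $1, value                      # assumption: FID == IID
        }
    ' "$1"
}
```

For example, `to_regenie_pheno phenotypes.tsv PHENO > phenotypes.regenie.txt`, where both file names are placeholders.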

Requirements

Step 1: Encode genotypes with recode

Create the non-additive VCF from a standard additive VCF:

recode \
    --input simulated.vcf.gz \
    --mode nonadditive \
    --scale-per-variant \
    --min-hom-count 5 \
    --set-variant-id \
    --all-info \
    | bgzip > simulated.nonadditive.vcf.gz

Note the use of --scale-per-variant for variant-level analysis, which scales each variant's non-additive dosage independently.

Step 2: Convert to BGEN

REGENIE Step 2 requires BGEN format. Convert both encodings:

Additive — standard conversion from VCF GT field:

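# --out sets the prefix so the output matches the file names read when testing variants
plink2 --vcf simulated.vcf.gz \
    --export bgen-1.3 'bits=16' ref-first \
    --out simulated.additive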

Non-additive — uses DS field from the recode-transformed VCF:

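# --out sets the prefix so the output matches the file names read when testing variants
plink2 --vcf simulated.nonadditive.vcf.gz dosage=DS \
    --import-dosage-certainty 1 \
    --hard-call-threshold 0 \
    --export bgen-1.3 'bits=16' ref-first \
    --out simulated.dominance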
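
Because the non-additive conversion reads dosages from the DS field, it can help to confirm that the recoded VCF header declares that FORMAT field before running plink2. A minimal sketch follows; vcf_has_format is a hypothetical helper, not part of arcade or plink2, and it relies on bgzip output being readable by gzip.

```shell
# Succeed if the gzip/bgzip-compressed VCF header declares the given
# FORMAT field (for example DS). Hypothetical helper for illustration.
vcf_has_format() {
    # $1 = path to .vcf.gz, $2 = FORMAT ID to look for
    gzip -dc "$1" | awk -v id="$2" '
        /^##FORMAT=<ID=/ {
            if (index($0, "##FORMAT=<ID=" id ",") == 1) found = 1
        }
        !/^#/ { exit }          # stop at the first data record
        END { exit !found }     # status 0 only if the field was declared
    '
}
```

Typical use would be `vcf_has_format simulated.nonadditive.vcf.gz DS || echo "no DS field" >&2`.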

Step 3: Fit null model

regenie \
    --step 1 \
    --bed simulated \
    --bsize 100 \
    --qt \
    --apply-rint \
    --threads 4
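
REGENIE Step 1 writes a prediction list (named after the --out prefix, with the suffix _pred.list) that Step 2 reads through --pred. Each line pairs a phenotype name with the path to its LOCO prediction file. The sketch below checks that every listed file exists before Step 2 is launched; check_pred_list is a hypothetical helper, and the two-column layout is the only thing it assumes about the list.

```shell
# Verify that every LOCO file named in a Step 1 prediction list exists.
# Assumes each line is "<phenotype> <path>". Hypothetical helper.
check_pred_list() {
    # $1 = path to the prediction list
    local status=0 pheno path
    [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
    while read -r pheno path; do
        [ -n "$pheno" ] || continue          # skip blank lines
        if [ ! -f "$path" ]; then
            echo "missing LOCO file for $pheno: $path" >&2
            status=1
        fi
    done < "$1"
    return "$status"
}
```

For example, `check_pred_list regenie_step1_pred.list && echo "Step 1 output looks complete"`.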

Step 4: Test variants

Both encodings are tested against the same null model:

for encoding in additive dominance; do
    regenie \
        --step 2 \
        --bgen simulated.${encoding}.bgen \
        --sample simulated.${encoding}.sample \
        --ref-first \
        --pred regenie_step1_pred.list \
        --bsize 400 \
        --minMAC 0.5 \
        --qt \
        --apply-rint
done
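
Once Step 2 finishes, each results file can be screened for strong signals. The sketch below keeps rows whose LOG10P exceeds a threshold, locating the column by its header name so that column order does not matter. filter_hits is a hypothetical helper, and the results file name depends on the --out prefix and phenotype name used in your run, neither of which is shown in the abbreviated command above.

```shell
# Print the header plus every row whose LOG10P exceeds a threshold.
# Hypothetical helper; expects a whitespace-delimited file with a header.
filter_hits() {
    # $1 = Step 2 results file, $2 = minimum LOG10P (7.3 is roughly p < 5e-8)
    awk -v min="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "LOG10P") col = i
            if (!col) { print "no LOG10P column" > "/dev/stderr"; exit 1 }
            print; next
        }
        $col + 0 > min + 0      # numeric comparison; non-numeric values count as 0
    ' "$1"
}
```

Running the same filter on the additive and non-additive results gives a quick side-by-side view of which variants each encoding picks up.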

Adapting for real data

  1. Replace input files with your own PLINK (.bed/.bim/.fam) and VCF (.vcf.gz) files
  2. Update covariate lists in the scripts
  3. Adjust REGENIE parameters:
    • --bsize: Larger values use more memory but may be faster
    • --minMAC: Minimum minor allele count; variants below this threshold are skipped, which filters out very rare variants
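
When choosing a --minMAC value for your own data, it can be useful to look at the minor allele counts in the additive VCF first. The sketch below counts alleles from hard-call GT fields; vcf_mac is a hypothetical helper that assumes biallelic sites with GT as the first FORMAT entry. REGENIE computes its own counts from the BGEN input, so treat these numbers as a guide only; they say nothing about the scaled DS values in the non-additive file.

```shell
# Read an uncompressed VCF on stdin and print "ID MAC" for each variant.
# Assumes biallelic sites and GT as the first FORMAT entry; missing
# alleles (".") are ignored. Hypothetical helper for illustration.
vcf_mac() {
    awk -F'\t' '
        /^#/ { next }                               # skip header lines
        {
            alt = 0; called = 0
            for (i = 10; i <= NF; i++) {            # sample columns start at 10
                split($i, fmt, ":")
                n = split(fmt[1], a, /[\/|]/)       # phased or unphased GT
                for (j = 1; j <= n; j++) {
                    if (a[j] == ".") continue
                    called++
                    if (a[j] != "0") alt++
                }
            }
            mac = (alt < called - alt) ? alt : called - alt
            print $3, mac
        }
    '
}
```

For example, `gzip -dc simulated.vcf.gz | vcf_mac | sort -k2,2n | head` lists the rarest variants first.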

References