This document outlines how Franklin evaluates the confidence of various variant types—SNPs/Indels, CNVs, and Fusions—using both standard quality metrics and proprietary algorithms.
SNPs/Indels
Variant confidence for SNPs and Indels is determined using multiple quality and bias-related metrics, including:
Caller-provided metrics:
Quality - Confidence score based on the likelihood of reads supporting the alternate vs. reference allele.
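Strand Bias (Fisher’s Exact Test) - Tests whether reads supporting the reference and alternate alleles are unevenly distributed between the forward and reverse strands.
Mapping Quality - Rank sum test comparing the mapping quality of reads supporting the alternate allele with that of reads supporting the reference allele.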
Read Position Bias - Assesses whether the alternate allele tends to appear at specific positions within reads (e.g. toward the ends).
Total Depth - Total read depth covering the variant region.
Base Quality - Rank sum test of base quality scores for bases supporting the alternate allele.
Allele Balance - Ratio of reads supporting the alternate vs. reference allele, which should match the expected homozygous or heterozygous allele ratio. (Note: Disabled in somatic variant analysis.)
Caller-Specific Metrics - Additional metrics may be included depending on the variant caller used.
Joint Genotyping - Family-based joint calling increases confidence for inherited variants.
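To make two of the caller-provided metrics above concrete, here is a minimal Python sketch of a strand bias test and an allele balance calculation. It is illustrative only and is not Franklin's implementation: the function names, the 2x2 table layout (reference/alternate by forward/reverse strand), and the definition of allele balance as the alternate-read fraction are all assumptions made for this example.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout for strand bias (hypothetical, for illustration):
        a = ref reads on forward strand, b = ref reads on reverse strand
        c = alt reads on forward strand, d = alt reads on reverse strand
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def allele_balance(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate allele (assumed definition)."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


# Alternate reads seen only on the forward strand: a small p-value suggests strand bias.
biased = fisher_exact_two_sided(20, 22, 15, 0)
# Reads spread evenly across strands: the p-value is close to 1.
balanced = fisher_exact_two_sided(20, 22, 8, 7)
# A germline heterozygous call is expected to sit near 0.5, a homozygous call near 1.0.
ab = allele_balance(ref_reads=18, alt_reads=17)
```

A small p-value from the strand test, or an allele balance far from the expected genotype ratio, would count against the call. How these values are weighted into a final confidence score is not specified here.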
Global metrics:
Region Annotations - Adjust confidence scores based on genomic context, including:
Repetitive regions
Homopolymers
Noisy or hard-to-sequence regions
Internal and public variant frequencies (e.g., gnomAD)
Known Common Artifacts - Confidence may be reduced if the variant is a known false positive according to internal or public databases (e.g., gnomAD).
VQSR (Variant Quality Score Recalibration) - A machine learning-based algorithm that calculates thresholds for caller-provided metrics based on a sample cohort.
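As an illustration of the region annotations listed above, the following Python sketch checks whether a variant position falls inside a homopolymer run. The function names and the `min_run` threshold of 5 are hypothetical choices for this example. The source does not state which thresholds or region definitions Franklin actually uses.

```python
def homopolymer_run_length(sequence: str, position: int) -> int:
    """Length of the run of identical bases that contains `position` (0-based)."""
    base = sequence[position].upper()
    left = position
    # Extend the run to the left while the neighbouring base matches.
    while left > 0 and sequence[left - 1].upper() == base:
        left -= 1
    right = position
    # Extend the run to the right while the neighbouring base matches.
    while right < len(sequence) - 1 and sequence[right + 1].upper() == base:
        right += 1
    return right - left + 1


def in_homopolymer(sequence: str, position: int, min_run: int = 5) -> bool:
    """True if the position sits in a run of at least `min_run` identical bases.

    `min_run` is an assumed, illustrative threshold.
    """
    return homopolymer_run_length(sequence, position) >= min_run


reference_window = "ACGTTTTTTAC"
flagged = in_homopolymer(reference_window, position=5)      # inside the T run
not_flagged = in_homopolymer(reference_window, position=1)  # isolated C
```

A call flagged this way would be a candidate for a lower confidence score, since indels in homopolymers are a common source of sequencing error.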
CNVs
Confidence in CNV calls is calculated using:
Machine Learning Models - ML-based scoring integrated into Franklin's proprietary CNV detection algorithm.
Fold Change (Log Ratio) - Compares coverage between the sample and a reference, normalized for:
GC content
Repetitive or low-complexity regions
Tumor-Specific Factors (Somatic CNVs) - For somatic CNV calls, confidence is also influenced by:
Tumor purity (estimated and provided)
Subclonal populations
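The fold change calculation described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Franklin's proprietary algorithm: coverage is assumed to be summarized per bin, GC correction is approximated by dividing each bin by the median coverage of bins with similar GC content, and the names `gc_correct`, `log2_ratios`, `n_strata`, and `pseudocount` are invented for this example.

```python
from math import log2
from statistics import median


def gc_correct(coverage: list[float], gc: list[float], n_strata: int = 10) -> list[float]:
    """Divide each bin's coverage by the median coverage of bins with similar GC.

    `gc` holds the GC fraction (0.0 to 1.0) of each bin.
    """
    strata = [min(int(g * n_strata), n_strata - 1) for g in gc]
    by_stratum: dict[int, list[float]] = {}
    for stratum, cov in zip(strata, coverage):
        by_stratum.setdefault(stratum, []).append(cov)
    medians = {s: median(values) for s, values in by_stratum.items()}
    return [
        cov / medians[s] if medians[s] > 0 else 0.0
        for s, cov in zip(strata, coverage)
    ]


def log2_ratios(sample: list[float], reference: list[float],
                pseudocount: float = 0.01) -> list[float]:
    """Per-bin log2(sample / reference); the pseudocount avoids log2(0)."""
    return [
        log2((s + pseudocount) / (r + pseudocount))
        for s, r in zip(sample, reference)
    ]


# Toy data: four bins with their GC fractions and raw coverage.
gc_content = [0.31, 0.35, 0.62, 0.65]
sample_cov = gc_correct([100.0, 110.0, 40.0, 45.0], gc_content)
reference_cov = gc_correct([95.0, 105.0, 80.0, 90.0], gc_content)
ratios = log2_ratios(sample_cov, reference_cov)
```

In a pure diploid sample, a heterozygous deletion corresponds to a log2 ratio of about -1 (half the coverage) and a single-copy gain to about 0.58 (1.5 times the coverage). Lower tumor purity or subclonal events pull these values toward 0, which is why those factors affect confidence for somatic calls.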
RNA Fusions
Fusion confidence is primarily based on evidence from read alignment and quality, including:
Supporting Read Types:
Paired Reads: Mapped to opposing sides of the fusion junction
Split Reads (Hard Clipped): A single read split across the junction
Soft Clipped Reads: Clipped region aligns to the mate’s sequence
Noise in Breakend Region - Evaluates the level of similar read types (paired, clipped, split) that map to only one side of the junction and not the other (i.e., that point to a different fusion mate). High background noise lowers confidence.
Read Re-assembly
Reads are reassembled around the fusion site, and confidence is adjusted based on the number of supporting assemblies rather than just raw read counts.
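The relationship between supporting reads and breakend noise can be illustrated with a small Python sketch. The `BreakendEvidence` class and its `signal_fraction` measure are hypothetical constructs for this example. The source does not specify how Franklin combines these counts, and read re-assembly is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class BreakendEvidence:
    """Read counts around one candidate fusion junction (illustrative only)."""
    paired: int        # read pairs mapped to opposing sides of the junction
    split: int         # single reads split across the junction
    soft_clipped: int  # reads whose clipped region aligns to the mate's sequence
    one_sided: int     # similar reads mapped to only one side (background noise)

    @property
    def supporting(self) -> int:
        """Total reads that support this specific junction."""
        return self.paired + self.split + self.soft_clipped

    def signal_fraction(self) -> float:
        """Share of junction-area reads that support the fusion (assumed measure)."""
        total = self.supporting + self.one_sided
        return self.supporting / total if total else 0.0


clean = BreakendEvidence(paired=12, split=6, soft_clipped=2, one_sided=5)
noisy = BreakendEvidence(paired=12, split=6, soft_clipped=2, one_sided=80)
clean_fraction = clean.signal_fraction()
noisy_fraction = noisy.signal_fraction()
```

Both candidates have the same number of supporting reads, but the second sits in a much noisier breakend region, so its signal fraction is lower. That matches the behavior described above, where high background noise lowers confidence.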