CNV calling QC metrics

CNV calling QC metrics measure how well a sample's coverage distribution fits that of the reference samples


Franklin offers five QC metrics for CNV detection, each of which estimates the quality of CNV calling.

All five metrics measure how consistent the sample's coverage distribution is with that of the reference samples (the cohort of samples used to create the CNV model). Because CNV detection is based on the coverage depth distribution, a sample that is anomalous in this respect will yield unreliable CNV calls.

QC metrics descriptions

Coverage distribution fit to CNV model - this is the main indicator of CNV calling quality. The metric fails when a large percentage of the targeted exons are predicted to have an abnormal copy number. This implies that the sample's coverage distribution differs significantly from that of the reference samples, so many of the calls are artifacts and CNV calling is not reliable enough.
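As an illustration of the idea behind this metric, the following minimal sketch computes the fraction of targeted exons whose predicted copy number differs from the expected one and compares it to a threshold. This is not Franklin's implementation: the function names, the per-exon input lists, and the 10% threshold are all assumptions made for the example.

```python
from typing import Sequence


def abnormal_exon_fraction(predicted_cn: Sequence[int],
                           expected_cn: Sequence[int]) -> float:
    """Fraction of exons whose predicted copy number differs from the expected one."""
    if len(predicted_cn) != len(expected_cn):
        raise ValueError("predicted_cn and expected_cn must have the same length")
    if len(predicted_cn) == 0:
        raise ValueError("at least one exon is required")
    abnormal = sum(1 for p, e in zip(predicted_cn, expected_cn) if p != e)
    return abnormal / len(predicted_cn)


def coverage_fit_passes(predicted_cn: Sequence[int],
                        expected_cn: Sequence[int],
                        max_abnormal_fraction: float = 0.10) -> bool:
    """Pass when the abnormal fraction stays at or below the threshold.

    The 0.10 default is a hypothetical value chosen for illustration only.
    """
    return abnormal_exon_fraction(predicted_cn, expected_cn) <= max_abnormal_fraction


if __name__ == "__main__":
    # Two of four exons deviate from the expected diploid state.
    print(abnormal_exon_fraction([2, 2, 1, 3], [2, 2, 2, 2]))
    print(coverage_fit_passes([2, 2, 1, 3], [2, 2, 2, 2]))
```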

Coverage Median Absolute Z-Score - the median absolute z-score of the normalized coverage depth across all exons, computed with respect to the reference samples. This metric measures the discrepancy in coverage distribution relative to the variance among the reference samples, so large values indicate an anomaly that exceeds the expected variability.
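A minimal sketch of how a median absolute z-score could be computed is shown below. It assumes that each exon's z-score uses the mean and sample standard deviation of that exon across the reference samples; the article does not state the exact estimators Franklin uses, and the function name and input layout are hypothetical.

```python
import statistics
from typing import List, Sequence


def median_absolute_z_score(sample_cov: Sequence[float],
                            reference_covs: Sequence[Sequence[float]]) -> float:
    """Median over exons of |z|, where z compares the sample to the reference cohort.

    sample_cov:      normalized coverage per exon for the tested sample.
    reference_covs:  one per-exon coverage list per reference sample
                     (at least two reference samples are needed for a standard deviation).
    """
    if len(reference_covs) < 2:
        raise ValueError("at least two reference samples are required")
    abs_z: List[float] = []
    for i, value in enumerate(sample_cov):
        ref_values = [ref[i] for ref in reference_covs]
        mean = statistics.fmean(ref_values)
        sd = statistics.stdev(ref_values)
        if sd == 0:
            # Assumption: exons with no reference variance are skipped.
            continue
        abs_z.append(abs(value - mean) / sd)
    if not abs_z:
        raise ValueError("no exon has non-zero reference variance")
    return statistics.median(abs_z)


if __name__ == "__main__":
    references = [[0.0, 9.0, 4.0], [1.0, 10.0, 5.0], [2.0, 11.0, 6.0]]
    print(median_absolute_z_score([1.0, 12.0, 4.0], references))
```

Because each deviation is divided by the reference standard deviation of that exon, exons that already vary widely in the cohort contribute less to the score.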

Coverage MAPD - the median of the absolute values of all pairwise differences in log2 ratio, computed with respect to the reference samples. This metric measures whether the sample's consistency with the reference samples' normalized coverage is uniform along the sample.
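The sketch below illustrates one common way to compute a MAPD-style value. It assumes that the pairs are consecutive exons in genomic order and that the reference is summarized as a single per-exon coverage value; the article does not specify either detail, so treat both, along with the function name, as assumptions.

```python
import math
import statistics
from typing import Sequence


def mapd(sample_cov: Sequence[float], reference_cov: Sequence[float]) -> float:
    """Median of absolute differences between log2 ratios of consecutive exons.

    Exons with non-positive coverage in either input are skipped (an assumption),
    so the remaining neighbors are compared with each other.
    """
    log2_ratios = [
        math.log2(s / r)
        for s, r in zip(sample_cov, reference_cov)
        if s > 0 and r > 0
    ]
    if len(log2_ratios) < 2:
        raise ValueError("at least two usable exons are required")
    diffs = [abs(b - a) for a, b in zip(log2_ratios, log2_ratios[1:])]
    return statistics.median(diffs)


if __name__ == "__main__":
    reference = [1.0, 1.0, 1.0, 1.0]
    # A uniform two-fold shift: every log2 ratio is equal, so neighbors never differ.
    print(mapd([2.0, 2.0, 2.0, 2.0], reference))
    # Coverage that changes from exon to exon produces non-zero differences.
    print(mapd([1.0, 2.0, 4.0, 4.0], reference))
```

Under these assumptions a sample that deviates from the reference by the same factor everywhere scores zero, which is why this kind of metric captures uniformity rather than the overall size of the deviation.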

Coverage Depth Median Absolute Log2 Ratio - the median absolute log2 ratio of coverage depth across all exons, computed with respect to the reference samples. This metric reflects the sample's overall consistency with the reference samples' normalized coverage.
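As a rough illustration, a median absolute log2 ratio could be computed as in the sketch below. It assumes the reference is summarized as one normalized coverage value per exon and that exons with non-positive coverage are skipped; neither detail is stated in this article, and the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence


def median_absolute_log2_ratio(sample_cov: Sequence[float],
                               reference_cov: Sequence[float]) -> float:
    """Median over exons of |log2(sample coverage / reference coverage)|."""
    abs_ratios = [
        abs(math.log2(s / r))
        for s, r in zip(sample_cov, reference_cov)
        if s > 0 and r > 0  # assumption: skip exons where the ratio is undefined
    ]
    if not abs_ratios:
        raise ValueError("no usable exons")
    return statistics.median(abs_ratios)


if __name__ == "__main__":
    reference = [1.0, 1.0, 1.0]
    # Coverage doubled, unchanged, and halved relative to the reference.
    print(median_absolute_log2_ratio([2.0, 1.0, 0.5], reference))
```

Unlike a difference-based measure, this value grows with any deviation from the reference, including a deviation that is the same across all exons.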

Coverage Median Residual Noise - the median ratio of predicted to expected copy number across all exons. This metric is similar to the Coverage Depth Median Absolute Log2 Ratio; however, it relies on the CNV model's predictions rather than on the normalized coverage.

Note that inherently high variance in coverage distribution among samples may cause most QC metrics to cross their thresholds, even when that variance is already represented in the model's underlying cohort. The exception is the Z-Score-based QC metric, which accounts for such variance. The thresholds of all metrics can be customized.

How should one address “inconsistency” between the QC metrics?

The "Coverage distribution fit to CNV model" is the main CNV QC metric.

A failed value suggests that CNV calling is of low quality; the other metrics may shed more light on the reason.

A valid value suggests that CNV calling is reliable. It may still be accompanied by failed values of the other metrics. This means that an inconsistency was found in the sample's coverage distribution, but the CNV model managed to overcome it and reduce the noise enough to allow a feasible CNV analysis.
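The interpretation rules above can be summarized in a short sketch. The class, field names, and message wording are hypothetical and are not part of Franklin; the example only encodes the rule that the main metric decides reliability while the other metrics explain or qualify the result.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CnvQcResult:
    # Outcome of "Coverage distribution fit to CNV model" (the main metric).
    fit_to_model_passed: bool
    # Pass/fail outcome of each of the other metrics, keyed by metric name.
    secondary_passed: Dict[str, bool] = field(default_factory=dict)


def interpret(result: CnvQcResult) -> str:
    """Turn a set of QC outcomes into a one-line interpretation."""
    failed = sorted(name for name, ok in result.secondary_passed.items() if not ok)
    if not result.fit_to_model_passed:
        if failed:
            return "CNV calling is unreliable; check: " + ", ".join(failed)
        return "CNV calling is unreliable; the other metrics give no further hint"
    if failed:
        return ("CNV calling is reliable; the model compensated for issues flagged by: "
                + ", ".join(failed))
    return "CNV calling is reliable"


if __name__ == "__main__":
    qc = CnvQcResult(
        fit_to_model_passed=True,
        secondary_passed={"Coverage MAPD": False, "Coverage Median Absolute Z-Score": True},
    )
    print(interpret(qc))
```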

Still have questions? Reach out to our Support Team, they'll be happy to help!
