Prediction tools PP3, BP4

The original ACMG guidelines (Richards et al.) for the computational prediction evidence criteria, supporting pathogenic (PP3) and supporting benign (BP4), did not specify how to weigh the different prediction tools or which score thresholds to use. In addition, these criteria were limited to the Supporting evidence level. Later studies (Plon et al., Qian et al.) recommended using meta-predictor tools, yet there were still no explicit recommendations on how to apply them.

The ClinGen Sequence Variant Interpretation (SVI) Working Group later released recommendations on which thresholds to use for a subset of prediction tools when applying the PP3/BP4 rules. These recommendations also no longer limit the evidence level to Supporting: some prediction tools can now reach a Pathogenic Strong level (PP3_Strong) or a Benign Very Strong level (BP4_Very_Strong).

In their study, the authors evaluated 13 different prediction tools on an unbiased dataset of pathogenic and benign variants obtained from the gnomAD and ClinVar databases.

Based on the performance and accuracy of these 13 tools, different evidence strengths were assigned to different score thresholds using the Bayesian classification framework of Tavtigian et al.
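
As a rough illustration of how that Bayesian framework relates evidence strength to a likelihood ratio, the sketch below scales the odds of pathogenicity as a power of the Very Strong odds. The constants (Very Strong odds of 350, prior probability of 0.10) are the defaults from the original framework by Tavtigian et al. and are an assumption here; the SVI study calibrated its own values, which this article does not reproduce. The function names are hypothetical.

```python
# Hypothetical sketch: constants are the defaults of the original Bayesian
# framework, NOT the values calibrated in the ClinGen SVI study.
ODDS_VERY_STRONG = 350.0
PRIOR = 0.10

# Each strength level is a fixed root of the Very Strong odds.
EXPONENT = {
    "Supporting": 1 / 8,
    "Moderate": 1 / 4,
    "Strong": 1 / 2,
    "Very Strong": 1.0,
}


def odds_of_pathogenicity(strength: str) -> float:
    """Odds (likelihood ratio) that a piece of evidence must reach for a strength level."""
    return ODDS_VERY_STRONG ** EXPONENT[strength]


def posterior_probability(odds: float, prior: float = PRIOR) -> float:
    """Combine a prior probability with the odds of pathogenicity."""
    return odds * prior / ((odds - 1.0) * prior + 1.0)


if __name__ == "__main__":
    for strength in EXPONENT:
        odds = odds_of_pathogenicity(strength)
        print(f"{strength}: odds {odds:.2f}, posterior {posterior_probability(odds):.3f}")
```

A prediction tool's score interval earns a given strength only if its local likelihood ratio in that interval reaches the corresponding odds threshold.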

Multiple tools reached score thresholds justifying a Moderate strength, and several reached the Strong strength level. A single tool (REVEL) even reached a Very Strong evidence level for a benign classification.

The thresholds assigned to some tools did not align with the developer-recommended thresholds. For example, the authors reported that SIFT, PolyPhen-2, and CADD, at their developer-recommended thresholds, did not meet the local likelihood ratio thresholds for Supporting evidence for PP3. Moreover, the developer-recommended CADD threshold of 20.0 fell in the score interval corresponding to Moderate benign evidence for BP4. Users may therefore have applied the PP3 rule to variants with a CADD score of 20 (or slightly higher), although such a score actually corresponds to Moderate evidence for a benign classification. Using the developer-recommended thresholds can thus overcall pathogenicity and misapply this evidence.

Based on these findings the following recommendations were given:

  • Specific thresholds will be used, corresponding to different evidence strengths (e.g. Supporting, Moderate, Strong).

  • Users should select a single tool to use and follow. This tool should be one that reaches at least Strong evidence for pathogenicity. This includes BayesDel, MutPred2, REVEL, and VEST4.

  • The maximal evidence level given for PP3/BP4 is limited by the recommended thresholds: a variant whose score corresponds to Moderate evidence cannot be given a higher weighting (e.g. Strong).

  • When PP3 is being applied with the PM1 rule (missense variant located in a hotspot or a functional domain), the combined evidence should be limited to Strong. For example, if PM1 was met with a Moderate strength, and PP3 was met with a Strong strength (PP3_Strong), the total evidence should be Strong (PS), and not Strong+Moderate (PS, PM). Conversely, PM1 and PP3_Moderate is a valid combination (PM, PM) as it doesn’t exceed the Strong strength.

  • In situations where research groups have developed gene-specific predictions, laboratories could select the recommended alternative tool and thresholds for these variants, instead of their standard tool.

*The study and its recommendations were limited to missense variants only.
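
The PM1 + PP3 cap from the list above can be sketched with a simple point scale. The point values (Supporting = 1, Moderate = 2, Strong = 4) are an assumption borrowed from the points-based version of the Bayesian framework and are not stated in this article; the function name is hypothetical.

```python
from typing import Optional

# Assumed point scale (not given in this article).
POINTS = {"Supporting": 1, "Moderate": 2, "Strong": 4}
STRONG_CAP = POINTS["Strong"]


def combined_pm1_pp3_points(pm1: Optional[str], pp3: Optional[str]) -> int:
    """Sum the PM1 and PP3 strengths, capped at the equivalent of one Strong criterion.

    Pass None for a rule that was not met.
    """
    total = sum(POINTS[strength] for strength in (pm1, pp3) if strength is not None)
    return min(total, STRONG_CAP)


if __name__ == "__main__":
    # PM1 (Moderate) + PP3_Strong would be 6 points, but is capped at 4 (Strong).
    print(combined_pm1_pp3_points("Moderate", "Strong"))
    # PM1 (Moderate) + PP3_Moderate is exactly 4 points, so nothing is lost.
    print(combined_pm1_pp3_points("Moderate", "Moderate"))
```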

Franklin PP3/BP4 implementation

Based on the latest recommendations, Franklin has adjusted the thresholds for the different prediction tools, as well as displayed their corresponding suggested evidence strength.

In addition, Franklin’s aggregated prediction and implementation for PP3/BP4 rules have been modified accordingly.

While the recommendations allow PP3/BP4 to be given a higher strength, as a precaution Franklin currently applies them only to the PP3 rule. For BP4, the default strength remains Supporting, and the user can change it manually based on their judgment.

Franklin uses REVEL as its default tool for missense variants, as it achieved the best results among all tools. As mentioned, Franklin takes a more cautious approach to BP4 and applies it, at Supporting strength, only when the REVEL score is less than 0.15. Franklin also takes into account other prediction tools for specific scenarios, including mitochondrial variants, variants with gene-specific predictions, and splice region variants, as described below:

  • For a standard missense variant, Franklin uses REVEL scores:

    • REVEL score >= 0.932, PP3_Strong

    • REVEL score [0.773, 0.932), PP3_Moderate

    • REVEL score [0.644, 0.773), PP3_Supporting

    • REVEL score < 0.15, BP4_Supporting

  • For mitochondrial variants, Franklin uses the APOGEE and MitoTip prediction tools, based on the recommendations for mitochondrial variant classification (Falk et al.). The thresholds used are:

    • APOGEE score >= 0.75 or MitoTip score >= 16.25, PP3_Supporting

    • APOGEE score < 0.25 and MitoTip score < 8.44, BP4_Supporting

  • When gene-specific CardioBoost predictions exist for the variant, Franklin applies the rules based on these predictions and adjusts the strength based on REVEL:

    • CardioBoost_CM score >= 0.9 or CardioBoost_ARM score >= 0.9, PP3_Moderate

    • CardioBoost_CM score [0.7, 0.9) or CardioBoost_ARM score [0.7, 0.9), PP3_Moderate

    • CardioBoost_CM score <= 0.15 and CardioBoost_ARM score <= 0.15, BP4_Supporting

  • When the variant overlaps a splice region, Franklin also considers the SpliceAI and dbscSNV Ada scores:

    • SpliceAI score > 0.5 or dbscSNV score > 0.7, PP3_Supporting

    • SpliceAI score < 0.1 and dbscSNV score < 0.15, BP4_Supporting

  • In any other case, neither rule is applied.
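
The thresholds listed above can be written down as small lookup functions. This is only an illustrative sketch, not Franklin's actual code: the function names are hypothetical, both scores of a pair are assumed to be available, and CardioBoost is left out because the list does not spell out how REVEL adjusts its strength.

```python
from typing import Optional


def revel_rule(score: float) -> Optional[str]:
    """Map a REVEL score to the suggested rule for a standard missense variant."""
    if score >= 0.932:
        return "PP3_Strong"
    if score >= 0.773:
        return "PP3_Moderate"
    if score >= 0.644:
        return "PP3_Supporting"
    if score < 0.15:
        return "BP4_Supporting"
    return None  # scores in [0.15, 0.644) trigger neither rule


def mito_rule(apogee: float, mitotip: float) -> Optional[str]:
    """Map APOGEE and MitoTip scores to the suggested rule for a mitochondrial variant."""
    if apogee >= 0.75 or mitotip >= 16.25:
        return "PP3_Supporting"
    if apogee < 0.25 and mitotip < 8.44:
        return "BP4_Supporting"
    return None


def splice_rule(spliceai: float, dbscsnv_ada: float) -> Optional[str]:
    """Map SpliceAI and dbscSNV Ada scores to the suggested rule for a splice region variant."""
    if spliceai > 0.5 or dbscsnv_ada > 0.7:
        return "PP3_Supporting"
    if spliceai < 0.1 and dbscsnv_ada < 0.15:
        return "BP4_Supporting"
    return None


if __name__ == "__main__":
    print(revel_rule(0.95))        # falls in the PP3_Strong interval
    print(mito_rule(0.2, 8.0))     # both scores are below the benign thresholds
    print(splice_rule(0.3, 0.4))   # neither rule applies
```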

To simplify interpretation of the prediction score, Franklin’s aggregated score is normalized to the range 0-1, where:

  • Score [0.9, 1.0] corresponds with Strong pathogenic evidence, PP3_Strong

  • Score [0.8, 0.9) corresponds with Moderate pathogenic evidence, PP3_Moderate

  • Score [0.7, 0.8) corresponds with Supporting pathogenic evidence, PP3_Supporting

  • Score [0, 0.15) corresponds with Supporting benign evidence, BP4_Supporting
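
Read as a lookup table, the intervals above might look like the following sketch. The function and constant names are hypothetical, and scores in [0.15, 0.7) are assumed to trigger neither rule, since the list assigns them no evidence.

```python
from typing import Optional

# Lower bounds of the pathogenic bands, highest first.
PATHOGENIC_BANDS = [
    (0.9, "PP3_Strong"),
    (0.8, "PP3_Moderate"),
    (0.7, "PP3_Supporting"),
]
BENIGN_UPPER_BOUND = 0.15  # scores strictly below this are Supporting benign


def aggregated_score_rule(score: float) -> Optional[str]:
    """Map a normalized aggregated score in [0, 1] to its suggested rule."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("aggregated score must be in the range 0-1")
    for lower_bound, rule in PATHOGENIC_BANDS:
        if score >= lower_bound:
            return rule
    if score < BENIGN_UPPER_BOUND:
        return "BP4_Supporting"
    return None  # assumed: no evidence for scores in [0.15, 0.7)


if __name__ == "__main__":
    for value in (0.95, 0.85, 0.72, 0.4, 0.05):
        print(value, aggregated_score_rule(value))
```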

In addition, the recommendations limit the combined strength of the PM1 and PP3 rules to Strong, to avoid double counting of similar evidence. This means that when the PM1 rule is met with Moderate strength and PP3 is met with Strong strength (PP3_Strong), Franklin downgrades PP3 to Moderate (PP3_Moderate).

As an additional precaution, when the PM1 rule is met and the PP3 rule is met with Moderate strength (PP3_Moderate), Franklin downgrades PP3 to Supporting (PP3_Supporting). The user can manually change it back to PP3_Moderate based on their judgment.
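
The two downgrade steps can be summarized in a short sketch. It assumes PM1 is met at its default Moderate strength and is only an illustration of the behavior described above, not Franklin's actual implementation; the names are hypothetical.

```python
# One-step downgrade applied to PP3 when PM1 is also met (assumed Moderate).
PP3_DOWNGRADE = {
    "PP3_Strong": "PP3_Moderate",
    "PP3_Moderate": "PP3_Supporting",
}


def adjust_pp3_for_pm1(pp3_rule: str, pm1_met: bool) -> str:
    """Return the PP3 strength after accounting for a met PM1 rule."""
    if pm1_met:
        # PP3_Supporting is not in the table, so it is returned unchanged.
        return PP3_DOWNGRADE.get(pp3_rule, pp3_rule)
    return pp3_rule


if __name__ == "__main__":
    print(adjust_pp3_for_pm1("PP3_Strong", pm1_met=True))    # PP3_Moderate
    print(adjust_pp3_for_pm1("PP3_Moderate", pm1_met=True))  # PP3_Supporting
    print(adjust_pp3_for_pm1("PP3_Strong", pm1_met=False))   # PP3_Strong
```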

Still have questions? Reach out to our Support Team, they'll be happy to help!
