Create a case from a VCF - troubleshooting for files with >10M variant rows
Updated over a week ago

Franklin supports VCF files with up to 10M variant rows. For most use cases, this is more than enough. In some edge cases, however, you may end up with a VCF of more than 10M variant rows. Most likely, many of those rows carry a hom-ref genotype (usually a GT value of 0/0 in the sample column).

Such VCFs are usually the result of batching many different samples into a single multi-sample VCF, or having a WGS VCF with hom-ref entries. In these cases we suggest the following:

  1. If your VCF is a multi-sample VCF, split it into several single-sample VCF files (or at least into multi-sample VCF files with fewer samples in each). You can do this with your own filter / split script, or with a standard bioinformatics tool such as bcftools.

  2. If your VCF file is a WGS VCF with hom-ref entries, we suggest purging these rows altogether. All hom-ref variants are filtered out as part of Franklin's processing anyway, so removing them won't affect the analysis or the data available to you. Once the rows are removed and your file has no more than 10M variant rows, you'll be able to upload and process it.
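
Both suggestions can be sketched with a short standard-library Python script. This is a minimal illustration, not Franklin's own tooling. The function names (`is_hom_ref`, `drop_hom_ref_rows`, `extract_sample`, `write_filtered`) are hypothetical. The sketch assumes an uncompressed, tab-delimited VCF, treats a genotype as hom-ref only when every allele in GT is `0` (so missing genotypes such as `./.` are kept), and in a multi-sample file drops a row only when every sample is hom-ref.

```python
import re

FIXED_COLS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def is_hom_ref(format_field, sample_field):
    """True if the sample's GT holds only reference alleles (e.g. 0/0, 0|0)."""
    keys = format_field.split(":")
    if "GT" not in keys:
        return False  # no genotype to judge, so the row is kept
    values = sample_field.split(":")
    idx = keys.index("GT")
    if idx >= len(values):
        return False
    alleles = re.split(r"[/|]", values[idx])
    return all(allele == "0" for allele in alleles)


def drop_hom_ref_rows(lines):
    """Yield header lines plus rows where at least one sample is not hom-ref."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= FIXED_COLS:
            yield line  # row without sample columns: nothing to test
            continue
        fmt, samples = cols[8], cols[9:]
        if not all(is_hom_ref(fmt, s) for s in samples):
            yield line


def extract_sample(lines, sample):
    """Yield a single-sample VCF for `sample` from a multi-sample VCF.

    INFO tags such as AC/AN are copied as-is, not recomputed.
    """
    col = None
    for line in lines:
        if line.startswith("##"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            col = cols.index(sample)  # ValueError if the sample is absent
        elif col is None:
            raise ValueError("variant row found before the #CHROM header line")
        yield "\t".join(cols[:FIXED_COLS] + [cols[col]]) + "\n"


def write_filtered(src_path, dst_path):
    """Write src_path without all-hom-ref rows; return the variant rows kept."""
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in drop_hom_ref_rows(src):
            dst.write(line)
            if not line.startswith("#"):
                kept += 1
    return kept
```

For example, `write_filtered("input.vcf", "filtered.vcf")` (both paths are placeholders) returns the number of variant rows left, which you can compare against the 10M limit before uploading. To split a multi-sample file, pass the output of `extract_sample(lines, "SAMPLE_NAME")` through `drop_hom_ref_rows`, since many rows become hom-ref once only one sample remains.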

Still have questions? Reach out to our Support Team; they'll be happy to help!
