Question:

a) Download the ClinVar data set in VCF format from the link below.
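A minimal sketch of step a), assuming Python 3 with only the standard library and assuming the `ftp://` address given further down is still reachable. The helper names `download` and `read_vcf` are hypothetical. The reader follows the layout hint in part b): `##` lines are metadata, the single `#CHROM` line names the columns, and every other line is one tab-separated record.

```python
import gzip
import urllib.request

# Address given in the assignment; assumed to still be reachable.
URL = "ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20180401.vcf.gz"


def download(url: str = URL, dest: str = "clinvar_20180401.vcf.gz") -> str:
    """Fetch the compressed VCF to `dest` and return the local path."""
    # urllib's default opener includes an FTP handler, so ftp:// URLs work here.
    path, _headers = urllib.request.urlretrieve(url, dest)
    return path


def read_vcf(path: str):
    """Yield one dict per data line of a gzipped VCF, keyed by header column."""
    columns = None
    # "rt" decompresses and decodes to text in one step.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("##"):
                continue  # meta-information lines: INFO definitions, file date, ...
            if line.startswith("#"):
                columns = line[1:].split("\t")  # the "#CHROM POS ID ..." header line
                continue
            if columns is None:
                raise ValueError("data line found before the #CHROM header")
            yield dict(zip(columns, line.split("\t")))
```

Typical use would be `for rec in read_vcf(download()): ...`; streaming line by line avoids loading the whole file into memory.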

b) Clean and normalize this data (to at least 1NF) and load it into a table called variants in a new schema called clinvar. From the VCF specification and the downloaded ClinVar data, infer which attributes are needed and how each row and each column is delimited (hint: each record is one line, columns are separated by \t, and header lines start with ##). Pay close attention to CHROM, POS, ID, REF, ALT and INFO (the really nasty one that violates 1NF). Because INFO is ;-separated, you must expand it into several columns, of which CLNVC, CLNSIG, RS and especially GENEINFO are needed. You may discard the rest, and you may use any programming language or processing tool to complete this section.
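A minimal sketch of the INFO expansion and the load, assuming Python's built-in `sqlite3` as the database (the assignment does not name one). SQLite has no `CREATE SCHEMA`, so an attached database named `clinvar` stands in for the schema; in PostgreSQL you would run `CREATE SCHEMA clinvar;` instead. The function names and the lowercase column names are hypothetical choices.

```python
import sqlite3

# INFO keys the assignment asks for; every other INFO key is dropped.
KEEP = ("CLNVC", "CLNSIG", "RS", "GENEINFO")


def parse_info(info: str) -> dict:
    """Split an INFO string such as 'K1=v1;K2=v2;FLAG' into a dict.

    Keys without '=' are VCF flags and map to True.
    """
    out = {}
    if info == ".":  # '.' means the INFO field is empty
        return out
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def to_row(line: str) -> tuple:
    """Turn one tab-separated VCF data line into a flat tuple of columns."""
    chrom, pos, vid, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    info_fields = parse_info(info)
    # Missing INFO keys become None, i.e. NULL in the database.
    return (chrom, int(pos), vid, ref, alt) + tuple(info_fields.get(k) for k in KEEP)


def load(conn: sqlite3.Connection, lines) -> None:
    """Create clinvar.variants and insert every data line from `lines`."""
    # An attached database acts as the 'clinvar' schema.
    conn.execute("ATTACH DATABASE ':memory:' AS clinvar")
    conn.execute(
        """CREATE TABLE clinvar.variants (
               chrom TEXT, pos INTEGER, id TEXT, ref TEXT, alt TEXT,
               clnvc TEXT, clnsig TEXT, rs TEXT, geneinfo TEXT)"""
    )
    rows = [to_row(line) for line in lines if not line.startswith("#")]
    conn.executemany(
        "INSERT INTO clinvar.variants VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
```

Replacing `':memory:'` with a file path such as `'clinvar.db'` would make the schema persistent. Note that GENEINFO is still stored as one string here, so this table alone may not yet satisfy 1NF.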

Paper that describes the domain of ClinVar:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965032/

VCF format file that gives you a DB-like table (is this 1NF?):

ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20180401.vcf.gz
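On the "is this 1NF?" question: the data lines look like a table, but they are not in 1NF, because INFO packs many attributes into one cell, and even after INFO is expanded, ClinVar's GENEINFO value can list several genes as `SYMBOL:GENEID` pairs separated by `|`. A minimal sketch of splitting that value into atomic rows, assuming Python 3 with the standard library; `explode_geneinfo` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def explode_geneinfo(
    variant_id: str, geneinfo: Optional[str]
) -> List[Tuple[str, str, Optional[int]]]:
    """Return one (variant_id, gene_symbol, gene_id) row per gene in GENEINFO.

    GENEINFO is written as SYMBOL:GENEID pairs separated by '|'.
    """
    if not geneinfo or geneinfo == ".":
        return []  # no gene annotated for this variant
    rows = []
    for pair in geneinfo.split("|"):
        symbol, _, gene_id = pair.partition(":")
        # Keep the numeric gene ID as an integer; None if it is missing.
        rows.append((variant_id, symbol, int(gene_id) if gene_id.isdigit() else None))
    return rows
```

These rows could go into a child table keyed by the variant ID, or the other variant columns could be repeated once per gene in variants; either layout gives atomic values, and the child table avoids duplicating the other columns.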
