Variant Analysis with GATK
This workshop focuses on the core steps involved in calling variants with the Broad’s Genome Analysis Toolkit (GATK), following the “Best Practices” developed by the GATK team. You will learn why each step is essential to the variant discovery process, what operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results from your dataset.
Throughout this workshop, we highlight key functionalities such as the germline GVCF workflow for joint variant discovery in cohorts, RNAseq-specific processing, and somatic variant discovery using MuTect2. We also preview capabilities of the upcoming GATK version 4, including a new workflow for CNV discovery, and demonstrate the use of pipelining tools to assemble and execute GATK workflows.
The workshop consists of one day of lectures and two days of hands-on training, structured as follows. Day 1: theory and application of the Best Practices for Variant Discovery in high-throughput sequencing data. Day 2 and the morning of Day 3: hands-on exercises on manipulating the standard data formats involved in variant discovery and on applying GATK tools appropriately to various use cases and data types. Day 3 afternoon: hands-on exercises on writing workflow scripts in WDL, the Broad's new Workflow Description Language, and on executing these workflows locally as well as through a publicly accessible cloud-based service.
Please note that this workshop focuses on the analysis of human data. Most of the material presented applies equally to non-human data, and we will address some questions about the adaptations needed for non-human analyses, but we will not go into much detail on those points.
Please note that if you are not eligible for a University of Cambridge Raven account, you will need to book or register interest by following the link here.
- Graduate students, Postdocs and Staff members from the University of Cambridge, Affiliated Institutions and other external Institutions or individuals
- Please be aware that these courses are only free for University of Cambridge students. All other participants will be charged a registration fee in some form. Registration fees and further details regarding the charging policy are available here
- Further details regarding eligibility criteria are available here
- Familiarity with the basic terms and concepts of genetics and genomics.
- Basic familiarity with the command-line environment.
- Sufficient UNIX experience can be gained from one of the many UNIX tutorials available online.
Bioinformatics, Data handling, Data mining, Data visualisation, Genomics, Sequence variations
After this course you should be able to:
- Understand the overall variant discovery workflow rationale and requirements
- Understand key methods and functionalities in light of the latest research
- Understand key differences between germline and somatic variant discovery approaches
- Apply analysis tools and Best Practices workflows to a real data set
- Interpret analysis results and troubleshoot common problems
- Write and execute WDL analysis pipelines
During this course you will learn about:
- Pre-processing of high-throughput DNA and RNA sequencing (RNAseq) data
- Variant discovery (germline and somatic short variants, somatic CNV)
- Germline variant filtering and evaluation
- Pipelining strategies
Presentations, demonstrations and practicals
- Free for University of Cambridge students
- £50/day for all University of Cambridge staff, including postdocs, and participants from Affiliated Institutions. Please note that these charges are recovered by us at the institutional level.
- It remains the participant's responsibility to obtain prior approval from the relevant group leader, line manager or budget holder to attend the course. Please book only with the agreement of the relevant party, as costs will be charged back to your Lab Head or Group Supervisor.
- £50/day for all other academic participants from external Institutions and charitable organizations. These charges must be paid at registration.
- £100/day for all Industry participants. These charges must be paid at registration.
- Further details regarding the charging policy are available here
Duration: 3 days
Frequency: Once a year
- Introduction to genome variation analysis using NGS
- Introduction to high-throughput sequencing data analysis
- EMBL-EBI: European Variation Archive