Unveiling the genetic potential of UK Biobank
The immense value of UK Biobank as a resource for studying the genetics behind complex traits and diseases is demonstrated for the first time in a genetic study of lung health, published today. With all genotype data from UK Biobank to be made available next year, Martin Tobin, Professor of Genetic Epidemiology and Public Health at the University of Leicester, shares his experience and the study’s exciting findings.
UK Biobank is the largest European biobank available to date. Set up in 2006 and part-funded by the MRC, it is a huge resource containing data from 502,682 UK individuals. Participants have provided a range of information about their lifestyles, physical characteristics and health, and they will be followed up for at least 25 years.
We were really excited about the potential value of this data to our research, which led us to conduct the first ever genetic association analyses in UK Biobank: the UK Biobank Lung Exome Variant Evaluation (UK BiLEVE) study.
Using DNA extracted from participants’ banked blood samples, we analysed the genomes of a subset of UK Biobank participants: 50,008 people in total, selected according to their measures of lung health and whether or not they smoked.
New treatment targets
In our analysis we compared 28 million genetic variants – positions in the genome where the DNA sequence differs between individuals – across the genomes of all participants, against measures of lung health and smoking status. By making these comparisons we were able to identify new ‘signals’ of association – sections of the human genome that relate to tobacco addiction, lung health and disease.
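For readers who want a feel for what "comparing a variant against a measure of lung health" means in practice, here is a minimal sketch of a test for a single variant. It is an illustration only, not the method used in UK BiLEVE: the function name `association_slope`, the variable `lung_function` and the toy numbers are all made up, and the sketch simply fits a straight line relating the number of copies of a variant each person carries (0, 1 or 2) to a lung measurement. A real analysis would repeat something like this for each of the millions of variants, assess statistical significance and account for other factors.

```python
from statistics import mean


def association_slope(genotypes, phenotypes):
    """Least-squares slope of a trait on allele count (0, 1 or 2 copies).

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")

    g_mean = mean(genotypes)
    p_mean = mean(phenotypes)

    # Spread of the genotype values around their mean.
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("variant does not vary in this sample")

    # How genotype and trait move together.
    sxy = sum(
        (g - g_mean) * (p - p_mean) for g, p in zip(genotypes, phenotypes)
    )
    return sxy / sxx


# Made-up toy data: six people, their allele counts and a lung measurement.
genotypes = [0, 0, 1, 1, 2, 2]
lung_function = [3.0, 3.2, 2.9, 2.7, 2.5, 2.6]

slope = association_slope(genotypes, lung_function)
print(f"Estimated change in lung measure per allele copy: {slope:.3f}")
```

A negative slope in this toy example would suggest that each extra copy of the variant goes with a lower lung measurement. With only six invented people, it says nothing about real biology.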
Six of these signals relate to lung function, asthma and chronic obstructive pulmonary disease (COPD). COPD is the third leading cause of death worldwide and is characterised by irreversible obstruction to airflow in the major airways of the lungs. We found a lot of overlap between the genetic causes of airflow obstruction in people with and without asthma, including a variant on chromosome six which predisposes to asthma and is strongly associated with COPD. These signals provide clues to new proteins and cellular mechanisms that could be targeted to prevent and treat lung disease.
By studying heavy smokers and never smokers, we identified five new genetic signals relating to smoking behaviour. Two genetic variants in the signals relate to expression of the same gene, NCAM1. We don’t yet know how this gene might affect smoking behaviour, but studying the mechanism could provide useful clues for developing treatments to help people quit smoking.
As well as demonstrating the quality of UK Biobank data, our study has shown it is possible to measure large-scale genetic variation in UK Biobank. We worked with academic and industry collaborators to design a new genotyping array for our study. The array is a miniaturised ‘chip’ containing single strands of DNA capable of quickly recognising and measuring hundreds of thousands of genetic variants. In our study we measured common and rare genetic variants spaced across the whole human genome.
Based on the success of our array, which performed to a consistently high standard, UK Biobank is using a similar array to genotype all remaining participants. The 95 per cent similarity between arrays means that once data is collected for the remaining 450,000 participants, it can be combined with ours.
An untapped resource
From early 2016, approved researchers will be able to study genotype data for all participants alongside their health and disease data. Although anyone can view the type of information collected and the overall characteristics of participants*, there is a two-stage application process for registered researchers to propose a specific project and gain access to actual data.
While it may seem like there are a lot of steps to navigate to access the data, these are important to ensure that participants’ rights are safeguarded. And the large number of approved projects suggests that access procedures are working well.
The greatest benefit will occur when groups of talented researchers come together to access the data. I hope that early career researchers make good use of the resource. I encourage them to look beyond their own disciplines for training opportunities and chances to collaborate alongside researchers with different but complementary skills.
No discussion of UK Biobank would be complete without thanking the participants who have committed their time and been willing to share their data. UK Biobank was a bold vision at the time it was first funded. But I think that there is little doubt that it was a very sound investment as we begin to unravel the causes of disease and find clues about how to improve prevention and treatment.
The research was published in The Lancet Respiratory Medicine.
Find out how to gain access to UK Biobank data.
*UK Biobank’s Data Showcase allows scientists and members of the public to view the valuable information that has been collected to help improve the health of future generations.