Rare genetic disease: a haystack full of needles

David Willetts and Nick Dand

Nick Dand, a PhD student at King’s College London, explains his research developing tools to find the genetic mutations that cause rare diseases. This article was commended for the 2013 Max Perutz Science Writing Award.

Finding a needle in a haystack is – presumably – not easy. But in theory, with enough time and a lot of patience most of us could probably manage it, especially if we cheated a bit (with a magnet?). So let’s make the problem harder. Now we’ve lost our needle in a haystack which already happens to contain hundreds or thousands of other needles, all subtly different in shape or size. Even if we can pull out all of the needles we’re stuck: how can we find our needle when they all look so similar?

Identifying the genetic mutations that cause rare diseases feels a lot like the “too many needles” problem.

Recent technological breakthroughs mean we can now read a person’s entire genetic code, the blueprint found in every cell that guides how our bodies develop and function. It is a sequence of three billion nucleotides (which can be A, C, T or G) and is organised into units called genes, each having a specific function. Most of the code is identical from person to person (that’s what makes us all humans) but a tiny fraction can vary (that’s what makes us different humans).

A tiny fraction of three billion is not insignificant: we each carry upwards of a million sequence variants – for example a ‘C’ nucleotide where most people have a ‘T’. For genetic diseases, just one sequence variant can make all the difference. While most genetic variation is harmless, a variant in a critical position can cause a bodily function to fail and lead to disease. Cystic fibrosis is an example of a relatively common disease that is caused by mutations in a gene named CFTR, and our understanding of this causal relationship is of great benefit when it comes to diagnosis and treatment.

For many less common genetic diseases, however, we are yet to identify the sequence variants responsible – despite their often devastating consequences, like developmental problems that leave newborns little chance of survival. Reading a patient’s genetic sequence is a start, but the problem now is that we have too much data; since there are so many sequence variants we can’t easily tell which one causes the disease. Genetic sequencing technology has done a wonderful job of dispensing with the haystack altogether, but has left us knee-deep in needles.

This is the challenge that I face in my research. I am not a biologist in the conventional sense, but a mathematician-turned-computer scientist working in genetics. The volume of data now generated in this field is vast, and making sense of it all is one of the biggest challenges in the immediate future of genetics research. My goal is to build computational tools to analyse genetic data, picking out the noteworthy sequence variants from the rest. Tools that do this well are in great demand among the biologists and medics who study individual diseases.

With this “too many sequence variants” problem I can benefit from the progress made by others. For a number of rare diseases, a simple but powerful approach has identified the sequence variants responsible: read the genetic sequences of a handful of patients and find the only rare variants shared by them all. In haystack terms, this might equate to searching a number of haystacks for a matching set of needles.
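As a rough illustration of the shared-variant approach, here is a minimal Python sketch. It is not the method used in any particular study: the function name, the 1% frequency cut-off, the choice to treat variants missing from the reference table as rare, and all of the patient data are assumptions made up for the example.

```python
from typing import Dict, Set, Tuple

# A variant is (chromosome, position, reference base, alternate base).
Variant = Tuple[str, int, str, str]


def shared_rare_variants(
    patients: Dict[str, Set[Variant]],
    population_freq: Dict[Variant, float],
    max_freq: float = 0.01,  # assumed cut-off for "rare"
) -> Set[Variant]:
    """Return the rare variants carried by every patient."""
    rare_sets = [
        # Assumption: a variant absent from the reference table is rare.
        {v for v in variants if population_freq.get(v, 0.0) <= max_freq}
        for variants in patients.values()
    ]
    # Keep only the variants present in every patient's rare set.
    return set.intersection(*rare_sets) if rare_sets else set()


# Hypothetical data: three patients and four made-up variants.
common = ("chr1", 1000, "A", "G")   # seen in 35% of the population
shared = ("chr2", 2000, "C", "T")   # rare and present in all patients
private_1 = ("chr3", 3000, "G", "A")
private_2 = ("chr4", 4000, "T", "C")

patients = {
    "patient_1": {common, shared, private_1},
    "patient_2": {common, shared, private_2},
    "patient_3": {common, shared},
}
population_freq = {common: 0.35}

candidates = shared_rare_variants(patients, population_freq)
```

In this toy data the common variant is filtered out by frequency and the two private variants are dropped by the intersection, leaving a single candidate. Real data would leave far more candidates, which is why the frequency filter matters so much.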

Sadly, though, this doesn’t always work. There may be no single gene implicated in all of the patients, and it is this scenario that intrigues me. After all, genes do not carry out their tasks alone; they interact, working together to keep our various bodily processes running smoothly. So might a disease caused by a malfunctioning process result from a sequence variant in any one of the genes involved? With this in mind, I work on tools that take the patients’ sequence variants and add yet more data, this time from databases containing thousands of known gene interactions. What they look for are groups of genes that work together and in which every one of our patients carries a sequence variant, even if not in the same gene. Our matching set of needles no longer has to be identical, just all of the same “type”.
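The idea of looking for groups of interacting genes can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a description of the actual tools: it assumes a "group" is just a gene plus its direct interaction partners, and the gene names, interaction pairs and patient data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def candidate_gene_groups(
    patient_genes: Dict[str, Set[str]],
    interactions: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Find gene groups in which every patient carries at least one variant.

    patient_genes maps each patient to the genes where they carry a variant.
    A group is a gene plus its direct interaction partners (an assumption).
    """
    neighbours = defaultdict(set)
    for gene_a, gene_b in interactions:
        neighbours[gene_a].add(gene_b)
        neighbours[gene_b].add(gene_a)

    candidates = {}
    for gene, partners in neighbours.items():
        group = {gene} | partners
        # Every patient must have a variant in at least one gene of the group.
        if all(genes & group for genes in patient_genes.values()):
            candidates[gene] = group
    return candidates


# Hypothetical data: no single gene carries a variant in all three patients.
patient_genes = {
    "patient_1": {"GENE_A", "GENE_X"},
    "patient_2": {"GENE_B", "GENE_Y"},
    "patient_3": {"GENE_C"},
}
interactions = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_X", "GENE_Z"),
]

groups = candidate_gene_groups(patient_genes, interactions)
```

Here a simple intersection of the patients' genes would come back empty, yet the group centred on `GENE_A` is hit in every patient, each time through a different gene. A realistic tool would also need to ask how often such a group would turn up by chance, which this sketch ignores.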

It’s a difficult task but if we can find these groups they will tell us much about the roles of the individual genes involved and the function they perform together, crucial information if we are to develop better diagnosis and treatment options for these rare diseases. The benefits could reach further than this, however. Understanding how genes work together in simple cases is a great first step towards understanding the genetics of common diseases like diabetes or heart disease, which result not from a single mutation but from sequence variants in tens or hundreds of genes, or even more.

Just don’t ask me to explain that in terms of needles.

Nick Dand
