Wednesday 9 May 2012

Mendelian Randomisation and the Prevention of Spurious Findings

(first published on Nature Network April 19th)

Epidemiology uses statistical methods to investigate patterns in public health. Or, if you believe certain news articles, it sorts every possible influence on your life into things that either cause or cure cancer (or occasionally both). Epidemiologists investigate health outcomes in a number of different ways, ranging from looking at changes across whole populations to conducting randomised controlled trials (RCTs), which test one intervention against another or against a placebo. RCTs are the 'gold standard' of epidemiological research, because their design means they can provide the strongest evidence for or against a hypothesis. Other epidemiological methods involve observation rather than manipulation, and this can be problematic.

One week there may be a paper suggesting vitamin pills protect against heart disease, and the next week evidence emerges that they don't. This is often because of misleading findings from observational studies. People who decide to take vitamin supplements are likely to differ in a number of ways from people who don't: they might lead healthier lifestyles, exercise more, smoke less and eat more vegetables. These differences are likely to affect their chance of getting heart disease. Epidemiologists call such differences confounding variables, and although it's possible to take them into account statistically, the scientist has to know what they are (and measure them) in order to control for them. Leave one confounder out of your analysis and it may inflate, or otherwise distort, the apparent relationship between the two factors you're interested in, leaving you with a spurious finding. One way round this problem is to conduct an RCT, in which people are randomly allocated to the intervention or the comparison group, so that different types of people end up evenly spread across the groups. Although this may be fine for a vitamin pill, it's sometimes neither ethical nor possible. If withholding the intervention being investigated is thought to cause harm, it is unethical to withhold it; and if the 'intervention' is something like drinking alcohol, it's impractical to make people drink or abstain for the sake of an experiment.
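To make the vitamin example concrete, here is a toy simulation. Every number in it is invented for illustration, not taken from any real study: a hidden 'healthy lifestyle' trait lowers heart-disease risk and also makes people more likely to take vitamins, while the pill itself does nothing at all. Comparing takers with non-takers (the observational approach) makes vitamins look protective; allocating the pill by coin toss (the RCT approach) makes the spurious effect disappear.

```python
import random


def simulate_vitamins(n=100_000, randomise=False, seed=1):
    """Toy model with made-up numbers: a hidden 'healthy lifestyle' trait
    (the confounder) lowers heart-disease risk AND makes people more likely
    to take vitamins. The vitamin pill itself has NO effect on risk."""
    rng = random.Random(seed)
    groups = {True: [0, 0], False: [0, 0]}  # takes_vitamin -> [cases, people]
    for _ in range(n):
        healthy = rng.random() < 0.5  # half the population is 'healthy'
        if randomise:
            takes_vitamin = rng.random() < 0.5  # coin toss, as in an RCT
        else:
            # people choose for themselves, and healthy people choose vitamins more
            takes_vitamin = rng.random() < (0.7 if healthy else 0.3)
        risk = 0.05 if healthy else 0.15  # note: vitamins play no part here
        groups[takes_vitamin][0] += rng.random() < risk
        groups[takes_vitamin][1] += 1
    # heart-disease rate among vitamin takers (True) and non-takers (False)
    return {k: cases / people for k, (cases, people) in groups.items()}


print("observational:", simulate_vitamins())
print("randomised:   ", simulate_vitamins(randomise=True))
```

With these made-up numbers, vitamin takers come out at roughly 8% risk against about 12% for non-takers, even though the pill has no effect; with random allocation both groups land at about 10%.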

Investigating alcohol in observational studies has the same problems mentioned above: people who drink are different to people who don't. Socioeconomic status, likelihood of smoking and education level may all affect the relationship you're interested in. There is an added complication: people may have stopped drinking because they are unwell, so instead of drinking affecting the disease, the disease is changing alcohol use. This is called reverse causation, and it may help explain why we hear that a glass of wine is good for you (sorry to be the bearer of bad news).
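Reverse causation can also be illustrated with a toy simulation. In this hypothetical model (all numbers invented), alcohol has no effect on illness whatsoever, but half of the drinkers who fall ill give up drinking. Current drinkers then look healthier than non-drinkers, purely because the sick have moved into the non-drinking group.

```python
import random


def sick_quitters(n=100_000, seed=2):
    """Toy 'sick quitter' model with made-up numbers: illness strikes 20% of
    people regardless of drinking, but being ill makes drinkers likely to stop."""
    rng = random.Random(seed)
    ill_count = {"drinker": 0, "non-drinker": 0}
    total = {"drinker": 0, "non-drinker": 0}
    for _ in range(n):
        ill = rng.random() < 0.2    # alcohol plays no part in this
        drank = rng.random() < 0.6  # 60% used to drink
        # the disease changes alcohol use: half of ill drinkers quit
        quits = drank and ill and rng.random() < 0.5
        status = "drinker" if drank and not quits else "non-drinker"
        ill_count[status] += ill
        total[status] += 1
    # proportion ill among current drinkers and current non-drinkers
    return {s: ill_count[s] / total[s] for s in total}


print(sick_quitters())
```

Here roughly 11% of current drinkers are ill, compared with about 30% of non-drinkers, so alcohol appears protective even though, by construction, it has no effect.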

But advances in the understanding of genetics can help. Everyone has 23 pairs of chromosomes, and coded in these are genes: the building blocks for who we are. My genes are practically identical to yours, but some key genes have different DNA code in different people. Some of these differences can lead to diseases such as cystic fibrosis, which is caused by a faulty gene that fails to make an important protein properly. Others have less extreme, but still very interesting, effects. One gene variant, common in East Asian populations, leaves people with a poorly working version of the enzyme (aldehyde dehydrogenase 2, or ALDH2) needed to break down acetaldehyde, a metabolite of alcohol. People with this variant get unpleasant symptoms when they drink because acetaldehyde builds up in their blood, so they very often avoid alcohol. By looking at someone's DNA at this location, we can investigate the effect of alcohol use on whatever outcome we're interested in, using the gene as a proxy variable instead of directly analysing alcohol intake, since people with the unusual variant will be less likely to drink. All well and good, but surely this is still just observing, so how do we stop confounding from interfering?

It turns out that our genes have some very useful properties which make them perfect for this task. You received your genes at conception, so they can't be altered by your later environment or behaviour, and environmental confounders should therefore be randomly distributed between the gene categories. Also, when your parents' chromosomes divided to create the egg or sperm that carried the genes you inherited, genes were (largely) passed on independently of one another, so whether you inherited the proxy variant is unrelated to whether you inherited any other gene that might act as a confounder. Because of these neat properties, you can assume that your proxy gene will be independent of any confounding variable affecting the exposure you're interested in. So where confounders would be unevenly spread across people grouped by alcohol consumption, they will be randomly distributed across people grouped by gene variant. Results using this technique have provided evidence against alcohol being a 'gateway drug' leading to illicit drug use, as the gene variant was not associated with illicit drug use.
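A minimal sketch of the idea, again on invented data: a hypothetical gene variant carried by 30% of people cuts drinking by three units a week, and an unmeasured confounder pushes up both drinking and the outcome. A naive regression of the outcome on drinking is biased by the confounder. One standard Mendelian Randomisation estimator, the Wald ratio, divides the gene's association with the outcome by its association with the exposure; because the gene is (by assumption) unrelated to the confounder, this recovers the true effect. The variable names and effect sizes are assumptions, and this is a sketch rather than a full analysis (it gives no standard errors, for instance).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate_drinking(true_effect, n=100_000, seed=3):
    """Invented data. G: carries the hypothetical 'flushing' variant (0/1),
    U: unmeasured confounder, X: drinks per week, Y: outcome score."""
    rng = random.Random(seed)
    G, X, Y = [], [], []
    for _ in range(n):
        g = 1 if rng.random() < 0.3 else 0             # 30% carry the variant
        u = rng.gauss(0, 1)                            # independent of g, by assumption
        x = 5 - 3 * g + 2 * u + rng.gauss(0, 1)        # variant cuts drinking; U raises it
        y = true_effect * x + 3 * u + rng.gauss(0, 1)  # U also raises the outcome
        G.append(g)
        X.append(x)
        Y.append(y)
    return G, X, Y


def naive_slope(X, Y):
    """Ordinary least-squares slope of Y on X: confounded by U."""
    return cov(X, Y) / cov(X, X)


def wald_ratio(G, X, Y):
    """Gene-outcome association divided by gene-exposure association."""
    return cov(G, Y) / cov(G, X)


G, X, Y = simulate_drinking(true_effect=0.5)
print(f"naive: {naive_slope(X, Y):.2f}, Wald ratio: {wald_ratio(G, X, Y):.2f}")
```

With a true effect of 0.5, the naive slope comes out around 1.37 while the Wald ratio lands close to 0.5. Setting `true_effect=0` mirrors the gateway-drug reasoning: if drinking does not affect the outcome, the gene shows no association with it, and the ratio sits near zero.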

Of course, nothing's perfect. There are a few conditions under which this technique will fall down, but as long as you're aware of them, you should be able to avoid the problems. Firstly, there is 'linkage disequilibrium': genes that lie close together on the same chromosome tend to be inherited together, so they are not passed on independently. If your proxy gene travels with a gene that affects your outcome of interest, this will bias your findings. Linkage disequilibrium can be measured, though, so you can check that it isn't a problem. Secondly, some genes affect a number of different traits (this is called pleiotropy); if your gene has a direct effect on the outcome you are interested in, and not just an effect through the exposure, it is unsuitable for Mendelian Randomisation. Finally, the technique doesn't work well if there are systematic genetic differences within the population you are investigating (known as population stratification). For example, if a population is made up of two groups of people who used to live separately but now live together, there will be non-random genetic differences between the groups, because people mated within their own group during the time they were separate. Other differences between the groups will then not be randomly distributed across the gene you're interested in, making the population unsuitable.
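The pleiotropy problem can be seen by giving the gene a direct route to the outcome in a toy model (numbers again invented for illustration). The Wald ratio, the gene-outcome association divided by the gene-exposure association, assumes the gene affects the outcome only through the exposure; when that assumption fails, the direct effect gets mixed in with the effect of drinking.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def wald_with_pleiotropy(direct_effect, true_effect=0.5, n=100_000, seed=4):
    """Invented data in which the gene G affects drinking X *and* has its own
    direct effect on the outcome Y, bypassing drinking (pleiotropy)."""
    rng = random.Random(seed)
    G, X, Y = [], [], []
    for _ in range(n):
        g = 1 if rng.random() < 0.3 else 0
        u = rng.gauss(0, 1)                      # unmeasured confounder
        x = 5 - 3 * g + 2 * u + rng.gauss(0, 1)  # gene lowers drinking
        y = true_effect * x + direct_effect * g + 3 * u + rng.gauss(0, 1)
        G.append(g)
        X.append(x)
        Y.append(y)
    # Wald ratio: valid only if G affects Y solely through X, false when direct_effect != 0
    return cov(G, Y) / cov(G, X)


for d in (0.0, 1.5):
    print(f"direct effect {d}: estimate {wald_with_pleiotropy(d):.2f}")
```

With no direct effect the estimate is close to the true 0.5; with a direct effect of 1.5 it drops to about 0, hiding a real effect entirely. A proxy gene in linkage disequilibrium with another gene that affects the outcome would bias the estimate in the same way.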

Although this technique can only be used in very specific circumstances, where a gene is known to affect the exposure you're interested in and doesn't suffer from the limitations mentioned above, it is a really elegant technique which will hopefully stop spurious results from observational studies becoming newspaper fodder.


Check this article out for an overview of MR.
