
The Tufnel Effect


In This Is Spin̈al Tap, British heavy metal god Nigel Tufnel says, in reference to one of his band's less successful creations:

It's such a fine line between stupid and...uh, clever.
This is all too true when it comes to science. You can design a breathtakingly clever experiment, using state of the art methods to address a really interesting and important question. And then at the end you realize that you forgot to type one word when writing the 1,000 lines of software code that runs this whole thing, and as a result, the whole thing's a bust.

It happens all too often. It has happened to me, let me think, three times in my scientific career, and I know of several colleagues who had similar problems. I'm currently struggling to deal with the consequences of someone else's stupid mistake.

Here's my cautionary tale. I once ran an experiment involving giving people a drug or placebo and when I crunched the numbers I found, or thought I'd found, a really interesting effect which was consistent with a lot of previous work giving this drug to animals. How cool is that?

So I set about writing it up and told my supervisor and all my colleagues. Awesome.

About two or three months later, for some reason I decided to reopen the data file, which was in Microsoft Excel, to look something up. I happened to notice something rather odd - one of the experimental subjects, who I remembered by name, was listed with a date-of-birth which seemed wrong: they weren't nearly that old.

Slightly confused - but not worried yet - I looked at all the other names and dates of birth and, oh dear, they were all wrong. But why?

Then it dawned on me and now I was worried: the dates were all correct but they were lined up with the wrong names. In an instant I saw the horrible possibility: mixed-up names would be harmless in themselves, but what if the group assignments (1 = drug, 0 = placebo) were lined up with the wrong results? That would render the whole analysis invalid... and oh dear. They were.

As the temperature of my blood plummeted I got up and lurched over to my filing cabinet where the raw data was stored on paper. It was deceptively easy to correct the mix-up and put the data back together. I re-ran the analysis.

No drug effect.

I checked it over and over. Everything was completely watertight - now. I went home. I didn't eat and I didn't sleep much. The next morning I broke the news to my supervisor. Writing that email was one of the hardest things I've ever done.

What happened? As mentioned I had been doing all the analysis in Excel. Excel is not a bad stats package and it's very easy to use but the problem is that it's too easy: it just does whatever you tell it to do, even if this is stupid.

In my data as in most people's, each row was one sample (i.e. a person) and each column was a piece of info. What happened was that I'd tried to take all the data, which was in no particular order, and reorder the rows alphabetically by subject name to make it easier to read.

How could I screw that up? Well, by trying to select "all the data" but actually only selecting a few of the columns. Then I reordered them, but not the others, so all the rows became mixed up. And the crucial column, drug=1 placebo=0, was one of the ones I reordered.
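If you keep your data in code rather than a spreadsheet, the same trap (and its fix) is easy to demonstrate. Here's a minimal sketch in Python with pandas - not what I was using at the time, and the names and numbers are invented - showing how sorting only some of the columns scrambles the rows, while sorting the whole table keeps each subject's data together:

```python
import pandas as pd

# Toy dataset: one row per subject, invented names.
df = pd.DataFrame({
    "name":  ["Smith", "Adams", "Jones"],
    "group": [1, 0, 1],            # 1 = drug, 0 = placebo
    "score": [42, 37, 55],
})

# The Excel-style mistake: sort only a subset of the columns.
# Names and group codes get reordered, but the scores stay where they were,
# so every row now mixes data from different subjects.
broken = df.copy()
broken[["name", "group"]] = (
    broken[["name", "group"]].sort_values("name").to_numpy()
)
print(broken)

# The safe version: sort the whole table, so entire rows move together.
fixed = df.sort_values("name").reset_index(drop=True)
print(fixed)
```

The crucial difference is whether the sort operates on whole rows or on a slice of the columns - exactly the distinction Excel happily lets you ignore.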

The immediate lesson I learned from this was: don't use Excel, use SPSS, which simply does not allow you to reorder only some of the data. Actually, I still use Excel for making graphs and figures but every time I use it, I think back to that terrible day.

The broader lesson though is that if you're doing something which involves 100 steps, it only takes 1 mistake to render the other 99 irrelevant. This is true in all fields but I think it's especially bad in science, because mistakes can so easily go unnoticed due to the complexity of the data, and the consequences are severe because of the long time-scale of scientific projects.


Here's what I've learned: Look at your data, every step of the way, and look at your methods, every time you use them. If you're doing a neuroimaging study, the first thing you do after you collect the brain scans is to open them up and just look at them. Do they look sensible?

Analyze your data as you go along. Every time some new results come in, put them into your data table and just look at them. Make a graph which just shows absolutely every number all on one massive, meaningless line from Age to Cigarettes Smoked Per Week to EEG Alpha Frequency At Time 58. For every subject. Get to know the data. That way if something weird happens to it, you'll know. Don't wait until the end of the study to do the analysis. And don't rely on just your own judgement - show your data to other experts.
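For what it's worth, that "massive, meaningless line" plot is trivial to make in code. Here's a rough sketch in Python with pandas and matplotlib, using invented variable names: each subject becomes one line running across every column, which is useless as a figure but surprisingly good at making a scrambled or outlying row jump out:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_subjects = 20

# Fake data table: one row per subject, invented variable names.
df = pd.DataFrame({
    "age":              rng.normal(35, 10, n_subjects),
    "cigs_per_week":    rng.poisson(15, n_subjects),
    "eeg_alpha_t58":    rng.normal(10, 1, n_subjects),
    "reaction_time_ms": rng.normal(450, 60, n_subjects),
})

# Put every variable on a common scale (z-scores), then draw one grey line
# per subject across all the columns.
z = (df - df.mean()) / df.std()
plt.plot(z.T.to_numpy(), color="grey", alpha=0.6)   # one line per subject
plt.xticks(range(len(df.columns)), df.columns, rotation=45)
plt.ylabel("z-score")
plt.tight_layout()
plt.show()
```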

Check and recheck your methods as you go along. If you're running, say, a psychological experiment involving showing people pictures and getting them to push buttons, put yourself in the hot seat and try it on yourself. Not just once, but over and over. Some of the most insidious problems with these kinds of studies will go unnoticed if you only look at the task once - such as the old "randomized"-stimuli-that-aren't-random issue.

Trust no-one. This sounds bad, but it's not. Don't rely on anyone else's work, in experimental design or data analysis, until you've checked it yourself. This doesn't mean you're assuming they're stupid, because everyone makes these mistakes. It just means you're assuming they're human like you.

Finally, if the worst happens and you discover a stupid mistake in your own work: admit it. It feels like the end of the world when this happens, but it's not. However, if you don't admit it, or even worse, start fiddling other results to cover it up - that's misconduct, and if you get caught doing that, it is the end of the world, or your career, at any rate.

The Brain's Sarcasm Centre? Wow, That's Really Useful

A team of Japanese scientists have found the most sarcastic part of the brain known to date. They also found the metaphor centre of the brain and, well, it's kind of like a pair of glasses.

The paper is Distinction between the literal and intended meanings of sentences and it's brought to you by Uchiyama et al. They took 20 people and used fMRI to record neural activity while the volunteers read 4 kinds of statements:

  • Literally true
  • Nonsensical
  • Sarcastic
  • Metaphorical
The neat thing was that the statements themselves were the same in each case. The preceding context determined how they were to be interpreted. So for example, the statement "It was bone-breaking" was literally true when it formed part of a story about someone in hospital describing an accident; it was metaphorical in the context of someone describing how hard it was to do something difficult; and it was nonsensical if the context was completely unrelated ("He went to the bar and ordered:...").

Here's what they found. Compared to the literally-true and the nonsensical statements, which served as control conditions, metaphorical statements activated the head of the caudate nucleus, the thalamus, and an area of the medial PFC they dub the "arMPFC" but which other people might call the pgACC or something even more exotic; names get a bit vague in the frontal lobe.


The caudate nucleus, as I said, looks like a pair of glasses. Except without the nose bit. The area activated by metaphors was the "lenses". Kind of.

Sarcasm however activated the same mPFC region, but not the caudate:

Sarcasm also activated the amygdala.

*

So what? This is a very nice fMRI study. 20 people is a lot, the task was well-designed and the overlap of the mPFC blobs in the sarcasm-vs-control and the metaphor-vs-control tasks was impressive. There's clearly something going on there in both cases, relative to just reading literal statements. Something's going on in the caudate and thalamus with metaphor but not sarcasm, too.

But what can this kind of study tell us about the brain? They've localized something-about-metaphor to the caudate nucleus, but what is it, and what does the caudate actually do to make that thing happen?

The authors offer a suggestion - the caudate is involved in "searching for the meaning" of the metaphorical statement in order to link it to the context, and work out what the metaphor is getting at. This isn't required for sarcasm because there's only one, literal, meaning - it's just reversed: the speaker actually thinks the exact opposite. With both sarcasm and metaphor, though, you need to attribute intentions (mentalizing or "Theory of Mind").

That's as plausible an account as any but the problem is that we have no way of knowing, at least not from imaging studies, if it's true or not. As I said this is not the fault of this study but rather an inherent challenge for the whole enterprise. The problem is - switch on your caudate, metaphor coming up - a lot like the challenge facing biology in the aftermath of the Human Genome Project.

The HGP mapped the human genome, and like any map it told us where stuff is, in this case where genes are on chromosomes. You can browse it here. But by itself this didn't tell us anything about biology. We still have to work out what most of these genes actually do; and then we have to work out how they interact; and then we have to work out how those interactions interact with other genes and the environment...

Genomics people call this, broadly speaking, "annotating" the genome, although this is not perhaps an ideal term because it's not merely scribbling notes in the margins, it's the key to understanding. Without annotation, the genome's just a big list.

fMRI is building up a kind of human localization map, a blobome if you will, but by itself this doesn't really tell us much; other tools are required.

Uchiyama HT, Saito DN, Tanabe HC, Harada T, Seki A, Ohno K, Koeda T, & Sadato N (2011). Distinction between the literal and intended meanings of sentences: A functional magnetic resonance imaging study of metaphor and sarcasm. Cortex. PMID: 21333979

Fat Genes Make You Happy?

Does being heavier make you happier?

An interesting new paper from a British/Danish collaboration uses a clever trick based on genetics to untangle the messy correlation between obesity and mental health.

They had a huge sample (53,221 people) from Copenhagen, Denmark. The study measured people's height and weight to calculate their BMI, and asked them some simple questions about their mood, such as "Do you often feel nervous or stressed?"

Many previous studies have found that being overweight is correlated with poor mental health, or at least with unhappiness ("psychological distress"). And this was exactly what the authors found in this study, as well.

Being very underweight was also correlated with distress; perhaps these were people with eating disorders or serious medical illnesses. But if you set that small number of people aside, there was a nice linear correlation between BMI and unhappiness. When they controlled for various other variables like income, age, and smoking, the effect of BMI became smaller but it was still significant.

But that's just a correlation, and as we all know, "correlation doesn't imply causation". Actually, it does; something must be causing the correlation, it didn't just magically appear out of nowhere. The point is that we shouldn't make simplistic assumptions about what the causal direction is.

It would be easy to make these assumptions. Maybe being miserable makes you fat, due to comfort eating. Or maybe being fat makes you miserable, because overweight is considered bad in our society. Or both. Or neither. We don't know.

Finding this kind of correlation and then speculating about it is where a lot of papers finish, but for these authors, it was just the start. They genotyped everyone for two different genetic variants known, from lots of earlier work, to consistently affect body weight (FTO rs9939609 and MC4R rs17782313).

They confirmed that they were indeed associated with BMI; no surprise there. But here's the surprising bit: the "fat" variants of each gene were associated with less psychological distress. The effects were very modest, but then again, their effects on weight are small too (see the graph above; the effects are in terms of z scores and anything below 0.3 is considered "small".)

The picture was very similar for the other gene.

This allows us to narrow down the possibilities about causation. Being depressed clearly can't change your genotype. Nothing short of falling into a nuclear reactor can change your genotype. It also seems unlikely that genotype was correlated with something else which protects against depression. That's not impossible; it's the problem of population stratification, and it's a serious issue with multi-ethnic samples, but this paper only included white Danish people.

So the authors' conclusion is that being slightly heavier causes you to be slightly happier, even though overall, weight is strongly correlated with being less happy. This seems paradoxical, but that's what the data show.
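The logic of this kind of Mendelian randomization analysis (that's the term in the paper's title) can be illustrated with a toy simulation. The sketch below uses Python, entirely made-up numbers, and the standard Wald ratio estimator rather than whatever the authors actually fitted: a hidden confounder makes BMI and distress positively correlated even though the true causal effect of BMI on distress is negative, and the genotype - which shifts BMI but is independent of the confounder - recovers that negative effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

# Genotype: number of "fat" alleles (0, 1 or 2), handed out at random,
# so it is independent of everything except through its effect on BMI.
g = rng.binomial(2, 0.4, n)

# Hidden confounder (illness, adversity...) that raises BOTH BMI and distress.
u = rng.normal(0, 1, n)

# BMI depends on genotype and on the confounder.
bmi = 25 + 0.5 * g + 2.0 * u + rng.normal(0, 2, n)

# True causal effect of BMI on distress is slightly NEGATIVE (-0.05 per unit),
# but the confounder pushes distress up, swamping it in the raw correlation.
distress = -0.05 * bmi + 1.5 * u + rng.normal(0, 1, n)

# Naive observational slope: comes out positive, thanks to confounding.
naive = np.cov(bmi, distress)[0, 1] / np.var(bmi, ddof=1)

# Wald ratio: (gene -> outcome effect) divided by (gene -> exposure effect).
wald = np.cov(g, distress)[0, 1] / np.cov(g, bmi)[0, 1]

print(f"naive BMI->distress slope: {naive:+.3f}")  # positive, like the raw data
print(f"IV (Wald ratio) estimate:  {wald:+.3f}")   # close to the true -0.05
```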

That conclusion would fall apart, though, if these genes directly affect mood, and also, separately, make you fatter. The authors argue that this is unlikely, but I wonder. Both FTO and MC4R are active in the brain: they influence weight by making you eat more. If they can affect appetite, they might also affect mood. A quick PubMed search only turns up a couple of rather speculative papers about MC4R and its possible links to mood, so there's no direct evidence for this, but we can't rule it out.

But this paper is still an innovative and interesting attempt to use genetics to help get beneath the surface of complex correlations. It doesn't explain the observed correlation between BMI and unhappiness - it actually makes it more mysterious. But that's a whole lot better than just speculating about it.

Lawlor DA, Harbord RM, Tybjaerg-Hansen A, Palmer TM, Zacho J, Benn M, Timpson NJ, Smith GD, & Nordestgaard BG (2011). Using genetic loci to understand the relationship between adiposity and psychological distress: a Mendelian Randomization study in the Copenhagen General Population Study of 53,221 adults. Journal of Internal Medicine. PMID: 21210875

The Scanner's Prayer

MRI scanners have revolutionized medicine and provided neuroscientists with some incredible tools for exploring the brain.

But that doesn't mean they're fun to use. They can be annoying, unpredictable beings, and you never know whether they're going to bless you with nice results or curse you with cancelled scans and noisy data.

So for the benefit of everyone who has to work with MRI, here is a devotional litany which might just keep your scanner from getting wrathful at the crucial moment. Say this before each scan. Just remember, the magnet is always on and it can read your mind, so make sure you really mean it, and refrain from scientific sins...

*

Our scanner, which art from Siemens,
Hallowed be thy coils.
Thy data come;
Thy scans be done;
In grey matter as it is in white matter.
Give us this day our daily blobs.
And forgive us our trespasses,
As we forgive them that trespass onto our scan slots.
And lead us not into the magnet room carrying a pair of scissors,
But deliver us from volunteers who can’t keep their heads still.
For thine is the magnet,
The gradients,
And the headcoil,
For ever and ever (at least until we can afford a 7T).
Amen.

(Apologies to Christians).

England Rules the (Brain) Waves

Yes, England has finally won something. After a poor showing in the 2010 World Cup, the Eurovision Song Contest, and the global economic crisis, we're officially #1 in neuroscience. Which clearly is the most important measure of a nation's success.

According to data collated by ScienceWatch.com and released recently, each English neuroscience paper from the past 10 years has been cited, on average, 24.53 times, making us the most cited country in the world relative to the total number of papers published (source here). We're second only to the USA in terms of overall citations.

(In this table, "Rank" refers to total number of citations).

Why is this? I suspect it owes a lot to the fact that England has produced many of the technical papers which everyone refers to (although few people have ever read). Take the paper Dynamic Causal Modelling by Karl Friston et al from London. It's been cited 649 times since 2003, because it's the standard reference for the increasingly popular fMRI technique of the same name.

Or take Ashburner and Friston's Voxel-Based Morphometry—The Methods, cited over 2000 times in the past 10 years, which introduced a method for measuring the size of different brain regions. Or take...most of Karl Friston's papers, actually. He's the single biggest contributor to the way in which modern neuroimaging is done.

The Tree of Science

How do you know whether a scientific idea is a good one or not?


The only sure way is to study it in detail and know all the technical ins and outs. But good ideas and bad ideas behave differently over time, and this can provide clues as to which ones are solid; useful if you're a non-expert trying to evaluate a field, or a junior researcher looking for a career.

Today's ideas are the basis for tomorrow's experiments. A good idea will lead to experiments which provide interesting results, generating new ideas, which will lead to more experiments, and so on.

Before long, it will be taken for granted that it's true, because so many successful studies assumed it was. The mark of a really good idea is not that it's always being tested and found to be true; it's that it's an unstated assumption of studies which could only work if it were true. Good ideas grow onwards and upwards, in an expanding tree, with each exciting new discovery becoming the boring background of the next generation.

Astronomers don't go around testing whether light travels at a finite speed as opposed to an infinite one; rather, if it were infinite, their whole set-up would fail.

Bad ideas generate experiments too, but they don't work out. The assumptions are wrong. You try to explain why something happens, and you find that it doesn't happen at all. Or you come up with an "explanation", but next time, someone comes along and finds evidence suggesting the "true" explanation is the exact opposite.

Unfortunately, some bad ideas stick around, for political or historical reasons or just because people are lazy. What tends to happen is that these ideas are, ironically, more "productive" than good ideas: they are always giving rise to new hypotheses. It's just that these lines of research peter out eventually, meaning that new ones have to take their place.

As an example of a bad idea, take the theory that "vaccines cause autism". This hypothesis is, in itself, impossible to test: it's too vague. Which vaccines? How do they cause autism? What kind of autism? In which people? How often?

The basic idea that some vaccines, somewhere, somehow, cause some autism, has been very productive. It's given rise to a great many, testable, ideas. But every one which has been tested has proven false.

First there was the idea that the MMR vaccine causes autism, linked to a "leaky gut" or "autistic enterocolitis". It doesn't, and it's not linked to that. Then along came the idea that actually it's mercury preservatives in vaccines that cause autism. They don't. No problem - maybe it's aluminium? Or maybe it's just the Hep B vaccine? And so on.

At every turn, it's back to square one after a few years, and a new idea is proposed. "We know this is true; now we just need to work out why and how...". Except that turns out to be tricky. Hmm. Maybe, if you keep ending up back at square one, you ought to find a new square to start from.

Worst. Antidepressant. Ever.

Reboxetine is an antidepressant. Except it's not, because it doesn't treat depression.

This is the conclusion of a much-publicized article just out in the BMJ: Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and SSRI controlled trials.

Reboxetine was introduced to some fanfare, because its mechanism of action is unique - it's a selective norepinephrine reuptake inhibitor (NRI), which has no effect on serotonin, unlike Prozac and other newer antidepressants. Several older tricyclic antidepressants were NRIs, but they weren't selective because they also blocked a shed-load of receptors.

So in theory reboxetine treats depression while avoiding the side effects of other drugs, but last year, Cipriani et al in a headline-grabbing meta-analysis concluded that in fact it's the exact opposite: reboxetine was the least effective new antidepressant, and was also one of the worst in terms of side effects. Oh dear.

And that was only based on the published data. It turns out that Pfizer, the manufacturers of reboxetine, had chosen to not publish the results of most of their clinical trials of the drug, because the data showed that it was crap.

The new BMJ paper includes these unpublished results - it took an inordinate amount of time and pressure to make Pfizer agree to share them, but they eventually did - and we learn that reboxetine is:

  • no more effective than a placebo at treating depression.
  • less effective than SSRIs, which incidentally are better than placebo in this dataset (a bit).
  • worse tolerated than most SSRIs, and much worse tolerated than placebo.
The one faint glimmer of hope that it's not a complete dud was that it did seem to work better than placebo in depressed inpatients. However, this could well have been a fluke, because the numbers involved were tiny: there was one trial showing a humongous benefit in inpatients, but it only had a total of 52 people.

Claims that reboxetine is dangerous on the basis of this study are a bit misleading - it may be, but there was no evidence for that in these data. It caused nasty and annoying side-effects, but that's not the same thing, because if you don't like side-effects, you could just stop taking it (which is what many people in these trials did).

Anyway, what are the lessons of this sorry tale, beyond reboxetine being rubbish? The main one is: we have to start forcing drug companies and other researchers to publish the results of clinical trials, whatever the results are. I've discussed this previously and suggested one possible way of doing that.

The situation regarding publication bias is far better than it was 10 years ago, thanks to initiatives such as clinicaltrials.gov; almost all of the reboxetine trials were completed before the year 2000; if they were run today, it would have been much harder to hide them, but still not impossible, especially in Europe. We need to make it impossible, everywhere, now.

The other implication is, ironically, good news for antidepressants - well, except reboxetine. The existence of reboxetine, a drug which has lots of side effects, but doesn't work, is evidence against the theory (put forward by Joanna Moncrieff, Irving Kirsch and others) that even the antidepressants that do seem to work, only work because of active placebo effects driven by their side effects.

So if the active placebo theory were right, reboxetine, having more side effects than the SSRIs, ought to have worked at least as well - but actually it worked worse. This is by no means the nail in the coffin of the active placebo hypothesis but it is, to my mind, quite convincing.

Link: This study also blogged by Good, Bad and Bogus.

Eyding, D., Lelgemann, M., Grouven, U., Harter, M., Kromp, M., Kaiser, T., Kerekes, M., Gerken, M., & Wieseler, B. (2010). Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials. BMJ, 341. DOI: 10.1136/bmj.c4737

Marc Hauser's Scapegoat?

The dust is starting to settle after the Hauser-gate scandal which rocked psychology a couple of weeks back.

Harvard Professor Marc Hauser has been investigated by a faculty committee and the verdict was released on the 20th August: Hauser was "found solely responsible... for eight instances of scientific misconduct." He's taking a year's "leave", his future uncertain.

Unfortunately, there has been no official news on what exactly the misconduct was, and how much of Hauser's work is suspect. According to Harvard, only three publications were affected: a 2002 paper in Cognition, which has been retracted; a 2007 paper which has been "corrected" (see below), and another 2007 Science paper, which is still under discussion.

But what happened? Cognition editor Gerry Altmann writes that he was given access to some of the Harvard internal investigation. He concludes that Hauser simply invented some of the crucial data in the retracted 2002 paper.

Essentially, some monkeys were supposed to have been tested on two conditions, X and Y, and their responses were videotaped. The difference in the monkeys' behaviour between the two conditions was the scientifically interesting outcome.

In fact, the videos of the experiment showed them being tested only on condition X. There was no video evidence that condition Y was even tested. The "data" from condition Y, and by extension the differences, were, apparently, simply made up.

If this is true, it is, in Altmann's words, "the worst form of academic misconduct." As he says, it's not quite a smoking gun: maybe tapes of Y did exist, but they got lost somehow. However, this seems implausible. If so, Hauser would presumably have told Harvard so in his defence. Yet they found him guilty - and Hauser retracted the paper.

So it seems that either Hauser never tested the monkeys on condition Y at all, and just made up the data, or he did test them, saw that they weren't behaving the "right" way, deleted the videos... and just made up the data. Either way it's fraud.

Was this a one-off? The Cognition paper is the only one that's been retracted. But another 2007 paper was "replicated", with Hauser & a colleague recently writing:

In the original [2007] study by Hauser et al., we reported videotaped experiments on action perception with free ranging rhesus macaques living on the island of Cayo Santiago, Puerto Rico. It has been discovered that the video records and field notes collected by the researcher who performed the experiments (D. Glynn) are incomplete for two of the conditions.
Luckily, Hauser said, when he and a colleague went back to Puerto Rico and repeated the experiment, they found "the exact same pattern of results" as originally reported. Phew.

This note, however, was sent to the journal in July, several weeks before the scandal broke - back when Hauser's reputation was intact. Was this an attempt by Hauser to pin the blame on someone else - David Glynn, who worked as a research assistant in Hauser's lab for three years, and has since left academia?

As I wrote in my previous post:
Glynn was not an author on the only paper which has actually been retracted [the Cognition 2002 paper that Altmann refers to]... according to his resume, he didn't arrive in Hauser's lab until 2005.
Glynn cannot possibly have been involved in the retracted 2002 paper. And Harvard's investigation concluded that Hauser was "solely responsible", remember. So we're to believe that Hauser, guilty of misconduct, was himself an innocent victim of some entirely unrelated mischief in 2007 - but that it was all OK in the end, because when Hauser checked the data, it was fine.

Maybe that's what happened. I am not convinced.

Personally, if I were David Glynn, I would want to clear my name. He's left science, but still, a letter to a peer reviewed journal accuses him of having produced "incomplete video records and field notes", which is not a nice thing to say about someone.

Hmm. On August 19th, the Chronicle of Higher Education ran an article about the case, based on a leaked Harvard document. They say that "A copy of the document was provided to The Chronicle by a former research assistant in the lab who has since left psychology."

Hmm. Who could blame them for leaking it? It's worth remembering that it was a research assistant in Hauser's lab who originally blew the whistle on the whole deal, according to the Chronicle.

Apparently, what originally rang alarm bells was that Hauser appeared to be reporting monkey behaviours which had never happened, according to the video evidence. So at least in that case, there were videos, and it was the inconsistency between Hauser's data and the videos that drew attention. This is what makes me suspect that maybe there were videos and field notes in every case, and the "inconvenient" ones were deleted to try to hide the smoking gun. But that's just speculation.

What's clear is that science owes the whistle-blowing research assistant, whoever it is, a huge debt.

Hauser Of Cards

Update: Lots of stuff has happened since I wrote this post: see here for more.

A major scandal looks to be in progress involving Harvard Professor Marc Hauser, a psychologist and popular author whose research on the minds of chimpanzees and other primates is well-known and highly respected. The Boston Globe has the scoop and it's well worth a read (though you should avoid reading the comments if you react badly to stupid.)

Hauser's built his career on detailed studies of the cognitive abilities of non-human primates. He's generally argued that our closest relatives are smarter than people had previously believed, with major implications for evolutionary psychology. Now one of his papers has been retracted, another has been "corrected" and a third is under scrutiny. Hauser has also announced that he's taking a year off from his position at Harvard.

It's not clear what exactly is going on, but the problems seem to centre around videotapes of the monkeys that took part in Hauser's experiments. The story begins with a 2007 paper published in Proceedings of the Royal Society B. That paper has just been amended in a statement that appeared in the same journal last month:

In the original study by Hauser et al., we reported videotaped experiments on action perception with free ranging rhesus macaques living on the island of Cayo Santiago, Puerto Rico. It has been discovered that the video records and field notes collected by the researcher who performed the experiments (D. Glynn) are incomplete for two of the conditions.
The authors of the original paper were Hauser, David Glynn and Justin Wood. In the amendment, which is authored by Hauser and Wood, i.e. not Glynn, they say that upon discovering the issues with Glynn's data, they went back to Puerto Rico, did the studies again, and confirmed that the original results were valid. Glynn left academia in 2007, to work for a Boston company, Innerscope Research, according to this online resume.

If that was the whole of the scandal it wouldn't be such a big deal, but according to the Boston Globe, that was just the start. David Glynn was also an author on a second paper which is now under scrutiny. It was published in Science in 2007, with the authors listed as Wood, Glynn, Brenda Phillips and Hauser.

However, crucially, Glynn was not an author on the only paper which has actually been retracted, "Rule learning by cotton-top tamarins". This appeared in the journal Cognition in 2002. The three authors were Hauser, Daniel Weiss and Gary Marcus. David Glynn wasn't mentioned in the acknowledgements section either, and according to his resume, he didn't arrive in Hauser's lab until 2005.

So the problem, whatever it is, is not limited to Glynn.

Nor was Glynn an author on the final paper mentioned in the Boston Globe, a 1995 article by Hauser, Kralik, Botto-Mahan, Garrett, and Oser. Note that the Globe doesn't say that this paper is formally under investigation, but rather, that it was mentioned in an interview by researcher Gordon G. Gallup who says that when he viewed the videotapes of the monkeys from that study, he didn't observe the behaviours which Hauser et al. said were present. Gallup is famous for his paper "Does Semen Have Antidepressant Properties?" in which he examined the question of whether semen... oh, guess.

The crucial issue for scientists is whether the problems are limited to the three papers that have so far been officially investigated or whether it goes further: that's an entirely open question right now.

In Summary: We don't know what is going on here and it would be premature to jump to conclusions. However, the only author who appears on all of the papers known to be under scrutiny, is Marc Hauser himself.

Hauser MD, Weiss D, & Marcus G (2002). Rule learning by cotton-top tamarins. Cognition, 86 (1). PMID: 12208654

Hauser MD, Glynn D, & Wood J (2007). Rhesus monkeys correctly read the goal-relevant gestures of a human agent. Proceedings. Biological sciences / The Royal Society, 274 (1620), 1913-8 PMID: 17540661

Wood JN, Glynn DD, Phillips BC, & Hauser MD (2007). The perception of rational, goal-directed action in nonhuman primates. Science (New York, N.Y.), 317 (5843), 1402-5 PMID: 17823353

Hauser MD, Kralik J, Botto-Mahan C, Garrett M, & Oser J (1995). Self-recognition in primates: phylogeny and the salience of species-typical features. Proceedings of the National Academy of Sciences of the United States of America, 92 (23), 10811-14 PMID: 7479889

DSM-V: Change We Can Believe In?

So the draft of DSM-V is out.

If, as everyone says, the Diagnostic and Statistical Manual is the Bible of Psychiatry, I'm not sure why it gets heavily edited once every ten years or so. Perhaps the previous versions are a kind of Old Testament, and only the current one represents the New Revelation from the gods of the mind?

Mind Hacks has an excellent summary of the proposed changes. Bear in mind that the book won't be released until 2013. Some of the headlines:

  • Asperger's Syndrome is out - everyone's going to have an "autistic spectrum disorder" now.
  • Personality Disorders are out - kind of. In their place, there are 5 Personality Disorder Types, each of which you can have to varying degrees, and also 6 Personality Traits, each of which you can have to varying degrees.
  • Hypoactive Sexual Desire Disorder - the disease which failed-antidepressant-turned-aphrodisiac flibanserin is supposed to treat - is out, to be replaced by Sexual Interest and Arousal Disorder.
  • Binge Eating Disorder, Hypersexuality Disorder, and Gambling Addiction are in. Having Fun is not a disorder yet, but that's on the agenda for DSM-VI.
More important, at least in theory, are the Structural, Cross-Cutting, and General Classification Issues. This is where the grand changes to the whole diagnostic approach happen. But it turns out they're pretty modest. First up, the Axis system, by which most disorders were "Axis I", personality disorders "Axis II", and other medical illnesses "Axis III", is to be abolished - everything will be on a single Axis from now on. This will have little, if any, practical effect, but will presumably make it easier on whoever it is that has to draw up the contents page of the book.

Excitingly, "dimensional assessments" have been added... but only in a limited way. Some people have long argued that having categorical diagnoses - "schizophrenia", "bipolar disorder", "major depression" etc. - is a mistake, since it forces psychiatrists to pigeon-hole people, and that we should stop thinking in terms of diagnoses and just focus on symptoms: if someone's depressed, say, then treat them for depression, but don't diagnose them with "major depressive disorder".

DSM-V hasn't gone this far - the categorical diagnoses remain in most cases (the exception is Personality Disorders, see above). However, new dimensional assessments have been proposed, which are intended to complement the diagnoses, and some of them will be "cross-cutting" i.e. not tied to one particular diagnosis. See for example here for a cross-cutting questionnaire designed to assess common anxiety, depression and substance abuse symptoms.

Finally, the concept of "mental disorder" is being redefined. In DSM-V a mental disorder is (drumroll)...
A. A behavioral or psychological syndrome or pattern that occurs in an individual

B. The consequences of which are clinically significant distress (e.g., a painful symptom) or disability (i.e., impairment in one or more important areas of functioning)

C. Must not be merely an expectable response to common stressors and losses...

D. That reflects an underlying psychobiological dysfunction

E. That is not primarily a result of social deviance or conflicts with society
The main change here is that now it's all about "psychobiological dysfunction", whereas in DSM-IV, it was about "behavioral, psychological, or biological dysfunction". Hmm. I am not sure what this means, if anything.

But read on, and we find something rather remarkable...
J. When considering whether to add a mental/psychiatric condition to the nomenclature, or delete a mental/psychiatric condition from the nomenclature, potential benefits (for example, provide better patient care, stimulate new research) should outweigh potential harms (for example, hurt particular individuals, be subject to misuse)
This all sounds very nice and sensible. Diagnoses should be helpful, not harmful, right?

No. Diagnoses should be true. The whole point of the DSM is that it's supposed to be an accurate list of the mental diseases that people can suffer from. The diagnoses are in there because they are, in some sense, real, objectively-existing disorders, or at least because the American Psychiatric Association thinks that they are.

This seemingly-innocuous paragraph seems to be an admission that, in fact, disorders are added or subtracted for reasons which have little to do with whether they really, objectively exist or not. This is what's apparently happened in the case of Temper Dysregulation Disorder with Dysphoria (TDDD), a new childhood disorder.

TDDD has been proposed in order to reduce the number of children being diagnosed with pediatric bipolar disorder. The LA Times quote a psychiatrist on the DSM-V team:
The diagnosis of bipolar [in children] "is being given, we believe, too frequently," said Dr. David Shaffer, a member of the work group on disorders in childhood and adolescence. In reality, when such children are tracked into adulthood, very few of them turn out to be bipolar, he said.
And the DSM-V website has a lengthy rationale for TDDD, to the same effect.

Now, many people agree that pediatric bipolar is being over-diagnosed. As I've written before, pediatric bipolar was considered to be a vanishingly rare disease until about 10 years ago - and it still is, pretty much everywhere outside the USA.

So we can all sympathize with the sentiment behind TDDD - but this is fighting fire with fire. Is the only way to stop kids getting one diagnosis, to give them another one? Should we really be creating diagnoses for more or less "strategic" purposes? When the time comes for DSM-VI, and the fashion for "pediatric bipolar" has receded, will TDDD get deleted as no longer necessary? What will happen to all the "TDDD" kids then?

Can't we just decide to diagnose people less? Apparently, that would be a rather too radical change...

Deconstructing the Placebo

Last month, Wired announced that Placebos Are Getting More Effective. Drugmakers Are Desperate to Know Why.

The article's a good read, and the basic story is true, at least in the case of psychiatric drugs. In clinical trials, people taking placebos do seem to get better more often now than in the past (paper). This is a big problem for Big Pharma, because it means that experimental new drugs often fail to perform better than placebo, i.e. they don't work. Wired have just noticed this, but it has been discussed in the academic literature for several years.

Why is this? No-one knows. There have been many suggestions - maybe people "believe in" the benefits of drugs more nowadays, so the placebo effect is greater; maybe clinical trials are recruiting people with milder illnesses that respond better to placebo, or just get better on their own. But we really don't have any clear idea.

What if the confusion is because of the very concept of the "placebo"? Earlier this year, the BMJ ran a short opinion piece called It’s time to put the placebo out of our misery. Robin Nunn wants us to "stop thinking in terms of placebo...The placebo construct conceals more than it clarifies."

His central argument is an analogy. If we knew nothing about humour and observed a comedian telling jokes to an audience, we might decide there was a mysterious "audience effect" at work, and busy ourselves studying it...
Imagine that you are a visitor from another world. You observe a human audience for the first time. You notice a man making vocal sounds. He is watched by an audience. Suddenly they burst into smiles and laughter. Then they’re quiet. This cycle of quietness then laughter then quietness happens several times.

What is this strange audience effect? Not all of the man’s sounds generate an audience effect, and not every audience member reacts. You deem some members of the audience to be “audience responders,” those who are particularly influenced by the audience effect. What makes them react? A theory of the audience effect could be spun into an entire literature analogous to the literature on the placebo effect.
But what we should be doing is examining the details of jokes and of laughter -
We could learn more about what makes audiences laugh by returning to fundamentals. What is laughter? Why is “fart” funnier than “flatulence”? Why are some people just not funny no matter how many jokes they try?
And this is what we should be doing with the "placebo effect" as well -
Suppose there is no such unicorn as a placebo. Then what? Just replace the thought of placebo with something more fundamental. For those who use placebo as treatment, ask what is going on. Are you using the trappings of expertise, the white coat and diploma? Are you making your patients believe because they believe in you?
Nunn's piece is a polemic and he seems to conclude by calling for a "post-placebo era" in which there will be no more placebo-controlled trials (although it's not clear what he means by this). This is going too far. But his analogy with humour is an important one because it forces us to analyse the placebo in detail.

"The placebo effect" has become a vague catch-all term for anything that seems to happen to people when you give them a sugar pill. Of course, lots of things could happen. They could feel better just because of the passage of time. Or they could realize that they're supposed to feel better and say they feel better, even if they don't.

The "true" placebo effect refers to improvement (or worsening) of symptoms driven purely by the psychological expectation of such. But even this is something of a catch-all term. Many things could drive this improvement. Suppose you give someone a placebo pill that you claim will make them more intelligent, and they believe it.

Believing themselves to be smarter, they start doing smart things like crosswords, math puzzles, reading hard books (or even reading Neuroskeptic), etc. But the placebo itself was just a nudge in the right direction. Anything which provided that nudge would also have worked - and the nudge itself can't take all the credit.

The strongest meaning of the "placebo effect" is a direct effect of belief upon symptoms. You give someone a sugar pill or injection, and they immediately feel less pain, or whatever. But even this effect encompasses two kinds of things. It's one thing if the original symptoms have a "real" medical cause, like a broken leg. But it's another thing if the original symptoms are themselves partially or wholly driven by psychological factors, i.e. if they are "psychosomatic".

If a placebo treats a "psychosomatic" disease, then that's not because the placebo has some mysterious, mind-over-matter "placebo effect". All the mystery, rather, lies with the psychosomatic disease. But this is a crucial distinction.

People seem more willing to accept the mind-over-matter powers of "the placebo" than they are to accept the existence of psychosomatic illness. As if only doctors with sugar pills possess the power of suggestion. If a simple pill can convince someone that they are cured, surely the modern world in all its complexity could convince people that they're ill.

[BPSDB]

Nunn, R. (2009). It's time to put the placebo out of our misery. BMJ, 338. DOI: 10.1136/bmj.b1568

fMRI Gets Slap in the Face with a Dead Fish

A reader drew my attention to this gem from Craig Bennett, who blogs at prefrontal.org:

Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction

This is a poster presented by Bennett and colleagues at this year's Human Brain Mapping conference. It's about fMRI scanning on a dead fish, specifically a salmon. They put the salmon in an MRI scanner and "the salmon was shown a series of photographs depicting human individuals in social situations. The salmon was asked to determine what emotion the individual in the photo must have been experiencing."

I'd say that this research was justified on comedic grounds alone, but they were also making an important scientific point. The (fish-)bone of contention here is multiple comparisons correction. The "multiple comparisons problem" is simply the fact that if you do a lot of different statistical tests, some of them will, just by chance, give interesting results.

In fMRI, the problem is particularly severe. An MRI scan divides the brain up into cubic units called voxels. There are over 40,000 in a typical scan. Most fMRI analysis treats every voxel independently, and tests to see if each voxel is "activated" by a certain stimulus or task. So that's at least 40,000 separate comparisons going on - potentially many more, depending upon the details of the experiment.
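To see how bad this gets, here's a back-of-the-envelope simulation in Python: 40,000 "voxels" of pure noise, each given a t-test against zero. Uncorrected, a p < 0.05 threshold "activates" around 2,000 of them; a simple Bonferroni correction (a cruder cousin of the random field methods real fMRI packages use) brings that down to essentially zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_voxels, n_subjects = 40_000, 16

# Pure noise: no real "activation" anywhere (a statistical dead salmon).
data = rng.normal(0, 1, size=(n_voxels, n_subjects))

# One-sample t-test at every voxel against zero.
t, p = stats.ttest_1samp(data, popmean=0, axis=1)

alpha = 0.05
print("uncorrected p < 0.05:", int(np.sum(p < alpha)))             # ~2,000 false positives
print("Bonferroni corrected:", int(np.sum(p < alpha / n_voxels)))  # almost always 0
```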

Luckily, during the 1990s, fMRI pioneers developed techniques for dealing with the problem: multiple comparisons correction. The most popular method uses Gaussian Random Field Theory to calculate the probability of falsely "finding" activated areas just by chance, and to keep this acceptably low (details), although there are other alternatives.

But not everyone uses multiple comparisons correction. This is where the fish comes in - Bennett et al show that if you don't use it, you can find "neural activation" even in the tiny brain of a dead fish. Of course, with the appropriate correction, you don't. There's nothing original about this, except the colourful nature of the example - but many fMRI publications still report "uncorrected" results (here's just the last one I read).

Bennett concludes that "the vast majority of fMRI studies should be utilizing multiple comparisons correction as standard practice". But he says on his blog that he's encountered some difficulty getting the results published as a paper, because not everyone agrees. Some say that multiple comparisons correction is too conservative, and could lead to genuine activations being overlooked - throwing the baby salmon out with the bathwater, as it were. This is a legitimate point, but as Bennett says, in this case we should report both corrected and uncorrected results, to make it clear to the readers what is going on.

Of Carts and Horses

Last week, I wrote about a paper finding that the mosquito repellent chemical, DEET, inhibits an important enzyme, cholinesterase. If DEET were toxic to humans, this finding might explain why.
But it isn't - tens of millions of people use DEET safely every year, and there's no reason to think that it is dangerous unless it's used completely inappropriately. That didn't stop this laboratory finding being widely reported as a cause for concern about the safety of DEET.

This is putting the cart before the horse. If you know that something happens, then it's appropriate to search for an explanation for it. If you have a phenomenon, then there must be a mechanism by which it occurs.

But this doesn't work in reverse: just because you have a plausible mechanism by which something could happen, doesn't mean that it does in fact happen. This is because there are always other mechanisms at work which you may not know about. And the effect of your mechanism may be trivial by comparison.

Caffeine can damage DNA under some conditions. Other things which damage DNA, like radiation, can cause cancer. But the clinical evidence is that, if anything, drinking coffee may protect against some kinds of cancer (previous post). There's a plausible mechanism by which coffee could cause cancer, but it doesn't.

Medicine has learned the hard way that while understanding mechanisms is important, it's no substitute for clinical trials. The whole philosophy of evidence-based medicine is that treatments should only be used when there is clinical evidence that they do in fact work.

Unfortunately, in other fields, the horse routinely finds itself behind the cart. An awful lot - perhaps most - of political debate consists of saying that if you do X, Y will happen, through some mechanism. If you legalize heroin, people will take more of it, because it'll be more available and cheaper. If you privatize public services, they'll improve, because competition will ensure that only the most efficient services survive. If you topple this dictator, the country will become a peaceful democracy, because people like peace and democracy. And so on.

These kinds of arguments sound good. And they invite opponents to respond in kind: actually, legalizing heroin is a good idea, because it will make taking it much safer by eliminating impurities and infections... And so the debate becomes a case of fantasizing about things that might happen, with the winner being the person whose fantasy sounds best.

If you want to know what will happen when you implement some policy, the only way of knowing is to look at other countries or other places which have already done it. If no-one else has ever done it, you are making a leap into the unknown. This is not necessarily a bad thing - there's a first time for everything. But it means that "We don't know" should be heard much more often in politics.

In Science, Popularity Means Inaccuracy

Who's more likely to start digging prematurely: one guy with a metal-detector looking for an old nail, or a field full of people with metal-detectors searching for buried treasure?

In any area of science, there will be some things which are more popular than others - maybe a certain gene, a protein, or a part of the brain. It's only natural and proper that some things get of lot of attention if they seem to be scientifically important. But Thomas Pfeiffer and Robert Hoffmann warn in a PLoS One paper that popularity can lead to inaccuracy - Large-Scale Assessment of the Effect of Popularity on the Reliability of Research.

They note two reasons for this. Firstly, popular topics tend to attract interest and money. This means that scientists have much to gain by publishing "positive results" as this allows them to get in on the action -

In highly competitive fields there might be stronger incentives to “manufacture” positive results by, for example, modifying data or statistical tests until formal statistical significance is obtained. This leads to inflated error rates for individual findings... We refer to this mechanism as “inflated error effect”.
Secondly, in fields where there is a lot of research being done, the chance that someone will, just by chance, come up with a positive finding increases -
The second effect results from multiple independent testing of the same hypotheses by competing research groups. The more often a hypothesis is tested, the more likely a positive result is obtained and published even if the hypothesis is false. ... We refer to this mechanism as “multiple testing effect”.
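The arithmetic behind this multiple testing effect is simple enough to sketch. If a hypothesis is false and each test has a 5% false-positive rate, the chance that at least one of k independent groups gets a "significant" result is 1 - 0.95^k - and if only the positives get published, the literature on a popular but wrong hypothesis fills up fast. A quick illustration in Python, with made-up numbers:

```python
# Chance that at least one of k independent groups testing a FALSE hypothesis
# gets p < 0.05 by luck alone - and, with publication bias, gets it published.
alpha = 0.05
for k in (1, 5, 10, 20, 50):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:3d} groups -> {p_any:5.0%} chance of at least one 'positive' result")
```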
But does this happen in real life? The authors say yes, based on a review of research into protein-protein interactions in yeast. (Happily, you don't need to be a yeast expert to follow the argument.)

There are two ways of trying to find out whether two proteins interact with each other inside cells. You could do a small-scale experiment specifically looking for one particular interaction: say, Protein B with Protein X. Or you can do "high-throughput" screening of lots of proteins to see which ones interact: Does Protein A interact with B, C, D, E... Does Protein B interact with A, C, D, E... etc.

There have been tens of thousands of small-scale experiments into yeast proteins, and more recently, a few high-throughput studies. The authors looked at the small-scale studies and found that the more popular a certain protein was, the less likely it was that reported interactions involving it would be confirmed by high-throughput experiments.

The second and the third of the above graphs show the effect. Increasing popularity leads to a falling % of confirmed results. The first graph shows that interactions which were replicated by lots of small-scale experiments tended to be confirmed, which is what you'd expect.

Pfeiffer and Hoffmann note that high-throughput studies have issues of their own, so using them as a yardstick to judge the truth of other results is a little problematic. However, they say that the overall trend remains valid.

This is an interesting paper which provides some welcome empirical support to the theoretical argument that popularity could lead to unreliability. Unfortunately, the problem is by no means confined to yeast. Any area of science in which researchers engage in a search for publishable "positive results" is vulnerable to the dangers of publication bias, data cherry-picking, and so forth. Even obscure topics are vulnerable but when researchers are falling over themselves to jump on the latest scientific bandwagon, the problems multiply exponentially.

A recent example may be the "depression gene", 5HTTLPR. Since a landmark paper in 2003 linked it to clinical depression, there has been an explosion of research into this genetic variant. Literally hundreds of papers appeared - it is by far the most studied gene in psychiatric genetics. But a lot of this research came from scientists with little experience or interest in genes. It's easy and cheap to collect a DNA sample and genotype it. People started routinely looking at 5HTTLPR whenever they did any research on depression - or anything related.

But wait - a recent meta-analysis reported that the gene is not in fact linked to depression at all. If that's true (it could well be), how did so many hundreds of papers appear which did find an effect? Pfeiffer and Hoffmann's paper provides a convincing explanation.

Link - Orac also blogged this paper and put a characteristic CAM angle on it.

Pfeiffer, T., & Hoffmann, R. (2009). Large-Scale Assessment of the Effect of Popularity on the Reliability of Research. PLoS ONE, 4 (6). DOI: 10.1371/journal.pone.0005996

Picturing the Brain

You may well have already heard about neuro images, a new blog from Neurophilosophy's Mo. As the name suggests, it's all about pictures of the brain. All of them are very pretty. Some are also pretty gruesome.

But images are, of course, more than decoration. There are dozens of ways of picturing the brain, each illuminating different aspects of neural function. Neuropathologists diagnose diseases by examining tissue under the microscope; using various stains you can visualize normal and abnormal cell types -

FDG-PET scans reveal metabolic activity in different areas, which can be used to diagnose tumors amongst much else -

Egaz Moniz, better known as the inventor of "psychosurgery", pioneered cerebral angiography, a technique for visualizing the blood vessels of the brain using x-rays (this is the view from below) -

And so on. However, for all too many cognitive neuroscientists - e.g. fMRI researchers - the only kind of brain images that matter are MRI scans, traditionally black-and-white with "activity" depicted on top in colour -

fMRI is a powerful technique. But there is much more to the brain than that. Even a casual glance down a microscope reveals that brain tissue is composed of a rich variety of cells, the most numerous of which, glia, do not transmit neural signals - they are not "brain cells" at all. And there are many different types of brain cells, which inhabit distinct layers of the cerebral cortex - the cortex has at least six layers in most places, and different things happen in each one.

The brain, in other words, is a living organ, not a grey canvas across which activity patterns occasionally flash. Of course, no-one denies this, but all too many neuroscientists forget it because in their day-to-day work all they see of the brain is what an MRI scan reveals. This is especially true for those scientists who came to fMRI from a psychology background, many of whom have never studied neurobiology.

Maybe researchers should have to spend a week with a scalpel cutting up an actual brain before they get allowed to use fMRI - this might help to guard against the kind of simplistic "Region X does Y" thinking that plagues the field.

 