
Depressed or Bereaved? (Part 2)

In Part 1, I discussed a paper by Jerome Wakefield examining the issue of where to draw the line between normal grief and clinical depression.


The line moved in the American Psychiatric Association's DSM diagnostic system when the previous DSM-III edition was replaced by the current DSM-IV. Specifically, the "bereavement exclusion" was made narrower.

The bereavement exclusion says that you shouldn't diagnose depression in someone whose "depressive" symptoms are a result of grief - unless they're particularly severe or prolonged, in which case you should. DSM-IV lowered the bar for "severe" and "prolonged", thus making grief more likely to be classed as depression. Wakefield argued that the change made things worse.

But DSM-V is on its way soon. The draft was put up online in 2010, and it turns out that depression is to have no bereavement exclusion at all. Grief can be diagnosed as depression in exactly the same way as depressive symptoms which come out of the blue.

The draft itself offered just one sentence by way of justification for this. However, big cheese psychiatrist Kenneth S. Kendler recently posted a brief note defending the decision. Wakefield has just published a rather longer paper in response.

Wakefield starts off with a bit of scholarly kung-fu. Kendler says that the precursors to the modern DSM, the 1972 Feighner and 1975 RDC criteria, didn't have a bereavement clause for depression either. But they did - albeit not in the criteria themselves, but in the accompanying how-to manuals; the criteria themselves weren't meant to be self-contained, unlike the DSM. Ouch! And so on.

Kendler's sole substantive argument against the exclusion is that it is "not logically defensible" to exclude depression induced by bereavement, if we don't have a similar provision for depression following other severe loss or traumatic events, like becoming unemployed or being diagnosed with cancer.

Wakefield responds that, yes, he has long made exactly that point, and that in his view we should take the context into account, rather than just looking at the symptoms, in grief and many other cases. However, as he points out, it is better to do this for one class of events (bereavement), than for none at all. He quotes Emerson's famous warning that "A foolish consistency is the hobgoblin of little minds". It's better to be partly right, than consistently wrong.

Personally, I'm sympathetic to Wakefield's argument that the bereavement exclusion should be extended to cover non-bereavement events, but I'm also concerned that this could lead to underdiagnosis if it relied too much on self-report.

The problem is that depression usually feels like it's been caused by something that's happened, but this doesn't mean it was; one of the most insidious features of depression is that it makes things seem much worse than they actually are, so it seems like the depression is an appropriate reaction to real difficulties, when to anyone else, or to yourself looking back on it after recovery, it was completely out of proportion. So it's a tricky one.

Anyway, back to bereavement; Kendler curiously ends up by agreeing that there ought to be a bereavement clause - in practice. He says that just because someone meets criteria for depression does not mean we have to treat them:

...diagnosis in psychiatry as in the rest of medicine provides the possibility but by no means the requirement that treatment be initiated ... a good psychiatrist, on seeing an individual with major depression after bereavement, would start with a diagnostic evaluation.

If the criteria for major depression are met, then he or she would then have the opportunity to assess whether a conservative watch and wait approach is indicated or whether, because of suicidal ideation, major role impairment or a substantial clinical worsening, the benefits of treatment outweigh the limitations.
The final sentence is lifted almost word for word from the current bereavement clause, so this seems to be an admission that the exclusion is, after all, valid, as part of the clinical decision-making process, rather than the diagnostic system.

OK, but as Wakefield points out, why misdiagnose people if you can help it? It seems to be tempting fate. Kendler says that a "good psychiatrist" wouldn't treat normal, uncomplicated bereavement as depression. But what about the bad ones? Why on earth would you deliberately make your system such that good psychiatrists would ignore it?

More importantly, scrapping the bereavement criterion would render the whole concept of Major Depression meaningless. Almost everyone suffers grief at some point in their lives. Already, 40% of people meet criteria for depression by age 32, and that's with a bereavement exclusion.

Scrap it and, I don't know, 80% will meet criteria by that age - so the criteria will be useless as a guide to identifying the people who actually have depression as opposed to the ones who have just suffered grief. We're already not far off that point, but this would really take the biscuit.

Wakefield JC (2011). Should Uncomplicated Bereavement-Related Depression Be Reclassified as a Disorder in the DSM-5? The Journal of Nervous and Mental Disease, 199 (3), 203-8. PMID: 21346493

Boy Without A Cerebellum...Has No Cerebellum

A reader pointed me to this piece:

Boy Without a Cerebellum Baffles Doctors
Argh. This is going to be a bit awkward. So I'll just say at the outset that I have nothing against kids struggling with serious illnesses and I wish them all the best.


The article's about Chase Britton, a boy who apparently lacks two important parts of the brain: the cerebellum and the pons. Despite this, the article says, Chase is a lovely kid and is determined to be as active as possible.

As I said, I am all in favor of this. Where the article runs into trouble, however, is when it starts to argue that "doctors are baffled" by this:

When he was 1 year old, doctors did an MRI, expecting to find he had a mild case of cerebral palsy. Instead, they discovered he was completely missing his cerebellum -- the part of the brain that controls motor skills, balance and emotions.

"That's when the doctor called and didn't know what to say to us," Britton said in a telephone interview. "No one had ever seen it before. And then we'd go to the neurologists and they'd say, 'That's impossible.' 'He has the MRI of a vegetable,' one of the doctors said to us."

Chase is not a vegetable, leaving doctors bewildered and experts rethinking what they thought they knew about the human brain.

They don't say which doctor made the "vegetable" comment but whoever it was deserves to be hit over the head with a large marrow because it's just not true. The cerebellum is more or less a kind of sidekick for the rest of the brain. Although it actually contains more brain cells than the rest of the brain put together (they're really small ones), it's not required for any of our basic functions such as sensation or movement.

Without it, you can still move, because movement commands are initiated in the motor cortex. Such movement is clumsy and awkward (ataxia), because the cerebellum helps to coordinate things like posture and gait, getting the timing exactly right to allow you to move smoothly. Like how your mouse makes it easy and intuitive to move the cursor around the screen.

Imagine if you had no mouse and had to move the cursor with a pair of big rusty iron levers to go left and right, up and down. It would be annoying, but eventually, maybe, you could learn to compensate.

From the footage of Chase alongside the article it's clear that he has problems with coordination, although he's gradually learning to move despite them.

Lacking a pons is another kettle of fish however. The pons is part of your brainstem and it controls, amongst other things, breathing. In fact you (or rather your body) can survive perfectly well if the whole of your brain above the pons is removed; only the brainstem is required for vital functions.

So it seems very unlikely that Chase actually lacks a pons. The article claims that scans show that "There is only fluid where the cerebellum and pons should be" but as Steven Novella points out in his post on the case, the pons might be so shrunken that it's not easily visible - at least not in the place it normally is - yet functional remnants could remain.

As for the idea that the case is bafflingly unique, it's not really. There are no less than 6 known types of pontocerebellar hypoplasia caused by different genes; Novella points to a case series of children whose cerebellums seemed to develop normally in the womb, but then degenerated when they were born prematurely, which Chase was.

The article has had well over a thousand comments and has attracted lots of links from religious websites amongst others. The case seems, if you believe the article, to mean that the brain isn't all that important, almost as if there was some kind of immaterial soul at work instead... or at the very least suggesting that the brain is much more "plastic" and changeable than neuroscientists suppose.

Unfortunately, the heroic efforts that Chase has been required to make to cope with his disability suggest otherwise and as I've written before, while neuroplasticity is certainly real it has its limits.

Premature Brain Diagnosis in Japan?

Nature has a disturbing article from their Asian correspondent David Cyranoski: Thought experiment. It's open access.

In brief: a number of top Japanese psychiatrists have started offering a neuroimaging method called NIRS to their patients as a diagnostic tool. They claim that NIRS shows the neural signatures of different mental illnesses.

The technology was approved by the Japanese authorities in April 2009, and since then it's been used on at least 300 patients, who pay $160 for the privilege. However, it's not clear that it works.

To put it mildly.

*

NIRS is Near Infra-Red Spectroscopy. It measures blood flow and oxygenation in the brain. In this respect, it's much like fMRI, but whereas fMRI uses superconducting magnets and quantum wizardry to achieve this, NIRS simply shines a near-infra-red light into the head, and records the light reflected back.

It's a lot cheaper and easier than MRI. However, the images it provides are a lot less detailed, and it can only image the surface of the brain. NIRS has a small but growing number of users in neuroscience research; it's especially popular in Japan, for some reason, but it's also found plenty of users elsewhere.

The clinical use of NIRS in psychiatry was pioneered by one Dr Masato Fukuda, and he's been responsible for most of the trials. So what are these trials?

As far as I can see (correct me if I'm wrong), these are all the trials comparing patients and controls that he's been an author on:
There are also a handful of Fukuda's papers in Japanese, which I can't read, but as far as I can tell they're general discussions rather than data papers.

So we have 342 people in all. Actually, a bit less, because some of them were included in more than one study. That's still quite a lot - but there were only 5 panic patients, 30 depressed (including 9 elderly, who may be different), 38 eating disordered and just 17 bipolar in the mix.

And the bipolar people were currently feeling fine, or just a little bit down, at the time of the NIRS. There are quite a lot of other trials from other Japanese groups, but sticking with bipolar disorder as an example, no trials that I could find examined people who were currently ill. The only other two trials, both very small, were in recovered people (1,2).

Given that the whole point of diagnosis is to find out what any given patient has, when they're ill, this matters to every patient. Anyone could be psychotic, or depressed, or eating disordered, or any combination thereof.

Worse yet, in many of these studies the patients were taking medications. In the 2006 depression/bipolar paper, for example, all of the bipolars were on heavy-duty mood stabilizers, mostly lithium; plus a few antipsychotics, and lots of antidepressants. The depressed people were on antidepressants.

There's a deeper problem. Fukuda says that NIRS corresponds with the clinical diagnosis in 80% of cases. Let's assume that's true. Well, if the NIRS agrees with the clinical diagnosis, it doesn't tell us anything we didn't already know. If the NIRS disagrees, who do you trust?

I think you'd have to trust the clinician, because the clinician is the "gold standard" against which the NIRS is compared. Psychiatric diseases are defined clinically. If you had to choose between 80% gold and pure gold, it's not a hard choice.

Now NIRS could, in theory, be better than clinical diagnosis: it could provide more accurate prognosis, and more useful treatment recommendations. That would be cool. But as far as I can see there's absolutely no published evidence on that.

To find out you'd have to compare patients diagnosed with NIRS to patients diagnosed normally - or better, to those randomized to get fake placebo NIRS, like the authors of this trial from last year should have done. To my knowledge, there have been no such tests at all.

*

So what? NIRS is harmless, quick, and $160 is not a lot. Patients like it: “They want some kind of hard evidence,” [Fukuda says], especially when they have to explain absences from work. If it helps people to come to terms with their illness - no mean feat in many cases - what's the problem?

My worry is that it could mean misdiagnosing patients, and therefore mis-treating them. Here's the most disturbing bit of the article:
...when Fukuda calculates his success rates, NIRS results that match the clinical diagnosis are considered a success. If the results don’t match, Fukuda says he will ask the patient and patient’s family “repeatedly” whether they might have missed something — for example, whether a depressed patient whose NIRS examination suggests schizophrenia might have forgotten to mention that he was experiencing hallucinations.
Quite apart from the implication that the 80% success rate might be inflated, this suggests that some dubious clinical decisions might be going on. The first-line treatments for schizophrenia are quite different, and rather less pleasant, than those for depression. A lot of perfectly healthy people report "hallucinations" if you probe hard enough. "Seek, and ye shall find". So be careful what you seek for.

While NIRS is a Japanese speciality, other brain-based diagnostic or "treatment personalization" tools are being tested elsewhere. In the USA, EEG has been proposed by a number of groups. I've been rather critical of these methods, but at least they've done some trials to establish whether this actually improves patient outcomes.

In my view, all of these "diagnostic" or "predictive" tools should be subject to exactly the same tests as treatments are: double blind, randomized, sham-controlled trials.

Cyranoski, D. (2011). Neuroscience: Thought experiment. Nature, 469 (7329), 148-149. DOI: 10.1038/469148a

Retract That Seroxat?

Should a dodgy paper on antidepressants be retracted? And what's scientific retraction for, anyway?


Read all about it in a new article in the BMJ: Rules of Retraction. It's about the efforts of two academics, Jon Jureidini and Leemon McHenry. Their mission - so far unsuccessful - is to get this 2001 paper retracted: Efficacy of paroxetine in the treatment of adolescent major depression.

Jureidini is a member of Healthy Skepticism, a fantastic Australian organization that Neuroskeptic readers have encountered before. They've got lots of detail on the ill-fated "Study 329", including internal drug company documents, here.

So what's the story? Study 329 was a placebo-controlled trial of the SSRI paroxetine (Paxil, Seroxat) in 275 depressed adolescents. The paper concluded that "Paroxetine is generally well tolerated and effective for major depression in adolescents." It was published in the Journal of the American Academy of Child and Adolescent Psychiatry (JAACAP).

There are two issues here: whether paroxetine worked, and whether it was safe. On safety, the paper concluded that "Paroxetine was generally well tolerated...and most adverse effects were not serious." Technically true, but only because there were so many mild side effects.

In fact, 11 patients on paroxetine reported serious adverse events, including suicidal ideation or behaviour, and 7 were hospitalized. Just 2 patients in the placebo group had such events. Yet we are reassured that "Of the 11, only headache (1 patient) was considered by the treating investigator to be related to paroxetine treatment."

The drug company argue that it didn't become clear that paroxetine caused suicidal ideation in adolescents until after the paper was published. In 2002, British authorities reviewed the evidence and said that paroxetine should not be given in this age group.

That's as maybe; the fact remains that in this paper there was a strongly raised risk. However, in fairness, all that data was there in the paper, for readers to draw their own conclusions from. The paper downplays it, but the numbers are there.


*

The efficacy question is where the allegations of dodgy practices are most convincing. The paper concludes that paroxetine worked, while imipramine, an older antidepressant, didn't.

Jureidini and McHenry say that paroxetine only worked on a few of the outcomes - ways of measuring depression and how much the patients improved. On most of the outcomes, it didn't work, but the paper focusses on the ones where it did. According to the BMJ:

Study 329’s results showed that paroxetine was no more effective than the placebo according to measurements of eight outcomes specified by Martin Keller, professor of psychiatry at Brown University, when he first drew up the trial.

Two of these were primary outcomes...the drug also showed no significant effect for the initial six secondary outcome measures. [it] only produced a positive result when four new secondary outcome measures, which were introduced following the initial data analysis, were used... Fifteen other new secondary outcome measures failed to throw up positive results.

Here's the worst example. In the original protocol, two "primary" endpoints were specified: the change in the total Hamilton Scale (HAMD) score, and % of patients who 'responded', defined as either an improvement of more than 50% of their starting HAMD score or a final HAMD of 8 or below.

On neither of these measures did paroxetine work better than placebo at the p=0.05 significance level. It did work if you defined 'responded' to mean only a final HAMD of 8 or below, but this was not how it was defined in the protocol. In fact, the Methods section of the paper follows the protocol faithfully. Yet in the Results section, the authors still say that:
Of the depression-related variables, paroxetine separated statistically from placebo at endpoint among four of the parameters: response (i.e., primary outcome measure)...
It may seem like a subtle point. But it's absolutely crucial. Paroxetine just did not work on either pre-defined primary outcome measure, and the paper says that it did.

Finally, there were also issues of ghostwriting. I've never been that concerned by this in itself. If the science is bad, it's bad whoever wrote it. Still, it's hardly a good thing.

*

Does any of this matter? In one sense, no. Authorities have told doctors not to use paroxetine in adolescents with depression since 2002 (in the UK) and 2003 (in the USA). So retracting this paper wouldn't change much in the real world of treatment.

But in another sense, the stakes are enormous. If this paper were retracted, it would set a precedent and send a message: this kind of p-value fishing to get positive results is grounds for retraction.

This would be huge, because this kind of fishing is sadly very common. Retracting this paper would be saying: selective outcome reporting is a form of misconduct. So this debate is really not about Seroxat, but about science.


There are no Senates or Supreme Courts in science. However, journal editors are in a unique position to help change this. They're just about the only people (grant awarders being the others) who have the power to actually impose sanctions on scientists. They have no official power. But they have clout.

Were the JAACAP to retract this paper, which they've so far said they have no plans to do, it would go some way to making these practices unacceptable. And I think no-one can seriously disagree that they should be unacceptable, and that science and medicine would be much better off if they were. Do we want more papers like this, or do we want fewer?

So I think the question of whether to retract or not boils down to whether it's OK to punish some people "to make an example of them", even though we know of plenty of others who have done the same, or worse, and won't be punished.

My feeling is: no, it's not very fair, but we're talking about multi-billion pound companies and a list of authors whose high-flying careers are not going to crash and burn just because one paper from 10 years ago gets pulled. If this were some poor 24 year old's PhD thesis, it would be different, but these are grown-ups who can handle themselves.

So I say: retract.

Newman, M. (2010). The rules of retraction. BMJ, 341 (dec07 4). DOI: 10.1136/bmj.c6985

Keller MB, et al. (2001). Efficacy of paroxetine in the treatment of adolescent major depression: a randomized, controlled trial. Journal of the American Academy of Child and Adolescent Psychiatry, 40 (7), 762-72 PMID: 11437014

Delusions of Gender

Note: This book quotes me approvingly, so this is not quite a disinterested review.

Cordelia Fine's Delusions of Gender is an engaging, entertaining and powerfully argued reply to the many authors - who range from the scientifically respectable to the less so - who've recently claimed to have shown biological sex differences in brain, mind and behaviour.

Fine makes a strong case that the sex differences we see, in everything from behaviour to school achievements in mathematics, could be caused by the society in which we live, rather than by biology. Modern culture, she says, while obviously less sexist than in the past, still contains deeply entrenched assumptions about how boys and girls ought to behave, what they ought to do and what they're good at, and these - consciously or unconsciously - shape the way we are.

Some of Fine's targets are obviously bonkers, like Vicky Tuck, but for me, the most interesting chapters were those dealing in detail with experiments which have been held up as the strongest examples of sex differences, such as the Cambridge study claiming that newborn boys and girls differ in how much they prefer looking at faces as opposed to mechanical mobiles.

But Delusions is not, in Steven Pinker's phrase, saying we ought to return to "Blank Slatism", and it doesn't try to convince you that every single sex difference definitely is purely cultural. It's more modest, and hence, much more believable: simply a reminder that the debate is still an open one.

Fine makes a convincing case (well, it convinced me) that the various scientific findings, mostly from the past 10 years, that seem to prove biological differences, are not, on the whole, very strong, and that even if we do accept their validity, they don't rule out a role for culture as well.

This latter point is, I think, especially important. Take, for example, the fact that in every country on record, men roughly between the ages of 16-30 are responsible for the vast majority of violent crimes. This surely reflects biology somehow; whether it's the fact that young men are physically the strongest people, or whether it's more psychological, is by the by.

But this doesn't mean that young men are always violent. In some countries, like Japan, violent crime is extremely rare; in other countries, it's tens of times more common; and during wars or other periods of disorder, it becomes the norm. Young men are always, relatively speaking, the most violent but the absolute rate of violence varies hugely, and that has nothing to do with gender. It's not that violent places have more men than peaceful ones.

Gender, in other words, doesn't explain violence in any useful way - even though there surely are gender differences. The same goes for everything else: men and women may well have, for biological reasons, certain tendencies or advantages, but that doesn't automatically explain (and it doesn't justify) all of the sex differences we see today; it's only ever a partial explanation, with culture being the other part.

How To Fool A Lie Detector Brain Scan

Can fMRI scans be used to detect deception?

It would be nice, although a little scary, if they could. And there have been several reports of successful trials under laboratory conditions. However, a new paper in Neuroimage reveals an easy way of tricking the technology: Lying In The Scanner.

The authors used a variant of the "guilty knowledge test" which was originally developed for use with EEG. Essentially, you show the subject a series of pictures or other stimuli, one of which is somehow special; maybe it's a picture of the murder weapon or something else which a guilty person would recognise, but the innocent would not.

You then try to work out whether the subject's brain responds differently to the special target stimulus as opposed to all the other irrelevant ones. In this study, the stimuli were dates, and for the "guilty" volunteers, the "murder weapon" was their own birthday, a date which obviously has a lot of significance for them. For the "innocent" people, all the dates were random.

What happened? The scans were extremely good at telling the "guilty" from the "innocent" people - they managed 100% accuracy, with no false positives or false negatives. The image above shows the activation associated with the target stimulus (birthdays) over and above the control stimuli. In two separate groups of volunteers, the blobs were extremely similar. So the technique does work in principle, which is nice.

But the countermeasures fooled it entirely, reducing accuracy to well below random chance. And the countermeasures were very simple: before the scan, subjects were taught to associate an action, a tiny movement of one of their fingers or toes, with some of the "irrelevant" dates. This, of course, made these dates personally relevant, just like the really relevant stimuli, so there was no difference between them, making the "guilty" appear "innocent".

Maybe it'll be possible in the future to tell the difference between brain responses to really significant stimuli as opposed to artificial ones, or at least, to work out whether or not someone is using this trick. Presumably, if there's a neural signature for guilty knowledge, there's also one for trying to game the system. But as it stands, this is yet more evidence that lie detection using fMRI is by no means ready for use in the real world just yet...

Ganis G, Rosenfeld JP, Meixner J, Kievit RA, & Schendan HE (2010). Lying in the scanner: Covert countermeasures disrupt deception detection by functional magnetic resonance imaging. NeuroImage. PMID: 21111834

Brain Scans Prove That The Brain Does Stuff

According to the BBC (and many others)...

Libido problems 'brain not mind'

Scans appear to show differences in brain functioning in women with persistently low sex drives, claim researchers.

The US scientists behind the study suggest it provides solid evidence that the problem can have a physical origin.

The research in question (which hasn't been published yet) has been covered very well over at The Neurocritic. Basically the authors took some women with a diagnosis of "Hypoactive Sexual Desire Disorder" (HSDD), and some normal women, put them in an fMRI scanner and showed them porn. Different areas of the brain lit up.

So what? For starters, we have no idea if these differences are real or not, because the study included only a tiny group of 7 normal women - although, strangely, it included a full 19 women with HSDD. Maybe they had difficulty finding women with healthy appetites in Detroit?

Either way, a study is only as big as its smallest group so this was tiny. We're also not told anything about the stats they used so for all we know they could have used the kind that give you "results" if you use them on a dead fish.

But let's grant that the results are valid. This doesn't tell us anything we didn't already know. We know the women differ in their sexual responses - because that's the whole point of the study. And we know that this must be something to do with their brain, because the brain is where sexual responses, and every other mental event, happen.

So we already know that HSDD "has a physical origin", but only in the sense that everything does; being a Democrat or a Republican has a physical origin; being Christian or Muslim has a physical origin; speaking French as opposed to English has a physical origin; etc. etc.
None of which is interesting or surprising in the slightest.

The point is that the fact that something is physical doesn't stop it being also psychological. Because psychology happens in the brain. Suppose you see a massive bear roaring and charging towards you, and as a result, you feel scared. The fear has a physical basis, and plenty of physical correlates like raised blood pressure, adrenaline release, etc.

But if someone asks "Why are you scared?", you would answer "Because there's a bear about to eat us", and you'd be right. Someone who came along and said, no, your anxiety is purely physical - I can measure all these physiological differences between you and a normal person - would be an idiot (and eaten).

Now sometimes anxiety is "purely physical" i.e. if you have a seizure which affects certain parts of the temporal lobe, you may experience panic and anxiety as a direct result of the abnormal brain activity. In that case the fear has a physiological cause, as well as a physiological basis.

Maybe "HSDD" has a physiological cause. I'm sure it sometimes does; it would be very weird if it didn't in some cases because physiology can cause all kinds of problems. But fMRI scans don't tell us anything about that.

Link: I've written about HSDD before in the context of flibanserin, a drug which was supposed to treat it (but didn't). Also, as always, British humour website The Daily Mash hit this one on the head.

Worst. Antidepressant. Ever.

Reboxetine is an antidepressant. Except it's not, because it doesn't treat depression.

This is the conclusion of a much-publicized article just out in the BMJ: Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and SSRI controlled trials.

Reboxetine was introduced to some fanfare, because its mechanism of action is unique - it's a selective norepinephrine reuptake inhibitor (NRI), which has no effect on serotonin, unlike Prozac and other newer antidepressants. Several older tricyclic antidepressants were NRIs, but they weren't selective because they also blocked a shed-load of receptors.

So in theory reboxetine treats depression while avoiding the side effects of other drugs, but last year, Cipriani et al in a headline-grabbing meta-analysis concluded that in fact it's the exact opposite: reboxetine was the least effective new antidepressant, and was also one of the worst in terms of side effects. Oh dear.

And that was only based on the published data. It turns out that Pfizer, the manufacturers of reboxetine, had chosen to not publish the results of most of their clinical trials of the drug, because the data showed that it was crap.

The new BMJ paper includes these unpublished results - it took an inordinate amount of time and pressure to make Pfizer agree to share them, but they eventually did - and we learn that reboxetine is:

  • no more effective than a placebo at treating depression.
  • less effective than SSRIs, which incidentally are better than placebo in this dataset (a bit).
  • worse tolerated than most SSRIs, and much worse tolerated than placebo.
The one faint glimmer of hope that it's not a complete dud was that it did seem to work better than placebo in depressed inpatients. However, this could well have been a fluke, because the numbers involved were tiny: there was one trial showing a humongous benefit in inpatients, but it only had a total of 52 people.

Claims that reboxetine is dangerous on the basis of this study are a bit misleading - it may be, but there was no evidence for that in these data. It caused nasty and annoying side-effects, but that's not the same thing, because if you don't like side-effects, you could just stop taking it (which is what many people in these trials did).

Anyway, what are the lessons of this sorry tale, beyond reboxetine being rubbish? The main one is: we have to start forcing drug companies and other researchers to publish the results of clinical trials, whatever the results are. I've discussed this previously and suggested one possible way of doing that.

The situation regarding publication bias is far better than it was 10 years ago, thanks to initiatives such as clinicaltrials.gov; almost all of the reboxetine trials were completed before the year 2000; if they were run today, it would have been much harder to hide them, but still not impossible, especially in Europe. We need to make it impossible, everywhere, now.

The other implication is, ironically, good news for antidepressants - well, except reboxetine. The existence of reboxetine, a drug which has lots of side effects, but doesn't work, is evidence against the theory (put forward by Joanna Moncrieff, Irving Kirsch and others) that even the antidepressants that do seem to work, only work because of active placebo effects driven by their side effects.

So given that reboxetine had more side effects than SSRIs, it ought to have worked better, but actually it worked worse. This is by no means the nail in the coffin of the active placebo hypothesis but it is, to my mind, quite convincing.

Link: This study also blogged by Good, Bad and Bogus.

Eyding, D., Lelgemann, M., Grouven, U., Harter, M., Kromp, M., Kaiser, T., Kerekes, M., Gerken, M., & Wieseler, B. (2010). Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials. BMJ, 341 (oct12 1). DOI: 10.1136/bmj.c4737

Genes for ADHD, eh?

The first direct evidence of a genetic link to attention-deficit hyperactivity disorder has been found, a study says.
Wow! That's the headline. What's the real story?

The research was published in The Lancet, and it's brought to you by Williams et al from Cardiff University: Rare chromosomal deletions and duplications in attention-deficit hyperactivity disorder.

The authors looked at copy-number variations (CNVs) in 410 children with ADHD, compared to 1156 healthy controls. A CNV is simply a catch-all term for when a large chunk of DNA is either missing ("deletions") or repeated ("duplications"), compared to normal human DNA. CNVs are extremely common - we all have a handful - and recently there's been loads of interest in them as possible causes for psychiatric disorders.

What happened? Out of everyone with high quality data available, 15.6% of the ADHD kids had at least one large, rare CNV, compared to 7.5% of the controls. CNVs were especially common in children with ADHD who also suffered mental retardation (defined as having an IQ less than 70) - 36% of this group carried at least one CNV. However, the rate was still elevated in those with normal IQs (11%).

A CNV could occur anywhere in the genome, and obviously what it does depends on where it is - which genes are deleted, or duplicated. Some CNVs don't cause any problems, presumably because they don't disrupt any important stuff.

The ADHD variants were very likely to affect genes which had been previously linked to either autism, or schizophrenia. In fact, no less than 6 of the ADHD kids carried the same 16p13.11 duplication, which has been found in schizophrenic patients too.

So...what does this mean? Well, the news has been full of talking heads only too willing to tell us. Pop-psychologist Oliver James was on top form - by his standards - making a comment which was reasonably sensible, and only involved one error:
Only 57 out of the 366 children with ADHD had the genetic variant supposed to be a cause of the illness. That would suggest that other factors are the main cause in the vast majority of cases. Genes hardly explain at all why some kids have ADHD and not others.
Well, there was no single genetic variant, there were lots. Plus, unusual CNVs were also carried by 7% of controls, so the "extra" mutations presumably only account for 7-8%. James also accused The Lancet of "massive spin" in describing the findings. While you can see his point, given that James's own output nowadays consists mostly of a Guardian column in which he routinely over/misinterprets papers, this is a bit rich.

The authors say that
the findings allow us to refute the hypothesis that ADHD is purely a social construct, which has important clinical and social implications for affected children and their families.
But they've actually proven that "ADHD" is a social construct. Yes, they've found that certain genetic variants are correlated with certain symptoms. Now we know that, say, 16p13.11-duplication-syndrome is a disease, and that its symptoms include (but aren't limited to) attention deficit and hyperactivity. But that doesn't tell us anything about all the other kids who are currently diagnosed with "ADHD", the ones who don't have that mutation.

"ADHD" is evidently an umbrella term for many different diseases, of which 16p13.11-duplication-syndrome is one. One day, when we know the causes of all cases of attention deficit and hyperactivity symptoms, the term "ADHD" will become extinct. There'll just be "X-duplication-syndrome", "Y-deletion-syndrome" and (because it's not all about genes) "Z-exposure-syndrome".

When I say that "ADHD" is a social construct, I don't mean that people with ADHD aren't ill. "Cancer" is also a social construct, a catch-all term for hundreds of diseases. The diseases are all too real, but the concept "cancer" is not necessarily a helpful one. It leads people to talk about Finding The Cure for Cancer, for example, which will never happen. A lot of cancers are already curable. One day, they might all be curable. But they'll be different cures.

So the fact that some cases of "ADHD" are caused by large rare genetic mutations, doesn't prove that the other cases are genetic. They might or might not be - for one thing, this study only looked at large mutations, affecting at least 500,000 bases. Given that even a deletion or insertion of just one base in the wrong place could completely screw up a gene, these could be just the tip of the iceberg.

But the other problem with claiming that this study shows "a genetic basis for ADHD" is that the variants overlapped with the ones that have recently been linked to autism, and schizophrenia. In other words, these genes don't so much cause ADHD, as protect against all kinds of problems, if you have the right variants.

If you don't, you might get ADHD, but you might get something else, or nothing, depending on... we don't know. Other genes and the environment, presumably. But "7% of cases of ADHD associated with mutations that also cause other stuff" wouldn't be a very good headline...

Williams NM, et al. (2010). Rare chromosomal deletions and duplications in attention deficit hyperactivity disorder: a genome-wide analysis. The Lancet.

Shotgun Psychiatry

There's a paradox at the heart of modern psychiatry, according to an important new paper by Dr Charles E. Dean, Psychopharmacology: A house divided.

It's a long and slightly rambling article, but Dean's central point is pretty simple. The medical/biological model of psychiatry assumes that there are such things as psychiatric diseases. Something biological goes wrong, presumably in the brain, and this causes certain symptoms. Different pathologies cause different symptoms - in other words, there is specificity in the relationship between brain dysfunction and mental illness.

Psychiatric diagnosis rests on this assumption. If and only if we can use a given patient's symptoms to infer what kind of underlying illness they have (schizophrenia, bipolar disorder, depression), diagnosis makes sense. This is why we have DSM-IV which consists of a long list of disorders, and the symptoms they cause. Soon we'll have DSM-V.

The medical model has been criticized and defended at great length, but Dean doesn't do either. He simply notes that modern psychiatry has in practice mostly abandoned the medical model, and the irony is, it's done this because of medicines.

If there are distinct psychiatric disorders, there ought to be drugs that treat them specifically. So if depression is a brain disease, say, and schizophrenia is another, there ought to be drugs that only work on depression, and have no effect on schizophrenia (or even make it worse.) And vice versa.

But, increasingly, psychiatric drugs are being prescribed for multiple different disorders. Antidepressants are used in depression, but also all kinds of anxiety disorders (panic, social anxiety, general anxiety), obsessive-compulsive disorder, PTSD, and more. Antipsychotics are also used in mania and hypomania, in kids with behaviour problems, and increasingly in depression, leading some to complain that the term "antipsychotics" is misleading. And so on.

So, Dean argues, in clinical practice, psychiatrists don't respect the medical model - yet that model is their theoretical justification for using psychiatric drugs in the first place.

He looks in detail at one particularly curious case: the use of atypical antipsychotics in depression. Atypicals, like quetiapine (Seroquel) and olanzapine (Zyprexa), were originally developed to treat schizophrenia and other psychotic states. They are reasonably effective, though most of them are no more so than older "typical" antipsychotics.

Recently, atypicals have become very popular for other indications, most of all mood disorders: mania and depression. Their use in mania is perhaps not so surprising, because severe mania has much in common with psychosis. Their use in depression, however, throws up many paradoxes (above and beyond how one drug could treat both mania and its exact opposite, depression.)

Antipsychotics block dopamine D2 receptors. Psychosis is generally considered to be a disorder of "too much dopamine", so that makes sense. The dopamine hypothesis of psychosis and antipsychotic action is 50 years old, and still the best explanation going.

But depression is widely considered to involve too little dopamine, and there is lots of evidence that almost all antidepressants (indirectly) increase dopamine release. Wouldn't that mean that antidepressants could cause psychosis? (They don't.) And why, Dean asks, would atypicals, which block dopamine, help treat depression?

Maybe it's because they also act on other systems? On top of being D2 antagonists, atypicals are also serotonin 5HT2A/C receptor blockers. Long-term use of antidepressants reduces 5HT2 levels, and some antidepressants are also 5HT2 antagonists, so this fits. However, it creates a paradox for the many people who believe that 5HT2 antagonism is important for the antipsychotic effect of atypicals as well - if that were true, antidepressants should be antipsychotics as well (they're not.) And so on.

There may be perfectly sensible answers. Maybe atypicals treat depression by some mechanism that we don't understand yet, a mechanism which is not inconsistent with their also treating psychosis. The point is that there are many such questions standing in need of answers, yet psychopharmacologists almost never address them. Dean concludes:

it seems increasingly obvious that clinicians are actually operating from a dimensional paradigm, and not from the classic paradigm based on specificity of disease or drug... the disjunction between those paradigms and our approach to treatment needs to be recognized and investigated... Bench scientists need to be more familiar with current clinical studies, and stop using outmoded clinical research as a basis for drawing conclusions about the relevance of neurochemical processes to drug efficacy. Bench and clinical scientists need to fully address the question of whether the molecular/cellular/anatomical findings, even if interesting and novel, have anything to do with clinical outcome.
Dean CE (2010). Psychopharmacology: A house divided. Progress in Neuro-Psychopharmacology & Biological Psychiatry. PMID: 20828593

Marc Hauser's Scapegoat?

The dust is starting to settle after the Hauser-gate scandal which rocked psychology a couple of weeks back.

Harvard Professor Marc Hauser has been investigated by a faculty committee and the verdict was released on the 20th August: Hauser was "found solely responsible... for eight instances of scientific misconduct." He's taking a year's "leave", his future uncertain.

Unfortunately, there has been no official news on what exactly the misconduct was, and how much of Hauser's work is suspect. According to Harvard, only three publications were affected: a 2002 paper in Cognition, which has been retracted; a 2007 paper which has been "corrected" (see below), and another 2007 Science paper, which is still under discussion.

But what happened? Cognition editor Gerry Altmann writes that he was given access to some of the Harvard internal investigation. He concludes that Hauser simply invented some of the crucial data in the retracted 2002 paper.

Essentially, some monkeys were supposed to have been tested on two conditions, X and Y, and their responses were videotaped. The difference in the monkey's behaviour between the two conditions was the scientifically interesting outcome.

In fact, the videos of the experiment showed them being tested only on condition X. There was no video evidence that condition Y was even tested. The "data" from condition Y, and by extension the differences, were, apparently, simply made up.

If this is true, it is, in Altmann's words, "the worst form of academic misconduct." As he says, it's not quite a smoking gun: maybe tapes of Y did exist, but they got lost somehow. However, this seems implausible. If so, Hauser would presumably have told Harvard so in his defence. Yet they found him guilty - and Hauser retracted the paper.

So it seems that either Hauser never tested the monkeys on condition Y at all, and just made up the data, or he did test them, saw that they weren't behaving the "right" way, deleted the videos... and just made up the data. Either way it's fraud.

Was this a one-off? The Cognition paper is the only one that's been retracted. But another 2007 paper was "replicated", with Hauser & a colleague recently writing:

In the original [2007] study by Hauser et al., we reported videotaped experiments on action perception with free ranging rhesus macaques living on the island of Cayo Santiago, Puerto Rico. It has been discovered that the video records and field notes collected by the researcher who performed the experiments (D. Glynn) are incomplete for two of the conditions.
Luckily, Hauser said, when he and a colleague went back to Puerto Rico and repeated the experiment, they found "the exact same pattern of results" as originally reported. Phew.

This note, however, was sent to the journal in July, several weeks before the scandal broke - back when Hauser's reputation was intact. Was this an attempt by Hauser to pin the blame on someone else - David Glynn, who worked as a research assistant in Hauser's lab for three years, and has since left academia?

As I wrote in my previous post:
Glynn was not an author on the only paper which has actually been retracted [the Cognition 2002 paper that Altmann refers to]... according to his resume, he didn't arrive in Hauser's lab until 2005.
Glynn cannot possibly have been involved in the retracted 2002 paper. And Harvard's investigation concluded that Hauser was "solely responsible", remember. So we're to believe that Hauser, guilty of misconduct, was himself an innocent victim of some entirely unrelated mischief in 2007 - but that it was all OK in the end, because when Hauser checked the data, it was fine.

Maybe that's what happened. I am not convinced.

Personally, if I were David Glynn, I would want to clear my name. He's left science, but still, a letter to a peer reviewed journal accuses him of having produced "incomplete video records and field notes", which is not a nice thing to say about someone.

Hmm. On August 19th, the Chronicle of Higher Education ran an article about the case, based on a leaked Harvard document. They say that "A copy of the document was provided to The Chronicle by a former research assistant in the lab who has since left psychology."

Hmm. Who could blame them for leaking it? It's worth remembering that it was a research assistant in Hauser's lab who originally blew the whistle on the whole deal, according to the Chronicle.

Apparently, what originally rang alarm bells was that Hauser appeared to be reporting monkey behaviours which had never happened, according to the video evidence. So at least in that case, there were videos, and it was the inconsistency between Hauser's data and the videos that drew attention. This is what makes me suspect that maybe there were videos and field notes in every case, and the "inconvenient" ones were deleted to try to hide the smoking gun. But that's just speculation.

What's clear is that science owes the whistle-blowing research assistant, whoever it is, a huge debt.

fMRI Analysis in 1000 Words

Following on from fMRI in 1000 words, which seemed to go down well, here's the next step: how to analyze the data.

There are many software packages available for fMRI analysis, such as FSL, SPM, AFNI, and BrainVoyager. The following principles, however, apply to most. The first step is pre-processing, which involves:

  • Motion Correction aka Realignment – during the course of the experiment subjects often move their heads slightly; during realignment, all of the volumes are automatically adjusted to eliminate motion.
  • Smoothing – all MRI signals contain some degree of random noise. During smoothing, the image of the whole brain is blurred. This tends to smooth out random fluctuations. The degree of smoothing is given by the “Full Width at Half Maximum” (FWHM) of the smoothing kernel. Between 5 and 8 mm is most common (a small sketch of how FWHM relates to the kernel width follows this list).
  • Spatial Normalization aka Warping – Everyone’s brain has a unique shape and size. In order to compare activations between two or more people, you need to eliminate these differences. Each subject’s brain is warped so that it fits with a standard template (the Montreal Neurological Institute or MNI template is most popular.)
Other techniques are also sometimes used, depending on the user’s preference and the software package.
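
To make the smoothing step concrete, here's a minimal sketch in Python (my own illustration - the post doesn't tie itself to any particular software; NumPy and SciPy, and the 3 mm voxel size, are my assumptions) showing how an FWHM in millimetres is converted into the Gaussian kernel's sigma in voxels before each volume is blurred.

import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_fmri(data_4d, fwhm_mm=6.0, voxel_size_mm=3.0):
    # FWHM = sigma * 2*sqrt(2*ln 2), so convert the requested FWHM (in mm)
    # into a sigma expressed in voxels.
    sigma_vox = fwhm_mm / (voxel_size_mm * 2.0 * np.sqrt(2.0 * np.log(2.0)))
    smoothed = np.empty_like(data_4d)
    for t in range(data_4d.shape[-1]):
        # Blur each volume spatially; never smooth across time points.
        smoothed[..., t] = gaussian_filter(data_4d[..., t], sigma=sigma_vox)
    return smoothed

# Toy example: a random 20x20x20 "brain" over 100 time points.
rng = np.random.default_rng(0)
fake_scan = rng.normal(size=(20, 20, 20, 100))
smoothed_scan = smooth_fmri(fake_scan, fwhm_mm=6.0, voxel_size_mm=3.0)

The only real content is the conversion factor 2*sqrt(2*ln 2) ≈ 2.355 between FWHM and sigma; everything else is just applying a 3D Gaussian blur to each time point separately.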

Then the real fun begins: the stats. By far the most common statistical approach for detecting task-related neural activation is that based upon the General Linear Model (GLM), though there are alternatives.

We first need to define a model of what responses we’re looking for, which makes predictions as to what the neural signal should look like. The simplest model would be that the brain is more active at certain times, say, when a picture is on the screen. So our model would be simply a record of when the stimulus was on the screen. This is called a "boxcar" function (guess why):
In fact, we know that the neural response has a certain time lag. So we can improve our model by adding the canonical (meaning “standard”) haemodynamic response function (HRF).
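
Here's a small sketch of what that model looks like in code (again my own illustration, in Python; the double-gamma HRF parameters below are common defaults, not anything specified in the post): a boxcar marking when the stimulus was on screen, convolved with a canonical HRF to give the predicted BOLD time course.

import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr=2.0, duration=32.0):
    # A common double-gamma shape: a positive peak around 5 s followed by
    # a smaller, later undershoot. Exact parameters vary between packages.
    t = np.arange(0, duration, tr)
    peak = gamma.pdf(t, a=6)
    undershoot = gamma.pdf(t, a=16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.max()

def boxcar(n_scans, onsets, block_len):
    # 1 while the stimulus is on screen, 0 otherwise (in units of scans).
    box = np.zeros(n_scans)
    for onset in onsets:
        box[onset:onset + block_len] = 1.0
    return box

n_scans = 120
box = boxcar(n_scans, onsets=[10, 50, 90], block_len=15)

# Predicted response = boxcar convolved with the HRF, trimmed to scan length.
regressor = np.convolve(box, canonical_hrf(tr=2.0))[:n_scans]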
Now consider a single voxel. The MRI signal in this voxel (the brightness) varies over time. If there were no particular neural activation in this area, we’d expect the variation to be purely noise. Now suppose that this voxel was responding to a stimulus present from time-point 40 to 80.
While the signal is on average higher during this period of activation, there’s still a lot of noise, so the data doesn’t fit with the model exactly.
The GLM is a way of asking, for each voxel, how closely it fits a particular model. It estimates a parameter, β, representing the “goodness-of-fit” of the model at that voxel, relative to noise. Higher β, better fit. Note that a model could be more complex than the one above. For example, we could have two kinds of pictures, Faces and Houses, presented on the screen at different times:
In this case, we are estimating two β scores for each voxel, β-faces and β-houses. Each stimulus type is called an explanatory variable (EV). But how do we decide which β scores are high enough to qualify as “activations”? Just by chance, some voxels which contain pure noise will have quite high β scores (even a stopped clock’s right twice per day!)

The answer is to calculate the t score, which for each voxel is β divided by the uncertainty (standard error) of that β estimate, derived from the residual noise at that voxel. The higher the t score, the more unlikely it is that the model would fit that well by chance alone. It’s conventional to finally convert the t score into the closely-related z score.

We therefore end up with a map of the brain in terms of z. z is a statistical parameter, so fMRI analysis is a form of statistical parametric mapping (even if you don’t use the "SPM" software!) Higher z scores mean more likely activation.

Note also that we are often interested in the difference or contrast between two EVs. For example, we might be interested in areas that respond to Faces more than Houses. In this case, rather than comparing β scores to zero, we compare them to each other – but we still end up with a z score. In fact, even an analysis with just one EV is still a contrast: it’s a contrast between the EV, and an “implicit baseline”, which is that nothing happens.
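
To see how the βs and a contrast work in practice, here's a toy example (mine; the Faces and Houses regressors and the noisy voxel are simulated, not real data) that fits a two-EV GLM to a single voxel's time course by least squares and computes a t score for the Faces > Houses contrast.

import numpy as np

rng = np.random.default_rng(1)
n_scans = 200

# Two explanatory variables (already HRF-convolved in a real analysis).
faces = np.zeros(n_scans);  faces[20:40] = 1;  faces[120:140] = 1
houses = np.zeros(n_scans); houses[60:80] = 1; houses[160:180] = 1

# Design matrix: one column per EV plus a constant (the implicit baseline).
X = np.column_stack([faces, houses, np.ones(n_scans)])

# Simulated voxel: responds twice as strongly to faces as to houses, plus noise.
y = 2.0 * faces + 1.0 * houses + rng.normal(scale=1.0, size=n_scans)

# Least-squares estimate of the betas.
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance and the t score for the Faces > Houses contrast.
resid = y - X @ beta
dof = n_scans - X.shape[1]
sigma2 = resid @ resid / dof
c = np.array([1.0, -1.0, 0.0])                      # contrast vector
se = np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
t_contrast = (c @ beta) / se
print(f"beta_faces={beta[0]:.2f}, beta_houses={beta[1]:.2f}, t={t_contrast:.2f}")

In a real analysis this is done at every voxel, and the resulting t scores are converted into z scores to build the statistical map described above.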

Now we still need to decide how high of a z score we consider “high enough”, in other words we need to set a threshold. We could use conventional criteria for significance: p less than 0.05. But there are 10,000 voxels in a typical fMRI scan, so that would leave us with 500 false positives.

We could go for a p value 10,000 times smaller, but that would be too conservative. Luckily, real brain activations tend to happen in clusters of connected voxels, especially when you’ve smoothed the data, and clusters are unlikely to occur due to chance. So the solution is to threshold clusters, not voxels.

A typical threshold would be “z greater than 2.3, p less than 0.05”, meaning that you're searching for clusters of voxels, all of which have a z score of at least 2.3, where there's only a 5% chance of finding a cluster that size by chance (based on this theory.) This is called a cluster corrected analysis. Not everyone uses cluster correction, but they should. This is what happens if you don't.
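
As a rough sketch of the clustering step (illustration only - real packages derive the minimum cluster size from Gaussian random field theory or permutation testing, which isn't shown here, and the size cutoff below is arbitrary), the following finds connected groups of voxels with z above 2.3 and throws away any cluster smaller than a chosen size.

import numpy as np
from scipy.ndimage import label

def cluster_threshold(z_map, z_thresh=2.3, min_cluster_size=50):
    # Keep only connected clusters of supra-threshold voxels.
    # (Illustrative: in practice the minimum size comes from a p < 0.05
    # cluster-level correction, not a hand-picked number.)
    supra = z_map > z_thresh
    labelled, n_clusters = label(supra)        # 3D connected components
    keep = np.zeros_like(supra)
    for i in range(1, n_clusters + 1):
        cluster = labelled == i
        if cluster.sum() >= min_cluster_size:
            keep |= cluster
    return np.where(keep, z_map, 0.0)

# Toy example on a random z map.
rng = np.random.default_rng(2)
z_map = rng.normal(size=(40, 40, 40))
thresholded = cluster_threshold(z_map, z_thresh=2.3, min_cluster_size=50)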

Thus, after all that, we hopefully get some nice colorful blobs for each subject, each blob representing a cluster and colour representing voxel z scores:

This is called a first-level, or single-subject, analysis. Comparing the activations across multiple subjects is called the second-level or group-level analysis, and it relies on similar principles to find clusters which significantly activate across most people.

This discussion has focused on the most common method of model-based detection of activations. There are other "data driven" or "model free" approaches, such as this. There are also ways of analyzing fMRI data to find connections and patterns rather than just activations. But that's another story...

Very Severely Stupid About Depression

An unassuming little paper in the latest Journal of Affective Disorders may change everything in the debate over antidepressants: Not as golden as standards should be: Interpretation of the Hamilton Rating Scale for Depression.

Bear with me and I'll explain. It's less boring than it looks, trust me.

The Hamilton Scale (HAMD) is the most common system for rating the severity of depression. If you're only a bit down you get a low score, if you're extremely ill you get a high one. The maximum score's 52 but in practice it's extremely rare for someone to score more than 30.

First published in 1960, the HAMD is used in most depression research including almost all clinical trials of antidepressants. It's come under much criticism recently, but that's not the point here. The authors of the new paper, Kriston & von Wolff, simply asked: what does a given HAMD score mean in terms of severity?

It turns out that people have proposed no less than 5 different systems for interpreting HAMD scores. Do they all agree? Ha. Guess.

The pretty colors are mine. Just a glance shows a lot of variability, but the obvious outlier is the second one. That's the American Psychiatric Association (APA)'s official 2000 recommendations. Their interpretations of a given point on the scale tend to be worse than everyone else's.

This is most apparent at the top end. The APA use the terminology "Very Severe", which doesn't even appear on other scales. Much of what they class as "Very Severe" (23-26), two other scales class as "Moderate" depression! Amusingly, British authorities NICE seem to have been so unimpressed with this that they simply copied the APA's scale and toned everything down a notch for their 2009 criteria.

*

Why does this purely terminological debate matter? Well. A number of recent studies, most notoriously Kirsch et al (2008), have shown that antidepressants work better in more severe cases. The cut-off for antidepressants being substantially better than placebo generally comes out as about 26 on the HAMD in these studies.

Under the APA's 2000 terminology, this is well into the "Very Severe" band. Hence why Kirsch et al wrote - in a phrase that launched a thousand "Prozac Doesn't Work" headlines -
antidepressants reach... conventional criteria for clinical significance only for patients at the upper end of the very severely depressed category.
But for Bech, 26 is simply middle-of-the-road "major depression". For Furukawa, it's borderline "moderate" or "severe". Hmm. So if they'd gone with those criteria, Kirsch et al would have written instead
antidepressants reach... conventional criteria for clinical significance only for patients with major depression, of moderate-to-severe severity.
All of these terminological criteria are arbitrary, so this isn't necessarily more accurate, but it's no less so. The irony of the fact that Kirsch et al used the American Psychiatric Association's own criteria to skewer modern psychiatry isn't lost on me and probably wasn't lost on them either.

*

But where did the APA get their system from? This is the most extraordinary thing. Here's the paper they based their approach on. It's a 1982 British study by Kearns et al. The authors wanted to see how the HAMD compared to other depression scales. So they used lots of scales on the same bunch of depressed patients and compared them to each other, and to their own judgments of severity. Here's what they found:

You'll recognize the APA's categories, kind of, but they're all shifted. Why? We can only guess. Here's my guess. The scores in that Kearns et al graph were the average HAMD scores of people who fell into each severity band. The APA must have decided that they could use these to create cutoffs for severity.

How? It's not at all clear. The mean score for "Moderate" was 18, but that's the top end of Moderate in the APA's book; ditto for "Mild". The average "Very Severe" was 30 and the average "Severe" was 21 so the cut-off should have been 25 or 26 if you just went for the midpoint, in fact the APA went with 23. And so on.

That's before we get into the question of whether you should be using these results to make cutoffs at all (you shouldn't.) And the APA seem to have ignored the fact that the HAMD did not statistically significantly distinguish between "Severe" and "Moderate" depression anyway (p=0.1). Kearns et al's graph shows that other scales, like the Melancholia Subscale ("MS"), would be better. But everyone's been using the HAMD for the past 50 years regardless.

In Summary: Interpreting the Hamilton Scale is a minefield of controversy and the HAMD is far from a perfect scale of depression. Yet almost everything we know about depression and its treatment relies on the HAMD. Don't believe everything you read.

Kriston, L., & von Wolff, A. (2010). Not as golden as standards should be: Interpretation of the Hamilton Rating Scale for Depression. Journal of Affective Disorders. DOI: 10.1016/j.jad.2010.07.011

Kearns, N., Cruickshank, C., McGuigan, K., Riley, S., Shaw, S., & Snaith, R. (1982). A comparison of depression rating scales The British Journal of Psychiatry, 141 (1), 45-49 DOI: 10.1192/bjp.141.1.45

 