I came across a LinkedIn post by Eric Arzubi, an MD and CEO of Front Job Psychiatry. It refers to the Dartmouth College study, a randomized controlled trial involving 210 adults, in which a four-week intervention with a generative AI-powered therapy chatbot called Therabot led to a reduction in depressive symptoms and anxiety compared to a control group. Thank you,
, for highlighting the LinkedIn post. Here are his interpretations of the results:
The 4-week intervention delivered:
↳ Depression symptoms reduced twice as much as control group
↳ Anxiety improved significantly more than controls
↳ Eating disorder risk factors substantially decreased
↳ Users engaged for 6+ hours on average (Source: LinkedIn)
Mr. Arzubi is a board-certified child and adolescent psychiatrist. I appreciate the infectious optimism he exudes. Dartmouth College has indeed provided important initial data on the impact generative AI chatbots can have. Here is another quote from his post:
„While AI can't replace human therapists, it fills a critical gap between no care and human care.
The question isn't if AI belongs in mental health.
It's how quickly we can responsibly deploy it safely to reach those who would otherwise suffer alone. […] Today officially marks psychiatry’s ChatGPT moment.“ (Source: LinkedIn)
Why I disagree
Exhibit #1: Initial trials of any kind often reveal significant treatment effects that shrink as studies with more participants are performed. A significant effect observed in a first study will not necessarily endure (Pereira et al., 2012). Is this effect larger than the one we will see in bigger studies?
Exhibit #2: Are we actually seeing a significant effect size in this study to begin with? The study used the Patient Health Questionnaire-9 (PHQ-9), a 9-item self-report tool for measuring depression, alongside a comparable 9-item self-report questionnaire (GAD-Q-IV) for anxiety. Patients knew whether they were using the app or not; for obvious reasons, there was no blinding.
You can test and use the PHQ-9 here. The score ranges from 0 to 27. Therabot users experienced a reduction of 6.1 points compared to 2.6 points for the control group. Thus, Therabot's reduction surpassed that of the waiting-list control group by 3.5 points. On the PHQ-9, moderate depression ranges from 10 to 14, moderately severe from 15 to 19, and severe depression from 20 to 27. As for the impact of Therabot's additional 3.5-point reduction compared to controls on a waiting list: you be the judge.
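To make the arithmetic concrete, here is a minimal Python sketch. The two reductions (6.1 and 2.6 points) and the upper three severity bands are the ones quoted above. The "minimal" (0 to 4) and "mild" (5 to 9) bands come from standard PHQ-9 scoring and are not taken from the study. The baseline passed to `illustrate` is purely hypothetical, and treating a fractional group mean as if it were an individual score is a simplifying assumption.

```python
# Reductions in PHQ-9 points as quoted in the text above.
THERABOT_REDUCTION = 6.1
CONTROL_REDUCTION = 2.6

# (exclusive upper bound, label). "minimal" and "mild" are standard PHQ-9
# bands added here for completeness; the other three are quoted in the text.
BANDS = [
    (5, "minimal"),
    (10, "mild"),
    (15, "moderate"),
    (20, "moderately severe"),
    (28, "severe"),
]


def phq9_band(score: float) -> str:
    """Map a PHQ-9 score (0 to 27) to its severity band.

    Assumption: fractional values such as group means fall into the band
    of the integer score just below them.
    """
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 scores range from 0 to 27")
    for upper, label in BANDS:
        if score < upper:
            return label
    raise AssertionError("unreachable for valid scores")


def extra_reduction() -> float:
    """Reduction attributable to Therabot beyond the waiting-list control."""
    # Rounded to one decimal to avoid floating-point noise.
    return round(THERABOT_REDUCTION - CONTROL_REDUCTION, 1)


def illustrate(baseline: float) -> dict:
    """Bands after each reduction, for a HYPOTHETICAL baseline score."""
    return {
        "baseline": phq9_band(baseline),
        "control": phq9_band(max(0.0, baseline - CONTROL_REDUCTION)),
        "therabot": phq9_band(max(0.0, baseline - THERABOT_REDUCTION)),
    }


if __name__ == "__main__":
    print("Extra reduction vs. control:", extra_reduction())
    # 15 is an illustrative baseline, not a figure from the study.
    print(illustrate(15.0))
```

Depending on the baseline you plug in, the extra 3.5 points may or may not move someone across a band boundary, which is exactly why the number on its own is hard to judge.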
Exhibit #3: Psychiatry has actually had many moments.
It had its guided self-help moment when it was shown that guided self-help interventions, such as books, seem to be as effective as face-to-face psychotherapy. (Cuijpers et al., 2010)
It had its CBT moment when it was revealed that CBT had "moderate to large effects compared to control conditions such as care as usual and waitlist (g=0.79; 95% CI: 0.70-0.89)" after analyzing studies involving 52,702 patients. (Cuijpers et al., 2023)
It even had its mindfulness moment when a study involving 276 participants showed that mindfulness-based stress reduction appeared to be as effective as the antidepressant escitalopram. (Hoge et al., 2023)
But is psychiatry having its ChatGPT moment because of this study?
I still disagree.
If you put people in a professional setting, pay attention to them and offer them something defined as possibly helpful (sugar water, tapping fingers, breathing exercises, a friendly grandmother, watching a video, etc.), some of those people will feel a bit better for a little while compared to a control group. It's a very low bar for claiming "efficacy".
I didn't think it was much of a study. Fewer than 300 people? Maybe for them it worked. But for how long? Where was the longitudinal study? I have to say, you kinda lost me in the rest of the article. I'll also tell you about a friend of mine who was all over the place, up, down, up, down. I only have contact with her by messenger, but she swore that a book about overthinking "saved" her. She's been receiving hormone treatment anyhoo; I've thought she's been bipolar all along. Would a darn piece of metal pick that up? Or for that matter, a book? I'm not saying they couldn't be helpful, but a lot would be missed, especially repressed material. I suppose they're all cognitive-behavioral. Well, that's my take.