If you put people in a professional setting, pay attention to them and offer them something defined as possibly helpful (sugar water, tapping fingers, breathing exercises, a friendly grandmother, watching a video etc), some of those people will feel a bit better for a little while compared to a control group. It's a very low bar for claiming "efficacy".
Dear Baird, I completely agree. There is a significant impact from the intervention here, and it involves a placebo effect that is difficult to measure. I also believe that the attention AI received during the study period and continues to receive in the media contributes to this effect.
And yes, merely offering something presented as helpful, as opposed to a waiting list control, is a very low standard for asserting an effect.
I definitely agree, and my own personal term for the placebo effect is the "somebody cares" effect. It's very powerful.
I didn't think it was much of a study. Fewer than 300 people? Maybe it worked for them, but for how long? Where was the longitudinal follow-up? I have to say, you kinda lost me in the rest of the article. I'll also tell you about a friend of mine who was all over the place, up, down, up, down. I only have contact with her by messenger, but she swore that a book about overthinking "saved" her. She's been receiving hormone treatment; anyhoo, I've thought she was bipolar all along. Would a darn piece of metal pick that up? Or, for that matter, a book? I'm not saying they couldn't be helpful, but a lot would be missed, especially repressed material. I suppose they're all cognitive-behavioral. Well, that's my take.
This really tickled me, "a darn piece of metal," and reminded me of how my dad related to his car. I remember my daughter had a Furby toy that was allegedly interactive, but I beg to differ: it gave me the creeps, on top of having nothing to say worth hearing. Kids do talk to their toys and pets, though, so there might be space for some kind of therapeutic robot. But it would need so many safeguards that it might never be useful.
Dear Judi, great comment! Yes, most of it is CBT with a strong focus on techniques and interventions. That said, I recently read a book on acceptance and commitment therapy, and I was surprised by how helpful and practical I found it.
While I totally agree with your critique of the study, I've spent a lot of time working with chatbots. I have very little doubt there's massive potential coming down the pipe, but it's way too early to start hailing the birth of a new dawn.
Dear Ross, that is an excellent point. Given the pace of development and the exponential growth in processing power, I still cling to the notion that the AI we see today is the dumbest and slowest we will ever see. So I agree: there is massive potential coming down the pipe.
This development actually makes me reflect on the therapeutic element of psychotherapy. I am not yet certain that the pure replication of therapeutic interventions or dialogue is the right way forward, but I am happy to be proven wrong.
I’m not even sure that four weeks is long enough to measure past the “I’m in a research study using something that will hopefully make me feel better…oh I feel better because I’m in a study!” effect.
I agree. There is a clear effect of the intervention, and considering the attention—if not the "hype"—that AI receives, I am certain there is a placebo effect here that cannot be measured.
There are never enough replication studies to get the data everyone wants, because researchers always want to do something new. It's such a big problem.
Wow, I appreciate this so much.
Since there's no blinding, how would we design a study that had blinding?
That is an excellent question, and I believe it is not feasible.
However, you could test against a chatbot that has not been trained as a therapist to see how much of the effect is attributable to AI trained in therapy.
You might also avoid patient self-reports of depression by having clinicians conduct observer-based ratings. The raters could be blinded, unaware of which individuals are in the control group and which are using Therabot.
Such a design is more complicated and could still lead to unblinding (for example, if patients casually mention, "I used the app" at the beginning of the interview), but it would mitigate many of the issues associated with self-rating scales.
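To make that design concrete, here is a minimal sketch in Python of how a balanced, rater-blinded allocation across three arms (Therabot, an untrained chatbot, and a waitlist control) could work. The arm names, the coding scheme, and the function itself are my own illustrative assumptions, not details from the actual Therabot trial.

import random

# Hypothetical three-arm, rater-blinded allocation; arm names and the
# opaque-code scheme are illustrative assumptions, not study details.
ARMS = ["therabot", "untrained_chatbot", "waitlist_control"]

def randomize(participant_ids, seed=42):
    """Balance participants across arms and issue opaque codes for raters.

    Clinician raters see only the opaque code, never the arm, so their
    observer-based depression ratings stay blind to treatment assignment.
    """
    rng = random.Random(seed)
    # Balanced allocation: repeat the arm list to cover everyone, then shuffle.
    arms = (ARMS * (len(participant_ids) // len(ARMS) + 1))[: len(participant_ids)]
    rng.shuffle(arms)
    # Unique opaque codes, drawn in one pass so the codes reveal nothing.
    codes = rng.sample(range(10_000, 100_000), len(participant_ids))
    allocation = dict(zip(participant_ids, arms))  # held by the statistician
    rater_view = {pid: f"P{c}" for pid, c in zip(participant_ids, codes)}
    return allocation, rater_view

In practice, the allocation table would be held by an unblinded statistician, while the raters work only from the coded IDs, which is what would mitigate the self-rating problem described above.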
Thank you so much for taking the time to share this. It helps to hear your thoughts and insights.
I appreciate the post as well as the comments in this thread.
I believe that Gen AI therapists can help some people under some circumstances. What we need are studies that explore the conditions under which Gen AI helps (or hurts), rather than studies that focus on overall effectiveness.
Meanwhile, a new preprint from UCSD researchers shows, for the first time, that an AI bot can pass the Turing Test. (https://arxiv.org/pdf/2503.23674)
I think this is a mixed blessing.
On the one hand, we're going to know very soon whether there are placebo effects of the sort raised in this thread. If people can't tell whether it's a person or a Gen AI bot delivering CBT, then the impact of therapy will not be influenced by the novelty and hype surrounding AI.
On the other hand, it's scary to think that someday people will rely on Gen AI chatbots for therapy and easily forget that they're bots. Everyone - both human therapists and bots - makes mistakes on occasion, but AI has a history of occasionally disastrous mistakes that humans would not make.