Are Therapy Chatbots Effective for Depression and Anxiety? A Critical Comparative Review
Frontiers in AI & Mental Health: Research & Clinical Considerations
A recurring series exploring cutting-edge research and clinical applications of artificial intelligence in mental health treatment
by Christopher Campbell, MD Candidate (Medical University of South Carolina) (with assistance from ChatGPT4o/Scholar AI in summarizing the research studies)
There exists a substantial gap between the number of individuals who need mental health care and the number of mental health care providers. According to the World Health Organization, more than 75% of individuals with mental disorders in low- and middle-income countries receive no treatment (WHO, 2023). The main factors limiting access to treatment include healthcare expenses, a dearth of mental health providers, stigma, and geographical constraints, among others (Rural Health Information Hub, 2019). Smartphones are now an ever-present part of daily life: at least 91% of Americans own smartphones, as do approximately 56% of individuals globally (Pew Research Center, 2024; GSMA, 2023). Both smartphone applications and therapy chatbots have the potential to increase accessibility through a variety of mechanisms, including scalability, ease of access, the negation of transportation and geographic restrictions, affordability, and privacy (avoidance of stigma concerns) (Luxton, 2020; Zhong et al., 2024). As such, smartphone applications providing mental health interventions and, more recently, therapy chatbots have been explored as a means to help address the mental health care treatment gap.
The idea of chatbots providing psychotherapy has become an increasingly popular topic of discussion, but the concept is not new: the first therapy chatbot, ELIZA, was created in the 1960s (Weizenbaum, 1966). However, the increasing sophistication of today’s artificial intelligence, particularly large language model (LLM)-based systems like ChatGPT, Claude, Gemini, and Grok, has sparked increased interest in chatbots specifically designed to deliver talk therapy. Notably, most currently studied chatbots are CBT-scripted (such as Woebot and Wysa), as opposed to generative AI (LLM-based) systems like ChatGPT.
The uses and implications of therapy chatbots are multifaceted. Key areas of concern include their ability to form a therapeutic alliance, potential for dream analysis, safety risks, the possibility of overreliance, treatment failure (which may discourage users from seeking human psychotherapy), and the broader devaluation of human-delivered care. Rather than addressing all of these complex issues in a single article, each will be explored in depth in future installments of this column.
To properly examine the unique role that therapy chatbots may play in mental health care, we must first examine the effectiveness of the well-established technology of mental health smartphone applications, which serve a similar function. These apps have been in use and under study for over a decade and thus provide a baseline against which the efficacy of chatbots can be compared. This baseline will be established through a meta-analysis of the efficacy of smartphone mental health applications for the treatment of anxiety and depression. Afterwards, we will explore a second meta-analysis examining the efficacy of therapy chatbots for the same conditions, along with a very recently published randomized controlled trial (RCT) with noteworthy results. This side-by-side comparison will allow us to examine the similarities, differences, and relative efficacies of each technology. In doing so, we can evaluate whether the use of therapy chatbots is empirically justified at this time.
Clinical Evidence on Mental Health Apps and Chatbots
Meta-Analysis: Mental Health Apps for Depression and Anxiety: Findings from a Meta-Analysis of 176 RCTs (Linardon et al., 2024)
This 2024 meta-analysis by Linardon et al. demonstrates the effectiveness of general mental health applications. It examined 176 RCTs with more than 20,000 participants, encompassing a wide range of applications, including those incorporating cognitive behavioral therapy (CBT) principles (48% of applications), mindfulness (21%), cognitive training (10%), and mood monitoring (34%). Notably, 5% of the applications studied utilized chatbot technology (Linardon et al., 2024).
The study found that mental health apps produced small but statistically significant improvements in symptoms of depression (g=0.28) and generalized anxiety (g=0.26) across tens of thousands of participants. Importantly, the effect size was found to be as large as g=0.38 when the app was specifically designed to treat depression. Apps specific for anxiety treatment did not demonstrate additional significant increases in effect size. Interestingly, this meta-analysis specifically noted that for the treatment of depression, effect sizes were significantly larger if the app utilized chatbot technology, with an average effect size of 0.53 (0.33‐0.74).
Effect Sizes from Linardon et al. (2024)
| App Type | Depression (g) | Anxiety (g) |
| --- | --- | --- |
| All Mental Health Apps | 0.28 | 0.26 |
| Depression-Specific Apps | 0.38 | — |
| Apps Using Chatbot Technology (5% subset) | 0.53 (95% CI: 0.33–0.74) | 0.18 (95% CI: 0.06–0.31) |
To accurately gauge the real-world impact of these interventions, it is necessary to note the attrition rates associated with mental health applications. Attrition is especially important when comparing mental health applications with chatbots, as one of the proposed benefits of chatbots is increased engagement (Gaffney et al., 2019). In this meta-analysis, approximately 25% of participants prematurely dropped out of their respective studies (Linardon et al., 2024).
Meta-Analysis: Therapy Chatbots for Depression and Anxiety: Meta-Analysis of 18 RCTs (Zhong et al., 2024)
Now that we have examined the effectiveness of more general mental health care applications, it is important to examine the latest literature on the effectiveness of therapy chatbots specifically. This systematic review and meta-analysis by Zhong et al., published in April 2024, provides a comprehensive examination of the treatment effectiveness of therapy chatbots. This study included 18 RCTs and approximately 3,500 participants, aiming to assess therapy chatbots which utilized CBT as their primary psychotherapeutic modality.
For anxiety, this study found that treatment with therapy chatbots resulted in significant reductions of anxiety symptoms (p < 0.001), with an overall effect size of g = -0.19 (note: the negative value here indicates improvement in symptoms). This effect size remained stable at 4 weeks (g = -0.18) and was even greater at 8 weeks (g = -0.24). However, at 3-month follow-up the effect was diminished and nonsignificant, indicating a potential limitation of such interventions.
For depression, chatbot use was likewise associated with significant decreases in symptoms, with effect sizes ranging from g = -0.25 to -0.33. However, by the three-month follow-up, this effect had diminished and was no longer statistically significant. While attrition was not reported in this study, a separate analysis, discussed in more detail below, found attrition rates of approximately 21%, ranging from 18% to 26% depending on the length of the intervention (Jabir et al., 2024).
Effect Sizes from Zhong et al. (2024) – Therapy Chatbots for Depression & Anxiety
| Outcome | Time Point | Effect Size (g) |
| --- | --- | --- |
| Anxiety | Overall | -0.19 |
| Anxiety | 4 Weeks | -0.18 |
| Anxiety | 8 Weeks | -0.24 |
| Anxiety | 3-Month Follow-up | Not significant |
| Depression | Overall | -0.25 to -0.33 |
| Depression | 3-Month Follow-up | Not significant |
Lastly, it should be noted that more than a year has elapsed since the publication of the Zhong et al. (2024) meta-analysis. Given the rapid development and increasing interest in the topic of therapy chatbots, ongoing examination and repeat meta-analyses are necessary to continually monitor the effectiveness of these advancing technologies.
Emerging Research on Therapy Chatbots: New Findings and Ongoing Developments
Serendipitously, as this article was nearing completion, a randomized controlled trial (RCT) examining the effectiveness of a therapy chatbot was released. The results are groundbreaking and provide a stark example of how rapidly developments occur in this area of research.
This study, published by Heinz et al. in late March 2025, is a first-of-its-kind RCT examining the effectiveness of a generative-AI therapy chatbot, “Therabot,” developed at Dr. Nicholas Jacobson’s AI and Mental Health Lab at Dartmouth, in treating anxiety and depression compared with waitlist controls. The study found clinically significant symptom reductions in patients diagnosed with Major Depressive Disorder (MDD), Generalized Anxiety Disorder (GAD), and Clinically High Risk for Feeding and Eating Disorders (CHR-FED). Importantly, the following moderate-to-large effect sizes were demonstrated:
| Condition | 4-Week Effect Size (d) | 8-Week Effect Size (d) |
| --- | --- | --- |
| Major Depressive Disorder (MDD) | 0.845 | 0.903 |
| Generalized Anxiety Disorder (GAD) | 0.794 | 0.840 |
| Clinically High Risk for Feeding and Eating Disorders (CHR-FED) | 0.627 | 0.819 |
These effect sizes exceed those commonly reported for SSRIs in clinical trials, and approach or match the effect sizes observed for first-line psychotherapy, an especially notable finding given the digital format. It is also striking that the effect sizes for each disorder increased at 8-week follow-up compared to 4-week follow-up, suggesting at least some persistence of therapeutic effect even after cessation of treatment (contrary to the two aforementioned meta-analyses, which showed a loss of effect at 3-month follow-up).
Moreover, this study found that over a period of 4 weeks, users engaged with Therabot on average for 6 hours during the treatment period, demonstrating encouraging levels of spontaneous user engagement. Interestingly, it was noted that study participants reported the therapeutic alliance with Therabot to be similar to that of human therapists—a surprising finding that is consistent with prior work showing that users can develop a perceived therapeutic alliance with chatbots (Beatty et al., 2022).
The results of the Heinz et al. RCT are striking, with effect sizes considerably greater than those found in the Zhong et al. meta-analysis. If such results can be replicated in similar studies, and positive safety profiles can be established, this could merit consideration of therapy chatbots in first-line care for MDD, GAD, and CHR-FED.
Discussion: Comparing the Effectiveness of Therapy Chatbots, Mental Health Apps, and Traditional Treatments
Mental health smartphone applications appear to have an effect size of g=0.28 for depression and g=0.26 for anxiety; comparatively, therapy chatbots demonstrated g=0.29 for depression and g=0.19 for anxiety [not accounting for the recent results of the Heinz et al., (2025) study]. For reference, it is important to consider the effect sizes for common first-line treatments: the effect sizes are approximately 0.31 for antidepressant medications, 0.85 for psychotherapy (general), and 0.97 for short-term (40 hours or less) psychodynamic psychotherapy (Turner, Matthews, Linardatos, Tell, & Rosenthal, 2008; Smith, Glass, & Miller, 1980; Abbass, Hancock, Henderson, & Kisely, 2006). According to Cohen’s conventional benchmarks, effect sizes of ~0.20 are considered small, ~0.50 moderate, and ~0.80 or higher large, placing most app- and chatbot-based interventions in the small to low-moderate range, and traditional psychotherapy in the moderate to large range (McGough & Faraone, 2009).
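To make these benchmarks concrete, the standardized mean differences (Cohen's d and its small-sample-corrected variant, Hedges' g) reported above can be computed directly from group summary statistics. The sketch below uses the standard formulas with purely hypothetical symptom-score numbers, not data from any of the studies discussed:

```python
import math

def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference between treatment and control groups."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d with the small-sample correction factor J applied."""
    d = cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c)
    # J approximates the exact gamma-function correction for small samples
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    return d * j

# Hypothetical example: treatment arm improves 10 points on a symptom scale
# (SD 4), control arm improves 8 points (SD 4), 50 participants per arm.
g = hedges_g(10.0, 4.0, 50, 8.0, 4.0, 50)
print(round(g, 3))  # → 0.496, "moderate" by Cohen's benchmarks
```

With equal group sizes and SDs the pooled SD equals the common SD, so d here is (10 − 8)/4 = 0.50, which the correction factor shrinks slightly to g ≈ 0.496; for the large samples pooled in the meta-analyses above, g and d are nearly identical.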
The results of the two meta-analyses just noted suggest that therapy chatbots are relatively equivalent in effectiveness to smartphone mental health applications.
Though therapy chatbots appear to be roughly equal in effectiveness at this time, several considerations suggest they may eventually prove more effective than smartphone mental health applications. First, although the overall results of the two aforementioned meta-analyses showed relatively equivalent effect sizes, Linardon et al. (2024) specifically found that apps using chatbot technology for depression had significantly higher effect sizes (g=0.53) than those that did not (g=0.28). Furthermore, as therapy chatbot technology is continually studied, refined, and improved, it is plausible that effect sizes may continue to increase, as suggested by the recently published Heinz et al. (2025) RCT.
Comparison of Effect Sizes for Depression and Anxiety Treatments
| Intervention | Depression (Effect Size, g/d) | Anxiety (Effect Size, g/d) |
| --- | --- | --- |
| SSRIs | ~0.31 | ~0.30 |
| Psychotherapy (General)¹ | ~0.85 | ~0.85 |
| Short-term Psychodynamic Psychotherapy² | ~0.97 | ~0.97 |
| Mental Health Apps (Linardon et al., 2024) | 0.28 (overall) / 0.38 (depression-specific) | 0.26 |
| Mental Health Apps Using Chatbot Technology (Linardon et al., 2024) | 0.53 (95% CI: 0.33–0.74) | 0.18 (95% CI: 0.06–0.31) |
| Therapy Chatbots (Zhong et al., 2024) | 0.25–0.33 | 0.19 |
| Therapy Chatbot – Heinz et al., 2025 (RCT) | 0.845 (4-wk) / 0.903 (8-wk) | 0.794 (4-wk) / 0.840 (8-wk) |

¹ Smith, Glass, & Miller (1980)
² Abbass, Hancock, Henderson, & Kisely (2006)
Attrition and Engagement: Therapy Chatbots vs Mental Health Apps
One of the proposed benefits of chatbots is a hypothesized decrease in attrition rates when compared to other forms of internet-delivered mental health care (such as smartphone applications) due to the more interactive and engaging nature of these conversational LLMs (Gaffney et al., 2019). Therefore, it is of crucial importance to examine the attrition rate between more general mental health applications and therapy chatbots.
While the Zhong et al. (2024) meta-analysis did not examine attrition rates for therapy chatbots, a separate meta-analysis published in February 2024 by Jabir et al. analyzed attrition specifically in therapy chatbot studies. It found an average attrition rate of approximately 21%, with short-term studies (less than 8 weeks) demonstrating approximately 18% attrition and long-term studies (greater than 8 weeks) approximately 26.5%. Compared with mental health apps, chatbots do appear to have a slight short-term advantage in treatment adherence, though this advantage is subtle. It is also unclear whether the same cohort of individuals who fail to adhere to smartphone mental health applications would fail to adhere to therapy chatbots; individuals may prefer, and thus make greater use of, one modality over the other.
Clinical Guidelines for Using Mental Health Apps and Therapy Chatbots
What Professional Mental Health Organizations and Experts Say About Apps and Chatbots
The American Psychiatric Association does not appear to have an official position on therapy chatbots. Nor does it take a position for or against the use of mental health applications; however, it suggests that (1) caution be used when choosing applications, to avoid unhelpful or harmful ones, and (2) applications can be beneficial if chosen with proper evaluation. It advises that clinicians work with their patients to make an informed, personalized decision regarding whether such applications would provide meaningful benefit. To assist in this evaluation, the American Psychiatric Association has created an extensive, evidence-based App Evaluation Model that it encourages all mental health clinicians to utilize when exploring mental health applications. This model encourages clinicians to assess applications against a broad range of criteria, including safety, privacy, proposed effectiveness, and more (The App Evaluation Model, n.d.).
Additionally, on March 12, 2025, the American Psychological Association published an article warning of the significant harm posed by the use of generic AI chatbots for mental health support, though the article does not explicitly address chatbots specifically designed for mental health care purposes (Abrams, 2025).
Notably, the lead researchers and developers of Therabot, Dr. Nicholas Jacobson and Dr. Michael Heinz, caution that generative AI agents are not yet ready for autonomous deployment in mental health care. They urge that the risks of such modalities be better studied before widespread, autonomous implementation (Kelly, 2025):
“While these results are very promising, no generative AI agent is ready to operate fully autonomously in mental health where there is a very wide range of high-risk scenarios it might encounter.” – Dr. Michael Heinz, as quoted in Dartmouth News (Kelly, 2025)
“There are a lot of folks rushing into this space since the release of ChatGPT, and it’s easy to put out a proof of concept that looks great at first glance, but the safety and efficacy is not well established… This is one of those cases where diligent oversight is needed.” – Dr. Nicholas Jacobson, as quoted in Dartmouth News (Kelly, 2025)
Lastly, the Psychotherapy Action Network (PsiAN) has released a policy statement on smartphone mental health applications and therapy chatbots which presents a detailed examination of the ethical and clinical implications. This statement acknowledges the significant mismatch between the number of providers versus the number of those needing care, and the desire for a “disruptive solution.” However, this PsiAN position statement argues that while technology has potential to improve mental health care access, many current apps and platforms are misleading, underregulated, and profit-driven, offering superficial fixes at the expense of clinical integrity and confidentiality.
“We liken the situation to a hypothetical antibiotic shortage. Imagine if the response were merely to sell diluted antibiotics or untested remedies. That’s what we have today. Too many mental health technology offerings are either watered down versions of safe, effective treatments or some form of digital snake oil.” – PsiAN Position Paper on Mental Health Apps/Technology
Should Clinicians Use Mental Health Apps and Therapy Chatbots?
Given the dearth of official policy and clear recommendations regarding therapy chatbots and smartphone mental health applications as treatment interventions, it is ultimately up to individual clinicians and their patients whether to incorporate them. If either is used, it is strongly encouraged to consult the American Psychiatric Association’s App Evaluation Model to guard against ineffective or insecure applications and chatbots. Furthermore, these interventions are likely best suited for patients who are clinically stable, those with sub-clinical or mild mood disorders, and those with significant barriers to accessing traditional mental health treatment. Importantly, given the novelty of these modalities, close monitoring and regular oversight by a clinician is recommended to mitigate over-reliance and to detect decompensation.
Considering the small yet clinically meaningful effects of these two modalities, combined with their ability to significantly increase accessibility (via decreased cost, fewer geographical constraints, etc.), it is reasonable to consider either as an adjunct to traditional treatments, though at this time their modest effect sizes (and the limited body of research on therapy chatbots) restrict their use as primary interventions. However, if additional studies consistently demonstrate effect sizes for therapy chatbots that approach those of other first-line therapies, as illustrated by the recent Heinz et al. (2025) study, utilizing therapy chatbots as a primary, standalone intervention could one day prove advantageous for public health outcomes.
Are Therapy Chatbots the Future of First-Line Mental Health Care?
The idea of delegating a therapeutic practice as precious as psychotherapy to an autonomous artificial intelligence-powered agent can be unnerving, even heretical. Psychotherapy has been a sacrosanct practice for more than a century, and the roots of this practice extend for millennia in forms such as conversation with religious leaders or guidance provided from community elders. The art and science of healing through therapeutic relationship and dialogue is deeply established. Human-delivered psychotherapy should not be dispensed with. Even if one day the effect size of therapy chatbots exceeds human-delivered psychotherapy, human psychotherapists will still be necessary. Different patients have different needs and preferences regarding treatment: Some prefer the authenticity and connection that in-person therapy provides as opposed to tele-health visits. Others prefer (or require) psychotherapy to medication, and it is reasonable to expect that some will always prefer psychotherapy with a human over therapeutic dialogue with a chatbot.
Additionally, our fundamental obligation is to the health of our patients and the public. We must prioritize that when considering policy recommendations for therapy chatbots. Rigorous scientific study and ongoing data collection are essential to evaluate and balance key tradeoffs, including accessibility gains, comparative effectiveness, potential harms, and the consequences of limiting access to such tools.
In an ideal world, there would be a sufficient supply of mental health care clinicians to meet the demand and needs of the patient population; however, the disparity between the number of mental health care providers and the number of patients needing mental health care has only worsened in recent years (Health Resources & Services Administration, 2024). We must acknowledge this mismatch in supply and demand, and recognize the fact that economically, socially, and geographically disadvantaged and marginalized populations are most likely to benefit from easily accessible and affordable interventions. We must ensure that our apprehensions are driven by genuine and empirically validated concern, rather than our own biases towards self- and career-preservation.
Simultaneously, we must remain empirically grounded, realistic, and critical of such emerging technologies, to avoid falling prey to the allure of a “techno-magic” fix that may in fact be “digital snake oil” (PsiAN, 2019). Ultimately, to guard against both techno-optimism and Luddism, we must commit to drafting evidence-based policies. These should be guided by emerging empirical evidence and aimed at determining whether the effect sizes, risk profiles, and real-world usability of therapy chatbots justify their use as adjunct or first-line treatments, or whether they belong among the many failed mental health interventions lost to the sands of time.
References:
Abrams, Z. (2025, March 12). Using generic AI chatbots for mental health support: A dangerous trend. APA Services. https://www.apaservices.org/practice/business/technology/artificial-intelligence-chatbots-therapists
Abbass, A. A., Hancock, J. T., Henderson, J., & Kisely, S. (2006). Short-term psychodynamic psychotherapies for common mental disorders. Cochrane Database of Systematic Reviews, (4), Article No. CD004687. https://doi.org/10.1002/14651858.CD004687.pub3
Jabir, A. I., Lin, X., Martinengo, L., Sharp, G., Theng, Y. L., & Tudor Car, L. (2024). Attrition in conversational agent–delivered mental health interventions: Systematic review and meta-analysis. Journal of Medical Internet Research, 26, e48168. https://doi.org/10.2196/48168
Beatty, C., Malik, T., Meheli, S., & Sinha, C. (2022). Evaluating the therapeutic alliance with a free-text CBT conversational agent (Wysa): A mixed-methods study. Frontiers in Digital Health, 4, 847991. https://doi.org/10.3389/fdgth.2022.847991
Gaffney, H., Mansell, W., & Tai, S. (2019). Conversational agents in the treatment of mental health problems: Mixed-method systematic review. JMIR Mental Health, 6(10), e14166. https://doi.org/10.2196/14166
GSMA. (2023). The state of mobile internet connectivity report 2023 – Mobile for development. https://www.gsma.com/r/somic/
Health Resources & Services Administration. (2024). State of the behavioral health workforce, 2024 (pp. 1–16). https://bhw.hrsa.gov/sites/default/files/bureau-health-workforce/state-of-the-behavioral-health-workforce-report-2024.pdf
Kelly, M. (2025, March 27). First therapy chatbot trial yields mental health benefits | Dartmouth. Dartmouth News. https://home.dartmouth.edu/news/2025/03/first-therapy-chatbot-trial-yields-mental-health-benefits
Linardon, J., Torous, J., Firth, J., Cuijpers, P., Messer, M., & Fuller-Tyszkiewicz, M. (2024). Current evidence on the efficacy of mental health smartphone apps for symptoms of depression and anxiety: A meta-analysis of 176 randomized controlled trials. World Psychiatry, 23(1), 139–149. https://doi.org/10.1002/wps.21183
Luxton, D. D. (2020). Ethical implications of conversational agents in global public health. Bulletin of the World Health Organization, 98(4), 285–287. https://doi.org/10.2471/blt.19.237636
McGough, J. J., & Faraone, S. V. (2009). Estimating the Size of Treatment Effects: Moving Beyond P Values. Psychiatry (Edgmont), 6(10), 21. https://pmc.ncbi.nlm.nih.gov/articles/PMC2791668/
Our therapy approach makes the difference | Austen Riggs. (n.d.). Austen Riggs Center. https://www.austenriggs.org/our-treatment/riggs-difference
Pew Research Center. (2024, November 13). Mobile fact sheet. https://www.pewresearch.org/internet/fact-sheet/mobile/
PsiAN. (2019). PsiAN Tech Position Paper. https://drive.google.com/file/d/1xqLumXvbwi-kIzrDqwGJ4qMutA0q9gYi/view
Rural Health Information Hub. (2019). Barriers to mental health treatment in rural areas – RHIhub toolkit. https://www.ruralhealthinfo.org/toolkits/mental-health/1/barriers
Schleider, J. L., Dobias, M. L., Mullarkey, M. C., & Ollendick, T. (2020). Retiring, rethinking, and reconstructing the norm of once-weekly psychotherapy. Administration and Policy in Mental Health and Mental Health Services Research, 48(1), 4–8. https://doi.org/10.1007/s10488-020-01090-7
Smith, M. L., Glass, G. V., & Miller, T. I. (1980). The benefits of psychotherapy. Baltimore, MD: Johns Hopkins University Press.
The App Evaluation Model. (n.d.). American Psychiatric Association. https://www.psychiatry.org/psychiatrists/practice/mental-health-apps/the-app-evaluation-model
Turner, E. H., Matthews, A. M., Linardatos, E., Tell, R. A., & Rosenthal, R. (2008). Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358, 252–260.
Weizenbaum, J. (1966). ELIZA – A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45. https://doi.org/10.1145/365153.365168
WHO. (2023). Mental Health Gap Action Programme (mhGAP). World Health Organization. https://www.who.int/teams/mental-health-and-substance-use/treatment-care/mental-health-gap-action-programme
Zhong, W., Luo, J., & Zhang, H. (2024). The therapeutic effectiveness of artificial intelligence-based chatbots in alleviation of depressive and anxiety symptoms in short-course treatments: A systematic review and meta-analysis. Journal of Affective Disorders, 356. https://doi.org/10.1016/j.jad.2024.04.057