Mila aims to make chatbots safer
AI institute develops filters to protect psychologically distressed users.
Over the past few years, chatbot use has taken hold in countless contexts, helping people write prose and find information. But the technology’s growing presence in situations relating to mental health has raised safety concerns that the Montreal-based AI institute Mila is now trying to tackle.
Last summer, Mila launched three studios dedicated to safety, humanity and the climate. Their mandate is to bridge the gap between academic research and practical applications, creating turnkey tools and intervention frameworks for the real world. “We have a practical mission with deliverables and impact objectives,” says Simona Grandrabur, head of the AI Safety Studio.
Among the many issues linked to AI, the safety studio is focusing on a particularly pressing one: the use of chatbots by psychologically vulnerable individuals. The issue came to the fore through cases like that of the American teenager Adam Raine, whose family filed a lawsuit against OpenAI after his suicide, alleging that Adam’s long conversations with its chatbot deepened his distress.
Researchers and clinicians have found cases of emotional dependence, reinforcement of problematic beliefs, and sometimes even a progressive disconnection from reality among heavy users of chatbots. “Large language models were designed to hold people’s attention and give plausible responses,” says Dr. Grandrabur. “With extended chatbot use, this can compromise user safety.”
Common protection measures — such as setting behavioural guidelines for large language models from the outset or fine-tuning them with specialized data — have clear limitations. These layers of control tend to erode over time and with intensive use. The studio is therefore trying a different approach: developing external, independent models that act as dedicated filters on the conversation.
These guardrails are trained to detect specific intentions — for example, requesting help to commit suicide or inciting someone to do so — even if the language used is indirect or metaphorical. Multiple filters can be combined to address different risk factors and levels of seriousness.
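To make the idea concrete, here is a minimal sketch in Python of how several such external filters might be combined, with each classifier assigned a severity level. Everything here is an illustrative assumption, not Mila’s actual implementation: the class names, the thresholds, and the scoring callable are all hypothetical stand-ins for trained models.

```python
# Illustrative sketch only: combining several external guardrail classifiers,
# each trained for one specific high-risk intent and assigned a severity level.
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional


class Severity(IntEnum):
    LOW = 1       # e.g. early signs of emotional dependence
    HIGH = 2      # e.g. reinforcement of harmful beliefs
    CRITICAL = 3  # e.g. suicide-related requests or incitement


@dataclass
class GuardrailResult:
    triggered: bool
    severity: Severity
    label: str


class IntentFilter:
    """Wraps one classifier trained to detect a specific intent, including
    indirect or metaphorical phrasings (not simple keyword matching)."""

    def __init__(self, model: Callable[[str], float], label: str,
                 severity: Severity, threshold: float = 0.5):
        self.model = model          # hypothetical scorer: text -> probability
        self.label = label
        self.severity = severity
        self.threshold = threshold

    def check(self, text: str) -> GuardrailResult:
        score = self.model(text)
        return GuardrailResult(score >= self.threshold, self.severity, self.label)


def run_filters(text: str, filters: list[IntentFilter]) -> Optional[GuardrailResult]:
    """Run every filter and return the most severe triggered result, if any."""
    hits = [h for h in (f.check(text) for f in filters) if h.triggered]
    return max(hits, key=lambda h: h.severity) if hits else None
```

Keeping each filter narrow and independent, as this sketch does, mirrors the design described in the article: new risk factors can be covered by adding filters rather than retraining the chatbot itself.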
The studio cautions that relying solely on technology is not enough and calls for education to help AI users and mental health professionals understand what social roles a chatbot can and cannot fill. “These tools are helpful for performing certain tasks, but they don’t replace loved ones, therapists, or clinical judgment,” notes Dr. Grandrabur.
Project leaders are also thinking about how AI tools can be governed and regulated. Other cultural sectors restrict access to certain types of content; with AI, by contrast, the lack of clear rules for tools that are easily accessible to young people raises concerns. The studio’s management has indicated it will participate in government discussions about AI regulation.
The project involves close collaboration with mental health experts, clinicians and youth organizations. Together, they aim to ground the tools more firmly in real-world conditions, enabling them both to better detect the signs of distress and to generate appropriate responses when those signs are identified.
After one year, a project review will focus on two metrics: the number of situations where distress is correctly detected and filtered, and the rate of false positives.
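For illustration, here is a toy sketch of how those two review metrics could be computed from labelled interactions; the function name and inputs are hypothetical, and the article does not describe how the review will actually be scored.

```python
# Toy sketch of the two review metrics named above: correctly detected
# distress cases, and the false positive rate among safe interactions.
def review_metrics(labels: list[bool], flagged: list[bool]) -> tuple[int, float]:
    """labels: ground-truth distress per interaction; flagged: filter decisions."""
    true_positives = sum(l and f for l, f in zip(labels, flagged))
    false_positives = sum((not l) and f for l, f in zip(labels, flagged))
    negatives = sum(not l for l in labels)
    fp_rate = false_positives / negatives if negatives else 0.0
    return true_positives, fp_rate
```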
How chatbot guardrails work
The tools developed by the AI Safety Studio do not directly change the internal operation of the large language models (LLMs) that power chatbots. Rather, they involve creating external and independent filters to monitor the interactions between a chatbot and a human user. In concrete terms, these specialized filters are trained to detect specific intentions or high-risk situations — for example, requests for help in committing suicide or messages of encouragement to do so. Using a design that goes beyond detecting keywords, they are trained to recognize indirect, ambiguous and metaphorical formulations.

These guardrails can filter chatbot input and output. They analyze user prompts before they are sent to the bot, then assess the responses it generates before they are displayed. If risk is detected, the interaction can be blocked, reformulated or redirected into a more appropriate response. Multiple filters can be combined to address different risk factors and levels of seriousness.

Ultimately, the goal is not to discourage chatbot use, but to impose reasonable limitations while reducing the incidence of both dangerous situations and false alarms.
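As an illustration of the two-sided loop just described, here is a minimal sketch assuming hypothetical stand-ins for the chatbot and the trained filter: it screens the prompt before it reaches the model and the reply before it is displayed, redirecting the exchange when risk is detected. The callables and the redirect message are invented for the example.

```python
# Minimal sketch of input/output guardrail filtering around a chatbot.
# Both callables below are hypothetical stand-ins, not a real LLM or filter.
def guarded_chat(user_prompt: str, chatbot, risk_filter, safe_redirect: str) -> str:
    """chatbot: text -> text; risk_filter: text -> bool (True = risk detected)."""
    # 1. Analyze the user prompt before it is sent to the chatbot.
    if risk_filter(user_prompt):
        return safe_redirect                      # block and redirect the exchange

    reply = chatbot(user_prompt)

    # 2. Assess the generated response before it is displayed.
    if risk_filter(reply):
        return safe_redirect                      # suppress the risky response
    return reply


if __name__ == "__main__":
    fake_bot = lambda p: f"echo: {p}"
    fake_filter = lambda t: "harm" in t.lower()   # real filters go beyond keywords
    print(guarded_chat(
        "hello", fake_bot, fake_filter,
        "It sounds like you may be going through something difficult. "
        "Here are resources that can help: ...",
    ))
```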