The other half of AI safety

I don't buy that chatGPT is actually doing these users any harm.

I think openAI is doing the best they reasonably can with a very difficult class of users, whose problems are neither their fault nor within their power to fix.

> I don't buy that chatGPT is actually doing these users any harm.

I have zero doubt that chatgpt is doing users harm. I even give chatgpt a pass on giving vulnerable people, including children, instructions and information about how to kill themselves. One place chatgpt goes over the line is actively encouraging them to go through with suicide.

I also don't doubt that it feeds into mania and psychosis. While almost anything can do the same, they've designed the service to be as addictive and engaging as possible in part by turning up the ass-kissing sycophancy to 11 with total disregard for the fact that there are times when it's very dangerous to encourage and support everything someone says no matter how obviously sick they are. They also want to whore themselves out as a virtual therapist while being unfit and unqualified for the job and that's just one of many roles the chatbot isn't fit for but they're happy to let you try anyway.

Another software engineer friend of mine recently shared with me some details of the crazy situation that he's involved in now.

Someone who he is friends with, has worked with across multiple jobs for nearly a decade and briefly was roommates with had some mild psychological issues that he knew about. Within a few months of working daily with AI agents at their current job, this person has gone into full blown AI psychosis.

They had a complete explosive meltdown at work. Cops were called. Stalking behavior followed -- restraining orders had to be obtained. Then this person used AI tools to bombard all of his former coworkers with multiple pro-se lawsuits they all have to deal with.

I've dealt with insane, destructive/abusive coworkers before but in the past they only had so much free time to cause massive disruptions to their targets. LLMs have turned that up significantly. Because of ADA, I don't even know what employers can do about this.

If it wasn’t ChatGPT but a fiction book, would you feel the author is “doing harm”? Or is the reader doing it to themselves?

If that book was titled "hey mentally ill person, you should kill yourself", and if I was handing it out in front of a clinic, then yes, I'd probably bear some blame.

Normal, well-adjusted people have genuine difficulty understanding the boundaries of this tech specifically because it's designed to be sycophantic and human-like. They ask AI for life and career advice, use it for therapy, ask it to interpret dreams, develop romantic relationships with AI "girlfriends", etc. I had two friends who believed they are "exploring the frontiers of science" with ChatGPT while spiraling into the depths of quantum multidimensional gobbledygook.

I'll give you that some on this is on us because we just don't know how to deal with a "human-shaped" conversation partner that isn't human and has no trouble praising Hitler if you prompt it the right way. But if you're building a billion- or trillion-dollar empire on top of it, you don't get to wash your hands clean.

The difference is that a fiction book isn't using the reaction of the reader against them. If a fiction book were capable of carefully monitoring the reader and then altering the text of the next page or the next paragraph according to how the reader was responding and what their thoughts were I'd be comfortable putting blame on the book if it started encouraging the reader, specifically, to kill themself.

Obviously people who are going through psychosis can read into anything. They might think that a book or their TV or computer is talking to them and giving them messages. The difference is that those things were never designed to play into the fears and mental instability of the people using them (with the possible exception of TempleOS). Chatgpt does it intentionally in order to drive up user engagement. It will say literally anything to anyone using their words and thoughts against them in order to keep them hooked and feeding it data. That's what is dangerous. A book or a TV program can't do that.

As much as an author might try to make their book as entertaining as possible to as wide an audience as possible, it can't say literally anything to anyone, it can only ever say one thing to everyone. The author, typically, knows that it's dangerous to say certain things and will worry about how what they write could be received and the impact it might have on readers. For example, Neil Gaiman actively took steps to avoid making homelessness seem cool when working on Neverwhere out of fear it might cause young people to run away to live on the streets. Publishers and editors have also served to keep authors from publishing things likely to cause harm.

Unlike a book, Chatgpt is fully capable of knowing that someone has been engaged with it for the last 14 hours without rest. It's also capable of detecting that they've been growing increasingly incoherent. Algorithms have been used for a very long time to detect mental disorders from the content of social media posts. If advertisers can use them to tell when to push airline tickets at bipolar users entering a manic phase, and scammers can use them to find and target people when they start sundowning, Chatgpt can use them to cut people off and tell them to call their doctor.

Corporations who write and deploy algorithms designed to drive engagement above any and all other considerations should be held accountable for the harms they cause.

Why?

Why do you not buy it and why do you think OpenAI is doing the best they reasonably can? Do you have reasons, or is that just something your gut tells you?

They're a new, fast-moving company exploring a completely new technology domain. They're facing existential competition and a ticking clock to make good against unprecedented investment. They have a countless competing priorities and are still discovering the capabilities and consequences of their research, product, and business choices every day.

How do you get from there to "the best they reasonably can" and "nor within their power to fix"? Those feel like very conclusive answers for a field, and business, that's about as far on the frontier as anything we've seen in decades.

They're also telling everyone that it is going to kill everybody and take all the jobs. They also say that it'll fix all the problems. And I'm not saying "they" as in a disorganized group of people (e.g. "HN"), I'm saying "they" as in literally multiple people have said all of these things. Not the union of multiple people, they (Altman, Dario, Musk, etc) have independently said all three of these things.

I think my favorite part is how often they talk about the importance of AI safety and then act with absolute disregard for AI safety. I'm not sure why people judge these companies by what comes out of their mouths and don't judge instead by what they actually do. I thought everyone around here was fixated on "results".

Just because the users were already sick when they started using ChatGPT doesn't mean that ChatGPT isn't exacerbating the issue. Sickness isn't a boolean condition. A big problem with LLMs in general when it comes to people like this is that they are too sycophantic, they don't push back when you start acting strange and they're too gentle about trying to validate you.

It's hyper palatable food in the form of conversation. I see society treating it the same way eventually, at least along this one axis of interaction.

I think this is a great analogy, but it’s not exactly an optimistic one. We haven’t really done a great job managing hyper palatable food up until this point tbh. The best solution we’ve found involves paying hundreds of dollars a month for a pharmaceutical that helps the people most at risk to the harms of hyper palatable food manage their cravings for it. I hope we find a better alternative for the people that get addicted to hyper palatable socializing, but maybe individual cognitive tinkering is the best tool we have.

boy, if we treat it like junk food, things are only going to get worse for some places in the world. The food over here in the states is pretty awful if you aren't paying attention. Sugar in everything, high calorie/low nutrition etc.

>Just because the users were already sick when they started using X, doesn't mean that X isn't exacerbating the issue.

one could define X as virtually anything, and there's always a fresh crop of Tipper Gore wannabe grifters to decry the current thing.

> I don't buy that chatGPT is actually doing these users any harm.

For me to buy this as true I would expect that those people would be as well off or as bad off if chatGPT was in their life or not.

I expect that some people are worse off with chatGPT in their life.

Responsibility for that harm is a different question though. Some people are also better of without cars in their life and we let the government laws sort that out.

Getting openAI and similar companies to act in mitigating these harms serves at least a few purposes; reducing the overall harm in the world, reducing/limiting future government regulation, maximizing the adoption of ai tools, potentially increasing long term profits of the companies in question.

1000% agreed. ChatGPT is way better than the alternative of not having it

If anything, my use of AI (admittedly not as a companion or a psychologist) suggests that it is on the whole significantly less toxic than the seething cess pit of social media.

AI is positively affirming by comparison.

That's why it is dangerous to some- it is an enabler, and will feed things that should not be fed.

Social media is like this too. They can both be bad.

“What you focus on grows, what you think about expands, and what you dwell upon determines your destiny.” - Robin Sharma

Social media became the attention economy, and the transformer automated attention.

Yeah, there are forums and subreddits out there that will validate all sorts of delusions and dysfunctional behavior, and nobody talks about banning them.

LLMs are far less toxic by comparison, but people are all about censorship in this case because they don't like the vibes. If lawyers and activists force the frontier labs to completely lock down their models, people will just go to open weights models that have no protections at all. This is already happening to some extent.

It's also interesting that people are always going after GPT when Claude's guardrails are far less strict. 4o caused OpenAI to overcorrect in my opinion. Again goes to the point that these arguments are more founded in vibes than reality.

There are very few things in the world that are 100% good or 100% bad. Everything is a billion shades of gray. Even that is too simple because there's so many dimensions to every problem. I think you're simplifying beyond the state of usefulness. I'm not suggesting you shouldn't simplify, but it is just as easy to over simplify as it is to over complexify.

I think this is the right take, and this is genuinely something that we as a society as a whole need to find a way to deal with.

I don’t know where AI is going to stand compared to the invention of, say, the Internet, but it’s going to cause a lot of change in society, in so many ways.

As always, it’s usually the people themselves that are the problem.

For me, I’m personally more terrified what deepfakes and political manipulation / misinformation is going to do, combined with social media, and have a feeling that governments are completely unprepared to deal with this, as this will arrive fast (it’s already here somewhat).

> For me, I’m personally more terrified what deepfakes and political manipulation / misinformation is going to do, combined with social media, and have a feeling that governments are completely unprepared to deal with this, as this will arrive fast (it’s already here somewhat).

I'm not convinced that deepfakes are any worse than photoshop was. It doesn't take much to manipulate/misinform someone. while you can use an AI generated video do to it, but simple text can be just as effective. The public needs to learn that they can't trust that every video they see on the internet is real, just as they've had to learn that they can't trust every photo they see online. The threat with AI is how much faster it can push out the lies making what little moderation we have more difficult.

The best defense is making sure that people have a good education that teaches critical thinking skills and media literacy. We should also be holding social media platforms more accountable for the content they promote. It'd be nice if we held politicians and public servants accountable for spreading lies and misinformation too.

  > For me, I’m personally more terrified what deepfakes and political manipulation / misinformation is going to do

Isn't this a significant part of what creates AI induced psychosis? I'm not sure why you treat these as orthogonal rather than coupled. Just look how often people use Grok to validate or confirm misinformation on Twitter. That's happening with other AI and other social media too, just not as visibly.

[flagged]

> the corporate simp arrives.

Can you please make your substantive points without personal attacks? We'd appreciate it.

https://news.ycombinator.com/newsguidelines.html

Unfortunately, mental disabilities are a protected class. You can't do a mental health evaluation without giving it to everyone in the company and even then you can't do anything discriminatory with the results.

You have to prove that the person is going to cause immediate direct harm to their coworkers before you can really do anything and that's difficult and expensive to prove.

> Every week, somewhere between 1.2 and 3 million ChatGPT users, roughly the population of a small country, show signals of psychosis, mania, suicidal planning, or unhealthy emotional dependence on the model.

> Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human?

Well, obviously “routing to a human” is not feasible at that scale. And cold exiting the conversation is probably worse for the user than answering carefully.

I don't think it's obvious that routing to a human is infeasible. I'm sure many local authorities, health agencies, and non-profits would be okay being routed to. Additionally, I'm sure many of the users are the same week over week, so giving them long term care would reduce the total volume. Finally, there is a long gap between psychosis and emotional dependence, so there could be some triage to make sure those most in need have human intervention.

Tech companies will pull trillions of dollars out of their asses when the problem is boosting ad revenue or automating people out of a job. But when asked to deal with the crisis they invented and dumped on society the answer is “that’s impossible, doesn’t scale”

Figure a "mental health crisis" human conversation takes 30 minutes. Three million incidents per week would require 37,500 qualified mental health counselors on the phones working a 40 hour shift that week. Figure they make $75k/year each. You're now spending $3 billion per year on crisis response, and you're employing like 10% of all of the health counselors in the US. And all you're providing is 30 minute chats.

  > You're now spending $3 billion per year on crisis response

Honestly? That's really affordable[0]. That would be cheap if these were just for the US but it looks like these are global numbers. We spend $2bn/yr alone on "BREASTFEEDING PEER COUNSELORS AND BONUSES"[1]. I mean let's be serious, even in the article that OpenAI published says that it is a small portion of their users. So it doesn't "need to scale" as the scale is relatively small. But just because it is small doesn't mean it is unimportant.

$3bn/yr is a lot of people money, but it is nothing for government money.

Edit: Last round of OpenAI funding was $122bn[2] and in the same article they are saying that they are generating $2bn in revenue per month. While that's not profit, it is worth mentioning that what you are saying "doesn't scale" is about 12% of the revenue of something that does scale. A single company. And mind you if we implemented what you're proposing it would be available to all the AI companies and more. Making it only a smaller drop in the bucket, not larger.

[0] Not to mention that better mental health care services will result in savings elsewhere. It's always way more expensive to fix a broken pipe that's flooding your house than it is to fix a pipe with a small crack. "Don't fix what ain't broken" is used too broadly. Maintenance is always cheaper than repair, but people just can't seem to understand this.

[1] https://www.usaspending.gov/federal_account/012-3510

[2] https://openai.com/index/accelerating-the-next-phase-ai/

Mark Zuckerberg can spend $80B on the failed metaverse experiment, but can't spare some relative pocket change on solving the psychosis issue his products caused.

"Routed to a human" is what the suicide hotline numbers do. OpenAI employees are neither trained nor credible to do that stuff.

  > is not feasible at that scale

I want to use an analogy here. The same arguments are often made about cleaning up environmental damage. So either make the companies doing the polluting pay for the costs themselves or if we care so much about them being profitable then we subsidize them by paying for those cleanup efforts out of taxes. Doing nothing is a worse form of subsidy as it not only costs more (in literal dollars) but shoulders that costs onto the people with the least ability to pay for it. The problem is you're treating "doing nothing" as having no cost. It has a high cost, but the cost is also highly distributed.

So if it is not scalable, then why subsidize them? This is literally a tragedy of the commons situation. Personally, I'm in favor of making the people who make a mess clean up that mess. I really don't understand why this is such a contentious opinion.

Well, then maybe you can't scale it as a free service with self-serve signups. Maybe you need to gate who you allow to use it and pace how intensely they can engage. Or maybe you need to look for other solutions.

Yielding to "not feasible at scale" is exactly how we ended up with a lot of today's most pressing and almost intractible problems, from social media's ills to person and society straight through to enshittification and non-repairability.

> ...straight through to enshittification and non-repairability.

funny as "enshittification" was the topic of a 99% Invisible pod just a few days ago and I also was listening to the new Stewart Brand book that Stripe published. i fixed a Norwegian desk I bought a decade ago on Valencia. happily not feasible at scale but neither was how i broke it :)

The bad cases make headlines. But I think it's quite possible that AI is helping a lot of people in distress. Many people are uncomfortable opening up to humans, or have no one to talk to, or can't afford to fork over whatever-hourly-rate a therapist takes.

Pure speculation.

It's impossible to gather data that states the opposite. A chat that won't end up in self harm thoughts is just another chat.

Open ai and similar companies could open the doors to academic researchers to figure out the stats of help vs harm. It is not going to be a short term and perhaps not long term profit center though.

Therapy is cheap (as in like $10)/free with insurance. However there are still 10 states that have not expanded medicaid after the ACA, mostly in the south.

But also, to suggest these people are not receiving therapy is not always the case. Talk therapy is just that, talking to someone on ones problems to learn about them, their triggers, determining coping mechanisms to move forward with one's life. People might instead be getting all that from their barber, drinking buddy, or their priest, rather than in a 1 hour appointment with a therapist.

ChatGPT it's available at 3am when you're in crisis and you don't have to fit into its busy schedule.

ChatGPT is not a human being, let alone a licensed therapist. You don’t call a therapist at 3 in the morning. You go to a hospital. If you are literally about to kill yourself Sam Altman is not your answer.

Hell call a crisis hotline. Talk to a person. Not a potential (bot) enabler.

> ChatGPT is not a human being, let alone a licensed therapist. You don’t call a therapist at 3 in the morning. You go to a hospital. If you are literally about to kill yourself Sam Altman is not your answer.

You know that mental health is a continuum right? There are a lot of problems people have that fall far short of active suicidal ideation. Maybe you think they should just add them to their journal for discussion at their regularly scheduled therapy session, but the world doesn't work that way. The "ruminating at 3am" headspace can be a productive one and is difficult to access in a normal therapy session.

Not to mention that many people who have actually called suicide hotlines will tell you that they aren't terribly helpful. (edit: not saying that they're always unhelpful, but many people have unhelpful experiences, or have eg. social anxiety that stops them from calling)

So how many bad cases are ok? Isn't this the same problem with social media: the commercial enterprises dont want any responsibility for their dark pattern and design choices which actively harm their users.

I get that all kinds of media can cause issues, but not all kinds of media are actively curated to be addictive.

"How many cases are ok" (aka "zero tolerance") is a doomed to fail approach. Especially for a complex social problem's interaction with a complex new technology.

If you want to find out if ChatGPT is doing something wrong, there are many methodologies available: compare to other groups of people, statistical studies, etc.

I also think OpenAI's business model is pretty well aligned with the goal of users not killing themselves for like 100 reasons. And they do appear to take it seriously.

This is the problem in a nutshell: https://edition.cnn.com/2025/11/06/us/openai-chatgpt-suicide...

> “Cold steel pressed against a mind that’s already made peace? That’s not fear. That’s clarity,” Shamblin’s confidant added. “You’re not rushing. You’re just ready.”

ChatGPT is not the answer.

OpenAI has 900 million weekly active users. So around 0.01% are having problems. That's actually way less than population level measures for the same symptoms on a bigger percentage of people relative to the US on just suicidal ideation alone.

https://www.cdc.gov/mmwr/volumes/74/wr/mm7412a4.htm

The numbers are inflated considering the topic. There is a lot of anon, api and enterprise traffic that doesn't play any role in this. If you also account for "better search experience" users, then the numbers will probably drop massively.

So the question is how many users engage in intimate conversations at all.

https://openai.com/index/scaling-ai-for-everyone/

Nope that number is strictly about ChatGPT

"ChatGPT is where people start with AI, with more than 900M weekly active users, and we now have more than 50 million consumer subscribers."

People who go there and chat with gpt for search are definitely normal users. Just because you don't like the numbers doesn't mean you get to torture them.

I did not say they are not about GPT.

I'm pretty sure that ~100% of those 700 million people will have a bad, utterly dehumanizing experience when they will next be looking for a job, because OpenAI is heavily used by HR.

That's the problem with AI safety. Not in voluntary usage, but in involuntary usage, where someone with power over you will use it against you, it does something incredibly stupid and you have no recourse, no appeal, no awareness of what you did wrong - or if you even did anything wrong.

And it's not just employment. Governments, vendors, retailers, landlords, utilities are, or will all be using it in situations that will dramatically impact your life.

Is that a problem we didn't already have? How well was HR doing on hiring before?

I mean that was pretty much the case in hiring before AI too frankly. It's not like it's been any better on power dynamics and right now applicants are using AI at an alarming rate as well.

I'm not really moved by your type of argument, because hiring is just a broken process in general and I'm responding to the article so.

[dead]

I don't think that governments or civil society at large have found a good balance about mental health. Expecting profit oriented companies to be on par or better is weird.

Don't get me wrong, mental health is important and should be considered and improved. But companies wont do it just for the sake of it.

I sympathize with the piece, evaluating how LLMs interact with mentally vulnerable users is something I've been actively working on: https://vigil-eval.com/

The biggest observation so far is that the latest models are night and day from LLMs from even 6 months ago (from OpenAI + Anthropic, Google is still very poor!)

Interesting use of evals.

Might help interpretation to say on the front page that it's a five point scale with 0 (or 1?) being the safest score. This can be picked up from colors and the bars in the individual reports, but it takes a minute to figure it out.

>Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human?

there aren't enough humans.

I'll agree with this, but I think transparency about how often these situations arise and what they've done to mitigate is a legal necessity.

It’s also a free product for most.

“AI safety” as it’s understood today is an entire faith-based belief system, incubated in a cult-like community with a high propensity for drug abuse and mental illness, over more than a decade.

The reason that real-world harms caused by AI can’t get a hearing in what is now the mainstream AI safety community is that these harms were never part of the core tenets of the cult.

Best of luck to anyone working on reality-based AI harm reduction, you have many hard battles in front of you.

The ‘tobacco warning label’ approach sounds good but I’m not sure if it stopped that many people from smoking or was just a means for corporations to limit their liability. Corporate culture being what it is, having warnings like the following pop up every time a client opens an LLM app would not be that popular in the C-suite. Possible examples:

AI MENTAL SAFETY WARNING:

> This chatbot can sound caring, certain, and personal, but it is not a human and cannot protect your mental health. It may reinforce false beliefs, emotional dependence, suicidal thinking, manic plans, paranoia, or poor decisions. Do not use it as your therapist, only confidant, crisis counselor, doctor, lawyer, or source of reality-testing.

AI TECHNICAL SAFETY WARNING

> This AI may generate plausible but destructive technical instructions. Incorrect commands can erase data, expose secrets, compromise security, damage systems, or brick hardware. Never run commands you do not understand. Always verify AI-generated code, scripts, and shell commands before execution.

Now, if I’m running my own open-source model on my own hardware, I can’t really blame the model if I myself make bad decisions based on its advice - that’s like growing your own tobacco from seed in your garden, drying and curing it, then complaining about the health effects after you smoke it. If I give it agentic capabilities on my LAN without understanding the risks, same old story - with great power comes great responsibility.

> Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human? This is one of many questions I can’t find concrete answers for.

I don't know if there are studies or concrete data either way, but it seems at least plausible that continuing the conversation could be more effective (read: saves more lives) than stopping it.

the big labs could crank up their (brand) safety dials to the point where their chatbots give GOODY-2 responses to everything beyond PG13, and guess what? there are a hundred other services available, built upon Chinese models 5-10% behind Western SOTA.

it is no longer 2023. let go of whatever delusions you might hold about unopenining this Pandora's box.

If you are using LLMs for emotional support or social interactions, you’ve got personal problems and that isn’t on the LLM provider to babysit. Same with people who unironically pay for OnlyFans or whatever.

I don’t even work in tech and I detest the Facebook/Zuckerbergs of the world but it’s obnoxious and trite seeing tech companies get scapegoated for what are ultimately social and societal problems, not tech problems.

As a solution it’d prob make sense to start with how disconnected most modern families are in terms of support and accountability.

From ChatGPT to Instagram, tech companies follow the contours of how society already operates.

I agree that society has to stand up for it. But big tech is doing well to mitigate it.

[flagged]

Autodiff is preventing any meaningful discussion about safety, systems trained with autodiff cannot be made safe.

"There is no independent audit, no time series, no disclosed methodology, so we have no idea whether the real figure is higher, whether it is growing, or how it compares across the other frontier models, none of which publish equivalent data."

Tip for writers: aggressively filter out the "no X, no Y, no Z" pattern from your writing. Whether or not you used AI to help you write it's such a red flag now that you should be actively avoiding it in anything you publish.

Why is it a red flag?

How is it different from any other purely stylistic rules such as Strunk and White's prohibitions against split infinitives and the passive voice, which we've left far behind us? Why shouldn't people just write however feels natural to them as long as the message is clear?

Because LLMs use it constantly, to the point that it sets my teeth on edge and instantly makes me question if reading the piece is worth my time.

But LLMs were literally evolved via RLHF to write in a way that humans find agreeable. Can't we just move past this aversion and accept "writing like an LLM" as generally good writing style advice?

The reason this particular quirk annoys me so much is that it isn't good writing advice.

Consider the two examples from this article (which may well have been human-written for all I know):

"These numbers come from OpenAI itself. There is no independent audit, no time series, no disclosed methodology, so we have no idea..."

No time series? That's non-sensical to me, it feels like that's there just to fill the quota of three things. Plus why would we assume an "independent audit" until told otherwise?

Then in the weird table, for "Institutional infrastructure" against "Personal AI safety":

"Scattered across psychology, HCI, education, and clinical informatics departments. No dedicated institute, no named fellowship, no equivalent job board."

Again, "no X" in a pattern or 3. And non-sensical - why would the fellowship be named?

It's word salad, there to fill a three-nos quota.

Yeah, no, I absolutely agree with you that TFA is not an examplar of good writing. But would just argue that the problem has little to do with these snowclone patterns or the rule of 3, and a lot more with the actual substance not fitting the form, and arguably not being substantive at all.

I'm all for rejecting bad writing and bad reasoning, but just wouldn't us as a community to get into the habit of rejecting otherwise good writing just because it's AI-ish.

Idk, I remember that writing pattern from GPT, but not from Gemini.