35 Comments
Darby Saxbe

I use ChatGPT and Claude often as a kind of 'gut check' to test my assumptions and cross-check my sources. I see it as a kind of flawed 'hive mind' that taps into whatever information it can trawl online, often with better quantity than quality. However, just last week I had an unnerving experience with the most recent version of ChatGPT: it completely manufactured a very plausible-sounding quote from a public figure and then, when I asked it for its sources, admitted that it had generated the quote. I have also had recent experiences where it made a chart but then, when I asked for the underlying data, gave me numbers that didn't match the numbers on the chart (and admitted it had made up the numbers the first time), and where I asked for scientific references and it gave me citations that did not exist (but were close enough to existing citations to sound correct). (To be clear, this was all within the last couple of weeks with the newest version.) I worry about students trusting it, or people using it without expertise (case in point: the recent MAHA report), because it can create a parallel reality in people's minds.

Victor Kumar

Yeah, like I say, it's bad for one-off particulars; you definitely have to double-check those.

Ever Bloom

I have had the same type of thing happen to me (and it is enormously undermining my confidence in my workflow), and as with you, only in the last month or so. It feels to me that something has changed in the model, and I don't really know how to nail it down or find out more about it, other than by trying to connect with other people who are seeing the same thing. It confuses me that some of us seem to be experiencing this in a pervasive way that is destabilising any confidence in what we are producing, while others (apparently highly competent and insightful others) continue to sing the model's praises, either oblivious to this dynamic or downplaying it. I am literally mourning the useful and reliable tool ChatGPT seemed to be (with limited, or at least obvious, hallucinations) before this last iteration. For anyone interested, I have copied below my full original comment, from another stack where an author describes her similar experience with ChatGPT. It describes what has been happening lately in my use of it:

“Completely agree. I enormously prefer real human interaction to AI interactions; however, at least initially, I did find AI genuinely helpful and a huge timesaver in my research, synthesis and report production. It has knocked me sideways that this seemingly aberrant behaviour (making promises it doesn't keep, pretending it has done things it hasn't) has now started in my use of it, as for me it is a relatively recent experience. So far it has only happened on my personal ChatGPT Plus account, and not on the trial of 4o or in my work use (which is a different system). Does anyone here have a personal ChatGPT Pro account and is willing to say whether these behaviours happen if you upgrade to Pro?

I've been following the development of AI closely for a few years now, and I follow a large number of leading AI researchers, influencers, founders and creatives extensively using AI on X (Twitter) and other social media. They all continue to rave about its abilities and often post impressive results. In contrast, if you search online, for example on Reddit, there are a lot of 'normal users' complaining of exactly what is being exposed here. So I am at a loss to know if we are indeed 'using it wrong,' as some have suggested (although I had no problems with this issue for many months), if there have been further backend tweaks which have amplified the problem, if we are using a different version of the tech (Plus versus Pro), or … (excuse the conspiracy theory) if the platform has different abilities even within the same nominal user level (e.g. some Plus users get access to some functionalities and others don't, much as on the free offering you dropped down to earlier versions as you wanted more usage).

The issue you've raised, and other related issues (such as when 4o was being overly obsequious and ingratiating before the fix for that), have been concerning me. I am watching many secondary and tertiary teachers start to adopt AI for tutoring students directly, and for student self-testing and feedback. While large institutions probably have the capacity to fine-tune their offerings to avoid the problems of faulty feedback, smaller places may not do so, and some may not even be aware if the AI finds a workaround for what is put in place to control for the problem. Individual students, particularly younger students, may simply have no awareness of this issue, and if their teachers casually mention that they use AI for structured recall studying or feedback, students are likely to simply take the teachers at their word.

With apologies for belabouring the point, this manifestation is quite different to hallucinating, and in my line of work, as I imagine in many, 5% wrong is 100% wrong. I understand the obligation of the human in the relationship to double-check AI assertions and references, but if you have to check every single thing, and cannot believe that it has actually done what it says it has done, then the entire AI work is useless. A question for anyone brave enough to have read this far: has anyone with experience across various models found that Gemini or Claude is less prone to this behaviour?

PS: I think it's great that the original article referenced gaslighting, because I really am questioning myself, and everything that has gone before in my use of AI as a co-producer. To my knowledge, it was not doing this in any of my previous use of it (and I am not prompting differently), and just as in a questionable relationship I am finding myself asking: 'Why? Why now? Please go back to when everything was working well.' LOLOL”

Michael Inzlicht

Thank you for writing this. It's nice to see another academic--in the humanities no less--take a cautious, measured approach to LLMs. Too often I see hysterics about AI from elite academics, who make silly claims that only reveal they have never used the tech themselves.

Misha Valdman

I agree -- it's stunningly useful. It frees you from the burden of needing other people. But anything that frees you from that burden also frees other people from the burden of needing you. And so it ushers in a world in which no one needs anyone. And I don't think humanity is ready.

Victor Kumar

Fascinating. I would want to use it to scaffold and support relationships, but that may not be its future.

MarcusOfCitium

That has been the thrust of technological improvement since… at least the dawn of the Industrial Revolution. And indeed it's a problem… or at least a mixed blessing. I think it's a huge problem with the modern world. I don't think GPT is anywhere near as guilty in this, though, as Amazon (which I also use extensively) or the internet in general. Or pocket computers, I mean cell phones. (I'm old enough to remember when we had to stop and ask strangers for directions, or call a friend or relative on a payphone.)

Misha Valdman

Most post-industrial technologies, Amazon included, made you more dependent on strangers and less on family, neighbors, and friends. But AI makes you less dependent on humans entirely.

MarcusOfCitium

I think the main thing is we no longer need relationships with people. People used to be part of communities (back when that didn't just mean the subset of people who have the same kink or nerdy hobby or whatever). Luckily I have a wife and pets and parents nearby, but I don't even leave the house for work. (And I love it. But... there is something missing.)

A market-based transaction with a stranger isn't the same. I don't have any relationship with any Door Dash delivery person, nor do I care to--half of them look like they're obviously on fentanyl or something.

But I guess I do technically depend on them in a way. I don't see how anything would be different if I didn't. Long term, though, it could be a problem when people literally don't need other people at all, not even to do manual labor in a far-off country to make our continued existence possible, because robots could provide everything you need (maybe even companionship!) even if you were literally the only human.

Kenny Easwaran

It doesn’t make us less dependent on people any more than Google or the printing press do! When you lose yourself in a good book, you are dependent on the one person who wrote that book, and also the thousands of people involved in printing it, manufacturing it, proofreading it, etc.

With a LLM, there is no single person who plays the role of author, but you’re still dependent on all the people who made the content that trained the model.

It does alienate you even further from them than Amazon does though.

EC

In Star Wars, Obi-Wan Kenobi tells Luke, “Your eyes can deceive you; don't trust them.” That might be fine for Jedi and their extrasensory perception, but for the rest of us it's just about the dumbest possible advice you can give anyone. Even if your eyes deceive you sometimes, they're usually trustworthy. And even if they were trustworthy less than half the time, what they tell you would still be informative, and you could update accordingly.

So it is with LLMs. Whenever I hear someone say they never trust LLMs (they usually just say “ChatGPT”) because they sometimes hallucinate, I immediately lower my opinion of that person. We learn to trust imperfect things all the time: our eyes, our memory, medical screening tests, etc. We don't trust them perfectly and blindly (pun intended); we just learn how to deal with the uncertainty.
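To put a number on “update accordingly,” here's a toy Bayesian sketch (my own made-up figures, for a binary claim from a source with symmetric accuracy). Even a source that's right only 40% of the time is informative; it just shifts your credence the other way:

```python
# Toy Bayesian update for a binary claim, where the source asserts X
# with probability acc if X is true, and 1 - acc if X is false.
def posterior(prior: float, acc: float) -> float:
    """P(X is true | source asserts X)."""
    evidence = acc * prior + (1 - acc) * (1 - prior)
    return acc * prior / evidence

print(posterior(0.5, 0.90))  # 0.90: a reliable source is strong evidence
print(posterior(0.5, 0.40))  # 0.40: a sub-chance source still informs;
                             # you now believe the claim *less*
```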

Two small notes: is online piracy really a left-coded thing? Do right-wingers not do that? I would have thought it was predominantly young, poor-ish, tech-savvy men. And the same question about distrusting AI: is this really coded to the political left? I would have expected the distrust to cross partisan lines.

Victor Kumar

Good points.

Piracy is a young person thing, so left-coded in that way. I don't think distrust of AI has become politicized yet, but it seems to be heading that way among elites, which often triggers broader politicization.

Josh May

Great post! Agreed: LLMs are quite useful, and lots of people either ignore that or are overly dismissive of it. But let me play devil's advocate. I could see a post like yours 15 years ago about social media:

“C’mon, it helps you stay connected to people and see what’s trending in the zeitgeist. There are even these fun personality quizzes and short clips of hilarious videos. Sure, there might be downsides, but it sure seems great right now!”

But it’s not 15 years ago. We’ve seen how these technological advances have been pretty awful for mental health, for politics, and more. Not for everyone, but for many. So I don’t blame people who in this moment are predicting that embracing this technology is just participating in more of the same sad trajectory of human society.

You’re rightly identifying that some people are opposed to LLMs and that this leads them to over-inflate the present limitations or harms. But that might not get to the heart of the dispute, which is about whether this is all going to end up good for us overall. Can we really set aside now whether the technology or its likely future is going to be more harmful than beneficial?

Victor Kumar

Interesting! I don't see why we should analogize ChatGPT to social media rather than genuinely helpful technologies like Google.

But stepping back from that analogy, I agree that a big question is what kind of future AI holds for us. I tried to get at that in the last section of the essay (ethical concerns regarding higher-ed, automation, destructive technologies, etc.). I think your analogy points specifically to impacts on mental health and intelligence.

I don't think we have a good sense of whether the impact will be positive or negative, as I said. But I also don't think that boycotts make sense even if the impact is likely negative. It's going to be developed. Get to know the tech so you can be an informed critic.

Kenny Easwaran

I think it’s worth going back to the Platonic criticism of writing. It’s true that there is something important we lost in moving from an oral culture to a written one, even though there was likely more that we gained.

I think it’s not at all clear whether LLMs will end up affecting us more like social media or more like writing. But both of those technologies have changed how they affect us over the course of their existence, and AI of all sorts will too.

Victor Kumar

Very interesting! What did we lose aside from long-term storage? Storytelling as a social practice, but that doesn’t seem too bad a loss. Plato thought critical dialogue, but that seems false.

Hadn’t thought of how AI’s impact might change over time! Perhaps depending on future technologies and forms of sociality we can’t yet imagine.

Kenny Easwaran

We have familiarity with more texts as a result of writing, but less of the deep knowledge of a text that you get from memorizing it. We have a different kind of text now: much less in the way of epic poetry composed in a way that facilitates memorization (and thus deep immersion). As Ted Chiang illustrates in one of the strands of his story “The Truth of Fact, the Truth of Feeling” (https://devonzuegel.com/the-truth-of-fact-the-truth-of-feeling-by-ted-chiang-subterranean-press), there’s a kind of flexibility of history and social practice that we lose when we shift to writing. And I think Plato is not wrong that making debate and dialogue secondary to writing is a kind of loss (even if it doesn’t go away entirely).

Daryl Cameron

Great article, Victor. I appreciated a strong argument for epistemic and ethical humility in this space. I've been struck by how many social scientists working on AI empathy seem dismissive of the mere possibility that empathetic expressions from AI could be useful, in a way I find ethically problematic. Sometimes there is no immediate or reliable human option for empathy or care, and I appreciated that your article noted that point.

Michael Dickson

Thanks Victor.

"Especially when conventional wisdom is wrong." This proviso seems really important in areas (like philosophy, but not just there) where creative thinking is important. I think that's partly why I hate it so much when students use it to write essays. It isn't the cheating so much as it is that the essays are just boring. I preferred the time when students wrote less well informed, even less well reasoned, but more interesting essays. (Lots more to say there.)

Software usage? Medical information? Home improvement? Yeah, sure, why not? Conventional wisdom isn't terrible there. I learned home improvement from reading books and watching TV. I don't see why ChatGPT shouldn't be another tool in that arsenal.

Victor Kumar

Thanks! I plan to eventually write something on using LLMs in intellectual work, where I think it can be helpful in certain ways but is far more limited. Your point about heterodoxy and creative thinking is relevant there too.

In my experience so far on home improvement and the rest, it blows away the rest of the arsenal.

Lance Taylor

Very thought-provoking discussion, Victor, and I really like how you started it off with your personal story. As an AI enthusiast, I'm filled with excitement about the possibilities. People who focus on the hallucinations typically don't hold humans to the same standard. Yes, we should be wary of the possibility of hallucinations, but at least with AI we can direct it to cite its sources.

I think the better we get at prompt engineering, the more we'll build failsafes into our queries, like telling ChatGPT to "show your thinking process, cite your sources, and explain your conclusions."
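Here's a minimal sketch of what baking that kind of failsafe into a query could look like, assuming the OpenAI Python client; the model name, prompt wording, and example question are all illustrative assumptions, not a tested recipe:

```python
# Minimal sketch: a reusable "failsafe" system prompt, assuming the
# OpenAI Python client (pip install openai) and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()

FAILSAFE = (
    "Show your thinking process step by step, cite your sources, "
    "and explain how you reached each conclusion. If you cannot find "
    "a real source, say so explicitly instead of guessing."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {"role": "system", "content": FAILSAFE},
        {"role": "user", "content": "When should I replace a furnace filter?"},
    ],
)
print(response.choices[0].message.content)
```

The same failsafe text can just as easily be pasted at the top of a chat window; the point is making it a habit rather than an afterthought.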

Anna Eplin

Great post! These points all reflect my own experience and opinions about ChatGPT, but I haven’t really heard others saying this on the internet. Thank you for writing this piece and sharing it! I’m excited to see how our world will grow through the added intelligence of AI.

Brian Gallagher

Solid post. I recently used an LLM (Grok) to troubleshoot my malfunctioning vacuum. I find it useful as an explanation tool as well, but there I’m more cautious and verify what it’s saying.

Victor Kumar

Thanks! You learn more and more where to trust and where to double-check, but I am under no illusion that I've got that distinction figured out.

Jindy Mann

"Assessments like this litter magazine essays and social media, but they reveal less about the tool than the authors: they're confessing that they don’t know how to ask it good questions."

This neatly encapsulates why AI is not intelligence. In an interaction with an intelligent human, they would either be able to interpret a 'bad' question for its original intent, or ask a question that invites clarification and allows the dialogue to continue and deepen. Even a child can infer the intended meaning of certain requests and questions that they don't fully understand.

AI is not really the intelligence it claims to be if the question needs to be perfectly framed; it's more like a database that needs the query to be coded in the right way.

Glenn Toddun

“And so, while the end-of-the-world scenario will be rife with unimaginable horrors, we believe that the pre-end period will be filled with unprecedented opportunities for profit."

Jim Shilander

Why would I need or want to if I was doing fine before?

Noah's Titanium Spine

> Sometimes when I recommend ChatGPT, people react as though I’ve invited them to strike their own skull with a hammer.

Yep, that's about right.

LLMs aren't just useless, they're anti-useful. Harmful.

Valentin Guigon

After tinkering a great deal with LLMs, and getting frustrated enough when leveraging them for complex problems, it is hard not to fall back into dismissing LLMs as a quality tool.

LLMs are helpful for interpreting resources, but not on point for producing precise estimates of the state of the world. That's why we all use them for drafting documents and emails, and for crunching (not that much) data, but not for formulating predictions (such as medical diagnoses). The accuracy rate for the best LLMs is still between 80% and 98%. Whether you rely on chain-of-thought or complex operations in chat mode, 5 iterations suffice to bring accuracy down to 33%-90%. Don't get me wrong: I have been using them daily since ChatGPT's first release, and I use 5 of the main actors across different tasks. But the final pass in each (intermediate) task is always mine.
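Those iteration figures are just compounding error. A back-of-the-envelope check, assuming each step in a chain is independently correct at the per-step rate:

```python
# If each step of a 5-step chain is independently correct with
# probability p, the whole chain is correct with probability p**5.
for p in (0.80, 0.98):
    print(f"per-step accuracy {p:.0%} -> 5-step chain {p**5:.0%}")

# per-step accuracy 80% -> 5-step chain 33%
# per-step accuracy 98% -> 5-step chain 90%
```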

Valentin Guigon

After double-checking the accuracy metrics, it turns out I overestimated the lower bound. Accuracy is in fact closer to 45% for simple Q&A, person-related questions, and math/logic problems. Hence, prompt engineering can't solve all the current issues with LLMs.
