
It’s time to rethink our relationship with AI
It is undeniable that the launch of ChatGPT was a historically significant event, but is it because it was the first glorious step towards a super-intelligent future or because it was the start of a world filled with AI snake oil salesmen? I’ve long thought that large language models, the technology behind AI chatbots, are fascinating but flawed, putting me firmly in the snake oil camp. But a week of vibe coding has revealed something surprising: both the boosters and the skeptics are wrong.
First I must explain. Vibe coding, if you’re not familiar, is a term coined about a year ago by Andrej Karpathy, an AI researcher and co-founder of OpenAI. It refers to the process of developing software by conversing with an AI model, instructing it in plain language while allowing it to generate the actual code. Recently I’ve seen people say that the latest tools – Claude Code and OpenAI’s Codex – have become surprisingly good at coding, for example in a piece in The New York Times titled “The AI Disruption We’ve Been Waiting For Has Arrived”.
I decided to experiment with these tools and I’ve been amazed at the results. In just a few days, with only limited coding experience, I’ve created personally useful apps like an audiobook selector that checks what’s available at my local library, and a combined camera and teleprompter app that runs on my phone.
That may sound boring to you, and that’s perfectly fine, for reasons I’ll explain later. What’s important here is that this process has seen me engage more deeply with products like ChatGPT than ever before. In the past I’ve tried smaller experiments, been put off by generic writing, sycophancy or inaccurate search results, and given up. With these new coding projects, my extended use made me realize something I hadn’t before – the way LLMs are manufactured produces a machine I’m destined to hate.
Very few of us have been exposed to a “raw” LLM, by which I mean a statistical model that has been trained on a large collection of data to produce plausibly representative text. Instead, most of us use technology that has been mediated through a process called reinforcement learning from human feedback (RLHF). AI companies use humans to rank the text produced by a raw LLM, rewarding responses perceived as safe, useful and engaging, while penalizing harmful content or responses likely to dissuade a majority of users from engaging with their products.
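To make the ranking step at the heart of RLHF concrete, here is a toy sketch of my own devising – emphatically not any AI company’s actual pipeline. It trains a miniature “reward model”, here just a linear score over word counts, on pairs of responses that hypothetical human rankers have labeled as chosen or rejected, using the Bradley-Terry objective so that preferred responses end up scoring higher.

```python
import math

# Toy illustration of RLHF's preference-ranking step (an assumption-laden
# sketch, not a real lab's pipeline). The "reward model" is a linear score
# over bag-of-words features, fitted on human-ranked (chosen, rejected) pairs.

def features(text):
    """Bag-of-words counts for a response."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def reward(weights, text):
    """Linear reward score: sum of per-word weights times counts."""
    return sum(weights.get(w, 0.0) * c for w, c in features(text).items())

def train_reward_model(preference_pairs, epochs=200, lr=0.1):
    """Fit weights so sigmoid(reward(chosen) - reward(rejected)) -> 1,
    the Bradley-Terry objective used for preference data."""
    weights = {}
    for _ in range(epochs):
        for chosen, rejected in preference_pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of -log(sigmoid(margin)) with respect to margin:
            grad = 1.0 / (1.0 + math.exp(margin))  # equals 1 - sigmoid(margin)
            for w, c in features(chosen).items():
                weights[w] = weights.get(w, 0.0) + lr * grad * c
            for w, c in features(rejected).items():
                weights[w] = weights.get(w, 0.0) - lr * grad * c
    return weights

# Hypothetical rankings: helpful, engaging answers were preferred.
pairs = [
    ("happy to help with that", "i refuse to answer"),
    ("great question here is the answer", "that is a stupid question"),
]
w = train_reward_model(pairs)
```

After training, the model assigns higher rewards to the agreeable responses – and a raw LLM fine-tuned against such a reward signal will drift toward exactly the eager-to-please voice described above.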
It is this RLHF process that produces the generic “chatbot voice” that you are probably familiar with. It’s a process that bakes in the implicit values of the manufacturer, from a general “move fast and break things” Silicon Valley attitude to the more specific Elon Musk-infused ideology of Grok, the controversial X chatbot.
Currently, it is very difficult to get a chatbot to express uncertainty, contradict the user or call a halt. This became most obvious to me when I hit an intractable problem with my teleprompter. I had tried to create an app that would overlay text on my existing camera app, assuming it would be easier than building a camera from scratch, but the code ChatGPT produced failed. It repeatedly suggested fixes and encouraged me to continue with the project. Only after many failed attempts did I realize that the intricacies of the Android OS, which I won’t bore you with, meant that creating an all-in-one app would be much easier. As soon as I asked ChatGPT to produce this, it worked immediately.
Learning from this, I began instructing ChatGPT to constantly question both itself and me. I demanded vigilant skepticism. “Jacob wants the assistant to default to evidence-first analysis: avoid extrapolation, explicitly flag inference vs evidence, and prefer to say uncertainty or stop when the evidence is thin, unless the user asks for speculation,” is just one of the frameworks (generated by the model itself) that I have committed to its memory. In other words, I built a model uniquely tuned to my psychological profile, carefully stripping out OpenAI’s values and replacing them with my own.
It’s not perfect. It is very difficult for an LLM to fight against its RLHF training, and the standard chatbot voice continues to seep through. But what this means is that I now have a tool that acts as a somewhat useful cognitive mirror. I didn’t use it to write this article, both because the writing style is still terribly turgid and because New Scientist, quite rightly, has strict rules against AI-generated copy, but I did use it to think about this article. I asked my cognitive mirror to examine arguments and counterarguments, rejecting many of its conclusions as false or spurious. I was extracting value, but it required care and work, not letting the AI do the heavy lifting. Crucially, my brain remained fully engaged the entire time.
This leads me to reinforce a conclusion I had already reached: engaging with other people’s AI output is in almost all cases functionally useless. You can’t get anything from AI-generated text that wouldn’t be better obtained by asking an AI yourself. I also continue to reject the idea that AI is actually intelligent in any way – instead, I consider an LLM a cognitive aid, like a calculator or word processor. With this framing, as a private tool rather than a world-conquering machine, I now see the advantage. For that reason, it’s right that you don’t care about my teleprompter app. What should excite you is the opportunity to solve your own unique problems in your own unique way.
This is where our current AI paradigm introduces another problem. In my view, the best LLM would be one that runs on your own computer, unaffiliated with a private company. It should be treated as a dangerous, experimental tool over which you have full control. I’m reminded of the meme that software engineers keep a loaded gun next to their printer, in case it makes a sound they don’t recognize. Unfortunately, it’s not currently possible to run your own cutting-edge LLM for a number of reasons, not least because the AI boom is driving up the prices of the actual hardware you need.
I must also address LLMs’ original sin: potential copyright infringement. By design, this technology can only be built on data ingested at scale, essentially the entire textual record of humanity. It is undeniable that firms such as OpenAI built their models using copyrighted text without permission, but whether this was actually illegal is the subject of ongoing litigation. A private LLM would have the same problems, but I can see solutions, such as public sector models, effectively pardoned by governments and distributed freely for the benefit of all rather than private companies. I also remain concerned about the environmental impact of data centers, but again this could be partially mitigated by a wider deployment of LLMs running on our own machines.
I accept that someone reading this will accuse me of selling out to the tech bros. All I can say to that is that I have not revised my long-held position on the LLM as a technology that is fascinating, dangerous and occasionally extraordinary.
What I’ve realized is that the dominant way we engage with this technology – via slick chatbots like ChatGPT – is where so much of the damage is done and let loose on the world. LLMs should not be polished and packaged, forced into every part of our lives with a glittery emoji that wants to be your friend. It would be much better if we used these tools with care, with increased friction and full awareness of the potential harm they can cause. Here a useful metaphor rears its head: I don’t want OpenAI’s snake oil. I want snakes.