The value-destroying potential of AI
A lot of the people trying to deploy AI as a business solution are doing it because they don't know how to measure what's valuable about their business
When my company was trying to sell our solution for theory of mind to autonomous vehicle companies, the biggest problem we had was convincing them that what we were offering worked well enough to deploy in vehicles. In part, this was because we were trying to sell them something—a software module to judge the state of mind of pedestrians—that they hadn't ever considered needing before. If we had been selling them a module to detect pedestrians, we would have had a much simpler job of it. When they asked how well our system worked, we would say, well, it detects 98% of pedestrians; they'd compare that to other solutions, and off we would go. But when we said "well, it gets 98% of the way to accurately representing the diversity of human answers to the question of whether that person wants to cross the street", or whatever, they would give us blank looks. It wasn't a meaningful metric for them. The way that we measured our value wasn't the way that they measured our value, because they didn't have a way to measure our value.
The real problem was deeper than that. The reason we were trying to sell them our solution was not that it was ineluctably obvious that an autonomous vehicle should be able to make judgments about what a pedestrian is thinking and planning. The value proposition we offered was that if they installed our solution, their autonomous vehicles would be able to drive better. An autonomous car with our software installed would behave better around people, and would not make the kind of mistakes that an autonomous vehicle without our software would make. The real value of our software was that it enabled better driving. That, proving our worth in terms of how well the vehicle was able to drive, is where we hit the real problem.
That problem is that delivering a measurable improvement in driving quality depends on defining what "good driving" is in the first place. When we started going out to potential customers, we assumed that they would have rich, sophisticated metrics for what good driving looks like. That they would know far more than we did about what their vehicles should do in a given road situation. That was not the case, at all. The internal metrics our potential customers used to determine if their vehicles were doing the right thing started and ended at "obey the law", "don't hit anybody" and "don't scare the safety driver into taking over". They were in many cases aware that these metrics were insufficient—and they are vastly insufficient—but they simply didn't know how to come up with anything better.
I was reminded of this conundrum when talking to people about the prospect of replacing human writing online with ChatGPT. Technically unsophisticated executives across a broad swath of industries, from web publishing to filmmaking, along with a Greek chorus of technophilic boosters online, are positively giddy about the possibility of reducing or eliminating expensive human creative labor with ChatGPT and LLMs, as well as generative art models. There are lots of extremely sound labor and societal arguments for why this is a poor idea, but my immediate reaction was to wonder if a lack of metrics had led them to forget what writing is, fundamentally, for.
Broadly, writing, and language more generally, is the way that we get insight into other minds. It's the way that we can understand somebody else's perspective on the world. That perspective could be grand and abstract—feelings, thoughts, bold ideas—or it could be entirely straightforward—here is what I can see around this corner—but one way or the other, we use language to share the contents of one mind with another mind. The thing that makes language endlessly compelling is not the factual or stylistic content of the words, as such, but what they reveal about how somebody else sees the world.
When we idly browse the internet—let alone when we go to a theater to see a movie—what we're after is that picture of the world offset from our own. We want to inhabit another consciousness, however briefly, and learn to see something the way somebody else sees it, whether we're reading a movie review or watching a TV show. This is true even for seemingly boilerplate factual information. When we look up a how-to—how to build a shelf, how to bake a pie—the list of steps that we're after only has value because it is a list of steps that some person with specific knowledge believes to be the correct ones. We infer that the writer has an expertise we lack, and we see the process through their eyes. By communicating, even asynchronously, with them, we have expanded our perspective on the world and how to act in it.
ChatGPT and LLMs like it don't have a perspective on the world. As I've written, they don't want anything in particular except to be ingratiating. When you ask a factual question of ChatGPT, it is not judging its answer against its own sense of its expertise. It is not imagining how it can distill its vastly broader and more general understanding of the issue at hand into something you can understand. It is simply trying to come up with the sequence of words most likely to get you to say "oh, ok, that answered my question". What's more, when you ask ChatGPT to produce something more stylized or opinionated, it is not drawing on its own experience. It is not attempting to come up with a response that distills its own understanding of the world and what matters in it into something that fits the brief. It is trying to figure out the modal satisfying answer: the response that people would most generally give to your question. It is not trying to give you—it is not capable of giving you—a unique perspective on the world. It is trying to give you the absolute most generic perspective on the world, because that is all that it can do.
That does not at all render ChatGPT's outputs useless. There are applications like software development, where the language being produced has a functional purpose comprehensively disjoint from allowing one person to see through the eyes of another. The modal response to a prompt like "design an API for communication between a web application and a data store" is, in most cases, precisely what you want. If you understand the nature of the downstream task—if you understand what you're trying to accomplish, in the end—it is entirely possible to find applications where ChatGPT is an enormous labor saver.
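To make that concrete: the modal answer to a prompt like that tends to look something like the sketch below. This is only an illustration of what generic boilerplate looks like, not anything ChatGPT specifically produces; the names (`Store`, `MemoryStore`) and the CRUD shape are hypothetical.

```typescript
// A deliberately generic data-store interface of the kind a "modal answer"
// tends to describe: CRUD operations keyed by id. Names are hypothetical.
interface Store<T> {
  create(item: T): Promise<string>;              // returns the new item's id
  read(id: string): Promise<T | null>;           // null if no such item
  update(id: string, item: T): Promise<boolean>; // false if id unknown
  delete(id: string): Promise<boolean>;          // false if id unknown
}

// Minimal in-memory implementation, so the sketch is runnable.
class MemoryStore<T> implements Store<T> {
  private items = new Map<string, T>();
  private nextId = 0;

  async create(item: T): Promise<string> {
    const id = String(this.nextId++);
    this.items.set(id, item);
    return id;
  }

  async read(id: string): Promise<T | null> {
    return this.items.get(id) ?? null;
  }

  async update(id: string, item: T): Promise<boolean> {
    if (!this.items.has(id)) return false;
    this.items.set(id, item);
    return true;
  }

  async delete(id: string): Promise<boolean> {
    return this.items.delete(id);
  }
}

// Usage: the web application talks to storage only through the interface.
async function demo() {
  const store: Store<{ title: string }> = new MemoryStore<{ title: string }>();
  const id = await store.create({ title: "hello" });
  console.log(await store.read(id)); // { title: "hello" }
}
demo();
```

There is no perspective in that code, and that's fine; for this kind of task, the most generic possible answer is exactly the valuable one.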
There are even applications where it is interesting and enjoyable to see the "perspective" "taken" by the "mind" of ChatGPT. By any measure the killer app for large language models so far is the ChatGPT web app itself. Having conversations in that system, playing around and seeing what it will say, is fascinating and a little bit unearthly. I've spent hours at a stretch probing ChatGPT with questions about theory of mind and agency, and seeing the way those sorts of questions interact with its peculiar combination of statistical parroting and simplistic ingratiation is intriguing and compelling. I don't think there's any meaningful way to call it an "intelligence", but it sure is an interesting computer program, and there's per se value in that.
What is common to the applications where ChatGPT is useful is that the things that ChatGPT does—produce the broadly acceptable modal response to pretty much any prompt—map to the value of the downstream application. Being able to produce large amounts of boilerplate code helps you produce reasonably reliable software more quickly. Having a giant, uncanny association cortex with a strange and limited set of goals and desires gives you a chatbot that's fun to talk to. The tool moves the needle on the most directly appropriate measurements of value.
The problem is that many if not most of the executives and other boosters enthusiastic about ChatGPT have never had to—or been able to—measure the value of the creative humans at the core of their systems, because they've never even been able to consider working without them. They hope to use ChatGPT in applications where the reward of taking the perspective of another human is at the absolute center of the value proposition. The idea of asking ChatGPT to write even a first draft of a screenplay could only be had by somebody who has never had to think seriously, and with real stakes, about what producing a screenplay entails.
Asking ChatGPT to write how-to guides will result in how-to guides that are overwhelmingly unrewarding to refer to, no matter how correct the necessary steps may be.
Asking ChatGPT or its ilk to write boilerplate listicles for entertainment websites is to invite a catastrophic collapse of interest in killing time with that content.
In all of these cases, the metric by which the ultimate product is best measured is how well it helps the consumer of that product—the moviegoer, the web browser, the DIYer—experience the perspectives, understanding, and expertise of others. But that is not the metric that has been used, because it is not something that these enterprises know how to measure. It is not amenable to quantification or summarization, and so in modern business terms, it is hard to act on. The executives running these enterprises have instead turned to more easily quantified metrics that may be partial and indirect, but are available.
But to measure the success of a creative endeavor by metrics like SEO score, inbound clicks, cost per produced screenplay is, in a world with commercially available large language models, to sow the seeds of your own doom. The businesses considering the value of ChatGPT and the like in terms of the metrics they've used to measure business success with humans creating the content are opening the door to adopting business strategies where they (entirely obliviously) wipe out the whole of their actual value. By lacking the right metrics to understand the value that humans qua humans bring to their operations, they allow the sustaining core of their business to be hollowed out.
In some sense, it's hard to blame them for this (I mean, it's not actually that hard, but bear with me) because in many cases—in moviemaking as with driving—it has not previously been possible to measure the value of having a human do a given job, because it has been impossible to not have a human doing that job. The answer to the question "what skills does a human uniquely bring to the act of driving a car" has typically been "well, they drive it", and similarly, the answer to the question "why do you hire humans to write screenplays" has been "well, somebody's gotta". There has been no particular business need to measure these contributions because it was impossible to imagine a world without them. In a modern, Taylorized, quantified enterprise, that means that as a matter of decision-making, those contributions don't really exist.
When we faced the question of how to sell customers on our value to the downstream tasks that they cared about, we eventually found ourselves having to invent metrics that our customers could use to understand whether they were succeeding at the task they had set themselves at all. Perhaps the value of ChatGPT to publishers and movie producers will end up being that it allows them to see, in starkly quantifiable terms, the central value that real human insight and perspective bring to the products they hope to sell. Because they are going to learn that one way or another. The question is whether it will happen too late.