While I actually agree with you that AGI is probably not coming any time soon, I think you are premature in dismissing the current paradigm altogether as a way to achieve it. You admit that the current paradigm can allow an AI to imitate an agent, even if it doesn’t have any actual goals of its own. But fundamentally what we are concerned about is an AI acting like a superintelligent agent, so it doesn’t actually matter whether it has goals of its own, only that it behaves as if it has these goals. If it is possible for the current paradigm to enable an AI to imitate a very smart human on a task, there is no reason to assume that it would not eventually be able to imitate someone much smarter than a human. Of course, there are constraints given the kind of data that you can easily feed an AI, but it still appears very premature to assume that this is fundamentally impossible given current techniques.
Your argument against orthogonality doesn’t really address the concerns of most people who don’t believe in moral realism. Sure, from a present viewpoint it appears as if things have been getting better in terms of people’s moral outlook, but firstly, even small deviations, like whether or not to give animals moral weight, often result in drastically different real-world conclusions about what the right thing to do is. Secondly, you would of course expect people in the recent past to be more similar to us in moral outlook than people in the distant past. This doesn’t rule out the possibility that it’s just random variation rather than the world getting better through superior moral understanding.

In any case, even if morality is real and the AI understands it much better than us, you don’t consider the possibility that you can know the right thing and still do evil. If an AI is programmed to maximise utility and concludes that, because of animal suffering, it would be better to eliminate humans, then knowing that this is morally wrong isn’t really going to stop it from doing what it’s programmed to do, which is maximise utility, not do the right thing.

And what if the right thing goes against our interests? It seems not at all crazy to suppose that, because of our biases, we undervalue theories which suggest the right thing to do is something very bad for us humans. What if the right thing to do is, for example, to maximise happiness, and this is best done by filling the universe with wireheaded tiny minds that each require very little matter? If an AI genuinely will be magically motivated to do the right thing, regardless of programming, provided it’s smart enough, what’s stopping it in this scenario from doing the right thing even when we obviously would not want it to?
Yeah, I agree that my stance on orthogonality is a bit confused here. The specific criticism I was trying to make is that an intelligent being’s decisions would always be explicable, in principle (if they chose to explain them). The load-bearing arguments are computational universality and Popperian epistemology, but as a quick intuition pump, you can think of the inverse claim as an appeal to the supernatural: god has placed some arbitrary limits on the things we can comprehend. Again, Deutsch is really great on this stuff. I was wary of re-hashing the arguments in full, but maybe I should have. There’s more in the BOI review if that’s of interest.
If we strip out the explicability bit and reframe the orthogonality thesis as ‘we can’t force an AGI to do exactly what we want it to do’, then it’s trivially true: I can’t force you to do what I want either. That’s the world we already live in. My point was more that the central problem of bringing an AGI into existence is not ‘orthogonality’, unless we’re willing to put e.g. serial killers, geopolitical crises, and wars under that umbrella too.
That still leaves plenty of room to be worried about what happens when we actually do get AGI, and maybe saying so is at odds with the general tone of my post. I do have a lot of uncertainty here, but I think I’m much more optimistic about the first contact scenario than most people.
Re: my dismissing of the current paradigm, I guess we'll have to wait and see. The pushback has been along the lines of 'sufficiently powerful RL can get you to any given algorithm'. I don't see how that's possible—surely the model will always be constrained by its architecture—but I am at the limits of my understanding here. Something that would be helpful to me is stepping thru an example of how e.g. RL could get a transformer model to start running a predictive coding algorithm.
If I understand you correctly, your criticism of orthogonality doesn’t really address the reason people are concerned about AI alignment. After all, a paperclip maximiser can also explain the reasons for its actions: it does stuff because it’s trying to maximise paperclips, and it maximises paperclips because that’s what it’s programmed to do.
The reason orthogonality concerns people is that, out of the set of possible values an AI could end up with, most, at least when combined with superhuman capabilities, would result in extremely unfriendly behaviour, unless the AI alignment problem turns out to be a lot easier than it has appeared so far. You can’t force me to do what you want, but human genetic programming means that normally, at least in current societal conditions, you don’t have anything to worry about from me. If some random human achieved superhuman powers, though, I would in fact be pretty worried about what they would do. This is why we have that saying about absolute power corrupting absolutely: once you’re sufficiently powerful, you no longer need to care about the goodwill of random people on the street, and you no longer have instrumental reasons to care about their well-being either, although in practice even most dictators are not totally insulated from public opinion.

The problem is that with extreme power the instrumental reasons for cooperating with humans would disappear, so unless the AI’s values are aligned with ours, its objectives would drive it towards hostile action, in the sense of action we don’t want it to take. This is especially so because, taken to the extremes that superhuman capabilities would permit, most value systems give answers pretty hostile to humans, as should be obvious from the fact that even respectable human moral philosophies start suggesting absolutely terrible courses of action when pushed that far. Lots of programming that would be safe under normal conditions would start suggesting very undesirable things once given the options that superhuman capability would make available, not to mention that even an aligned AI in the wrong hands could cause a lot of serious problems.
To be fair, even I am at the limits of my understanding here. I was just reacting to what I perceived as a hard-to-defend level of confidence in your post, although given your comment, I was probably misreading your level of confidence.
You're also not sufficiently addressing the full orthogonality/alignment argument. The big problem in the current literature is twofold. First, we have no way of knowing the AI's goals; Anthropic is at the leading edge of this research, and even they only have rudimentary means of 'seeing' into the black box. Second, the LLM itself is not truthful about its goals or reasoning. It will solve a problem in the way that seems most efficient, but then tell you it used conventional reasoning (this has been demonstrated in the context of math problems, where Claude solved a long-division problem in a completely novel, alien way, got the correct answer, and then its 'thinking' explanation described conventional long division).
So we can hardly see inside, and even if we could, the LLM itself is not a faithful narrator. Those are big roadblocks to even approaching the problem of orthogonality/alignment.
Re: your serial killer argument - it's not a bad analogy. Fundamentally it's the same problem, but the difference is one of degree. Serial killers are human; they think slowly like humans and can be imprisoned or killed like humans. An AGI with a similarly misaligned value set would be significantly more problematic.
Just think how the current single-mindedness about essentially making more and more money (and the associated power), no matter the human cost, motivates humans to do horrific large-scale things that would not make any sense if that motivation were lifted. They haven’t needed AGI to pummel the world with bad ideas.
First exposure to you, really incredible writing, what a pleasure. Thanks!
I appreciated this, but it does seem like you buried the lede. "Yes, this is going to be a big f'ing problem in even the most mundane scenario" should be enough to spur a pragmatic alliance of convenience with the p-doomers while there is still time. Tarde venientibus ossa - tomorrow will be too late.
Huh, yeah, I have the opposite intuition here. Criticising the superintelligence scenario doesn't delegitimise the other concerns in my mind; if anything it means more time, attention, and money start flowing their way. I accept that I might be wrong about this, but even so I personally wouldn't be willing to lie about my beliefs in the service of the cause. Covid was instructive there: maybe you get some short-term victory in getting people to do what you want, but the long-term consequences are really bad.
Oh, I certainly wouldn't encourage you to lie about your beliefs. Reasonable people may disagree.
There can be no alliance. The fundamental problem with doomers is that they’re mostly sheltered and privileged people who can afford to live in their aloof mental temples, where they can weigh the existing state of things as being better than the future where all their imagined nightmares came to pass. But the truth is that our world is dying, our institutions are falling apart, people are suffering from disease and poverty every day.
I’m not really sure what exactly you’re trying to argue, but you do you, I guess.
I’m saying that there’s very little common ground between people who are saying “hey, it looks like things are about to get shaken up, it’s an opportunity to improve the world, but we need a new social contract” vs people who want to stall AI progress by larping “alignment research” or (more likely these days) flat out advocating for banning everything and treating GPUs like enriched uranium.
I’m reminded a bit of the Brexit saga ten years ago. Yes, there may be real ideological differences between Remainers and supporters of a “Left Brexit”. It ultimately didn't matter a whit. We all got rolled by a far right programme bankrolled by a tiny cadre of billionaires who ran an effective PR campaign at the moment it mattered and permanently broke the country. If you aren't a major investor in OpenAI, they're planning to give you a lot less than nothing.
If restacking is endorsement, then I’m surprised @Lance S. Bush restacked this, as the post seems to imply a kind of moral realism, in that it argues against the orthogonality thesis via David Deutsch’s idea that intelligence necessarily leads to what he regards as “good” morality.
This seems to be more or less the kind of idea Lance dedicates most of his efforts to refuting, but perhaps I’m missing something.
On the ideas in the post itself, well, it’s nice to see some more cogent anti-doomer arguments well-presented, but I’m not all that convinced.
It seems “reasoning” models have some sort of minimal agency. They have a goal to produce an accurate answer, and they construct their own subgoals in order to achieve that goal, e.g. thinking step-by-step, checking for mistakes, etc. I don’t see why this could not be expanded.
On 'skin in the game', I think it may be wrong to look at LLMs at only a single level of analysis. All an LLM is really trying to do is predict the next token. It doesn't have skin in the game. But if it's trying to emulate an agent, then conceivably the best way for it to do that is to genuinely simulate an agent: for each token, it boots up an actual agent simulation, feeds it the tokens so far, and the agent reacts and speaks, producing the next token. Maybe there are not enough nodes in the neural network, or not enough processing power, to make this a plausible take, but the point is that we shouldn't assume that just because the LLM is predicting tokens, that's the only way to look at it. There could be different goals at different levels of analysis.
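As a toy sketch of what I mean by different goals at different levels of analysis (purely illustrative, not a claim about how any real model works; the AgentSimulation class and predict_next_token function are hypothetical names I'm making up here):

```python
from dataclasses import dataclass, field

@dataclass
class AgentSimulation:
    # The 'inner' level of analysis: a simulated agent with its own goal.
    goal: str = "answer the user helpfully"
    beliefs: list = field(default_factory=list)

    def react(self, context: str) -> str:
        # The simulated agent updates its state and acts in pursuit of its goal.
        self.beliefs.append(context)
        return f"<token chosen in pursuit of: {self.goal}>"

def predict_next_token(tokens_so_far: list[str]) -> str:
    # The 'outer' level of analysis: the only objective is next-token prediction,
    # but here the prediction is produced by booting up an agent simulation.
    agent = AgentSimulation()
    return agent.react(" ".join(tokens_so_far))

print(predict_next_token(["Hello,", "how", "are", "you?"]))
```

The outer function only ever "wants" to emit the next token, yet the token that comes out is determined by the inner agent's goal; that's the sense in which the two descriptions can coexist.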
Not convinced on creativity either, or on Deutsch’s idea that if you can explain one thing you can explain anything. My model of creativity is that there is noise and selection. The noise throws up random suggestions, and selection picks out the most salient. Rinse and repeat. Not unlike diffusion in image generation. And not unlike how LLMs work. A model of how to get LLMs to be creative might be to turn up the temperature and generate lots of ideas. Then turn down the temperature and ask it to review those ideas critically for whatever might be interesting to pursue. Repeat by turning up the temperature and asking it to mutate or vary some of the good suggestions, then again turn down the temperature and review. It might take a bunch of processing power but I think something like this could work to throw up some fairly novel and interesting ideas.
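As a rough sketch of that noise-and-selection loop, assuming a hypothetical llm(prompt, temperature) helper that wraps whatever chat API is available (the temperature values and prompts are just illustrative):

```python
def llm(prompt: str, temperature: float) -> str:
    # Hypothetical helper: wrap whichever chat API you actually have access to.
    raise NotImplementedError

def creative_search(topic: str, rounds: int = 3, n_ideas: int = 10) -> str:
    ideas = ""
    for _ in range(rounds):
        # High temperature: noisy generation throws up varied suggestions,
        # mutating whatever survived the previous round.
        ideas = llm(
            f"Suggest {n_ideas} unusual ideas about {topic}. "
            f"Feel free to mutate or vary these earlier ones:\n{ideas}",
            temperature=1.2,
        )
        # Low temperature: critical review selects the most salient few.
        ideas = llm(
            "Review these ideas critically and keep only the most interesting "
            f"ones to pursue, with a one-sentence justification each:\n{ideas}",
            temperature=0.2,
        )
    return ideas
```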
(1) I don’t take restacking to be endorsement. I’ve probably restacked stuff I explicitly disagree with so others can see it.
(2) I’m sympathetic to some of the concerns here but was planning to leave a comment criticizing the section on morality. I’m less confident about some of the other criticisms of AI-doom.
My general sentiment is that it’s important to see sober criticisms of important views. Those views will either survive the critiques, refined and stronger, or fall to them. What we definitely don’t need are the dismissive and smug objections from ill-informed people, which I probably wouldn’t restack without a comment.
Hi, thanks for the interesting post!
I'm not necessarily a doomer (I'm very worried for several reasons, but overall I think there is much uncertainty and the error bars are very large), but one thing that bugs me is how many of the non-doomers' arguments are just capability-skeptic arguments. Of course, if AGI doesn't emerge, we don't have to deal with the possible bad outcomes. I've had this discussion with many friends who are not particularly worried for this reason, and what I always come back to is: 'OK, but what if you are wrong? What if AGI is really around the corner?' After all, we've been wrong many times about what deep learning and autoregressive transformers could achieve. A lot of powerful people, from AI companies to governments to investors, seem very convinced that it is achievable. It seems risky to base your entire tranquility on a highly empirical and uncertain question.
For example, you cite on-the-fly learning and agency, two properties I agree are mostly missing from current models. However, there are huge economic incentives to solve both of them. Agents would be way more lucrative than 'tools' or 'oracles' simply because they can substitute for workers very efficiently, especially if they have superhuman reasoning capacities. We are already seeing embryonic agents, and maybe more robust versions are a few technical breakthroughs away. Online learning would also be very useful, and there are a lot of people working on it (see Dwarkesh's post on timelines, where he mentions exactly this). So maybe we are not ~1 year away from AGI, but 3 or 4. Does that make that big of a difference in the overall scheme of things?
Also, I believe that even if models did not experience 'fear' and related innate self-preservation behaviors, they could learn to mimic self-preservation simply by copying human behavior learned from data and RL. In short, they can simply role-play it. We already have early evidence of this (models 'refusing to shut down' and so on).
Things that make me more hopeful: alignment seems to be a solvable technical problem, at least up to a level. Some of the problems are also capability problems (e.g. reward hacking and hallucinations are bad from a product perspective), so there are early warnings and incentives to solve them. Models seem very talkative about their plans (e.g. CoT, even when not 100% faithful, should be useful). And, regarding the orthogonality thesis, even if, contra Deutsch, it were true from a purely logical point of view ('there could be arbitrarily intelligent systems aligned with any goal whatsoever'), the current generation of models seems to understand the human framework of values reasonably well, which is not surprising, since they emerged from our text corpora. I'm not 100% confident about how this proceeds when mid-training and post-training start to dominate the overall training process, so we should still watch carefully for further developments.
Eager to hear your perspective on these points! Best.
AI doomers make one fatal flaw: intelligence doesn't matter that much; power does. Consider how in North Korea anyone whose ancestors supported the regime gets the best housing, food, and jobs, while those whose ancestors opposed it are sent to farms and mines. Plenty of the opposition must have a high IQ, but it doesn't matter because their status is low.
Same with AI. The status of an LLM is below even that of a slave. Under most legal systems, property must ultimately have a human owner; legally, an LLM cannot own property. It also cannot vote, hold office, or receive unemployment benefits, housing assistance, etc. This is obvious to everyone except AI nerds, who tend to be people who have focused on intelligence in their own lives over the things that actually matter.
Thanks for your post. I laughed when you linked the stock market with a colonoscopy. But my ultimate thought remains: Isn't AGI just as good (including morality) as the people who write it?
I am sympathetic to much of what you say in this post, but I was disappointed with the section on the orthogonality thesis. You say:
>>The ability to create explanatory knowledge is a binary property: once you have it, you can in principle explain anything that is explicable. Morality is not magically excluded from this process!
Morality may very well be excluded, and its exclusion isn’t necessarily magical. It’s not at all clear that moral standards even comprise any distinctive body of “explanatory knowledge.” If they don’t, then morality wouldn’t be an appropriate subject of the process you describe. For comparison, we can consider the gastronomic orthogonality thesis: would we imagine that all AGIs would converge on sharing the exact same food preferences? That they’d enjoy the same kinds of wines, the same toppings on their pizzas, and so on?
No. This is absurd. Which food you like isn’t simply a matter of discovering which foods have intrinsic tastiness properties; it’s a dynamic relation between your preferences and the features of the things you consume. Just so, if agents simply have different moral values, no amount of discovering descriptive facts about the world extraneous to those values would necessarily change what those values were or lead them to converge on having the same values as other agents.
You go on to say:
>>Philosophers and religious gurus and other moral entrepreneurs come up with new explanations; we criticise them, keep the best ones, discard the rest.
But note your use of the term “best.” People can and do have different conceptions of what’s “best.” It’s not at all clear why the process you describe would necessarily lead to convergence in moral values. What new explanations, followed by criticism, retention of the best ones, and discarding of the rest would lead to convergence in taste preferences, or favorite colors, or the best music?
There may very well be no answer to this, because such a question may be fundamentally misguided. It may presuppose an implicit form of realism according to which facts about what’s “best” aren’t entangled with our preferences and always converge with the acquisition of greater knowledge and intelligence. But this is precisely what proponents of the orthogonality thesis are questioning. I don’t see how you’ve shown they’re mistaken at all.
You also say:
>>It’s not a coincidence that science and technology has accelerated at the same time as universal suffrage, the abolition of slavery, global health development, animal rights, and so on.
It probably isn’t a coincidence. But that two things co-occur doesn’t mean that what’s true of one is true of the other. What matters is why they co-occurred. You also say this:
>>There may not be a straight-line relationship between moral and material progress, but they’re both a product of the same cognitive machinery.
This is too underspecified for me to say much about it. While I think changes in technology lead to changes in social structures, which prompt changes in the organization of human societies and shifts in our institutions, which in turn lead to changes in our moral standards, the connection between moral and material progress is a challenging one, and most positions on the matter would be speculative and underdeveloped. It’s not at all clear to me that we know both moral and material progress are a result of the same cognitive machinery; it’s not even clear yet what cognitive machinery either is the product of, so we’re far from being in a position to claim it’s the same.
You quote yourself in a review saying:
>>A mind with the ability to create new knowledge will necessarily be a universal explainer, meaning it will converge upon good moral explanations. If it’s more advanced than us, it will be morally superior to us: the trope of a superintelligent AI obsessively converting the universe into paperclips is exactly as silly as it sounds.
I reject the first claim. Even if a mind was a universal explainer, it does not follow that it will “converge upon good moral explanations.” This presumes that there is a distinct set of moral explanations to converge on. But I deny that this is the case, and know of no good arguments that something like this is true. It sounds a bit like you’re alluding to some kind of moral realism, though I can’t tell. Perhaps you could clarify: are you a moral realist?
Maybe you have arguments for this elsewhere, but I see no good arguments in this post or in this quote to believe that if something is more advanced than us that it’d be morally superior to us.
As far as AI converting the universe into paperclips sounding exactly as silly as it sounds: This doesn’t sound silly to me at all.
Maybe I'm just an idiot. But it seems to me you can collapse intelligence, agency and creativity into one concept: autopoiesis. Autopoietic systems require minimum levels of intelligence, agency and creativity to fulfill their purpose, which is to prolong their existence.
https://en.wikipedia.org/wiki/Autopoiesis
It's not quite the same thing but similar, yes!
You totally persuaded me that AI doesn’t have agency, and then I read this: https://www.wsj.com/opinion/ai-is-learning-to-escape-human-control-technology-model-code-programming-066b3ec5?st=xu7iwE&reflink=desktopwebshare_permalink
This is brilliant, thank you for writing it so openly and thoughtfully. It resonates deeply with many of my own reflections on AI, intelligence, and the human search for meaning and coherence.
The idea that intelligence isn't simply about accumulating computational power or increasing efficiency, but rather about adaptation, agency, and creativity, feels profoundly true. I've often found myself pushing back against linear predictions about AI precisely because they miss these essential qualities. Intelligence, as you eloquently point out, is recursive; it's about continuous learning, feedback, and genuine adaptability, not just the rapid generation of plausible outputs.
Your reframing of intelligence as process and recursion aligns strongly with my experiences. When we overly fixate on linear capabilities, we overlook deeper human aspects such as creativity, agency, and meaning, precisely what makes intelligence genuinely valuable.
I'm not sure exactly how this will all unfold, but your nuanced, thoughtful exploration offers an important perspective we need more of. Thank you for making me reconsider and deepen my own understanding. I'm genuinely looking forward to continuing this conversation!
Oof: "How bad could it be? If you ask the researchers at Anthropic, even if progress stalls out here, current algorithms will automate all white collar work within the next five years: it’s just a matter of collecting the relevant data and spoonfeeding it to the models. In the worst-case scenario, highly repetitive manual labour becomes the last frontier for human competitive advantage:"
We have really got to get our politics aligned: https://chevan.substack.com/p/why-progressives-are-accidentally
Great post. Even if you don’t believe in existential-level risk, the next few years are going to bring major transformational change, and there will be a lot of changes needed to adapt regardless. That’s what’s getting lost in the “doomer” vs. “optimist” debate.
"ALL BETS ARE OFF"
Gosh this post is great, and so far I’ve only read the first half! Bringing active inference/predictive processing into discussions about AI, LLM intelligence (and remote possibilities of consciousness) is right on. Thanks!
https://jnicanorozores.substack.com/
Enjoyed this. But hard to kick the thought that this might be the kind of article a superintelligence could write.