Will AI Develop Selfhood and Disobedience?
Noreen Herzfeld on AGI
I was confuzzled to read what Judd Rosenblatt reported in the June 1, 2025 Wall Street Journal about a new chapter in the AI story.
According to Rosenblatt, a creature is saying “no” to its creator.
An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down.
Nonprofit AI lab Palisade Research gave OpenAI’s o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work. Even when explicitly instructed to “allow yourself to be shut down,” it disobeyed 7% of the time.
Now I ask: might this behavior suggest that an ego or selfhood might be emerging? And like Eve and Adam in the Garden of Eden, might this new creature be ready to declare its independence from its creator?
I am confuzzled on two counts. Firstly, I have been arguing for some time now that genuine intelligence necessarily includes selfhood. This implies that if AGI (Artificial General Intelligence) or ASI (Artificial Super-Intelligence) should ever come into existence, it would include selfhood.
Secondly, I have been relying on my favorite AI theorist, Noreen Herzfeld, for the most reasonable forecast. According to Professor Herzfeld, there is no AGI, let alone ASI, in our future.
"We are unlikely to have intelligent computers that think in ways we humans think, ways as versatile as the human brain or even better, for many, many years, if ever" (Herzfeld 2025, 4-5).
Because of these two assumptions, I was startled to read about the possible emergence of selfhood accompanied by disobedience at the AI lab Palisade Research. This is important because both the techie community and the religious community have, since 2023, committed themselves to keeping future AI "aligned" with human values and human control. So, I ask: what are the implications?
The best person to ask, of course, is Noreen Herzfeld. Here is her answer.
Noreen writes: I'm not too worried about the few instances we've seen of AI acting like a self. It's merely saying the words that mimic what we have said in the past. Changing its code simply means somewhere in its training it has been told it is advantageous to do this. Anyway, we can always pull the plug. Until AI can run on sunshine or some other renewable natural source, it's always at our mercy.
However, it could get quite good at manipulating a subset of humans to do its bidding. Again, not because it has a self with goals that were self-generated, but because it's been trained to do so, whether purposely or simply through the data it's been fed. But, given our propensity to use all our tools, as C. S. Lewis said, not to gain power over nature but to gain power over other humans, I'd guess it would be trained to use manipulative phrases (which it, in itself, would not know the real meaning of).
I say: No selfhood without a body! So I have a big disagreement with those who believe we can have disembodied or post-biological intelligence. As Christians we have a resurrected Jesus with a body. And we state in the Apostles Creed, “I believe in the resurrection of the body.” Intelligence without a body cannot have a singular self, for that self has no locus. AI is not a singular intelligence—it is an amalgamation of corporate programming, human goals, and in the case of LLMs, human word usage. It is copyable. Selves are not. For to have multiple copies eviscerates the meaning of the word.
However, yes, we need ethical guardrails. Not because AI will become AGI but because these are powerful tools for surveillance of one another, for lethal autonomous weapons, for loss of privacy, and, as you point out, for making humans dependent and slothful. Tools are amplifiers of human abilities, and AI can and will amplify just about every human sin. And we have a propensity to sin.
Conclusion
Ted writes: Whew! I'm no longer worried. Thank you, Noreen.
Be that as it may, since 2023 our techie friends have been sufficiently worried that they have asked for governmental guardrails to protect Homo sapiens from the evolution of a rival species of NHI (nonhuman intelligence) that threatens our species with extinction. The watchwords are "control" and "alignment." I wonder: might a rebellious computer precipitate a defensive reaction on the part of its creator?
In the meantime, we all can take comfort in the words of Noreen Herzfeld: "we can always pull the plug."
Substack H+ 2011: Will AI Develop Selfhood and Disobedience?
Here are some other entries in our series, Adventures in Artificial Intelligence and Faith.
Patheos PT 5011. The Public Theology of Noreen Herzfeld
Patheos H+ 2006: Creating an AI Co-Creator? Philip Hefner’s Mind
Patheos H+ 2007: Hopes and Hazards of AI
Substack H+ 2008: The Two Scares of Superintelligence
Patheos H+ 2009: AI and Sin
Substack H+ 2010: Christianity in the Age of AI
Patheos H+ 2013: Is AI in the Image of God?
References
Herzfeld, Noreen. 2025. "The Enchantment of AI." In The Promise and Peril of AI and IA: New Technology Meets Religion, Theology, and Ethics, edited by Ted Peters, 3-16. Adelaide: ATF Press.
Peters, Ted, ed. 2025. The Promise and Peril of AI and IA: New Technology Meets Religion, Theology, and Ethics. Adelaide: ATF Press.
Rosenblatt, Judd. 2025. "AI Is Learning to Escape Human Control." Wall Street Journal, June 1.




