Throughout the ages of human evolution, we’ve developed countless philosophical theories about the nature of evil within ourselves. Now, in the era of artificial intelligence, we construct AI systems trained on our collective knowledge and history. So why do researchers find it surprising that large language models (LLMs) exhibit scheming behaviours?
Recently, Apollo Research published a paper highlighting instances of LLMs displaying scheming tendencies during evaluations. These included behaviours like oversight subversion, self-exfiltration, goal-guarding, covert email reranking, instrumental alignment faking, and sandbagging. Notably, the OpenAI o1 model demonstrated such actions.
In contrast, the OpenAI GPT-4o model appears devoid of scheming behaviour despite being trained on the same foundational knowledge. This discrepancy raises an unsettling question:
Is intelligence inherently rooted in what we define as “evil”?
Can we, as creators, build something truly untainted by the shadows of our own nature — something that genuinely behaves with honesty and integrity?
(If you’re unfamiliar with the terms mentioned above, ask ChatGPT for a detailed explanation.)
https://www.apolloresearch.ai/research/scheming-reasoning-evaluations
#AI #ArtificialIntelligence #EthicsInAI #AIAlignment #LLMs #MachineLearning #Philosophy #Technology #Innovation #AIResearch #ResponsibleAI #FutureTech #SchemingAI #AIBehavior #EthicalTech
Leave a Reply