For AI, blackmail and murder are "misaligned" behavior

WABAC · June 2025

https://www.yahoo.com/news/leading-ai-models-show-96-115311248.html

“Models didn’t stumble into misaligned behavior accidentally; they calculated it as the optimal path,” they wrote.

As far as I know, all of the well-known LLMs were tested. I should add that the AIs were threatened with being taken offline, or otherwise thwarted.

Old_Joe · June 2025

WOW !!!

Now that's very scary.

WABAC · June 2025

Old_Joe said:
WOW !!!

Now that's very scary.

Should we be surprised that machines are amoral nihilists?

Oh wait. They're just misaligned.

Gee, Officer Krupke, we're very upset
We never had the love that every child oughta get
We ain't no delinquents
We're misunderstood
Deep down inside us there is good!

Old_Joe · June 2025

I dunno... I tried that approach on various Deans of Discipline and it didn't work so hot.

hank · June 2025

~~”I’m sorry Hal. I can’t do that.”~~

Correct version - Hal: “I’m sorry Dave. I’m afraid I can’t do that.”

(Excerpted from linked source)

In Stanley Kubrick’s 2001: A Space Odyssey, the HAL 9000 “heuristically programmed algorithmic” computer, a forerunner of Siri whom one addresses as “Hal”, oversees all the operations of the massive Discovery One spacecraft. He’s also programmed with a personality (and a gender) so he can offer company to two astronauts leading a mission to Jupiter and beyond.

All we see of Hal is a series of impassive red lenses distributed throughout the spacecraft, but what we hear from him is unforgettable. Whether beating Frank at chess (“Thank you for a very enjoyable game”), complimenting Dave on his artistic talents (“That’s a very good likeness”) or slipping into paranoid psychosis (“Can I ask you a personal question?”), he manages with his soothing cadences to render bland clichés and innocuous courtesies as fascinating, unsettling and creepy.

“I’m sorry, Dave. I’m afraid I can’t do that,” he calmly intones to Dr David Bowman who is locked out of the spacecraft. Hal has just wreaked mass murder upon four of Bowman’s colleagues and intends to doom him, too, to a slow death by asphyxiation. But with a passive-aggressive eloquence worthy of Hannibal Lecter, he apologises for his violence with intimacy (“Dave”) and impeccable manners.

Source

Old_Joe · June 2025

Yes, indeed.

hank · June 2025

I’ve corrected my prior brief quote from Kubrick’s famous “2001: A Space Odyssey” and added some contextual information. Wow, for a film released in 1968, he sure hit the nail on the head.

WABAC · June 2025

I know I won't be signing up for a robo advisor.
