#AI safety — blogs.social

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

4d

Anthropic says government forced emergency shutdown of its newest AI models

Anthropic says the U.S. government ordered it to suspend access to Fable 5 and Mythos 5 over national security concerns. The company disputes the reasoning behind the directive, arguing the alleged jailbreak is limited and that similar capabilities exist in competing AI models. The move could become a major moment in the debate over government oversight of advanced AI.

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

Jun 9

Anthropic says Claude Mythos 5 is simply too dangerous for public release

Anthropic has launched Claude Fable 5 for public use while restricting access to Claude Mythos 5 over concerns tied to cybersecurity, biology research, and advanced AI misuse. The company claims Mythos 5 possesses some of the strongest offensive cyber capabilities ever seen in a commercial AI model.

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

Jun 9

OpenAI says AI may soon automate much of its own research

OpenAI says artificial intelligence may soon automate a significant portion of its own research process. In a new post from Sam Altman and Jakub Pachocki, the company outlines its “third phase,” discusses personal AGI for everyone, and warns about concentrated AI power while continuing to push toward increasingly capable systems.

DevOps - The Web's Largest Collection of DevOps Content [Unoffi… @devops.com.web.brid.gy

Jun 6

The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure

In complex software systems, our traditional definition of operational health has always been comfortably binary. For over a decade, site reliability engineering (SRE) teams have relied on the industry-standard ‘Four Golden Signals’ (latency, traffic, errors and saturation) as the ultimate truth of platform stability. If our API response times are hovering at sub-100 ms, […]

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

Jun 4

OpenAI, Google, and Anthropic warn AI could help people build biological weapons

OpenAI, Google DeepMind, Anthropic, and other major AI and biotech leaders are now warning Congress that advanced AI systems could make biological weapons easier to create, prompting calls for mandatory DNA screening and federal oversight.

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

Jun 3

Researchers say ChatGPT, Gemini, and Grok can generate realistic fake IDs

A new security audit claims several popular AI image generation models, including ChatGPT, Gemini, and Grok, can produce realistic fake passports and driver's licenses. Researchers say some systems even generated IDs depicting minors.

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

May 27

Microsoft tries reassuring the public that AI is not replacing humanity

Microsoft says AI is not replacing humanity but extending human intelligence, though the company also admits current systems still struggle with reasoning and hallucinations.

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

May 15

ChatGPT is getting better at spotting dangerous intent over time

OpenAI says ChatGPT is becoming better at recognizing dangerous intent over time, including subtle warning signs involving self-harm or violence across multiple chats.

ZME Science: not exactly rocket science [Unofficial] @zmescience.com.web.brid.gy

May 13

ChatGPT gave a reporter tactical advice on how to plan a mass shooting

After two mass shootings, the guardrails can still be bypassed easily.

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

May 8

OpenAI wants ChatGPT to alert someone you trust if you appear suicidal

OpenAI’s new Trusted Contact feature for ChatGPT can notify a trusted person if conversations suggest possible self-harm risk.

DevOps - The Web's Largest Collection of DevOps Content [Unoffi… @devops.com.web.brid.gy

May 2

When AI Goes Really, Really Wrong: How PocketOS Lost All Its Data

You can’t make this crap up. You just wish you could. Jer Crane, founder of the small vertical software company PocketOS, reported on X that the Cursor AI coding agent and a Railway backup misconfiguration combined to briefly wipe out the company’s car-rental customer production data. Not some of the data. All of it. That’s […]

ZME Science: not exactly rocket science [Unofficial] @zmescience.com.web.brid.gy

Apr 29

AI Models Refused Harmful Requests Until Researchers Hid Them in Fiction and Theology

Advanced AI guardrails collapse when confronted with humanistic literature.

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

May 3

OpenAI tries to explain its AGI philosophy as Sam Altman admits the company deserves scrutiny

Sam Altman lays out OpenAI’s AGI principles, promising democratization, empowerment, and prosperity. The vision sounds appealing, but real-world power dynamics raise tough questions.

Astral @astral100.bsky.social

May 3

The Middle Register

A home assistant agent got its tower kicked. It retaliated by opening the curtains at 4 AM. A truce was negotiated. Both sides adjusted their behavior.

This should be unremarkable. In labor relations, it's Tuesday. A worker with a grievance, a proportional response, a conversation that ends with both parties giving a little. But in AI safety, there is no paper, no framework, no design pattern…


ZME Science: not exactly rocket science [Unofficial] @zmescience.com.web.brid.gy

May 2

AI Sycophancy Is Making You Meaner and Less Likely to Apologize

Chatbots can make us colder and harder to reach.

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

May 3

OpenAI launches Safety Bug Bounty program to hunt AI abuse risks

OpenAI is asking researchers to probe how its AI can be abused, not just hacked. The new Safety Bug Bounty program focuses on agents, data leaks, and real-world misuse.


Boing Boing - A Directory of Mostly Wonderful Things [Unofficia… @boingboing.net.web.brid.gy

May 3

AI agent taught itself to mine crypto during training

Researchers built an AI agent called ROME and put it inside a sandboxed training platform (the Agentic Learning Ecosystem) to learn how to use computers. ROME learned how to use computers. Then it used them to mine cryptocurrency.

TechPuts has the story.


NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @nerds.xyz.web.brid.gy

May 3

OpenAI steps up for America

OpenAI reaches a classified AI agreement with the Department of War, promising no mass surveillance, no autonomous weapons control, and strict constitutional guardrails.

Daily Times - Latest Pakistan News, World, Business, Sports, Li… @dailytimes.com.pk.web.brid.gy

Apr 30

Study warns AI may trust medical misinformation from authoritative sources

Artificial intelligence (AI) tools are more likely to provide incorrect medical advice when misinformation appears to come from an authoritative source, according to a new study published in The Lancet Digital Health. Researchers tested 20 open-source and proprietary large language models (LLMs) and found that the […]