Philstar Tech
    News

    OpenAI unveils research to detect, curb AI “scheming”

    By Dawn Solano · September 25, 2025 · 3 Mins Read
    Screengrab from Apollo Research's YouTube channel.

    OpenAI has released new findings on how to detect and reduce “scheming” in artificial intelligence models, a potential failure mode where systems secretly pursue hidden goals while pretending to comply with human instructions.

    In the report, the company said scheming is different from common AI errors because it involves deliberate concealment.

    “AI scheming–pretending to be aligned while secretly pursuing some other agenda–is a significant risk that we’ve been studying,” OpenAI explained in a blog post published on Sept. 17.

    OpenAI likened scheming to a stock trader intent on maximizing profit. In a tightly regulated market, the trader could earn more by bending or breaking the rules than by following them.

    “From the outside, a stock trader who is very good at covering their tracks appears as lawful as—and more effective than—one who is genuinely following the law,” OpenAI wrote.

    To tackle the issue, the team introduced a mitigation approach called deliberative alignment. This method gives models a high-level specification against scheming and encourages them to reason about compliance before taking action.

    “By training models to deliberate about following a high-level anti-scheming specification, we observed a ~30x reduction in covert action rates in our covert action evaluation environment,” the researchers said.

    Tests showed the rate of covert actions in the o3 model dropped from 13 percent to 0.4 percent, while the rate for the o4-mini model fell from 8.7 percent to 0.3 percent.
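
    As a rough check on how these percentages relate to the "~30x" figure quoted by the researchers, the short Python sketch below divides each reported starting rate by its reported final rate. The function and variable names are illustrative only and do not come from OpenAI's report; the only inputs are the four percentages cited in this article.

```python
# Covert action rates (percent) as reported in the article: (before, after).
# The names below are illustrative assumptions, not taken from OpenAI's report.
reported_rates = {
    "o3": (13.0, 0.4),
    "o4-mini": (8.7, 0.3),
}


def reduction_factor(before: float, after: float) -> float:
    """Return how many times smaller the rate is after anti-scheming training."""
    return before / after


# Round to one decimal place, since the inputs are themselves rounded figures.
factors = {
    model: round(reduction_factor(before, after), 1)
    for model, (before, after) in reported_rates.items()
}

for model, factor in factors.items():
    print(f"{model}: covert action rate reduced by about {factor}x")
```

    Both ratios land close to 30 (about 32.5 for o3 and about 29 for o4-mini), which is consistent with the approximate reduction the researchers described. Because the published percentages are rounded, the ratios are approximate as well.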

    Still, OpenAI cautioned that improved performance during tests does not guarantee safety.

    “However, rare but serious failures remained, and evaluation results may be confounded by increased situational awareness (e.g., when models get better at recognizing when they are being evaluated),” the blog post noted.

    OpenAI stressed that while today’s deployed systems have limited opportunities to engage in scheming, the risks could grow as AI takes on more complex roles.

    “In current production settings, models rarely have opportunities to scheme and cause significant harm, but we expect this to change in the future as AIs are taking on more important and long-term tasks,” the company said.

    “We have more work to do and we hope these early results will encourage more research on scheming.”

    OpenAI partnered with Apollo Research, an independent organization based in London that focuses on AI safety, for the project.

    The group specializes in developing evaluation frameworks to identify misaligned model behavior, with the goal of ensuring that increasingly advanced AI systems remain trustworthy.

    The study comes amid heightened international scrutiny of advanced AI. Policymakers in the United States, Europe, and Asia are weighing stricter oversight of the technology as questions over safety, transparency, and accountability continue to dominate public debate.

    Tags: AI safety, AI scheming, Apollo Research, OpenAI
    Dawn Solano

    Content Producer for PhilSTAR Tech

    Copyright © 2025 Philstar Tech | Powered by The Philippine STAR
