Trump’s AI Order Promises Early Government Tests. Critics Say It Won’t Stop Dangerous Releases

President Donald Trump signed an executive order Tuesday directing federal agencies to secure voluntary early access to the most powerful AI models for cybersecurity testing. The move comes after weeks of internal debate, a last-minute postponement and heavy industry lobbying that narrowed its scope. Yet fresh analysis from security experts warns the framework may deliver little more than the appearance of oversight.

The directive, titled “Promoting Advanced Artificial Intelligence Innovation and Security,” instructs the National Security Agency to create a classified benchmarking process. That process will decide which systems qualify as “covered frontier models” with advanced cyber capabilities. Developers can then submit those models for government review up to 30 days before wider release to trusted partners.

But the order explicitly bars any mandatory licensing, preclearance or permitting. Nothing forces companies to participate. And the government must build the necessary testing capacity on a tight timeline while recovering from staff cuts at key agencies.

The Voluntary Framework and Its Limits

Trump’s order emphasizes speed. Agencies have 30 days to prioritize cyber defenses for national security systems, establish an AI cybersecurity clearinghouse with industry and critical infrastructure operators, and issue guidance for protecting civilian systems. Within 60 days they must design the voluntary submission process and the benchmarking standards.

The White House fact sheet calls it the right balance between innovation and security. Trump himself has stressed not wanting to hinder American leadership against China. “We’re leading China… I don’t want to get in the way,” he said, according to AP News.

Yet the timeline worries insiders. The order also directs the Office of Personnel Management to expand hiring pathways for cybersecurity specialists within 60 days. It asks the Office of Management and Budget to hunt for grant funding to support vulnerability detection work. These steps read like acknowledgments that current capacity falls short.

Earlier drafts reportedly sought up to 90 days of government access. Industry pushback helped shrink that window. The final version defers heavier decisions. As Axios reported, the narrowed order lets the administration kick tougher rules for frontier models down the road.

Critics see a gap between ambition and execution. The testing depends on companies choosing transparency over competitive pressure. Even when models reach government reviewers, the short window and limited talent pool may not catch subtle risks.

Frontier systems prove especially slippery. They show emergent behaviors that shift with scale, fine-tuning, supporting software and real-world deployment. A model that looks harmless in isolated tests can turn potent once plugged into autonomous pipelines connected to actual infrastructure.

Vinh Nguyen, a senior fellow for AI at the Council on Foreign Relations, laid out the core problem. “Frontier AI systems are probabilistic, goal-directed, increasingly autonomous, and opaque. They do not have fixed capability ceilings.”

He continued, “A model that appears unremarkable in isolated testing could become a potent cyber tool when integrated into an autonomous pipeline with access to real-world digital infrastructure.” Nguyen warned that a definition for covered models that is too narrow risks letting genuinely dangerous systems ship. One too broad could overwhelm the scarce experts available to evaluate them.

Underneath it all sits what Nguyen calls an observability problem. “The government cannot assess what it cannot see, and frontier capabilities are visible only to the labs that build them.”

Matthew Ferren, an international affairs fellow in national security at the same organization, framed the order as an attempt to create a defensive head start. The goal: give U.S. defenders time to find and patch vulnerabilities before adversaries gain the same tools. “But that will likely prove difficult,” he wrote in the analysis cited by Ars Technica.

Ferren noted the challenge of patching critical systems at scale. He pointed to recent reductions at the Cybersecurity and Infrastructure Security Agency under Department of Government Efficiency efforts. Top talent left. Contracts were canceled. Institutional knowledge took a hit. The order assigns a prominent role to the Treasury Department, which Ferren suggests reflects where remaining capacity survives.

Even strong implementation faces limits. State-aligned actors already use frontier models to automate attacks. Researchers have replicated advanced vulnerability reasoning with open-weight systems. “It will likely prove difficult to develop models that are incapable of malicious hacking yet remain commercially compelling,” Ferren said.

The order responds in part to rising concerns over systems like Anthropic’s Mythos. Yet it stops short of mandating changes to deployment practices. Companies may submit for review seeking a quick approval rather than rigorous scrutiny. Dean Ball, a former Trump AI advisor, noted on X that it remains unclear what the intelligence community can realistically achieve in a 30-day testing period to make models safer.

Ball also questioned the practical payoff. The benefits of these voluntary reviews seemed “barely articulable” to him.

Implementation will test the administration’s priorities. The order directs the attorney general to prioritize prosecutions of those who use AI for illegal hacking, data theft or related crimes. It calls for better defenses at critical infrastructure, including rural hospitals. Binding operational directives from CISA aim to accelerate protective measures across federal systems.

But the core mechanism for frontier models remains collaborative and non-binding. No penalties for non-participation. No automatic halt on releases deemed risky. The clearinghouse for vulnerability scanning and patching relies on voluntary industry contributions.

Sen. Mark Warner, vice chairman of the Senate Select Committee on Intelligence, offered measured support. He welcomed the pre-deployment testing concept but signaled he would watch closely for any partisan pressure on companies. His statement, reported by multiple outlets, recalled similar ideas from the prior administration’s own executive actions.

Industry voices have largely praised the approach. Representatives from OpenAI, Anthropic and Google signaled positive reactions to the voluntary framework, per AP reporting. Juan Londoño of the Cato Institute called it “imperfect but a step in the right direction.”

Still, the expert critiques cut deeper. Nguyen granted that the order takes necessary steps toward classified benchmarking, voluntary evaluations and coordinated vulnerability management, tools he said the national security community will need “for decades.” The danger, in his view, lies in failing to evolve the assessments as fast as the technology moves. Otherwise evaluators will measure new models against outdated threats.

Ferren struck a similar note on long-term outcomes. Short-term cybersecurity gains may emerge. The broader picture stays murky. Pre-deployment testing, no matter how well designed, cannot eliminate every risk.

The order arrives amid rapid capability jumps. Models grow more autonomous. Their behaviors resist easy prediction. Integration into real systems multiplies their power in ways isolated benchmarks miss. And the government’s own workforce for this mission has shrunk in recent months.

So the voluntary framework launches with built-in constraints. Talent shortages. Tight timelines. Dependence on company cooperation. Narrow windows for action. These factors leave open the possibility that dangerous capabilities reach adversaries or the public before defenses catch up.

Trump’s team insists the United States must maintain its edge. Overly strict rules could hand advantages to foreign competitors. The order reflects that view. It promotes innovation first while layering on targeted security measures. But as Nguyen and Ferren document, the execution challenges run deep. Defining the models that matter, observing their true capabilities, recruiting evaluators fast enough, and translating test results into timely patches all demand more than good intentions.

Recent coverage reinforces the tension. Politico described the document as a downsized version that followed a postponement last month, with the industry scoring victories in limiting its scope and timeline. Reuters highlighted the focus on voluntary submissions and up to 30 days of testing for the most capable systems. CNBC noted the request for early model access to assess capabilities ahead of full release.

These accounts align on one point. The administration chose a lighter touch than some cybersecurity voices preferred. Whether that choice protects critical infrastructure or simply buys time remains the open question that experts now press.

The coming weeks will show how agencies translate the order into practice. CISA’s implementation directive is expected soon. The NSA’s benchmarking work begins in classified settings. Companies will decide how fully they engage. And the first covered frontier models will test whether 30 days delivers meaningful protection or merely a procedural checkpoint.

One thing looks clear already. The debate over AI oversight has not ended. It has shifted to questions of capacity, definition and execution. Trump’s order sets the stage. The hard work of making it effective starts now.
