The AI Cheating Arms Race: Why Detectors Fail Students and Schools Keep Buying Them

Schools across the U.S. and beyond have poured money into software meant to catch students using artificial intelligence on assignments. The results have not matched the promises. False accusations pile up. Actual cheating slips through. And the technology meant to solve the problem has instead created new headaches for everyone involved.

Turnitin once boasted near-perfect detection rates. Independent tests told a different story. A study cited in reporting from The Conversation found the company’s tool identified AI-generated text only about 61 percent of the time. False positives remain a persistent worry. A 1 percent false-positive rate sounds small until it is multiplied across thousands of student submissions each semester.
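The arithmetic behind that worry can be sketched in a few lines of Python. This is a rough illustration, not a model of any vendor's product: the 61 percent detection rate and the 1 percent error rate come from the figures above, while the 5,000 human-written submissions and the 10 percent share of AI-written work are hypothetical values chosen only to make the calculation concrete.

```python
def expected_false_flags(human_submissions: int, false_positive_rate: float) -> float:
    """Expected number of human-written submissions wrongly flagged as AI."""
    return human_submissions * false_positive_rate


def flag_precision(ai_share: float, detection_rate: float, false_positive_rate: float) -> float:
    """Share of all flags that point at genuinely AI-written work."""
    true_flags = ai_share * detection_rate               # AI-written and caught
    false_flags = (1 - ai_share) * false_positive_rate   # human-written but flagged
    return true_flags / (true_flags + false_flags)


# Hypothetical semester: 5,000 human-written submissions, 1 percent error rate.
wrongly_flagged = expected_false_flags(5_000, 0.01)

# Hypothetical assumption: 10 percent of all submissions are AI-written.
precision = flag_precision(ai_share=0.10, detection_rate=0.61, false_positive_rate=0.01)

print(f"Expected wrongly flagged submissions: {wrongly_flagged:.0f}")
print(f"Share of flags that are correct: {precision:.1%}")
```

Under these assumed inputs, dozens of students would be wrongly flagged in a single semester, and a meaningful fraction of all flags would land on honest work. Changing the assumed share of AI-written submissions shifts the result considerably, which is part of why a raw detector score is weak evidence on its own.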

Yet districts keep signing contracts.

Broward County Public Schools near Miami committed more than $550,000 over three years to Turnitin, according to an investigation by NPR. Administrators there say the software offers a starting point for conversations rather than final proof of misconduct. Other districts in Utah, Ohio and Alabama have made similar purchases despite mounting evidence of unreliability. Shaker Heights City School District in Ohio spends roughly $5,600 annually on GPTZero licenses for a handful of teachers.

More than 40 percent of sixth- through 12th-grade teachers used these detectors during the last school year, a nationally representative poll by the Center for Democracy and Technology revealed in the same NPR report. Teachers reach for them because the pressure to maintain academic integrity feels immediate. Students submit work that sometimes reads too polished, too consistent. The software provides an easy flag.

But flags can lie. Ailsa Ostovitz, a 17-year-old at Eleanor Roosevelt High School in Maryland, faced accusations on three separate assignments. Her teacher docked points without allowing much response. Ostovitz ran her papers through multiple detectors beforehand. Her mother met with the instructor. The teacher eventually expressed doubt. The episode left the student wary and her family frustrated. Cases like hers surface repeatedly.

Non-native English speakers bear a heavier burden. Stanford researchers documented the pattern years ago. Detectors flagged more than half of essays written by students taking the TOEFL exam as AI-generated. The bias persists. Zi Shi, a junior in Shaker Heights, saw his own work questioned. He attributes it partly to use of Grammarly. Carrie Cofer, a teacher in Cleveland, watched her own PhD dissertation register 89 to 91 percent AI. She now speaks out against heavy reliance on the tools.

Reporting from The Next Web captured the dynamic early. AI cheating tools keep improving faster than detection can match. Humanizers rewrite machine-generated text to sound more natural. Autotypers simulate the slow, uneven pace of human typing, complete with fake typos, deletions and backspacing. Social media videos on TikTok and YouTube demonstrate exactly how to combine them. A New York Times investigation published yesterday shows the problem has only grown more sophisticated, describing an explosion of tutorials that tell students they can let AI handle the work and still avoid detection.

Big tech companies and startups market these evasion products with a knowing wink. Some videos carry ad labels. Others do not. The message stays consistent. Writing feels painful. Technology removes the pain. Detection becomes irrelevant.

Jenny Maxwell, head of education at Superhuman, which owns Grammarly, called the back-and-forth “a dead end.” She told The Next Web that the race between bigger cats and bigger mice cannot continue indefinitely. Withholding AI from students, she added, amounts to educational malpractice because they will use it in the workplace anyway.

Jack Clark, co-founder of Anthropic, offered a darker assessment in the same piece. The AI industry, he said, “has a gas pedal, but it doesn’t have a brake pedal.” His company writes much of the code that powers these systems. The incentives favor rapid deployment over caution.

Universities have started to pull back. Washington State University terminated its Turnitin AI detection contract earlier this year after estimating nearly 1,500 false positives in a single semester, according to coverage in EdCafe. One-third of academic integrity cases involving AI allegations at the school ended in acquittals between 2023 and 2025. Officials cited a cat-and-mouse dynamic that shows no sign of ending.

Students adapt in creative ways. Some record hours-long screen sessions to prove they wrote every word themselves. Others deliberately dumb down their language to avoid sounding too competent. The honest ones feel paranoid. The ones who cheat gain better tools each month.

Cheating rates themselves have not skyrocketed since ChatGPT appeared. Data from Education Week in 2024 showed the percentage of students admitting to academic dishonesty remained roughly flat. The nature of the dishonesty changed. AI makes it easier to produce passable work without deep understanding. Detectors were supposed to restore balance. Instead they have introduced doubt into the grading process on all sides.

C. Thi Nguyen, a philosopher, framed the issue through the lens of value capture. Schools optimize for grades rather than genuine learning. AI simply optimizes for the metric presented. Goodhart’s Law applies. When a measure becomes a target, it ceases to be a good measure.

Some educators have shifted strategies. They design assignments that require in-class writing, oral defenses or iterative drafts with clear process documentation. Others embrace AI as a teaching tool while demanding transparency about its use. The conversation has moved past binary questions of detection.

India took a blunt approach during medical entrance exams. Officials blocked Telegram for days to stop mass AI consultation among more than two million test-takers competing for limited spots. Such measures highlight the stakes but prove difficult to scale in ordinary classrooms.

Edward Tian, CEO of GPTZero, urges schools to treat his product as one item in a larger toolkit. Follow up with students, he says. Use the score as a conversation starter rather than courtroom evidence. Many teachers already follow that advice. Others do not. The variation creates inconsistency that students notice and sometimes exploit.

By 2026 the pattern looks entrenched. Nearly 90 percent of students report using AI in academic work at some point, according to data referenced in Copyleaks materials. More than half use it weekly. The tools have embedded themselves in note-taking apps, productivity software and everyday workflows. Preventing access entirely feels unrealistic.

So schools face a choice. They can continue spending on detectors that require constant updates and still produce errors. Or they can rethink what assignments measure and how educators evaluate authentic effort. The first path offers the illusion of control. The second demands harder work from everyone.

False positives damage trust. They discourage non-native speakers, neurodiverse students and anyone whose natural prose style diverges from the statistical average. False negatives reward those who learn the evasion tricks. The technology, in short, amplifies existing inequities while failing to solve the underlying problem.

Recent coverage from NBC News in January detailed how college students now spend time humanizing their AI drafts or surveilling their own writing process to create defensible records. Lawsuits have emerged. Students expelled or suspended over disputed AI use have pushed back in court, citing emotional distress and unfair process.

The arms race continues. Each new language model arrives with fresh capabilities. Each detector vendor promises better accuracy than the last. Students watch the contest on their phones and learn to play both sides. Teachers sit in the middle, expected to render perfect judgments with imperfect instruments.

Nothing suggests a technical fix lies around the corner. The question is whether institutions will keep chasing one or finally address the incentives that make outsourcing thought so tempting in the first place. The data, the cases and the expert warnings all point the same direction. Detectors alone will not restore integrity. Something deeper has to change.
