Open-Source AI Agents That Keep Humans in Charge of Academic Research

Edward Cheng-I Wu released a GitHub repository in recent weeks that has drawn attention from researchers tired of generic chatbots promising to write their papers. The project, Imbad0202/academic-research-skills, delivers four specialized skills for Claude Code. Together they cover the entire process from initial inquiry through final publication. And they do so without pretending the machine can think for the scholar.

“AI is your copilot, not the pilot.” That line appears early in the repository documentation. It sets the tone. This tool won’t write your paper for you. It handles the grunt work — hunting down references, formatting citations, verifying data, checking logical consistency — so you can focus on the parts that actually require your brain: defining the question, choosing the method, interpreting what the data means, and writing the sentence after “I argue that.”

The system breaks research into stages. Ten of them, to be precise. Each includes mandatory checkpoints. Stage 2.5 and Stage 4.5 enforce integrity reviews. A Collaboration Depth Observer scores how deeply the human stays involved. Fail the gates and the pipeline stops. No easy bypasses. No hidden humanizers that disguise machine text. The design demands transparency.
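The gating logic is easier to picture in code. What follows is a minimal sketch, not the repository's implementation: the `Stage` class, the `GateFailed` exception, the stage labels and the "every source verified" rule are all hypothetical names chosen for illustration. It shows only the behavior described above, where a failed checkpoint halts everything after it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class GateFailed(Exception):
    """Raised when a mandatory checkpoint rejects the current state."""


@dataclass
class Stage:
    name: str                                      # illustrative label only
    run: Callable[[dict], dict]                    # does the work, returns new state
    gate: Optional[Callable[[dict], bool]] = None  # checkpoint, if the stage has one


def run_pipeline(stages: List[Stage], state: dict) -> dict:
    """Run stages in order; a failed gate stops everything after it."""
    for stage in stages:
        state = stage.run(state)
        if stage.gate is not None and not stage.gate(state):
            raise GateFailed(stage.name)
    return state


def all_sources_verified(state: dict) -> bool:
    # Hypothetical integrity rule: every cited source must be verified.
    return all(state["sources"].values())


def build_stages(sources: dict) -> List[Stage]:
    # Three stand-in stages; the real pipeline is described as having ten.
    return [
        Stage("search", lambda s: {**s, "sources": sources}),
        Stage("integrity-2.5", lambda s: s, gate=all_sources_verified),
        Stage("drafting", lambda s: {**s, "draft": "outline"}),
    ]
```

Calling `run_pipeline(build_stages({"a": True, "b": False}), {})` raises `GateFailed` at the integrity stage, so the drafting stage never runs. The sketch deliberately has no override flag, which is the point of a hard gate.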

Deep Research, the first skill, deploys 13 agents. They run systematic literature searches, follow PRISMA protocols for reviews, conduct Socratic dialogues to refine questions, and verify claims against Semantic Scholar. Fact-check mode cross-references multiple models when activated. Synthesis agents compile findings without fabricating connections. The emphasis stays on traceable sources.

Next comes Academic Paper. Twelve agents manage drafting. One learns an author’s voice from previous work and calibrates output to match. Another scans for patterns typical of machine-generated prose and flags them. Visualization support, LaTeX preparation, bilingual abstracts for English and Traditional Chinese — all present. Citation handling defaults to APA 7 but adapts to Chicago, MLA, IEEE or Vancouver. The skill produces structured manuscripts across formats from IMRaD to policy briefs.

Then the reviewer steps in. Seven agents simulate an editorial team: editor-in-chief, three specialist reviewers, and a devil’s advocate. They score on a 0-100 rubric across eight dimensions. Comments follow a concession threshold so criticism stays constructive yet rigorous. A traceability matrix links feedback back to specific claims and evidence. The process mimics real peer review without the months-long wait.
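A rubric of this kind can be sketched in a few lines. The repository's eight dimensions and its aggregation rule are not given here, so the sketch below makes two stated assumptions: the dimension names are placeholders, and the overall score is an unweighted mean. The function name `overall_score` is likewise hypothetical.

```python
from statistics import mean
from typing import Dict

# Placeholder names: the eight dimensions are not listed in this article.
DIMENSIONS = [f"dimension_{i}" for i in range(1, 9)]


def overall_score(scores: Dict[str, float]) -> float:
    """Combine eight 0-100 dimension scores into one overall score.

    Assumption: an equal-weight mean. A real rubric may weight dimensions.
    """
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} is outside the 0-100 range")
    return mean(scores[name] for name in DIMENSIONS)


# Seven dimensions at 70 and one at 86 average to 72.
example = {name: 70.0 for name in DIMENSIONS}
example["dimension_8"] = 86.0
```

Rejecting incomplete or out-of-range input, rather than silently averaging what is there, keeps a partial review from passing as a complete one.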

Revision closes the loop. The author incorporates comments. A revision coach suggests targeted improvements. Final integrity checks run again. Only then does the pipeline move to formatting and output. Optional reproducibility locks and material passports track the literature corpus across sessions. Recent updates, pushed in early May 2026, added one-command plugin installation for Claude Code users. Slash commands now route tasks to the right model — Opus for heavy lifting, Sonnet for quicker steps.

Wu’s approach stands apart from the flood of 2026 AI research tools. A Lumivero analysis published in January examined platforms such as NVivo, ATLAS.ti and Citavi. These integrate AI into qualitative analysis and reference management. NVivo surfaces patterns in interview transcripts yet presents every suggestion grounded in source data so researchers can accept, refine or reject it. ATLAS.ti clusters themes in large datasets. Citavi extracts metadata from PDFs and groups references thematically. Each stresses that AI outputs serve as starting points. Human judgment retains final say on interpretation and conclusions.

Yet many general-purpose models still hallucinate citations or invent supporting studies. They fragment workflows across tabs and applications. Privacy risks rise when sensitive drafts are uploaded to external servers. Wu’s repository tries to contain those problems inside a controlled, auditable environment. It runs locally through Claude Code. Declared data access levels determine whether agents see raw text, redacted versions or verified excerpts only.
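The three access levels map naturally onto a small enumeration. This is a sketch under assumptions, not the project's code: the `AccessLevel` names, the `view_for` function and the email-only redaction rule are invented for illustration, and a real redactor would cover far more than email addresses.

```python
import re
from enum import Enum
from typing import List


class AccessLevel(Enum):
    RAW = "raw"                      # agent sees the text unchanged
    REDACTED = "redacted"            # agent sees the text with identifiers masked
    VERIFIED_ONLY = "verified_only"  # agent sees only pre-approved excerpts


# Deliberately simple pattern for the demo; not a complete email matcher.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def view_for(level: AccessLevel, text: str, verified: List[str]) -> str:
    """Return the version of `text` an agent at `level` may read."""
    if level is AccessLevel.RAW:
        return text
    if level is AccessLevel.REDACTED:
        return EMAIL.sub("[REDACTED]", text)
    # VERIFIED_ONLY: keep only approved excerpts that actually occur in the text.
    return "\n".join(excerpt for excerpt in verified if excerpt in text)


note = "Contact jane.doe@example.edu about the pilot data. Response rate was 41%."
approved = ["Response rate was 41%.", "A sentence that is not in the note."]
```

With these inputs, the redacted view masks the address and the verified-only view returns just the one approved sentence that appears in the note.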

Student perceptions reflect growing unease. A RAND Corporation report released March 17, 2026, found that 62 percent of middle school through college students used AI for homework by December 2025, up from 48 percent seven months earlier. The RAND study also showed 67 percent now agree that heavier AI use for schoolwork harms critical thinking skills — an increase of more than 10 percentage points. Female students expressed stronger concern than males about both the harm and whether teachers checked for AI assistance. The authors urged schools to discuss these worries openly and distinguish between cognitive offloading, which replaces thinking, and cognitive augmentation, which sharpens it.

Wu appears to have built for augmentation. The Socratic mode in Deep Research asks clarifying questions rather than supplying answers. The peer-review simulation forces authors to defend claims before revision. A compliance agent enforces principles such as RAISE for responsible AI use in research. None of it automates the core intellectual work.

LinkedIn posts and X discussions in the past week praised exactly this stance. One detailed thread called the architecture a shift from single-prompt paper generators to structured cognitive operating systems built on orchestration, memory, verification and multi-agent handoffs. Another noted that the 45-plus specialized agents operate under strict contracts that prevent shortcut reliance or unverified assertions.

Universities wrestle with how to teach these tools. Some incorporate them into methodology courses. Others worry about over-dependence. The GitHub project includes warnings. It will not hide AI use. It will not generate the central argument. Researchers must still read the primary literature, design their studies, and stand behind their interpretations.

Installation takes minutes. Users add the marketplace plugin or clone the repo into their Claude skills directory. A setup guide walks through API keys, optional Pandoc or LaTeX compilers, and language settings. Performance notes suggest token budgets for longer projects. Companion repositories handle experimental design and data anonymization.

Adoption remains early. The repository carries a CC-BY-NC 4.0 license — free for non-commercial adaptation with credit. Stars climbed quickly after the May updates. Academics in social sciences, engineering and policy fields have tested the pipeline on draft manuscripts. Early feedback highlights the value of the integrity gates and the reviewer agents that catch logical gaps before submission.

Broader questions linger. Can structured agent systems scale across disciplines with very different evidentiary standards? Will universities develop policies that reward transparent tool use rather than punish any AI involvement? How much verification labor can humans realistically perform when literature volumes explode?

Wu does not claim to solve every tension. His documentation repeats a simple contract. The machine manages mechanics. The scholar supplies direction, insight and accountability. In an era when many tools promise to replace parts of the research process, this one insists on partnership. That distinction may prove its lasting contribution.

Researchers who value rigor over speed will find the approach familiar. It mirrors the best practices already taught in graduate seminars: systematic search, critical appraisal, iterative revision, transparent methods. The difference is that some of the repetitive work now receives automated support. The thinking stays human. The accountability stays personal. And that, the project argues, is how scholarship advances.
