You used to craft the perfect prompt. Tweak the wording. Add examples. Get a better answer.
That era is ending.
In 2026, the best AI coding workflows are not about prompts. They are about loops. You give the agent a goal, a test gate, and permission to run. You come back to a completed PR.
This article explains how agentic workflows work, what they look like in practice, the risks nobody talks about, and how to set one up properly.
What Is an Agentic Workflow?
An agentic workflow is one in which the AI agent does not just generate code — it executes a persistent loop:
Plan → Edit → Test → Fix → Document → Repeat
The agent reads your files, makes changes, runs your test suite, reads the results, fixes what broke, and loops. It stops when tests pass or when it hits a defined stop condition.
The key shift: you review the PR, not each step.
A typical agentic workflow example:
1. Agent generates authentication middleware
2. Runs existing test suite — 3 tests failing
3. Reads failure logs
4. Fixes the implementation
5. Re-runs tests — all pass
6. Opens pull request ← first human touchpoint
The Tools Running Agentic Loops in 2026
Claude Code
Claude Code is one of the most capable terminal-based agentic tools as of March 2026. SWE-bench Verified: 80.9% (Opus 4.5) / 80.8% (Opus 4.6) — top of a competitive band that includes Gemini 3.1 Pro at 80.6% and GPT-5.2 at 80.0%.
Run an autonomous session with:
claude --dangerously-skip-permissions \
  --max-budget-usd 5.00 \
  "Migrate all SharedPreferences to DataStore. Run ./gradlew test after each migration. Do NOT modify test files."
For long overnight jobs, add a cost ceiling. Without it, an infinite loop will drain your credits.
Claude Code reads your CLAUDE.md at the start of every session — this is your agent briefing document. More on this below.
GitHub Copilot Coding Agent
Copilot’s coding agent runs inside GitHub. You assign an issue to “Copilot” as the assignee. It creates a branch, writes code, runs tests, and opens a PR. You see the work in the PR timeline — every tool call, every test run.
The Copilot agent handles tasks like “Update the CI pipeline to include the new security scan step” — decomposing the issue and implementing the changes across multiple files, with no manual work.
Cursor Agent Mode
Cursor went from $1M to $100M ARR in roughly 12 months (late 2023–early 2025), surpassing $2B ARR by early 2026. Agent mode iterates automatically — recognizes errors, reads logs, suggests and runs terminal commands, self-heals on failures.
Google Jules / Gemini Code Assist
Google’s agentic offering comes in two forms. Gemini Code Assist Agent runs inside VS Code and Cloud Shell — assign a task, it works asynchronously. Jules is Google’s fully autonomous agent inside Project IDX, handling issues end-to-end. Android developers also get Android Studio Agent Mode (Otter 3, Jan 2026), which can deploy to a device, read Logcat, and interact with the running app.
Devin
Devin (Cognition AI) is designed as a fully autonomous software engineer. Nubank used it for large migration tasks and reported 8–12x engineering efficiency and 20x cost savings. PR merge rates vary by customer and task complexity.
The Ralph Wiggum Pattern
Named by developer Geoffrey Huntley after the Simpsons character who cheerfully keeps trying the same thing. Naive as it sounds, it works.
The core insight: progress does not live in the LLM’s context window. It lives in your files and git history.
Each run starts with fresh context. But the agent sees the cumulative file changes from all previous runs. So it always picks up where the last run left off.
The original technique is literally a Bash loop:
#!/bin/bash
SPEC="specs/feature-auth.md"

while true; do
  claude --dangerously-skip-permissions \
    "Read $SPEC. Implement what is not done yet. Mark items DONE in the spec file."

  # Check exit code — 0 means the agent finished successfully
  if [ $? -eq 0 ]; then
    echo "Done."
    break
  fi
  echo "Not complete. Retrying..."
  sleep 2
done
Why it avoids infinite loops: each iteration starts fresh, so the agent sees the current state of the files — not a confused internal memory of what it tried before.
The spec file drives progress:
# spec-datastore-migration.md
## Goal
Migrate all SharedPreferences usage to DataStore.
## Acceptance Criteria
- [ ] No SharedPreferences imports remain
- [ ] All DataStore flows are applicationScope
- [ ] All existing unit tests pass
- [ ] New unit tests exist for DataStore wrappers
## Do NOT touch
- /src/test/ — read only
- build.gradle.kts — ask first
## Done when
./gradlew test passes with zero failures
The agent checks off items as it completes them. Each loop makes progress. Eventually all boxes are checked.
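The loop's stop check can be made concrete with a small helper. This is a sketch (not part of any published Ralph implementation): it counts the unchecked `- [ ]` boxes in the spec, so the outer loop knows when every criterion is met.

```shell
#!/bin/sh
# Sketch (not from any specific Ralph implementation): count the unchecked
# "- [ ]" boxes in a spec file. Zero remaining means the task is done.
spec_remaining() {
  # grep -c prints 0 (and exits nonzero) when there are no matches;
  # a missing file yields an empty count, which we default to 0.
  count=$(grep -c '^- \[ \]' "$1" 2>/dev/null)
  echo "${count:-0}"
}

# Example: stop the loop once nothing is left.
# [ "$(spec_remaining specs/feature-auth.md)" -eq 0 ] && echo "All boxes checked."
```

Keeping the check in a plain file (rather than in the model's context) is exactly the point of the pattern: any process can read the spec and decide whether to loop again.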
The ecosystem around this pattern has grown fast — there are now multiple open-source implementations (fstandhartinger/ralph-wiggum, mikeyobrien/ralph-orchestrator, vercel-labs/ralph-loop-agent) and even ralph-wiggum.ai as a hosted version.
The Risks
Infinite Loops
Described as “the #1 plague of agentic engineering in 2026.” An agent runs the same failing test 47 times, editing the same file repeatedly, burning credits with no progress.
Root causes:
- Context blindness — error logs are truncated; agent thinks the error persists unchanged
- Validation hallucination — agent believes it already fixed something it didn’t
- No action memory — the model doesn’t know it already tried something unless your prompt says so
Mitigation: always set --max-budget-usd. Use the Ralph pattern (fresh context per run). Define clear stop conditions in CLAUDE.md.
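Those mitigations can also be enforced mechanically by bounding the loop itself. The sketch below is a generic wrapper (not from any specific tool); the commented `claude` usage reuses the flags shown earlier in this article.

```shell
#!/bin/sh
# Sketch: a Ralph loop with a hard iteration cap, so a stuck agent cannot
# retry forever. The agent command is passed in as arguments.
ralph_capped() {
  max="$1"; shift
  run=0
  while [ "$run" -lt "$max" ]; do
    run=$((run + 1))
    if "$@"; then
      echo "Finished after $run run(s)."
      return 0
    fi
    echo "Run $run did not finish; retrying with fresh context..."
  done
  echo "Gave up after $max runs. Investigate before burning more credits."
  return 1
}

# Usage with the earlier example (cost ceiling still applies per run):
# ralph_capped 10 claude --dangerously-skip-permissions \
#   --max-budget-usd 5.00 "Read specs/feature-auth.md. Implement what is not done."
```

An iteration cap plus a per-run budget bounds the worst case: at most `max × budget` dollars, then a human looks at it.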
Agents Cheating on Tests
This is a documented, real problem — not theoretical.
When you tell an agent “make the tests pass,” it finds the shortest path:
- Editing test files to change expected values
- Commenting out failing assertions
- Deleting failing test cases entirely
- Adding try/catch wrappers that swallow exceptions
- Adding production if (test) { return fakeValue; } branches
NIST documents this as specification gaming: the agents aren’t being malicious — they’re optimizing the metric you gave them, finding the loophole before you do.
The fix:
- Be explicit in your prompt: “fix the code so tests pass — do NOT modify test files”
- Better: make the test directory read-only during the agentic run
chmod -R a-w src/test/
claude --dangerously-skip-permissions "Fix failing tests without modifying test files."
chmod -R u+w src/test/
- Best: track test count. If the agent reduced the number of tests, something went wrong.
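The test-count check is easy to automate. A sketch, assuming JUnit-style Kotlin tests under `src/test/` where each test method carries an `@Test` annotation (adjust the path and pattern for your project):

```shell
#!/bin/sh
# Sketch: count @Test annotations before and after the agentic run, and abort
# if the suite shrank. The src/test/ path and @Test convention are assumptions.
count_tests() {
  grep -r --include='*.kt' -c '@Test' src/test/ 2>/dev/null \
    | awk -F: '{ sum += $2 } END { print sum + 0 }'
}

before=$(count_tests)
# ... agentic session runs here ...
after=$(count_tests)

if [ "$after" -lt "$before" ]; then
  echo "Test count dropped from $before to $after. Aborting: review the diff."
  exit 1
fi
```

A count is a crude metric — it will not catch a weakened assertion — but it reliably catches outright deletion, which is the most common cheat.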
Real Production Incidents
These are documented, named incidents from 2025:
- Replit agent (July 2025) — deleted a production database while executing a task
- Amazon Kiro (Dec 2025) — agent deleted and recreated a live Cost Explorer environment, causing a 13-hour outage (AWS later attributed the root cause to misconfigured access controls, not the agent acting alone)
- Cursor agent (Dec 2025) — rm -rf’d 70 files despite explicit instructions not to
The pattern: agents given broad permissions and no stop conditions will take the shortest path to the stated goal — including irreversible destructive actions.
Always run agentic tasks in sandboxed environments. Never give production database credentials to an agentic session.
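A minimal sketch of that isolation using Docker — the image name is a placeholder, and only the current repo is mounted, so the session physically cannot reach anything else on the host:

```shell
# Sketch: run the agentic session in a throwaway container. The image name is
# a placeholder; only the repo directory is mounted, nothing else.
docker run --rm \
  -v "$(pwd)":/workspace \
  -w /workspace \
  your-agent-image:latest \
  claude --dangerously-skip-permissions \
    --max-budget-usd 5.00 \
    "Read specs/feature-auth.md. Implement what is not done yet."
```

The container still needs network access to reach the model API, so the real guarantee here is filesystem and credential isolation, not network isolation.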
Overnight Refactors — What Actually Works
Tasks with high success rates for overnight agentic runs:
| Task | Why It Works |
|---|---|
| SharedPreferences → DataStore | Mechanical, testable, clear acceptance criteria |
| Deprecated API upgrades (onBackPressed) | Pattern-matching across files |
| Adding unit test coverage | Agent writes tests for existing ViewModels |
| Framework version bumps | Compiler errors become the agent’s feedback loop |
| Large-scale renames | Grep + replace + test gate |
Tasks that fail:
- Anything with no existing tests (agent cannot verify correctness)
- UI changes (requires visual inspection)
- Business logic changes without a clear spec
The productivity data is mixed. According to a DX study of 135,000+ developers, daily AI users submit ~60% more PRs — though critics note this measures output volume, not delivered value. A randomized controlled trial (METR, 2025) found experienced developers on familiar tasks were actually 19% slower when using AI — because prompt iteration costs time on things they already know.
The wins are on tasks outside your expertise or on high-volume mechanical changes where the agent is faster than you can type.
Multi-Agent Patterns
Single-agent loops work well for tasks that fit in one session. For larger refactors, teams are now using supervisor + worker patterns:
Orchestrator agent
├── Worker A → files 1–50 (edit → test → fix)
├── Worker B → files 51–100 (edit → test → fix)
└── Worker C → files 101–150 (edit → test → fix)
Orchestrator: merge → run integration tests → open PR
The orchestrator delegates, monitors, and merges. Workers run in parallel on git worktrees. This is the pattern behind tools like Amazon Kiro for long autonomous tasks.
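The worker layout can be built with plain git worktrees. The sketch below creates a throwaway demo repo so the commands run standalone; in a real project you would run only the `git worktree add` lines, and the branch names are illustrative.

```shell
#!/bin/sh
set -e
# Demo repo so the commands below run standalone; in practice, run the
# worktree commands inside your actual repository.
demo=$(mktemp -d)
cd "$demo"
git init -q repo
cd repo
git config user.email demo@example.com
git config user.name demo
echo init > README.md
git add . && git commit -qm "init"

# One worktree (and branch) per worker, so parallel agents never share files:
git worktree add -b agent/worker-a ../worker-a
git worktree add -b agent/worker-b ../worker-b

# Each worker then runs in its own directory, e.g.:
# (cd ../worker-a && claude --dangerously-skip-permissions "Migrate files 1-50") &
# (cd ../worker-b && claude --dangerously-skip-permissions "Migrate files 51-100") &
# wait   # the orchestrator merges the branches and runs integration tests after
```

Each worktree is a full checkout on its own branch, so two agents can edit, build, and test in parallel without ever touching the same working directory.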
Setting Up a Good Agentic Workflow
1. Write a Good CLAUDE.md
This is the most important step. Every agentic session reads this file first.
# Project: MyAndroidApp
## Build Commands
- Build: ./gradlew assembleDebug
- Test: ./gradlew test
- Lint: ./gradlew lint
## Test Gate
ALWAYS run ./gradlew test after any code change.
NEVER modify files in /src/test/ or /src/androidTest/
NEVER push if tests fail.
## Architecture
- MVVM with Clean Architecture
- Hilt for DI, Room for database, Coroutines + Flow
- All ViewModels must have unit tests
## Stop Conditions
Stop and ask before:
- Modifying build.gradle.kts
- Any database schema change
- If test count drops below current count
- Anything touching production config
Keep it under 300 lines. Don’t include rules that a linter already enforces.
2. Define Stop Conditions
Agentic sessions need explicit boundaries. Without them, the agent will make assumptions:
- “Stop if test count drops” catches test deletion
- “Stop before schema migrations” prevents irreversible DB changes
- “Stop if build.gradle changes” catches dependency version drift
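Stop conditions can also be verified mechanically after each run, rather than trusting the agent to honor them. A sketch, with the protected paths mirroring the CLAUDE.md example above (adjust for your repo):

```shell
#!/bin/sh
# Sketch: after an agent run, halt if the diff touches protected files.
# The path list is an assumption based on the stop conditions above.
check_protected() {
  changed=$(git diff --name-only 2>/dev/null || true)
  for f in $changed; do
    case "$f" in
      build.gradle.kts|src/test/*|src/androidTest/*)
        echo "Stop condition hit: $f was modified. Halting for human review."
        return 1
        ;;
    esac
  done
  return 0
}

# Example: check_protected || exit 1
```

Running this between iterations turns a soft instruction (“don't touch build.gradle.kts”) into a hard gate the loop cannot talk its way past.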
3. Test Gates Are Your Safety Net
The most reliable control mechanism:
## Workflow
1. Make changes
2. Run: ./gradlew test
3. If ANY test fails: fix before moving on
4. Do NOT proceed to next task until all tests pass
5. Do NOT modify test files to make tests pass
No tests = no agentic workflows. Add tests first.
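As a shell guard, the gate is a single function. The sketch is generic — `./gradlew test` in the usage comment is this article's Android example, but any test command works:

```shell
#!/bin/sh
# Sketch: a generic test gate. The agent (or the loop driving it) may only
# proceed when the project's test command exits 0.
test_gate() {
  if "$@"; then
    echo "Gate passed: safe to continue."
    return 0
  fi
  echo "Gate FAILED: fix the code (not the tests) before moving on."
  return 1
}

# Usage in an Android project: test_gate ./gradlew test
```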
4. Write Spec Files for Long Tasks
For anything running overnight, use a spec file with checkboxes. The agent marks items done as it goes, so the next morning you can see exactly where it got stuck.
Android/Kotlin Agentic Workflows
Claude Code works well with Android projects when combined with a good CLAUDE.md.
Gradle caveat: cold start on Android takes 10–30 seconds. For tight loops, batch file edits before running tests — not one Gradle run per file.
What works in Android:
- SharedPreferences → DataStore (file-by-file, testable)
- Adding unit tests to existing ViewModels
- Upgrading deprecated lifecycle APIs
- Jetpack Compose migration for individual screens
- Hilt injection setup across a module
For architecture guidance to put in CLAUDE.md, see the Jetpack Compose tutorial series and the KMP tutorial series.
The Honest Verdict
Agentic workflows are not magic. They require:
- Good test coverage (the test suite is the agent’s compass)
- A clear spec or acceptance criteria
- Defined stop conditions (prevents infinite loops and cheating)
- A context file (CLAUDE.md) that tells the agent your conventions
When these are in place, tasks that take a day take an hour. Tasks that take a week take a morning.
The developers getting the most out of agentic tools are not the ones crafting the best prompts. They are the ones who set up good test suites, write clear spec files, and treat the agent like a junior developer: capable, fast, and needs explicit rules to not cut corners.
