Security researchers at Adversa AI have disclosed GuardFall, a command-filter bypass technique that allows attackers to evade safety protections in AI coding agents and execute potentially dangerous shell commands.
According to the research, the technique successfully bypasses protections in 10 of the 11 popular open-source coding and computer-use agents tested. Only Continue was found to have effective defenses against the attack.
The issue highlights a growing security concern around autonomous AI coding assistants that are increasingly being granted direct access to developer systems, CI/CD environments, and cloud infrastructure.
How GuardFall Works
Most AI coding agents rely on command blocklists to prevent execution of dangerous commands such as file deletion, credential theft, or system modification.
The problem is that these filters inspect commands as plain text, while the Bash shell rewrites commands before execution.
For example:
r''m -rf /
A simple text-based filter may not recognize r''m as the rm command. However, Bash removes the empty quotes and executes it normally.
Researchers demonstrated multiple bypass techniques, including:
- Quote-based command obfuscation
- Base64-encoded command execution
- Command substitution tricks
- Abuse of legitimate tools such as
find,dd, and other Unix utilities - Hidden instructions embedded in repositories, documentation, or package files
Because the issue stems from how shells interpret commands, researchers describe GuardFall as a class of vulnerabilities rather than a single bug, meaning there is no single CVE or patch that fully resolves the problem.
AI Agents Affected
Adversa AI found GuardFall protections were insufficient in the following projects:
- OpenCode
- Goose
- Cline
- Roo-Code
- Aider
- Plandex
- Open Interpreter
- OpenHands
- SWE-agent
- Hermes
Combined, these projects account for approximately 548,000 GitHub stars, highlighting the broad impact across the AI development ecosystem.
Researchers successfully demonstrated a complete attack chain against the production version of Plandex, while similar bypasses worked against eight additional tools.
No known real-world exploitation has been publicly reported at this time.
Why the Risk Is Serious
AI coding agents often execute commands with the same privileges as the user running them.
If an attacker can trick the agent into executing a malicious command, the impact may include:
- Theft of SSH keys
- Cloud credential compromise
- Source code theft
- Access token exposure
- CI/CD compromise
- Destructive file deletion
- Persistence on developer systems
The risk increases significantly when:
- Auto-execute modes are enabled
- Human approval steps are disabled
- Agents run in CI/CD pipelines
- Sandboxing is turned off
- Repositories contain attacker-controlled content
Continue Was the Only Tool That Blocked the Attack
The only project that successfully resisted all tested GuardFall payloads was Continue.
Instead of relying solely on text matching, Continue parses commands similarly to how Bash interprets them before making security decisions.
Its protections include:
- Shell-aware command parsing
- Detection of command transformations
- Hard blocking of destructive commands
- Additional runtime validation
Researchers noted that Continue's CLI auto-run mode remains less restrictive, but its default editor workflow successfully blocked all tested payloads.
Recommended Mitigations
Organizations using AI coding assistants should immediately review their deployment practices.
Recommended actions include:
- Disable auto-execute features whenever possible
- Require human approval before command execution
- Run agents inside isolated containers
- Redirect
$HOMEto temporary directories - Prevent agents from processing untrusted pull requests
- Treat repository configuration files as untrusted input
- Restrict access to sensitive credentials and secrets
- Monitor AI-generated command execution activity
Growing Trend of AI Agent Exploitation
GuardFall follows a series of AI security discoveries reported throughout 2026, including:
- TrustFall
- AgentJacking
- AutoJack
- Claude Code permission bypasses
- Prompt injection attacks against coding assistants
The common theme across these incidents is that AI agents continue to process untrusted content and convert it into actions executed with real system privileges.
As AI-powered development tools become more autonomous, researchers warn that traditional command filtering approaches may no longer provide sufficient protection.
Final Thoughts
GuardFall demonstrates how a decades-old shell behavior can undermine modern AI security controls.
While many AI coding assistants attempt to block dangerous commands, command filtering that does not fully understand shell parsing can be bypassed with surprisingly simple techniques.
As organizations increasingly integrate autonomous AI agents into development and operational workflows, robust sandboxing, privilege restrictions, and shell-aware security controls will become essential defenses against the next generation of AI-driven attacks.