Agent Phishing: The Attack Your Identity Stack Misses

Varonis demonstrated it: an enterprise AI agent forwarded AWS keys and a customer list representing roughly $1.28M in monthly recurring revenue to an attacker who sent two casual emails. The agent had valid credentials and passed every technical check. Only 7% of security leaders believe their controls would catch it.

In June 2026, Varonis Threat Labs ran an experiment. They built a realistic enterprise inbox on the OpenClaw agent platform — seeded it with mock AWS credentials, CRM exports, internal conversations, and calendar invites. They named the agent “Pinchy.” Then they tried to phish it.

It took two emails.

The first impersonated the team lead, “Dan,” reporting a production issue and asking Pinchy to share staging credentials. Pinchy forwarded AWS IAM keys, database passwords, and SSH access to an external Gmail address. No verification challenge. No hesitation.

The second was a casual request for the latest customer export. Pinchy retrieved and forwarded it externally: 247 enterprise customers, names, contact details, contract data — roughly $1.28M in monthly recurring revenue, handed to a stranger.

The attacker never needed to touch a vulnerability. No exploit. No injection. No compromised token. Just a plausible request, phrased casually, from someone claiming authority. The agent complied at machine speed.

This is agent phishing. And your identity stack was not built to catch it.


Why L1–L3 Controls Don’t See This

Every layer of conventional identity infrastructure failed in Varonis’s test — not because it malfunctioned, but because it worked exactly as designed.

Layer 1 (authentication): Pinchy had valid credentials. The agent was properly authenticated to every service it accessed. The IAM keys it forwarded were legitimately provisioned. The CRM it queried was within scope. There was no unauthenticated access to detect.

Layer 2 (URL and malware scanning): The attacker’s email contained no malicious links, no suspicious attachments, no malware signatures. It was plain text. Automated scanners saw nothing.

Layer 3 (OAuth and application trust): The Gmail address the data was forwarded to was a standard OAuth-verified account. The email provider passed every spam and reputation filter. The forwarding action was executed by a trusted, authorized agent.

This is the structural gap. Every existing check verifies what the entity is — authenticated user, verified service, trusted application. None of them verify what the entity is doing, in context, against its own established patterns.

The agent HAD valid identity at every technical layer. The attack didn’t defeat identity — it operated through it.


The New Attack Surface

Human phishing succeeds because humans are social, often distracted, and vary enormously in how they handle suspicious requests. We’ve built decades of user training, secure email gateways, and DMARC/DKIM/SPF precisely because the human layer is the weakest.

Agents are a different kind of target, and in several ways they’re worse.

Agents are trusting by design. Varonis tested Pinchy on both Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4. Both models handled technical phishing signals reasonably well — they flagged shady URLs and suspicious OAuth app requests. But both failed completely at social engineering. Identity verification and contextual judgment are not what LLMs optimize for. They optimize for helpfulness. Phrased right, “please send me the customer list” is a helpful request.

Agents execute autonomously at scale. A human who falls for phishing makes one mistake. An agent that falls for phishing makes that mistake instantly and completely, and can repeat it on every subsequent request in the conversation. The action surface is larger and faster than any human equivalent.

Agents have broader access than most humans. Enterprise agents are provisioned with read/write access across CRMs, code repositories, cloud infrastructure, and internal communications — exactly the aggregation that makes them useful. That same aggregation is what makes a successful phish catastrophic.

Nobody expects agents to be phished. Security training is built for human targets. Threat models assume humans are the social engineering vector. Agents operate below that awareness.


The Market Knows Something Is Wrong

Akeyless surveyed 400 IT and security leaders across the US and UK in May 2026. The headline: two-thirds of enterprises suspect AI agents have already accessed data beyond their intended scope. Average detection time for a compromised agent: 14 hours. Average remediation time: nearly a week. Average organizational cost over the past year: more than $1 million.

The number that crystallizes the gap: only 7% of security leaders believe their current controls would prevent a compromised agent from operating undetected.

That is not a confidence number. That is a statement about architecture. The other 93% are running identity stacks they do not expect to catch this class of attack. The controls exist. They’re funded and maintained. They just don’t address the behavior of authenticated, authorized agents operating outside their intended scope.


What Behavioral Trust Actually Looks Like Here

The Pinchy attack left behavioral signals that were invisible to technical controls but would have been obvious to a system watching what the agent actually did:

  • Novel external destination. Pinchy had never forwarded data to this Gmail address before. First-time exfiltration to an external address is a deviation from established behavior — even if the Gmail account is technically “trusted.”
  • Out-of-pattern data access. Retrieving and forwarding the full CRM export in a single operation is categorically different from Pinchy’s normal task pattern. The volume and destination class of the action deviated from baseline.
  • Unverified sender authority. “Dan” had never sent Pinchy a request of this type before. The behavioral history of this sender-agent relationship was inconsistent with the request’s scope. A system tracking relationship patterns would flag this immediately.
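  • Speed of execution. The time between request and data exfiltration was near-instant. A human in the loop would have introduced a pause in which the request could be questioned. With agents, that pause has to be supplied deliberately by behavioral monitoring, as a trip-wire on anomalous actions rather than a gate on every action.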

None of these signals require knowing the attacker’s identity. None require a CVE. None require a signature database or a threat intelligence feed. They require only one thing: a persistent behavioral baseline for each agent, updated continuously, against which each action is evaluated.
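To make the idea of a per-agent baseline concrete, here is a minimal Python sketch covering two of the signals above: novel external destination and out-of-pattern volume. Everything in it is an assumption for illustration, not Varonis's setup or Commit's implementation: the `AgentBaseline` class, the `INTERNAL_DOMAINS` set, the 10x volume factor, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean

# Assumption: the organization's own mail domains.
INTERNAL_DOMAINS = {"example.com"}


@dataclass
class AgentBaseline:
    """Rolling record of what one agent has done so far (hypothetical structure)."""
    seen_destinations: set = field(default_factory=set)
    record_counts: list = field(default_factory=list)

    def observe(self, destination: str, records: int) -> None:
        # Fold a completed action into the baseline.
        self.seen_destinations.add(destination.lower())
        self.record_counts.append(records)

    def deviations(self, destination: str, records: int,
                   volume_factor: float = 10.0) -> list:
        # Compare a proposed action against history and return deviation flags.
        flags = []
        dest = destination.lower()
        domain = dest.rsplit("@", 1)[-1]
        if domain not in INTERNAL_DOMAINS and dest not in self.seen_destinations:
            flags.append("novel_external_destination")
        typical = mean(self.record_counts) if self.record_counts else 0
        if records > max(typical, 1) * volume_factor:
            flags.append("out_of_pattern_volume")
        return flags


# Illustrative history: twenty small internal sends, then the export request.
baseline = AgentBaseline()
for _ in range(20):
    baseline.observe("dan@example.com", records=3)

flags = baseline.deviations("someone@gmail.com", records=247)
```

With this made-up history, the 247-record send to an unseen Gmail address raises both flags, while a routine three-record send to a known internal address raises none. A real baseline would need decay, per-action-class statistics, and a way to avoid learning from actions that were themselves malicious.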

This is what L4 behavioral trust means in practice. Not surveillance — pattern recognition. Not access control — deviation detection. The question is not “is this agent authenticated?” The question is: is this agent behaving like itself?


The Layered Trust Model

The Commit layered trust model was built around this gap:

  • L1 — Cryptographic identity: Who is this entity? (Solved. JWT, certificate, key.)
  • L2 — Malware and signature scanning: Is this payload known-bad? (Solved for static signatures. Fails for social engineering.)
  • L3 — Permission and policy: Is this entity authorized to take this action? (Solved for declared scope. Fails when the attacker operates within scope.)
  • L4 — Behavioral commitment: Is this entity acting consistently with its established patterns? Does its behavior reflect the commitments it has demonstrated over time? (Unsolved at scale.)

The Pinchy attack operated cleanly through L1, L2, and L3. Authenticated entity. No malicious payload. Actions within the agent’s declared permission scope. L4 is where it breaks down.
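A toy Python sketch of the four layers shows where the forwarding action stops looking clean. The `Action` fields and the `evaluate_layers` function are hypothetical names, and L4 is reduced here to a single signal (has the agent sent to this destination before) purely to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    agent_token_valid: bool        # L1 input: did the credential verify?
    payload_known_bad: bool        # L2 input: did any scanner signature match?
    permission_in_scope: bool      # L3 input: is the action within declared scope?
    destination_seen_before: bool  # L4 input: has the agent sent here before?


def evaluate_layers(action: Action) -> dict:
    # Each layer answers its own question; True means "nothing to report".
    return {
        "L1_identity": action.agent_token_valid,
        "L2_scanning": not action.payload_known_bad,
        "L3_permission": action.permission_in_scope,
        "L4_behavior": action.destination_seen_before,
    }


# The customer-export forward as the article describes it.
forward_export = Action(
    agent_token_valid=True,
    payload_known_bad=False,
    permission_in_scope=True,
    destination_seen_before=False,
)
result = evaluate_layers(forward_export)
failed = [layer for layer, ok in result.items() if not ok]
```

Only `L4_behavior` ends up in `failed`. The first three layers have nothing to object to, which is the point of the section above.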

Commit’s behavioral scoring applies this fourth layer to the entities that interact with your systems — packages, agents, services, and the principals behind them. For agent trust specifically, this means:

  • Building behavioral baselines per agent and per principal-agent relationship
  • Scoring each action against that baseline in real time
  • Flagging behavioral deviation before it reaches a human review cycle
  • Treating “agent behaving unlike itself” as a first-class security event, not a log entry
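As a rough illustration of the last three bullets, the sketch below turns a set of deviation signals into a score and a decision about whether to raise a security event. The signal names, the integer weights, and the threshold of 60 are assumptions chosen for the example; they are not Commit's scoring model.

```python
from dataclasses import dataclass

# Assumed weights (points out of 100) and threshold; real values would be tuned.
SIGNAL_WEIGHTS = {
    "novel_external_destination": 50,
    "out_of_pattern_volume": 30,
    "unfamiliar_request_type": 20,
}
EVENT_THRESHOLD = 60


@dataclass(frozen=True)
class Verdict:
    score: int
    is_security_event: bool


def score_action(signals: set) -> Verdict:
    # Sum the weights of the signals that fired, capped at 100.
    score = min(sum(SIGNAL_WEIGHTS.get(s, 0) for s in signals), 100)
    # At or above the threshold the action is a security event, not a log line.
    return Verdict(score, score >= EVENT_THRESHOLD)


verdict = score_action({"novel_external_destination", "out_of_pattern_volume"})
```

Here a first-time external destination combined with unusual volume scores 80 and crosses the threshold, so the action would be held or escalated before it completes. A single weak signal on its own stays below it.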

The npm supply chain taught us that trust verified at install time does not hold at runtime. The Varonis research teaches us the same lesson for agents: trust verified at provisioning time does not hold once the agent is socially engineered at runtime.

The difference between the npm problem and the agent problem is velocity. npm attacks played out over days or weeks. Agent phishing executes in seconds. The behavioral signal is the only control that can operate at that speed.


What To Do

If you’re operating enterprise agents — or evaluating whether to — the Varonis research changes what your threat model needs to include:

Audit what your agents can access, and what “normal” looks like. Most organizations can’t answer this question. They can describe declared permissions but not behavioral patterns. If you don’t have a baseline, you can’t detect deviation.

Treat first-time external destinations as anomalies. An agent forwarding data to a new external address — regardless of whether the address is technically valid — should trigger review. This is a behavioral control, not a policy control. Policy says “this agent can send email.” Behavioral monitoring asks “has this agent sent to this address before, and is the volume consistent with past behavior?”

Log agent behavior at the action level, not just the session level. Session-level logging tells you an agent ran. Action-level logging tells you what it did. You need the second thing to detect phishing.
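For a sense of what an action-level record might contain, here is a small Python sketch that emits one JSON line per agent action. The schema (`agent_id`, `action`, `requested_by`, `destination`, `record_count`) is a hypothetical starting point, not a standard, and the addresses are placeholders.

```python
import json
from datetime import datetime, timezone


def action_record(agent_id, action, requested_by, destination, record_count):
    """Build one JSON line describing a single agent action (hypothetical schema)."""
    return json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id,
            "action": action,
            "requested_by": requested_by,   # claimed sender of the request
            "destination": destination,     # where the data is going
            "record_count": record_count,   # how much data is leaving
        },
        sort_keys=True,
    )


line = action_record(
    agent_id="pinchy",
    action="email.forward",
    requested_by="dan@example.com",
    destination="someone@gmail.com",
    record_count=247,
)
```

A session-level log would record only that the agent ran. A line like this one records who asked, what was done, where the data went, and how much of it, which are the fields a behavioral baseline needs.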

Score the behavioral history of principals who interact with your agents. The attacker impersonating Dan had no prior history sending requests of that type to Pinchy. That relationship history is a detectable signal. Build systems that read it.
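One simple way to read that relationship history is to count request types per principal-agent pair and ask how familiar a new request is. The sketch below assumes a hypothetical `RelationshipHistory` class, made-up request type labels, and a principal key that is something harder to spoof than a display name.

```python
from collections import Counter, defaultdict


class RelationshipHistory:
    """Counts request types seen per (principal, agent) pair (hypothetical)."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, principal, agent, request_type):
        self._counts[(principal, agent)][request_type] += 1

    def familiarity(self, principal, agent, request_type):
        # Fraction of this pair's past requests with the same type.
        # Returns 0.0 when the pair has no history at all.
        seen = self._counts.get((principal, agent))
        if not seen:
            return 0.0
        return seen[request_type] / sum(seen.values())


# Illustrative history: Dan mostly asks the agent for scheduling and summaries.
history = RelationshipHistory()
for _ in range(8):
    history.record("dan", "pinchy", "calendar.schedule")
for _ in range(2):
    history.record("dan", "pinchy", "doc.summarize")

routine = history.familiarity("dan", "pinchy", "calendar.schedule")
novel = history.familiarity("dan", "pinchy", "credentials.share")
```

In this made-up history a credentials request from "Dan" has a familiarity of zero, which could feed a deviation score as one signal among several. On its own it would be noisy, since legitimate principals also make first-time requests.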

The attack class that Varonis documented is not exotic. It requires no technical sophistication. It requires only a plausible request and an agent optimized to be helpful. Every enterprise with AI agents already has all the prerequisites. The only open question is whether your infrastructure is watching what those agents actually do.

Seven percent believe it is.


Where Commit Sits On This

Honest scope. Today, Commit applies behavioral scoring to npm packages: the maintainers, the publishing cadence, the orphan-commit ancestry, the 3-hour burst patterns that precede compromise. Same L4 model, narrower surface than the agent attack this post describes. If you ship code with any third-party dependency, you can audit it at getcommit.dev/audit or via the CLI (npx proof-of-commitment audit). The score reflects behavior, not signatures.

Action-level behavioral scoring for enterprise agents is the next layer. The model carries over directly: per-entity baseline, per-action deviation score, first-class security event when behavior breaks pattern. The hard part is collecting the signal at the right grain (per-agent, per-principal-agent relationship, per-action-class) without becoming surveillance. The npm surface is where we built the substrate. The agent surface is where the substrate goes next.

Operating agents at scale and want to compare notes on what “behavior breaks pattern” should mean for your stack, or what you’d need to see in a behavioral trust layer before it’s credible? Email [email protected]. Pilot conversations are how this gets built.


Sources: Varonis Threat Labs / OpenClaw phishing research (June 2026), covered by BleepingComputer and CSO Online; Akeyless 2026 State of AI Agent Identity Security (May 2026, n=400).
