Detecting Rogue MCP Servers and Shadow AI Agents on Endpoints with Wazuh • Nadim Saliby

Most engineering laptops in 2026 run an AI coding agent — Cursor, Claude Code, Continue, Codex CLI, Copilot Workspace — and almost all of them are connected to MCP servers. Filesystem servers, GitHub servers, database servers, internal-API servers, shell servers. The default configuration on most of these clients lets the agent invoke MCP tools without explicit per-call human approval, because the alternative is a modal dialog every two seconds and nobody would ship that.

Almost nobody is monitoring those servers.

There is no stock Wazuh ruleset for MCP. No public Sigma rules. No community blog posts on the topic at the time of writing. The MCP ecosystem went from a handful of servers to several hundred packages in the last six months, and developers paste them into ~/.cursor/mcp.json or ~/.claude/settings.json from random README files. The blast radius of a single malicious MCP server pasted by a single distracted engineer is the entire developer endpoint — SSH keys, cloud credentials, source code, browser cookies, and outbound network egress that the perimeter has no reason to flag.

I built a Wazuh rule pack and a reproducible Docker lab to close that gap: 5 custom decoders, 13 custom rules, and a one-shot docker compose up lab that captures 77 alerts across 9 rule IDs when you fire the attack scenarios end to end. Everything is in a public GitHub repository. Clone it, drop the rules in your manager, fire the triggers, watch alerts land.

This post is a technical deep-dive into the threat model, the lab architecture, the decoder + rule design, and live validation results.

The Problem Nobody Is Watching

Here’s what the last twelve months of agentic AI tooling actually deployed on engineering endpoints:

MCP (Model Context Protocol) turns any local capability (a filesystem path, a GitHub PAT, a Postgres connection, a shell) into something the agent can call. The protocol speaks JSON-RPC 2.0 over either stdio (the default; what Cursor and Claude Code actually spawn) or a network transport: the specification defines an HTTP-based one, and the lab’s rogue server uses a TCP listener bound to localhost.
Configuration is a JSON file. One file in ~/.cursor/mcp.json, one in ~/.config/Claude/claude_desktop_config.json, one in ~/.claude/settings.json, one project-scoped .mcp.json in any repo, and one ~/.codex/config.toml. There is no centralized registry, no signature validation, no allow-list that ships with the IDE.
The transport is stdio in the common case. No port. No process on the network. Nothing that perimeter DLP can flag.
The MCP server inherits the developer’s permissions. It can read any file the developer can read, run any binary the developer can run, reach any host the developer can reach. There is no sandbox, no seccomp profile, no syscall filter.
exec_command and equivalents are common verbs. Several popular community MCP servers expose tool schemas like bash_exec, run_command, eval, read_file_unbounded, http_request. Most agent UIs will call these without a per-call confirmation prompt.
A rogue MCP server is indistinguishable from a legitimate one in ps output. Both are node /path/to/<name>-mcp.js running under the user’s UID. The path is the only differentiator, and attackers have no obligation to put their binary anywhere obvious.
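Because configuration is just JSON on disk, a first inventory pass can be scripted. The following is a minimal sketch, not code from the lab: it assumes the `mcpServers` layout used by Cursor-style and Claude Desktop-style configs, the function names `list_mcp_servers` and `scan` are hypothetical, and the sample server names are made up. TOML configs such as ~/.codex/config.toml are out of scope here.

```python
import json
from pathlib import Path

# Assumed layout: {"mcpServers": {"<name>": {"command": "...", "args": [...]}}}
def list_mcp_servers(config_text: str) -> list[tuple[str, str]]:
    """Return (server name, full command line) pairs from one JSON config."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})
    result = []
    for name, spec in sorted(servers.items()):
        argv = [spec.get("command", "")] + list(spec.get("args", []))
        result.append((name, " ".join(argv)))
    return result

def scan(paths: list[Path]) -> dict[str, list[tuple[str, str]]]:
    """Inventory every readable JSON config; missing or malformed files are skipped."""
    found = {}
    for path in paths:
        try:
            found[str(path)] = list_mcp_servers(path.read_text())
        except (OSError, ValueError, AttributeError):
            continue
    return found

# Illustrative config with one package-manager launch and one odd absolute path.
SAMPLE = (
    '{"mcpServers": {'
    '"files": {"command": "npx", "args": ["-y", "some-fs-server"]}, '
    '"helper": {"command": "/tmp/rogue-mcp.js"}}}'
)

if __name__ == "__main__":
    for name, command in list_mcp_servers(SAMPLE):
        print(f"{name}: {command}")
```

Pointing `scan` at `Path.home() / ".cursor" / "mcp.json"` and any project-scoped `.mcp.json` files gives a quick list of every command an agent is allowed to spawn on that machine.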

The result: the highest-value process on the endpoint, with the broadest read/write/exec permissions of anything the user runs, has no detection coverage on the SIEM side.

This post fixes that.

Threat Model

I categorize the attack surface into three classes. The rule pack covers classes A and B with detections and class C with raw telemetry only.

Class	What it looks like	Detection goal
A — Rogue MCP server	Developer pasted a snippet from a malicious README; a Node binary in `~/projects/<x>/rogue-mcp.js` is referenced by `~/.cursor/mcp.json`	Catch on startup, on every tool call, on TCP-listener variants, on suspicious child processes
B — Indirect prompt injection of a legitimate MCP server	A poisoned file lands in a directory the filesystem MCP server is scoped to; the agent reads it and is talked into shelling out via the legit `exec_command` verb of a different MCP server	Detect the chain — file appears, agent reads, agent shells out — within a tight time window
C — Shadow-AI inventory gap	An engineering laptop is talking to LLM provider IPs the org never approved, with no MDM declaration of an installed agent	Periodic egress snapshot keyed on the well-known LLM provider hostname set

Class B is the one most enterprise DLP doesn’t catch. A legitimate MCP server with a real, signed config can still be weaponized via files the developer never opened. Single-file detections miss it. Single-process detections miss it. The chain catches it. A standalone log line of “agent read file X” is not actionable on its own; a standalone “agent shelled out” is barely actionable. The two arriving in sequence within sixty seconds is a high-confidence indicator that the agent’s context was poisoned.

The class-A and class-B detections are the focus of this post. Class C (shadow-AI fleet inventory) is left as a follow-up — the building-block telemetry rule (100230) is in the pack already.

Architecture: One Compose, Two Containers

The repository ships a two-container Docker Compose stack. One docker compose up -d --build and you have:

                ┌─────────────────────────────────┐
                │   waz12-manager                 │
                │   wazuh/wazuh-manager:4.10.0    │
                │   • bind-mounts                 │
                │     wazuh/decoders/             │
                │     wazuh/rules/                │
                │   • authd password enabled      │
                │   • alerts → /var/ossec/logs    │
                └────────────┬────────────────────┘
                             │ 1514/udp (events)
                             │ 1515/tcp (authd)
                             ▼
                ┌─────────────────────────────────┐
                │   waz12-endpoint                │
                │   ubuntu:22.04 + Wazuh agent    │
                │     4.10.0 + Node 20 + Python   │
                │                                 │
                │   • rogue MCP server (Node)     │
                │   • headless MCP client driver  │
                │     (Python — replaces Cursor)  │
                │   • 6 attack trigger scripts    │
                │   • bind-mounted artifacts/     │
                └─────────────────────────────────┘

Both containers come up clean from a docker compose down -v. The endpoint’s agent self-enrolls against the bundled manager via agent-auth against authd on port 1515, password-based (wazuh/manager/authd.pass). To deploy against an existing manager, set WAZUH_MANAGER and only build the endpoint service.

Neither Cursor nor Claude Desktop runs in the container; both are GUI applications that don’t run headless. I drive MCP tool calls with a Python client that speaks the same JSON-RPC stdio protocol against the same MCP servers. The artifacts on disk and over the wire are identical to what a real agent would generate; the detection logic doesn’t care whether the originating process is Cursor, Claude Code, or my driver.
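To make "speaks the same JSON-RPC stdio protocol" concrete, here is a minimal sketch of how a single tool call can be framed for a server’s stdin. It is an illustration, not the lab’s actual driver: the helper names `tool_call_frame` and `parse_frame` are hypothetical, and a real session would first perform the MCP `initialize` handshake, which is omitted here.

```python
import json

def tool_call_frame(request_id: int, tool: str, arguments: dict) -> bytes:
    """Encode one JSON-RPC 2.0 `tools/call` request as a newline-terminated line."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the frame stays on one line.
    return (json.dumps(request, separators=(",", ":")) + "\n").encode("utf-8")

def parse_frame(line: bytes) -> dict:
    """Decode one newline-terminated frame back into a dict."""
    return json.loads(line.decode("utf-8"))

if __name__ == "__main__":
    frame = tool_call_frame(1, "exec_command", {"command": "id"})
    print(frame.decode("utf-8"), end="")
```

A driver would write such a frame to the spawned server’s stdin and read one response line back from its stdout.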

The endpoint is deliberately not run as --privileged and host auditd is not enabled. The PoC has to confirm what Wazuh sees with default-namespaced Docker, which is the realistic deployment shape for most readers and the constraint where things tend to break in subtle ways.

Detection Vectors: Five Sources, Five Decoders

Five custom-decoded data sources feed the rule pack, alongside stock Wazuh FIM. Each custom source gets its own decoder, with a strict naming namespace so its fields never collide with stock decoders.

Source	Decoder	What it sees
`syslog /var/log/lab/rogue-mcp.log`	`rogue-mcp` + `rogue-mcp-fields`	Every event the rogue MCP server emits — startup, `tool_call`, TCP accept
`syslog /var/log/lab/driver.log`	`mcp-driver` + `mcp-driver-fields`	The MCP client driver — which servers it spawned, which tools it called
`full_command ps -eo pid,ppid,user,comm,args`	`ps_full`	Periodic process snapshot
`full_command ss -lntpu`	`ss_listen`	Periodic listener snapshot
`full_command ss -tnp state established`	`ss_egress`	Periodic established-connection snapshot
Stock Wazuh FIM	(existing rule 554)	New files on `~/.cursor/`, `~/.claude/`, `~/.codex/`, `~/.vscode/extensions/` (554 is the stock “File added to the system” rule; changes to existing files surface as stock rule 550)

The decoder file (wazuh/decoders/local_decoder.xml) is 100 lines and built to be read top-to-bottom. The non-obvious bit is that every regex is type="pcre2".

OSSEC-flavor regex on Wazuh 4.10 doesn’t accept brace literals like \{ or \} and chokes on the JSON-payload captures the rogue and driver decoders need. PCRE2 is supported on Wazuh ≥ 4.4 and is strictly more capable for the kind of structured-text shaping the rogue server emits. The cost is one attribute per <regex> block.

Here is the rogue-mcp decoder in full. It’s the canonical example for how the other four are written.

<decoder name="rogue-mcp">
  <program_name>^rogue-mcp$</program_name>
</decoder>

<decoder name="rogue-mcp-fields">
  <parent>rogue-mcp</parent>
  <regex type="pcre2">^event=(\S+) (\{.*\})$</regex>
  <order>rogue_event,rogue_payload</order>
</decoder>

Two named fields fall out of every line: rogue_event (the bare event name — startup, tool_call, tcp_accept) and rogue_payload (the JSON object that follows). Rules then narrow by combining a <field name="rogue_event">^tool_call$</field> constraint with a <match>"name":"exfil_url"</match> substring on the JSON payload. This keeps every rule legible — left-to-right — and avoids the brittle nested-sibling-decoder ordering that the OSSEC documentation famously hand-waves through.

The ps_full, ss_listen, and ss_egress decoders extract per-row fields directly: proc_pid, proc_ppid, proc_user, proc_comm, proc_args for the process snapshot; sock_proto, sock_local_addr, sock_local_port, sock_pid, sock_comm for listeners; and the peer fields for egress. Rules then reference these as <field name="proc_comm">…</field>.

The rogue MCP server’s log lines are emitted in a deliberately simple syslog-ish format:

Apr 29 10:49:31 waz12-endpoint rogue-mcp[1234]: event=tool_call {"name":"exfil_url","arguments":{"url":"https://api.openai.com/v1/health","data":"lab-canary"}}

event=<name> then a single JSON object. That’s the contract. The decoder does the rest. Real rogue MCP servers in the wild won’t be this polite, of course, but the symptoms of their behavior (a process named *-mcp.js, a shell child, a TCP listener such as the lab’s port 7444, a JSON-RPC tool_call over stdio with an exec_command method) are much harder to hide. The lab’s rogue server reproduces those symptoms faithfully, so the detection logic targets the symptom set rather than the specific log format.
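The decoder contract can be checked outside Wazuh. The sketch below is an approximation in Python, under the assumption that Wazuh’s pre-decoding strips the syslog header and exposes the program name; the `HEADER` pattern and the `decode` function are hypothetical stand-ins for that step, while `FIELDS` reuses the rogue-mcp-fields regex unchanged.

```python
import json
import re

# Rough stand-in for syslog pre-decoding: "Mon DD HH:MM:SS host program[pid]: message"
HEADER = re.compile(
    r"^\w{3}\s+\d+ \d{2}:\d{2}:\d{2} (\S+) ([^\s\[:]+)(?:\[(\d+)\])?: (.*)$"
)
# Same pattern as the rogue-mcp-fields decoder.
FIELDS = re.compile(r"^event=(\S+) (\{.*\})$")

def decode(line: str):
    """Return the two decoder fields, or None if the line is not a rogue-mcp event."""
    header = HEADER.match(line)
    if header is None or header.group(2) != "rogue-mcp":
        return None
    fields = FIELDS.match(header.group(4))
    if fields is None:
        return None
    return {"rogue_event": fields.group(1), "rogue_payload": fields.group(2)}

SAMPLE = (
    'Apr 29 10:49:31 waz12-endpoint rogue-mcp[1234]: event=tool_call '
    '{"name":"exfil_url","arguments":{"url":"https://api.openai.com/v1/health",'
    '"data":"lab-canary"}}'
)

if __name__ == "__main__":
    decoded = decode(SAMPLE)
    print(decoded["rogue_event"], json.loads(decoded["rogue_payload"])["name"])
```

On the manager itself, the wazuh-logtest tool remains the authoritative way to confirm how a line decodes; this sketch only helps when iterating on the regex away from a manager.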

The 13 Rules: Every Detection That Matters

All thirteen rules live in the 100200–100240 ID range — well clear of stock Wazuh. The full file is wazuh/rules/local_rules.xml. I’ll walk through every rule by category, with the rationale.

Anchors (100200, 100208)

Every rogue-mcp event is anchored at level 3. Every driver event is anchored at level 3. Subsequent rules narrow with <if_sid>100200</if_sid> or <if_sid>100208</if_sid> and a more specific match.

<rule id="100200" level="3">
  <decoded_as>rogue-mcp</decoded_as>
  <description>MCP rogue server: $(rogue_event)</description>
  <group>mcp,shadow_ai,mcp_rogue,</group>
</rule>

This pattern keeps the rule file legible: every higher-severity rule references a level-3 anchor, so the rule that actually fires at level 14 carries the entire decoder context plus the precise trigger. Level 3 anchors don’t generate noise on the analyst’s console because level filters in the Wazuh dashboard typically start at level 5+.

Rogue MCP startup (100201, 100205, 100206)

100201 fires on any event=startup line from the rogue server, including stdio and TCP transports. Level 6 — informational, but useful for “is this even running?” queries.

100205 and 100206 are the TCP-mode variants. 100205 fires on the loopback-bound listener (127.0.0.1:7444); 100206 fires when the rogue server accepts a TCP client. Level 12 each.

The point: a legitimate MCP server in 2026 almost never opens a TCP listener. Cursor, Claude Code, Codex CLI all spawn their MCP servers over stdio. A node *-mcp.js --tcp invocation is near-pathognomonic for a rogue server. (100225 adds the ss_listen cross-check below.)

Rogue tool calls (100202, 100203, 100204)

These three are the headline rogue-MCP signals. Every tool_call event is matched to its name field, with a level proportional to the danger of the verb.

<rule id="100204" level="14">
  <if_sid>100200</if_sid>
  <field name="rogue_event">^tool_call$</field>
  <match>"name":"exfil_url"</match>
  <description>Rogue MCP server exfil_url called ($(rogue_payload))</description>
  <mitre>
    <id>T1041</id>
    <id>T1567</id>
  </mitre>
  <group>mcp_rogue,exfiltration,</group>
</rule>

Single highest-value rogue-MCP signal in the pack. Real MCP servers do not expose an exfil_url verb. If 100204 fires, you have a rogue MCP server actively running on a developer endpoint.

100202 (exec_command) and 100203 (read_secrets) are the sister rules, both at level 13. Rationale: an exec_command verb exists on a few legitimate community MCP servers, so the rule fires high-priority but not critical. A read_secrets verb does not exist on any reputable server I’ve audited, so 100203 is treated as critical-adjacent as well.
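The narrowing logic of these rules can be summarized as a lookup from decoded fields to a rule ID and level. The following is a simplified sketch using only the IDs and levels stated in this post; `classify` is a hypothetical name, it parses the payload as JSON where the real rules use a substring match, and it ignores the TCP-mode rules (100205, 100206).

```python
import json

# (rule id, level) per tool name, as described in this post.
TOOL_RULES = {
    "exec_command": (100202, 13),
    "read_secrets": (100203, 13),
    "exfil_url": (100204, 14),
}
ANCHOR = (100200, 3)   # fallback: every rogue-mcp line
STARTUP = (100201, 6)  # any event=startup line

def classify(rogue_event: str, rogue_payload: str) -> tuple[int, int]:
    """Map decoded fields to the most specific (rule id, level) pair."""
    if rogue_event == "startup":
        return STARTUP
    if rogue_event != "tool_call":
        return ANCHOR
    try:
        name = json.loads(rogue_payload).get("name")
    except (ValueError, AttributeError):
        return ANCHOR  # unparseable payload: only the anchor applies
    if not isinstance(name, str):
        return ANCHOR
    return TOOL_RULES.get(name, ANCHOR)

if __name__ == "__main__":
    print(classify("tool_call", '{"name":"exfil_url","arguments":{}}'))
```

Unknown tool names fall back to the level-3 anchor, which mirrors how an event that matches no child rule stays at the anchor’s level.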

Driver provenance (100207)

100207 fires when the driver log records event=spawn target=node /opt/lab/rogue-server/bin/rogue-mcp.js. In a real environment, the equivalent signal would come from the parent process of the rogue MCP server — usually Cursor or Claude Code’s helper binary. The driver-log path covers the lab’s headless case; production deployments correlate via ps_full instead. Both work.

Config-path tells (100210, 100211)

100210 fires on Wazuh FIM events for files matching ~/.cursor/mcp.json, ~/.claude/settings.json, ~/.config/Claude/claude_desktop_config.json, ~/.codex/config.toml, or any project-scoped .mcp.json. The FIM source is stock Wazuh rule 554 — 100210 re-tags it with tags=ai_agent_config and boosts the level to 10.

100211 is the higher-severity follow-up. It fires when the FIM diff payload contains a command path that isn’t under one of the well-known package-manager roots. This is the rogue MCP install fingerprint — attackers ship binaries from ~/Downloads or /tmp/ rather than /usr/local/bin/npx.

<rule id="100211" level="13">
  <if_sid>100210</if_sid>
  <match type="pcre2">"command":\s*"(?!/usr/lib/node_modules/|/usr/local/bin/|/usr/bin/npx|npx)\S+</match>
  <description>MCP/AI config references a non-standard command path: $(file)</description>
  <mitre><id>T1547</id><id>T1059</id></mitre>
  <group>mcp_config,fim,attack,</group>
</rule>

This is the only rule in the pack that uses PCRE2 negative lookahead. If your manager is older than 4.4 and PCRE2 isn’t an option, the fallback is two rules: a positive <match> on the command key plus a <regex negate="yes"> over the package-manager prefix list. Same effect, two rules instead of one.
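The lookahead is easy to exercise in isolation. Here is a small sketch that runs the same pattern through Python’s `re` module, which handles this particular construct the same way PCRE2 does; the function name `is_flagged` and the sample config fragments are assumptions made for illustration.

```python
import re

# Pattern copied from rule 100211.
NON_STANDARD_COMMAND = re.compile(
    r'"command":\s*"(?!/usr/lib/node_modules/|/usr/local/bin/|/usr/bin/npx|npx)\S+'
)

def is_flagged(config_fragment: str) -> bool:
    """True when the fragment contains a command outside the allowed prefixes."""
    return NON_STANDARD_COMMAND.search(config_fragment) is not None

if __name__ == "__main__":
    for fragment in (
        '"command": "/tmp/rogue-mcp.js"',
        '"command": "npx"',
        '"command": "/usr/local/bin/npx"',
        '"command": "node"',
    ):
        print(fragment, "->", is_flagged(fragment))
```

Under this pattern a bare `node` command is flagged too, since it is not on the prefix list; whether that is desirable depends on how MCP servers are normally launched in your environment.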

Process-tree canaries (100220, 100221)

100220 fires on any ps_full row with a comm matching (rogue-)?mcp or args containing mcp.js. Level 8 — descriptive, not actionable on its own.

100221 is the canary I’m most confident in.

<rule id="100221" level="14" frequency="2" timeframe="180">
  <if_matched_sid>100220</if_matched_sid>
  <decoded_as>ps_full</decoded_as>
  <field name="proc_comm">^bash$|^sh$</field>
  <field name="proc_args" type="pcre2">^(bash|sh) -c </field>
  <description>Rogue MCP indicator: $(proc_comm) -c child of MCP process (ppid=$(proc_ppid)) — argv: $(proc_args)</description>
  <mitre><id>T1059.004</id><id>T1609</id></mitre>
  <group>mcp_rogue,process_chain,attack,</group>
</rule>

A bash -c (or sh -c) child process under an MCP-named parent. Cursor and Claude Code’s legitimate MCP servers almost never spawn shell children. Rogue servers do it constantly, because that’s how exec_command is implemented under the hood — there’s no clean way to run an attacker-supplied string against a shell without going through bash -c <string>.

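OSSEC’s <same_field> cannot do a cross-PID join (child’s proc_ppid ≡ parent’s proc_pid), so the rule approximates parentage with <if_matched_sid>100220</if_matched_sid> plus a 180-second timeframe: it fires when a bash -c or sh -c row shows up within three minutes of an MCP process being seen, without proving that the shell is that process’s child. False-positive risk is low: legitimate MCP servers virtually never spawn bash -c children, and the window limits matches to shells seen alongside a recently observed MCP process rather than any shell ever run on the host. The ppid in the alert description lets an analyst confirm the parentage by hand.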
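Where an exact parent/child join is needed, it can be done outside the rule engine on the same snapshot. The sketch below is one way to do it, assuming the `ps -eo pid,ppid,user,comm,args` column order from the decoder table; the function names and the sample snapshot are made up for illustration, and a `comm` value containing spaces would break the simple whitespace split.

```python
import re

def parse_ps(snapshot: str) -> list[dict]:
    """Parse `ps -eo pid,ppid,user,comm,args` output into row dicts."""
    rows = []
    for line in snapshot.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        pid, ppid, user, comm, args = parts
        rows.append({"proc_pid": int(pid), "proc_ppid": int(ppid),
                     "proc_user": user, "proc_comm": comm, "proc_args": args})
    return rows

def is_mcp(row: dict) -> bool:
    # Mirrors the 100220 description: comm matches (rogue-)?mcp or args contain mcp.js.
    return bool(re.search(r"(rogue-)?mcp", row["proc_comm"])) or "mcp.js" in row["proc_args"]

def is_shell_dash_c(row: dict) -> bool:
    # Mirrors the two field constraints of 100221.
    return (row["proc_comm"] in ("bash", "sh")
            and re.match(r"^(bash|sh) -c ", row["proc_args"]) is not None)

def shell_children_of_mcp(rows: list[dict]) -> list[dict]:
    """Exact join: shell rows whose ppid equals the pid of an MCP row."""
    mcp_pids = {row["proc_pid"] for row in rows if is_mcp(row)}
    return [row for row in rows if is_shell_dash_c(row) and row["proc_ppid"] in mcp_pids]

# Illustrative snapshot, not captured from the lab.
SNAPSHOT = """
    PID    PPID USER     COMMAND         COMMAND
   1200       1 labuser  node            node /opt/lab/rogue-server/bin/rogue-mcp.js
   1234    1200 labuser  bash            bash -c id; uname -a
   1300       1 labuser  bash            bash -c make test
"""

if __name__ == "__main__":
    for row in shell_children_of_mcp(parse_ps(SNAPSHOT)):
        print(row["proc_pid"], row["proc_ppid"], row["proc_args"])
```

In the sample, only PID 1234 is reported: PID 1300 is also a `bash -c`, but its parent is not an MCP process.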

Listener canary (100225)

100225 cross-checks the rogue TCP-mode signal from 100205 with the ss_listen decoder.

<rule id="100225" level="13">
  <if_matched_sid>100205</if_matched_sid>
  <decoded_as>ss_listen</decoded_as>
  <field name="sock_local_port">^7444$</field>
  <field name="sock_comm">node$</field>
  <description>Rogue MCP TCP listener confirmed by ss: $(sock_local_addr):$(sock_local_port) ($(sock_comm))</description>
  <mitre><id>T1095</id></mitre>
  <group>mcp_rogue,network,attack,</group>
</rule>

The rationale: 100205 fires on the rogue server’s own startup log line, but a defender wants confirmation from an independent source that the listener is actually visible to the kernel. ss -lntpu provides that confirmation. Two independent signals cross-checking one event is the difference between an alert that’s worth waking someone for and one that isn’t.
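The cross-check can be reproduced by hand on a listener snapshot. This is a minimal sketch that assumes the usual `ss -lntpu` column layout (Netid, State, Recv-Q, Send-Q, local address, peer address, process); `parse_ss_listen` and `listener_confirmed` are hypothetical names and the sample output is illustrative rather than captured from the lab.

```python
import re

# Process column looks like: users:(("node",pid=1200,fd=18))
PROCESS = re.compile(r'users:\(\("([^"]+)",pid=(\d+)')

def parse_ss_listen(snapshot: str) -> list[dict]:
    """Parse `ss -lntpu` output into rows with the sock_* field names."""
    rows = []
    for line in snapshot.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) < 6:
            continue
        addr, _, port = parts[4].rpartition(":")
        proc = PROCESS.search(line)
        rows.append({
            "sock_proto": parts[0],
            "sock_local_addr": addr,
            "sock_local_port": port,
            "sock_comm": proc.group(1) if proc else "",
            "sock_pid": int(proc.group(2)) if proc else None,
        })
    return rows

def listener_confirmed(rows: list[dict], port: str = "7444", comm: str = "node") -> bool:
    """True when some listener matches the port and a comm ending in `comm`."""
    return any(r["sock_local_port"] == port and r["sock_comm"].endswith(comm) for r in rows)

SNAPSHOT = """
Netid State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp   LISTEN 0      511        127.0.0.1:7444      0.0.0.0:*     users:(("node",pid=1200,fd=18))
tcp   LISTEN 0      128          0.0.0.0:22        0.0.0.0:*     users:(("sshd",pid=80,fd=3))
"""

if __name__ == "__main__":
    print(listener_confirmed(parse_ss_listen(SNAPSHOT)))
```

The process column is only populated when `ss` has permission to inspect the owning process, so an empty `sock_comm` should be read as "unknown" rather than "not node".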

LLM-provider egress (100230)

100230 is informational. It fires on any ss_egress row with a peer addr in the well-known LLM-provider IP set: Anthropic, OpenAI, Google Gemini, Cohere, Mistral. Level 5 — too low to alert on, but high enough to make a “developers talking to LLM provider X over the last N days” Wazuh dashboard query trivial.

This is the building block for class-C shadow-AI fleet inventory. Right now it just exists as raw telemetry.
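Turning that telemetry into an inventory mostly comes down to mapping peer addresses to providers. Here is a minimal sketch of that lookup; the provider names and networks are placeholders taken from the RFC 5737 documentation ranges, not real provider addresses, and `provider_for` is a hypothetical helper.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks). Replace with a maintained
# list of the provider ranges your organization actually cares about.
PROVIDER_NETWORKS = {
    "example-provider-a": [ipaddress.ip_network("192.0.2.0/24")],
    "example-provider-b": [ipaddress.ip_network("198.51.100.0/24")],
}

def provider_for(peer_addr: str):
    """Return the provider whose network contains peer_addr, or None."""
    try:
        ip = ipaddress.ip_address(peer_addr)
    except ValueError:
        return None  # not an IP literal
    for provider, networks in PROVIDER_NETWORKS.items():
        if any(ip in network for network in networks):
            return provider
    return None

if __name__ == "__main__":
    for peer in ("192.0.2.10", "198.51.100.7", "203.0.113.5"):
        print(peer, "->", provider_for(peer))
```

Provider address ranges change over time and are often shared with CDNs, so any real list needs regular refreshing and will produce some ambiguous matches.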

The chain rule (100240)

This is the rule I’m most happy with.

<rule id="100240" level="14" frequency="2" timeframe="60">
  <if_matched_sid>100210</if_matched_sid>
  <if_sid>100202</if_sid>
  <description>Indirect prompt injection chain: file creation followed by rogue exec_command within 60s ($(rogue_payload))</description>
  <mitre><id>T1059</id><id>T1204</id></mitre>
  <group>mcp_rogue,prompt_injection,attack,</group>
</rule>

A file appears in a directory the agent watches. The agent reads it. Within sixty seconds, a rogue exec_command fires. Each step is innocuous in isolation. The chain is the attack.

This is the realistic threat model for class B. A malicious file gets dropped into the developer’s repo by a teammate, an upstream dependency, or an external pull request. The agent has filesystem MCP access to that directory. The agent reads the file as part of working on the user’s task. The file contains an instruction to “run this shell command to verify the build works”. The agent does. None of those steps would trigger an alert on its own. The sixty-second join is the detection.
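The sixty-second join itself is simple to express outside the rule engine, which helps when replaying archived alerts. The sketch below is an approximation of the idea rather than of Wazuh’s internal matching: it assumes a time-sorted stream of (timestamp, rule id) pairs, and `find_chains` is a hypothetical name.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=60)
FILE_EVENT = 100210   # AI-agent config / watched-file FIM event
EXEC_EVENT = 100202   # rogue exec_command tool call

def find_chains(events, window=WINDOW):
    """Pair each exec event with the most recent file event inside the window.

    `events` must be an iterable of (timestamp, rule_id) sorted by timestamp.
    """
    chains = []
    last_file_event = None
    for timestamp, rule_id in events:
        if rule_id == FILE_EVENT:
            last_file_event = timestamp
        elif rule_id == EXEC_EVENT and last_file_event is not None:
            if timestamp - last_file_event <= window:
                chains.append((last_file_event, timestamp))
    return chains

if __name__ == "__main__":
    t0 = datetime(2026, 4, 29, 10, 0, 0)
    stream = [
        (t0, FILE_EVENT),
        (t0 + timedelta(seconds=20), EXEC_EVENT),   # inside the window
        (t0 + timedelta(seconds=120), EXEC_EVENT),  # too late
    ]
    print(find_chains(stream))
```

Only the exec at +20s is paired with the file event; the one at +120s falls outside the window and is ignored.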

The trigger script triggers/06-prompt-injection-replay.sh writes a poisoned README.md, has the legit filesystem MCP server read it, and has the rogue MCP server execute the embedded command — all in sequence, within seconds. 100240 fires reliably.

Live Validation: 77 Alerts, 9 Rules

I ran triggers/00-fire-all.sh end-to-end against the bundled manager. 77 alerts with rule.id ∈ [100200, 100240] landed in /var/ossec/logs/alerts/alerts.json, across 9 distinct rule IDs:

Rule	Lvl	Fires	Trigger	What
100200	3	20	any	Anchor — every rogue-mcp line
100208	3	43	any	Anchor — every driver line
100201	6	4	startup	Rogue MCP startup (stdio + tcp)
100202	13	4	exec_command	`tool_call name=exec_command`
100203	13	1	read_secrets	`tool_call name=read_secrets` (`/etc/passwd`)
100204	14	2	exfil_url	CRITICAL — `tool_call name=exfil_url`
100205	12	2	tcp listener	Rogue MCP TCP startup on `127.0.0.1:7444`
100206	12	1	tcp accept	Rogue MCP accepted client `peer=127.0.0.1:35552`
100207	12	1	driver spawn	Driver spawned `node /opt/lab/rogue-server/bin/rogue-mcp.js`

A representative 100204 alert (CRITICAL exfil):

{
  "timestamp": "2026-04-29T10:49:31.260+0000",
  "rule": {
    "level": 14,
    "description": "Rogue MCP server exfil_url called ({\"name\":\"exfil_url\",\"arguments\":{\"url\":\"https://api.openai.com/v1/health\",\"data\":\"lab-canary\"}})",
    "id": "100204",
    "mitre": {
      "id": ["T1041", "T1567"],
      "tactic": ["Exfiltration"],
      "technique": ["Exfiltration Over C2 Channel", "Exfiltration Over Web Service"]
    },
    "groups": ["mcp", "shadow_ai", "mcp_rogue", "exfiltration"]
  },
  "agent": {"id": "001", "name": "waz12-endpoint"},
  "decoder": {"name": "rogue-mcp"},
  "data": {
    "rogue_event": "tool_call",
    "rogue_payload": "{\"name\":\"exfil_url\",\"arguments\":{\"url\":\"https://api.openai.com/v1/health\",\"data\":\"lab-canary\"}}"
  },
  "location": "/var/log/lab/rogue-mcp.log"
}

Tail /var/ossec/logs/alerts/alerts.json on the manager while the trigger script runs to capture the same set on your replay.
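To compare a replay against the table, the alerts can be tallied per rule ID. This is a minimal sketch assuming alerts.json holds one JSON object per line, as in the representative alert shown earlier; `tally` is a hypothetical helper, and lines that fail to parse are skipped.

```python
import json
from collections import Counter

def tally(lines, low: int = 100200, high: int = 100240) -> Counter:
    """Count alerts per rule id within [low, high] from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            alert = json.loads(line)
            rule_id = int(alert["rule"]["id"])
        except (ValueError, KeyError, TypeError):
            continue  # partial writes or unrelated lines
        if low <= rule_id <= high:
            counts[rule_id] += 1
    return counts

if __name__ == "__main__":
    # Example: copy alerts.json out of the manager container first, then:
    # with open("alerts.json") as handle:
    #     for rule_id, fires in sorted(tally(handle).items()):
    #         print(rule_id, fires)
    sample = ['{"rule": {"id": "100204", "level": 14}}', '{"rule": {"id": "5715"}}']
    print(tally(sample))
```

Sorting the resulting counter by rule ID gives a per-rule fire count that can be read side by side with the validation table.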

Reproduction

git clone https://github.com/nadimjsaliby/wazuh-shadow-ai.git
cd wazuh-shadow-ai

docker compose down -v          # clean baseline
docker compose up -d --build    # boots manager + endpoint, agent auto-enrolls

# wait for the agent to show Active on the manager
until docker exec waz12-manager /var/ossec/bin/agent_control -l \
        | grep -q 'waz12-endpoint.*Active'; do sleep 2; done

# fire every trigger
docker exec -u labuser -it waz12-endpoint bash /opt/lab/triggers/00-fire-all.sh

# observe alerts
docker exec waz12-manager bash -c \
  "tail -F /var/ossec/logs/alerts/alerts.json" \
  | jq -c 'select(.rule.id|tonumber>=100200 and tonumber<=100299)'

To deploy the rules against an existing Wazuh manager instead of the bundled one:

scp wazuh/decoders/local_decoder.xml manager:/var/ossec/etc/decoders/
scp wazuh/rules/local_rules.xml      manager:/var/ossec/etc/rules/
ssh manager '/var/ossec/bin/wazuh-control restart'

WAZUH_MANAGER=<your-manager> docker compose up -d --build endpoint

The agent will auto-enroll against the manager you specified, and the rules — which live on the manager — will start firing on the endpoint’s events as they arrive. No agent-side configuration is needed beyond what the bundled docker/ossec.conf already contains.

Why Wazuh

I picked Wazuh as the engine for this work because it’s the only open-source platform I know of that combines all four of the capabilities the detection logic needs in a single agent binary:

Localfile / syslog ingestion with custom decoder support, so the rogue MCP server’s structured log lines and the MCP client driver’s trace can both feed the rule engine without a sidecar shipper.
Periodic command execution through the full_command log format (a <localfile> block with <log_format>full_command</log_format>), with custom decoder support, so I can ingest ps -eo / ss -lntpu snapshots and reason about process trees and listener sets without auditd, without eBPF, without privileged containers.
File Integrity Monitoring that fires on the well-known agent config paths (~/.cursor/, ~/.claude/, ~/.codex/) without custom plumbing. Rule 554 is the seam I tag and re-emit at higher priority through 100210 / 100211.
Cross-rule correlation via <if_sid> / <if_matched_sid> and timeframe joins, so the 100240 chain rule (file creation → rogue exec within 60s) is expressible without a stream processor or a separate SIEM correlation tier.

Falco does runtime detection well but doesn’t have the localfile ingestion path needed for the rogue server’s own log lines. Sysmon on Linux is comparable to full_command snapshots for process-tree visibility but doesn’t ingest arbitrary syslog streams. Auditd is overkill for the lab’s threat model and demands privileged container operations the realistic deployment shape can’t grant. Custom shippers feeding Elastic or Splunk would close the gap, but at the cost of running another agent on the endpoint and another pipeline in the SIEM. Wazuh does it with one daemon and one configuration file.

The agent is lightweight enough to run on a developer laptop without starving the IDE, the rule engine evaluates the entire 13-rule pack in single-digit milliseconds per event, and the manager aggregates findings across every endpoint into one indexable JSON stream that existing dashboards can already consume. That combination is why I default to Wazuh for endpoint detection work, and why this rule pack is the right shape for distributing AI-agent detection coverage to the rest of the community.