Closing the Kernel-Level Gap in Wazuh: an eBPF Sidecar with Tetragon
The previous two posts in this series — Detecting Rogue MCP Servers and Shadow AI Agents on Endpoints with Wazuh and Catching Shadow AI in the Network: DNS, Egress, and Browser Telemetry with Wazuh — both operate strictly above the syscall layer. They watch process snapshots from ps, FIM events from inotify, dnsmasq query logs, and ss -tnp output. That’s enough for the threat models they target, but it has structural blind spots.
Six gaps stock Wazuh literally cannot see, no matter how the rules are written:
- Short-lived processes. A binary that staged into /tmp, ran for 30 ms, and self-deleted is invisible to a ps snapshot taken every 60 seconds.
- Memory-only payloads. A memfd_create + execveat loader leaves zero file-system evidence for FIM to alert on.
- Sub-second TCP flows. A TLS session that completes inside one ss -tnp polling interval (3–5 s in the previous lab) never appears in any snapshot.
- Process renaming via prctl(PR_SET_NAME). Defeats any rule that allow-lists or block-lists by comm name. The previous network post had to invert its egress rule to an allow-list of trusted browsers because of exactly this evasion.
- Kernel module loads. Stock Wazuh has no telemetry for init_module / finit_module outside of an explicitly configured auditd rule.
- bpf() syscall use. Same: no visibility unless you wire up auditd specifically for it.
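To put rough numbers on the polling gaps above, here is a minimal sketch. It assumes the process or flow starts at a uniformly random moment relative to a fixed polling schedule; the function name is illustrative and not part of the lab.

```python
def snapshot_hit_probability(lifetime_s: float, interval_s: float) -> float:
    """Chance that at least one periodic snapshot lands inside an event's lifetime.

    Assumes the event starts at a uniformly random offset within the polling
    interval, so the probability is lifetime / interval, capped at 1.
    """
    if lifetime_s < 0 or interval_s <= 0:
        raise ValueError("lifetime must be >= 0 and interval must be > 0")
    return min(1.0, lifetime_s / interval_s)


# 30 ms process against a 60 s ps snapshot
print(f"{snapshot_hit_probability(0.030, 60):.4%}")
# 200 ms TLS session against a 3 s ss -tnp poll (200 ms is an illustrative value)
print(f"{snapshot_hit_probability(0.200, 3):.2%}")
```

Under that assumption a 30 ms process is caught by roughly 1 snapshot in 2,000, which is why no amount of rule tuning on top of the snapshot fixes the gap.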
Tetragon — the Cilium project’s eBPF runtime security tool — closes all six via in-kernel BPF probes. Pairing it with Wazuh as a SIEM-side correlation tier is the obvious play, and there’s no good public guide for it.
I built that integration: a single decoder family that automatically handles every TracingPolicy you add later, 15 custom rules in the 100400–100440 range, 2 Tetragon TracingPolicies, a 00-fire-all-gaps.sh reproduction suite, a before/after harness, a k8s addon, and a multi-distro compatibility matrix that passes on Ubuntu 22.04, Ubuntu 24.04, Debian 12, and AlmaLinux 9. Everything is in a public GitHub repo. Clone it, drop the rules in your manager, fire the gap suite, watch the alerts land.
This work is contributed under the Wazuh Ambassadors program. Wazuh is an open-source SIEM + XDR platform; this post extends its detection coverage into the kernel without modifying the agent.
This post is a technical deep-dive into the threat model, the lab architecture, the decoder + rule design, the live validation results, the multi-distro compatibility matrix, the resource benchmark, and the gotchas that cost me the most time during the build.
Headline numbers
The lab boots clean and runs the full gap suite end-to-end. After one 00-fire-all-gaps.sh:
| Mode | Total alerts in 100400–100440 | Distinct paging-grade rule IDs (level ≥ 10) |
|---|---|---|
| Stock Wazuh + raw Tetragon stream (anchor only) | ~13,000 (level-3 noise) | 0 |
| With this rule pack | ~12,000 events shaped into | 10–12 distinct paging detections |
Sustained throughput in the lab over 8 minutes: 1.59 million events captured, 0 dropped, ~3,300 events/sec average. Endpoint container at 1.3 GiB RSS / ~30% of one CPU core; Wazuh manager at 91% of one core / 547 MiB. Full breakdown in the resource-benchmark section below.
Distro compatibility matrix — all four pass:
| Variant | Base image | Build | Agent active | libc | Python | Paging IDs | Paging alerts |
|---|---|---|---|---|---|---|---|
| ubuntu-22.04 | ubuntu:22.04 | ✅ | ✅ | glibc 2.35 | 3.10.12 | 9 | 111 |
| ubuntu-24.04 | ubuntu:24.04 | ✅ | ✅ | glibc 2.39 | 3.12.3 | 10 | 234 |
| debian-12 | debian:bookworm-slim | ✅ | ✅ | glibc 2.36 | 3.11.2 | 11 | 326 |
| alma-9 | almalinux:9 | ✅ | ✅ | glibc 2.34 | 3.9.25 | 12 | 428 |
(Rising paging counts are cumulative manager state across runs, not per-distro capacity. The DISTINCT distro coverage is what the table proves; reset the manager between variants for clean per-variant counts.)
Why eBPF, why Tetragon, why now
Wazuh’s agent has been one of the strongest open-source endpoint telemetry stacks for years, but its architecture predates the eBPF revolution. It reads files, watches paths via inotify, runs periodic <full_command> snapshots, and rolls everything up into syslog-shaped events the manager parses. That model works beautifully for filesystem changes, log analysis, vulnerability scans, and SCA — and it has been extended thoughtfully (the auditd integration, sysmon-on-Linux integration, etc.) — but it doesn’t see into the kernel.
eBPF does. A BPF program attached to a kernel hook (kprobe, tracepoint, LSM probe, sched event) fires synchronously inside the kernel, sees full syscall arguments, has the credentials of the calling task, and can emit structured events to user space at line rate. The fundamental advantage isn’t just “more events” — it’s events the user-space tooling literally cannot generate, like execveat of a memfd, or a tcp_connect that closes inside one polling interval.
Tetragon is the production-grade eBPF runtime security tool from the Cilium project. It exposes a high-level YAML DSL (“TracingPolicy”) that lets you declare kprobes, tracepoints, and selectors without writing libbpf C, and it ships JSON-line events to either a file or gRPC. That makes it the obvious shape for a Wazuh sidecar: Tetragon writes to a file, a small reformat process wraps each line as syslog, the Wazuh agent ingests the syslog stream, and the manager’s existing analysisd does the rule processing. No changes to the Wazuh agent. No new agent on the endpoint. No separate SIEM tier.
Falco is the other obvious candidate. It runs its own eBPF probe and ships events to the same shape of pipeline. I picked Tetragon because (a) the TracingPolicy DSL is cleaner than Falco’s macro language for the kind of rules this pack needs, (b) Tetragon’s process cache + parent-chain enrichment is more complete out-of-the-box than Falco’s, and (c) Tetragon’s documentation is currently the best for kernel feature compatibility. None of those are strict — a Falco-on-Wazuh integration is also valid; this isn’t a flame war.
What this isn’t: a replacement for Wazuh’s existing telemetry. It’s an augmentation. Stock Wazuh remains the right tool for FIM, SCA, vuln-detection, log analysis, and rule correlation. eBPF fills the kernel-level gap that the rest of the agent can’t reach.
Threat Model
Six attack-class gaps map cleanly onto the eight reproductions in this lab. Each is a thing stock Wazuh cannot see; each becomes a Wazuh alert when Tetragon’s events flow through this rule pack.
| # | Attack class | Stock Wazuh sees… | + Tetragon catches… |
|---|---|---|---|
| 1 | Short-lived process exec (<1 s) | Nothing (60 s ps snapshot) | execve event at process-creation time, regardless of lifetime (rule 100410) |
| 2 | Fileless memfd_create + execveat payload | Nothing (no file write to FIM) | execveat with binary resolved to /dev/fd/<n> (rule 100411) |
| 3 | Sensitive credential file in argv | Nothing (FIM doesn’t fire on reads) | execve with argv inspection (rule 100412) |
| 4 | Setuid privilege escalation | uid only via ps (no auid) | uid=0 with user-range auid pattern (rule 100413) |
| 5 | Sub-second TCP connect | Nothing (3–5 s ss poll misses sub-tick flows) | tcp_connect kprobe at SYN time (rule 100414) |
| 6 | Kernel module load | Nothing without explicit auditd config | sys_enter_init_module / sys_enter_finit_module tracepoint (rule 100415) |
| 7 | bpf() syscall use (eBPF rootkit primitive) | Nothing without explicit auditd config | sys_enter_bpf tracepoint (rule 100416) |
| 8 | Process self-rename via prctl(PR_SET_NAME) | Sees the FAKE name in ps / ss | sys_enter_prctl tracepoint (rule 100417, partial: see item 6 under “Gotchas worth knowing” below) |
Plus four cross-source chain rules that combine the gap signals into adversary-narrative detections:
- 100420: setuid escalation followed by sensitive file read within 60 s. “An attacker just escalated and is now consuming credentials.”
- 100421: suspicious binary location followed by TCP egress within 60 s. “A binary staged in /tmp is now talking to a remote host.” (Classic dropper + C2 callback.)
- 100422: fileless payload followed by TCP egress within 60 s. “Memory-only payload is functional and making network calls.”
- 100423: kernel module load followed by bpf() syscall within 60 s. “Likely rootkit installation.” (eBPF rootkits like Symbiote and Boopkit follow this pattern: load a small kernel module, then load BPF programs to extend in-kernel reach.)
Plus two standalone detections with high real-world value:
- 100425: reverse-shell pattern in argv (bash -i, /dev/tcp/, nc -e, python -c '...socket...', socat tcp:, perl -e '...socket...'). Stock Wazuh’s command-line capture lives only in 60 s ps snapshots, which miss the fast setup phase of a reverse shell. Tetragon’s execve-time argv capture is unaffected by lifetime.
- 100426: cloud metadata service probe (169.254.169.254, metadata.google.internal, etc.). 169.254.169.254 is the link-local IMDS address shared by AWS, GCP, Azure, Oracle Cloud, and DigitalOcean (Alibaba Cloud uses 100.100.100.200 instead). Reading from it typically reveals temporary IAM credentials and instance identity, which makes it a common target in SSRF and container-escape exploits.
Total: 1 anchor rule at level 3, 8 single-event gap rules, 4 chain rules, 2 standalone detection rules. 15 rules, all in 100400–100440.
Architecture: One Compose, Two Containers
The repository ships a two-container Docker Compose stack. One docker compose up -d --build and you have:
┌─────────────────────────────────────┐
│ wazebpf-manager │
│ wazuh/wazuh-manager:4.10.0 │
│ • bind-mounts │
│ wazuh/decoders/ │
│ wazuh/rules/ │
│ • authd password enabled │
│ • alerts → /var/ossec/logs │
└────────────┬────────────────────────┘
│ 1514/tcp (events)
│ 1515/tcp (authd)
▼
┌─────────────────────────────────────┐
│ wazebpf-endpoint │
│ ubuntu:22.04 + Wazuh agent 4.10 │
│ + Tetragon v1.7.0 (privileged) │
│ │
│ • Tetragon → JSON to │
│ /var/log/lab/tetragon-raw.log │
│ • reformat sidecar (Python) → │
│ /var/log/lab/tetragon.log │
│ (syslog-shaped, hostname injected) │
│ • Wazuh agent ingests via │
│ localfile syslog log_format │
│ • 11 attack reproduction scripts │
└─────────────────────────────────────┘
Both containers come up clean from a docker compose down -v. The endpoint’s agent self-enrolls against the bundled manager via agent-auth against authd on port 1515 (password-based).
The endpoint container runs --privileged because Tetragon needs CAP_BPF + CAP_PERFMON + CAP_SYS_ADMIN + CAP_SYS_PTRACE to load and attach BPF programs. It also bind-mounts /sys/kernel/btf/vmlinux, /sys/fs/bpf, and /sys/kernel/debug from the host — Tetragon needs all three. A modern Linux (kernel ≥ 5.10 with CONFIG_DEBUG_INFO_BTF=y) ships all three by default.
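Before booting the stack it can save a debugging round to confirm the host actually exposes those three paths. The following is a minimal preflight sketch, not something the repository ships; the function names and the (5, 10) default are taken from the paragraph above or invented for illustration.

```python
import os
import re

# The three host paths the endpoint container bind-mounts.
REQUIRED_PATHS = ("/sys/kernel/btf/vmlinux", "/sys/fs/bpf", "/sys/kernel/debug")


def kernel_at_least(release: str, minimum: tuple) -> bool:
    """Compare the leading major.minor of a kernel release string to a minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def preflight(minimum=(5, 10)) -> dict:
    """Report which host prerequisites are present; read-only, changes nothing."""
    report = {path: os.path.exists(path) for path in REQUIRED_PATHS}
    report["kernel_ok"] = kernel_at_least(os.uname().release, minimum)
    return report


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {check}")
```

Run it on the Docker host, not inside a container, since the bind mounts come from the host.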
There’s no GUI lab harness here. The reproduction scripts call kernel syscalls directly via Python ctypes (memfd_create, init_module, bpf, prctl) and shell out to common Linux tools (cat, dig, curl, bash, perl). The detections target the syscall-level fingerprint of those operations, not the specific binary that drove them. A real attacker invoking execveat on a memfd produces the same evidence the lab’s 11-fileless-memfd.sh does.
Detection Vectors: Three Event Classes, One Decoder
Three event classes feed the rule pack, all from Tetragon, all through one localfile entry:
| Source | Decoder | What it sees |
|---|---|---|
| syslog /var/log/lab/tetragon.log | tetragon + tetragon-fields | Every Tetragon event line, decoded into tetragon_type (process_exec, process_kprobe, process_tracepoint) + tetragon_json (full JSON payload) |
Three event classes, one decoder. That’s the design choice the post is most opinionated about. Every TracingPolicy you add later — kprobes on new functions, LSM probes, uprobes for user-space functions, additional tracepoints — flows through the same decoder without changes. Rules narrow on tetragon_type and pull fields out of tetragon_json via PCRE2. Scaling new gap detections is therefore a rule-only edit; the decoder is “set once, forget.”
The decoder file is short:
<decoder name="tetragon">
<program_name>^tetragon$</program_name>
</decoder>
<decoder name="tetragon-fields">
<parent>tetragon</parent>
<regex type="pcre2">^type=(\S+) (\{.*\})$</regex>
<order>tetragon_type, tetragon_json</order>
</decoder>
Every <regex> uses type="pcre2" because OSSEC-flavor regex on Wazuh 4.10 chokes on the brace literals (\{, \}) inside the JSON payload capture. PCRE2 is supported on Wazuh ≥ 4.4.
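To see what the tetragon-fields expression extracts, here is a small sketch that runs the same pattern through Python's re module. That is an approximation of PCRE2 that holds for a pattern this simple; the sample message is hypothetical and heavily trimmed.

```python
import re

# Same expression as the tetragon-fields decoder; Python's re stands in for PCRE2.
DECODER = re.compile(r"^type=(\S+) (\{.*\})$")

# Message part of a wrapped line, i.e. what remains after the syslog header.
message = 'type=process_exec {"process_exec":{"process":{"binary":"/usr/bin/id"}}}'

match = DECODER.match(message)
assert match is not None
tetragon_type, tetragon_json = match.groups()
print(tetragon_type)
print(tetragon_json)
```

The first capture becomes tetragon_type and the second, the whole JSON object, becomes tetragon_json, which is why new event classes need no decoder changes.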
The reformat sidecar
Tetragon writes events to --export-filename as one JSON object per line. Wazuh’s syslog pre-decoder requires:
<Mon DD HH:MM:SS> <HOSTNAME> <PROGRAM>[<PID>]: <message>
So we need to wrap each Tetragon line in a syslog frame plus tag it with the top-level JSON event type so the Wazuh decoder can select on it without re-parsing JSON inside PCRE2:
May 05 12:39:12 wazebpf-endpoint tetragon[15]: type=process_exec {"process_exec":{"process":{...
A 60-line Python sidecar (docker/tetragon-reformat.py) tails Tetragon’s raw log and emits the wrapped lines. The first attempt was a bash + tail | awk pipe that failed silently — mawk (Ubuntu 22.04’s default awk) doesn’t honor POSIX {n} interval quantifiers without --re-interval, and stdbuf -oL doesn’t actually unbuffer awk’s writes when stdout is a regular file. The Python sidecar is straightforward and works deterministically on every distro.
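The wrapping step itself is small. This is a sketch of the idea rather than the contents of docker/tetragon-reformat.py: it assumes the first top-level JSON key names the event type, and it leaves out the tailing loop and file handling entirely. The function name is illustrative.

```python
import json
from datetime import datetime


def wrap_event(raw_line: str, hostname: str, pid: int, now: datetime) -> str:
    """Wrap one Tetragon JSON line in a syslog-style frame tagged with its event type."""
    payload = raw_line.strip()
    event = json.loads(payload)
    # Assumption: the first top-level key names the event type (process_exec, ...).
    event_type = next(iter(event), "unknown")
    stamp = now.strftime("%b %d %H:%M:%S")
    return f"{stamp} {hostname} tetragon[{pid}]: type={event_type} {payload}"


line = wrap_event(
    '{"process_exec":{"process":{"binary":"/usr/bin/id"}}}',
    hostname="wazebpf-endpoint",
    pid=15,
    now=datetime(2026, 5, 5, 12, 39, 12),
)
print(line)
```

Passing the clock in as an argument keeps the function deterministic, which makes it easy to test independently of the tail loop.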
The two TracingPolicies
The lab ships two TracingPolicy CRDs (plus Tetragon’s built-in execve sensor, which fires process_exec events with no policy needed):
tetragon/policies/
10-tcp-connect.yaml kprobe on tcp_connect with kernel-side
NotDAddr selectors that exclude loopback +
RFC1918 destinations
20-syscall-tracepoints.yaml combined tracepoints on
syscalls/sys_enter_finit_module
syscalls/sys_enter_init_module
syscalls/sys_enter_bpf
syscalls/sys_enter_prctl
The combined tracepoint policy is intentional — see §“Gotchas worth knowing” below for the empirical reason.
The 15 Rules: Every Detection That Matters
All fifteen rules live in the 100400–100440 ID range — well clear of stock Wazuh and clear of the previous packs (100200–100240 for the MCP pack, 100300–100340 for the network pack). The full file is wazuh/rules/local_rules.xml.
Anchor (100400)
Fires on every Tetragon event line. Level 3 to keep it out of analyst console queries.
<rule id="100400" level="3">
<decoded_as>tetragon</decoded_as>
<description>Tetragon event: $(tetragon_type)</description>
<group>tetragon_anchor,</group>
</rule>
Every higher-severity rule references <if_sid>100400</if_sid> plus a tetragon_type filter.
Single-event gap rules (100410–100417)
Each catches one of the eight gap classes. The most representative:
<!-- 100411 — fileless execution via memfd_create + execveat -->
<rule id="100411" level="13">
<if_sid>100400</if_sid>
<field name="tetragon_type">^process_exec$</field>
<match type="pcre2">"binary":"(memfd:|\/dev\/fd\/\d+|\/proc\/self\/fd\/\d+|\/proc\/\d+\/fd\/\d+)</match>
<!-- Suppress runc-init noise: the legitimate container runtime
does memfd-style self-replacement during boot. Real fileless
payloads will have a different parent (bash, python, sh). -->
<field name="tetragon_json" negate="yes" type="pcre2">"parent":\{[^}]*"binary":"\/usr\/bin\/runc"</field>
<description>eBPF: fileless execution via memfd_create + execveat ($(tetragon_json))</description>
<mitre><id>T1620</id></mitre>
<group>tetragon_exec,fileless,attack,</group>
</rule>
Two non-obvious things in this rule:
- The binary patterns include /dev/fd/<n> AND memfd: AND /proc/self/fd/. Tetragon’s documentation says memfd execveat shows up as memfd:<name>; empirically (Tetragon v1.7.0 on Linux 6.8) it shows up as /dev/fd/<n>. Cost me 10 minutes diagnosing “why doesn’t my fileless test fire the rule.” All forms are now covered.
- The negate filter on parent.binary = /usr/bin/runc. runc does memfd-style self-replacement during container boot to mitigate CVE-2019-5736, and that fires the rule ~80 times per container start. The negate filter drops the noise from 37 fires to 1 fire (just the genuine test). This is a real example of “raw eBPF gives high signal AND high volume; userspace narrowing is mandatory.”
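The match-plus-negate logic of rule 100411 can be exercised offline. The sketch below approximates it with Python's re module; the event shape is hypothetical and far smaller than a real Tetragon event, and the helper names are made up for the example.

```python
import json
import re

MATCH = re.compile(r'"binary":"(memfd:|/dev/fd/\d+|/proc/self/fd/\d+|/proc/\d+/fd/\d+)')
NEGATE = re.compile(r'"parent":\{[^}]*"binary":"/usr/bin/runc"')


def rule_100411_fires(tetragon_json: str) -> bool:
    """Approximate the rule: the binary pattern must match and the runc-parent pattern must not."""
    return bool(MATCH.search(tetragon_json)) and not NEGATE.search(tetragon_json)


def event(binary: str, parent: str) -> str:
    # Hypothetical, heavily trimmed event; real events carry many more fields.
    body = {"process_exec": {"process": {"binary": binary}, "parent": {"binary": parent}}}
    return json.dumps(body, separators=(",", ":"))


print(rule_100411_fires(event("/dev/fd/3", "/usr/bin/python3")))    # fileless test payload
print(rule_100411_fires(event("/proc/self/fd/6", "/usr/bin/runc")))  # container runtime boot
print(rule_100411_fires(event("/usr/bin/ls", "/usr/bin/bash")))      # ordinary exec
```

One caveat of the [^}]* idiom: it only works when the parent's binary field appears before any nested object closes, so it is worth re-checking against real events after a Tetragon upgrade.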
Setuid escalation (100413)
This is the rule I’m most happy with from the single-event set:
<rule id="100413" level="14">
<if_sid>100400</if_sid>
<field name="tetragon_type">^process_exec$</field>
<match type="pcre2">"uid":0,[^}]*"auid":[1-9]\d{0,3}[,}]</match>
<description>eBPF: setuid privilege escalation — uid=0 from user auid ($(tetragon_json))</description>
<mitre><id>T1548.001</id></mitre>
<group>tetragon_exec,privilege_escalation,attack,</group>
</rule>
Tetragon’s process_exec records both the post-execve uid AND the immutable auid (audit / login uid). A process running as uid=0 with a user-range auid (1–9999) is the classic setuid escalation moment — sudo, su, mount, etc. firing as root from a user shell. System daemons have auid=4294967295 (unset) and are excluded by the pattern.
Stock Wazuh has no equivalent: ps shows post-execve uid only and there is no auid in standard ps output. The auid field is a kernel-level thing tied to login session tracking, only really visible via auditd or eBPF.
The rule is also noisy on a normal Linux endpoint — the Wazuh agent’s own subprocesses, container-managed sudo, etc. produce ~30–50 fires per gap-suite run. Production deployments need an allow-list of trusted callers. Consider this the intended demonstration of “raw eBPF telemetry has both high signal AND high volume — userspace narrowing is mandatory.”
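The auid range in rule 100413 is easy to get wrong by one digit, so here is a quick offline check of the same pattern using Python's re module as a stand-in for PCRE2. The fragments are hypothetical, and the uid-before-auid field order is an assumption.

```python
import re

SETUID = re.compile(r'"uid":0,[^}]*"auid":[1-9]\d{0,3}[,}]')


def looks_like_setuid_escalation(fragment: str) -> bool:
    """True when uid is 0 and auid is in the 1..9999 range the pattern accepts."""
    return SETUID.search(fragment) is not None


cases = {
    '{"pid":4242,"uid":0,"auid":1000,"binary":"/usr/bin/sudo"}': "user shell via sudo",
    '{"pid":812,"uid":0,"auid":4294967295,"binary":"/usr/sbin/cron"}': "system daemon",
    '{"pid":4300,"uid":1000,"auid":1000,"binary":"/usr/bin/id"}': "unprivileged exec",
    '{"pid":4400,"uid":0,"auid":10000,"binary":"/usr/bin/su"}': "auid above 9999",
}
for fragment, label in cases.items():
    print(f"{label}: {looks_like_setuid_escalation(fragment)}")
```

The last case is the one to keep in mind: sites that allocate login uids at 10000 and above would need a wider quantifier than \d{0,3}.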
TCP connect via kprobe (100414)
Backed by the lab-tcp-connect TracingPolicy:
spec:
  kprobes:
  - call: "tcp_connect"
    syscall: false
    args:
    - {index: 0, type: "sock"}
    selectors:
    - matchArgs:
      - {index: 0, operator: "DAddr", values: ["0.0.0.0/0"]}
      - {index: 0, operator: "NotDAddr", values: ["127.0.0.0/8","10.0.0.0/8","172.16.0.0/12","192.168.0.0/16","169.254.0.0/16"]}
The kernel-side NotDAddr selectors filter loopback, RFC1918, and link-local (169.254.0.0/16) destinations BEFORE the event is emitted. This matters at scale: in-kernel filtering means events the rules will never want never become events at all. The previous network-shadow-AI lab’s ss -tnp watcher polled every 3–5 seconds, which missed any flow that completed inside one tick. A kprobe fires at the SYN, so even a 50 ms flow is captured.
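To reason about which destinations survive that selector, the exclusion list can be mirrored in user space. This is only a model of the filter's effect using Python's ipaddress module (IPv4 only), not how Tetragon evaluates it in the kernel; the function name is illustrative.

```python
import ipaddress

# The same CIDRs as the NotDAddr selector in the policy.
EXCLUDED = [
    ipaddress.ip_network(cidr)
    for cidr in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
                 "192.168.0.0/16", "169.254.0.0/16")
]


def would_emit(dst: str) -> bool:
    """True if a tcp_connect to dst is outside every excluded network."""
    addr = ipaddress.ip_address(dst)
    return not any(addr in net for net in EXCLUDED)


for dst in ("8.8.8.8", "10.1.2.3", "172.31.255.255", "172.32.0.1", "169.254.169.254"):
    print(dst, would_emit(dst))
```

One side effect worth noticing: as written, the exclusion list also drops 169.254.169.254, so connects to the metadata service would not surface through this kprobe.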
The Wazuh rule itself is trivial — the decoder pulls tetragon_type + tetragon_json, the rule narrows on tetragon_type=process_kprobe plus a function_name literal:
<rule id="100414" level="11">
<if_sid>100400</if_sid>
<field name="tetragon_type">^process_kprobe$</field>
<match type="pcre2">"function_name":"tcp_connect"</match>
<description>eBPF: TCP connect to non-internal destination ($(tetragon_json))</description>
<mitre><id>T1071.001</id></mitre>
<group>tetragon_kprobe,egress,attack,</group>
</rule>
Chain rules (100420, 100421, 100422, 100423)
These are the headline detections. The fileless → egress chain:
<rule id="100422" level="14" timeframe="60">
<if_matched_sid>100411</if_matched_sid>
<if_sid>100414</if_sid>
<description>CHAIN: fileless payload followed by TCP egress within 60s ($(tetragon_json))</description>
<mitre>
<id>T1620</id>
<id>T1071.001</id>
</mitre>
<group>tetragon_chain,fileless_callback,attack,</group>
</rule>
A process_exec with binary=/dev/fd/<n> (rule 100411) fires. Within 60 seconds, a process_kprobe for tcp_connect to a non-internal IP (rule 100414) fires. The chain (100422) fires on top of the second event with level 14.
The pattern that matters: <if_matched_sid> works correctly even when the precondition rule is itself superseded in alerts.json by a higher-severity sibling. Verified empirically — the lab’s chain rules continue to fire on top of preconditions whose alerts are hidden by other chains.
The most impressive chain is 100423 (kernel module + bpf within 60s, level 15), the rootkit-install pattern. eBPF rootkits — Symbiote, Boopkit, the various academic “in-kernel persistence” PoCs — all follow the same shape: load a small kernel module, then load eBPF programs to extend in-kernel reach. Catching that two-event sequence with no other signal is a very strong indicator of compromise.
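The correlation idea behind these chains can be modeled in a few lines. The sketch below is a simplified approximation of a "precondition seen within N seconds" window, not Wazuh's actual if_matched_sid implementation, which has additional semantics this ignores. The function name and the sample stream are made up.

```python
from collections import deque


def chain_hits(events, precondition_sid, trigger_sid, timeframe_s=60):
    """Yield trigger events preceded by the precondition within timeframe_s seconds.

    events: iterable of (timestamp_seconds, sid) tuples in time order.
    """
    recent = deque()  # timestamps of precondition matches still inside the window
    for ts, sid in events:
        while recent and ts - recent[0] > timeframe_s:
            recent.popleft()
        if sid == precondition_sid:
            recent.append(ts)
        elif sid == trigger_sid and recent:
            yield ts, sid


# Fileless exec at t=0, then three tcp_connect events at t=12, t=90, t=200.
stream = [(0, 100411), (12, 100414), (90, 100414), (200, 100414)]
print(list(chain_hits(stream, 100411, 100414)))
```

Only the connect at t=12 falls inside the 60 s window, so it is the only one that would be promoted to the chain alert in this model.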
Reverse-shell pattern (100425)
<rule id="100425" level="14">
<if_sid>100400</if_sid>
<field name="tetragon_type">^process_exec$</field>
<match type="pcre2">"arguments":".*?(bash -i|sh -i|\/dev\/tcp\/|nc -e |ncat -e |socat (tcp|exec):|python3? -c .*?socket|perl -e .*?socket|ruby -rsocket -e)</match>
<field name="tetragon_json" negate="yes" type="pcre2">\/dev\/tcp\/127\.0\.0\.1\/</field>
<description>eBPF: reverse-shell command pattern in process argv ($(tetragon_json))</description>
<mitre><id>T1059.004</id><id>T1071.001</id></mitre>
<group>tetragon_exec,reverse_shell,attack,</group>
</rule>
Three small lessons in this rule that are worth more than the rule itself:
- [^"]* is fragile in PCRE2 against JSON-encoded fields when the field value contains escaped quotes (\"). The Wazuh agent’s modulesd uses bash -c ':> /dev/tcp/127.0.0.1/<port>' for localhost connectivity probes. The " in the bash -c argv JSON-encodes to \", and the [^"]* match stops at that literal quote character. Switched to .*? (non-greedy any-char), which matches across the escaped quote. This was a 10-minute debug.
- The Wazuh agent’s own bash -c ':> /dev/tcp/...' localhost probe matches the rule. That’s a true positive on the technical pattern but operationally noisy. The negate filter drops localhost destinations, because real reverse shells target REMOTE hosts, not loopback.
- Anchored regex on (^|\.)openai\.com$-style patterns is the lesson the previous network post taught me. Apply it everywhere to avoid fakeopenai.com-style false positives.
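Both regex lessons are easy to reproduce offline. The sketch below uses Python's re module as an approximation of PCRE2; the argv string and the port 1514 are illustrative values, not captured events.

```python
import json
import re

# JSON-encode an argv string that contains double quotes.
encoded = json.dumps({"arguments": '-c ":> /dev/tcp/127.0.0.1/1514"'},
                     separators=(",", ":"))
print(encoded)

fragile = re.compile(r'"arguments":"[^"]*/dev/tcp/')  # stops at the escaped quote
robust = re.compile(r'"arguments":".*?/dev/tcp/')     # crosses the escaped quote

print("fragile matches:", bool(fragile.search(encoded)))
print("robust matches: ", bool(robust.search(encoded)))

# Anchored vs unanchored domain matching.
anchored = re.compile(r"(^|\.)openai\.com$")
unanchored = re.compile(r"openai\.com")
for host in ("api.openai.com", "fakeopenai.com"):
    print(host, bool(anchored.search(host)), bool(unanchored.search(host)))
```

The trade-off with .*? is that it can also run past the end of the arguments field into later fields, so it favors recall over strictness.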
Live Validation: 13238 Anchor Events → 114 Paging Alerts Across 10 Rule IDs
The lab ships a before/after harness (scripts/before-after-demo.sh) that runs the gap suite once and partitions the resulting alerts.json by rule level — anchor noise vs paging-grade rule pack output. Output on a clean lab boot:
====================================================================
BEFORE — what stock Wazuh + raw Tetragon stream surfaces
(only the level-3 anchor rule 100400)
====================================================================
level-3 anchor (100400) fires: 13238
paging-grade rule IDs: 0
→ analyst signal-to-noise: 0 useful alerts in 13238 events
====================================================================
AFTER — what the custom 15-rule pack adds on top
====================================================================
paging-grade alerts (level >= 10):
100410 L11 2 fires
100411 L13 4 fires
100413 L14 36 fires
100414 L11 1 fires
100415 L13 4 fires
100420 L14 10 fires
100422 L14 16 fires
100423 L15 8 fires
100425 L14 30 fires
100426 L12 3 fires
distinct paging-grade rule IDs: 10
total paging-grade alerts: 114
====================================================================
DELTA
====================================================================
Stock Wazuh + Tetragon raw stream: 13238 anchor events, 0 paging detections
+ this rule pack: 114 paging-grade alerts across 10 rule IDs
The single anchor rule (100400) absorbs the level-3 noise; the 14 narrower rules surface 114 paging-grade alerts across 10 distinct IDs. Four rules account for 64 of those — the three chain rules that fired (100420/100422/100423) plus the standalone reverse-shell pattern (100425) — because each represents an adversary-narrative pattern, not a single signal.
The IDs that don’t appear in the AFTER count (100412, 100416, 100417, 100421) either match internally and are superseded by their higher-severity sibling chains in alerts.json (100412 → 100420, 100416 → 100423), or depend on conditions this particular run didn’t hit (100421 needs a /tmp binary AND egress within 60 s; 100417 is the prctl rule constrained by the Tetragon tracepoint limitation in §“Gotchas worth knowing” #6). Superseded behavior is intentional: chain rules ARE higher-priority versions of their preconditions.
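The partition the harness reports can be approximated in a few lines. This is a sketch of the counting logic only, not the contents of scripts/before-after-demo.sh; it assumes each alert is a parsed alerts.json object with rule.id and rule.level, and the function name and sample alerts are hypothetical.

```python
from collections import Counter


def partition_alerts(alerts, low=100400, high=100440, paging_level=10):
    """Split rule-pack alerts into anchor noise and paging-grade fires per rule ID."""
    anchor = 0
    paging = Counter()
    for alert in alerts:
        rule = alert.get("rule", {})
        rule_id, level = int(rule.get("id", 0)), int(rule.get("level", 0))
        if not low <= rule_id <= high:
            continue  # not part of this rule pack
        if level >= paging_level:
            paging[rule_id] += 1
        elif rule_id == low:
            anchor += 1
    return anchor, paging


sample = [
    {"rule": {"id": "100400", "level": 3}},
    {"rule": {"id": "100400", "level": 3}},
    {"rule": {"id": "100423", "level": 15}},
    {"rule": {"id": "100413", "level": 14}},
    {"rule": {"id": "5715", "level": 3}},  # unrelated stock rule, ignored
]
anchor_count, paging_counts = partition_alerts(sample)
print(anchor_count, dict(paging_counts))
```

To run it against a real manager, feed it json.loads(line) for each line of alerts.json.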
A representative 100423 alert (the rootkit-install chain):
{
"timestamp": "2026-05-05T...",
"rule": {
"level": 15,
"description": "CHAIN: kernel module load followed by bpf() syscall within 60s — likely rootkit install (...)",
"id": "100423",
"mitre": {
"id": ["T1547.006", "T1014"],
"tactic": ["Persistence", "Defense Evasion"],
"technique": ["Kernel Modules and Extensions", "Rootkit"]
},
"groups": ["ebpf", "tetragon", "tetragon_chain", "rootkit_install", "attack"]
},
"agent": {"id": "001", "name": "wazebpf-endpoint"},
"decoder": {"name": "tetragon"},
"data": {
"tetragon_type": "process_tracepoint",
"tetragon_json": "{...sys_enter_bpf event...}"
},
"location": "/var/log/lab/tetragon.log"
}
Compatibility matrix
The lab tests the integration on four mainstream Linux distributions. All four pass the gap suite end-to-end with paging-rule-ID counts of 9 / 10 / 11 / 12. (Rising count is cumulative manager state across runs, not per-distro capacity — restart the manager between variants for clean per-distro counts.)
| Variant | Base image | Build | Agent active | libc | Python | Paging IDs | Paging alerts |
|---|---|---|---|---|---|---|---|
| ubuntu-22.04 | ubuntu:22.04 | ✅ | ✅ | glibc 2.35 | 3.10.12 | 9 | 111 |
| ubuntu-24.04 | ubuntu:24.04 | ✅ | ✅ | glibc 2.39 | 3.12.3 | 10 | 234 |
| debian-12 | debian:bookworm-slim | ✅ | ✅ | glibc 2.36 | 3.11.2 | 11 | 326 |
| alma-9 | almalinux:9 | ✅ | ✅ | glibc 2.34 | 3.9.25 | 12 | 428 |
The matrix lives at compat/test-matrix.sh + compat/Dockerfile.{deb,rhel}. Per-variant results land at compat/results/<variant>.{json,log}.
Honest scope: the matrix tests distro userspace — the Wazuh agent + Tetragon binary + reformat sidecar across distros. Multi-kernel-VERSION compatibility is documented from Tetragon’s own compatibility surface, not exhaustively re-tested per kernel version (we run on a single host kernel; multi-kernel testing requires VM rotation). The kernel feature compatibility table is in report/COMPAT.md.
The kernel-feature short version: every TracingPolicy in this lab works on Linux ≥ 4.18 with BTF. RHEL 8 / AlmaLinux 8 (kernel 4.18 with optional BTF) needs kernel-debuginfo installed for Tetragon to extract BTF; everything else (Ubuntu 22.04 / 24.04, Debian 11/12, RHEL 9 family) works out of the box.
Resource benchmark
After ~8 minutes of the lab running 00-fire-all-gaps.sh plus the full compat/test-matrix.sh:
| Metric | Value |
|---|---|
| Total events captured | 1,593,401 (1.59M) |
| Sustained throughput | ~3,300 events/sec average |
| Endpoint container memory | 1.3 GiB RSS at sustained load |
| Endpoint container CPU | 20–35% of one core |
| Manager CPU | 91% of one core (analysisd is the hot loop) |
| Manager memory | 547 MiB |
| Tetragon’s own on-disk footprint | ~60 MiB (5 rotated 10MB files + current) |
Three operational truths the benchmark exposes:
- Tetragon at “idle” on a busy host still produces a steady event stream. Even with no triggers running, the BPF hooks fire ~216 events/sec just from the kernel doing kernel things (Wazuh agent’s own subprocesses, container init, etc.). BPF runs in kernel space and is PID-namespace-blind: it sees ALL processes on the host, not just the container’s. This is a feature for monitoring; it’s also what makes in-kernel filtering via TracingPolicy selectors mandatory at scale.
- The reformat sidecar has no log rotation. After 8 minutes our test had 2.2 GiB in the single /var/log/lab/tetragon.log file. Over 24 hours of similar load that would be ~400 GiB. Production action: wire logrotate against it (postrotate: kill -HUP the sidecar so it reopens the file), OR replace the file sink with a unix socket / pipe straight to the Wazuh agent. This is the single most important production-tuning step before deploying the lab’s pattern at scale.
- The manager is the bottleneck at scale. At 3,300 events/sec the manager’s analysisd sits at 91% of one core. Wazuh manager 4.10’s analysisd is single-threaded per worker; horizontal scaling needs a cluster. For a single-manager deployment, the sustainable event rate from this integration is somewhere between 5k and 10k events/sec depending on rule complexity.
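The headline figures above follow directly from the raw counts. Here is the arithmetic as a short script; it treats "~8 minutes" as exactly 480 seconds and assumes the 2.2 GiB log corresponds to the same 1.59M events, so the results are approximate.

```python
events_captured = 1_593_401
window_s = 8 * 60   # "~8 minutes", treated as exactly 480 s
log_gib = 2.2       # size of /var/log/lab/tetragon.log after that window

events_per_s = events_captured / window_s
bytes_per_event = log_gib * 2**30 / events_captured
gib_per_day = log_gib * (24 * 60) / 8

print(f"{events_per_s:,.0f} events/s")
print(f"{bytes_per_event:,.0f} bytes per wrapped event")
print(f"{gib_per_day:,.0f} GiB per day")
```

The per-event size of roughly 1.5 KB is the number to watch: trimming fields in the sidecar before writing would cut disk usage proportionally.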
Gotchas worth knowing
Eight things that cost me real time during the build. Documenting them so you don’t pay the same tax.
1. Tetragon’s BPF objects ship at /var/lib/tetragon/bpf/, NOT /var/lib/tetragon/. Pass --bpf-lib /var/lib/tetragon/bpf explicitly. The default path doesn’t search the bpf/ subdir.
2. Tetragon resolves memfd-backed execveat to /dev/fd/<n>, NOT the documented memfd:<name>. The fileless rule 100411 needs to match both forms.
3. runc fires /proc/self/fd/6 init 80+ times during container startup (legitimate CVE-2019-5736 self-replacement). Negate-on-parent=runc dropped 100411 noise from 37 → 1.
4. Tetragon BPF hooks see ALL host processes regardless of PID namespace. Even with --export-allowlist set narrow, the lab observed 5,594 host date execs in one run from the host’s cron jobs. Feature, not bug, but it explains the firehose volume; in-kernel filtering via TracingPolicy selectors is mandatory for production.
5. Tetragon v1.7.0 generic_tracepoint sensor does NOT multiplex across separate TracingPolicy CRDs. Loading three separate policies (kmod, bpf, prctl) results in only the last loaded firing. Empirical fix: combine all syscall tracepoints into ONE TracingPolicy file. The lab’s tetragon/policies/20-syscall-tracepoints.yaml does exactly this.
6. Tetragon v1.7.0 tracepoint args at index: 5, type: int for sys_enter_prctl always return int_arg: 0 regardless of the actual option argument. Couldn’t narrow PR_SET_NAME (option=15) at the eBPF layer; rule 100417 fires noisily on any prctl call from non-system callers. Production fix: kprobe on __x64_sys_prctl to read register-passed args directly. Reserved as a follow-up.
7. Tetragon’s pidfile at /var/run/tetragon/tetragon.pid survives docker compose restart because compose preserves the writable layer. Entrypoint must rm -f it before relaunching Tetragon. Cost me 15 minutes diagnosing “why is the new policy not loading.”
8. Wazuh <directories> paths must be on the same line as the opening tag. A multi-line <directories> body silently registers as an empty path (visible as Monitoring path: '' in ossec.log) and the entire FIM block does nothing. Carried over from the previous lab; bit me again here.
Plus three more from the multi-distro matrix:
- AlmaLinux 9 base image has tini only via EPEL (not base repos) and curl-minimal (not curl) pre-installed with a baseos conflict against the full curl package. Both fixed in compat/Dockerfile.rhel.
- Dockerfile RUN doesn’t compose with heredoc + \ line continuations. The heredoc terminator gets shell-glommed into the same logical command. Use printf '...\n...' for multi-line config writes.
- [^"]* is fragile in PCRE2 against JSON-encoded fields with embedded escaped quotes. Use non-greedy .*? for argv-pattern matches.
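Returning to the prctl item in the list above: for readers who have not seen PR_SET_NAME in action, this is roughly what the self-rename looks like from Python via ctypes. It is a sketch of the technique, not the lab's reproduction script; the function name and the "not-python" label are made up, and it returns None on non-Linux platforms.

```python
import ctypes
import sys
from typing import Optional

PR_SET_NAME = 15  # option value from <linux/prctl.h>


def rename_self(new_name: bytes) -> Optional[str]:
    """Set the calling thread's comm via prctl(PR_SET_NAME); None when not on Linux."""
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_NAME, ctypes.c_char_p(new_name), 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_NAME) failed")
    # /proc/self/comm reflects the main thread, which is the caller in a plain script.
    with open("/proc/self/comm") as handle:
        return handle.read().strip()


print(rename_self(b"not-python"))
```

The kernel truncates comm to 15 bytes, and the option number (15) is exactly the argument the tracepoint could not expose, which is why rule 100417 cannot tell a rename apart from any other prctl call.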
Reproduction
git clone https://github.com/nadimjsaliby/wazuh-ebpf-tetragon.git
cd wazuh-ebpf-tetragon
docker compose down -v # clean baseline
docker compose up -d --build # boots manager + endpoint, agent auto-enrolls
# wait for the agent to show Active on the manager
until docker exec wazebpf-manager /var/ossec/bin/agent_control -l \
| grep -q 'wazebpf-endpoint.*Active'; do sleep 2; done
# fire every gap reproduction
docker exec -u labuser -it wazebpf-endpoint bash /opt/lab/scripts/00-fire-all-gaps.sh
# observe alerts
docker exec wazebpf-manager bash -c \
"tail -F /var/ossec/logs/alerts/alerts.json" \
| jq -c 'select(.rule.id|tonumber>=100400 and tonumber<=100499)'
# run the before/after harness
bash scripts/before-after-demo.sh
# (optional) run the multi-distro matrix
bash compat/test-matrix.sh
# (optional) deploy the kind addon
cd kind && bash bootstrap.sh
To deploy the rules against an existing Wazuh manager:
scp wazuh/decoders/local_decoder.xml manager:/var/ossec/etc/decoders/
scp wazuh/rules/local_rules.xml manager:/var/ossec/etc/rules/
ssh manager '/var/ossec/bin/wazuh-control restart'
You’ll also need to ship Tetragon (or a compatible eBPF event source) to each agent endpoint. See kind/README.md for the Kubernetes pattern (Tetragon + reformat sidecar + Wazuh agent in one DaemonSet, sharing emptyDir for the event flow).
Why this is the right shape
I picked Wazuh as the engine for the same reasons I picked it for the previous two posts: it’s the only open-source platform that combines ingestion, decoding, rule correlation, and cross-source <if_matched_sid> chaining in a single agent + manager stack with no separate stream-processing layer.
The eBPF angle is what’s new here. Tetragon’s TracingPolicy DSL keeps the kernel programs declarative, version-controlled, and easy to read. Its JSON-line export is exactly the shape Wazuh’s localfile syslog log_format wants once you wrap it. The Wazuh manager’s existing rule engine handles correlation, and its alert pipeline produces the same alerts.json your existing dashboards already consume.
What this isn’t: a rewrite of Wazuh. Wazuh still handles FIM, SCA, vuln-detection, log analysis, and rule correlation. The eBPF integration adds a single new, in-kernel telemetry source without disturbing the rest. Drop the rule pack into your existing manager, deploy the Tetragon sidecar on a few endpoints, and you have kernel-level visibility for that subset without touching anything else.
This is the third post in a series. The first (rogue MCP / shadow AI agents) covers the IDE side of agentic coding. The second (network-side shadow AI) covers DNS + egress + browser extensions. This third one closes the kernel-level gap that both prior posts had to work around. Together they cover the agentic-AI surface area, the network surface area, and the kernel surface area on developer endpoints — three rule packs, one Wazuh manager, one set of dashboards.
The next post in the series will cover Wazuh against a live C2 framework (Sliver) with the full kill chain and what each layer of detection here catches and misses. That’s the credibility bridge to the rest of detection-engineering work — adversary realism instead of “polite simulator” labs.
Resources
- Repository + lab: nadimjsaliby/wazuh-ebpf-tetragon (clone and docker compose up)
- Companion endpoint pack: nadimjsaliby/wazuh-shadow-ai (rogue MCP / IDE-side shadow AI)
- Companion network pack: nadimjsaliby/wazuh-network-shadow-ai (DNS + egress + browser extensions)
- Tetragon documentation: Getting Started with Docker
- Tetragon documentation — TracingPolicy reference
- Tetragon documentation — Linux kernel compatibility
- Cilium project on GitHub
- Wazuh Documentation — Custom Rules and Decoders
- Wazuh Documentation — Localfile log collection
- eBPF.io — What is eBPF?
- Linux kernel BPF documentation
- MITRE ATT&CK — T1014 Rootkit
- MITRE ATT&CK — T1059.004 Unix Shell
- MITRE ATT&CK — T1071.001 Application Layer Protocol: Web Protocols
- MITRE ATT&CK — T1547.006 Boot or Logon Autostart Execution: Kernel Modules and Extensions
- MITRE ATT&CK — T1548.001 Abuse Elevation Control Mechanism: Setuid and Setgid
- MITRE ATT&CK — T1552.005 Unsecured Credentials: Cloud Instance Metadata API
- MITRE ATT&CK — T1620 Reflective Code Loading
- Symbiote: A New, Nearly-Impossible-to-Detect Linux Threat — eBPF rootkit case study
About this contribution
This post is contributed under the Wazuh Ambassadors program, an open-community track for independent practitioners who build with Wazuh — the open-source SIEM + XDR platform whose ingestion, decoding, and <if_matched_sid> chain-rule engine underpin every detection in this rule pack. The lab, rules, decoder, TracingPolicies, attack scripts, compatibility matrix, and benchmarks are all open-source and reproducible in one docker compose up.