Wazuh on K8s: 7 Frameworks, Auto-Remediation, One Chart
Most organizations running Wazuh on Kubernetes are stitching together five separate tools to get compliance coverage that still has gaps. One tool scans CIS benchmarks. Another handles admission policies. A third runs vulnerability checks. Remediation is manual. Reports are spreadsheets. None of them share context, and when an auditor asks whether a single kubelet misconfiguration also fails your NIST, PCI, and HIPAA controls — nobody can answer without hours of cross-referencing.
I built a single Helm chart that replaces that entire stack. 167 checks across 7 compliance frameworks. Admission webhook enforcement. Automated remediation. Runtime threat detection mapped to MITRE ATT&CK. Prometheus metrics. Grafana dashboards. Audit-ready compliance reports. One helm install, full lifecycle coverage.
This post is a technical deep-dive into the architecture, the cross-framework mapping that makes it work, the runtime threat detection engine, and every enterprise feature under the hood.
The Problem With Current Approaches
Here’s what most teams are running today:
- A Wazuh or Falco DaemonSet for detection
- OPA/Gatekeeper or Kyverno for admission policies
- A separate CIS scanner (kube-bench) as a CronJob
- Manual remediation or Ansible playbooks triggered by humans
- Compliance reports generated in spreadsheets by hand
These tools don’t share context. A CIS check that fails on the kubelet doesn’t automatically map to the NIST 800-53 control it satisfies. The admission webhook doesn’t know what the SCA scanner found. The remediation is always manual.
The result: compliance drift, audit fatigue, and a false sense of security.
Architecture: One Chart, Full Lifecycle
The chart runs the Wazuh agent as a DaemonSet on every node and organizes everything around it into four layers:
              ┌─────────────────────────┐
Deploy ──────►│  Admission Webhook      │ ◄── PREVENT
              │  Block before it runs   │
              └───────────┬─────────────┘
                          │
              ┌───────────▼─────────────┐
Runtime ─────►│  Wazuh Agent DaemonSet  │ ◄── DETECT
              │  SCA + FIM + Vuln + RT  │
              └───────────┬─────────────┘
                          │
              ┌───────────▼─────────────┐
CronJob ─────►│  Auto-Remediation       │ ◄── FIX
              │  File perms, kernel,    │
              │  SSH, auditd, modules   │
              └───────────┬─────────────┘
                          │
              ┌───────────▼─────────────┐
Scheduled ───►│  Compliance Reports     │ ◄── PROVE
              │  JSON / HTML / CSV      │
              │  S3 upload + email      │
              └─────────────────────────┘
Prevent — A ValidatingWebhookConfiguration intercepts every pod, deployment, statefulset, daemonset, job, and cronjob at admission time. It blocks privileged containers, host namespace access, privilege escalation, :latest tags, missing required labels, and unauthorized registries. It does this before the workload ever touches a node.
Detect — Wazuh agents run SCA (Security Configuration Assessment) scans against seven policy files simultaneously. Each policy file is a set of checks written in Wazuh’s SCA YAML format, with compliance cross-references baked into every check.
Fix — A CronJob runs every 6 hours (configurable) and remediates findings automatically: file permissions, kernel sysctl parameters, SSH hardening, unused kernel modules, and auditd rules. It starts in dry-run mode by default — it logs what it would fix without touching anything.
Prove — A weekly CronJob generates compliance reports in JSON, HTML, and CSV. It can upload directly to S3 or email stakeholders. The HTML report is audit-ready with framework breakdowns and pass/fail summaries.
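To make the reporting step concrete, here is a minimal Python sketch of how per-check results could be rolled up into a per-framework summary and written out as JSON and CSV. The function names and the `(framework, passed)` input shape are assumptions made for illustration, not the chart's actual report generator.

```python
import csv
import io
import json
from collections import Counter


def summarize(results):
    """Roll up (framework, passed) tuples into per-framework pass/fail counts."""
    counts = {}
    for framework, passed in results:
        entry = counts.setdefault(framework, Counter())
        entry["passed" if passed else "failed"] += 1
    return [
        {"framework": fw, "passed": c["passed"], "failed": c["failed"]}
        for fw, c in sorted(counts.items())
    ]


def to_json(summary):
    """Serialize the summary as pretty-printed JSON."""
    return json.dumps(summary, indent=2)


def to_csv(summary):
    """Serialize the summary as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["framework", "passed", "failed"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(summary)
    return buf.getvalue()


# Hypothetical scan results, for demonstration only.
summary = summarize([("PCI-DSS", True), ("PCI-DSS", False), ("HIPAA", True)])
print(to_csv(summary))
```

An HTML report could be rendered from the same summary structure, so the three formats would stay consistent with each other.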
Seven Frameworks, One Scan
Here’s what a single SCA scan evaluates:
| Framework | Controls | What It Checks |
|---|---|---|
| CIS Kubernetes v1.8.0 | 31 (L1 + L2) | API server config, kubelet hardening, etcd security, RBAC, network policies |
| CIS Linux v2.0.0 | 36 (L1 + L2) | Filesystem, network params, SSH, logging, file permissions, password policy |
| NIST 800-53 Rev5 | 24 | AC, AU, CM, IA, SC, SI control families mapped to K8s and OS checks |
| PCI-DSS v4.0 | 20 | Network segmentation, encryption at rest/transit, access control, FIM, audit trails |
| HIPAA §164.312 | 16 | Access control, audit controls, integrity, authentication, transmission security |
| SOC2 Type II | 18 | CC6-CC8 trust criteria, availability, change management |
| Runtime Threats | 22 | MITRE ATT&CK mapped: cryptomining, container escape, reverse shells, persistence |
Total: 167 checks per scan cycle.
The critical insight is cross-framework mapping. Take kubelet anonymous authentication as an example:
- id: 50400
  title: "IA-2: kubelet anonymous auth disabled"
  compliance:
    - nist_800_53: ["IA-2"]
    - cis: ["4.2.1"]
This single check satisfies:
- CIS Kubernetes 4.2.1 (Worker Node — Kubelet)
- NIST 800-53 IA-2 (Identification and Authentication)
- PCI-DSS 2.2.1 (Secure default configurations)
- HIPAA 164.312(a)(2)(i) (Unique user identification)
- SOC2 CC6.1 (Logical access security)
One finding, five frameworks addressed. That’s the kind of efficiency that auditors and compliance teams actually need, and that no single open-source tool provides out of the box.
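The payoff of the mapping is easiest to see from the auditor's side: given a failed check, which controls are affected in each framework? The Python sketch below shows one way that lookup could work. The control IDs for check 50400 come from the list above; the dictionary layout and the function name are illustrative assumptions, not the chart's internal schema.

```python
from collections import defaultdict

# Control IDs for check 50400 are taken from the list above. The dict layout
# is an assumption made for this sketch.
CHECK_MAPPINGS = {
    50400: {
        "title": "kubelet anonymous auth disabled",
        "controls": {
            "CIS Kubernetes": ["4.2.1"],
            "NIST 800-53": ["IA-2"],
            "PCI-DSS": ["2.2.1"],
            "HIPAA": ["164.312(a)(2)(i)"],
            "SOC2": ["CC6.1"],
        },
    },
}


def failed_controls_by_framework(failed_check_ids):
    """Group the controls affected by failed checks under each framework."""
    affected = defaultdict(set)
    for check_id in failed_check_ids:
        mapping = CHECK_MAPPINGS.get(check_id)
        if mapping is None:
            continue  # unknown check IDs are skipped in this sketch
        for framework, controls in mapping["controls"].items():
            affected[framework].update(controls)
    return {fw: sorted(ctrls) for fw, ctrls in affected.items()}


report = failed_controls_by_framework([50400])
for framework, controls in report.items():
    print(f"{framework}: {', '.join(controls)}")
```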
Runtime Threat Detection: MITRE ATT&CK Mapped
This is the part that goes beyond compliance into active threat hunting. The runtime policy file checks for indicators of compromise that map directly to MITRE ATT&CK techniques:
Cryptomining (T1496)
- id: 90100
  title: "Cryptominer process detection"
  condition: none
  rules:
    - "p:xmrig"
    - "p:minerd"
    - "p:cpuminer"
    - "p:ethminer"
    - "p:cgminer"
    - "p:nbminer"
    - "p:t-rex"
    - "p:gminer"
This doesn’t stop at xmrig. The full check covers 11 known miners (the excerpt above lists eight), stratum protocol connections on common mining pool ports (3333, 4444, 8333, 14444, 45700), and processes consuming more than 90% CPU as a behavioral indicator.
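As a rough illustration of how those three signals combine, here is a Python sketch that scores a single process observation. The process names and ports are the ones quoted above; the function, its inputs, and the indicator labels are hypothetical and do not reflect how Wazuh's SCA engine evaluates rules internally.

```python
# Names and ports are taken from the text above; everything else is assumed.
MINER_PROCESS_NAMES = {
    "xmrig", "minerd", "cpuminer", "ethminer",
    "cgminer", "nbminer", "t-rex", "gminer",
}
MINING_POOL_PORTS = {3333, 4444, 8333, 14444, 45700}
CPU_THRESHOLD_PERCENT = 90.0


def miner_indicators(process_name, remote_ports, cpu_percent):
    """Return the list of cryptomining indicators one observation triggers."""
    hits = []
    if process_name.lower() in MINER_PROCESS_NAMES:
        hits.append("known-miner-name")
    if MINING_POOL_PORTS & set(remote_ports):
        hits.append("mining-pool-port")
    if cpu_percent > CPU_THRESHOLD_PERCENT:
        hits.append("high-cpu")
    return hits


print(miner_indicators("xmrig", [3333], 95.0))
print(miner_indicators("nginx", [443], 5.0))
```

High CPU on its own is a weak signal, which is why it makes sense to treat it as one indicator among several instead of a standalone verdict.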
Container Escape (T1611)
- id: 90201
  title: "Container escape — Host mount abuse"
  condition: none
  rules:
    - "c:mount -> r:docker.sock"
    - "c:mount -> r:containerd.sock"
Detects containers mounting the Docker or containerd socket (one of the most common container escape vectors), nsenter usage, cgroup release_agent abuse (CVE-2022-0492 style), and privileged containers that are already running.
Reverse Shells (T1059)
Detects shell processes with network socket redirections, ncat/nc with -e flags, and socat TCP connections. These are the exact patterns you’d see in a real post-exploitation scenario.
Credential Harvesting (T1552)
Checks for processes reading Kubernetes ServiceAccount tokens from /proc, connections to the cloud metadata endpoint (169.254.169.254), and SSH private key scanning across /home.
Persistence (T1053, T1543, T1554)
Detects cron jobs, system binaries, and systemd service files that were created or modified within the last 60 minutes, a strong indicator of an active intrusion.
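The 60-minute window boils down to a modification-time comparison. A minimal Python sketch of that idea follows, assuming a plain directory walk; the function name and parameters are illustrative, and the chart's own checks are expressed as SCA rules, not Python.

```python
import os
import time


def recently_modified(directory, window_minutes=60, now=None):
    """Return non-directory paths under `directory` modified within the window."""
    now = time.time() if now is None else now
    cutoff = now - window_minutes * 60
    hits = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= cutoff:
                hits.append(path)
    return sorted(hits)
```

Pointed at a directory such as /etc/systemd/system or /etc/cron.d, a non-empty result would be the cue to investigate. Modification times can be forged by an attacker, so this is a heuristic and not proof either way.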
The Admission Webhook: Shift-Left Enforcement
Detection is reactive. The admission webhook is proactive — it prevents non-compliant workloads from ever running.
It’s deployed as a separate HA deployment (default 2 replicas with topology spread constraints) with its own ServiceAccount, RBAC, NetworkPolicy, PDB, and cert-manager TLS certificate.
The policy engine evaluates 13 rules, driven by settings like these:
{
  "blockPrivileged": true,
  "blockHostNetwork": true,
  "blockHostPID": true,
  "blockHostIPC": true,
  "requireRunAsNonRoot": true,
  "blockPrivilegeEscalation": true,
  "blockLatestTag": true,
  "requireImageDigest": false,
  "requiredLabels": ["app.kubernetes.io/name", "app.kubernetes.io/version"],
  "blockedImageRegistries": [],
  "allowedImageRegistries": []
}
The webhook itself is self-hardened:
- Non-root (runs as UID 65534)
- Read-only root filesystem
- All capabilities dropped
- Seccomp RuntimeDefault
- NetworkPolicy restricting traffic to only the API server
- Failure policy defaults to Ignore (fail-open) so a webhook outage doesn’t block deployments, switchable to Fail for strict environments
The exemption system is critical for production. The chart’s own namespace and service account are automatically exempted, along with kube-system, kube-public, and kube-node-lease. You can’t accidentally lock yourself out.
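To show how a policy like this turns into allow/deny decisions, here is a simplified Python sketch that evaluates a pod manifest against a subset of the settings above, with the namespace exemptions applied first. The policy keys and exempt namespaces come from the text; the function, the violation messages, and the evaluation order are assumptions, and the real webhook may behave differently (for example, in how it treats unset securityContext fields).

```python
# Subset of the policy shown above; the evaluation logic is a sketch only.
POLICY = {
    "blockPrivileged": True,
    "blockHostNetwork": True,
    "blockHostPID": True,
    "blockHostIPC": True,
    "blockPrivilegeEscalation": True,
    "blockLatestTag": True,
    "requiredLabels": ["app.kubernetes.io/name", "app.kubernetes.io/version"],
}
EXEMPT_NAMESPACES = {"kube-system", "kube-public", "kube-node-lease"}


def evaluate_pod(pod, policy=POLICY):
    """Return a list of violation messages; an empty list means 'allow'."""
    meta, spec = pod.get("metadata", {}), pod.get("spec", {})
    if meta.get("namespace") in EXEMPT_NAMESPACES:
        return []
    violations = []
    for flag, field in (("blockHostNetwork", "hostNetwork"),
                        ("blockHostPID", "hostPID"),
                        ("blockHostIPC", "hostIPC")):
        if policy.get(flag) and spec.get(field):
            violations.append(f"{field} is not allowed")
    for label in policy.get("requiredLabels", []):
        if label not in meta.get("labels", {}):
            violations.append(f"missing required label {label}")
    for c in spec.get("containers", []):
        sc = c.get("securityContext", {})
        if policy.get("blockPrivileged") and sc.get("privileged"):
            violations.append(f"{c['name']}: privileged container")
        # Only an explicit `true` is flagged in this sketch.
        if policy.get("blockPrivilegeEscalation") and sc.get("allowPrivilegeEscalation"):
            violations.append(f"{c['name']}: privilege escalation allowed")
        image = c.get("image", "")
        last = image.rsplit("/", 1)[-1]  # drop registry host (and its port)
        # An image with no tag resolves to :latest, so treat it the same way.
        if policy.get("blockLatestTag") and "@" not in image and (
                ":" not in last or last.endswith(":latest")):
            violations.append(f"{c['name']}: image uses the latest tag")
    return violations
```

A real validating webhook would wrap this in an AdmissionReview response, setting `allowed` to false and returning the messages whenever the list is non-empty.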
Auto-Remediation: From Detection to Action
The remediation engine runs as a privileged CronJob with host filesystem access. Here’s what it fixes:
File Permissions — Sets /etc/passwd to 644, /etc/shadow to 640, /etc/group to 644, /etc/gshadow to 640. For Kubernetes nodes, it also enforces 600 permissions and root:root ownership on kube-apiserver.yaml, kube-controller-manager.yaml, kube-scheduler.yaml, and etcd.yaml.
Kernel Parameters — Applies sysctl hardening:
net.ipv4.conf.all.send_redirects=0
net.ipv4.conf.default.send_redirects=0
net.ipv4.conf.all.accept_source_route=0
net.ipv4.conf.default.accept_source_route=0
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.default.accept_redirects=0
net.ipv4.tcp_syncookies=1
net.ipv6.conf.all.accept_ra=0
net.ipv6.conf.default.accept_ra=0
It persists changes to /etc/sysctl.conf so they survive reboots.
SSH Hardening — Ensures PermitRootLogin no, PermitEmptyPasswords no, MaxAuthTries 4, ClientAliveInterval 300, ClientAliveCountMax 3, LoginGraceTime 60.
Kernel Modules — Disables cramfs, squashfs, and udf by writing to /etc/modprobe.d/cis-hardening.conf.
Auditd Rules — Adds watch rules for /etc/passwd, /etc/shadow, /etc/group, and /etc/gshadow.
The dry-run mode is essential. On first deployment, it logs every change it would make without touching anything:
[DRY-RUN] Would execute: chmod 640 /host/etc/shadow
[DRY-RUN] Would execute: sysctl -w net.ipv4.conf.all.send_redirects=0
[DRY-RUN] Would execute: echo 'PermitRootLogin no' >> /host/etc/ssh/sshd_config
When you’re ready to go live, set autoRemediation.dryRun to false. Each run then sends a Slack notification with a count of changes made per node.
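The dry-run pattern itself is simple: every fix goes through one function that either logs the command or runs it. Here is a minimal Python sketch of that gate. The log format mirrors the lines above, while `run_fix` and its parameters are hypothetical names, not the chart's remediation script.

```python
import shlex
import subprocess


def run_fix(command, dry_run=True, log=print):
    """Log a remediation command in dry-run mode, or execute it otherwise.

    `command` is an argument list such as ["chmod", "640", "/host/etc/shadow"].
    Returns True only if the command was actually executed and succeeded.
    """
    if dry_run:
        log(f"[DRY-RUN] Would execute: {shlex.join(command)}")
        return False
    result = subprocess.run(command, check=False)
    return result.returncode == 0


# Dry-run is the default, so nothing on the host is touched here.
run_fix(["chmod", "640", "/host/etc/shadow"])
run_fix(["sysctl", "-w", "net.ipv4.conf.all.send_redirects=0"])
```

Routing every change through a single gate like this means the dry-run log and the live run cannot drift apart, and summing the `True` return values gives the per-node change count for a notification.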
Observability: Prometheus + Grafana
Every agent pod runs a metrics sidecar exporting six metrics:
| Metric | Type | Description |
|---|---|---|
| wazuh_agent_up | gauge | Is the agent process running (0/1) |
| wazuh_sca_checks_passed | gauge | Number of SCA checks currently passing |
| wazuh_sca_checks_failed | gauge | Number of SCA checks currently failing |
| wazuh_fim_events_total | counter | Total file integrity change events |
| wazuh_vulnerabilities_detected | gauge | Current vulnerability count |
| wazuh_alerts_total | counter | Total alerts generated |
The PrometheusRule defines six alerts:
- WazuhAgentDown — Agent offline for 5+ minutes (critical)
- WazuhHighSCAFailureRate — >30% of checks failing (warning)
- WazuhCriticalSCAFailures — >50% of checks failing (critical)
- WazuhVulnerabilitiesDetected — >50 vulnerabilities on a node (warning)
- WazuhFIMSpikeDetected — Unusual rate of file changes (warning)
- WazuhAlertStorm — >50 alerts/sec indicating an active incident (critical)
The Grafana dashboard is auto-discovered via sidecar label and shows: agent status, compliance score gauge, SCA pass/fail per node, vulnerability trends, FIM event rate, and alert rate with threshold highlighting.
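The two SCA alerts are thresholds on the failure ratio derived from the passed and failed gauges. The Python sketch below reproduces that arithmetic using the 30% and 50% thresholds from the alert list. The function is illustrative only; the actual alerting logic lives in the PrometheusRule, whose expressions are not shown here.

```python
def sca_alert_severity(passed, failed):
    """Map SCA pass/fail counts to the highest alert severity that would apply."""
    total = passed + failed
    if total == 0:
        return None  # no scan results yet; nothing to alert on
    failure_ratio = failed / total
    if failure_ratio > 0.50:
        return "critical"  # corresponds to WazuhCriticalSCAFailures
    if failure_ratio > 0.30:
        return "warning"   # corresponds to WazuhHighSCAFailureRate
    return None


print(sca_alert_severity(passed=100, failed=67))  # 67 of 167 checks failing
```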
Self-Hardening: The Chart Secures Itself
A security chart that isn’t itself hardened is a joke. This chart practices what it preaches:
- NetworkPolicy: Agent pods can only reach the Wazuh manager, DNS, and the Kubernetes API. Webhook pods only accept traffic from the API server.
- PodDisruptionBudget: Maintains 50% agent availability during rolling updates and node drains.
- Seccomp: RuntimeDefault profile on all pods.
- Secret management: Registration passwords are stored in Kubernetes Secrets with helm.sh/resource-policy: keep. Supports external secret references.
- Config checksums: DaemonSet pods auto-restart when ConfigMaps change. No manual rollout needed.
- cert-manager integration: Webhook TLS and optional agent-to-manager mTLS via cert-manager Certificates with ECDSA P-256 keys.
- Manager HA: Supports multiple manager endpoints with automatic failover.
- Values schema validation: JSON Schema catches misconfiguration before helm install runs.
- Priority class: Agents run as system-node-critical so they’re the last thing evicted under resource pressure.
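The config-checksum item relies on a common trick: hash the rendered configuration and put the hash in a pod template annotation, so any config change alters the pod spec and triggers a rollout. Here is a small Python sketch of the idea. The `checksum/config` annotation key is a widely used Helm convention and an assumption here; the chart's actual annotation name is not shown in this post.

```python
import hashlib


def config_checksum(rendered_config: str) -> str:
    """Return the SHA-256 hex digest of the rendered configuration text."""
    return hashlib.sha256(rendered_config.encode("utf-8")).hexdigest()


def pod_annotations(rendered_config: str) -> dict:
    """Build pod template annotations that change whenever the config changes."""
    # "checksum/config" is an assumed annotation key (common Helm convention).
    return {"checksum/config": config_checksum(rendered_config)}


before = pod_annotations("sca_interval: 12h\n")
after = pod_annotations("sca_interval: 6h\n")
print(before != after)
```

Because the annotation sits in the pod template, Kubernetes sees a changed template and rolls the DaemonSet without a manual restart.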
Deploying It
Minimal deployment with CIS + NIST (enabled by default):
helm install wazuh-hardening ./wazuh-k8s-hardening \
  --namespace wazuh-system --create-namespace \
  --set manager.host=wazuh-manager.wazuh.svc.cluster.local \
  --set manager.registrationPassword=YOUR_PASSWORD
Full enterprise deployment:
global:
  clusterName: "prod-us-east-1"
  environment: "production"
  organization: "Your Org"
manager:
  host: "wazuh-manager.wazuh.svc.cluster.local"
  existingSecret: "wazuh-auth"
  failover:
    enabled: true
    hosts:
      - host: "wazuh-manager-2.wazuh.svc.cluster.local"
compliance:
  cisKubernetes:
    profile: "L2"
  cisLinux:
    profile: "L2"
  nist80053:
    enabled: true
  pciDss:
    enabled: true
  hipaa:
    enabled: true
  soc2:
    enabled: true
admissionWebhook:
  enabled: true
  failurePolicy: "Fail"
autoRemediation:
  enabled: true
  dryRun: false
notifications:
  enabled: true
  slackWebhookUrl: "https://hooks.slack.com/services/..."
Why Wazuh
I chose Wazuh as the engine for this because it’s the only open-source platform that can run SCA, FIM, vulnerability detection, log collection, rootcheck, and active response from a single agent binary. Falco does runtime detection well but doesn’t do compliance scanning. kube-bench does CIS but nothing else. OPA does admission but doesn’t touch host-level hardening.
Wazuh’s SCA engine accepts custom YAML policy files with compliance cross-references baked into every check. That’s the capability that makes multi-framework mapping possible without building a custom engine from scratch. The agent is lightweight enough to run as a DaemonSet without starving your workloads, and the manager aggregates findings across every node into a single pane of glass.
This chart extends Wazuh’s capabilities into areas it doesn’t cover natively: admission-time enforcement, automated remediation, Prometheus-native observability, and scheduled compliance reporting. It’s what Wazuh should ship as a reference Kubernetes deployment.