Kubernetes Deployment Correctness Guard

Kubernetes Says Healthy.
Your System Is Broken.

CI/CD pipeline succeeded. Kubernetes reports Ready. But the deployment just broke PubSub permissions, the image tag doesn't exist, and a container is OOMKilled every 30 seconds.
DeployGuard detects deployment failures that infrastructure reports as healthy.

Terminal

$ kubectl apply -f https://agent.deployguard.net/install/YOUR_TOKEN

One command. Immediate detection of deployment correctness failures.

The Question No One Answers

"Did this deployment break production even though Kubernetes says it's healthy?"

You merged the PR. CI passed. kubectl rollout status completed. Kubernetes reports all pods Ready. But the system is broken — and nobody knows yet.

12 minutes later, Slack lights up:

"Is production down?"

It is. The readiness probes passed and the CI/CD pipeline returned exit code 0.
The container has been crash-looping since the deployment, and nobody was told.

What DeployGuard Detects (v0.1)

Infrastructure and dependency correctness failures — immediately after deploy

CrashLoopBackOff

Container starts, crashes, restarts. Kubernetes keeps retrying. Pipeline says "success."

ImagePullBackOff

Wrong tag, expired registry credentials, or image deleted. Pods stuck in Pending.

OOMKilled

Container exceeded memory limit. Killed silently. Users see 502.

ProgressDeadlineExceeded

Rollout timed out. Old pods still serving. New version never came up.

CreateContainerConfigError

Missing ConfigMap, Secret, or invalid volume mount. Container cannot start.

ExternalDependency

Broken permissions, missing queues, unreachable topics, failed scheduling.
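
As a rough illustration of how the Kubernetes-level failure types above can be read from resource status, here is a minimal Python sketch. It is not DeployGuard's implementation. It assumes pod and Deployment objects are already available as plain dicts in the shape the Kubernetes API returns them, and the function names classify_container and classify_deployment are hypothetical. ExternalDependency failures are not covered, since the page does not say how they are detected.

```python
from typing import Optional

# Reasons reported in containerStatuses[].state.waiting.reason.
WAITING_FAILURES = {"CrashLoopBackOff", "ImagePullBackOff", "CreateContainerConfigError"}


def classify_container(status: dict) -> Optional[str]:
    """Return a failure type for one containerStatuses entry, or None if none matches."""
    waiting = status.get("state", {}).get("waiting") or {}
    if waiting.get("reason") in WAITING_FAILURES:
        return waiting["reason"]
    # OOM kills appear on the terminated state of the current or previous run.
    for key in ("state", "lastState"):
        terminated = status.get(key, {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            return "OOMKilled"
    return None


def classify_deployment(deployment: dict) -> Optional[str]:
    """Return "ProgressDeadlineExceeded" when the Progressing condition reports it."""
    for cond in deployment.get("status", {}).get("conditions", []):
        if (cond.get("type") == "Progressing"
                and cond.get("status") == "False"
                and cond.get("reason") == "ProgressDeadlineExceeded"):
            return "ProgressDeadlineExceeded"
    return None
```

Checking the waiting reason before the terminated reason is a choice made for this sketch: a container that is both backing off and was last OOM-killed is reported here as CrashLoopBackOff.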

What DeployGuard Does NOT Detect

Application logic bugs

Business rule exceptions

Validation errors

Circular dependency injection

DeployGuard guards deployment correctness — not application correctness.

The gap between "deployed" and "actually working" is where incidents live.
DeployGuard closes it.

Product Positioning

What DeployGuard Is — and Is Not

DeployGuard answers one specific question:

"Did this deployment break production even though Kubernetes says it's healthy?"

DeployGuard is NOT

A monitoring platform

An observability tool

An APM

A logging system

A metric collector

A dashboard builder

DeployGuard IS

Deployment failure detection

Infrastructure & dependency correctness guard

Semantic incident generator

Auto-resolution tracker

Post-deploy verification layer

Kubernetes-native agent

DeployGuard doesn't collect numbers about your system. It produces structured incidents when a deployment introduces an infrastructure or dependency correctness failure. It detects the failure, tracks its lifecycle, and confirms when it resolves.

Why Not Prometheus / Grafana?

Prometheus collects numbers. DeployGuard produces incidents.

Prometheus says:

container_restarts_total{pod="payment-7b9f8"} = 5

Something is wrong, but the number doesn't say what. You still have to find out which workload is affected, what the error is, which deployment caused it, and whether it's still happening.

DeployGuard says:

CrashLoopBackOff in prod/payment-service
Container: payment | Since: 14:32
Commit: abc123 by @dev

This deployment broke this specific workload. Here's the failure type, the commit that caused it, and it's still active.

Tool	What It Produces	DeployGuard Produces
Prometheus / Grafana	Shows that container_restarts_total increased	Tells you "payment-service is in CrashLoopBackOff since 14:32, caused by deploy abc123"
Kubernetes Probes	Reports "Pod is Ready"	Reports "Pod passed readiness but is crash-looping every 30s"
CI/CD Pipeline	Reports "Deployment succeeded" (exit code 0)	Reports "Rollout timed out — ProgressDeadlineExceeded in prod/checkout"
APM / Sentry	Catches application exceptions after a request is served	Catches infrastructure failures before requests can be served

Prometheus answers "is something wrong?"

DeployGuard answers "this deployment broke PubSub permissions in prod/checkout at 14:32"

How DeployGuard Works

Six steps. No sidecars. No code changes. No instrumentation. No config files.

┌─────────────────────────────────────────────────────────────────┐
│                 Your Kubernetes Cluster                          │
│                                                                  │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                  DeployGuard Agent                       │   │
│   │                                                          │   │
│   │  ✓ Watches: Pods / Deployments / Events / Nodes / PVCs │   │
│   │  ✓ Detects: CrashLoopBackOff, OOMKilled, etc.          │   │
│   │  ✓ Produces: Typed incidents with full context          │   │
│   │  ✓ Resolves: Auto-detects recovery                      │   │
│   │                                                          │   │
│   │  🔒 Read-only RBAC — Cannot modify your cluster          │   │
│   └─────────────────────────────────────────────────────────┘   │
│                              │                                   │
└──────────────────────────────│───────────────────────────────────┘
                               │ HTTPS (outbound only)
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                DeployGuard Control Plane (SaaS)                  │
│                                                                  │
│   ├── Incident Ingestion & Deduplication                        │
│   ├── Failure Lifecycle (active → resolved)                     │
│   └── Notification Delivery (Slack, Webhook)                    │
└─────────────────────────────────────────────────────────────────┘

Step 1

Create Account

Step 2

Register Cluster

Name your cluster. Select environment (dev / staging / prod).

Step 3

kubectl apply Agent

One command installs the agent. Creates namespace, RBAC, and deployment. Read-only.

Step 4

Agent Detects Failures

The agent watches Pods, Deployments, Events, Nodes, and PVCs. Detects correctness failures in real-time.

Step 5

Structured Incident Appears

Not a log line. A typed incident: CrashLoopBackOff, namespace, workload, severity, timestamp. Immediately.

Step 6

Explicit Resolution

When the workload recovers, the agent detects it and resolves the incident. Full lifecycle tracked.
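
The active → resolved lifecycle in steps 4 to 6 can be pictured as a reconciliation loop. The Python sketch below is only an assumption about how such tracking might work, not the agent's actual code. The Incident and IncidentTracker classes and the key of (namespace, workload, failure type) are hypothetical. Repeat sightings of the same failure update an existing incident instead of opening a new one, and a failure that disappears is marked resolved.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[str, str, str]  # (namespace, workload, failure type); hypothetical identity


@dataclass
class Incident:
    key: Key
    first_seen: datetime
    last_seen: datetime
    resolved_at: Optional[datetime] = None

    @property
    def status(self) -> str:
        return "resolved" if self.resolved_at else "active"


class IncidentTracker:
    def __init__(self) -> None:
        self.active: Dict[Key, Incident] = {}

    def observe(self, failing: Set[Key], now: datetime) -> List[Incident]:
        """Reconcile the currently failing keys; return incidents that changed state."""
        changed = []
        for key in failing:
            incident = self.active.get(key)
            if incident is None:
                # First sighting: open a new incident.
                incident = Incident(key, first_seen=now, last_seen=now)
                self.active[key] = incident
                changed.append(incident)
            else:
                # Repeat sighting: deduplicate by refreshing last_seen only.
                incident.last_seen = now
        for key in list(self.active):
            if key not in failing:
                # No longer failing: close the incident.
                incident = self.active.pop(key)
                incident.resolved_at = now
                changed.append(incident)
        return changed
```

Calling observe with the same failing key twice returns the incident only the first time. Calling it with an empty set afterwards returns that incident with status "resolved".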

What a DeployGuard Incident Looks Like

{
  "failureType":  "CrashLoopBackOff",
  "severity":     "error",
  "namespace":    "prod",
  "workloadType": "Deployment",
  "workloadName": "payment-service",
  "container":    "payment",
  "message":      "Back-off restarting failed container payment",
  "firstSeen":    "2026-02-16T14:32:01Z",
  "status":       "active"
}

Not a metric. Not a log line. A semantic incident with type, severity, and context.

Install in 10 Minutes

From zero to first incident detection. No Helm charts. No values.yaml. No operator to manage.

Step 1: Create Account

2 min

Step 2: Register Cluster

30 sec

Name your cluster and select environment (dev / staging / prod).

Step 3: Get Install Command

10 sec

Generate a one-time install URL with your API key baked in.

Step 4: kubectl apply

2 min

One command. Creates namespace, ServiceAccount, ClusterRole, and agent Deployment. Read-only RBAC.

Step 5: Agent Detects Failures

< 1 sec

Agent watches Pods, Deployments, Events, Nodes, and PVCs. Deployment correctness failures produce structured incidents.

Step 6: Incident + Resolution

real-time

Typed incident appears immediately. When the workload recovers, the agent resolves it automatically.

Terminal

# One command to install

$ kubectl apply -f https://agent.deployguard.net/install/YOUR_TOKEN

# Creates: namespace, serviceaccount, clusterrole, deployment

Total setup time: < 10 minutes

What You Get

Deployment correctness verification. Not another tool to configure.

Detection in Seconds

Know about deployment failures immediately. Before users. Before on-call escalation. Before anyone checks Slack.

Semantic Incidents

Not numbers. Not log lines. Typed incidents: CrashLoopBackOff, OOMKilled, ProgressDeadlineExceeded — with namespace, workload, and severity.

Zero Code Changes

Works with any language, framework, or runtime. The agent watches Kubernetes resource state, not your application code.

Read-Only Agent

Agent has zero write permissions. Cannot modify deployments, pods, or any cluster resource. RBAC enforced.

Commit-Level Context

Each failure is tied to a specific deployment, namespace, and git commit. No guessing which change broke things.

Full Lifecycle Tracking

Every incident has a first_seen, last_seen, and resolved_at. Active → Resolved. Full audit trail.

Built for Production Clusters

Security & Trust

We know you're protective of your infrastructure. So are we.

Read-Only RBAC

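The agent has only get, list, and watch permissions on the resources it observes. It cannot create, update, delete, or patch any workload or other stored resource in your cluster. The single create verb in the rules below applies to SelfSubjectAccessReview, a request that is not persisted and that the agent uses only to check its own permissions.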

rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "deployments", "events",
                "replicasets", "nodes", "persistentvolumeclaims"]
    verbs: ["get", "list", "watch"]  # No write access
  - apiGroups: ["authorization.k8s.io"]
    resources: ["selfsubjectaccessreviews"]
    verbs: ["create"]  # RBAC self-check only
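
If you want to check a read-only claim like this yourself before applying a manifest, a small audit over the rules is enough. The Python sketch below is an illustration only. It assumes the rules have already been loaded into plain dicts (for instance from the ClusterRole in the install manifest), and write_grants, READ_ONLY_VERBS, and ALLOWED_EXCEPTIONS are hypothetical names. It flags every grant that is not get, list, or watch, apart from the SelfSubjectAccessReview self-check.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}

# The one non-read grant in the rules above: a permission self-check.
ALLOWED_EXCEPTIONS = {("authorization.k8s.io", "selfsubjectaccessreviews", "create")}


def write_grants(rules: list) -> list:
    """Return every (apiGroup, resource, verb) grant that is not read-only or excepted."""
    offending = []
    for rule in rules:
        for group in rule.get("apiGroups", []):
            for resource in rule.get("resources", []):
                for verb in rule.get("verbs", []):
                    grant = (group, resource, verb)
                    if verb not in READ_ONLY_VERBS and grant not in ALLOWED_EXCEPTIONS:
                        offending.append(grant)
    return offending


# The rules shown above, transcribed as Python dicts.
AGENT_RULES = [
    {"apiGroups": ["", "apps"],
     "resources": ["pods", "deployments", "events",
                   "replicasets", "nodes", "persistentvolumeclaims"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["authorization.k8s.io"],
     "resources": ["selfsubjectaccessreviews"],
     "verbs": ["create"]},
]
```

Here write_grants(AGENT_RULES) returns an empty list. Adding a rule with a verb such as patch, or a wildcard verb, makes it show up in the result.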

No Cluster Credentials

DeployGuard never receives your kubeconfig, API server URL, or cloud provider credentials. The agent runs inside your cluster using a ServiceAccount.

Short-Lived Agent JWTs

Agent JWTs expire in 1 hour and auto-refresh. Bootstrap tokens are single-use and expire in 24 hours. API keys use SHA-256 hashing — plaintext is never stored.
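
Storing only a SHA-256 digest of an API key follows a familiar pattern: show the plaintext once, persist the hash, and compare hashes on each request. The sketch below shows that pattern in Python under stated assumptions. The function names are hypothetical, the key length is arbitrary, and DeployGuard's actual storage scheme is not documented here beyond the use of SHA-256.

```python
import hashlib
import hmac
import secrets


def hash_api_key(plaintext: str) -> str:
    """Return the hex SHA-256 digest that would be stored instead of the key."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def new_api_key() -> tuple:
    """Generate a random key; return (plaintext to show once, digest to store)."""
    plaintext = secrets.token_urlsafe(32)
    return plaintext, hash_api_key(plaintext)


def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Check a presented key against the stored digest in constant time."""
    return hmac.compare_digest(hash_api_key(presented), stored_digest)
```

An unsalted fast hash is reasonable here only because the keys are long and random. The same approach would be unsuitable for user-chosen passwords.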

Trust Architecture

Customer Network              Internet              DeployGuard
     │                          │                      │
     │  HTTPS (outbound)        │                      │
     │─────────────────────────>│─────────────────────>│
     │                          │                      │
     │  HTTPS response          │                      │
     │<─────────────────────────│<─────────────────────│
     │                          │                      │
     │  No inbound traffic      │                      │
     │          ✗               │                      │

Agent runs in its own namespace (deployguard)

All communication is HTTPS / TLS-encrypted

Outbound-only — no inbound ports required

Tenant data is fully isolated by tenant_id

One-time bootstrap tokens for initial registration

API key hash storage only — no plaintext persistence

Stop Discovering Broken Deployments From Users

Kubernetes says Ready. CI/CD says success. But the deployment broke something.
DeployGuard tells you in seconds — not when a user files a ticket.

No credit card required. No sales call. Install in 10 minutes.
