How DeployGuard Works

A Kubernetes agent that detects deployment correctness failures your CI/CD pipeline cannot see. Here's exactly what it does, how it works, and what data it touches.

Full transparency — no hidden data collection, no surprises.

The Question We Answer

"Did this deployment break production even though Kubernetes says it's healthy?"

Your CI/CD pipeline returns exit code 0 and Kubernetes accepts the rollout. But the container is crash-looping, the image tag doesn't exist, or PubSub permissions were revoked. Nobody is told.

CrashLoopBackOff

Container crashes repeatedly. K8s keeps restarting. Pipeline says "success."

OOMKilled

Container exceeded memory limit. Killed silently. Users see 502.

ProgressDeadlineExceeded

Rollout timed out. Old pods still serving. New version never came up.

Six Steps — Zero to First Incident

From signup to structured incident detection in under 10 minutes.

Create Account

No credit card required. Your tenant is created instantly with a unique identifier that isolates all your data.

Register Cluster

Name your cluster. Select the environment (dev / staging / prod). The cluster record is created in DeployGuard.

Alternatively, the agent auto-creates the cluster record during registration if the cluster name is new.

kubectl apply Agent

Run a single kubectl command. It creates the deployguard namespace, a ServiceAccount, a read-only ClusterRole, and a Deployment running the agent.

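The agent pod typically uses less than 50 MB of RAM. It runs in its own namespace with read-only RBAC and cannot modify, create, or delete your workloads. Its only write permissions are creating SelfSubjectAccessReviews to check its own RBAC and patching its own Deployment for self-update.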

Agent Detects Deployment Failures

The agent watches Pods, Deployments, ReplicaSets, Events, Nodes, and PVCs using Kubernetes informers. It detects correctness failures in real time.

Detected types: CrashLoopBackOff, ImagePullBackOff, OOMKilled, CreateContainerConfigError, ProgressDeadlineExceeded, ExternalDependency failures (permissions, queues, topics, mounts, scheduling).
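As a rough illustration of how some of these types can be recognized, the sketch below classifies plain dictionaries shaped like the Kubernetes Pod and Deployment status objects. The function names are hypothetical and this is not DeployGuard's actual detection code; only the status fields and reason strings come from the Kubernetes API.

```python
from typing import Optional

# Waiting reasons reported in status.containerStatuses[].state.waiting.reason
WAITING_FAILURES = {
    "CrashLoopBackOff",
    "ImagePullBackOff",
    "CreateContainerConfigError",
}


def classify_container(status: dict) -> Optional[str]:
    """Return a failure type for one containerStatuses entry, or None."""
    waiting = status.get("state", {}).get("waiting") or {}
    if waiting.get("reason") in WAITING_FAILURES:
        return waiting["reason"]
    # An OOM kill appears on the terminated state, current or previous.
    for key in ("state", "lastState"):
        terminated = status.get(key, {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            return "OOMKilled"
    return None


def classify_deployment(deployment: dict) -> Optional[str]:
    """Detect a timed-out rollout from a Deployment's status.conditions."""
    for cond in deployment.get("status", {}).get("conditions", []):
        if (cond.get("type") == "Progressing"
                and cond.get("status") == "False"
                and cond.get("reason") == "ProgressDeadlineExceeded"):
            return "ProgressDeadlineExceeded"
    return None
```

In this sketch a waiting reason takes precedence over a previous OOM kill; a real agent might prefer the more specific cause.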

Structured Incident Appears Immediately

Not a log line. A typed incident with failureType, severity, namespace, workload, container, message, and timestamp. Sent to the control plane, deduplicated, and stored.
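A minimal sketch of what a typed incident and its deduplication might look like. The field list follows the one above; the class names, the choice of dedup key, and the in-memory store are assumptions for illustration, not the control plane's real implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Incident:
    failure_type: str
    severity: str
    namespace: str
    workload: str
    container: str
    message: str
    timestamp: datetime

    def dedup_key(self) -> tuple:
        # Assumption: the same failure on the same container is one incident,
        # regardless of message text or timestamp.
        return (self.failure_type, self.namespace, self.workload, self.container)


class IncidentStore:
    """In-memory stand-in for a deduplicating incident store."""

    def __init__(self) -> None:
        self._active = {}

    def ingest(self, incident: Incident) -> bool:
        """Store the incident; return True only if it was not already active."""
        key = incident.dedup_key()
        if key in self._active:
            return False
        self._active[key] = incident
        return True
```

Repeated reports of the same crash-looping container then collapse into a single stored incident.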

Notifications fire via Slack or webhook. Cooldown prevents spam from flapping workloads. Each incident is tied to the git commit that caused the deployment.
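The cooldown idea can be sketched as follows. The class name, the key, and the 300-second window used in the usage note are hypothetical; the document does not state the actual cooldown length.

```python
class CooldownNotifier:
    """Suppress repeat notifications for the same key within a cooldown window."""

    def __init__(self, cooldown_seconds: float, send) -> None:
        self.cooldown_seconds = cooldown_seconds
        self.send = send          # callable that delivers the message
        self._last_sent = {}      # key -> time of the last delivered message

    def notify(self, key, text: str, now: float) -> bool:
        """Deliver text unless the same key was notified too recently."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down: a flapping workload stays quiet
        self._last_sent[key] = now
        self.send(text)
        return True
```

With `CooldownNotifier(300, send)`, a workload that flaps three times in one minute produces one message, and the next message can only go out five minutes after the first.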

Explicit Resolution

When the workload recovers (container becomes healthy, deployment progresses, PVC binds), the agent detects recovery and resolves the incident.

Full lifecycle tracking: active → resolved. Resolution is confirmed by the agent, not assumed. Fallback: TTL-based auto-resolve after 10 minutes of silence.
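The lifecycle above can be modelled as a small state machine. This is a sketch under stated assumptions: the class and method names are invented for illustration, and only the two states and the 10-minute fallback come from the text.

```python
RESOLVE_TTL_SECONDS = 10 * 60  # fallback: 10 minutes of silence


class IncidentLifecycle:
    """Tracks one incident from active to resolved."""

    def __init__(self, first_seen: float) -> None:
        self.state = "active"
        self.last_seen = first_seen
        self.resolved_by = None  # "agent" or "ttl" once resolved

    def observe_failure(self, now: float) -> None:
        """The agent reported the failure again; reset the silence timer."""
        if self.state == "active":
            self.last_seen = now

    def confirm_recovery(self) -> None:
        """The agent saw the workload recover; resolve explicitly."""
        if self.state == "active":
            self.state = "resolved"
            self.resolved_by = "agent"

    def expire_if_silent(self, now: float) -> None:
        """Fallback: resolve after RESOLVE_TTL_SECONDS without new reports."""
        if self.state == "active" and now - self.last_seen >= RESOLVE_TTL_SECONDS:
            self.state = "resolved"
            self.resolved_by = "ttl"
```

Recording how an incident was resolved keeps the explicit path distinguishable from the fallback.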

Complete Data Transparency

Exactly what the agent reads, what it reports, and what it never touches.

What the Agent Reads

Pod status, container states, restart counts
Deployment conditions (Progressing, Available)
Kubernetes Warning events (reasons, messages)
Node conditions (Ready, DiskPressure, MemoryPressure)
PVC phase (Pending, Bound, Lost)
Resource annotations (git commit SHA, author)

What the Agent Sends

Failure type (e.g., CrashLoopBackOff)
Namespace, workload name, container name
Severity (info / warning / error / critical)
Error message from Kubernetes
Timestamp and resolution status
Git commit SHA and author (from annotations)
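To make the list concrete, here is a hedged sketch of a payload builder that emits only the fields listed above. The annotation keys and the JSON field names other than failureType are hypothetical; the document does not specify which annotations the agent reads or the exact wire format.

```python
# Hypothetical annotation keys; the real ones are not named in this document.
COMMIT_ANNOTATION = "example.com/git-commit"
AUTHOR_ANNOTATION = "example.com/git-author"


def build_payload(failure_type: str, namespace: str, workload: str,
                  container: str, severity: str, message: str,
                  timestamp: str, resolved: bool, annotations: dict) -> dict:
    """Build the outgoing record from an explicit allow-list of fields."""
    return {
        "failureType": failure_type,
        "namespace": namespace,
        "workload": workload,
        "container": container,
        "severity": severity,
        "message": message,
        "timestamp": timestamp,
        "resolved": resolved,
        # Only the two commit annotations are copied; all others are ignored.
        "gitCommit": annotations.get(COMMIT_ANNOTATION),
        "gitAuthor": annotations.get(AUTHOR_ANNOTATION),
    }
```

Building the record field by field, rather than forwarding the Kubernetes object, means nothing outside the list can leak into the payload by accident.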

What We Never Access

Your application source code
Environment variables or secrets
Container filesystem contents
Network traffic between your services
Database contents or credentials
Application logs (separate opt-in feature)

Architecture

Agent → Control Plane → Notification. Outbound only.

Your K8s Cluster

DeployGuard Agent: read-only correctness guard

↓ HTTPS (TLS), typed failure incidents, outbound traffic only

DeployGuard Cloud

Incident dedup & lifecycle
Slack / Webhook notifications
Public status pages

The agent initiates all connections (outbound only). No inbound ports need to be opened in your cluster.
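An outbound-only report could be built like the sketch below, using Python's standard library. The endpoint URL and the Bearer authorization scheme are assumptions for illustration; the document states only that the agent sends typed incidents over HTTPS and authenticates with a JWT.

```python
import json
import urllib.request

# Hypothetical endpoint; the real control-plane URL is not given here.
INGEST_URL = "https://deployguard.example/v1/incidents"


def build_incident_request(incident: dict, agent_jwt: str,
                           url: str = INGEST_URL) -> urllib.request.Request:
    """Prepare an HTTPS POST carrying one incident; refuses plain HTTP."""
    if not url.startswith("https://"):
        raise ValueError("incidents must be sent over HTTPS")
    body = json.dumps(incident).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {agent_jwt}",  # assumed auth scheme
        },
        method="POST",
    )
```

The function only prepares the request; passing it to `urllib.request.urlopen` would open the connection from inside the cluster, so nothing needs to listen for inbound traffic.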

Security Model

Read-only. No credentials. Short-lived tokens.

Read-Only RBAC

ClusterRole with get, list, watch only. Plus SelfSubjectAccessReview for RBAC self-check.

resources: [pods, deployments, events, replicasets, nodes, persistentvolumeclaims, pods/log]
verbs: [get, list, watch]
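If you want to confirm the read-only claim yourself, a small check over the ClusterRole's rules is enough. The sketch below is a hypothetical helper that works on rules loaded as plain dictionaries (for example, from the installed manifest); it is not part of DeployGuard.

```python
READ_VERBS = {"get", "list", "watch"}


def non_read_verbs(rules: list) -> set:
    """Return every verb in the rules that is not get, list, or watch."""
    found = set()
    for rule in rules:
        for verb in rule.get("verbs", []):
            if verb not in READ_VERBS:
                found.add(verb)  # includes the wildcard "*"
    return found
```

An empty result means the rules grant only read verbs; anything else, such as `create` for the SelfSubjectAccessReview self-check, is listed so you can review it.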

No Cluster Credentials

DeployGuard never receives your kubeconfig, API server URL, or cloud provider credentials. The agent runs inside your cluster using a Kubernetes ServiceAccount.

Short-Lived Agent JWTs

Agent JWTs expire after 1 hour and auto-refresh at 80% of their lifetime. API keys are stored only as SHA-256 hashes; the plaintext is never stored on the server.
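The two mechanisms can be sketched with the standard library. The function names are hypothetical and the real server may differ in detail; the 1-hour lifetime, the 80% refresh point, and SHA-256 come from the text above.

```python
import hashlib
import hmac

TOKEN_LIFETIME_SECONDS = 3600   # 1 hour
REFRESH_FRACTION = 0.8          # refresh at 80% of the lifetime


def refresh_at(issued_at: float, lifetime: float = TOKEN_LIFETIME_SECONDS) -> float:
    """Time at which the agent should request a fresh token."""
    return issued_at + REFRESH_FRACTION * lifetime


def hash_api_key(api_key: str) -> str:
    """Hex SHA-256 digest; this is what gets stored instead of the key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Compare digests in constant time to avoid timing side channels."""
    return hmac.compare_digest(hash_api_key(presented), stored_hash)
```

For a 1-hour token, the refresh point falls 48 minutes after issuance, which leaves 12 minutes of slack if a refresh attempt fails.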

Tenant Isolation

Every query is scoped by tenant_id. The agent JWT embeds the tenant identity at registration. By design, there is zero cross-tenant data access.
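Tenant scoping of this kind usually means that no query can be issued without a tenant filter. The sketch below shows the pattern with an in-memory SQLite database; the table layout and function names are assumptions for illustration, and only the tenant_id column name comes from the text.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with incidents from two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE incidents (tenant_id TEXT, workload TEXT, failure_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO incidents VALUES (?, ?, ?)",
        [
            ("tenant-a", "payment-service", "CrashLoopBackOff"),
            ("tenant-a", "checkout", "OOMKilled"),
            ("tenant-b", "search", "ImagePullBackOff"),
        ],
    )
    return conn


def incidents_for_tenant(conn: sqlite3.Connection, tenant_id: str) -> list:
    """Every read goes through this function, so tenant_id is always bound."""
    rows = conn.execute(
        "SELECT workload, failure_type FROM incidents "
        "WHERE tenant_id = ? ORDER BY workload",
        (tenant_id,),
    )
    return rows.fetchall()
```

Because the tenant identifier is taken from the verified token rather than from request parameters, a caller cannot ask for another tenant's rows.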

Frequently Asked Questions

Does the agent have write access to my cluster?

Not to your workloads. The agent uses a ClusterRole with read-only permissions (get, list, watch on pods, deployments, events, replicasets, nodes, PVCs). Its only write permissions are creating SelfSubjectAccessReviews to check its own RBAC and patching its own Deployment for self-update.

What is the difference between DeployGuard and Prometheus?

Prometheus collects time-series data and requires you to write alerting rules, build queries, and investigate which deployment caused an issue. DeployGuard produces typed incidents — "CrashLoopBackOff in prod/payment-service since 14:32, caused by commit abc123" — with no configuration.

Where is my data stored?

All data is stored in Google Cloud Platform in the EU region (europe-west1). Data is encrypted at rest and protected in transit with TLS. Your data never leaves the EU unless you explicitly request it.

Can other tenants see my data?

No. DeployGuard uses strict multi-tenant isolation. Each tenant has a unique identifier, and all database queries are scoped to that tenant. There is no cross-tenant data access.

What happens if the agent goes down?

The control plane detects the missing heartbeat within 3 minutes and marks the agent as disconnected. Your cluster continues to operate normally — the agent is purely read-only and has no impact on your workloads.
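Heartbeat-based detection reduces to comparing the time since the last heartbeat against a timeout. A minimal sketch, with a hypothetical function name and the 3-minute window taken from the answer above:

```python
HEARTBEAT_TIMEOUT_SECONDS = 3 * 60  # 3 minutes


def agent_status(last_heartbeat: float, now: float) -> str:
    """Classify an agent by how long ago its last heartbeat arrived."""
    if now - last_heartbeat > HEARTBEAT_TIMEOUT_SECONDS:
        return "disconnected"
    return "connected"
```

The check runs on the control plane, so it keeps working when the agent itself is the thing that has failed.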

How many resources does the agent use?

The agent is lightweight: typically less than 50 MB of RAM and negligible CPU. It runs as a single pod in the deployguard namespace with configurable resource limits.

What does DeployGuard NOT detect?

DeployGuard does not detect application logic bugs, business rule exceptions, validation errors, or circular dependency injection. It guards deployment and infrastructure correctness — not application correctness.

Guard Your Deployments

Install in under 10 minutes. See your first structured incident immediately. Stop discovering broken deployments from users.
