Progressive Delivery with GitOps: Safer Deployments Using Argo Rollouts and Flagger

Beyond All-or-Nothing: The Case for Gradual Rollouts

You’ve adopted GitOps. Your infrastructure is declarative, version-controlled, and automatically reconciled. But when it comes to deploying application changes, are you still flipping a switch and hoping for the best?

Progressive delivery bridges this gap. Instead of instant cutover, traffic shifts gradually — 5% → 25% → 100% — with automated checks at every step. If metrics degrade, instant rollback. If health checks pass, automatic promotion. The result: safer deployments without sacrificing velocity.

The Progressive Delivery Stack

At its core, progressive delivery combines three capabilities:

  1. Traffic Shifting — Gradually move users from old to new version
  2. Automated Analysis — Continuously evaluate SLOs and business metrics
  3. Automatic Promotion/Rollback — Decisions based on data, not gut feeling

The two leading implementations in the Kubernetes ecosystem are Argo Rollouts and Flagger. Both integrate with existing GitOps workflows but approach progressive delivery differently.

Argo Rollouts: Native Kubernetes Experience

Argo Rollouts extends the Deployment concept with custom resources. You get canaries, blue-green deployments, and experiments using familiar Kubernetes primitives.

Architecture Overview

┌─────────────────────────────────────────┐
│           Argo Rollouts Controller      │
│  (manages Rollout CRD, traffic shaping) │
├─────────────────────────────────────────┤
│              Service Mesh               │
│    (Istio, Linkerd, NGINX, ALB, SMI)    │
├─────────────────────────────────────────┤
│           Prometheus/OTel               │
│         (metric queries for analysis)   │
└─────────────────────────────────────────┘

Example: Canary Deployment

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-service
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: payment-service-canary
      stableService: payment-service-stable
      trafficRouting:
        istio:
          virtualService:
            name: payment-service-vs
            routes:
            - primary
      steps:
      - setWeight: 5
      - pause: {duration: 10m}
      - setWeight: 20
      - pause: {duration: 10m}
      - analysis:
          templates:
          - templateName: success-rate
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 100
      - analysis:
          templates:
          - templateName: success-rate
          - templateName: latency
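
The trafficRouting block above points at an Istio VirtualService that the controller rewrites as weights change. A minimal sketch of what that resource could look like — the host is illustrative, but the route name and service names must match the Rollout:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: payment-service-vs
spec:
  hosts:
  - payment-service
  http:
  - name: primary          # referenced by trafficRouting.istio.routes
    route:
    - destination:
        host: payment-service-stable
      weight: 100          # Argo Rollouts adjusts these weights at each step
    - destination:
        host: payment-service-canary
      weight: 0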

Analysis Template

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 5m
    count: 3
    successCondition: result[0] >= 0.95
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{service="payment-service",status=~"2.."}[5m]))
          /
          sum(rate(http_requests_total{service="payment-service"}[5m]))
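
While a rollout is in progress, the kubectl-argo-rollouts plugin is the easiest way to follow the steps and analysis verdicts. A few commonly used commands, using the service name from the example above:

# Watch steps, weights, and analysis runs live
kubectl argo rollouts get rollout payment-service --watch

# Skip a pause and move to the next step manually
kubectl argo rollouts promote payment-service

# Abort: shift all traffic back to the stable version
kubectl argo rollouts abort payment-service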

Flagger: GitOps-Native Approach

Flagger takes a different approach. Instead of replacing Deployments, it watches them: Flagger generates a primary copy of the workload, shifts traffic between primary and canary through the service mesh or ingress controller, and promotes by copying the canary spec over to the primary once analysis passes.

Architecture Overview

┌─────────────────────────────────────────┐
│                 Flagger                 │
│  (watches Deployments, manages canary)  │
├─────────────────────────────────────────┤
│         Service Mesh / Ingress          │
│ (Istio, Linkerd, NGINX, Gloo, Contour)  │
├─────────────────────────────────────────┤
│         Prometheus/CloudWatch           │
│       (metrics for canary checks)       │
└─────────────────────────────────────────┘

Example: Automated Canary

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: payment-service
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  service:
    port: 8080
  analysis:
    interval: 30s
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
    webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://payment-service-canary/"
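
Flagger reports progress through the Canary status and Kubernetes events, so plain kubectl is enough to follow a run:

# Canary status, current weight, and last transition
kubectl get canaries --all-namespaces

# Detailed events: weight changes, failed metric checks, promotion
kubectl describe canary/payment-service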

Argo Rollouts vs Flagger: Quick Comparison

Aspect              Argo Rollouts                         Flagger
Deployment Model    Replaces Deployment with Rollout CRD  Watches existing Deployments
GitOps Integration  Argo CD native (same project)         Works with any GitOps tool
Traffic Control     Multiple meshes + ALB/NLB             Multiple meshes + ingress controllers
Experimentation     Built-in A/B/n testing                A/B testing via webhooks
Analysis            AnalysisTemplate/AnalysisRun CRDs     Inline metric thresholds
Rollback            Automatic on failed analysis          Automatic on threshold breach

Metric-Driven Promotion

The magic happens when deployment decisions are based on actual system behavior, not time-based guesses.

Key Metrics to Watch

  • Golden Signals: Latency, traffic, errors, saturation
  • Business Metrics: Conversion rates, checkout completion
  • Infrastructure Metrics: CPU, memory, disk I/O

Prometheus Integration Example

# Argo Rollouts: P99 latency check
- name: p99-latency
  interval: 5m
  successCondition: result[0] <= 0.2  # 200 ms; histogram_quantile here returns seconds
  provider:
    prometheus:
      address: http://prometheus.monitoring
      query: |
        histogram_quantile(0.99,
          sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
        )

# Flagger: Error rate check
metrics:
- name: request-success-rate
  thresholdRange:
    min: 99.0
  interval: 1m

Adoption Path: From GitOps to Progressive Delivery

For teams already running Argo CD or Flux, the transition is gradual:

Phase 1: Observability Foundation

  • Ensure metrics are flowing (Prometheus/Grafana operational)
  • Define SLOs and error budgets
  • Set up alerting on key services
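
One convenient way to make an SLO queryable is a Prometheus recording rule that precomputes the success ratio your later analysis templates will consume. A sketch reusing the payment-service query from above (the group and rule names are illustrative):

groups:
- name: payment-service-slo
  rules:
  - record: service:http_requests:success_ratio_5m
    expr: |
      sum(rate(http_requests_total{service="payment-service",status=~"2.."}[5m]))
      /
      sum(rate(http_requests_total{service="payment-service"}[5m]))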

Phase 2: First Canary

  • Pick a non-critical service with good metrics coverage
  • Install Argo Rollouts or Flagger controller
  • Convert one Deployment to a Rollout or Canary, keeping the blast radius to a single team
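
With Argo Rollouts, the conversion does not have to mean moving the pod template: a Rollout can reference an existing Deployment via workloadRef, which keeps the Git diff small. A sketch, assuming a Deployment named payment-service already exists:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-service
spec:
  replicas: 10
  workloadRef:             # adopt the pod template from the Deployment
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  strategy:
    canary:
      steps:
      - setWeight: 10
      - pause: {duration: 10m}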

Phase 3: Expand Coverage

  • Roll out to more services
  • Refine analysis templates based on learnings
  • Add automated load testing in canary phase

Phase 4: Advanced Patterns

  • A/B/n testing for feature validation
  • Multi-region progressive rollouts
  • Chaos engineering integration

Integration with Argo CD

Argo Rollouts shines here because it's part of the same ecosystem:

# Application manifest with Rollout
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/org/gitops-repo
    targetRevision: HEAD
    path: apps/payment-service
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

The Rollout resource is just another Kubernetes object — Argo CD manages it like any Deployment.

Common Pitfalls and How to Avoid Them

Insufficient Metrics Coverage

Problem: Canary proceeds based on partial data.
Solution: Require minimum metric samples before promotion decision.
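
In Argo Rollouts terms, this means leaning on the sampling fields of an analysis metric rather than trusting a single measurement. A sketch of the relevant knobs, applied to the success-rate metric from earlier:

metrics:
- name: success-rate
  interval: 1m
  count: 5                 # take five samples before the final verdict
  failureLimit: 1          # tolerate one bad sample, fail on the second
  successCondition: result[0] >= 0.95
  provider:
    prometheus:
      address: http://prometheus:9090
      query: |
        sum(rate(http_requests_total{service="payment-service",status=~"2.."}[5m]))
        /
        sum(rate(http_requests_total{service="payment-service"}[5m]))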

Overly Aggressive Traffic Shifts

Problem: 50% traffic jump exposes too many users to issues.
Solution: Use smaller steps (5% → 10% → 25% → 50% → 100%).

Ignoring Cold Start Effects

Problem: New pods show artificially high latency initially.
Solution: Add warmup period or exclude initial metrics from analysis.
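
In Argo Rollouts this maps to the initialDelay field on an analysis metric, which defers the first sample until the canary pods have warmed up. A sketch combining it with the latency check above:

metrics:
- name: p99-latency
  initialDelay: 2m         # skip the warmup window entirely
  interval: 1m
  count: 3
  successCondition: result[0] <= 0.2
  provider:
    prometheus:
      address: http://prometheus.monitoring
      query: |
        histogram_quantile(0.99,
          sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
        )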

When to Choose Which

Choose Argo Rollouts if:

  • You're already using Argo CD
  • You want tight integration with your GitOps workflow
  • You need sophisticated experimentation (A/B/n testing)

Choose Flagger if:

  • You use Flux or another GitOps tool
  • You prefer keeping native Deployments
  • You want simpler, less invasive setup

Conclusion

Progressive delivery isn't just a safety net — it's a competitive advantage. Teams that deploy confidently multiple times per day recover faster from incidents, validate features with real traffic, and reduce the blast radius of bad changes.

The tooling is mature, the patterns are proven, and the integration with existing GitOps workflows is seamless. Whether you choose Argo Rollouts or Flagger, the important step is starting: pick a service, set up your first canary, and let data drive your deployment decisions.


GitOps gave us declarative infrastructure. Progressive delivery gives us declarative confidence in our deployments.