it-stud.io – Christian Müller

März 14, 2026März 15, 2026

Measuring Developer Productivity in the AI Era: Beyond Velocity Metrics

Introduction

The promise of AI-assisted development is irresistible: 10x productivity gains, code written at the speed of thought, junior developers performing like seniors. But as organizations deploy GitHub Copilot, Claude Code, and other AI coding assistants, a critical question emerges: How do we actually measure the impact?

Traditional velocity metrics — story points completed, lines of code, pull requests merged — are increasingly inadequate. They measure output, not outcomes. Worse, they can be gamed, especially when AI can generate thousands of lines of code in seconds. This article explores modern frameworks for measuring developer productivity in the AI era, separating hype from reality and providing practical guidance for engineering leaders.

The Problem with Traditional Velocity Metrics

For decades, engineering teams have relied on metrics like:

Lines of Code (LOC): More code doesn’t mean better software. AI makes this metric meaningless — you can generate 10,000 lines in minutes.
Story Points / Velocity: Measures estimation consistency, not actual value delivered. Teams optimize for completing stories, not solving problems.
Pull Requests Merged: Encourages many small PRs over thoughtful changes. Doesn’t capture review quality or long-term impact.
Commits per Day: Trivially gameable. Says nothing about the value of those commits.

These metrics share a fundamental flaw: they measure activity, not productivity. In the AI era, activity is cheap. An AI can produce endless activity. What matters is whether that activity translates to business outcomes.

The SPACE Framework: A Holistic View

The SPACE framework, developed by researchers at GitHub, Microsoft, and the University of Victoria, offers a more nuanced approach. SPACE stands for:

Satisfaction and well-being
Performance
Activity
Communication and collaboration
Efficiency and flow

The key insight: productivity is multidimensional. No single metric captures it. Instead, you need a balanced set of metrics across all five dimensions, combining quantitative data with qualitative insights.

Applying SPACE to AI-Assisted Teams

When developers use AI coding assistants, SPACE metrics take on new meaning:

Satisfaction: Do developers feel AI tools help them? Or do they create frustration through incorrect suggestions and context-switching?
Performance: Are we shipping features that matter? Is customer satisfaction improving? Are we reducing incidents?
Activity: Still relevant, but must be interpreted carefully. High activity with AI might indicate productive use — or it might indicate the developer is blindly accepting suggestions.
Communication: Does AI change how teams collaborate? Are code reviews more or less effective? Is knowledge sharing happening?
Efficiency: Are developers spending less time on boilerplate? Is time-to-first-commit improving for new team members?

DORA Metrics: Outcomes Over Output

The DORA (DevOps Research and Assessment) metrics focus on delivery performance:

Deployment Frequency: How often do you deploy to production?
Lead Time for Changes: How long from commit to production?
Change Failure Rate: What percentage of deployments cause failures?
Mean Time to Recovery (MTTR): How quickly do you recover from failures?

DORA metrics are outcome-oriented: they measure the effectiveness of your entire delivery pipeline, not individual developer activity. In the AI era, they remain highly relevant — perhaps more so. AI should theoretically improve all four metrics. If it doesn’t, something is wrong.

AI-Specific DORA Extensions

Consider tracking additional metrics when AI is involved:

AI Suggestion Acceptance Rate: What percentage of AI suggestions are accepted? Too high might indicate rubber-stamping; too low suggests the tool isn’t helping.
AI-Assisted Change Failure Rate: Do changes written with AI assistance fail more or less often?
Time Saved per Task Type: For which tasks does AI provide the most leverage? Boilerplate? Tests? Documentation?

The „10x“ Reality Check

Marketing claims of „10x productivity“ with AI are pervasive. The reality is more nuanced:

Studies show 10-30% improvements in specific tasks like writing boilerplate code, generating tests, or explaining unfamiliar codebases.
Complex problem-solving sees minimal AI uplift. Architecture decisions, debugging subtle issues, and understanding business requirements still depend on human expertise.
Junior developers may see larger gains — AI helps them write syntactically correct code faster. But they still need to learn why code works, or they’ll introduce subtle bugs.
10x claims often compare against unrealistic baselines (e.g., writing everything from scratch vs. using any tooling at all).

A realistic expectation: AI provides meaningful productivity gains for certain tasks, modest gains overall, and requires investment in learning and integration to realize benefits.

Practical Metrics for AI-Era Teams

Based on SPACE, DORA, and real-world experience, here are concrete metrics to track:

Quantitative Metrics

Metric	What It Measures	AI-Era Considerations
Main Branch Success Rate	% of commits that pass CI on main	Should improve with AI; if not, AI may be introducing bugs
MTTR	Time to recover from incidents	AI-assisted debugging should reduce this
Time to First Commit (new devs)	Onboarding effectiveness	AI should accelerate ramp-up
Code Review Turnaround	Time from PR open to merge	AI-generated code may need more careful review
Test Coverage Delta	Change in test coverage over time	AI can generate tests; is coverage improving?

Qualitative Metrics

Developer Experience Surveys: Regular pulse checks on tool satisfaction, flow state, friction points.
AI Tool Usefulness Ratings: For each major task type, how helpful is AI? (Scale 1-5)
Knowledge Retention: Are developers learning, or becoming dependent on AI? Periodic assessments can reveal this.

Tooling: Waydev, LinearB, and Beyond

Several platforms now offer AI-era productivity analytics:

Waydev: Integrates with Git, Jira, and CI/CD to provide DORA metrics and developer analytics. Offers AI-specific insights.
LinearB: Focuses on workflow metrics, identifying bottlenecks in the development process. Good for measuring cycle time and review efficiency.
Pluralsight Flow (formerly GitPrime): Deep git analytics with focus on team patterns and individual contribution.
Jellyfish: Connects engineering metrics to business outcomes, helping justify AI tool investments.

When evaluating tools, ensure they can:

Distinguish between AI-assisted and non-AI-assisted work (if your tools support this tagging)
Provide qualitative feedback mechanisms alongside quantitative data
Avoid creating perverse incentives (e.g., rewarding lines of code)

Avoiding Measurement Pitfalls

Don’t use metrics punitively. Metrics are for learning, not for ranking developers. The moment metrics become tied to performance reviews, they get gamed.
Don’t measure too many things. Pick 5-7 key metrics across SPACE dimensions. More than that creates noise.
Do measure trends, not absolutes. A team’s MTTR improving over time is more meaningful than comparing MTTR across different teams.
Do include qualitative data. Numbers without context are dangerous. Regular conversations with developers provide essential context.
Do revisit metrics regularly. As AI tools evolve, so should your measurement approach.

Conclusion

Measuring developer productivity in the AI era requires abandoning simplistic velocity metrics in favor of holistic frameworks like SPACE and outcome-oriented measures like DORA. The „10x productivity“ hype should be tempered with realistic expectations: AI provides meaningful but not transformative gains, and those gains vary significantly by task type and developer experience.

The organizations that will thrive are those that invest in thoughtful measurement — combining quantitative data with qualitative insights, tracking outcomes rather than output, and continuously refining their approach as AI tools mature.

Start by auditing your current metrics. Are they measuring activity or productivity? Then layer in SPACE dimensions and DORA outcomes. Finally, talk to your developers — their lived experience with AI tools is the most valuable data point of all.

März 13, 2026März 13, 2026

Intent-Driven Infrastructure: From IaC Scripts to Self-Reconciling Platforms

Introduction

For years, Infrastructure as Code (IaC) has been the gold standard for managing cloud resources. Tools like Terraform, Pulumi, and CloudFormation brought version control, repeatability, and collaboration to infrastructure management. But as cloud environments grow in complexity, a fundamental tension has emerged: IaC scripts describe how to build infrastructure, not what infrastructure should look like.

Intent-driven infrastructure flips this paradigm. Instead of writing imperative scripts or even declarative configurations that describe specific resources, you express intents — high-level descriptions of desired outcomes. The platform then continuously reconciles reality with intent, automatically correcting drift, scaling resources, and enforcing policies.

This article explores how intent-driven infrastructure works, the technologies enabling it, and practical steps to adopt this approach in your organization.

The Limitations of Traditional IaC

Traditional IaC has served us well, but several pain points are driving the need for evolution:

Configuration Drift: Despite declarative tools, drift between desired and actual state is common. Manual changes, failed applies, and partial rollbacks create inconsistencies that require human intervention to resolve.
Brittle Pipelines: CI/CD pipelines for infrastructure often break on edge cases — timeouts, API rate limits, dependency ordering. Recovery requires manual debugging and re-running pipelines.
Cognitive Overhead: Developers must understand cloud-provider-specific APIs, resource dependencies, and lifecycle management. This creates a bottleneck where only specialized engineers can make infrastructure changes.
Day-2 Operations Gap: Most IaC tools excel at provisioning but struggle with ongoing operations — scaling, patching, certificate rotation, and compliance enforcement.

What is Intent-Driven Infrastructure?

Intent-driven infrastructure introduces a higher level of abstraction. Instead of specifying individual resources, you express intents like:

“I need a production-grade PostgreSQL database with 99.9% availability, encrypted at rest, accessible only from the application namespace, with automated backups retained for 30 days.”

The platform interprets this intent and:

Compiles it into concrete resource definitions (RDS instance, security groups, backup policies, monitoring rules)
Validates against organizational policies (cost limits, security requirements, compliance rules)
Provisions the resources across the appropriate cloud accounts
Continuously reconciles — if drift is detected, the platform automatically corrects it

Core Architectural Patterns

Kubernetes as Universal Control Plane

The Kubernetes API server and its reconciliation loop have proven to be remarkably versatile. Projects like Crossplane leverage this pattern to manage any infrastructure resource through Kubernetes Custom Resource Definitions (CRDs). The key insight: the reconciliation loop that keeps your pods running can also keep your cloud infrastructure aligned with intent.

Crossplane Compositions as Intent Primitives

Crossplane v2 Compositions allow platform teams to define reusable, opinionated templates that abstract away provider-specific complexity. A single DatabaseIntent CRD can provision an RDS instance on AWS, Cloud SQL on GCP, or Azure Database — the developer only expresses intent, not implementation.

apiVersion: platform.example.com/v1alpha1
kind: DatabaseIntent
metadata:
  name: orders-db
spec:
  engine: postgresql
  version: "16"
  availability: high
  encryption: true
  backup:
    retentionDays: 30
  network:
    allowFrom:
      - namespace: orders-app

Policy Guardrails: OPA, Kyverno, and Cedar

Intent without governance is chaos. Policy engines ensure that every intent is validated before execution:

OPA (Open Policy Agent) / Gatekeeper: Rego-based policies for Kubernetes admission control. Powerful but requires learning a new language.
Kyverno: YAML-native policies that feel natural to Kubernetes operators. Lower barrier to entry, excellent for common patterns.
Cedar: AWS-backed authorization language for fine-grained access control. Emerging as a standard for application-level policy.

Together, these tools enforce constraints like cost ceilings, security baselines, and compliance requirements — automatically, at every change.

Continuous Reconciliation vs. Imperative Apply

The fundamental shift from traditional IaC to intent-driven infrastructure is moving from imperative apply (run a pipeline to make changes) to continuous reconciliation (the platform constantly ensures reality matches intent). This eliminates drift by design rather than detecting it after the fact.

Orchestration Platforms: Humanitec and Score

Humanitec provides an orchestration layer that translates developer intent into fully resolved infrastructure configurations. Using Score (an open-source workload specification), developers describe what their application needs without specifying how it is provisioned. The platform engine resolves dependencies, applies organizational rules, and generates deployment manifests.

Benefits in Practice

Faster Recovery: When infrastructure drifts or fails, the reconciliation loop automatically corrects it. MTTR drops from hours to minutes.
Safer Changes: Policy gates validate every change before execution. No more “oops, I deleted the production database” moments.
Developer Velocity: Developers express intent in familiar terms, not cloud-provider-specific configurations. Time-to-production for new services drops significantly.
Compliance by Default: Security, cost, and regulatory policies are enforced continuously, not checked periodically.
AI-Agent Compatibility: Intent-based APIs are natural interfaces for AI agents. An AI coding assistant can express “I need a cache with 10GB capacity” without understanding the intricacies of ElastiCache configuration.

Challenges and Guardrails

Intent-driven infrastructure is not without its challenges:

Abstraction Leakage: When things go wrong, engineers need to understand the underlying resources. Too much abstraction can make debugging harder.
Policy Complexity: As organizations grow, policy definitions can become complex and conflicting. Invest in policy testing and simulation.
Observability: You need new metrics — not just “is the resource healthy?” but “is the intent satisfied?” Intent satisfaction metrics are a new concept for most teams.
Migration Path: Existing Terraform/Pulumi codebases represent significant investment. Migration must be gradual, starting with new workloads and selectively adopting intent-driven patterns for existing ones.
Organizational Change: Intent-driven infrastructure shifts responsibilities. Platform teams own the abstraction layer; application teams own the intents. This requires clear role definitions and trust.

Getting Started: A Minimal Viable Implementation

Start Small: Pick one workload type (e.g., databases) and create an intent CRD using Crossplane Compositions.
Add Policy Gates: Implement basic Kyverno policies for cost limits and security baselines.
Enable Reconciliation: Let the Crossplane controller continuously reconcile. Monitor drift detection and auto-correction rates.
Measure Impact: Track MTTR, change drift frequency, time-to-recover, and developer satisfaction.
Iterate: Expand to more resource types, add more sophisticated policies, and integrate with your IDP (Internal Developer Portal).

Conclusion

Intent-driven infrastructure represents the next evolution of Infrastructure as Code. By shifting from imperative scripts to declarative intents backed by continuous reconciliation and policy guardrails, organizations can build platforms that are more resilient, more secure, and more developer-friendly.

The tools are maturing rapidly — Crossplane, Humanitec, OPA, Kyverno, and the broader Kubernetes ecosystem provide a solid foundation. The question is no longer whether to adopt intent-driven patterns, but how fast your team can start the journey.

Start with a single workload, prove the value, and scale from there. Your future self — debugging a production issue at 3 AM — will thank you when the platform auto-heals before you even finish your coffee.

März 13, 2026März 13, 2026

Golden Paths for AI-Generated Code: How Platform Teams Keep Up with Machine-Speed Development

The AI Development Velocity Gap

AI coding assistants have fundamentally changed how software gets written. GitHub Copilot, Claude Code, Cursor, and their ilk are delivering on the promise of 55% faster development cycles—but they’re also creating a bottleneck that most organizations haven’t anticipated.

The problem isn’t the code generation. It’s what happens after the AI writes it.

Traditional code review processes, pipeline configurations, and compliance checks weren’t designed for machine-speed development. When a developer can generate 500 lines of functional code in minutes, but your security scan takes 45 minutes and your approval workflow spans three days, you’ve created a velocity cliff. The AI accelerates development right up to the point where organizational friction brings it to a halt.

This is where Golden Paths come in—not as a new concept, but as an evolution. Platform engineering teams are realizing that paved roads designed for human developers need to be reimagined for AI-assisted development. The path itself needs to be machine-consumable.

What Makes a Golden Path „AI-Native“?

Traditional Golden Paths provide opinionated defaults: here’s how we build microservices, here’s our standard CI/CD pipeline, here’s our approved tech stack. AI-native Golden Paths go further—they encode organizational knowledge in formats that both humans and AI assistants can understand and follow.

The Three Layers

1. Templates as Machine Instructions

Backstage scaffolders and Cookiecutter templates have always been about consistency. But when an AI assistant generates code, it needs to know not just what to create, but how to create it according to your standards.

Modern template systems are evolving to include:

Intent declarations — What is this template for? („Internal API with PostgreSQL, OAuth, and OpenTelemetry“)
Constraint specifications — What’s non-negotiable? („All services must use mTLS, secrets must reference Vault, no direct database access from handlers“)
Context documentation — Why these decisions? („mTLS required for zero-trust compliance, Vault integration prevents secret sprawl“)

This isn’t just documentation for humans. It’s context that AI assistants can consume to generate code that already complies with your standards—before the first commit.

2. Embedded Governance

The old model: write code, submit PR, wait for review, fix issues, merge. The AI-native model: generate compliant code from the start.

Golden Paths are increasingly embedding governance as code:

# Example: Terraform module with embedded policy
module "service_template" {
  source = "platform/golden-paths//microservice"
  
  # Intent declaration
  service_type = "internal-api"
  data_stores  = ["postgresql"]
  
  # Embedded compliance
  security_profile = "pci-dss"  # Enforces mTLS, encryption at rest, audit logging
  observability    = "full"     # Auto-injects OTel, requires SLO definitions
  
  # AI assistant instructions
  ai_context = {
    testing_strategy = "contract-first"
    docs_requirement = "openapi-generated"
    deployment_model = "canary-required"
  }
}

The AI assistant—whether it’s generating the initial service scaffold or helping add a new endpoint—has explicit guidance about organizational requirements. The „shift left“ here isn’t just moving security earlier; it’s embedding organizational knowledge so deeply that compliance becomes the path of least resistance.

3. Continuous Validation, Not Gates

Traditional pipelines are gate-based: run tests, run security scans, wait for approval, deploy. AI-native Golden Paths favor continuous validation: the path itself ensures compliance, and deviations are caught immediately—not at PR time.

Tools like Datadog’s Service Catalog, Cortex, and Port are evolving from static documentation to active validation systems. They don’t just record that your service should have tracing; they verify it’s actually emitting traces, that SLOs are defined, that dependencies are documented. The Golden Path becomes a living specification, continuously reconciled against reality.

The Platform Team’s New Role

This shift changes what platform engineering teams optimize for. Previously, the goal was standardization—get everyone using the same tools, the same patterns, the same pipelines. Now, the goal is machine-consumable context.

Platform teams are becoming curators of organizational knowledge. Their deliverables aren’t just templates and Terraform modules, but:

Decision records as structured data — Why do we use Kafka over RabbitMQ? The reasoning needs to be parseable by AI assistants, not just documented in Confluence.
Architecture constraints as code — Policy definitions that both CI pipelines and AI assistants can evaluate.
Context about context — Metadata about when standards apply, what exceptions exist, and how to evolve them.

The best platform teams are already treating their Golden Paths as products—with user research (what do developers and AI assistants struggle with?), iteration (which constraints are too burdensome?), and metrics (time from idea to production, compliance drift, developer satisfaction).

Practical Implementation: Start Small

The organizations succeeding with AI-native Golden Paths aren’t boiling the ocean. They’re starting with one painful workflow and making it AI-friendly.

Phase 1: One Service Template

Pick your most common service type—probably an internal API—and create a template that encodes your current best practices. But don’t stop at file generation. Include:

A Backstage scaffolder with clear, structured metadata
CI/CD pipelines that validate compliance automatically
Documentation that explains why each decision was made
Example prompts that developers (or AI assistants) can use to extend the service

Phase 2: Expand to Common Patterns

Once the first template proves valuable, expand to other frequent scenarios:

Data pipeline templates („Ingest from Kafka, transform with dbt, load to Snowflake“)
ML serving templates („Model deployment with A/B testing, canary analysis, and drift detection“)
Frontend component templates („React component with Storybook, accessibility tests, and design system integration“)

For each, the goal isn’t just consistency—it’s making the organizational knowledge machine-consumable.

Phase 3: Active Validation

The final evolution is continuous reconciliation. Your Golden Path specifications should be validated against actual running services, with drift detection and automated remediation where possible. If a service was created with the „internal-api“ template but no longer has the required observability, the platform should flag it—not as a compliance violation, but as a service that’s fallen off the golden path.

The Competitive Imperative

Organizations that solve this problem will have a compounding advantage. Their developers—augmented by AI assistants—will move at machine speed, but with organizational guardrails that ensure security, compliance, and maintainability. Those stuck with human-speed governance processes will find their AI investments stalling at the velocity cliff.

The question isn’t whether to adopt AI coding assistants. That ship has sailed. The question is whether your platform can keep up with the pace they enable.

Golden Paths aren’t new. But Golden Paths designed for AI-generated code? That’s the platform engineering challenge of 2026.

Want to implement AI-native Golden Paths? Start with your most painful developer workflow. Make the path so clear that both humans and AI assistants can follow it without thinking. Then iterate.

März 9, 2026März 9, 2026

Non-Human Identity: Why Your AI Agents Need Their Own IAM Strategy

Every identity in your infrastructure tells a story. For decades, that story was simple: a human logs in, does work, logs out. But today, the cast of characters has exploded. Service accounts, API keys, CI/CD runners, Kubernetes operators, cloud functions, and now—AI agents that reason, plan, and act autonomously. Welcome to the era of Non-Human Identity (NHI), where the machines outnumber the people, and your IAM strategy hasn’t caught up.

If you’re a DevOps engineer, security architect, or platform engineer, this isn’t theoretical. This is the attack surface you’re defending right now, whether you know it or not.

The NHI Sprawl Problem: Your Identities Are Already Out of Control

Here’s a number that should keep you up at night: in the average enterprise, non-human identities outnumber human users by 45:1. In some DevOps-heavy organizations analyzed by Entro Security’s 2025 report, that ratio has climbed to 144:1—a 44% year-over-year increase driven by AI agents, CI/CD automation, and third-party integrations.

GitGuardian’s 2025 State of Secrets Sprawl report paints an equally alarming picture: 23.77 million new secrets leaked on GitHub in 2024 alone, a 25% increase from the previous year. Repositories using AI coding assistants like GitHub Copilot show 40% higher secret leak rates. And 70% of secrets first detected in public repositories in 2022 are still active.

This is NHI sprawl: an uncontrolled proliferation of machine credentials—API keys, service account tokens, OAuth client secrets, SSH keys, database passwords—scattered across your infrastructure, your CI/CD pipelines, your Slack channels, and your Jira tickets. 43% of exposed secrets now appear outside code repositories entirely.

The scale of the problem becomes clear when you inventory what qualifies as a non-human identity:

Service accounts in cloud providers (AWS IAM roles, GCP service accounts, Azure managed identities)
API keys and tokens for SaaS integrations
CI/CD runner identities (GitHub Actions, GitLab CI, Jenkins)
Kubernetes service accounts and workload identities
Infrastructure-as-code automation (Terraform, Pulumi state backends)
AI agents that autonomously call APIs, deploy code, or access databases

Each one of these is an identity. Each one needs authentication, authorization, and lifecycle management. And most organizations are managing them with the same tools they built for humans in 2015.

Why Traditional IAM Fails for AI Agents

Traditional IAM was designed around a specific model: a human authenticates (usually with a password plus MFA), receives a session, performs actions within their role, and eventually logs out. The entire architecture assumes a bounded, interactive session with a human making decisions at the keyboard.

AI agents break every one of these assumptions.

Ephemeral lifecycles. An AI agent might exist for seconds—spun up to process a request, execute a multi-step workflow, and terminate. Traditional identity provisioning, which relies on onboarding workflows, approval chains, and manual deprovisioning, can’t keep up with entities that live and die in milliseconds.

Non-interactive authentication. Agents don’t type passwords. They don’t respond to MFA push notifications. They authenticate through tokens, certificates, or workload attestation—mechanisms that traditional IAM treats as second-class citizens.

Dynamic scope requirements. A human user typically has a stable role: „developer,“ „SRE,“ „database admin.“ An AI agent’s required permissions can change from task to task, even within a single execution chain. It might need read access to a monitoring API, then write access to a deployment pipeline, then database credentials—all in one workflow.

Scale that breaks assumptions. When your environment can spin up thousands of autonomous agents concurrently—each needing unique, auditable credentials—the per-identity overhead of traditional IAM becomes a bottleneck, not a safeguard.

No human in the loop (by design). The entire value proposition of AI agents is autonomy. But traditional IAM’s risk controls assume a human is making judgment calls. When an agent autonomously decides to escalate a deployment or modify infrastructure, who approved that access?

Delegation Chains: The Trust Problem That Keeps Growing

Perhaps the most fundamental challenge with AI agent identity is delegation. In traditional systems, delegation is simple: Alice grants Bob access to a shared folder. The chain is short, auditable, and traceable.

With AI agents, delegation becomes a recursive chain. Consider this scenario:

A developer asks an AI orchestrator to „deploy the latest release to staging“
The orchestrator delegates to a CI/CD agent to build and test
The CI/CD agent delegates to a security scanning agent to verify compliance
The security agent delegates to a cloud provider API to check configurations
Each hop requires credentials, and each hop reduces the trust boundary

This is a delegation chain: a sequence of authority transfers where each agent acts on behalf of the previous one. The security questions multiply at each hop: Did the original user authorize this entire chain? Can intermediate agents expand their scope? What happens when one link in the chain is compromised?

Without a formal delegation model, you get what security teams call ambient authority—agents inheriting broad permissions from their caller without explicit, auditable constraints. This is how lateral movement attacks happen in agent-driven architectures.

OpenID Connect for Agents: Standards Are Catching Up

The good news: the identity standards community has recognized this gap. The OpenID Foundation published its „Identity Management for Agentic AI“ whitepaper in 2025, and work on OpenID Connect for Agents (OIDC-A) 1.0 is actively progressing.

OIDC-A extends the familiar OAuth 2.0 / OpenID Connect framework with agent-specific capabilities:

Agent authentication: Agents receive ID Tokens with claims that identify them as non-human entities, including their type, model, provider, and capabilities
Delegation chain validation: New claims like delegator_sub (who delegated authority), delegation_chain (full history of authority transfers), and delegation_constraints (scope and time limits) enable relying parties to validate the entire trust chain
Scope attenuation per hop: Each delegation step can only reduce scope, never expand it—a critical safeguard against privilege escalation
Purpose binding: The delegation_purpose claim ties access to a specific intent, supporting auditability and compliance
Attestation verification: JWT-based attestation evidence lets relying parties verify the integrity and provenance of an agent before trusting its claims

The delegation flow works like this: a user authenticates and explicitly authorizes delegation to an agent. The authorization server issues a scoped ID Token to the agent with the delegation chain attached. The agent can then present this token to downstream services, which validate the chain—checking chronological ordering, trusted issuers, scope reduction at each hop, and constraint enforcement.

This is a fundamental shift from „the agent has a service account with broad permissions“ to „the agent carries a verifiable, constrained, auditable proof of delegated authority.“ The difference matters enormously for security posture.

Modern Approaches: Zero Standing Privilege and Beyond

Standards provide the protocol layer. But implementing NHI security in practice requires adopting a set of architectural principles that go beyond what traditional IAM offers.

Zero Standing Privilege (ZSP)

The single most impactful principle for NHI security is eliminating standing privileges entirely. No agent, service account, or workload should have persistent access to any resource. Instead, all access is granted just-in-time (JIT)—requested, approved (potentially automatically based on policy), and expired within a defined window.

This sounds radical, but it’s increasingly practical. Tools like Britive, Apono, and P0 Security provide JIT access platforms that can provision and deprovision cloud IAM roles, database credentials, and Kubernetes RBAC bindings in seconds. The agent requests access, the policy engine evaluates the request against contextual signals (time, identity chain, workload attestation, behavioral baseline), and temporary credentials are issued.

The result: even if an agent is compromised, there are no standing credentials to steal. The blast radius collapses from „everything the service account could ever access“ to „whatever the agent was authorized for in that specific moment.“

SPIFFE and Workload Identity

SPIFFE (Secure Production Identity Framework for Everyone) and its runtime implementation SPIRE represent the most mature approach to cryptographic workload identity. SPIFFE assigns every workload a unique, verifiable identity (SPIFFE ID) and issues short-lived credentials called SVIDs (SPIFFE Verifiable Identity Documents)—either X.509 certificates or JWTs.

For AI agents, SPIFFE provides several critical capabilities:

Runtime attestation: Identities are bound to workload attributes (container metadata, node selectors, cloud instance tags) rather than static credentials
Automatic rotation: SVIDs are short-lived and automatically renewed, eliminating the credential rotation problem
Federated trust: SPIFFE trust domains can federate across organizational boundaries, enabling secure agent-to-agent communication in multi-cloud environments
No shared secrets: Authentication uses cryptographic proof, not shared API keys or passwords

SPIFFE is already integrated with HashiCorp Vault, Istio, Envoy, and major cloud provider identity systems. An IETF draft currently profiles OAuth 2.0 to accept SPIFFE SVIDs for client authentication, bridging the gap between workload identity and application-layer authorization.

Verifiable Credentials for Agents

The W3C Verifiable Credentials (VC) model, originally designed for human identity use cases, is being adapted for non-human identities. In this model, an agent carries a set of cryptographically signed credentials that attest to its capabilities, provenance, and authorization—without requiring real-time connectivity to a central authority.

This is particularly powerful for offline-capable agents and edge deployments where agents may need to prove their identity and authorization without reaching back to a central IdP. Combined with OIDC-A delegation chains, verifiable credentials create a portable, tamper-evident identity for AI agents.

Teleport: First-Class Non-Human Identities in Practice

While standards and frameworks provide the conceptual foundation, some platforms are already implementing first-class NHI support. Teleport is a notable example, offering unified identity governance that treats machine identities with the same rigor as human users.

Teleport’s approach covers the full infrastructure stack—SSH servers, RDP gateways, Kubernetes clusters, databases, internal web applications, and cloud APIs—under a single identity and access management plane. What makes it relevant for NHI is the architecture:

Certificate-based identity: Every connection (human or machine) authenticates via short-lived certificates, not static keys or passwords
Workload identity integration: Machine-to-machine communication uses cryptographic identity tied to workload attestation
Unified audit trail: Human and non-human access events appear in the same audit log, enabling correlation and compliance
Just-in-time access requests: Both humans and machines can request elevated access through the same workflow, with policy-driven approval

Similarly, vendors like Britive and P0 Security are building platforms specifically designed for the NHI challenge—providing discovery, classification, and JIT governance for the thousands of non-human identities scattered across cloud environments.

The key insight from these implementations: treating non-human identities as a governance afterthought (i.e., handing out long-lived service account keys and hoping for the best) is no longer viable. First-class NHI support means the same identity lifecycle, the same audit rigor, and the same least-privilege enforcement—applied uniformly to every identity in your infrastructure.

Practical Implementation Guidelines for NHI Security

Moving from theory to practice requires a structured approach. Here’s a roadmap for engineering teams building NHI security into their platforms.

1. Inventory and Classify Your Non-Human Identities

You can’t secure what you can’t see. Start with a comprehensive inventory of every NHI in your environment—service accounts, API keys, OAuth clients, CI/CD tokens, workload identities, and AI agent credentials. Classify them by criticality, scope, and lifecycle. Many organizations discover they have 10–50x more NHIs than they estimated.

2. Eliminate Long-Lived Credentials

Every static API key and long-lived service account token is a breach waiting to happen. Establish a migration plan to replace them with short-lived, automatically rotated credentials. Prioritize high-privilege credentials first. Use workload identity federation (GCP Workload Identity, AWS IAM Roles for Service Accounts, Azure Workload Identity) to eliminate static credentials for cloud-native workloads.

3. Implement Zero Standing Privilege for Agents

No AI agent should have permanent access to production resources. Deploy JIT access platforms that provision credentials on-demand with automatic expiration. Define policies that evaluate request context—who triggered the agent, what task it’s performing, what workload attestation it carries—before issuing credentials.

4. Adopt Cryptographic Workload Identity

Deploy SPIFFE/SPIRE or equivalent workload identity infrastructure. Issue SVIDs to your agents tied to runtime attestation. Use mTLS for agent-to-service communication and JWT-SVIDs for application-layer authorization. This eliminates shared secrets from your architecture entirely.

5. Model and Enforce Delegation Chains

For agentic workflows where AI agents delegate to other agents, implement explicit delegation tracking. Whether you adopt OIDC-A or build a custom solution, ensure that every delegation hop is recorded, scope is attenuated (never expanded), and the original authorizing identity is always traceable. Use policy engines like OPA (Open Policy Agent) to enforce delegation constraints at each service boundary.

6. Unify Human and Non-Human Audit Trails

Your SIEM shouldn’t have separate views for human and machine access. Correlation is critical—when an AI agent accesses a database after a human triggered a deployment, that causal chain must be visible in a single audit view. Ensure your identity platform emits structured logs that include delegation chains, workload attestation, and request context.

7. Build Behavioral Baselines for Agent Activity

AI agents produce distinct behavioral patterns—API call frequencies, resource access sequences, timing distributions. Establish baselines and alert on deviations. Unlike human users, agent behavior should be relatively predictable; anomalies are a strong signal of compromise or misconfiguration.

The Road Ahead

Gartner predicts that 30% of enterprises will deploy autonomous AI agents by 2026. With emerging standards like OIDC-A, maturing frameworks like SPIFFE, and vendors building first-class NHI platforms, the tooling is finally catching up to the problem.

But the window for proactive implementation is closing. Organizations that wait for NHI sprawl to become a security incident—and over 50 NHI-linked breaches were reported in H1 2025 alone—will be playing catch-up from a position of compromise.

The bottom line: your AI agents are identities. They need authentication, authorization, delegation controls, lifecycle management, and audit trails—just like your human users. The difference is scale, speed, and autonomy. Build your IAM strategy accordingly, or the agents will build their own—and you won’t like the result.

März 8, 2026März 9, 2026

Progressive Delivery with GitOps: Safer Deployments Using Argo Rollouts and Flagger

Beyond All-or-Nothing: The Case for Gradual Rollouts

You’ve adopted GitOps. Your infrastructure is declarative, version-controlled, and automatically reconciled. But when it comes to deploying application changes, are you still flipping a switch and hoping for the best?

Progressive delivery bridges this gap. Instead of instant cutover, traffic shifts gradually — 5% → 25% → 100% — with automated checks at every step. If metrics degrade, instant rollback. If health checks pass, automatic promotion. The result: safer deployments without sacrificing velocity.

The Progressive Delivery Stack

At its core, progressive delivery combines three capabilities:

Traffic Shifting — Gradually move users from old to new version
Automated Analysis — Continuously evaluate SLOs and business metrics
Automatic Promotion/Rollback — Decisions based on data, not gut feeling

The two leading implementations in the Kubernetes ecosystem are Argo Rollouts and Flagger. Both integrate with existing GitOps workflows but approach progressive delivery differently.

Argo Rollouts: Native Kubernetes Experience

Argo Rollouts extends the Deployment concept with custom resources. You get canaries, blue-green deployments, and experiments using familiar Kubernetes primitives.

Architecture Overview

┌─────────────────────────────────────────┐
│           Argo Rollouts Controller      │
│  (manages Rollout CRD, traffic shaping) │
├─────────────────────────────────────────┤
│              Service Mesh               │
│    (Istio, Linkerd, NGINX, ALB, SMI)  │
├─────────────────────────────────────────┤
│           Prometheus/OTel               │
│         (metric queries for analysis)   │
└─────────────────────────────────────────┘

Example: Canary Deployment

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-service
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: payment-service-canary
      stableService: payment-service-stable
      trafficRouting:
        istio:
          virtualService:
            name: payment-service-vs
            routes:
            - primary
      steps:
      - setWeight: 5
      - pause: {duration: 10m}
      - setWeight: 20
      - pause: {duration: 10m}
      - analysis:
          templates:
          - templateName: success-rate
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 100
      - analysis:
          templates:
          - templateName: success-rate
          - templateName: latency

Analysis Template

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 5m
    count: 3
    successCondition: result[0] >= 0.95
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{service="payment-service",status=~"2.."}[5m]))
          /
          sum(rate(http_requests_total{service="payment-service"}[5m]))

Flagger: GitOps-Native Approach

Flagger takes a different approach. Instead of replacing Deployments, it works alongside them — creating canary resources and managing traffic splitting externally.

Architecture Overview

┌─────────────────────────────────────────┐
│              Flagger                    │
│  (watches Deployments, manages canary) │
├─────────────────────────────────────────┤
│         Service Mesh / Ingress          │
│  (Istio, Linkerd, NGINX, Gloo, Contour)│
├─────────────────────────────────────────┤
│         Prometheus/CloudWatch            │
│          (metrics for canary checks)  │
└─────────────────────────────────────────┘

Example: Automated Canary

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: payment-service
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  service:
    port: 8080
  analysis:
    interval: 30s
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
    webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://payment-service-canary/"

Argo Rollouts vs Flagger: Quick Comparison

Aspect	Argo Rollouts	Flagger
Deployment Model	Replaces Deployment with Rollout CRD	Watches existing Deployments
GitOps Integration	Argo CD native (same project)	Works with any GitOps tool
Traffic Control	Multiple meshes + ALB/NLB	Multiple meshes + ingress controllers
Experimentation	Built-in A/B/n testing	A/B testing via webhooks
Analysis	AnalysisTemplate/AnalysisRun CRDs	Inline metric thresholds
Rollback	Automatic on failed analysis	Automatic on threshold breach

Metric-Driven Promotion

The magic happens when deployment decisions are based on actual system behavior, not time-based guesses.

Key Metrics to Watch

Golden Signals: Latency, traffic, errors, saturation
Business Metrics: Conversion rates, checkout completion
Infrastructure Metrics: CPU, memory, disk I/O

Prometheus Integration Example

# Argo Rollouts: P99 latency check
- name: p99-latency
  interval: 5m
  successCondition: result[0] <= 200
  provider:
    prometheus:
      address: http://prometheus.monitoring
      query: |
        histogram_quantile(0.99,
          sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
        )

# Flagger: Error rate check
metrics:
- name: request-success-rate
  thresholdRange:
    min: 99.0
  interval: 1m

Adoption Path: From GitOps to Progressive Delivery

For teams already running Argo CD or Flux, the transition is gradual:

Phase 1: Observability Foundation

Ensure metrics are flowing (Prometheus/Grafana operational)
Define SLOs and error budgets
Set up alerting on key services

Phase 2: First Canary

Pick a non-critical service with good metrics coverage
Install Argo Rollouts or Flagger controller
Convert Deployment to Rollout/Canary (small team impact)

Phase 3: Expand Coverage

Roll out to more services
Refine analysis templates based on learnings
Add automated load testing in canary phase

Phase 4: Advanced Patterns

A/B/n testing for feature validation
Multi-region progressive rollouts
Chaos engineering integration

Integration with Argo CD

Argo Rollouts shines here because it's part of the same ecosystem:

# Application manifest with Rollout
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/org/gitops-repo
    targetRevision: HEAD
    path: apps/payment-service
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

The Rollout resource is just another Kubernetes object — Argo CD manages it like any Deployment.

Common Pitfalls and How to Avoid Them

Insufficient Metrics Coverage

Problem: Canary proceeds based on partial data.
Solution: Require minimum metric samples before promotion decision.

Overly Aggressive Traffic Shifts

Problem: 50% traffic jump exposes too many users to issues.
Solution: Use smaller steps (5% → 10% → 25% → 50% → 100%).

Ignoring Cold Start Effects

Problem: New pods show artificially high latency initially.
Solution: Add warmup period or exclude initial metrics from analysis.

When to Choose Which

Choose Argo Rollouts if:

You're already using Argo CD
You want tight integration with your GitOps workflow
You need sophisticated experimentation (A/B/n testing)

Choose Flagger if:

You use Flux or another GitOps tool
You prefer keeping native Deployments
You want simpler, less invasive setup

Conclusion

Progressive delivery isn't just a safety net — it's a competitive advantage. Teams that deploy confidently multiple times per day recover faster from incidents, validate features with real traffic, and reduce the blast radius of bad changes.

The tooling is mature, the patterns are proven, and the integration with existing GitOps workflows is seamless. Whether you choose Argo Rollouts or Flagger, the important step is starting: pick a service, set up your first canary, and let data drive your deployment decisions.

GitOps gave us declarative infrastructure. Progressive delivery gives us declarative confidence in our deployments.

März 7, 2026März 7, 2026

WebAssembly Components: The Next Evolution of Cloud-Native Runtimes

Beyond the Browser: WebAssembly Goes Cloud-Native

WebAssembly started as a way to run high-performance code in browsers. But in 2026, Wasm is making its biggest leap yet — into cloud infrastructure, serverless platforms, and edge computing.

The promise: write once, run anywhere — but this time, it might actually work. Faster cold starts than containers, smaller footprints than VMs, and true polyglot interoperability. Let’s explore why WebAssembly Components are changing how we think about cloud-native runtimes.

The Container Problem

Containers revolutionized deployment, but they come with baggage:

Cold start times: Seconds to spin up, problematic for serverless
Image sizes: Hundreds of MBs for a simple service
Resource overhead: Each container needs its own OS libraries
Security surface: Full Linux userspace means more attack vectors

What if we could keep the isolation benefits while shedding the overhead?

Enter WebAssembly Components

WebAssembly modules are compact, sandboxed, and lightning-fast. But raw Wasm has limitations — no standard way to compose modules, limited system access, language-specific ABIs.

The Component Model fixes this:

WIT (WebAssembly Interface Types) — Language-agnostic interface definitions
Composability — Link components together like building blocks
Capability-based security — Fine-grained permissions, not all-or-nothing
WASI P2 — Standardized system interfaces (files, sockets, clocks)

The Stack

┌─────────────────────────────────────────┐
│         Your Application Logic          │
│    (Rust, Go, Python, JS, C#, etc.)     │
├─────────────────────────────────────────┤
│         WebAssembly Component           │
│      (WIT interfaces, composable)       │
├─────────────────────────────────────────┤
│              WASI P2 Runtime            │
│    (wasmtime, wasmer, wazero, etc.)     │
├─────────────────────────────────────────┤
│         Host Platform (any OS)          │
└─────────────────────────────────────────┘

Why This Matters: The Numbers

Metric	Container	Wasm Component
Cold start	500ms – 5s	1-10ms
Image size	50-500 MB	1-10 MB
Memory overhead	50+ MB baseline	< 1 MB baseline
Startup density	~100/host	~10,000/host

For serverless and edge computing, these differences are transformative.

WASI P2: The Missing Piece

WASI (WebAssembly System Interface) gives Wasm modules access to the outside world — but in a controlled way.

WASI P2 (Preview 2, now stable) introduces:

wasi:io — Streams and polling
wasi:filesystem — File access (sandboxed)
wasi:sockets — Network connections
wasi:http — HTTP client and server
wasi:cli — Command-line programs

The key insight: capabilities are passed in, not assumed. A component can only access what you explicitly grant.

# Grant only specific capabilities
wasmtime run --dir=/data::/app/data --env=API_KEY my-component.wasm

Production-Ready Platforms

Fermyon Spin

Serverless framework built on Wasm. Write handlers in any language, deploy with sub-millisecond cold starts.

# spin.toml
[component.api]
source = "target/wasm32-wasi/release/api.wasm"
allowed_http_hosts = ["https://api.example.com"]

[component.api.trigger]
route = "/api/..."

spin build && spin deploy

wasmCloud

Distributed application platform. Components communicate via capability providers — swap implementations without changing code.

Built-in service mesh (NATS-based)
Declarative deployments
Hot-swappable components

Cosmonic

Managed wasmCloud. Think „Kubernetes for Wasm“ but simpler.

Fastly Compute

Edge computing at massive scale. Wasm components run in 50+ global PoPs.

Polyglot Done Right

The Component Model’s superpower: true language interoperability.

Write your hot path in Rust, business logic in Go, and glue code in Python — they all compile to Wasm and link together seamlessly.

// WIT interface definition
package myapp:core;

interface calculator {
    add: func(a: s32, b: s32) -> s32;
    multiply: func(a: s32, b: s32) -> s32;
}

world my-service {
    import wasi:http/outgoing-handler;
    export calculator;
}

Generate bindings for any language, implement, compile, compose.

When to Use Wasm Components

Great Fit

Serverless functions — Cold starts matter
Edge computing — Size and startup matter even more
Plugin systems — Safe third-party code execution
Multi-tenant platforms — Strong isolation, high density
Embedded systems — Constrained resources

Not Yet Ready For

Heavy GPU workloads — No standard GPU access (yet)
Long-running stateful services — Designed for request/response
Legacy apps — Requires recompilation, not lift-and-shift

The Ecosystem in 2026

The tooling has matured significantly:

cargo-component — Rust → Wasm components
componentize-py — Python → Wasm components
jco — JavaScript → Wasm components
wit-bindgen — Generate bindings for any language
wasm-tools — Compose, inspect, validate components

Runtimes are production-ready:

Wasmtime — Bytecode Alliance reference runtime (fastest)
Wasmer — Focus on ease of use and embedding
WasmEdge — Optimized for cloud-native and AI
wazero — Pure Go, zero CGO dependencies

Getting Started

Try Spin — Easiest path to a running Wasm service

spin new -t http-rust my-service
cd my-service && spin build && spin up

Learn WIT — Understand the interface definition language
Explore wasmCloud — For distributed systems
Start small — One function, not your whole platform

Containers won’t disappear — but for the next generation of serverless, edge, and embedded applications, WebAssembly Components offer something containers can’t: instant startup, minimal footprint, and true portability without compromise.

März 6, 2026März 6, 2026

Confidential Computing: Running AI Workloads on Untrusted Infrastructure

The Trust Problem in AI-as-a-Service

As organizations rush to adopt AI, a critical question emerges: How do you protect sensitive training data and inference requests when they run on infrastructure you don’t fully control?

Whether you’re a healthcare provider processing patient data, a financial institution analyzing transactions, or an enterprise with proprietary models — the moment your data hits the cloud, you’re trusting someone else’s security. Traditional encryption protects data at rest and in transit, but during processing? It’s decrypted and vulnerable.

Enter Confidential Computing — the ability to process encrypted data without ever exposing it, even to the infrastructure operator.

How Confidential Computing Works

At its core, Confidential Computing creates hardware-enforced Trusted Execution Environments (TEEs) — isolated enclaves where code and data are protected from everything outside, including the hypervisor, host OS, and even physical access to the machine.

The Key Technologies

Intel TDX (Trust Domain Extensions) — VM-level isolation with encrypted memory, hardware-attested trust
AMD SEV-SNP (Secure Encrypted Virtualization – Secure Nested Paging) — Memory encryption with integrity protection against replay attacks
ARM CCA (Confidential Compute Architecture) — Realms-based isolation for ARM processors
NVIDIA Confidential Computing — GPU TEEs for accelerated AI workloads

The magic: cryptographic attestation proves to you — remotely and verifiably — that your workload is running in a genuine TEE with the exact code you intended.

Why This Matters for AI

AI workloads are uniquely sensitive:

Asset	Risk Without Protection
Training Data	PII exposure, regulatory violations, competitive intelligence leak
Model Weights	IP theft, model extraction attacks
Inference Requests	User privacy violations, business data exposure
Inference Results	Sensitive predictions leaked to adversaries

Confidential Computing addresses all four — your data is encrypted in memory, your model is protected, and neither the cloud provider nor a compromised admin can see what’s happening inside the TEE.

Practical Implementation: Confidential Containers

The good news: you don’t need to rewrite your applications. Confidential Containers bring TEE protection to standard Kubernetes workloads.

The Stack

┌─────────────────────────────────────────┐
│           Your AI Application           │
├─────────────────────────────────────────┤
│         Confidential Container          │
│    (encrypted memory, attested boot)    │
├─────────────────────────────────────────┤
│     Kata Containers / Cloud Hypervisor  │
├─────────────────────────────────────────┤
│         AMD SEV-SNP / Intel TDX         │
├─────────────────────────────────────────┤
│          Cloud Infrastructure           │
│    (untrusted - can't see inside TEE)   │
└─────────────────────────────────────────┘

Key Projects

Confidential Containers (CoCo) — CNCF sandbox project, integrates with Kubernetes
Kata Containers — Lightweight VMs as container runtime, TEE-enabled
Gramine — Library OS for running unmodified applications in Intel SGX
Occlum — Memory-safe LibOS for Intel SGX

Cloud Provider Support

All major clouds now offer Confidential Computing:

Azure — Confidential VMs (DCasv5/ECasv5), Confidential AKS, AMD SEV-SNP & Intel TDX
GCP — Confidential VMs, Confidential GKE Nodes, Confidential Space
AWS — Nitro Enclaves (different model), upcoming SEV-SNP support

Azure Example: Confidential AKS

az aks create \
  --resource-group myRG \
  --name myConfidentialCluster \
  --node-vm-size Standard_DC4as_v5 \
  --enable-confidential-computing

Your pods now run in AMD SEV-SNP protected VMs — with memory encryption enforced by hardware.

Attestation: Trust But Verify

How do you know your workload is actually running in a TEE? Remote Attestation.

The TEE generates a cryptographic quote — signed by the hardware itself — proving:

The hardware is genuine (not emulated)
The TEE firmware is unmodified
Your specific code/container image is loaded
No tampering occurred during boot

You verify this quote against the hardware vendor’s root of trust before sending any sensitive data.

# Example: Verify attestation before inference
attestation_quote = get_tee_attestation()
if verify_quote(attestation_quote, expected_measurement):
    response = send_inference_request(encrypted_data)
else:
    raise SecurityError("Attestation failed - TEE compromised")

Performance Considerations

Confidential Computing isn’t free:

Memory encryption overhead: 2-8% for SEV-SNP, varies by workload
Attestation latency: Milliseconds per verification (cache results)
Memory limits: TEE-protected memory may have size constraints
GPU support: Still maturing — NVIDIA H100 supports Confidential Computing, but ecosystem tooling is catching up

For most AI inference workloads, the overhead is acceptable. Training large models in TEEs remains challenging due to memory constraints.

Use Cases in Regulated Industries

Healthcare

Train diagnostic AI on patient data from multiple hospitals — no hospital sees another’s data, the model improves for everyone.

Finance

Run fraud detection models on transaction data without exposing transaction details to the cloud provider.

Multi-Party AI

Multiple organizations contribute data to train a shared model — Confidential Computing ensures no party can access another’s raw data.

Getting Started

Identify sensitive workloads — Not everything needs TEE protection; focus on regulated data and proprietary models
Choose your cloud — Azure has the most mature Confidential AKS offering today
Start with inference — Confidential inference is easier than confidential training
Implement attestation — Don’t skip verification; it’s the foundation of trust
Monitor performance — Measure overhead in your specific workload

Confidential Computing shifts the trust model fundamentally: instead of trusting your cloud provider’s policies and people, you trust silicon and cryptography. For AI workloads handling sensitive data, that’s a game-changer.

März 4, 2026März 6, 2026

Internal Developer Portals: Backstage, Port.io, and the Path to Self-Service Platforms

Platform Engineering: The 2026 Megatrend

The days when developers had to write tickets and wait for days for infrastructure are over. Internal Developer Portals (IDPs) are the heart of modern Platform Engineering teams — enabling self-service while maintaining governance.

Comparing the Contenders

Backstage (Spotify)

The open-source heavyweight from Spotify has established itself as the de facto standard:

Software Catalog — Central overview of all services, APIs, and resources
Tech Docs — Documentation directly in the portal
Templates — Golden paths for new services
Plugins — Extensible through a large community

Strength: Flexibility and community. Weakness: High setup and maintenance effort.

Port.io

The SaaS alternative for teams that want to be productive quickly:

No-Code Builder — Portal without development effort
Self-Service Actions — Day-2 operations automated
Scorecards — Production readiness at a glance
RBAC — Enterprise-ready access control

Strength: Time-to-value. Weakness: Less flexibility than open source.

Cortex

The focus is on service ownership and reliability:

Service Scorecards — Enforce quality standards
Ownership — Clear responsibilities
Integrations — Deep connection to monitoring tools

Strength: Reliability engineering. Weakness: Less developer experience focus.

Software Catalogs: The Foundation

An IDP stands or falls with its catalog. The core questions:

What do we have? — Services, APIs, databases, infrastructure
Who owns it? — Service ownership must be clear
What depends on what? — Dependency mapping for impact analysis
How healthy is it? — Scorecards for quality standards

Production Readiness Scorecards

Instead of saying „you should really have that,“ scorecards make standards measurable:

Service: payment-api
━━━━━━━━━━━━━━━━━━━━
✅ Documentation    [100%]
✅ Monitoring       [100%]
⚠️  On-Call Rotation [ 80%]
❌ Disaster Recovery [ 20%]
━━━━━━━━━━━━━━━━━━━━
Overall: 75% - Bronze

Teams see at a glance where action is needed — without anyone pointing fingers.

Integration Is Everything

An IDP is only as good as its integrations:

CI/CD — GitHub Actions, GitLab CI, ArgoCD
Monitoring — Datadog, Prometheus, Grafana
IaC — Terraform, Crossplane, Pulumi
Ticketing — Jira, Linear, ServiceNow
Cloud — AWS, GCP, Azure native services

The Cultural Shift

The biggest challenge isn’t technical — it’s the shift from gatekeeping to enablement:

Old (Gatekeeping)	New (Enablement)
„Write a ticket“	„Use the portal“
„We’ll review it“	„Policies are automated“
„Takes 2 weeks“	„Ready in 5 minutes“
„Only we can do that“	„You can, we’ll help“

Getting Started

The pragmatic path to an IDP:

Start small — A software catalog alone is valuable
Pick your battles — Don’t automate everything at once
Measure adoption — Track portal usage
Iterate — Take developer feedback seriously

Platform Engineering isn’t a product you buy — it’s a capability you build. IDPs are the visible interface to that capability.

Februar 23, 2026Februar 23, 2026

Agentic AI in the SDLC: From Copilot to Autonomous DevOps

The Evolution Beyond AI-Assisted Development

We’ve all gotten comfortable with AI assistants in our IDEs. Copilot suggests code, ChatGPT explains errors, and various tools help us write tests. But there’s a fundamental shift happening: AI is moving from assistant to agent.

The difference? An assistant waits for your prompt. An agent takes initiative.

What Does „Agentic AI“ Mean for the SDLC?

Traditional AI in development is reactive. You ask a question, you get an answer. Agentic AI is different—it operates with goals, not just prompts:

Planning — Breaking complex tasks into actionable steps
Tool Use — Interacting with APIs, CLIs, and infrastructure directly
Reasoning — Making decisions based on context and constraints
Persistence — Maintaining state across multiple interactions
Self-Correction — Detecting and recovering from errors

Imagine telling an AI: „We need a new microservice for payment processing with PostgreSQL, deployed to our EU cluster, with proper security policies.“ An agentic system doesn’t just write the code—it provisions the database, creates the Kubernetes manifests, configures network policies, sets up monitoring, and opens a PR for review.

The Architecture of Agentic DevSecOps

Building autonomous AI into your SDLC requires more than just API keys. You need infrastructure designed for agent operations:

1. Agent-Native Infrastructure

AI agents need first-class platform support:

apiVersion: platform.example.io/v1
kind: AIAgent
metadata:
  name: infra-provisioner
spec:
  provider: anthropic
  model: claude-3
  mcpEndpoints:
    - kubectl
    - crossplane-claims
    - argocd
  rbacScope: namespace/dev-team
  rateLimits:
    requestsPerMinute: 30
    resourceClaims: 5

This isn’t hypothetical—it’s where platform engineering is heading. Agents as managed workloads with proper RBAC, quotas, and audit trails.

2. Multi-Layer Guardrails

Autonomous AI requires autonomous safety. A five-layer approach:

Input Validation — Schema enforcement, prompt injection detection
Action Scoping — Resource limits, allowed operations whitelist
Human Approval Gates — Critical actions require sign-off
Audit Logging — Every agent action traceable and reviewable
Rollback Capabilities — Automated recovery from failed operations

The goal: let agents move fast on routine tasks while maintaining human oversight where it matters.

3. GitOps-Native Agent Operations

Every agent action should be a Git commit. Database provisioned? That’s a Crossplane claim in a PR. Deployment scaled? That’s a manifest change with full history. This gives you:

Complete audit trail
Easy rollback (git revert)
Review workflows for sensitive changes
Drift detection (desired state vs. actual)

Real-World Agent Workflows

Here’s what becomes possible:

Scenario: Production Incident Response

Alert fires: „Payment service latency > 500ms“
Agent analyzes metrics, traces, and recent deployments
Identifies: database connection pool exhaustion
Creates PR: increase pool size + add connection timeout
Runs canary deployment to staging
Notifies on-call engineer for production approval
After approval: deploys to production, monitors recovery

Time from alert to fix: minutes, not hours.

Scenario: Developer Self-Service

Developer: „I need a PostgreSQL database for my new service, small size, EU region, with daily backups.“

Agent:

Creates Crossplane Database claim
Provisions via the appropriate cloud provider
Configures External Secrets for credentials
Adds Prometheus ServiceMonitor
Updates team’s resource inventory
Responds with connection details and docs link

No tickets. No waiting. Full compliance.

The Security Imperative

With great autonomy comes great responsibility. Agentic systems in your SDLC must be security-first by design:

Zero Trust — Agents authenticate for every action, no ambient authority
Least Privilege — Granular RBAC scoped to specific resources and operations
No Secrets in Prompts — Credentials via Vault/External Secrets, never in context
Network Isolation — Agent workloads in dedicated, policy-controlled namespaces
Immutable Audit — Every action logged to tamper-evident storage

Getting Started

You don’t need to build everything at once. A pragmatic path:

Start with observability — Let agents read metrics and logs (no write access)
Add diagnostic capabilities — Agents can analyze and recommend, humans execute
Enable scoped automation — Agents can act within strict guardrails (dev environments first)
Expand with trust — Gradually increase scope based on demonstrated reliability

The Future is Agentic

The SDLC has always been about automation—from compilers to CI/CD to GitOps. Agentic AI is the next layer: automating the decisions, not just the execution.

The organizations that figure this out first will ship faster, respond to incidents quicker, and let their engineers focus on the creative work that humans do best.

The question isn’t whether to adopt agentic AI in your SDLC. It’s how fast you can build the infrastructure to do it safely.

This is part of our exploration of AI-native platform engineering at it-stud.io. We’re building open-source tooling for agentic DevSecOps—follow along on GitHub.

Februar 21, 2026Februar 21, 2026

AI Observability: Why Your AI Agents Need OpenTelemetry

The Black Box Problem in AI Agents

When you deploy an AI agent in production, you’re essentially running a complex system that makes decisions, calls external APIs, processes data, and interacts with users—all in ways that can be difficult to understand after the fact. Traditional logging tells you that something happened, but not why or how long or at what cost.

For LLM-based systems, this opacity becomes a serious operational challenge:

Token costs can spiral without visibility into per-request usage
Latency issues hide in the pipeline between prompt and response
Tool calls (file reads, API requests, code execution) happen invisibly
Context window management affects quality but rarely surfaces in logs

The answer? Observability—specifically, distributed tracing designed for AI workloads.

OpenTelemetry: The Standard not only for AI Observability

OpenTelemetry (OTEL) has emerged as the industry standard for collecting telemetry data—traces, metrics, and logs—from distributed systems. What makes it particularly powerful for AI applications:

Traces Show the Full Picture

A single user message to an AI agent might trigger:

Webhook reception from Telegram/Slack
Session state lookup
Context assembly (system prompt + history + tools)
LLM API call to Anthropic/OpenAI
Tool execution (file read, web search, code run)
Response streaming back to user

With OTEL traces, each step becomes a span with timing, attributes, and relationships. You can see exactly where time is spent and where failures occur.

Metrics for Cost Control

OTEL metrics give you counters and histograms for:

tokens.input / tokens.output per request
cost.usd aggregated by model, channel, or user
run.duration_ms to track response latency
context.tokens to monitor context window usage

This transforms AI spend from „we used $X this month“ to „user Y’s workflow Z costs $0.12 per run.“

Practical Setup: OpenClaw + Jaeger

At it-stud.io, we tested OpenClaw as our AI agent framework – already supporting OTEL by default – and enabled full observability with a simple configuration change:

{
  "plugins": {
    "allow": ["diagnostics-otel"],
    "entries": {
      "diagnostics-otel": { "enabled": true }
    }
  },
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://localhost:4318",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "sampleRate": 1.0
    }
  }
}

For the backend, we chose Jaeger—a CNCF-graduated project that provides:

OTLP ingestion (HTTP on port 4318)
Trace storage and search
Clean web UI for exploration
Zero external dependencies (all-in-one binary)

What You See: Real Traces from AI Operations

Once enabled, every AI interaction generates rich telemetry:

openclaw.model.usage

Provider, model name, channel
Input/output/cache tokens
Cost in USD
Duration in milliseconds
Session and run identifiers

openclaw.message.processed

Message lifecycle from queue to response
Outcome (success/error/timeout)
Chat and user context

openclaw.webhook.processed

Inbound webhook handling per channel
Processing duration
Error tracking

From Tracing to AI Governance

Observability isn’t just about debugging—it’s the foundation for:

Cost Allocation

Attribute AI spend to specific projects, users, or workflows. Essential for enterprise deployments where multiple teams share infrastructure.

Compliance & Auditing

Traces provide an immutable record of what the AI did, when, and why. Critical for regulated industries and internal governance.

Performance Optimization

Identify slow tool calls, optimize prompt templates, right-size model selection based on actual latency requirements.

Capacity Planning

Metrics trends inform scaling decisions and budget forecasting.

Getting Started

If you’re running AI agents in production without observability, you’re flying blind. The good news: implementing OTEL is straightforward with modern frameworks.

Our recommended stack:

Instrumentation: Framework-native (OpenClaw, LangChain, etc.) or OpenLLMetry
Collection: OTEL Collector or direct OTLP export
Backend: Jaeger (simple), Grafana Tempo (scalable), or Langfuse (LLM-specific)

The investment is minimal; the visibility is transformative.

At it-stud.io, we help organizations build observable, governable AI systems. Interested in implementing AI observability for your team? Get in touch.