DevOps – it-stud.io

Mai 15, 2026Mai 18, 2026

Internal Developer Portals 2.0: How AI Copilots Inside Backstage and Port Are Transforming Developer Self-Service

Internal Developer Portals have spent the past three years earning their place in the platform engineering stack. Backstage — now a CNCF Graduated project — established the blueprint: a service catalog, software templates, TechDocs, and a plugin ecosystem exceeding 900 integrations. For many organizations, that was enough. But static catalogs have hit a ceiling. Developers still context-switch between Backstage, Slack, their IDE, and a dozen dashboards to scaffold a service, request infrastructure, or troubleshoot an incident. The portal that was supposed to unify developer experience became just another tab.

2025 and 2026 have introduced a different paradigm: AI copilots embedded directly inside IDPs. Not chatbots bolted onto the side, but intelligent agents that understand your service catalog, your golden paths, and your organizational policies — and let developers interact with infrastructure through natural language instead of form-driven UIs. This is Internal Developer Portals 2.0, and it changes the economics of platform engineering.

From Catalog-Centric to Action-Centric Portals

The first generation of IDPs was catalog-centric. You browsed a list of services, looked up ownership, maybe triggered a pre-built template. The developer experience was better than nothing, but it still required knowing where to click and which template to use. For a senior engineer who helped build the portal, that was fine. For a new hire on day three, it was another maze.

Action-centric IDPs flip the model. Instead of navigating a catalog hierarchy, a developer types:

"Deploy my payment-service to staging with the new database migration"

The AI copilot inside the portal understands the intent, resolves the service from the catalog, identifies the correct deployment pipeline, checks RBAC policies, and either executes or presents a confirmation step. The catalog is still there — it’s the knowledge backbone — but the interaction layer has fundamentally changed.

This isn’t speculative. Port has shipped an AI assistant that queries its internal software catalog and executes self-service actions through natural language. Cortex integrates LLM-driven recommendations directly into its scorecards. Humanitec has taken an API-first approach that makes AI orchestration a first-class integration pattern. Even Backstage itself is seeing community plugins that expose catalog data to AI agents via standardized protocols.

The Knowledge Graph Advantage

What makes IDP-embedded AI fundamentally different from a generic ChatGPT wrapper is context. An Internal Developer Portal already holds a rich knowledge graph:

Service dependencies: which services call which, what databases they use, what message queues connect them
Team ownership: who owns what, who’s on-call, escalation paths
Runbooks and documentation: operational playbooks indexed per service
Deployment history: what was deployed when, by whom, with what configuration
Scorecards: production readiness, security posture, cost allocation

When an AI copilot has access to this graph, its responses move from generic to surgical. Ask it "Why is checkout-service latency spiking?" and it can correlate recent deployments, check the dependency graph for upstream changes, pull relevant runbooks, and suggest specific remediation steps — all without the developer leaving the portal.

Compare this to ChatOps bots in Slack that operate with minimal context, or IDE-integrated copilots that understand your code but not your infrastructure. The IDP sits at the intersection of code, infrastructure, and organizational knowledge. That’s where AI adds the most leverage.

MCP Servers: The Bridge Between AI Agents and IDP APIs

The technical glue making this possible is increasingly the Model Context Protocol (MCP). Originally open-sourced by Anthropic in 2024 and now seeing broad adoption, MCP provides a standardized interface for AI agents to discover and invoke tools — including IDP APIs.

An MCP server wrapping your Backstage or Port API exposes capabilities like:

Querying the service catalog (list services, get owners, check dependencies)
Triggering software templates (scaffold a new microservice, provision a database)
Reading and updating scorecards
Executing self-service actions (deploy, rollback, scale)
Fetching TechDocs and runbooks for context

This decouples the AI layer from the IDP implementation. Your platform team maintains the MCP server as a thin adapter. The AI agent — whether it’s embedded in the portal UI, accessible via Slack, or running inside an IDE — connects through the same protocol. You get a single source of truth for what actions are available and what permissions govern them.

For teams already running Backstage, this is particularly powerful. The existing plugin ecosystem handles data aggregation; an MCP server adds an AI-native interaction layer on top without replacing the portal itself.

Scorecards Meet AI Analysis

Scorecards have been one of the quiet successes of the IDP movement. Tools like Backstage (via the Scorecards plugin), Port, and Cortex let platform teams define maturity criteria — production readiness, security compliance, documentation coverage, cost efficiency — and track every service against them.

AI transforms scorecards from passive dashboards into active recommendation engines:

Service maturity gaps: „Your order-service scores 62% on production readiness. Adding health check endpoints and configuring pod disruption budgets would bring it to 85%.“
Security posture: „Three services in the payments domain are running container images older than 90 days. Here are the specific CVEs affecting them.“
Cost optimization: „Based on CPU utilization patterns over the last 30 days, analytics-worker is over-provisioned by 3x. Recommended resource requests: 200m CPU, 256Mi memory.“

The shift is from „here’s your score“ to „here’s what to do about it.“ When combined with self-service actions, the AI can even generate the pull request to implement the recommendation — turning insight into action in a single interaction.

Dynamic Golden Paths: AI-Generated Templates

Golden paths — the blessed, paved roads for common developer tasks — have traditionally been static. Your platform team creates a service template, a database provisioning workflow, a CI/CD pipeline configuration. Developers pick from the menu.

AI-powered IDPs make golden paths dynamic. Instead of maintaining 15 slightly different service templates for different tech stacks and deployment targets, you maintain a smaller set of composable building blocks. The AI assembles them based on the developer’s intent:

"I need a new Go microservice with a PostgreSQL database, deployed to our EU region, with PII data handling compliance"

The copilot generates a tailored template that includes the correct Helm values for the EU cluster, enables encryption-at-rest annotations for PII compliance, configures the appropriate network policies, and sets up the CI/CD pipeline with the required security scanning stages. The golden path isn’t a fixed road anymore — it’s a GPS that calculates the route based on where you’re going.

This has real implications for template maintenance. Platform teams spend significant effort keeping templates current across Kubernetes versions, policy changes, and infrastructure updates. AI-generated templates that compose from maintained primitives reduce that burden substantially.

Incident Response: The Killer Use Case

If there’s a single scenario where IDP-embedded AI proves its ROI overnight, it’s incident response. Consider the typical flow today:

Alert fires in PagerDuty or Opsgenie
On-call engineer opens the monitoring dashboard
Checks recent deployments in the CI/CD tool
Looks up service ownership in the IDP
Searches for relevant runbooks
Correlates with dependency graph to identify blast radius
Begins remediation

Steps 2 through 6 are pure context gathering — and they happen under pressure at 3 AM. An AI agent inside the IDP can perform all of them in seconds:

Correlate the alert with the service catalog entry
Identify recent changes (deployments, config updates, dependency upgrades)
Pull the relevant runbook and highlight the most likely remediation steps
Map the blast radius through the dependency graph
Suggest or auto-execute a rollback if the confidence is high enough

The on-call engineer still makes the decision, but the mean time to context drops from 15 minutes to 15 seconds. For organizations running hundreds of microservices, that’s not a nice-to-have — it’s a competitive advantage.

Developer Experience Metrics: Measuring What Matters

AI-powered IDPs also change how we measure developer experience. The DORA metrics (deployment frequency, lead time for changes, change failure rate, mean time to recovery) and the SPACE framework (satisfaction, performance, activity, communication, efficiency) are becoming first-class citizens in IDP dashboards.

The AI layer adds predictive and diagnostic capabilities:

Trend analysis: „Deployment frequency for the checkout team has dropped 30% over the past sprint. The primary bottleneck appears to be flaky integration tests in the payment-gateway pipeline.“
Correlation: „Teams using the v3 service template have 40% lower change failure rates than those on v2. Consider migrating remaining v2 services.“
Forecasting: „Based on current velocity, the platform migration will complete in Q3 — two weeks later than planned. The blocker is database schema migrations for three legacy services.“

This is where platform engineering ROI becomes measurable. When you can demonstrate that AI-assisted self-service reduces time-to-production for new services from five days to four hours, the investment case writes itself.

Backstage’s Plugin Ecosystem vs. AI-Native Platforms

The market is splitting into two camps, and platform teams need to understand the tradeoffs:

Dimension	Backstage + AI Plugins	AI-Native Platforms (Port, Cortex)
Flexibility	900+ plugins, infinite customization	Fewer but deeper integrations
AI integration	Community-driven, via MCP/plugins	Built-in, first-class AI features
Maintenance burden	High (self-hosted, plugin compatibility)	Lower (SaaS, managed updates)
Data ownership	Full control (self-hosted)	Vendor-dependent
Time to value	Weeks to months	Days to weeks
Vendor lock-in	Low (CNCF, open source)	Moderate to high
Knowledge graph depth	As deep as you build it	Pre-built entity models

Neither approach is universally better. If your organization has strong platform engineering capacity and wants full control, Backstage with AI plugins and MCP servers gives you maximum flexibility. If you want faster time-to-value and your team is lean, an AI-native platform like Port gets you to production-grade IDP faster — at the cost of some flexibility and data sovereignty.

ChatOps vs. IDP-Embedded AI vs. IDE Copilots

It’s worth clarifying where IDP-embedded AI fits relative to other AI integration points:

ChatOps (Slack/Teams bots): Good for notifications and simple commands. Limited context about your infrastructure. Works well for quick queries but struggles with complex multi-step workflows.
IDE-integrated copilots (GitHub Copilot, Cursor): Excellent for code generation. No awareness of your deployment topology, service catalog, or organizational policies. Wrong tool for infrastructure tasks.
IDP-embedded AI: Sits at the intersection of organizational knowledge, infrastructure state, and developer workflows. Best for self-service actions, incident response, and cross-cutting concerns that span multiple services.

The ideal setup uses all three — but the IDP is the orchestration layer. Your Slack bot calls the IDP’s AI capabilities through MCP. Your IDE copilot references the service catalog for context. The IDP is the brain; everything else is an interface.

Multi-Tenancy, RBAC, and Governance

Here’s where many early AI-in-IDP implementations fall short: governance. When an AI agent can trigger deployments, modify infrastructure, or scaffold services, you need the same (or stricter) access controls as your existing self-service workflows.

Critical requirements:

RBAC for AI actions: The AI copilot should inherit the requesting user’s permissions, not operate with elevated privileges
Audit trails: Every AI-initiated action must be logged with the full context — who asked, what was requested, what was executed, what was the outcome
Approval gates: Destructive or high-risk actions (production deployments, database migrations, security policy changes) should require human approval, even when AI-initiated
Multi-tenancy: In organizations with multiple teams sharing an IDP, AI actions must respect tenant boundaries. Team A’s copilot cannot access Team B’s secrets or deploy to Team B’s namespaces
Rate limiting: Prevent AI agents from executing runaway loops of infrastructure changes

Without these controls, you’re trading developer friction for security risk — not a trade worth making.

The Risks: What Can Go Wrong

Let’s be direct about the failure modes:

Hallucinated infrastructure configurations: An AI that generates a Kubernetes manifest with incorrect resource limits, missing security contexts, or wrong network policies can cause outages. Every AI-generated configuration must pass through the same validation pipelines (OPA/Kyverno, CI checks) as human-authored configs.
Insufficient audit trails: If an AI agent makes a change and the audit log only shows „AI modified resource X,“ you’ve lost forensic capability. Log the full chain: user prompt → AI interpretation → action taken → result.
Shadow IT acceleration: If self-service becomes too easy, developers spin up resources without proper tagging, cost allocation, or lifecycle management. AI-powered IDPs need to enforce organizational policies at the point of creation, not after the fact.
Over-reliance on AI recommendations: Scorecard suggestions and incident response playbooks should augment human judgment, not replace it. Build a culture where AI recommendations are validated, not blindly accepted.

Getting Started: A Practical Roadmap

If you’re running Backstage today and want to add AI capabilities, here’s a pragmatic path:

Start with read-only: Build an MCP server that exposes your service catalog, scorecards, and documentation to an AI agent. Let developers query the catalog through natural language. Zero risk, immediate value.
Add scorecard analysis: Connect the AI to your scorecard data and let it generate improvement recommendations. Still read-only, but now actively useful.
Enable template generation: Allow the AI to compose software templates based on developer intent. Route the output through your existing PR review process.
Introduce action execution: Wire up deployment, scaling, and provisioning actions with approval gates. Start with non-production environments.
Extend to incident response: Connect alerting systems and let the AI perform context gathering and remediation suggestions during incidents.

Each step builds on the previous one, and each can be rolled back independently. The key is maintaining human oversight throughout — AI copilots augment your platform team, they don’t replace it.

The Bottom Line

Internal Developer Portals 2.0 aren’t about replacing Backstage or rebuilding your IDP from scratch. They’re about adding an intelligence layer that transforms the portal from a passive catalog into an active assistant. The service catalog becomes a knowledge graph. Templates become dynamic. Scorecards become recommendation engines. Incident response becomes proactive.

The organizations that get this right will see measurable improvements in developer productivity, onboarding speed, and operational resilience. The ones that don’t will keep maintaining static portals that developers tolerate rather than love.

The technology is ready. The protocols (MCP) are standardizing. The question isn’t whether AI belongs in your IDP — it’s how quickly you can integrate it without compromising the governance that makes your platform trustworthy.

April 30, 2026Mai 3, 2026

Small Language Models for Platform Engineering: Why 8B Parameters Beat API Dependencies

The economics of AI in platform engineering are shifting — fast. For the past two years, the default answer to „how do we add AI to our internal platform?“ has been „call an API.“ But with inference costs rising, data governance getting stricter, and a new generation of compact models matching much larger counterparts on critical benchmarks, that default is worth questioning. Small Language Models (SLMs) — particularly in the 7B–9B parameter range — have reached a threshold where they can handle the majority of platform engineering workloads without ever leaving your network.

The Benchmark Reality Check: 8B Is Not a Compromise

IBM’s Granite 4.1 8B, released in April 2026 under Apache 2.0, is a useful anchor for this conversation. On enterprise coding benchmarks, the 8B model matches IBM’s own 32B Mixture-of-Experts (MoE) variant. On HumanEval pass@1, the 8B scores 87.2% compared to 89.6% for the 30B model — a gap of less than 3 percentage points that is largely irrelevant for the deterministic, constrained tasks that platform teams actually run.

This pattern holds across the SLM landscape:

Phi-4 (14B) — Microsoft’s model excels at reasoning-heavy tasks, punching well above its weight on MATH and GPQA
Qwen-3 (8B) — Strong multilingual coding support, excellent for polyglot infrastructure codebases
Llama-3.3 (8B) — Meta’s workhorse, widely supported across inference frameworks
Mistral-Small (22B) — A good middle ground when you need more capacity without the frontier price tag

The takeaway: if you are still reaching for GPT-4 or Claude Sonnet to answer „why is this Helm chart failing?“ you are likely overspending.

Dense Non-Thinking Architecture: Why It Matters for Operations

Granite 4.1 uses what IBM calls a Dense Non-Thinking Architecture. In practice, this means the model does not execute an internal chain-of-thought (CoT) reasoning step before responding. For frontier models solving novel math problems, CoT is valuable. For a platform engineer asking „summarize this PagerDuty alert and suggest the top three actions,“ CoT overhead is pure latency and token cost with zero benefit.

Platform tasks are largely pattern-matching with context, not novel reasoning. Alert triage, PR description generation, runbook execution, code review comments — these are well-defined, repetitive, structured tasks where a fast, confident response beats a slow, deeply deliberative one. Dense models optimized for inference speed are a natural fit.

The FinOps Case: What Self-Hosting an 8B Model Actually Costs

Let’s put numbers on this. A mid-tier platform team might generate 50,000 LLM calls per month for internal tooling: PR review summaries, alert enrichment, documentation queries, CI/CD pipeline diagnostics.

At $0.002 per 1K tokens (input + output average), 50,000 calls at ~500 tokens each = $50/month in API costs. Manageable — until agents arrive.

Agentic workflows are not single API calls. A single „investigate this alert“ agent might issue 15–25 tool calls, each with full context. That same 50,000-event scenario becomes 750,000–1,250,000 LLM calls. At $0.002/1K tokens, that is now $1,500–$2,500/month — and growing linearly with adoption.

Self-hosting an 8B model on a single RTX 4090 (~$1,800 hardware) or a Mac Studio M4 Max (~$2,000) delivers:

~30–50 tokens/second throughput (sufficient for internal tooling)
Zero marginal cost per call after hardware amortization
Full data residency — no tokens leave your network
Instant availability without rate limits or provider outages

At an agentic scale, the hardware pays for itself within 1–2 months. Beyond that, it is pure savings.

Platform Engineering Use Cases Where SLMs Shine

1. Alert Triage and Runbook Execution

The HolmesGPT pattern (CNCF Sandbox) demonstrates the right approach: give an SLM access to kubectl, PromQL, and Loki, and a structured Markdown runbook. With a well-crafted runbook, tool calls per investigation drop from 16+ to 2–4. An 8B model running locally handles this at millisecond latency with no data leaving the cluster.

2. CI/CD Pipeline Assistance

PR description generation, test coverage summaries, changelog drafting — these are low-complexity, high-volume tasks. An SLM integrated directly into your CI/CD pipeline (via Ollama’s REST API or a vLLM endpoint) can run as a pipeline step without any external dependency. No API key rotation. No rate limiting during a big release crunch.

3. Code Review Comments

Automated first-pass code review — style enforcement, security pattern flagging, documentation gaps — is exactly the kind of task where an 8B model is sufficient. The model does not need to understand your entire business domain; it needs to apply consistent rules to code diffs. Fine-tuning on your internal codebase further improves relevance.

4. Documentation and Runbook Generation

Keeping runbooks current is a perennial platform team pain point. An SLM that can read infrastructure-as-code, observe recent incident patterns, and generate or update Markdown documentation solves a real operational problem — without requiring a cloud API call for every update.

Enterprise Trust: Granite’s Compliance Credentials

IBM Granite 4.1 ships with two features that matter disproportionately in regulated industries: Guardian Models and cryptographic signing.

Guardian Models are companion classifiers that can check model inputs and outputs for compliance — harmful content, PII exposure, prompt injection attempts. This is built into the model ecosystem, not bolted on afterward. For financial services or healthcare platform teams, this is a significant differentiator versus a generic open-source model.

The cryptographic signing (with ISO certification) means you can verify model provenance. In an era where supply chain security is central to platform governance (see SLSA, Sigstore, in-toto), being able to verify that the model running in your cluster is exactly the model IBM published is not a minor detail.

The Multi-Model Strategy: SLM + Cloud for 80/20 Coverage

The most practical approach is not „replace all cloud APIs with SLMs“ — it is to route intelligently:

~80% of tasks → Local SLM: Alert triage, CI/CD assistance, doc generation, code review, runbook execution, structured queries against internal data
~20% of tasks → Cloud frontier model: Novel architecture decisions, complex multi-step reasoning, tasks requiring broad world knowledge not captured in your fine-tuned model

This mirrors how mature platform teams already think about compute: use the right tool at the right cost tier. An internal platform that routes requests based on complexity signals (task type, token budget, confidence threshold) gives you both cost efficiency and capability headroom.

Getting Started: Self-Hosting in the Platform Engineering Stack

The barrier to running an 8B model is lower than most teams expect:

Ollama — Single-command model serving, REST API, model library with one-line pulls (ollama pull granite3.3:8b)
LM Studio — Desktop GUI for evaluation, good for initial benchmarking before committing to infrastructure
vLLM — Production-grade serving with OpenAI-compatible API, batching, and quantization support; the right choice for Kubernetes-native deployments

For Kubernetes, vLLM running as a Deployment with a GPU node selector and an HPA on request queue depth is a reasonable production starting point. Pair it with an OpenAI-compatible API shim and your existing LLM-integrated tooling requires zero code changes to switch endpoints.

The Connection to Agentic Infrastructure

The Agentic Compute Cliff is real: GitHub Copilot paused new signups in April 2026 due to capacity constraints, and multiple cloud providers are experiencing GPU shortages. As agentic workloads scale — where a single developer workflow might trigger hundreds of LLM calls per hour — dependency on cloud inference is a reliability and cost risk.

SLMs running on internal infrastructure are not just a cost play. They are a resilience play. Your internal platform keeps working when the cloud provider has an outage. Your agents are not rate-limited during a major incident response. Your data never transits a network boundary you do not control.

When 8B Is Not Enough

Intellectual honesty matters here. SLMs are not the answer for everything:

Novel architecture decisions requiring broad reasoning across domains
Complex multi-step debugging across large, unfamiliar codebases
Tasks requiring deep world knowledge beyond your training/fine-tuning window
High-stakes customer-facing generation where quality variance is unacceptable

The skill is in classification — building a platform that knows when to route locally and when to escalate to a frontier model. That routing logic, often just a simple task classifier, is itself a good candidate to run on a local SLM.

Conclusion: Make the Economics Argument

The conversation about SLMs in platform engineering is no longer theoretical. The benchmarks have arrived. The tooling (Ollama, vLLM, LM Studio) is mature. The hardware cost is justified within months at agentic scale. And the privacy and compliance benefits — data residency, Guardian Models, cryptographic provenance — increasingly matter as organizations bring AI deeper into their software delivery lifecycle.

The 8B parameter class is not a compromise. It is a deliberate choice that aligns cost, performance, privacy, and operational simplicity for the tasks that platform teams actually run. Start with one use case — alert triage is a natural first target — measure the results, and expand from there. The API dependency you are paying for today may be entirely optional.

April 1, 2026April 9, 2026

Code Knowledge Graphs: Semantic Search for AI Coding Agents

AI coding tools have revolutionized software development, but there’s a fundamental limitation hiding in plain sight: most AI agents don’t actually understand your codebase—they just search it. When you ask Claude Code, Cursor, or GitHub Copilot to refactor a function, they retrieve relevant file chunks using embedding similarity. But code isn’t a collection of independent text fragments. It’s a graph of interconnected symbols, call hierarchies, and dependencies.

A new generation of tools is changing this paradigm. By parsing repositories into knowledge graphs and exposing them via MCP (Model Context Protocol), projects like Codebase-Memory, CodeGraph, and Lattice give AI agents structural awareness—enabling call-graph traversal, impact analysis, and semantic queries with sub-millisecond latency.

The RAG Problem: Why File-Based Retrieval Falls Short

Traditional RAG (Retrieval-Augmented Generation) pipelines treat codebases as document collections. They chunk files, generate embeddings, and retrieve the most similar fragments when an agent needs context. This approach has critical limitations for code:

Scattered evidence: Function definitions get split across chunks, separating signatures from implementations and losing import context.
Semantic blindness: Vector similarity doesn’t understand call relationships. A function and its callers may embed to distant vectors despite being tightly coupled.
Context window pressure: Complex queries requiring multi-file context quickly exhaust token budgets, forcing truncation of relevant code.
No impact awareness: When modifying a function, RAG can’t tell you which downstream components will break.

The result? AI agents that confidently generate code changes without understanding the ripple effects through your architecture.

Enter Code Knowledge Graphs

Knowledge graphs offer a fundamentally different approach: instead of treating code as text to embed, they parse it into structured relationships. Every function, class, import, and call site becomes a node in a traversable graph. This enables queries that RAG simply cannot answer:

„What functions call processPayment()?“ — Direct graph traversal, not similarity search.
„Show me the impact radius if I change the User interface.“ — Transitive dependency analysis.
„Find all implementations of the Repository pattern.“ — Semantic pattern matching across the codebase.

The key enabler is Tree-Sitter, a parsing library that generates abstract syntax trees (ASTs) for 66+ programming languages. By walking these ASTs, tools can extract symbols, relationships, and structural information without language-specific parsers.

Codebase-Memory: The MCP-Native Approach

Codebase-Memory has emerged as a leading implementation, garnering 900+ GitHub stars since its February 2026 release. It parses repositories with Tree-Sitter and stores the resulting knowledge graph in SQLite, then exposes 14 MCP query tools for AI agents:

Tool	Purpose
`get_symbol`	Retrieve a symbol’s definition, docstring, and location
`get_callers`	Find all functions that call a given symbol
`get_callees`	List all functions called by a symbol
`get_impact_radius`	Transitive analysis of what breaks if a symbol changes
`semantic_search`	Natural language queries over the graph
`get_module_structure`	Hierarchical view of a module’s exports

The performance gains are substantial. Codebase-Memory reports 10x lower token costs compared to file-based retrieval—agents get precisely the context they need without padding prompts with irrelevant code. Query latency runs in sub-milliseconds, even on large repositories.

CodeGraph and token-codegraph: Multi-Language Support

CodeGraph, originally a TypeScript project by Colby McHenry, pioneered the concept of exposing code structure via MCP. Its Rust port, token-codegraph, extends support to Rust, Go, Java, and Scala. Key features include:

libsql storage with FTS5 full-text search for hybrid queries
Incremental syncing for fast re-indexing on file changes
JSON-RPC over stdio for seamless MCP integration
Zero external dependencies—runs entirely locally

The local-first architecture matters for enterprise adoption. Unlike cloud-based code intelligence (Sourcegraph, GitHub Code Search), these tools keep your proprietary code on-premises while still enabling AI-powered navigation.

Lattice: Beyond Syntax to Intent

Lattice takes a different approach by connecting code to its reasoning. Its knowledge graph spans four dimensions:

Research: Background investigation, technical spikes, competitor analysis
Strategy: Architecture decisions, trade-off evaluations, design rationale
Requirements: User stories, acceptance criteria, constraints
Implementation: The actual code and its structural relationships

This enables queries that pure code graphs can’t answer: „Why did we choose PostgreSQL over MongoDB for this service?“ or „What requirements drove the decision to make this component async?“

For AI agents, this context is invaluable. When tasked with extending a feature, they can trace back to the original requirements and strategic decisions rather than guessing from code patterns alone.

Integration Patterns for DevOps Teams

Adopting code knowledge graphs requires integrating them into your existing AI coding workflows:

1. CI/CD Graph Updates

Run graph indexing as part of your pipeline. On each merge to main:

- name: Update Code Knowledge Graph
  run: |
    codebase-memory index --repo . --output graph.db
    codebase-memory serve --port 3001 &

This ensures AI agents always query against the latest codebase structure.

2. MCP Server Configuration

Configure your AI coding tool to connect to the graph server. For Claude Code:

{
  "mcpServers": {
    "codebase": {
      "command": "codebase-memory",
      "args": ["serve", "--db", "./graph.db"]
    }
  }
}

3. Impact Analysis in PR Reviews

Use graph queries to automatically flag high-impact changes:

changed_functions=$(git diff --name-only | xargs codebase-memory changed-symbols)
for fn in $changed_functions; do
  impact=$(codebase-memory get-impact-radius "$fn" --depth 3)
  echo "## Impact Analysis: $fn" >> pr-comment.md
  echo "$impact" >> pr-comment.md
done

Benchmarks: Knowledge Graphs vs. RAG

Recent research validates the knowledge graph approach. On SWE-bench Verified—a benchmark where AI agents resolve real GitHub issues—systems using repository-level graphs significantly outperform pure RAG approaches:

Approach	SWE-bench Score	Token Efficiency
RAG-only retrieval	~45%	Baseline
RepoGraph + RAG hybrid	~62%	3x improvement
Full knowledge graph	~68%	10x improvement

The token efficiency gains compound over time. Agents make fewer exploratory queries when they can directly traverse the call graph, reducing both latency and API costs.

The Future: Hybrid Structural-Semantic Retrieval

The next evolution combines structural graph queries with semantic embeddings. Rather than choosing between „find callers of X“ (structural) and „find code similar to X“ (semantic), hybrid systems enable queries like:

„Find functions that call the payment API and handle similar error patterns to our retry logic.“

This bridges the gap between precise structural navigation and fuzzy semantic understanding—giving AI agents both the map and the intuition to navigate complex codebases.

Conclusion

Code knowledge graphs represent a fundamental shift in how AI agents understand software. By treating repositories as queryable graphs rather than searchable text, tools like Codebase-Memory, CodeGraph, and Lattice unlock capabilities that RAG-based retrieval simply cannot match: call-graph traversal, impact analysis, and sub-millisecond structural queries.

For platform engineering teams, the adoption path is clear: index your repositories, expose the graph via MCP, and integrate impact analysis into your PR workflows. The payoff—10x token efficiency and dramatically more accurate AI assistance—makes this infrastructure investment worthwhile for any team serious about AI-augmented development.

The tools are open source and ready to deploy. The question isn’t whether to adopt code knowledge graphs, but how quickly you can integrate them into your AI coding pipeline.

März 28, 2026April 1, 2026

The Platform Scorecard: Measuring IDP Value Beyond DORA Metrics

Introduction

You’ve built an Internal Developer Platform. Golden paths are paved, self-service portals are live, and developers can spin up environments in minutes instead of days. But when leadership asks „what’s the ROI?“, you find yourself scrambling for numbers that don’t quite capture the value you’ve created.

DORA metrics—deployment frequency, lead time, change failure rate, mean time to recovery—have become the default answer. But in 2026, they’re increasingly insufficient. AI-assisted development can inflate deployment frequency while masking review bottlenecks. Lead time improvements might come at the cost of technical debt. And none of these metrics capture what platform teams actually deliver: developer productivity and organizational capability.

This article introduces the Platform Scorecard—a framework for measuring IDP value that combines traditional delivery metrics with developer experience indicators, adoption signals, and business impact measures. It’s designed for platform teams who need to justify investment, prioritize roadmaps, and demonstrate value beyond „we deployed more stuff.“

Why DORA Metrics Fall Short

DORA metrics revolutionized how we think about software delivery performance. The research is solid, the correlations are real, and every platform team should track them. But they were designed to measure delivery capability, not platform value.

The AI Inflation Problem

With AI coding assistants generating more code faster, deployment frequency naturally increases. But this doesn’t mean developers are more productive—it might mean they’re spending more time reviewing AI-generated PRs, debugging subtle issues, or managing technical debt that accumulates faster than before.

A platform team that enables 10x more deployments hasn’t necessarily delivered 10x more value. They might have just enabled 10x more churn.

The Attribution Problem

When lead time improves, who gets credit? The platform team who built the CI/CD pipelines? The SRE team who optimized the deployment process? The developers who adopted better practices? The AI tools that generate boilerplate faster?

DORA metrics measure outcomes at the organizational level. Platform teams need metrics that measure their specific contribution to those outcomes.

The Experience Gap

A platform can have excellent DORA metrics while developers hate using it. Friction might be hidden in workarounds, shadow IT, or teams simply avoiding the platform altogether. DORA doesn’t capture whether developers want to use your platform—only whether code eventually ships.

The Platform Scorecard Framework

The Platform Scorecard measures platform value across four dimensions:

┌─────────────────────────────────────────────────────────────┐
│                   PLATFORM SCORECARD                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
│  │   MONK      │  │   DX Core   │  │  Adoption   │        │
│  │ Indicators  │  │     4       │  │   Metrics   │        │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘        │
│         │                │                │                │
│         └────────────────┼────────────────┘                │
│                          ▼                                 │
│                 ┌─────────────┐                            │
│                 │  Business   │                            │
│                 │   Impact    │                            │
│                 └─────────────┘                            │
└─────────────────────────────────────────────────────────────┘

MONK Indicators: Platform-specific capability metrics
DX Core 4: Developer experience measurements
Adoption Metrics: Platform usage and engagement signals
Business Impact: Translation to organizational value

MONK Indicators: Measuring Platform Capability

MONK stands for four platform-specific indicators that measure what your IDP actually enables:

M — Mean Time to Productivity

How long does it take a new developer to ship their first meaningful change?

This isn’t just „time to first commit“—it’s time to first production deployment that delivers user value. It captures the entire onboarding experience: environment setup, access provisioning, documentation quality, and golden path effectiveness.

Level	MTTP	What It Indicates
Elite	< 1 day	Fully automated onboarding, excellent docs
High	1-3 days	Good automation, minor manual steps
Medium	1-2 weeks	Significant manual setup, tribal knowledge
Low	> 2 weeks	Broken onboarding, high friction

How to measure: Track the timestamp of a developer’s first day against their first production deployment. Survey new hires about blockers. Instrument your onboarding automation to identify where time is spent.

O — Observability Coverage

What percentage of services have adequate observability?

„Adequate“ means: structured logging, distributed tracing, key metrics dashboards, and alerting. If developers can’t debug their services without SSH-ing into production, your platform isn’t delivering on its observability promise.

Level	Coverage	What It Indicates
Elite	> 95%	Observability is default, opt-out not opt-in
High	80-95%	Most services instrumented, some gaps
Medium	50-80%	Inconsistent adoption, manual setup
Low	< 50%	Observability is an afterthought

How to measure: Scan your service catalog for observability signals. Check for active traces, log streams, and dashboard usage. Automate detection of services without adequate instrumentation.

N — Number of Services on Golden Paths

How many services use your platform’s recommended patterns?

Golden paths only deliver value if teams actually walk them. This metric tracks adoption of your templates, scaffolding, and recommended architectures versus custom or legacy approaches.

Level	Adoption	What It Indicates
Elite	> 80%	Golden paths are genuinely useful
High	60-80%	Good adoption, some justified exceptions
Medium	30-60%	Mixed adoption, paths may need improvement
Low	< 30%	Teams prefer alternatives, paths aren’t valuable

How to measure: Tag services by creation method (template vs. custom). Track which CI/CD patterns are in use. Survey teams about why they didn’t use golden paths.

K — Knowledge Accessibility

Can developers find answers without asking humans?

This measures documentation quality, search effectiveness, and self-service capability. Every question that requires Slack escalation is a failure of your platform’s knowledge layer.

Level	Self-Service Rate	What It Indicates
Elite	> 90%	Excellent docs, effective search, AI-assisted
High	70-90%	Good docs, some gaps in edge cases
Medium	50-70%	Inconsistent docs, frequent escalations
Low	< 50%	Tribal knowledge dominates

How to measure: Track support ticket volume per developer. Survey developers about where they find answers. Analyze search query success rates in your portal.

DX Core 4: Measuring Developer Experience

The DX Core 4 framework, developed by DX (formerly GetDX), measures developer experience through four key dimensions:

Speed

How fast can developers complete common tasks?

Time to create a new service
Time to add a new dependency
Time to deploy a change
Time to rollback a bad deployment
CI/CD pipeline duration

Effectiveness

Can developers accomplish what they’re trying to do?

Task completion rate for common workflows
Error rates in self-service operations
Percentage of tasks requiring manual intervention
First-try success rate for deployments

Quality

Does the platform help developers build better software?

Security vulnerability detection rate
Policy compliance scores
Test coverage trends
Production incident rates by platform-generated vs. custom services

Impact

Do developers feel they’re making meaningful contributions?

Percentage of time on feature work vs. toil
Developer satisfaction scores (quarterly surveys)
Net Promoter Score for the platform
Voluntary platform adoption rate

Adoption Metrics: Measuring Platform Usage

Adoption metrics tell you whether developers are actually using your platform—and how deeply.

Breadth Metrics

Active users: Monthly active developers using the platform
Team coverage: Percentage of teams with at least one active user
Service coverage: Percentage of production services managed by the platform

Depth Metrics

Feature adoption: Which platform capabilities are actually used?
Engagement frequency: How often do developers interact with the platform?
Workflow completion: Do users complete multi-step workflows or drop off?

Retention Metrics

Churn rate: Teams that stop using the platform
Return rate: Users who come back after initial use
Expansion: Teams adopting additional platform features

Shadow IT Indicators

Workaround detection: Teams building alternatives to platform features
Escape hatch usage: How often do teams need to bypass the platform?
Manual process survival: Legacy processes that should be automated

Business Impact: Translating to Value

Ultimately, platform investment needs to translate to business outcomes. The Platform Scorecard connects capability metrics to value through:

Cost Metrics

Infrastructure cost per service: Does the platform optimize resource usage?
Time savings: Developer hours saved by automation (valued at loaded cost)
Incident cost reduction: MTTR improvements × average incident cost
Onboarding cost: MTTP improvement × new hire cost per day

Risk Metrics

Security posture: Vulnerability exposure window, compliance violations
Operational risk: Single points of failure, bus factor for critical systems
Regulatory risk: Audit findings, compliance gaps

Capability Metrics

Time to market: How fast can the organization ship new products?
Experimentation velocity: A/B tests launched, feature flags toggled
Scale readiness: Can the organization 10x without 10x headcount?

Implementing the Platform Scorecard

Start Simple

Don’t try to measure everything at once. Pick one metric from each category:

MONK: Mean Time to Productivity (easiest to measure)
DX Core 4: Developer satisfaction survey (quarterly)
Adoption: Monthly active users
Business Impact: Developer hours saved

Automate Collection

Manual metrics decay quickly. Invest in:

Event tracking in your developer portal
CI/CD pipeline instrumentation
Automated surveys triggered by workflow completion
Service catalog scanning for compliance

Review Cadence

Weekly: Adoption metrics (leading indicators)
Monthly: MONK indicators, DX speed/effectiveness
Quarterly: Full scorecard review, business impact calculation

Benchmark and Trend

Absolute numbers matter less than trends. A 70% golden path adoption rate might be excellent for your organization or terrible—context determines meaning. Track improvement over time and benchmark against similar organizations when possible.

Presenting to Leadership

When presenting Platform Scorecard results to leadership, focus on:

Business impact first: Lead with cost savings and risk reduction
Trends over absolutes: Show improvement trajectories
Developer voice: Include satisfaction quotes and NPS
Comparative context: Industry benchmarks where available
Investment connection: Link metrics to roadmap priorities

Conclusion

DORA metrics remain valuable, but they’re not enough to measure platform value. The Platform Scorecard provides a comprehensive framework that captures what platform teams actually deliver: developer capability, experience improvement, and organizational value.

The key insight is that platforms are products, and products need product metrics. Deployment frequency tells you code is shipping. The Platform Scorecard tells you whether developers are thriving, the organization is more capable, and your investment is paying off.

Start measuring what matters. Your platform’s value is real—now you can prove it.

März 22, 2026März 22, 2026

The Great Migration: From Kubernetes Ingress to Gateway API

Introduction

After years as the de facto standard for HTTP routing in Kubernetes, Ingress is being retired. The Ingress-NGINX project announced in March 2026 that it’s entering maintenance mode, and the Kubernetes community has thrown its weight behind the Gateway API as the future of traffic management.

This isn’t just a rename. Gateway API represents a fundamental rethinking of how Kubernetes handles ingress traffic—more expressive, more secure, and designed for the multi-team, multi-tenant reality of modern platform engineering. But migration isn’t trivial: years of accumulated annotations, controller-specific configurations, and tribal knowledge need to be carefully translated.

This article covers why the migration is happening, how Gateway API differs architecturally, and provides a practical migration workflow using the new Ingress2Gateway tool that reached 1.0 in March 2026.

Why Ingress Is Being Retired

Ingress served Kubernetes well for nearly a decade, but its limitations have become increasingly painful:

The Annotation Problem

Ingress’s core specification is minimal—it handles basic host and path routing. Everything else—rate limiting, authentication, header manipulation, timeouts, body size limits—lives in annotations. And annotations are controller-specific.

# NGINX-specific annotations
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/auth-url: "https://auth.example.com/verify"
    # ... dozens more

Switch from NGINX to Traefik? Rewrite all your annotations. Want to use multiple ingress controllers? Good luck keeping the annotation schemas straight. This has led to:

Vendor lock-in: Teams hesitate to switch controllers because migration costs are high
Configuration sprawl: Critical routing logic is buried in annotations that are hard to audit
No validation: Annotations are strings—typos cause runtime failures, not deployment rejections

The RBAC Gap

Ingress is a single resource type. If you can edit an Ingress, you can edit any Ingress in that namespace. There’s no built-in way to separate „who can define routes“ from „who can configure TLS“ from „who can set up authentication policies.“

In multi-team environments, this forces platform teams to either:

Give app teams too much power (security risk)
Centralize all Ingress management (bottleneck)
Build custom admission controllers (complexity)

Limited Expressiveness

Modern traffic management needs capabilities that Ingress simply doesn’t support natively:

Traffic splitting for canary deployments
Header-based routing
Request/response transformation
Cross-namespace routing
TCP/UDP routing (not just HTTP)

Enter Gateway API

Gateway API is designed from the ground up to address these limitations. It’s not just „Ingress v2″—it’s a complete reimagining of how Kubernetes handles traffic.

Resource Model

Instead of cramming everything into one resource, Gateway API separates concerns:

┌─────────────────────────────────────────────────────────────┐
│                    GATEWAY API MODEL                        │
│                                                             │
│   ┌─────────────────┐                                       │
│   │  GatewayClass   │  ← Infrastructure provider config    │
│   │  (cluster-wide) │    (managed by platform team)        │
│   └────────┬────────┘                                       │
│            │                                                │
│   ┌────────▼────────┐                                       │
│   │     Gateway     │  ← Deployment of load balancer       │
│   │   (namespace)   │    (managed by platform team)        │
│   └────────┬────────┘                                       │
│            │                                                │
│   ┌────────▼────────┐                                       │
│   │   HTTPRoute     │  ← Routing rules                     │
│   │   (namespace)   │    (managed by app teams)            │
│   └─────────────────┘                                       │
└─────────────────────────────────────────────────────────────┘

GatewayClass: Defines the controller implementation (like IngressClass, but richer)
Gateway: Represents an actual load balancer deployment with listeners
HTTPRoute: Defines routing rules that attach to Gateways
Plus: TCPRoute, UDPRoute, GRPCRoute, TLSRoute for non-HTTP traffic

RBAC-Native Design

Each resource type has separate RBAC controls:

# Platform team: can manage GatewayClass and Gateway
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gateway-admin
rules:
  - apiGroups: ["gateway.networking.k8s.io"]
    resources: ["gatewayclasses", "gateways"]
    verbs: ["*"]

---
# App team: can only manage HTTPRoutes in their namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: route-admin
  namespace: team-alpha
rules:
  - apiGroups: ["gateway.networking.k8s.io"]
    resources: ["httproutes"]
    verbs: ["*"]

App teams can define their routing rules without touching infrastructure configuration. Platform teams control the Gateway without micromanaging every route.

Typed Configuration

No more annotation strings. Gateway API uses structured, validated fields:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app
  namespace: production
spec:
  parentRefs:
    - name: production-gateway
  hostnames:
    - "app.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-service
          port: 8080
          weight: 90
        - name: api-service-canary
          port: 8080
          weight: 10
      timeouts:
        request: 30s
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: X-Request-ID
                value: "${request_id}"

Traffic splitting, timeouts, header modification—all first-class, validated fields. No more hoping you spelled the annotation correctly.

Ingress2Gateway: The Migration Tool

The Kubernetes SIG-Network team released Ingress2Gateway 1.0 in March 2026, providing automated translation of Ingress resources to Gateway API equivalents.

Installation

# Install via Go
go install github.com/kubernetes-sigs/ingress2gateway@latest

# Or download binary
curl -LO https://github.com/kubernetes-sigs/ingress2gateway/releases/latest/download/ingress2gateway-linux-amd64
chmod +x ingress2gateway-linux-amd64
sudo mv ingress2gateway-linux-amd64 /usr/local/bin/ingress2gateway

Basic Usage

# Convert a single Ingress
ingress2gateway print --input-file ingress.yaml

# Convert all Ingresses in a namespace
kubectl get ingress -n production -o yaml | ingress2gateway print

# Convert and apply directly
kubectl get ingress -n production -o yaml | ingress2gateway print | kubectl apply -f -

What Gets Translated

Ingress2Gateway handles:

Host and path rules: Direct translation to HTTPRoute
TLS configuration: Mapped to Gateway listeners
Backend services: Converted to backendRefs
Common annotations: Timeout, body size, redirects → native fields

What Requires Manual Work

Not everything translates automatically:

Controller-specific annotations: Authentication plugins, custom Lua scripts, rate limiting configurations often need manual migration
Complex rewrites: Regex-based path rewrites may need adjustment
Custom error pages: Implementation varies by Gateway controller

Ingress2Gateway generates warnings for annotations it can’t translate, giving you a checklist for manual review.

Migration Workflow

Phase 1: Assessment

# Inventory all Ingresses
kubectl get ingress -A -o yaml > all-ingresses.yaml

# Run Ingress2Gateway in analysis mode
ingress2gateway print --input-file all-ingresses.yaml 2>&1 | tee migration-report.txt

# Review warnings for untranslatable annotations
grep "WARNING" migration-report.txt

Phase 2: Parallel Deployment

Don’t cut over immediately. Run both Ingress and Gateway API in parallel:

# Deploy Gateway controller (e.g., Envoy Gateway, Cilium, NGINX Gateway Fabric)
helm install envoy-gateway oci://docker.io/envoyproxy/gateway-helm   --version v1.0.0   -n envoy-gateway-system --create-namespace

# Create GatewayClass
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller

# Create Gateway (gets its own IP/hostname)
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: production
  namespace: gateway-system
spec:
  gatewayClassName: envoy
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-cert

Phase 3: Traffic Shift

With both systems running, gradually shift traffic:

Update DNS to point to Gateway API endpoint with low weight
Monitor error rates, latency, and functionality
Increase Gateway API traffic percentage
Once at 100%, remove old Ingress resources

Phase 4: Testing

Behavioral equivalence testing is critical:

# Compare responses between Ingress and Gateway
for endpoint in $(cat endpoints.txt); do
  ingress_response=$(curl -s "https://ingress.example.com$endpoint")
  gateway_response=$(curl -s "https://gateway.example.com$endpoint")
  
  if [ "$ingress_response" != "$gateway_response" ]; then
    echo "MISMATCH: $endpoint"
  fi
done

Common Migration Pitfalls

Default Timeout Differences

Ingress-NGINX defaults to 60-second timeouts. Some Gateway implementations default to 15 seconds. Explicitly set timeouts to avoid surprises:

rules:
  - matches:
      - path:
          value: /api
    timeouts:
      request: 60s
      backendRequest: 60s

Body Size Limits

NGINX’s proxy-body-size annotation doesn’t have a direct equivalent in all Gateway implementations. Check your controller’s documentation for request size configuration.

Cross-Namespace References

Gateway API supports cross-namespace routing, but it requires explicit ReferenceGrant resources:

# Allow HTTPRoutes in team-alpha to reference services in backend namespace
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-team-alpha
  namespace: backend
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: team-alpha
  to:
    - group: ""
      kind: Service

Service Mesh Interaction

If you’re running Istio or Cilium, check their Gateway API support status. Both now implement Gateway API natively, which can simplify your stack—but migration needs coordination.

Gateway Controller Options

Several controllers implement Gateway API:

Controller	Backing Proxy	Notes
Envoy Gateway	Envoy	CNCF project, feature-rich
NGINX Gateway Fabric	NGINX	From F5/NGINX team
Cilium	Envoy (eBPF)	If already using Cilium CNI
Istio	Envoy	Native Gateway API support
Traefik	Traefik	Good for existing Traefik users
Kong	Kong	Enterprise features available

Timeline and Urgency

While Ingress isn’t disappearing overnight, the writing is on the wall:

March 2026: Ingress-NGINX enters maintenance mode
Gateway API v1.0: Already stable since late 2023
New features: Only coming to Gateway API (traffic splitting, GRPC routing, etc.)

Start planning migration now. Even if you don’t execute immediately, understanding Gateway API will be essential for any new Kubernetes work.

Conclusion

The migration from Ingress to Gateway API is inevitable, but it doesn’t have to be painful. Gateway API offers genuine improvements—better RBAC, typed configuration, richer routing capabilities—that justify the migration effort.

Start with Ingress2Gateway to understand the scope of your migration. Deploy Gateway API alongside Ingress to validate behavior. Shift traffic gradually, test thoroughly, and you’ll emerge with a more maintainable, more secure traffic management layer.

The annotation chaos era is ending. The future of Kubernetes traffic management is typed, validated, and RBAC-native. It’s time to migrate.

März 14, 2026März 15, 2026

Measuring Developer Productivity in the AI Era: Beyond Velocity Metrics

Introduction

The promise of AI-assisted development is irresistible: 10x productivity gains, code written at the speed of thought, junior developers performing like seniors. But as organizations deploy GitHub Copilot, Claude Code, and other AI coding assistants, a critical question emerges: How do we actually measure the impact?

Traditional velocity metrics — story points completed, lines of code, pull requests merged — are increasingly inadequate. They measure output, not outcomes. Worse, they can be gamed, especially when AI can generate thousands of lines of code in seconds. This article explores modern frameworks for measuring developer productivity in the AI era, separating hype from reality and providing practical guidance for engineering leaders.

The Problem with Traditional Velocity Metrics

For decades, engineering teams have relied on metrics like:

Lines of Code (LOC): More code doesn’t mean better software. AI makes this metric meaningless — you can generate 10,000 lines in minutes.
Story Points / Velocity: Measures estimation consistency, not actual value delivered. Teams optimize for completing stories, not solving problems.
Pull Requests Merged: Encourages many small PRs over thoughtful changes. Doesn’t capture review quality or long-term impact.
Commits per Day: Trivially gameable. Says nothing about the value of those commits.

These metrics share a fundamental flaw: they measure activity, not productivity. In the AI era, activity is cheap. An AI can produce endless activity. What matters is whether that activity translates to business outcomes.

The SPACE Framework: A Holistic View

The SPACE framework, developed by researchers at GitHub, Microsoft, and the University of Victoria, offers a more nuanced approach. SPACE stands for:

Satisfaction and well-being
Performance
Activity
Communication and collaboration
Efficiency and flow

The key insight: productivity is multidimensional. No single metric captures it. Instead, you need a balanced set of metrics across all five dimensions, combining quantitative data with qualitative insights.

Applying SPACE to AI-Assisted Teams

When developers use AI coding assistants, SPACE metrics take on new meaning:

Satisfaction: Do developers feel AI tools help them? Or do they create frustration through incorrect suggestions and context-switching?
Performance: Are we shipping features that matter? Is customer satisfaction improving? Are we reducing incidents?
Activity: Still relevant, but must be interpreted carefully. High activity with AI might indicate productive use — or it might indicate the developer is blindly accepting suggestions.
Communication: Does AI change how teams collaborate? Are code reviews more or less effective? Is knowledge sharing happening?
Efficiency: Are developers spending less time on boilerplate? Is time-to-first-commit improving for new team members?

DORA Metrics: Outcomes Over Output

The DORA (DevOps Research and Assessment) metrics focus on delivery performance:

Deployment Frequency: How often do you deploy to production?
Lead Time for Changes: How long from commit to production?
Change Failure Rate: What percentage of deployments cause failures?
Mean Time to Recovery (MTTR): How quickly do you recover from failures?

DORA metrics are outcome-oriented: they measure the effectiveness of your entire delivery pipeline, not individual developer activity. In the AI era, they remain highly relevant — perhaps more so. AI should theoretically improve all four metrics. If it doesn’t, something is wrong.

AI-Specific DORA Extensions

Consider tracking additional metrics when AI is involved:

AI Suggestion Acceptance Rate: What percentage of AI suggestions are accepted? Too high might indicate rubber-stamping; too low suggests the tool isn’t helping.
AI-Assisted Change Failure Rate: Do changes written with AI assistance fail more or less often?
Time Saved per Task Type: For which tasks does AI provide the most leverage? Boilerplate? Tests? Documentation?

The „10x“ Reality Check

Marketing claims of „10x productivity“ with AI are pervasive. The reality is more nuanced:

Studies show 10-30% improvements in specific tasks like writing boilerplate code, generating tests, or explaining unfamiliar codebases.
Complex problem-solving sees minimal AI uplift. Architecture decisions, debugging subtle issues, and understanding business requirements still depend on human expertise.
Junior developers may see larger gains — AI helps them write syntactically correct code faster. But they still need to learn why code works, or they’ll introduce subtle bugs.
10x claims often compare against unrealistic baselines (e.g., writing everything from scratch vs. using any tooling at all).

A realistic expectation: AI provides meaningful productivity gains for certain tasks, modest gains overall, and requires investment in learning and integration to realize benefits.

Practical Metrics for AI-Era Teams

Based on SPACE, DORA, and real-world experience, here are concrete metrics to track:

Quantitative Metrics

Metric	What It Measures	AI-Era Considerations
Main Branch Success Rate	% of commits that pass CI on main	Should improve with AI; if not, AI may be introducing bugs
MTTR	Time to recover from incidents	AI-assisted debugging should reduce this
Time to First Commit (new devs)	Onboarding effectiveness	AI should accelerate ramp-up
Code Review Turnaround	Time from PR open to merge	AI-generated code may need more careful review
Test Coverage Delta	Change in test coverage over time	AI can generate tests; is coverage improving?

Qualitative Metrics

Developer Experience Surveys: Regular pulse checks on tool satisfaction, flow state, friction points.
AI Tool Usefulness Ratings: For each major task type, how helpful is AI? (Scale 1-5)
Knowledge Retention: Are developers learning, or becoming dependent on AI? Periodic assessments can reveal this.

Tooling: Waydev, LinearB, and Beyond

Several platforms now offer AI-era productivity analytics:

Waydev: Integrates with Git, Jira, and CI/CD to provide DORA metrics and developer analytics. Offers AI-specific insights.
LinearB: Focuses on workflow metrics, identifying bottlenecks in the development process. Good for measuring cycle time and review efficiency.
Pluralsight Flow (formerly GitPrime): Deep git analytics with focus on team patterns and individual contribution.
Jellyfish: Connects engineering metrics to business outcomes, helping justify AI tool investments.

When evaluating tools, ensure they can:

Distinguish between AI-assisted and non-AI-assisted work (if your tools support this tagging)
Provide qualitative feedback mechanisms alongside quantitative data
Avoid creating perverse incentives (e.g., rewarding lines of code)

Avoiding Measurement Pitfalls

Don’t use metrics punitively. Metrics are for learning, not for ranking developers. The moment metrics become tied to performance reviews, they get gamed.
Don’t measure too many things. Pick 5-7 key metrics across SPACE dimensions. More than that creates noise.
Do measure trends, not absolutes. A team’s MTTR improving over time is more meaningful than comparing MTTR across different teams.
Do include qualitative data. Numbers without context are dangerous. Regular conversations with developers provide essential context.
Do revisit metrics regularly. As AI tools evolve, so should your measurement approach.

Conclusion

Measuring developer productivity in the AI era requires abandoning simplistic velocity metrics in favor of holistic frameworks like SPACE and outcome-oriented measures like DORA. The „10x productivity“ hype should be tempered with realistic expectations: AI provides meaningful but not transformative gains, and those gains vary significantly by task type and developer experience.

The organizations that will thrive are those that invest in thoughtful measurement — combining quantitative data with qualitative insights, tracking outcomes rather than output, and continuously refining their approach as AI tools mature.

Start by auditing your current metrics. Are they measuring activity or productivity? Then layer in SPACE dimensions and DORA outcomes. Finally, talk to your developers — their lived experience with AI tools is the most valuable data point of all.

März 13, 2026März 13, 2026

Intent-Driven Infrastructure: From IaC Scripts to Self-Reconciling Platforms

Introduction

For years, Infrastructure as Code (IaC) has been the gold standard for managing cloud resources. Tools like Terraform, Pulumi, and CloudFormation brought version control, repeatability, and collaboration to infrastructure management. But as cloud environments grow in complexity, a fundamental tension has emerged: IaC scripts describe how to build infrastructure, not what infrastructure should look like.

Intent-driven infrastructure flips this paradigm. Instead of writing imperative scripts or even declarative configurations that describe specific resources, you express intents — high-level descriptions of desired outcomes. The platform then continuously reconciles reality with intent, automatically correcting drift, scaling resources, and enforcing policies.

This article explores how intent-driven infrastructure works, the technologies enabling it, and practical steps to adopt this approach in your organization.

The Limitations of Traditional IaC

Traditional IaC has served us well, but several pain points are driving the need for evolution:

Configuration Drift: Despite declarative tools, drift between desired and actual state is common. Manual changes, failed applies, and partial rollbacks create inconsistencies that require human intervention to resolve.
Brittle Pipelines: CI/CD pipelines for infrastructure often break on edge cases — timeouts, API rate limits, dependency ordering. Recovery requires manual debugging and re-running pipelines.
Cognitive Overhead: Developers must understand cloud-provider-specific APIs, resource dependencies, and lifecycle management. This creates a bottleneck where only specialized engineers can make infrastructure changes.
Day-2 Operations Gap: Most IaC tools excel at provisioning but struggle with ongoing operations — scaling, patching, certificate rotation, and compliance enforcement.

What is Intent-Driven Infrastructure?

Intent-driven infrastructure introduces a higher level of abstraction. Instead of specifying individual resources, you express intents like:

“I need a production-grade PostgreSQL database with 99.9% availability, encrypted at rest, accessible only from the application namespace, with automated backups retained for 30 days.”

The platform interprets this intent and:

Compiles it into concrete resource definitions (RDS instance, security groups, backup policies, monitoring rules)
Validates against organizational policies (cost limits, security requirements, compliance rules)
Provisions the resources across the appropriate cloud accounts
Continuously reconciles — if drift is detected, the platform automatically corrects it

Core Architectural Patterns

Kubernetes as Universal Control Plane

The Kubernetes API server and its reconciliation loop have proven to be remarkably versatile. Projects like Crossplane leverage this pattern to manage any infrastructure resource through Kubernetes Custom Resource Definitions (CRDs). The key insight: the reconciliation loop that keeps your pods running can also keep your cloud infrastructure aligned with intent.

Crossplane Compositions as Intent Primitives

Crossplane v2 Compositions allow platform teams to define reusable, opinionated templates that abstract away provider-specific complexity. A single DatabaseIntent CRD can provision an RDS instance on AWS, Cloud SQL on GCP, or Azure Database — the developer only expresses intent, not implementation.

apiVersion: platform.example.com/v1alpha1
kind: DatabaseIntent
metadata:
  name: orders-db
spec:
  engine: postgresql
  version: "16"
  availability: high
  encryption: true
  backup:
    retentionDays: 30
  network:
    allowFrom:
      - namespace: orders-app

Policy Guardrails: OPA, Kyverno, and Cedar

Intent without governance is chaos. Policy engines ensure that every intent is validated before execution:

OPA (Open Policy Agent) / Gatekeeper: Rego-based policies for Kubernetes admission control. Powerful but requires learning a new language.
Kyverno: YAML-native policies that feel natural to Kubernetes operators. Lower barrier to entry, excellent for common patterns.
Cedar: AWS-backed authorization language for fine-grained access control. Emerging as a standard for application-level policy.

Together, these tools enforce constraints like cost ceilings, security baselines, and compliance requirements — automatically, at every change.

Continuous Reconciliation vs. Imperative Apply

The fundamental shift from traditional IaC to intent-driven infrastructure is moving from imperative apply (run a pipeline to make changes) to continuous reconciliation (the platform constantly ensures reality matches intent). This eliminates drift by design rather than detecting it after the fact.

Orchestration Platforms: Humanitec and Score

Humanitec provides an orchestration layer that translates developer intent into fully resolved infrastructure configurations. Using Score (an open-source workload specification), developers describe what their application needs without specifying how it is provisioned. The platform engine resolves dependencies, applies organizational rules, and generates deployment manifests.

Benefits in Practice

Faster Recovery: When infrastructure drifts or fails, the reconciliation loop automatically corrects it. MTTR drops from hours to minutes.
Safer Changes: Policy gates validate every change before execution. No more “oops, I deleted the production database” moments.
Developer Velocity: Developers express intent in familiar terms, not cloud-provider-specific configurations. Time-to-production for new services drops significantly.
Compliance by Default: Security, cost, and regulatory policies are enforced continuously, not checked periodically.
AI-Agent Compatibility: Intent-based APIs are natural interfaces for AI agents. An AI coding assistant can express “I need a cache with 10GB capacity” without understanding the intricacies of ElastiCache configuration.

Challenges and Guardrails

Intent-driven infrastructure is not without its challenges:

Abstraction Leakage: When things go wrong, engineers need to understand the underlying resources. Too much abstraction can make debugging harder.
Policy Complexity: As organizations grow, policy definitions can become complex and conflicting. Invest in policy testing and simulation.
Observability: You need new metrics — not just “is the resource healthy?” but “is the intent satisfied?” Intent satisfaction metrics are a new concept for most teams.
Migration Path: Existing Terraform/Pulumi codebases represent significant investment. Migration must be gradual, starting with new workloads and selectively adopting intent-driven patterns for existing ones.
Organizational Change: Intent-driven infrastructure shifts responsibilities. Platform teams own the abstraction layer; application teams own the intents. This requires clear role definitions and trust.

Getting Started: A Minimal Viable Implementation

Start Small: Pick one workload type (e.g., databases) and create an intent CRD using Crossplane Compositions.
Add Policy Gates: Implement basic Kyverno policies for cost limits and security baselines.
Enable Reconciliation: Let the Crossplane controller continuously reconcile. Monitor drift detection and auto-correction rates.
Measure Impact: Track MTTR, change drift frequency, time-to-recover, and developer satisfaction.
Iterate: Expand to more resource types, add more sophisticated policies, and integrate with your IDP (Internal Developer Portal).

Conclusion

Intent-driven infrastructure represents the next evolution of Infrastructure as Code. By shifting from imperative scripts to declarative intents backed by continuous reconciliation and policy guardrails, organizations can build platforms that are more resilient, more secure, and more developer-friendly.

The tools are maturing rapidly — Crossplane, Humanitec, OPA, Kyverno, and the broader Kubernetes ecosystem provide a solid foundation. The question is no longer whether to adopt intent-driven patterns, but how fast your team can start the journey.

Start with a single workload, prove the value, and scale from there. Your future self — debugging a production issue at 3 AM — will thank you when the platform auto-heals before you even finish your coffee.

März 4, 2026März 6, 2026

Internal Developer Portals: Backstage, Port.io, and the Path to Self-Service Platforms

Platform Engineering: The 2026 Megatrend

The days when developers had to write tickets and wait for days for infrastructure are over. Internal Developer Portals (IDPs) are the heart of modern Platform Engineering teams — enabling self-service while maintaining governance.

Comparing the Contenders

Backstage (Spotify)

The open-source heavyweight from Spotify has established itself as the de facto standard:

Software Catalog — Central overview of all services, APIs, and resources
Tech Docs — Documentation directly in the portal
Templates — Golden paths for new services
Plugins — Extensible through a large community

Strength: Flexibility and community. Weakness: High setup and maintenance effort.

Port.io

The SaaS alternative for teams that want to be productive quickly:

No-Code Builder — Portal without development effort
Self-Service Actions — Day-2 operations automated
Scorecards — Production readiness at a glance
RBAC — Enterprise-ready access control

Strength: Time-to-value. Weakness: Less flexibility than open source.

Cortex

The focus is on service ownership and reliability:

Service Scorecards — Enforce quality standards
Ownership — Clear responsibilities
Integrations — Deep connection to monitoring tools

Strength: Reliability engineering. Weakness: Less developer experience focus.

Software Catalogs: The Foundation

An IDP stands or falls with its catalog. The core questions:

What do we have? — Services, APIs, databases, infrastructure
Who owns it? — Service ownership must be clear
What depends on what? — Dependency mapping for impact analysis
How healthy is it? — Scorecards for quality standards

Production Readiness Scorecards

Instead of saying „you should really have that,“ scorecards make standards measurable:

Service: payment-api
━━━━━━━━━━━━━━━━━━━━
✅ Documentation    [100%]
✅ Monitoring       [100%]
⚠️  On-Call Rotation [ 80%]
❌ Disaster Recovery [ 20%]
━━━━━━━━━━━━━━━━━━━━
Overall: 75% - Bronze

Teams see at a glance where action is needed — without anyone pointing fingers.

Integration Is Everything

An IDP is only as good as its integrations:

CI/CD — GitHub Actions, GitLab CI, ArgoCD
Monitoring — Datadog, Prometheus, Grafana
IaC — Terraform, Crossplane, Pulumi
Ticketing — Jira, Linear, ServiceNow
Cloud — AWS, GCP, Azure native services

The Cultural Shift

The biggest challenge isn’t technical — it’s the shift from gatekeeping to enablement:

Old (Gatekeeping)	New (Enablement)
„Write a ticket“	„Use the portal“
„We’ll review it“	„Policies are automated“
„Takes 2 weeks“	„Ready in 5 minutes“
„Only we can do that“	„You can, we’ll help“

Getting Started

The pragmatic path to an IDP:

Start small — A software catalog alone is valuable
Pick your battles — Don’t automate everything at once
Measure adoption — Track portal usage
Iterate — Take developer feedback seriously

Platform Engineering isn’t a product you buy — it’s a capability you build. IDPs are the visible interface to that capability.

Februar 23, 2026Februar 23, 2026

Agentic AI in the SDLC: From Copilot to Autonomous DevOps

The Evolution Beyond AI-Assisted Development

We’ve all gotten comfortable with AI assistants in our IDEs. Copilot suggests code, ChatGPT explains errors, and various tools help us write tests. But there’s a fundamental shift happening: AI is moving from assistant to agent.

The difference? An assistant waits for your prompt. An agent takes initiative.

What Does „Agentic AI“ Mean for the SDLC?

Traditional AI in development is reactive. You ask a question, you get an answer. Agentic AI is different—it operates with goals, not just prompts:

Planning — Breaking complex tasks into actionable steps
Tool Use — Interacting with APIs, CLIs, and infrastructure directly
Reasoning — Making decisions based on context and constraints
Persistence — Maintaining state across multiple interactions
Self-Correction — Detecting and recovering from errors

Imagine telling an AI: „We need a new microservice for payment processing with PostgreSQL, deployed to our EU cluster, with proper security policies.“ An agentic system doesn’t just write the code—it provisions the database, creates the Kubernetes manifests, configures network policies, sets up monitoring, and opens a PR for review.

The Architecture of Agentic DevSecOps

Building autonomous AI into your SDLC requires more than just API keys. You need infrastructure designed for agent operations:

1. Agent-Native Infrastructure

AI agents need first-class platform support:

apiVersion: platform.example.io/v1
kind: AIAgent
metadata:
  name: infra-provisioner
spec:
  provider: anthropic
  model: claude-3
  mcpEndpoints:
    - kubectl
    - crossplane-claims
    - argocd
  rbacScope: namespace/dev-team
  rateLimits:
    requestsPerMinute: 30
    resourceClaims: 5

This isn’t hypothetical—it’s where platform engineering is heading. Agents as managed workloads with proper RBAC, quotas, and audit trails.

2. Multi-Layer Guardrails

Autonomous AI requires autonomous safety. A five-layer approach:

Input Validation — Schema enforcement, prompt injection detection
Action Scoping — Resource limits, allowed operations whitelist
Human Approval Gates — Critical actions require sign-off
Audit Logging — Every agent action traceable and reviewable
Rollback Capabilities — Automated recovery from failed operations

The goal: let agents move fast on routine tasks while maintaining human oversight where it matters.

3. GitOps-Native Agent Operations

Every agent action should be a Git commit. Database provisioned? That’s a Crossplane claim in a PR. Deployment scaled? That’s a manifest change with full history. This gives you:

Complete audit trail
Easy rollback (git revert)
Review workflows for sensitive changes
Drift detection (desired state vs. actual)

Real-World Agent Workflows

Here’s what becomes possible:

Scenario: Production Incident Response

Alert fires: „Payment service latency > 500ms“
Agent analyzes metrics, traces, and recent deployments
Identifies: database connection pool exhaustion
Creates PR: increase pool size + add connection timeout
Runs canary deployment to staging
Notifies on-call engineer for production approval
After approval: deploys to production, monitors recovery

Time from alert to fix: minutes, not hours.

Scenario: Developer Self-Service

Developer: „I need a PostgreSQL database for my new service, small size, EU region, with daily backups.“

Agent:

Creates Crossplane Database claim
Provisions via the appropriate cloud provider
Configures External Secrets for credentials
Adds Prometheus ServiceMonitor
Updates team’s resource inventory
Responds with connection details and docs link

No tickets. No waiting. Full compliance.

The Security Imperative

With great autonomy comes great responsibility. Agentic systems in your SDLC must be security-first by design:

Zero Trust — Agents authenticate for every action, no ambient authority
Least Privilege — Granular RBAC scoped to specific resources and operations
No Secrets in Prompts — Credentials via Vault/External Secrets, never in context
Network Isolation — Agent workloads in dedicated, policy-controlled namespaces
Immutable Audit — Every action logged to tamper-evident storage

Getting Started

You don’t need to build everything at once. A pragmatic path:

Start with observability — Let agents read metrics and logs (no write access)
Add diagnostic capabilities — Agents can analyze and recommend, humans execute
Enable scoped automation — Agents can act within strict guardrails (dev environments first)
Expand with trust — Gradually increase scope based on demonstrated reliability

The Future is Agentic

The SDLC has always been about automation—from compilers to CI/CD to GitOps. Agentic AI is the next layer: automating the decisions, not just the execution.

The organizations that figure this out first will ship faster, respond to incidents quicker, and let their engineers focus on the creative work that humans do best.

The question isn’t whether to adopt agentic AI in your SDLC. It’s how fast you can build the infrastructure to do it safely.

This is part of our exploration of AI-native platform engineering at it-stud.io. We’re building open-source tooling for agentic DevSecOps—follow along on GitHub.

Februar 21, 2026Februar 21, 2026

AI Observability: Why Your AI Agents Need OpenTelemetry

The Black Box Problem in AI Agents

When you deploy an AI agent in production, you’re essentially running a complex system that makes decisions, calls external APIs, processes data, and interacts with users—all in ways that can be difficult to understand after the fact. Traditional logging tells you that something happened, but not why or how long or at what cost.

For LLM-based systems, this opacity becomes a serious operational challenge:

Token costs can spiral without visibility into per-request usage
Latency issues hide in the pipeline between prompt and response
Tool calls (file reads, API requests, code execution) happen invisibly
Context window management affects quality but rarely surfaces in logs

The answer? Observability—specifically, distributed tracing designed for AI workloads.

OpenTelemetry: The Standard not only for AI Observability

OpenTelemetry (OTEL) has emerged as the industry standard for collecting telemetry data—traces, metrics, and logs—from distributed systems. What makes it particularly powerful for AI applications:

Traces Show the Full Picture

A single user message to an AI agent might trigger:

Webhook reception from Telegram/Slack
Session state lookup
Context assembly (system prompt + history + tools)
LLM API call to Anthropic/OpenAI
Tool execution (file read, web search, code run)
Response streaming back to user

With OTEL traces, each step becomes a span with timing, attributes, and relationships. You can see exactly where time is spent and where failures occur.

Metrics for Cost Control

OTEL metrics give you counters and histograms for:

tokens.input / tokens.output per request
cost.usd aggregated by model, channel, or user
run.duration_ms to track response latency
context.tokens to monitor context window usage

This transforms AI spend from „we used $X this month“ to „user Y’s workflow Z costs $0.12 per run.“

Practical Setup: OpenClaw + Jaeger

At it-stud.io, we tested OpenClaw as our AI agent framework – already supporting OTEL by default – and enabled full observability with a simple configuration change:

{
  "plugins": {
    "allow": ["diagnostics-otel"],
    "entries": {
      "diagnostics-otel": { "enabled": true }
    }
  },
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://localhost:4318",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "sampleRate": 1.0
    }
  }
}

For the backend, we chose Jaeger—a CNCF-graduated project that provides:

OTLP ingestion (HTTP on port 4318)
Trace storage and search
Clean web UI for exploration
Zero external dependencies (all-in-one binary)

What You See: Real Traces from AI Operations

Once enabled, every AI interaction generates rich telemetry:

openclaw.model.usage

Provider, model name, channel
Input/output/cache tokens
Cost in USD
Duration in milliseconds
Session and run identifiers

openclaw.message.processed

Message lifecycle from queue to response
Outcome (success/error/timeout)
Chat and user context

openclaw.webhook.processed

Inbound webhook handling per channel
Processing duration
Error tracking

From Tracing to AI Governance

Observability isn’t just about debugging—it’s the foundation for:

Cost Allocation

Attribute AI spend to specific projects, users, or workflows. Essential for enterprise deployments where multiple teams share infrastructure.

Compliance & Auditing

Traces provide an immutable record of what the AI did, when, and why. Critical for regulated industries and internal governance.

Performance Optimization

Identify slow tool calls, optimize prompt templates, right-size model selection based on actual latency requirements.

Capacity Planning

Metrics trends inform scaling decisions and budget forecasting.

Getting Started

If you’re running AI agents in production without observability, you’re flying blind. The good news: implementing OTEL is straightforward with modern frameworks.

Our recommended stack:

Instrumentation: Framework-native (OpenClaw, LangChain, etc.) or OpenLLMetry
Collection: OTEL Collector or direct OTLP export
Backend: Jaeger (simple), Grafana Tempo (scalable), or Langfuse (LLM-specific)

The investment is minimal; the visibility is transformative.

At it-stud.io, we help organizations build observable, governable AI systems. Interested in implementing AI observability for your team? Get in touch.