AI Platform Engineering & MLOps · Part XXII of 34
Golden paths for ML
Paved-road templates that survive contact with users — how platform teams define, template, and evolve the three canonical ML golden paths, and why the deprecation contract matters as much as the path itself.
Platform teams are in the business of removing decisions. Every time a data scientist has to figure out which container registry to push to, which experiment-tracking endpoint to configure, or which serving framework to use, they are spending cognitive budget on infrastructure rather than on the model. The golden path pattern, first articulated publicly by Spotify’s engineering team and formalised in Skelton and Pais’s team interaction model, addresses this directly: define a paved, opinionated workflow for the common cases, pre-wire the integrations, and let product teams walk the path without understanding what is under it.
Spotify’s 2020 engineering blog post “How We Use Golden Paths to Solve Fragmentation in Our Software Ecosystem” describes the paved-road metaphor precisely: a path is not a mandate. A team that needs to diverge can do so, but they leave the path and take on the maintenance burden of whatever they build instead. The CNCF TAG App Delivery Platforms Whitepaper (2023) formalises this at the industry level, describing the platform’s job as offering “a bundle often described as a golden path” accompanied by an initial project template and documentation. Both framings share a key discipline: a golden path is only golden if it is kept up to date. A stale path is worse than no path, because it channels teams into known-bad configurations.
This article defines the three canonical golden paths for an ML platform, the mechanism used to stamp them out as templates, the governance gates wired into each path, and — critically — the deprecation contract that keeps a path trustworthy over time.
What makes a path “golden”
Skelton and Pais’s Team Topologies (IT Revolution Press, 2019) frames the platform team as a stream-aligned team’s internal supplier. The platform team’s primary output is not running services — it is reducing the cognitive load on product teams. The paved road is the primary mechanism: an opinionated, tested, integrated path that a product team can follow without needing to understand the platform in depth.
Three properties distinguish a golden path from a mere tutorial:
- Scaffolded, not described. A team starting the path runs one command (or clicks one button in an internal developer portal) and receives a working repository skeleton, pre-wired CI, and pre-configured integrations. Documentation exists, but the path does not require the team to read it before getting started.
- Enforces a gate. At some point in the path — typically at promotion time or deployment time — an automated gate runs checks (eval scores, model card completeness, latency regressions, security scans). A path without a gate is a convenience; a path with a gate is a quality mechanism.
- Versioned and deprecatable. Platform teams evolve their stack. A path is a contract with its consumers. Consumers deserve a defined notice window — typically measured in weeks to months — and a migration script when a path is deprecated. Without this contract, teams fear using the path at all.
The three canonical ML golden paths
Three paths address the bulk of ML workloads on a modern AI platform. Each is described by its trigger (what causes a team to walk this path), its key inputs and outputs, and the gate it enforces.
- 01 Batch inference pipeline. Scheduled predictions on a corpus.
- 02 Model serving: real-time inference. Request-time predictions for a live service.
- 03 GenAI feature with vector index. RAG-backed search or Q&A surface.
Path 1 — Batch inference pipeline
Trigger: an ML team has a trained model that produces predictions on a schedule — nightly fraud scores, weekly recommendations, monthly risk ratings — rather than in real time.
Input: a trained model artefact promoted to the model registry’s staging stage, and a data source reference (a feature store view, a data-lake partition, or a streaming-snapshot export).
Output: predictions written to an output store (object storage, a database table, or a downstream event stream), with row-count and schema assertions confirming the run succeeded.
Gate: an output-validation step inside the pipeline (schema check, row-count assertion, and a lightweight quality metric check) that blocks promotion of the batch job to the model registry’s production stage if any assertion fails. A GitOps controller (e.g. Argo CD or Flux) then detects the production-stage promotion and syncs the scheduled-job manifest to the cluster. Downstream systems see only production-stage outputs.
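The three assertions in the Path 1 gate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's actual validator: the function name, the `score` column, and the null-fraction quality metric are all hypothetical stand-ins for whatever lightweight check a team configures.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    passed: bool
    failures: list[str]


def validate_batch_output(
    rows: list[dict],
    expected_schema: dict[str, type],
    min_rows: int,
    score_column: str = "score",       # hypothetical column name
    max_null_fraction: float = 0.01,   # hypothetical quality threshold
) -> ValidationResult:
    """Run a schema check, a row-count assertion, and one quality metric."""
    failures: list[str] = []

    # Row-count assertion: an empty or truncated run must not be promoted.
    if len(rows) < min_rows:
        failures.append(f"row count {len(rows)} below minimum {min_rows}")

    # Schema check: exact column set, and non-null values of the expected type.
    # Stops at the first offending row to keep the report short.
    for i, row in enumerate(rows):
        if set(row) != set(expected_schema):
            failures.append(f"row {i}: columns {sorted(row)} do not match schema")
            break
        bad = [
            col for col, typ in expected_schema.items()
            if row[col] is not None and not isinstance(row[col], typ)
        ]
        if bad:
            failures.append(f"row {i}: wrong type in columns {bad}")
            break

    # Lightweight quality metric: fraction of rows with a missing score.
    if rows:
        null_fraction = sum(1 for r in rows if r.get(score_column) is None) / len(rows)
        if null_fraction > max_null_fraction:
            failures.append(f"null fraction {null_fraction:.3f} exceeds {max_null_fraction}")

    return ValidationResult(passed=not failures, failures=failures)
```

A pipeline step would call `validate_batch_output` on the run's output and request promotion to the production stage only when `passed` is true; the failure messages give the owning team something concrete to act on.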
The pipeline definition lives in a scaffolded Git repository. The scaffold (produced by an internal developer portal template or a Backstage Software Template) pre-wires the experiment tracker, the model registry credential, the output-store path convention, and the CI pipeline that validates the pipeline definition itself before it runs in production.
Path 2 — Model serving: real-time inference
Trigger: an ML team has a model that must produce predictions at request time — fraud detection on a payment, ranking on a search query, content moderation on a submitted post.
Input: a model artefact in the registry, plus a serving manifest (an InferenceService definition for a serving runtime such as KServe, BentoML, or Seldon Core) authored by the ML engineer and committed to a deployment Git repository.
Output: a stable, versioned prediction endpoint consumed by application engineers. The endpoint URI does not change across model revisions — only the model revision behind it changes.
Gate: a CI gate on the deployment repository PR that runs three checks: (1) model-card completeness (the model card must document intended use, training data provenance, and known limitations); (2) eval-score threshold (the model’s offline evaluation score must exceed the team’s configured minimum); (3) latency regression test (shadow inference against a canary endpoint must show P95 latency within the configured tolerance of the current production model).
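As a rough sketch, the three checks could be combined into one function that a CI job calls. Everything named here is an assumption for illustration: the model-card key names, the `GateConfig` fields, the nearest-rank percentile method, and the reading of "within tolerance" as a relative increase over the production P95.

```python
from dataclasses import dataclass

# Sections the gate requires in a model card; these key names are hypothetical.
REQUIRED_CARD_SECTIONS = ("intended_use", "training_data_provenance", "known_limitations")


@dataclass(frozen=True)
class GateConfig:
    min_eval_score: float  # team-configured minimum offline evaluation score
    p95_tolerance: float   # allowed relative P95 increase, e.g. 0.10 for +10%


def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latency samples."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def run_deployment_gate(
    model_card: dict,
    eval_score: float,
    candidate_ms: list[float],
    production_ms: list[float],
    config: GateConfig,
) -> list[str]:
    """Return failure messages; an empty list means the gate passes."""
    failures: list[str] = []

    # (1) Model-card completeness: every required section must be non-empty.
    missing = [s for s in REQUIRED_CARD_SECTIONS if not str(model_card.get(s) or "").strip()]
    if missing:
        failures.append(f"model card missing sections: {missing}")

    # (2) Eval-score threshold: a score below the configured minimum fails.
    if eval_score < config.min_eval_score:
        failures.append(f"eval score {eval_score} below minimum {config.min_eval_score}")

    # (3) Latency regression: candidate P95 must stay within tolerance of production.
    limit = p95(production_ms) * (1 + config.p95_tolerance)
    candidate_p95 = p95(candidate_ms)
    if candidate_p95 > limit:
        failures.append(f"candidate P95 {candidate_p95:.1f} ms exceeds limit {limit:.1f} ms")

    return failures
```

Returning every failure rather than stopping at the first one lets the PR author fix the model card, the score, and the latency problem in a single iteration.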
After the gate passes and the PR is merged, the GitOps controller syncs the InferenceService manifest. Traffic is initially split — for example, 5% to the new revision, 95% to the previous. A progressive-delivery controller (e.g. Argo Rollouts or Flagger) watches prediction latency, error rate, and prediction-quality metrics. If metrics stay within bounds across a configurable observation window, traffic advances to 100% for the new revision. If metrics breach bounds, the rollout is automatically aborted and the previous revision retakes full traffic. The Argo Rollouts project documents the AnalysisRun and Rollout resource types that implement this pattern.
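The advance-or-abort logic a progressive-delivery controller applies can be illustrated with a small loop. This is a conceptual sketch only and does not use the Argo Rollouts or Flagger APIs; the `observe` callback, the `Bounds` fields, and the outcome strings are hypothetical, and a real controller would also evaluate the prediction-quality signals the paragraph above mentions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Bounds:
    max_p95_ms: float
    max_error_rate: float


@dataclass(frozen=True)
class WindowMetrics:
    p95_ms: float
    error_rate: float


def within_bounds(metrics: WindowMetrics, bounds: Bounds) -> bool:
    return metrics.p95_ms <= bounds.max_p95_ms and metrics.error_rate <= bounds.max_error_rate


def run_rollout(
    steps: list[int],
    observe: Callable[[int], WindowMetrics],
    bounds: Bounds,
) -> tuple[str, int]:
    """Walk the new revision through traffic weights (percentages).

    `observe(weight)` is assumed to hold the given weight for one observation
    window and return the metrics seen during it. Returns the outcome and the
    final share of traffic on the new revision.
    """
    for weight in steps:
        metrics = observe(weight)
        if not within_bounds(metrics, bounds):
            # Abort: the previous revision retakes full traffic.
            return ("aborted", 0)
    # Every window stayed within bounds: the new revision takes all traffic.
    return ("promoted", 100)
```

With `steps=[5, 25, 50]`, a breach at the 25% window returns `("aborted", 0)` without ever exposing half of production traffic to the new revision.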
Path 3 — GenAI feature with a vector index
Trigger: an application team wants to add retrieval-augmented generation (RAG) to a product: a search surface, a Q&A interface, a document assistant.
Input: a document corpus with a defined data access credential (scoped read-only), and a choice of embedding endpoint from the platform’s model catalogue.
Output: a running RAG feature backed by a scheduled indexing pipeline and a vector store query endpoint (e.g. pgvector, Qdrant, Weaviate, or Milvus). The LLM inference endpoint is provided by the platform — either self-hosted or a proxied external API — so the application team does not manage model serving directly.
Gate: an offline evaluation harness that runs on the indexing pipeline’s output — measuring retrieval recall on a ground-truth question-answer set — and an online evaluation surface (explicit feedback signals captured in the application layer). The offline recall gate blocks promotion to production if recall falls below threshold; the online gate feeds a monitoring dashboard rather than blocking deployment, since production traffic is the only source of real query distribution.
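The offline recall gate reduces to a small computation. The sketch below assumes recall@k averaged over questions, with the ground-truth set mapping each question ID to the document IDs that should be retrieved; the function names, the choice of k, and the threshold are illustrative rather than prescribed by the path.

```python
def recall_at_k(
    retrieved: dict[str, list[str]],
    relevant: dict[str, set[str]],
    k: int,
) -> float:
    """Mean over questions of |top-k retrieved ∩ relevant| / |relevant|.

    `retrieved` maps question ID to ranked document IDs from the vector store;
    `relevant` maps question ID to the ground-truth document IDs.
    """
    scores = []
    for question_id, gold in relevant.items():
        if not gold:
            continue  # skip questions with no labelled documents
        top_k = set(retrieved.get(question_id, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    if not scores:
        raise ValueError("ground-truth set has no labelled questions")
    return sum(scores) / len(scores)


def recall_gate_passes(
    retrieved: dict[str, list[str]],
    relevant: dict[str, set[str]],
    k: int,
    threshold: float,
) -> bool:
    """Promotion is blocked when recall falls below the threshold."""
    return recall_at_k(retrieved, relevant, k) >= threshold
```

A question missing from `retrieved` counts as zero recall rather than being skipped, so an indexing run that silently drops part of the corpus lowers the score instead of hiding the gap.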
The templating mechanism
A golden path is not a document; it is an executable template. The CNCF Platforms Whitepaper describes this as offering an “initial project template and documentation, a bundle often described as a golden path.” The Backstage Software Templates specification (API version scaffolder.backstage.io/v1beta3) is one widely adopted mechanism: a YAML Template document with a spec.parameters section (the inputs the user provides: project name, team, data source reference) and a spec.steps section (the actions the scaffolder runs: fetch a skeleton, render files from a template, open a repository, register the new component in the catalog).
scaffolder-template-batch-model.yaml

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: batch-model-pipeline
  title: Batch Model Pipeline
  description: Golden path for scheduling batch inference jobs
spec:
  owner: platform-team
  type: ml-pipeline
  parameters:
    - title: Project details
      required: [modelName, teamSlug, outputStorePrefix]
      properties:
        modelName:
          type: string
          description: Name of the model (must match registry slug)
        teamSlug:
          type: string
          description: Your team identifier for RBAC and labelling
        outputStorePrefix:
          type: string
          description: Object-store prefix for batch output (e.g. s3://data/predictions/)
  steps:
    - id: fetch-skeleton
      name: Fetch pipeline skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          modelName: ${{ parameters.modelName }}
          teamSlug: ${{ parameters.teamSlug }}
          outputStorePrefix: ${{ parameters.outputStorePrefix }}
    - id: publish
      name: Create Git repository
      action: publish:github
      input:
        repoUrl: github.com?owner=${{ parameters.teamSlug }}&repo=${{ parameters.modelName }}-pipeline
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
```

For teams not running an internal developer portal with a scaffolding engine, the same outcome is achievable with Argo CD ApplicationSets using the Cluster generator pattern: a single ApplicationSet template is parameterised from registered cluster Secrets, stamping out one Application per cluster (or per environment) without manual duplication. The Argo CD documentation describes the Cluster generator as the primary mechanism for multi-cluster template instantiation. Kustomize base-plus-overlay provides the per-environment patch layer in both cases: a base directory holds the canonical manifest, and an overlay directory for each environment (dev, staging, production) holds only the values that differ.
applicationset-batch-model.yaml

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: batch-model-pipeline
  namespace: argocd
spec:
  generators:
    - clusters: {} # one Application per registered cluster Secret
  template:
    metadata:
      name: '{{name}}-batch-model'
    spec:
      project: ml-workloads
      source:
        repoURL: https://git.example.com/platform/batch-model-base
        targetRevision: HEAD
        path: overlays/{{metadata.labels.env}}
      destination:
        server: '{{server}}'
        namespace: ml-inference
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

The choice of mechanism (portal scaffold, ApplicationSet, or Kustomize overlay) depends on what the platform already operates. The key discipline is the same in all cases: the template is the canonical source of truth for the path. A team that modifies the generated skeleton is drifting from the path; the platform team’s tooling detects that drift via GitOps sync-status checks and surfaces it to both teams.
Governance gates wired into the path
A golden path is valuable because it enforces good defaults automatically. The gates that matter most for ML workloads sit at three points:
1. Registry promotion gate. Before a model artefact moves from staging to production in the model registry, it must pass automated checks: minimum eval score, model card completeness, and (for regulated industries) an explicit reviewer signoff. The model registry's webhook or event integration triggers the CI gate; the gate's pass/fail result is written back to the registry as a metadata annotation. This makes the gate auditable: any downstream system can query whether a given model version passed all gates.
2. Deployment PR gate. When an ML engineer opens a PR against the deployment repository, the CI pipeline runs the model-card check, eval-score threshold, and latency regression test (Path 2) or recall-on-ground-truth check (Path 3). This gate runs in the CI system, not in the cluster, so it fails fast, before the GitOps controller ever sees the manifest.
3. Runtime rollout gate. After deployment, the progressive-delivery controller observes live metrics. For serving models (Path 2), this means request latency and error rate from the serving layer's metrics endpoint, plus any model-quality signal the application emits. For batch models (Path 1), this means the output-validation step in the pipeline itself. The rollout gate is the safety net for the cases the CI gate did not catch: distribution shift detected only under real traffic, or latency regression that appears only at production request volumes.
| Gate | Where it runs | What it checks | Applies to |
|---|---|---|---|
| Registry promotion | CI (registry webhook) | Eval score, model card, reviewer signoff | All paths |
| Deployment PR | CI (PR pipeline) | Model card, eval threshold, latency regression / recall | Path 2, Path 3 |
| Runtime rollout | Cluster (progressive delivery) | Latency, error rate, prediction quality, output validation | All paths |
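Writing gate results back as registry annotations is what makes them queryable. The sketch below uses an in-memory `ModelVersion` object as a stand-in for a real model registry; the annotation key layout (`gates/<gate>/status`) and the gate identifiers are hypothetical conventions, not any registry's API.

```python
from dataclasses import dataclass, field

# Hypothetical gate identifiers, one per row of the table above.
REQUIRED_GATES = ("registry-promotion", "deployment-pr", "runtime-rollout")


@dataclass
class ModelVersion:
    name: str
    version: str
    annotations: dict[str, str] = field(default_factory=dict)


def record_gate_result(mv: ModelVersion, gate: str, passed: bool, run_url: str) -> None:
    """Write a gate's outcome, plus a link to the run that produced it."""
    mv.annotations[f"gates/{gate}/status"] = "pass" if passed else "fail"
    mv.annotations[f"gates/{gate}/run"] = run_url


def passed_all_gates(mv: ModelVersion, required: tuple[str, ...] = REQUIRED_GATES) -> bool:
    """A gate with no recorded result counts as not passed."""
    return all(mv.annotations.get(f"gates/{g}/status") == "pass" for g in required)
```

Treating a missing annotation as a failure means a model version that skipped a gate entirely cannot be mistaken for one that passed it.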
The deprecation contract
A golden path that cannot be deprecated safely becomes technical debt. Platform teams that skip the deprecation contract find themselves maintaining old path versions indefinitely — because consumers are stuck on them, because no migration tooling was provided, because the notice window was too short. The pattern for a trustworthy deprecation contract has four steps:
- Announce with a defined notice window. Consumers of the path get a notice period — typically measured in weeks to months — before the old path is removed. No standard mandates a specific number; the appropriate window depends on the consumer's release cadence and the complexity of the migration.
- Provide a migration script or automated PR. The platform team does not announce a deprecation and leave consumers to figure out the migration themselves. The scaffolding system opens automated PRs against consumer repositories: replacing old template references with the new version, updating dependency pins, adjusting CI configuration. Backstage's Software Templates and scaffolder actions (for example, a pull-request publishing action such as publish:github:pull-request) can serve as building blocks for this pattern.
- Track adoption. The platform team maintains an inventory of which repositories are on which path version — sourced from the IDP catalog or from Git metadata. Deprecation is not complete until every consumer has migrated or has been deliberately granted an extension.
- Remove on schedule. The old path version is removed at the end of the notice window. Exceptions are tracked explicitly and have an expiry date. An exception that has no expiry date is a permanent fork — the condition that the deprecation contract exists to prevent.
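The adoption-tracking and exception rules above can be expressed as a small check the platform team runs before removing a path version. This is a sketch under assumptions: the `Consumer` record, its fields, and the function names are hypothetical, and in practice the inventory would be sourced from the IDP catalog or Git metadata rather than built by hand.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Consumer:
    repo: str
    path_version: str
    # None means no extension was granted. Exceptions must always carry a date.
    exception_expires: Optional[date] = None


def removal_blockers(consumers: list[Consumer], deprecated_version: str, today: date) -> list[str]:
    """Repos still on the deprecated version without a live, dated exception."""
    blockers = []
    for c in consumers:
        if c.path_version != deprecated_version:
            continue  # already migrated
        has_live_exception = c.exception_expires is not None and c.exception_expires >= today
        if not has_live_exception:
            blockers.append(c.repo)
    return blockers


def adoption_rate(consumers: list[Consumer], deprecated_version: str) -> float:
    """Fraction of consumers that have moved off the deprecated version."""
    if not consumers:
        return 1.0
    migrated = sum(1 for c in consumers if c.path_version != deprecated_version)
    return migrated / len(consumers)
```

An expired exception shows up as a blocker again, which keeps extensions from turning into permanent forks without anyone noticing.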
The off-ramp and when to take it
Golden paths address the majority of workloads, not all of them. Teams encounter off-ramps when their requirements exceed the path’s design envelope:
- Path 1 off-ramps include multi-node distributed training jobs, non-standard output destinations, and pipeline dependencies on systems the platform does not yet integrate with.
- Path 2 off-ramps include streaming inference (event-triggered prediction), multi-model ensembles, and custom pre/post-processing pipelines that do not fit the serving runtime's transformer abstraction.
- Path 3 off-ramps include hybrid search (keyword plus semantic), custom re-ranking pipelines, and multi-turn agent loops with tool use — which extend beyond simple RAG into agentic infrastructure.
The discipline at the off-ramp matters more than the path itself. When a team hits an off-ramp, the platform team has three options: extend the path (add the capability to the template), document the divergence pattern (add it to an extension catalogue), or accept the team building independently (with explicit acknowledgement that they own the maintenance). Which option applies depends on how many teams share the need. A one-team requirement is a candidate for independent build; a requirement shared by three or more teams is a candidate for path extension.
- 1 team needs it: accept independent build. Explicit acknowledgement of maintenance ownership.
- 2 teams need it: document divergence pattern. Add to extension catalogue; watch for a third team.
- 3+ teams need it: extend the path. Add the capability to the template itself.
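The team-count rule is simple enough to encode directly, which helps when off-ramp requests arrive through a ticket queue. The sketch below assumes requests are recorded as (team, requirement) pairs; the enum and function names are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class OffRampAction(Enum):
    INDEPENDENT_BUILD = "accept independent build"
    DOCUMENT_PATTERN = "document divergence pattern"
    EXTEND_PATH = "extend the path"


def off_ramp_action(teams_needing_it: int) -> OffRampAction:
    """Map the number of teams sharing a requirement to the platform response."""
    if teams_needing_it < 1:
        raise ValueError("at least one team must need the capability")
    if teams_needing_it == 1:
        return OffRampAction.INDEPENDENT_BUILD
    if teams_needing_it == 2:
        return OffRampAction.DOCUMENT_PATTERN
    return OffRampAction.EXTEND_PATH


def triage(requests: list[tuple[str, str]]) -> dict[str, OffRampAction]:
    """Group (team, requirement) pairs; repeat requests from one team count once."""
    teams_by_requirement: dict[str, set[str]] = defaultdict(set)
    for team, requirement in requests:
        teams_by_requirement[requirement].add(team)
    return {req: off_ramp_action(len(teams)) for req, teams in teams_by_requirement.items()}
```

Counting distinct teams rather than raw requests prevents one vocal team from pushing a niche requirement into the shared template.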
Connecting the three paths to the broader platform
The three paths are built on top of platform capabilities described elsewhere in this series. The toolchain that makes path scaffolding possible — experiment trackers, model registries, serving runtimes, vector stores — is covered in the composable AI toolchain article. The GitOps machinery that makes the deployment step in Paths 1 and 2 work — the controller, the manifest conventions, the sync policies — is covered in the CI/CD and GitOps article. A golden path is not a platform feature in isolation — it is the orchestrated composition of several platform capabilities into an end-to-end workflow a product team can actually use.
The discoverability of the paths is equally important. A golden path that is not surfaced in the internal developer portal is a golden path that most teams will not find. The IDP catalog — whether Backstage-based or another portal — should surface the available templates, the version each team is on, and the status of any active deprecations. Discoverability is not a UX concern; it is a platform adoption concern.
References
- [1] Spotify Engineering. “How We Use Golden Paths to Solve Fragmentation in Our Software Ecosystem.” 2020. engineering.atspotify.com
- [2] Skelton, M. & Pais, M. Team Topologies: Organizing Business and Technology Teams for Fast Flow. IT Revolution Press, 2019. teamtopologies.com
- [3] CNCF TAG App Delivery. “Platforms Whitepaper.” 2023. tag-app-delivery.cncf.io
- [4] Backstage.io. “Writing Software Templates” (scaffolder.backstage.io/v1beta3). Backstage documentation. backstage.io/docs
- [5] Argo CD project. “ApplicationSet Cluster Generator.” Argo CD documentation. argo-cd.readthedocs.io