The six roles on an AI platform — what each does and what each is fluent in
A practical breakdown of the six roles on an AI platform team: job-to-be-done, skill bar, day-one tools, and the anti-patterns that surface when a role is mis-hired.
Articles covering artificial intelligence, machine learning, model serving, MLOps, and modern AI infrastructure.
31 articles in this category
A precise definition of the AI Platform team as a platform-as-a-product function, its three ownership layers, its accountability boundary, and how it hands off to the MLOps engineer.
Model cards, lineage chains, and audit trails mapped to the four questions every regulator asks: what is it, where did it come from, who approved it, and is what you're serving what you signed?
DRA reached GA in Kubernetes v1.34. Here is why it exists and what the resource.k8s.io API looks like.
How distributed ML training maps onto Kubernetes with PyTorchJob, MPIJob, and RayJob, and why gang scheduling is a correctness requirement.
Why BLEU and ROUGE fail LLM systems, how LLM-as-judge works, and how to build a deterministic CI gate from probabilistic scores.
A vendor-neutral mapping of EU AI Act obligations and the NIST AI RMF to concrete platform controls, with a risk-tier decision tree and the four operational practices every AI platform team needs.
HAMi virtualises GPU memory and compute at the CUDA-call layer — hard memory caps, soft compute throttling, no MIG-capable silicon required.
Five levels from ad-hoc to self-improving, the evidence pattern at each, and the single highest-leverage move that advances you to the next.
Four serving runtime families, the predictor/transformer/explainer pipeline pattern, routing-layer trade-offs, and the autoscaling signals that hold up under real inference traffic.
Six open problems no production AI platform has solved in 2026: eval confidence intervals, agent runtime standards, GPU provenance, regulator-readable lineage, DRA migration, and multi-cluster quota.
The AI gateway as a platform primitive: key management, spend enforcement, semantic caching, routing/failover, guardrails, and OTel-GenAI observability.
Why a model registry is non-negotiable at platform scale, the three registry patterns, four lifecycle states with explicit gates, and curation policy as code.
What classical MLOps disciplines carry into LLM systems, what is genuinely new, and the three wrongly-assumed-new capabilities that waste re-investment.
Prompt templates and tool definitions determine LLM system behaviour as surely as model weights. Here is the versioning, registry, and rollback architecture that keeps them under control.
How to select MIG profiles for A100, H100, and B200 GPUs, avoid the fragmentation trap that catches every team, and observe per-partition metrics with DCGM.
How to wire metrics, traces, and SLOs across LLM calls, tool invocations, and agent turns using the OpenTelemetry GenAI semantic conventions.
How Kueue, Volcano, and the NVIDIA GPU Operator compose into a three-layer stack — and where each layer fails when something goes wrong.
How to decompose RAG failures into three distinct modes, measure faithfulness and relevance independently, and grade agent trajectories with OTel GenAI conventions.
Batch and online inference have opposite optimisation functions. This article covers resource shape, latency metrics, autoscaling signals, and how to size GPU capacity for a target p95.
MLOps defined — not as a tools list but as a discipline. Four reference framings, three points of real disagreement, and a vocabulary the rest of this series builds on.
A defensible decision tree routing you to time-slicing, MPS, MIG, HAMi, or no sharing — based on GPU SKU, isolation requirements, and workload mix.
Training, fine-tuning, batch inference, online inference, and agent
Four adaptation modes — full fine-tune, LoRA, QLoRA, RLHF/DPO — mapped to GPU memory, data shape, wall-clock, and the RAG-vs-fine-tune decision rule every ML team needs.
Walk the eight-stage ML lifecycle from problem framing to model retirement, and learn what changes at each stage when you move from a notebook to a production system.
The frontier of LLM research: reasoning models, test-time compute, state-space models, multimodality, and the architectural bets shaping the next generation of AI systems.
From the original Transformer to today's frontier models
Why attention replaced recurrence: the problem RNNs couldn't solve, the Attention Is All You Need breakthrough, and the training objective behind every modern LLM — with interactive visualizations.
Master the internals of modern AI with 23 interactive visualizations covering attention, KV-cache, tokenization, MoE, speculative decoding, and production optimizations.
When Antigravity says 'Indexing,' it's building a local RAG pipeline. Explore the semantic code graph architecture with this interactive visualization.
Master the art of running multiple Claude Code instances in parallel using git worktrees, tmux, and automation. An interactive guide to AI-powered parallel development.