AI & Machine Learning

Articles covering artificial intelligence, machine learning, model serving, MLOps, and modern AI infrastructure.

31 articles in this category

AI Platform Engineering & MLOps | AI & Machine Learning

The six roles on an AI platform — what each does and what each is fluent in

A practical breakdown of the six roles on an AI platform team: job-to-be-done, skill bar, day-one tools, and the anti-patterns that surface when a role is mis-hired.

asleekgeek

12 min read

AI & Machine Learning | DevOps

What an AI Platform Team Actually Owns — and What It Does Not

A precise definition of the AI Platform team as a platform-as-a-product function, its three ownership layers, its accountability boundary, and how it hands off to the MLOps engineer.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

Governance and lineage — answering the four questions a regulator will ask

Model cards, lineage chains, and audit trails mapped to the four questions every regulator asks: what is it, where did it come from, who approved it, and is what you're serving what you signed?

asleekgeek

10 min read

AI Platform Engineering & MLOps | Cloud Native

Dynamic Resource Allocation: What Changes When Devices Become First-Class

DRA reached GA in Kubernetes v1.34. Here is why it exists and what the resource.k8s.io API looks like.

asleekgeek

9 min read

AI Platform Engineering & MLOps | Cloud Native

Training workloads on Kubernetes — operators, gang scheduling, and checkpointing

How distributed ML training maps onto Kubernetes: PyTorchJob, MPIJob, and RayJob, and why gang scheduling is a correctness requirement.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

Eval as a Test Suite: LLM-as-Judge in CI Without Flaky Merges

Why BLEU and ROUGE fail LLM systems, how LLM-as-judge works, and how to build a deterministic CI gate from probabilistic scores.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

Responsible AI and Governance — Turning the EU AI Act into a Platform Checklist

A vendor-neutral mapping of EU AI Act obligations and the NIST AI RMF to concrete platform controls, with a risk-tier decision tree and the four operational practices every AI platform team needs.

asleekgeek

12 min read

AI Platform Engineering & MLOps | AI & Machine Learning

HAMi — fractional GPU on the GPUs you actually have

HAMi virtualises GPU memory and compute at the CUDA-call layer — hard memory caps, soft compute throttling, no MIG-capable silicon required.

asleekgeek

10 min read

AI Platform Engineering & MLOps | AI & Machine Learning

AI Platform maturity — five levels and the single move that unlocks each

Five levels from ad-hoc to self-improving, the evidence pattern at each, and the single highest-leverage move that advances you to the next.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

Serving patterns for production ML — runtimes, routing, and the autoscaling signals that matter

Four serving runtime families, the predictor/transformer/explainer pipeline pattern, routing-layer trade-offs, and the autoscaling signals that hold up under real inference traffic.

asleekgeek

10 min read

AI Platform Engineering & MLOps | AI & Machine Learning

Where this goes next — the open problems on a 2026 AI platform

Six open problems no production AI platform has solved in 2026: eval confidence intervals, agent runtime standards, GPU provenance, regulator-readable lineage, DRA migration, and multi-cluster quota.

asleekgeek

9 min read

AI Platform Engineering & MLOps | Cloud Native

The AI gateway — what it is, when you need one, and where it sits

The AI gateway as a platform primitive: key management, spend enforcement, semantic caching, routing/failover, guardrails, and OTel-GenAI observability.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

Model Registry as the Spine — Repository Patterns, Lifecycle States, and Curation Policy

Why a model registry is non-negotiable at platform scale, the three registry patterns, four lifecycle states with explicit gates, and curation policy as code.

asleekgeek

10 min read

AI Platform Engineering & MLOps | AI & Machine Learning

MLOps vs LLMOps — the 60 / 40 seam

What classical MLOps disciplines carry into LLM systems, what is genuinely new, and the three wrongly-assumed-new capabilities that waste re-investment.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

Prompts and tools are code — versioning, registries, and the rollback story

Prompt templates and tool definitions determine LLM system behaviour as surely as model weights. Here is the versioning, registry, and rollback architecture that keeps them under control.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

MIG configuration — profile selection, fragmentation, and the partition contract

How to select MIG profiles for A100, H100, and B200 GPUs, avoid the fragmentation trap that catches every team, and observe per-partition metrics with DCGM.

asleekgeek

9 min read

AI Platform Engineering & MLOps | Cloud Native

Observability for GenAI: Prometheus, Grafana, Tempo, and the OpenTelemetry GenAI Conventions

How to wire metrics, traces, and SLOs across LLM calls, tool invocations, and agent turns using the OpenTelemetry GenAI semantic conventions.

asleekgeek

9 min read

AI Platform Engineering & MLOps | Cloud Native

The GPU scheduling stack: queue admission, gang scheduling, and hardware abstraction in three layers

How Kueue, Volcano, and the NVIDIA GPU Operator compose into a three-layer stack — and where each layer fails when something goes wrong.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

RAG and agent observability: scoring retrieval and generation separately, then the trajectory

How to decompose RAG failures into three distinct modes, measure faithfulness and relevance independently, and grade agent trajectories with OTel GenAI conventions.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

Inference workloads — batch vs online, latency budgets, and where the serving runtime sits

Batch and online inference have opposite optimisation functions. This article covers resource shape, latency metrics, autoscaling signals, and how to size GPU capacity for a target p95.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

What is MLOps in 2026? A defensible working definition

MLOps defined — not as a tools list but as a discipline. Four reference framings, three points of real disagreement, and a vocabulary the rest of this series builds on.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

Picking a GPU-sharing mechanism — a decision tree

A defensible decision tree routing you to time-slicing, MPS, MIG, HAMi, or no sharing — based on GPU SKU, isolation requirements, and workload mix.

asleekgeek

9 min read

AI Platform Engineering & MLOps | AI & Machine Learning

The five canonical AI/ML workload shapes

Training, fine-tuning, batch inference, online inference, and agent workloads.

asleekgeek

10 min read

AI Platform Engineering & MLOps | AI & Machine Learning

Fine-tuning, LoRA, QLoRA, RLHF/DPO — picking the adaptation that fits your budget

Four adaptation modes — full fine-tune, LoRA, QLoRA, RLHF/DPO — mapped to GPU memory, data shape, wall-clock, and the RAG-vs-fine-tune decision rule every ML team needs.

asleekgeek

14 min read

AI Platform Engineering & MLOps | AI & Machine Learning

The ML lifecycle, end to end, in production

Walk the eight-stage ML lifecycle from problem framing to model retirement, and learn what changes at each stage when you move from a notebook to a production system.

asleekgeek

10 min read

AI & Machine Learning

LLM Deep Dive Part III: What's Emerging

The frontier of LLM research: reasoning models, test-time compute, state-space models, multimodality, and the architectural bets shaping the next generation of AI systems.

asleekgeek

30 min read

AI & Machine Learning

LLM Deep Dive Part II: From Transformer to Modern LLMs

From the original Transformer to today's frontier models

asleekgeek

40 min read

AI & Machine Learning

LLM Deep Dive Part I: Setting the Stage

Why attention replaced recurrence: the problem RNNs couldn't solve, the 'Attention Is All You Need' breakthrough, and the training objective behind every modern LLM, with interactive visualizations.

asleekgeek

25 min read

AI & Machine Learning | Artificial Intelligence

Transformer & LLM Architecture Deep Dive

Master the internals of modern AI with 23 interactive visualizations covering attention, KV-cache, tokenization, MoE, speculative decoding, and production optimizations.

asleekgeek

AI & Machine Learning

Understanding Antigravity's Indexing Process: A Deep Dive

When Antigravity says 'Indexing,' it's building a local RAG pipeline. Explore the semantic code graph architecture with this interactive visualization.

asleekgeek

12 min read

AI & Machine Learning | Artificial Intelligence

The Parallel Claude Workflow: Running Multiple AI Agents Like It's 1995 Multi-Tasking

Master the art of running multiple Claude Code instances in parallel using git worktrees, tmux, and automation. An interactive guide to AI-powered parallel development.

asleekgeek

20 min read
