AI development pipeline - Jacob P Evans

The goal: file a GitHub Issue, grab coffee, come back to a PR that’s been implemented, tested, and reviewed by multiple AI models — just waiting for a thumbs up. Not fully there yet, but close enough to be dangerous. Humans decide what to build. AI agents handle the how. Automation runs the boring parts. A human gives the final sign-off. Claude, Gemini, Copilot, and local MLX models each do what they’re best at — the right model for the right job instead of throwing everything at one.

The pipeline

Green nodes are human, coral are AI, ink are automation. The role is the node colour; the chain is the timeline — no zigzag because the diagram never asks the arrows to leave a column.

The model-routing philosophy

Every model has a sweet spot:

Claude — best at multi-file refactors, deep reasoning, agentic loops. The default for non-trivial implementation work.
Gemini — great for second opinions, code review, broad context understanding.
GitHub Copilot — fastest for line-level completions inside the editor. Cheap for high-volume routine work.
Local MLX — Apple Silicon native inference for typo fixes, quick edits, and “I don’t want to burn cloud tokens on this” tasks.

The routing is opinionated, not magic: clear rules in ~/CLAUDE.md and AGENTS.md say which model to use when.

Local AI gateway (Bifrost)

Bifrost is the OpenAI-compatible HTTP gateway that sits between every AI tool on the workstation and whichever provider eventually answers the call. It exposes http://localhost:30080/v1/chat/completions and fans out to OpenAI, Gemini, OpenRouter, and the local MLX server based on the task class.

Never hardcode model identifiers in committed config. Models change weekly; identifiers rot. Tools resolve task classes (Research, Coding, Review, Pre-commit) to a current model at call time via listmodels.
Local MLX models carry an mlx-local/ prefix when called through Bifrost (it expects provider/model format). Calling the vllm-mlx server directly on port 11434 uses the bare HuggingFace model ID — no prefix.
Cloud models go in unprefixed. Bifrost handles routing; do not add a provider prefix yourself.

When localOnlyMode is enabled (or the --local flag is passed), every task routes to the MLX inference server on port 11434 and no cloud API calls happen. Verify the LaunchAgent is running before invoking: launchctl list | grep vllm-mlx.

PAL MCP

PAL MCP is the multi-model coordination layer on top of Bifrost. Two tools matter:

Tool	Purpose
`clink`	Multi-model parallel calls — research and exploration
`consensus`	Multi-model agreement — critical decisions

Every other PAL tool has a native Claude Code equivalent; for single-model calls, use Bifrost directly.

Priority order

Anthropic official — Claude Code plugins, skills, patterns
Bifrost AI gateway — multi-provider routing at localhost:30080
PAL MCP — only for clink and consensus
Personal or custom — only when no alternative exists

Repos that power this pipeline

ai-assistant-instructions

Universal AI configuration layer — rules, permissions, workflows, agents.

claude-code-plugins

Commands, skills, hooks, agents for Claude Code.

nix-ai

Nix package and config layer for every AI coding tool.

claude-code-routines

Scheduled remote-agent routines on Claude.ai.

ai-workflows

Reusable GitHub Copilot agentic workflows.

raycast-smart-issue

Raycast extension for AI-drafted GitHub issues via local MLX.

See AI Development · Overview for what each one does in detail. For the actual triggers, callers, and skill mechanics that drive this pipeline end-to-end, see Automation · Overview.

Observability layer

Every interaction in this pipeline emits OpenTelemetry. The data flows through Cribl Edge, Cribl Stream, and Splunk; purpose-built Splunk apps visualize it. If an AI agent touched code, there’s a trace.

Documentation Index

​The pipeline

​The model-routing philosophy

​Local AI gateway (Bifrost)

​PAL MCP

​Priority order

​Repos that power this pipeline