General7 min read

Free Claude Code Routing: NIM, OpenRouter, DeepSeek, and Local Models

routingopenrouternimdeepseeklocalcostmodels

You Do Not Have to Use Anthropic's API Directly

By default, Claude Code talks to Anthropic's servers. That is the supported path and the one most developers use. But the CLI is built on the Anthropic API, which means you can point it at other endpoints that speak the same protocol — third-party providers who have negotiated API access and offer it at different price points or quota structures.

This is not officially documented by Anthropic, but it works and has become a standard way to run Claude Code on a tighter budget or with infrastructure you already have.

Setting Up Routing via Environment Variables

The routing mechanism is straightforward: set ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY to redirect where Claude Code sends its requests. The CLI does not validate that the requests are going to Anthropic's servers — it just sends them wherever you tell it.

export ANTHROPIC_BASE_URL=https://your-endpoint.com/v1
export ANTHROPIC_API_KEY=your-api-key

Claude Code picks these up on startup. Change them and restart the session for the new routing to take effect.

NVIDIA NIM: Claude on NVIDIA Infrastructure

NVIDIA's NIM (AI Foundation Endpoints) service offers Claude models through NVIDIA's own cloud infrastructure. If you have NVIDIA GPU instances or NVIDIA Cloud credits, NIM can be cost-effective — you use infrastructure you already pay for rather than adding a separate Anthropic bill.

export ANTHROPIC_BASE_URL=https://ai.api.nvidia.com/v1
export ANTHROPIC_API_KEY=your-nvidia-api-key

NIM is an enterprise product. Pricing is negotiated based on your NVIDIA relationship, so it is not a public marketplace where you can just sign up and go. If you are already a NVIDIA customer, talk to your account team about NIM access. Throughput tends to be good since it runs on NVIDIA's own infrastructure.

OpenRouter: The Aggregation Layer

OpenRouter is the easiest alternative to set up because it is a public API with a free tier. It aggregates access to multiple model providers behind a single API interface — you get Claude models alongside models from Google, Meta, Mistral, and others, all through one key and one endpoint.

export ANTHROPIC_BASE_URL=https://openrouter.ai/api/v1
export ANTHROPIC_API_KEY=your-openrouter-key

The advantage for Claude Code specifically: one config change, and all your Claude Code traffic routes through OpenRouter instead of Anthropic. OpenRouter's free tier gives you a limited monthly allocation — enough to experiment and test the setup. Beyond that, you pay per token at rates that are typically competitive with direct API costs.

The trade-off is latency. Every request goes through OpenRouter's routing layer before reaching the actual model provider. For short interactions, the added delay is minor. For long sessions with many back-and-forth exchanges, it adds up.

DeepSeek: The Low-Cost Option

DeepSeek offers a Claude-compatible API endpoint at the lowest price point among the alternatives. The setup is the same pattern:

export ANTHROPIC_BASE_URL=https://api.deepseek.com/v1
export ANTHROPIC_API_KEY=your-deepseek-key

DeepSeek works for basic Claude Code usage — tool calls, multi-step reasoning, code generation. The limitation is feature parity. The Anthropic API has evolved over time with new capabilities. Third-party implementations like DeepSeek's tend to lag, particularly around newer tool use features and model-specific optimizations. Check what features you rely on before committing to DeepSeek as a primary provider.

Local Models: llama.cpp and Ollama

For fully offline use, you can route Claude Code to a local model running via llama.cpp or Ollama. This requires a machine with enough RAM or VRAM to run the model, and the setup is more involved.

export ANTHROPIC_BASE_URL=http://localhost:11434/v1
export ANTHROPIC_API_KEY=ollama

For Claude Code specifically, this is a compromise. The CLI is designed around Claude models — their context lengths, their tool use behavior, their reasoning style. A local model like Llama will not replicate that experience. You can get basic tool use working with an Ollama server that exposes a Claude-compatible endpoint, but the quality of reasoning and code generation will differ from what Claude produces.

The use case for local routing is experimentation and development environments where network access is restricted. Do not expect the same output quality.

What You Trade Off

Routing through third parties or local infrastructure means you are no longer on Anthropic's managed infrastructure. The trade-offs are consistent: rate limiting becomes the provider's responsibility rather than Anthropic's, latency may increase, and feature availability depends on the provider's implementation quality. For personal projects and experimentation, these trade-offs are usually fine. For production team deployments where reliability is critical, the direct Anthropic API is the safer choice.

Get Started with Claude Code

Start building with Claude Code today. Free to download, powerful enough for production.