AI & Automation · 10 min read

OSS Light Instance of Claude Code: A Privacy-Focused AI Coding Assistant

Alex Ozhima | January 28, 2026

Why This Matters

Claude Code is an excellent AI coding assistant, but there are scenarios where you need more control:

  • Data Privacy: Your code and data never leave your trusted infrastructure
  • Cost Control: Use subscription-based or self-hosted models instead of pay-per-token APIs
  • Air-Gapped Environments: Work in secure environments without external API access
  • Model Flexibility: Use any OpenAI-compatible model (Qwen, Llama, Mistral, etc.)

Architecture Overview

The Chain:

  1. OpenCode - an open-source Claude Code alternative with a TUI; it sends requests in the Anthropic Messages API format
  2. LiteLLM - a universal proxy that translates between API formats and routes requests to any provider
  3. Model Provider - any OpenAI-compatible endpoint (Ollama, vLLM, OpenRouter, etc.)

Setup Guide

1. Install OpenCode

curl -fsSL https://opencode.ai/install | bash

2. Install LiteLLM (via uv)

# No global install needed - run directly with uv
uv run --python 3.12 --with 'litellm[proxy]' litellm --version

3. Configuration Files

.env - Environment variables (keep private, add to .gitignore):

ANTHROPIC_API_KEY=dummy
PROVIDER_API_KEY=your-actual-api-key-here

litellm_config.yaml - LiteLLM proxy configuration:

model_list:
  - model_name: claude-3-7-sonnet-latest
    litellm_params:
      model: Qwen/Qwen3-Coder-480B-A35B-Instruct
      api_base: https://your-provider-api.example.com/v1
      api_key: os.environ/PROVIDER_API_KEY
      custom_llm_provider: openai

  - model_name: claude-3-7-sonnet-20250219
    litellm_params:
      model: Qwen/Qwen3-Coder-480B-A35B-Instruct
      api_base: https://your-provider-api.example.com/v1
      api_key: os.environ/PROVIDER_API_KEY
      custom_llm_provider: openai

litellm_settings:
  set_verbose: false

general_settings:
  enable_jwt_auth: false

router_settings:
  enable_anthropic_messages: true  # Critical! Routes /v1/messages through model_list
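The `api_key: os.environ/PROVIDER_API_KEY` lines in the config are LiteLLM's convention for reading secrets from the environment at startup, so the key never appears in the YAML file. A minimal sketch of that lookup (illustrative only, not LiteLLM's actual code):

```python
import os

def resolve_secret(value: str) -> str:
    """Resolve a LiteLLM-style 'os.environ/VAR' reference to the env value."""
    prefix = "os.environ/"
    if value.startswith(prefix):
        return os.environ[value[len(prefix):]]
    return value  # plain literals pass through unchanged

os.environ["PROVIDER_API_KEY"] = "sk-example"  # stand-in for the .env value
print(resolve_secret("os.environ/PROVIDER_API_KEY"))  # the env value
print(resolve_secret("dummy"))                        # the literal
```

This is why the `.env` file must be sourced before starting the proxy: the reference fails if the variable is missing.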

opencode.json - OpenCode configuration:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "anthropic": {
      "options": {
        "baseURL": "http://localhost:4000/v1"
      }
    }
  },
  "model": "claude-3-7-sonnet-20250219",
  "tui": {
    "theme": "opencode"
  }
}
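The two config files must agree: the `model` in opencode.json has to match a `model_name` entry in litellm_config.yaml, and `baseURL` must point at the local proxy rather than api.anthropic.com. A quick stdlib-only sanity check (a sketch; the field paths mirror the files above):

```python
import json

opencode_json = """
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "anthropic": {"options": {"baseURL": "http://localhost:4000/v1"}}
  },
  "model": "claude-3-7-sonnet-20250219"
}
"""

cfg = json.loads(opencode_json)
litellm_model_names = {"claude-3-7-sonnet-latest", "claude-3-7-sonnet-20250219"}

# The selected model must exist in LiteLLM's model_list, and the base URL
# must route through the proxy, not Anthropic's API.
assert cfg["model"] in litellm_model_names
assert cfg["provider"]["anthropic"]["options"]["baseURL"] == "http://localhost:4000/v1"
print("opencode.json is consistent with the proxy config")
```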

4. Running the Stack

Terminal 1 - Start LiteLLM Proxy:

set -a && source .env && set +a
uv run --python 3.12 --with 'litellm[proxy]' litellm --config litellm_config.yaml --port 4000

Terminal 2 - Start OpenCode:

set -a && source .env && set +a
opencode
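Before launching OpenCode, it can help to confirm the proxy is actually answering Anthropic-format requests. A stdlib-only smoke test (a sketch; it assumes the proxy from Terminal 1 is listening on port 4000 and simply reports failure otherwise):

```python
import json
import urllib.error
import urllib.request

# Minimal Anthropic Messages API payload, matching what OpenCode will send.
payload = {
    "model": "claude-3-7-sonnet-20250219",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "Say hello"}],
}
req = urllib.request.Request(
    "http://localhost:4000/v1/messages",
    data=json.dumps(payload).encode(),
    headers={"content-type": "application/json", "x-api-key": "dummy"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print("proxy reachable, status", resp.status)
except (urllib.error.URLError, OSError) as exc:
    print("proxy not reachable:", exc)
```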

Why LiteLLM is Necessary

You might wonder: "Why not connect OpenCode directly to the OpenAI-compatible API?"

The problem is API format incompatibility:

Client                          API Format           Endpoint
OpenCode (Anthropic provider)   Anthropic Messages   /v1/messages
OpenCode (OpenAI provider)      OpenAI Responses     /v1/responses
Most custom providers           OpenAI Chat          /v1/chat/completions

OpenCode v1.1.39 uses:

  • Anthropic provider: the /v1/messages format
  • OpenAI provider: the /v1/responses format (a newer API, not yet widely supported)

Most OpenAI-compatible providers (Ollama, vLLM, etc.) only support /v1/chat/completions.

LiteLLM bridges this gap by:

  1. Accepting Anthropic Messages API requests on /v1/messages
  2. Translating them to OpenAI Chat Completions format
  3. Forwarding to your chosen provider

The key setting is router_settings.enable_anthropic_messages: true.
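The translation step can be sketched in simplified form. This is illustrative only, not LiteLLM's actual implementation: the real proxy also handles content blocks, tool calls, streaming, and model remapping via `model_list`.

```python
def anthropic_to_openai(req: dict) -> dict:
    """Sketch of the request translation LiteLLM performs for /v1/messages."""
    messages = []
    if "system" in req:
        # Anthropic puts the system prompt in a top-level field;
        # OpenAI Chat Completions expects it as the first message.
        messages.append({"role": "system", "content": req["system"]})
    for m in req["messages"]:
        messages.append({"role": m["role"], "content": m["content"]})
    return {
        "model": req["model"],  # the proxy then remaps this via model_list
        "messages": messages,
        "max_tokens": req.get("max_tokens", 1024),
    }

anthropic_req = {
    "model": "claude-3-7-sonnet-20250219",
    "system": "You are a coding assistant.",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Refactor this function."}],
}
print(anthropic_to_openai(anthropic_req))
```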

Capabilities

OpenCode provides a full coding assistant experience:

  • File reading and editing
  • Code generation and refactoring
  • Bug fixing and debugging
  • Test writing
  • Web search (native support)

Our Model Choice: Qwen

We use Qwen/Qwen3-Coder-480B-A35B-Instruct:

  • 480B parameters (35B active with MoE)
  • Optimized for code generation
  • Excellent instruction following
  • Available via various cloud providers or self-hosted

Success Story: Data Analytics

We successfully used this setup for logistics data analysis:

Task: Analyze 220K+ rows of fuel consumption data across a vehicle fleet

Results:

  • Parsed complex Excel files with Cyrillic column names
  • Generated Python analysis scripts
  • Created visualizations and HTML reports
  • Identified vehicles with abnormal fuel consumption patterns
  • Compared vehicles in repair vs. on-line status

The AI assistant handled the entire workflow: understanding data structure, writing pandas code, generating charts, and producing actionable insights.

Subscription-Based Model APIs

A significant advantage of this approach: unlimited usage pricing.

While major providers (OpenAI, Anthropic) charge per-token, some alternatives offer subscription models:

Provider            Model               Pricing Model
Cloud providers     Qwen, Llama, etc.   Subscription tiers
Local Ollama        Any GGUF            Hardware cost only
Self-hosted vLLM    Any HF model        Infrastructure cost

For heavy coding assistant usage—where each request consumes 10–20K input tokens plus 100–500 output tokens, and you might make thousands of requests per day—subscription models can be significantly more economical.
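Putting rough numbers on that claim (a back-of-envelope sketch; the request volume and range midpoints below are assumptions, and the per-token prices are Claude-Sonnet-like figures, not quotes):

```python
# Assumed workload: midpoints of the ranges above, at heavy usage.
requests_per_day = 1000
input_tokens, output_tokens = 15_000, 300
in_price, out_price = 3.0, 15.0  # USD per 1M tokens

daily_in = requests_per_day * input_tokens / 1e6   # millions of input tokens/day
daily_out = requests_per_day * output_tokens / 1e6
per_token_monthly = 30 * (daily_in * in_price + daily_out * out_price)
print(f"per-token billing: ~${per_token_monthly:,.0f}/month")
```

Against that, a flat subscription tier in the low hundreds of dollars per month is an order of magnitude cheaper at this volume.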

Self-Hosting Cost Estimates

For organizations wanting full data control, here's what self-hosting requires.

Throughput Requirements

Typical coding assistant workload:

  • Input: 10,000 - 20,000 tokens per request (code context + conversation)
  • Output: 100 - 500 tokens per response
  • Target: Interactive response times (< 5 seconds for short responses)
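The latency target translates directly into a minimum decode speed (a rough sketch that ignores prefill, i.e. prompt processing):

```python
# What "< 5 seconds" implies for generation speed alone.
max_output_tokens = 500
latency_budget_s = 5.0
min_decode_speed = max_output_tokens / latency_budget_s
print(f"need >= {min_decode_speed:.0f} output tokens/sec per active user")
```

In practice, prefill over a 10-20K-token prompt also consumes part of the budget, so real deployments need headroom in both prefill throughput and decode speed.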

Hardware Requirements by Model Size

Model                   Parameters   Active Params   Min VRAM   Recommended GPUs
Qwen3-Coder-8B          8B           8B              16 GB      1x RTX 4090
Qwen3-Coder-32B         32B          32B             64 GB      2x RTX 4090 / 1x A100
Qwen3-235B-A22B         235B         22B             48 GB      1x A100-80GB
Qwen3-Coder-480B-A35B   480B         35B             80 GB      1x H100 / 2x A100

For the MoE models (A22B, A35B), only the active parameters are computed per token; the VRAM figures assume the remaining expert weights are quantized or offloaded, since keeping all parameters on-GPU at full precision requires considerably more memory.
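A common rule of thumb for sizing GPU memory (illustrative, not a benchmark: bytes per parameter depend on quantization, and the overhead factor for KV cache and activations is an assumption):

```python
def vram_gb(params_b: float, bytes_per_param: float = 2.0, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights at fp16 (2 bytes/param) plus ~20% overhead."""
    return params_b * bytes_per_param * overhead

for name, params in [("32B dense", 32), ("35B active of a 480B MoE", 35)]:
    print(f"{name}: ~{vram_gb(params):.0f} GB at fp16")
```

Dropping to 8-bit or 4-bit quantization roughly halves or quarters the weight footprint, which is how the table's minimum-VRAM figures become reachable.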

Cloud GPU Rental Costs (2025-2026 estimates)

GPU         VRAM     On-Demand (USD/hr)   Reserved (USD/month)
RTX 4090    24 GB    0.40-0.80            200-400
A100 40GB   40 GB    1.50-2.50            800-1200
A100 80GB   80 GB    2.50-4.00            1200-2000
H100        80 GB    3.50-5.00            2000-3500
H200        141 GB   5.00-8.00            3000-5000

Cost Comparison Example

Scenario: Development team of 5, ~10M tokens/day total

Option                      Monthly Cost (USD)   Notes
Claude API                  ~1100                $3/M input + $15/M output
OpenAI GPT-4o               ~900                 $2.50/M input + $10/M output
Subscription API            ~100-300             Unlimited tier
Self-hosted A100            ~1500                + setup/maintenance
Local RTX 4090 (8B model)   ~50                  Electricity only, lower quality
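The per-token figures above can be reproduced with simple arithmetic. The 95/5 input/output split below is an assumption (coding assistants are heavily input-dominated), not a measured ratio:

```python
# Scenario from the table: team of 5, ~10M tokens/day total.
tokens_per_day = 10e6
input_share = 0.95               # assumed input-heavy split
in_price, out_price = 3.0, 15.0  # USD per 1M tokens (Claude-like pricing)

monthly = 30 * (
    tokens_per_day * input_share / 1e6 * in_price
    + tokens_per_day * (1 - input_share) / 1e6 * out_price
)
print(f"~${monthly:,.0f}/month")
```

Shifting the split toward output moves the estimate up quickly, since output tokens cost five times as much per million.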

Recommendations

  1. Small teams / Light usage: Use subscription APIs
  2. Medium teams / Privacy focus: Rent cloud GPUs with vLLM
  3. Enterprise / Air-gapped: On-premise H100/H200 cluster
  4. Experimentation: Local Ollama with smaller models (8B-32B)

Software Stack for Self-Hosting

# vLLM server (recommended for production)
pip install vllm
vllm serve Qwen/Qwen3-Coder-32B-Instruct --port 8000

# Or Ollama (easier setup)
ollama run qwen3:32b

Then point LiteLLM to your local endpoint:

model_list:
  - model_name: claude-3-7-sonnet-20250219
    litellm_params:
      model: Qwen/Qwen3-Coder-32B-Instruct
      api_base: http://localhost:8000/v1  # vLLM
      # api_base: http://localhost:11434/v1  # Ollama
      api_key: dummy
      custom_llm_provider: openai

Conclusion

With OpenCode + LiteLLM, you can build a privacy-focused, cost-effective alternative to Claude Code that:

  • Keeps your data within trusted infrastructure
  • Works with any OpenAI-compatible model provider
  • Provides full coding assistant capabilities
  • Scales from local Ollama to enterprise GPU clusters

The key insight is using LiteLLM's enable_anthropic_messages setting to bridge OpenCode's Anthropic API format to standard OpenAI-compatible providers.

Alex Ozhima


Founder & CEO at Katlextech
