Bleeding Llama bug exposes Ollama process memory

An out-of-bounds read in Ollama’s GGUF model loader (CVE-2026-7482, CVSS 9.1) lets unauthenticated attackers upload crafted GGUF files and leak process memory through the /api/create and /api/push endpoints.

Security researchers have identified a critical vulnerability in the open-source Ollama framework that can expose an instance’s process memory. Tracked as CVE-2026-7482 and nicknamed “Bleeding Llama,” the flaw scores 9.1 on the CVSS scale and affects Ollama releases before 0.17.1. Researchers estimate the issue likely impacts hundreds of thousands of internet-accessible servers running Ollama.

The bug is an out-of-bounds read in the GGUF model loader. When a GGUF file declares tensor offsets and sizes larger than the file’s actual length, the loader reads past the allocated heap buffer during quantization. The problem is tied to a function named WriteTo() that relies on Go’s unsafe package, bypassing the language’s memory-safety checks. An attacker can craft a GGUF with an inflated tensor shape, POST it to an exposed /api/create endpoint to trigger the read, then use /api/push to send the resulting model artifact, which can include leaked heap data, to an external registry.
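
The failure mode, and the bounds check a patched loader would need, can be sketched in Go. The function name and signature below are illustrative, not Ollama’s actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// validateTensorBounds is an illustrative check, not Ollama's code:
// before slicing into the loaded file, verify that the declared tensor
// region actually fits inside it. Without such a check, offset and size
// values taken from an attacker-supplied GGUF header can index past the
// heap buffer, which is the out-of-bounds read described above.
func validateTensorBounds(offset, size, fileLen uint64) error {
	end := offset + size
	if end < offset { // offset+size wrapped around uint64
		return errors.New("tensor bounds overflow")
	}
	if end > fileLen {
		return fmt.Errorf("tensor region [%d, %d) exceeds file length %d", offset, end, fileLen)
	}
	return nil
}

func main() {
	// A header claiming a 1 GiB tensor inside a 4 KiB file must be rejected.
	fmt.Println(validateTensorBounds(128, 1<<30, 4096))
	// A tensor that fits within the file passes.
	fmt.Println(validateTensorBounds(128, 1024, 4096))
}
```

The overflow check matters as much as the length check: with unsigned arithmetic, a huge offset plus a huge size can wrap around to a small number and slip past a naive `end > fileLen` comparison.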

Exposed data can include environment variables, API keys, system prompts, and user conversation content stored in process memory. Researchers described the exploitation sequence as uploading a malicious GGUF file to a reachable server, initiating model creation via /api/create to cause the out-of-bounds read, and pushing the resulting artifact with /api/push to exfiltrate memory contents. Dor Attias, a Cyera security researcher, warned that an attacker could recover API keys, proprietary code, customer contracts, and other sensitive materials from AI inference memory.

Ollama is designed to run locally and is widely deployed. Its REST API ships without built-in authentication, so internet-facing instances are especially at risk. Project maintainers released a fix in version 0.17.1, and users are advised to upgrade affected installations. Operators should also limit network exposure, place instances behind firewalls, and front Ollama servers with an authentication proxy or API gateway. Teams should audit running instances for external access, review environment variables, and rotate secrets that may have been stored in process memory.
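
One way to realize the authentication-proxy recommendation is a small bearer-token reverse proxy in front of the Ollama port (127.0.0.1:11434 by default). The sketch below uses a stand-in upstream and a placeholder token for demonstration; any hardened reverse proxy such as nginx or Caddy serves the same purpose:

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// authorized checks a bearer token in constant time.
func authorized(header, token string) bool {
	want := "Bearer " + token
	return subtle.ConstantTimeCompare([]byte(header), []byte(want)) == 1
}

// newAuthProxy wraps an upstream (in production, the local Ollama server)
// and rejects any request that lacks the expected bearer token.
func newAuthProxy(upstream *url.URL, token string) http.Handler {
	rp := httputil.NewSingleHostReverseProxy(upstream)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		rp.ServeHTTP(w, r)
	})
}

func main() {
	// Demo against a stand-in upstream; in a real deployment the upstream
	// would be http://127.0.0.1:11434 and only the proxy port is exposed.
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	defer upstream.Close()

	u, _ := url.Parse(upstream.URL)
	proxy := httptest.NewServer(newAuthProxy(u, "example-token"))
	defer proxy.Close()

	resp, _ := http.Get(proxy.URL + "/api/tags") // no token: rejected
	fmt.Println(resp.StatusCode)

	req, _ := http.NewRequest("GET", proxy.URL+"/api/tags", nil)
	req.Header.Set("Authorization", "Bearer example-token")
	resp2, _ := http.DefaultClient.Do(req)
	fmt.Println(resp2.StatusCode)
}
```

A proxy like this does not remove the underlying memory-safety bug; it only keeps unauthenticated attackers from reaching /api/create and /api/push, which is why upgrading remains the primary fix.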

Separately, researchers at Striga disclosed two Windows updater vulnerabilities that can be chained to achieve persistent code execution. The issues are CVE-2026-42248, a missing signature verification that allows unsigned update binaries to be installed, and CVE-2026-42249, a path traversal vulnerability that uses unsanitized HTTP response headers to build installer staging paths. A vulnerable Windows client that polls an attacker-controlled update server can receive an executable that is written into the Windows Startup folder and run at each login.

The Windows flaws affect certain Ollama desktop client versions. Researchers advised disabling automatic updates and removing any Ollama shortcut from the Startup folder to prevent silent on-login execution until patches are available. Bartłomiej Dmitruk of Striga noted the chain can produce persistent, silent execution at the level of the user running Ollama; removing a dropped binary from the Startup folder stops that persistence, but the underlying updater flaws remain until patched.

Administrators should verify their Ollama versions and apply vendor patches where available. On affected Windows clients without fixes, they should disable automatic updates and remove startup shortcuts. They should also audit logs and network traffic for unexpected pushes to external registries, and rotate any secrets that might have been exposed.
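
A quick way to triage a fleet is to compare each instance’s reported version against the fixed release. The helper below is a simplified sketch that assumes plain major.minor.patch strings, such as those returned by Ollama’s /api/version endpoint or `ollama -v`:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isVulnerable reports whether an Ollama version string is older than the
// fixed release 0.17.1. This is a simplified comparison for triage
// purposes; it does not handle pre-release suffixes.
func isVulnerable(version string) bool {
	fixed := [3]int{0, 17, 1}
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	for i := 0; i < 3; i++ {
		n := 0
		if i < len(parts) {
			n, _ = strconv.Atoi(parts[i])
		}
		if n != fixed[i] {
			return n < fixed[i]
		}
	}
	return false // exactly 0.17.1
}

func main() {
	for _, v := range []string{"0.16.3", "0.17.0", "0.17.1", "0.18.0"} {
		fmt.Printf("%s vulnerable: %v\n", v, isVulnerable(v))
	}
}
```
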
