Memory issue after latest kernel update

I was using Gemma 2B IT on my Jetson Orin Nano with no issues until a recent update, but I have been getting memory errors ever since.
I tried fiddling with it since I have a bare-metal installation, but couldn't get it working.

Today I tried the Jetson containers and the problem still remains, so I figured it must be the kernel update. Is this a known issue? How do I fix it?

*** Please note that this reply is generated by LLM automatically ***

Based on the information provided, it appears that you are experiencing memory errors on your Jetson Orin Nano device after a recent update. You have tried using a bare metal installation and Jetson containers, but the issue persists.

Firstly, it’s essential to note that the Jetson Orin Nano is a relatively new device, and it’s not uncommon to encounter issues with new hardware and software updates.

Regarding the kernel update, it’s possible that the update has introduced a bug or incompatibility that’s causing the memory errors. However, without more information about the specific error messages you’re seeing, it’s challenging to provide a definitive answer.

That being said, here are a few potential steps you can take to troubleshoot and potentially resolve the issue:

  1. Check the kernel version: Verify that you are running the latest kernel version. You can do this by running the command uname -r in the terminal. If you’re not running the latest version, you can try updating the kernel to see if it resolves the issue.
  2. Check for firmware updates: Ensure that your device’s firmware is up-to-date. You can check for firmware updates using the sudo apt-get update and sudo apt-get upgrade commands.
  3. Disable any recently installed modules: If you’ve recently installed any new modules or drivers, try disabling them to see if they’re causing the issue.
  4. Try a different kernel configuration: If you’re using a custom kernel configuration, try switching to a default configuration to see if it resolves the issue.
  5. Check for memory-related issues: Run the command dmesg | grep -i memory to check for any memory-related error messages. This can help you identify if the issue is related to memory allocation or other memory-related problems. (A combined sketch of steps 1, 2, and 5 is shown after this list.)
  6. Try a different boot mode: If you’re booting in a specific mode (e.g., recovery mode), try booting in a different mode to see if it resolves the issue.
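
For reference, here is a rough sketch of how steps 1, 2, and 5 could be run together in a terminal. The nvidia-l4t-kernel package name is an assumption based on a standard JetPack 6 / L4T installation, so adjust it to whatever your system reports:

# step 1: running kernel and installed L4T kernel packages
uname -r
dpkg -l | grep nvidia-l4t-kernel

# step 2: check whether newer firmware/packages are pending
sudo apt-get update
apt list --upgradable

# step 5: memory-related messages in the kernel log
sudo dmesg | grep -i -E "memory|oom|nvmap"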

In terms of known issues, I couldn’t find any specific information about memory errors related to the Jetson Orin Nano and kernel updates. However, it’s possible that the issue is related to a specific module or driver that’s not compatible with the latest kernel version.

To further troubleshoot the issue, I would recommend checking the following resources:

  • The official NVIDIA Jetson Orin Nano documentation and forums
  • The Linux kernel documentation and mailing lists
  • Online forums and communities dedicated to Linux and embedded systems

If you’re still unable to resolve the issue, you may want to consider reaching out to NVIDIA support or seeking help from a Linux expert.

In comparison to other Jetson devices, the Orin Nano is a relatively new device, and it’s possible that the issue is specific to this device or its configuration. However, the troubleshooting steps I provided above are general and can be applied to other Jetson devices as well.

In terms of the Jetson AGX Orin, it’s a more powerful device than the Orin Nano, and it’s possible that the issue is not related to the device itself but rather to the kernel or software configuration. However, without more information, it’s challenging to provide a definitive answer.

I hope this helps, and please let me know if you have any further questions or if there’s anything else I can help you with.

*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***

So the update that broke it was this:

Start-Date: 2025-10-31  15:58:54
Commandline: apt upgrade -y
Requested-By: frederico (1000)
Upgrade: nvidia-l4t-weston:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), containerd.io:arm64 (1.7.28-0\~ubuntu.22.04\~jammy, 1.7.28-1\~ubuntu.22.04\~jammy), docker-compose-plugin:arm64 (2.39.4-0\~ubuntu.22.04\~jammy, 2.40.3-1\~ubuntu.22.04\~jammy), nvidia-l4t-vulkan-sc-samples:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), bind9-host:arm64 (1:9.18.39-0ubuntu0.22.04.1, 1:9.18.39-0ubuntu0.22.04.2), docker-ce-cli:arm64 (5:28.5.0-1\~ubuntu.22.04\~jammy, 5:28.5.1-1\~ubuntu.22.04\~jammy), nvidia-l4t-vulkan-sc-sdk:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), libctf-nobfd0:arm64 (2.38-4ubuntu2.8, 2.38-4ubuntu2.10), libssh-4:arm64 (0.9.6-2ubuntu0.22.04.4, 0.9.6-2ubuntu0.22.04.5), nvidia-l4t-firmware:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), wpasupplicant:arm64 (2:2.10-6ubuntu2.2, 2:2.10-6ubuntu2.3), nvidia-l4t-kernel-oot-headers:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033), libxml2-dev:arm64 (2.9.13+dfsg-1ubuntu0.9, 2.9.13+dfsg-1ubuntu0.10), nvidia-l4t-oem-config:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-jetson-multimedia-api:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), xserver-xorg-core:arm64 (2:21.1.4-2ubuntu1.7\~22.04.15, 2:21.1.4-2ubuntu1.7\~22.04.16), nvidia-l4t-libwayland-egl1:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-wayland:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), libnvidia-container1:arm64 (1.17.8-1, 1.18.0-1), nvidia-l4t-kernel-oot-modules:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033), nvidia-l4t-kernel:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033), libbinutils:arm64 (2.38-4ubuntu2.8, 2.38-4ubuntu2.10), nvidia-l4t-graphics-demos:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), gir1.2-javascriptcoregtk-4.0:arm64 (2.48.5-0ubuntu0.22.04.1, 2.48.7-0ubuntu0.22.04.2), nvidia-l4t-3d-core:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-nvpmodel:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-cuda-utils:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), gir1.2-webkit2-4.0:arm64 (2.48.5-0ubuntu0.22.04.1, 2.48.7-0ubuntu0.22.04.2), libwbclient0:arm64 (2:4.15.13+dfsg-0ubuntu1.8, 2:4.15.13+dfsg-0ubuntu1.10), libssh-gcrypt-4:arm64 (0.9.6-2ubuntu0.22.04.4, 0.9.6-2ubuntu0.22.04.5), nvidia-l4t-tools:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), libsmbclient:arm64 (2:4.15.13+dfsg-0ubuntu1.8, 2:4.15.13+dfsg-0ubuntu1.10), nvidia-l4t-core:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), libxml2:arm64 (2.9.13+dfsg-1ubuntu0.9, 2.9.13+dfsg-1ubuntu0.10), libcudnn9-samples:arm64 (9.13.1.26-1, 9.14.0.64-1), nvidia-l4t-kernel-dtbs:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033), nvidia-l4t-optee:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-cuda:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), poppler-utils:arm64 (22.02.0-2ubuntu0.10, 22.02.0-2ubuntu0.11), nvidia-l4t-dla-compiler:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-nvml:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), docker-ce:arm64 (5:28.5.0-1\~ubuntu.22.04\~jammy, 5:28.5.1-1\~ubuntu.22.04\~jammy), nvidia-l4t-openwfd:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-nvfancontrol:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), xserver-xorg-legacy:arm64 (2:21.1.4-2ubuntu1.7\~22.04.15, 2:21.1.4-2ubuntu1.7\~22.04.16), nvidia-l4t-libwayland-client0:arm64 (36.4.4-20250616085344, 
36.4.7-20250918154033), nvidia-l4t-nvsci:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-init:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-gbm:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-jetsonpower-gui-tools:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-vulkan-sc:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-display-kernel:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033), libpoppler-glib8:arm64 (22.02.0-2ubuntu0.10, 22.02.0-2ubuntu0.11), nvidia-l4t-configs:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-pva:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-multimedia:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), xserver-common:arm64 (2:21.1.4-2ubuntu1.7\~22.04.15, 2:21.1.4-2ubuntu1.7\~22.04.16), xwayland:arm64 (2:22.1.1-1ubuntu0.19, 2:22.1.1-1ubuntu0.20), nvidia-l4t-multimedia-utils:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-x11:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), docker-ce-rootless-extras:arm64 (5:28.5.0-1\~ubuntu.22.04\~jammy, 5:28.5.1-1\~ubuntu.22.04\~jammy), nvidia-l4t-apt-source:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), libctf0:arm64 (2.38-4ubuntu2.8, 2.38-4ubuntu2.10), libpoppler118:arm64 (22.02.0-2ubuntu0.10, 22.02.0-2ubuntu0.11), nvidia-l4t-kernel-headers:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033), libcudnn9-cuda-12:arm64 (9.13.1.26-1, 9.14.0.64-1), nvidia-l4t-bootloader:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), snapd:arm64 (2.68.5+ubuntu22.04.1, 2.71+ubuntu22.04), nvidia-l4t-gstreamer:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), binutils-aarch64-linux-gnu:arm64 (2.38-4ubuntu2.8, 2.38-4ubuntu2.10), libjavascriptcoregtk-4.0-18:arm64 (2.48.5-0ubuntu0.22.04.1, 2.48.7-0ubuntu0.22.04.2), distro-info-data:arm64 (0.52ubuntu0.9, 0.52ubuntu0.11), linux-firmware:arm64 (20220329.git681281e4-0ubuntu3.39, 20220329.git681281e4-0ubuntu3.40), binutils-common:arm64 (2.38-4ubuntu2.8, 2.38-4ubuntu2.10), libnvidia-container-tools:arm64 (1.17.8-1, 1.18.0-1), nvidia-l4t-camera:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), xserver-xephyr:arm64 (2:21.1.4-2ubuntu1.7\~22.04.15, 2:21.1.4-2ubuntu1.7\~22.04.16), nvidia-l4t-libwayland-cursor0:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), bind9-libs:arm64 (1:9.18.39-0ubuntu0.22.04.1, 1:9.18.39-0ubuntu0.22.04.2), nvidia-l4t-nvpmodel-gui-tools:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), samba-libs:arm64 (2:4.15.13+dfsg-0ubuntu1.8, 2:4.15.13+dfsg-0ubuntu1.10), libcudnn9-dev-cuda-12:arm64 (9.13.1.26-1, 9.14.0.64-1), nvidia-l4t-initrd:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-container-toolkit:arm64 (1.17.8-1, 1.18.0-1), nvidia-container-toolkit-base:arm64 (1.17.8-1, 1.18.0-1), libwebkit2gtk-4.0-37:arm64 (2.48.5-0ubuntu0.22.04.1, 2.48.7-0ubuntu0.22.04.2), libcudnn9-static-cuda-12:arm64 (9.13.1.26-1, 9.14.0.64-1), nvidia-l4t-libwayland-server0:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-xusb-firmware:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), nvidia-l4t-jetson-io:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), libcudnn9-headers-cuda-12:arm64 (9.13.1.26-1, 9.14.0.64-1), binutils:arm64 (2.38-4ubuntu2.8, 2.38-4ubuntu2.10), nvidia-l4t-vulkan-sc-dev:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033), linux-libc-dev:arm64 (5.15.0-157.167, 5.15.0-161.171)
End-Date: 2025-10-31  16:03:41

$ uname -r

5.15.148-tegra

$ free

               total        used        free      shared  buff/cache   available
Mem:         7802808      374480     6835356       26864      592972     7191996
Swap:        3901392           0     3901392

Is it possible to boot the previous kernel on the Jetson, the way GRUB lets you pick one? That would be a simple way to check.
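
For reference, the apt-level rollback I have in mind would look something like the sketch below. I have not tested it; it assumes the previous 36.4.4 packages are still available in the NVIDIA apt repo or the local apt cache, and it only covers the kernel-related packages from the log above (the bootloader and initrd packages were updated too):

# downgrade the L4T kernel packages to the versions from before the upgrade
sudo apt-get install --allow-downgrades \
  nvidia-l4t-kernel=5.15.148-tegra-36.4.4-20250616085344 \
  nvidia-l4t-kernel-dtbs=5.15.148-tegra-36.4.4-20250616085344 \
  nvidia-l4t-kernel-oot-modules=5.15.148-tegra-36.4.4-20250616085344 \
  nvidia-l4t-display-kernel=5.15.148-tegra-36.4.4-20250616085344

# keep apt from re-upgrading them until a fix is out
sudo apt-mark hold nvidia-l4t-kernel nvidia-l4t-kernel-dtbs nvidia-l4t-kernel-oot-modules nvidia-l4t-display-kernel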

Hi,

This is a known issue, and our internal team is working on the problem.
Will share more information if we get any updates.

You can also find more details in the topic below:

Thanks.

Hi @AastaLLL ,

So I tried a bunch of stuff before deciding to give up on this. What I noticed (watching htop at the same time) is that a bunch of cache gets used, and once it fills up the RAM it starts to complain.

I think it is the same problem as in that thread; this is what I get:


(gemma-env) frederico@frederico-jetson:~ $ local-gemma --model 2b --device cuda

Loading model with the following characteristics:
- Model name: google/gemma-2-2b-it
- Assistant model name: None
- Device: cuda
- Default data type: torch.bfloat16
- Optimization preset: exact
- Generation arguments: {'do_sample': True, 'temperature': 0.7}
- Base prompt: None

Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [00:09<00:00,  4.85s/it]
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
Traceback (most recent call last):
  File "/home/frederico/github/gemma-env/bin/local-gemma", line 8, in <module>
    sys.exit(main())
  File "/home/frederico/github/gemma-env/lib/python3.10/site-packages/local_gemma/cli.py", line 210, in main
    model = LocalGemma2ForCausalLM.from_pretrained(
  File "/home/frederico/github/gemma-env/lib/python3.10/site-packages/local_gemma/modeling_local_gemma_2.py", line 189, in from_pretrained
    model.to(device, dtype=preset_kwargs["torch_dtype"])
  File "/home/frederico/github/gemma-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3156, in to
    return super().to(*args, **kwargs)
  File "/home/frederico/github/gemma-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1318, in to
    return self._apply(convert)
  File "/home/frederico/github/gemma-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 897, in _apply
    module._apply(fn)
  File "/home/frederico/github/gemma-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 897, in _apply
    module._apply(fn)
  File "/home/frederico/github/gemma-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 897, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/frederico/github/gemma-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 924, in _apply
    param_applied = fn(param)
  File "/home/frederico/github/gemma-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1304, in convert
    return t.to(
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch.

I tried:

echo 3 | sudo tee /proc/sys/vm/drop_caches

while the model was loading, to see if it would get rid of the cache, but I guess the cache belongs to the same process, so it doesn't help.

I also tried loading with this:

export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:1500"

and other settings like:

export CUDA_VISIBLE_DEVICES=0
export PYTORCH_CUDA_ALLOC_CONF="backend:cudaMalloc"

(I think you actually need to write backend:cudaMallocAsync, but it doesn't look like that changes anything either.)
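
For the record, the corrected invocation I meant would be something like the sketch below (assuming the Jetson PyTorch build supports the async allocator backend; expandable_segments is just another documented allocator option that might be worth a try):

export CUDA_VISIBLE_DEVICES=0
export PYTORCH_CUDA_ALLOC_CONF="backend:cudaMallocAsync"
# or, alternatively, the expandable-segments option of the default allocator:
# export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"
local-gemma --model 2b --device cuda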

From what I understand, the kernel is trying to allocate the memory twice. It's as if it doesn't know that the GPU and CPU share the same physical memory, so it shouldn't need to load the weights into one and then copy them to the other. But I know nothing about this :D
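
I guess a cleaner way than htop to watch this would be something like the sketch below (tegrastats ships with L4T, so no assumption there; its interval is in milliseconds):

# terminal 1: overall RAM usage once per second (CPU and GPU share this DRAM on Jetson)
watch -n 1 free -m

# terminal 2: NVIDIA's own memory/utilisation report
sudo tegrastats --interval 1000

# terminal 3: load the model and watch both readings climb
local-gemma --model 2b --device cuda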

Anyway, I guess the only solution right now is to reflash the SD card with the old kernel, which would make me lose all the other things I installed successfully. So I will wait until you guys have a better solution.

Thank you!

–
frederico.

Hi,

Thanks for sharing the testing.
The issue is related to a recent security fix; you can find more information in the topic shared above.

Thanks.