Accelerated Memory Scan like Intel's, using the CUDA framework


I recently came across an article describing how Intel had come up with Accelerated Memory Scan. Is there any way I can do this using CUDA? If so, how? Please guide.

This approach only works efficiently in scenarios where the CPU and GPU share the same host memory.

Among NVIDIA products, this is the case in some embedded solutions such as the Tegra/Jetson product line.

I am not sure whether there is any low-end NVIDIA product where the graphics chip interfaces with host memory instead of dedicated VRAM.


So there isn’t any way to do it on a GeForce graphics card?

Discrete NVIDIA GPUs (with dedicated VRAM) can access host memory only if your own process sets it up through CUDA: either as managed memory in a unified address space via cudaMallocManaged(), or as page-locked memory explicitly mapped into the GPU address space, e.g. via cudaHostAlloc() with the cudaHostAllocMapped flag.
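To make the second option concrete, here is a minimal sketch of zero-copy access, assuming a device that supports mapped host memory. The names countByte and scanOwnBufferZeroCopy are made up for illustration, and the "scan" is just a byte count. Note that the kernel only ever sees a buffer the calling process allocated itself.

```cuda
#include <cstring>
#include <cuda_runtime.h>

// Counts bytes equal to `needle`; each thread walks the buffer in a grid-stride loop.
__global__ void countByte(const unsigned char* data, size_t n,
                          unsigned char needle, unsigned int* count)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        if (data[i] == needle)
            atomicAdd(count, 1u);
}

// Hypothetical helper: copies `src` into page-locked mapped host memory and lets
// the GPU read it in place over PCIe. Returns the match count, or -1 on a CUDA error.
long scanOwnBufferZeroCopy(const unsigned char* src, size_t n, unsigned char needle)
{
    unsigned char* hostBuf  = nullptr;  // page-locked host allocation
    unsigned char* devView  = nullptr;  // device-side pointer to the same memory
    unsigned int*  devCount = nullptr;  // counter kept in ordinary device memory
    unsigned int   count    = 0;
    long           result   = -1;

    if (n == 0) return 0;
    if (cudaHostAlloc((void**)&hostBuf, n, cudaHostAllocMapped) != cudaSuccess)
        return -1;
    std::memcpy(hostBuf, src, n);

    if (cudaHostGetDevicePointer((void**)&devView, hostBuf, 0) == cudaSuccess &&
        cudaMalloc((void**)&devCount, sizeof(unsigned int)) == cudaSuccess &&
        cudaMemset(devCount, 0, sizeof(unsigned int)) == cudaSuccess) {
        countByte<<<64, 256>>>(devView, n, needle, devCount);
        // The blocking cudaMemcpy also waits for the kernel to finish.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(&count, devCount, sizeof(unsigned int),
                       cudaMemcpyDeviceToHost) == cudaSuccess)
            result = (long)count;
    }
    cudaFree(devCount);      // cudaFree(nullptr) is a no-op
    cudaFreeHost(hostBuf);
    return result;
}
```

Calling scanOwnBufferZeroCopy(buf, len, 0x90) from a host program compiled with nvcc would count the 0x90 bytes in buf. The point is that buf has to belong to the calling process; there is no CUDA call in this sketch that could point devView at another process's pages.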

Neither method lets you map the complete physical host memory to the GPU, and in particular not memory that is already in use by other processes or the kernel. So unfortunately the GPU can’t be your threat detector, because that would require working around the safety mechanism of per-process memory isolation.

Also, this type of GPU access to host memory incurs the overhead of a PCIe bus transfer, which is severely bandwidth-limited and likely much slower than the CPU accessing that memory directly.
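If you want to see what the PCIe link delivers on your own system, a rough sketch like the following times a single copy from pinned host memory to the device using CUDA events. The helper name measureHostToDeviceGBps is hypothetical, and one un-warmed transfer gives only a ballpark figure, not a proper benchmark.

```cuda
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: times one pinned-host-to-device copy of `bytes` bytes
// and returns the observed bandwidth in GB/s, or -1.0 on any CUDA error.
double measureHostToDeviceGBps(size_t bytes)
{
    void*       hostBuf = nullptr;
    void*       devBuf  = nullptr;
    cudaEvent_t start   = nullptr;
    cudaEvent_t stop    = nullptr;
    double      gbps    = -1.0;

    if (bytes == 0) return -1.0;
    if (cudaMallocHost(&hostBuf, bytes) != cudaSuccess) return -1.0;
    std::memset(hostBuf, 0, bytes);  // touch the pages so the copy reads real data

    if (cudaMalloc(&devBuf, bytes) == cudaSuccess &&
        cudaEventCreate(&start) == cudaSuccess &&
        cudaEventCreate(&stop) == cudaSuccess) {
        cudaEventRecord(start);
        cudaError_t err = cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;  // elapsed time between the two events, in milliseconds
        if (err == cudaSuccess &&
            cudaEventElapsedTime(&ms, start, stop) == cudaSuccess && ms > 0.0f)
            gbps = ((double)bytes / 1.0e9) / ((double)ms / 1.0e3);
    }
    if (start) cudaEventDestroy(start);
    if (stop)  cudaEventDestroy(stop);
    cudaFree(devBuf);        // cudaFree(nullptr) is a no-op
    cudaFreeHost(hostBuf);
    return gbps;
}
```

Calling it with a buffer of a few hundred megabytes and comparing the result against a CPU memory bandwidth benchmark on the same machine should show the gap described above.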