Using thrust with memory-intensive structures

Hello! I have a vector of large structures (approximately 20 KB each) and I want to use thrust operations on it, but I get a "Formal parameter space overflowed" error. Is it possible to use thrust with structures of this size, and are there any techniques to make it work? Sadly, I can't make the structures any smaller.

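Unless thrust has specific limits of its own, you should be OK with up to 32 KB (32,764 bytes) of kernel parameters, as long as you are using at least CUDA 12.1 and a Volta or newer GPU (compute capability 7.0+). NVIDIA's blog post on large kernel parameter support in CUDA 12.1 describes this.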
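A minimal compile-time sketch of that size arithmetic in plain C++, assuming the two limits above (4,096 bytes normally, 32,764 bytes with CUDA 12.1+ on sm_70 or newer). `BigValue`, `fits_as_kernel_param` and the 16-byte wrapper overhead are hypothetical. The overhead matters because thrust passes your element inside a functor together with pointers and sizes: the error quoted further down asks for 20,496 bytes for a 20,480-byte struct, and other algorithms may add different amounts.

```cpp
#include <cassert>
#include <cstddef>

// Kernel parameter limits (assumed from NVIDIA's CUDA 12.1 announcement):
// 4,096 bytes normally, 32,764 bytes with CUDA 12.1+ compiled for sm_70 or newer.
constexpr std::size_t kLegacyParamLimit = 4096;
constexpr std::size_t kLargeParamLimit = 32764;

// Hypothetical check: does a by-value payload of `bytes`, plus whatever the
// launching library wraps around it (`wrapper_bytes`), fit in parameter space?
constexpr bool fits_as_kernel_param(std::size_t bytes, std::size_t wrapper_bytes,
                                    bool large_params) {
    return bytes + wrapper_bytes <=
           (large_params ? kLargeParamLimit : kLegacyParamLimit);
}

// Hypothetical 20 KB value type, similar in size to the one in the question.
struct BigValue {
    int data[1024 * 20 / sizeof(int)];
};

static_assert(sizeof(BigValue) == 20480, "expected a 20 KB struct");
// Too large for the 4 KB limit...
static_assert(!fits_as_kernel_param(sizeof(BigValue), 16, false), "4 KB limit");
// ...but fits once large kernel parameters are available.
static_assert(fits_as_kernel_param(sizeof(BigValue), 16, true), "32 KB limit");
```

Because the checks are `static_assert`s, an oversized struct fails the build on the host side before nvcc ever reports the overflow.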


Can thrust access constant memory?

I believe it can, but I need to modify the data that I pass to thrust, and constant memory is read-only from device code.

Based on the blog: if you are running on a cc7.0 or higher GPU with CUDA 12.1 or newer, then compile for that architecture:

# cat t331.cu
#include <thrust/device_vector.h>


struct stuff{
  int d[1024*20/4];
};

typedef struct stuff s;
int main(){

  thrust::device_vector<s> a(256);
}
# nvcc -o t331 t331.cu
/usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/agent_launcher.h(66): Error: Formal parameter space overflowed (20496 bytes required, max 4096 bytes allowed) in function _ZN6thrust8cuda_cub4core13_kernel_agentINS0_14__parallel_for16ParallelForAgentINS0_20__uninitialized_fill7functorINS_10device_ptrI5stuffEES8_EEmEESA_mEEvT0_T1_

# nvcc -o t331 t331.cu -arch=sm_89
#

(Both compiles above used CUDA 12.2.)

For all the usual AoS vs. SoA reasons, this layout may have performance ramifications, at least compared to something like thrust::device_vector<int> or a similar structure-of-arrays layout.
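A host-side C++ sketch of the two layouts, assuming a hypothetical record with a small `key` field and roughly 20 KB of payload. In the AoS form consecutive keys are 20 KB apart; in the SoA form they are adjacent, which on a GPU is what lets warps read them with coalesced loads. On the device the SoA version would presumably map to a `thrust::device_vector<int>` for keys plus a separate `thrust::device_vector<float>` for payloads; that mapping is an assumption, not code from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical payload length chosen so one AoS record is 4 + 5119*4 = 20480 bytes.
constexpr std::size_t kPayloadLen = 5119;

// Array of structures (AoS): key and payload interleaved per element.
struct RecordAoS {
    int key;
    float payload[kPayloadLen];
};

// Structure of arrays (SoA): each field in its own contiguous array.
struct RecordsSoA {
    std::vector<int> keys;
    std::vector<float> payload;  // element i's payload starts at i * kPayloadLen

    explicit RecordsSoA(std::size_t n) : keys(n), payload(n * kPayloadLen) {}
    float* payload_of(std::size_t i) { return payload.data() + i * kPayloadLen; }
};

// Reading only keys from AoS strides sizeof(RecordAoS) bytes between keys...
int sum_keys(const std::vector<RecordAoS>& v) {
    int s = 0;
    for (const auto& r : v) s += r.key;
    return s;
}

// ...while in SoA the keys are adjacent in memory.
int sum_keys(const RecordsSoA& r) {
    int s = 0;
    for (int k : r.keys) s += k;
    return s;
}
```

Both functions compute the same result; the difference is how much unrelated memory sits between the values an algorithm actually needs.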


So, I just need to split the data that thrust needs to access from the other data?

Can I make thrust pass objects by reference and not by value? That way there would be no need to copy a large amount of data.

I don’t know what that means.

CUDA in general does not work very well with pass-by-reference to a kernel, unless the data reference is to managed memory or pinned memory.

I don’t think you’ll be able to overcome the “constructor” issue (…uninitialized_fill_functor…) I pointed out with trivial adjustments to your code, but without a clear, compilable example of what you are doing, I couldn’t really offer advice.
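One common workaround (not something confirmed in this thread) is to keep the large structs where they are and hand the algorithm only indices, with a small functor that holds a pointer to the array. The sketch below is plain C++ with `std::for_each`; in thrust the analogous call would be `thrust::for_each` over a `thrust::counting_iterator`, with the pointer obtained from `thrust::raw_pointer_cast(vec.data())` and `operator()` marked `__host__ __device__`. That pointer refers to device memory owned by the vector, which is different from a C++ reference to host data. `BigValue` and `ScaleFirstField` are hypothetical. Note this only helps the algorithm calls you write; as mentioned above, constructing a `device_vector` of large structs still launches a fill kernel that carries the value by copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical ~20 KB value type.
struct BigValue {
    int data[5120];
};

// Functor that stores only a pointer to the array of large values, so its own
// size stays small regardless of sizeof(BigValue). Each call modifies one
// element in place, selected by index.
struct ScaleFirstField {
    BigValue* values;  // with thrust: thrust::raw_pointer_cast(vec.data())
    int factor;
    void operator()(std::size_t i) const { values[i].data[0] *= factor; }
};

// Apply the functor to every element by iterating over indices, not values.
void scale_all(std::vector<BigValue>& v, int factor) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::for_each(idx.begin(), idx.end(), ScaleFirstField{v.data(), factor});
}
```

The design choice is that the algorithm only ever copies `std::size_t` indices and one small functor; the 20 KB structs are never passed by value.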


Basically, what I need is a hashmap with integer keys and large structs as values, so I think your SoA suggestion is a good fit.
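A sketch of that SoA map idea in plain C++, assuming the values can live in their own array and never move. It swaps a real hash table for a sorted key array plus binary search, a common GPU-friendly alternative rather than a hash table. Sorting and lookups only touch `int` keys and `std::size_t` slot indices. On the device, `thrust::sort_by_key` and `thrust::lower_bound` would presumably play the roles of `std::stable_sort` and `std::lower_bound`; `BigValue`, `KeyIndex`, `build_index`, `find_slot` and `lookup` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical large value type stored in its own array and never moved.
struct BigValue {
    int data[5120];
};

// Keys sorted ascending, each paired with the slot of its value.
struct KeyIndex {
    std::vector<int> keys;
    std::vector<std::size_t> slot;  // slot[j] = position of keys[j]'s value
};

// Sort value positions by key; only ints and indices are moved.
KeyIndex build_index(const std::vector<int>& unsorted_keys) {
    std::vector<std::size_t> order(unsorted_keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return unsorted_keys[a] < unsorted_keys[b];
    });
    KeyIndex idx;
    idx.slot = order;
    idx.keys.reserve(order.size());
    for (std::size_t p : order) idx.keys.push_back(unsorted_keys[p]);
    return idx;
}

// Binary search for `key`; returns its value slot, or -1 if absent.
long long find_slot(const KeyIndex& idx, int key) {
    auto it = std::lower_bound(idx.keys.begin(), idx.keys.end(), key);
    if (it == idx.keys.end() || *it != key) return -1;
    return static_cast<long long>(idx.slot[it - idx.keys.begin()]);
}

// Resolve a key to its large value in place, or nullptr if not found.
BigValue* lookup(std::vector<BigValue>& values, const KeyIndex& idx, int key) {
    long long s = find_slot(idx, key);
    return s < 0 ? nullptr : &values[static_cast<std::size_t>(s)];
}
```

Usage: build the index once from the keys, keep `values` in insertion order, and resolve keys with `lookup(values, idx, key)`. Inserting new keys means rebuilding (or merging into) the sorted index, which is the main trade-off against a true hash table.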