pcl::gpu::PtrStepSz: stored in registers or not?

Hi guys, my program has many kernels that use this struct to access GPU memory: http://docs.pointclouds.org/trunk/structpcl_1_1gpu_1_1_ptr_step_sz.html
As I understand it, the compiler will not put this type of struct into registers, but I could not find any documentation or reference that discusses this.
Do you agree? If so, where is this struct stored (L1, L2, or global memory?) if I don’t copy it into shared memory?

I don’t know that to be true. It may depend on how you use the struct and other factors. Data that requires run-time computed indexing generally won’t be placed in registers, since registers cannot be addressed indirectly.
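To illustrate the run-time indexing point, here is a minimal sketch with two hypothetical kernels, `staticIndex` and `dynamicIndex`, plus a hypothetical host helper `runIndexingDemo` (none of these come from PCL). In the first kernel every array index is a compile-time constant after unrolling, so the array can usually stay in registers. In the second the index is read from memory at run time, so the array typically ends up in local memory. What actually happens depends on the compiler version and target architecture.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
  do {                                                                    \
    cudaError_t err_ = (call);                                            \
    if (err_ != cudaSuccess) {                                            \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
      std::exit(EXIT_FAILURE);                                            \
    }                                                                     \
  } while (0)

// After unrolling, every index into `vals` is a compile-time constant,
// so the compiler can usually keep the whole array in registers.
__global__ void staticIndex(const int* in, int* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  int vals[4];
#pragma unroll
  for (int k = 0; k < 4; ++k) vals[k] = in[i] + k;
  out[i] = vals[0] + vals[3];
}

// Here the index is read from memory at run time. Registers cannot be
// indexed indirectly, so `vals` typically ends up in local memory.
__global__ void dynamicIndex(const int* in, const int* sel, int* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  int vals[4];
#pragma unroll
  for (int k = 0; k < 4; ++k) vals[k] = in[i] + k;
  out[i] = vals[sel[i] & 3];
}

// Hypothetical host helper: runs both kernels on n elements and checks the results.
bool runIndexingDemo(int n) {
  assert(n > 0);
  size_t bytes = n * sizeof(int);
  int *in = nullptr, *sel = nullptr, *out1 = nullptr, *out2 = nullptr;
  CUDA_CHECK(cudaMallocManaged(&in, bytes));
  CUDA_CHECK(cudaMallocManaged(&sel, bytes));
  CUDA_CHECK(cudaMallocManaged(&out1, bytes));
  CUDA_CHECK(cudaMallocManaged(&out2, bytes));
  for (int i = 0; i < n; ++i) { in[i] = i; sel[i] = i % 4; }
  int threads = 128, blocks = (n + threads - 1) / threads;
  staticIndex<<<blocks, threads>>>(in, out1, n);
  dynamicIndex<<<blocks, threads>>>(in, sel, out2, n);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaDeviceSynchronize());  // required before the host reads managed memory
  bool ok = true;
  for (int i = 0; i < n; ++i)
    ok = ok && out1[i] == 2 * i + 3 && out2[i] == i + i % 4;
  CUDA_CHECK(cudaFree(in));
  CUDA_CHECK(cudaFree(sel));
  CUDA_CHECK(cudaFree(out1));
  CUDA_CHECK(cudaFree(out2));
  return ok;
}
```

Compiling with `nvcc -Xptxas -v` prints per-kernel register and stack-frame usage, which is one way to check where the compiler actually put the array. The exact numbers will vary by architecture and toolkit version.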

It depends on how you use it. The two most commonly used spaces are global memory and local memory. If you start with an array of these structs on the host and then transfer it to the device, the structs will initially be in global memory.
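A minimal sketch of the host-to-device case, assuming a hypothetical `Params` struct and a hypothetical host helper `runParamsDemo` (neither is a PCL type). The array of structs is built on the host, copied with `cudaMemcpy` into memory obtained from `cudaMalloc`, and from then on the kernel reads it from global memory.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
  do {                                                                    \
    cudaError_t err_ = (call);                                            \
    if (err_ != cudaSuccess) {                                            \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
      std::exit(EXIT_FAILURE);                                            \
    }                                                                     \
  } while (0)

// Hypothetical per-element struct, standing in for whatever the application stores.
struct Params {
  float scale;
  float offset;
};

// `params` points into global memory: it was allocated with cudaMalloc
// and filled from the host with cudaMemcpy.
__global__ void applyParams(const Params* params, const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = params[i].scale * in[i] + params[i].offset;
}

// Hypothetical host helper: copies a host array of structs to the device,
// applies it element-wise and returns the results.
std::vector<float> runParamsDemo(const std::vector<Params>& hostParams,
                                 const std::vector<float>& hostIn) {
  assert(hostParams.size() == hostIn.size() && !hostIn.empty());
  int n = static_cast<int>(hostIn.size());
  Params* dParams = nullptr;
  float *dIn = nullptr, *dOut = nullptr;
  CUDA_CHECK(cudaMalloc(&dParams, n * sizeof(Params)));
  CUDA_CHECK(cudaMalloc(&dIn, n * sizeof(float)));
  CUDA_CHECK(cudaMalloc(&dOut, n * sizeof(float)));
  CUDA_CHECK(cudaMemcpy(dParams, hostParams.data(), n * sizeof(Params), cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(dIn, hostIn.data(), n * sizeof(float), cudaMemcpyHostToDevice));
  int threads = 128;
  applyParams<<<(n + threads - 1) / threads, threads>>>(dParams, dIn, dOut, n);
  CUDA_CHECK(cudaGetLastError());
  std::vector<float> result(n);
  CUDA_CHECK(cudaMemcpy(result.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(dParams));
  CUDA_CHECK(cudaFree(dIn));
  CUDA_CHECK(cudaFree(dOut));
  return result;
}
```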

If you define a variable in your device code:

struct PCL_struct A;

Then A will (logically) live in local memory. Registers are not a separate logical space; they are part of the logical local space. Register usage is a compiler optimization and/or a necessary artifact of the GPU load/store architecture. L1 and L2 are physical caches, not logical spaces.
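A sketch of the device-side case, using a hypothetical `PitchedView` struct that only mimics the general shape of a pitched 2D view (pointer, row step in bytes, size). It is not PCL’s actual `PtrStepSz` definition, and `fillRows` and `runPitchedDemo` are illustrative names. The struct variable is declared inside the kernel, so logically it is in local space, but because its members are only accessed by name the compiler will normally keep them in registers. The floats it points to were allocated with `cudaMallocPitch` and remain in global memory.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
  do {                                                                    \
    cudaError_t err_ = (call);                                            \
    if (err_ != cudaSuccess) {                                            \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
      std::exit(EXIT_FAILURE);                                            \
    }                                                                     \
  } while (0)

// Hypothetical stand-in for a pitched 2D view: NOT PCL's PtrStepSz definition.
struct PitchedView {
  float* data;  // points into global memory
  size_t step;  // bytes between the starts of consecutive rows
  int cols;
  int rows;
  __device__ float* row(int y) const {
    return reinterpret_cast<float*>(reinterpret_cast<char*>(data) + y * step);
  }
};

// `view` is declared in device code, so it logically lives in local space.
// Its members are only accessed by name, so the compiler normally keeps them
// in registers; the floats it points to stay in global memory.
__global__ void fillRows(float* base, size_t pitch, int cols, int rows) {
  PitchedView view;
  view.data = base;
  view.step = pitch;
  view.cols = cols;
  view.rows = rows;
  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;
  if (x < view.cols && y < view.rows)
    view.row(y)[x] = static_cast<float>(y * view.cols + x);
}

// Hypothetical host helper: fills a pitched cols x rows buffer and checks it.
bool runPitchedDemo(int cols, int rows) {
  assert(cols > 0 && rows > 0);
  float* dBase = nullptr;
  size_t pitch = 0;
  CUDA_CHECK(cudaMallocPitch(reinterpret_cast<void**>(&dBase), &pitch,
                             cols * sizeof(float), rows));
  dim3 threads(16, 16);
  dim3 blocks((cols + 15) / 16, (rows + 15) / 16);
  fillRows<<<blocks, threads>>>(dBase, pitch, cols, rows);
  CUDA_CHECK(cudaGetLastError());
  std::vector<float> host(static_cast<size_t>(cols) * rows);
  CUDA_CHECK(cudaMemcpy2D(host.data(), cols * sizeof(float), dBase, pitch,
                          cols * sizeof(float), rows, cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(dBase));
  for (int i = 0; i < cols * rows; ++i)
    if (host[i] != static_cast<float>(i)) return false;
  return true;
}
```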

Very nice, thanks for your fast reply. I’ve created almost all of the PtrStep objects in host code and then passed them into the kernels. As you said, my structs will then be stored in global memory. So loading things into shared memory is the best approach, right?

Shared memory may be beneficial when there is data reuse, i.e. when several threads in a block need the same values from global memory. There are a few other corner cases where it is useful.
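As an illustration of the data-reuse case, here is a minimal sketch of a 1D box sum (a hypothetical `boxSum1D` kernel with an assumed radius of 3 and block size of 128, plus a hypothetical host helper `runBoxSum`). Each input value is needed by up to seven neighbouring outputs, so each block loads its tile plus a halo into shared memory once and all threads then read their neighbours from there. Out-of-range neighbours are treated as zero, which is one simple boundary choice among several.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
  do {                                                                    \
    cudaError_t err_ = (call);                                            \
    if (err_ != cudaSuccess) {                                            \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
      std::exit(EXIT_FAILURE);                                            \
    }                                                                     \
  } while (0)

constexpr int RADIUS = 3;        // assumed stencil radius
constexpr int BLOCK_SIZE = 128;  // the kernel assumes blockDim.x == BLOCK_SIZE

// out[g] = sum of in[g - RADIUS .. g + RADIUS], treating out-of-range inputs as 0.
__global__ void boxSum1D(const float* in, float* out, int n) {
  __shared__ float tile[BLOCK_SIZE + 2 * RADIUS];
  int t = threadIdx.x;
  int g = blockIdx.x * BLOCK_SIZE + t;  // global index
  int s = t + RADIUS;                   // matching index in the tile
  tile[s] = (g < n) ? in[g] : 0.0f;
  if (t < RADIUS) {  // the first RADIUS threads also load the left and right halo
    int left = g - RADIUS;
    int right = g + BLOCK_SIZE;
    tile[s - RADIUS] = (left >= 0 && left < n) ? in[left] : 0.0f;
    tile[s + BLOCK_SIZE] = (right < n) ? in[right] : 0.0f;
  }
  __syncthreads();  // the whole tile must be loaded before any thread reads neighbours
  if (g < n) {
    float sum = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k) sum += tile[s + k];  // shared-memory reads
    out[g] = sum;
  }
}

// Hypothetical host helper: runs the box sum on a host vector and returns the result.
std::vector<float> runBoxSum(const std::vector<float>& hostIn) {
  int n = static_cast<int>(hostIn.size());
  assert(n > 0);
  float *dIn = nullptr, *dOut = nullptr;
  CUDA_CHECK(cudaMalloc(&dIn, n * sizeof(float)));
  CUDA_CHECK(cudaMalloc(&dOut, n * sizeof(float)));
  CUDA_CHECK(cudaMemcpy(dIn, hostIn.data(), n * sizeof(float), cudaMemcpyHostToDevice));
  boxSum1D<<<(n + BLOCK_SIZE - 1) / BLOCK_SIZE, BLOCK_SIZE>>>(dIn, dOut, n);
  CUDA_CHECK(cudaGetLastError());
  std::vector<float> result(n);
  CUDA_CHECK(cudaMemcpy(result.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(dIn));
  CUDA_CHECK(cudaFree(dOut));
  return result;
}
```

Each input is fetched from global memory roughly once per block (plus the small halo) instead of up to seven times. Whether that is a measurable win also depends on how well L1 already caches those reads on the target GPU.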

I can’t speculate as to whether it is a good idea for what you want to do or not.

Yes, in my use case the data will be reused. But what do you mean by “a few other corner cases”?

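The matrix transpose is one example: each element is read and written only once, so there is no reuse, but staging tiles through shared memory lets both the global reads and the global writes be coalesced.
https://devblogs.nvidia.com/parallelforall/efficient-matrix-transpose-cuda-cc/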
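A condensed sketch of the shared-memory transpose technique from that post, generalized here (as an assumption, not as the post’s code) to a non-square `rows` x `cols` row-major matrix with bounds checks. It keeps the post’s tile of 32 x 32 with 8 rows of threads per block. There is no data reuse: shared memory is used so that both the global read and the global write are coalesced, and the extra padding column in `tile` avoids shared-memory bank conflicts. `transposeTiled` and `runTranspose` are illustrative names.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
  do {                                                                    \
    cudaError_t err_ = (call);                                            \
    if (err_ != cudaSuccess) {                                            \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
      std::exit(EXIT_FAILURE);                                            \
    }                                                                     \
  } while (0)

constexpr int TILE_DIM = 32;   // tile is TILE_DIM x TILE_DIM elements
constexpr int BLOCK_ROWS = 8;  // each thread handles TILE_DIM / BLOCK_ROWS elements

// Transposes the rows x cols matrix `in` into the cols x rows matrix `out`.
__global__ void transposeTiled(const float* in, float* out, int rows, int cols) {
  __shared__ float tile[TILE_DIM][TILE_DIM + 1];  // +1 column avoids bank conflicts
  int x = blockIdx.x * TILE_DIM + threadIdx.x;    // column in `in`
  int y = blockIdx.y * TILE_DIM + threadIdx.y;    // row in `in`
  for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
    if (x < cols && y + j < rows)
      tile[threadIdx.y + j][threadIdx.x] = in[(y + j) * cols + x];  // coalesced read
  __syncthreads();
  x = blockIdx.y * TILE_DIM + threadIdx.x;  // column in `out` (= row in `in`)
  y = blockIdx.x * TILE_DIM + threadIdx.y;  // row in `out` (= column in `in`)
  for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
    if (x < rows && y + j < cols)
      out[(y + j) * rows + x] = tile[threadIdx.x][threadIdx.y + j];  // coalesced write
}

// Hypothetical host helper: transposes a row-major host matrix on the GPU.
std::vector<float> runTranspose(const std::vector<float>& hostIn, int rows, int cols) {
  assert(rows > 0 && cols > 0 && hostIn.size() == static_cast<size_t>(rows) * cols);
  size_t bytes = hostIn.size() * sizeof(float);
  float *dIn = nullptr, *dOut = nullptr;
  CUDA_CHECK(cudaMalloc(&dIn, bytes));
  CUDA_CHECK(cudaMalloc(&dOut, bytes));
  CUDA_CHECK(cudaMemcpy(dIn, hostIn.data(), bytes, cudaMemcpyHostToDevice));
  dim3 block(TILE_DIM, BLOCK_ROWS);
  dim3 grid((cols + TILE_DIM - 1) / TILE_DIM, (rows + TILE_DIM - 1) / TILE_DIM);
  transposeTiled<<<grid, block>>>(dIn, dOut, rows, cols);
  CUDA_CHECK(cudaGetLastError());
  std::vector<float> result(hostIn.size());
  CUDA_CHECK(cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(dIn));
  CUDA_CHECK(cudaFree(dOut));
  return result;
}
```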

Thanks, it’s very helpful.