I do not know why this happens.
compile : nvcc -o scatter scatter.cu -w
output : ptxas error : Entry function ‘_Z15explore_subsetsP3curS0_PbPiS2_PfP17curandStateXORWOW’ uses too much local data (0x6db8 bytes, 0x4000 max)
code :
#define PROBLEM_SIZE 1000
typedef struct cur{
float vector[PROBLEM_SIZE];
float cost;
float dist;
bool ref_new;
}ref_set_str;
I use memcpy:
memcpy(best, &candidate, sizeof(ref_set_str));
If I set PROBLEM_SIZE to 600, I still get the error:
ptxas error : Entry function ‘_Z15explore_subsetsP3curS0_PbPiS2_PfP17curandStateXORWOW’ uses too much local data (0x6db8 bytes, 0x4000 max)
I do not know why this error occurs.
Please help me.
Probably you haven’t shown enough code to work things out. GPUs are limited in the amount of local memory per thread they can allocate.
Apparently you have a kernel like this:
explore_subsets(cur*, cur*, bool*, int*, int*, float*, curandStateXORWOW*)
That kernel is using too much local data. The limit on local data usage varies by compute capability:
Programming Guide :: CUDA Toolkit Documentation
Your limit is 0x4000 i.e. 16KB per thread, so I guess you are compiling for a cc1.x device.
If you are actually using a cc2.x or newer device, you may be able to work around this error just by compiling for the proper compute capability. If you are actually using a cc1.x device, then you will need to reduce the local memory usage per thread.
An example of local memory usage would be:
float vector[PROBLEM_SIZE];
if that declaration occurs in ordinary thread code (i.e. not in global memory or shared memory). Each ref_set_str that you declare in thread code will therefore use at least 4 KB (1000 floats at 4 bytes each), and the limit is 16 KB for cc1.x.
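To make the arithmetic concrete, here is a small host-side sketch that checks how many copies of the struct fit under the limit. The struct is copied from the question (without the trailing semicolon on the #define). The constant and function names (kLocalLimitBytes, kReportedBytes, max_local_copies, fits_in_local_limit) are made up for illustration. The exact sizeof depends on the compiler's padding, so treat the numbers as approximate. The 0x6db8 bytes that ptxas reports is roughly the size of seven such structs, which suggests the kernel holds several of them (or temporaries of that size) in thread-local storage, but without the kernel source that is only a guess.

```cpp
#include <cstddef>

#define PROBLEM_SIZE 1000   // note: no trailing semicolon

// Struct as posted in the question.
typedef struct cur {
    float vector[PROBLEM_SIZE];
    float cost;
    float dist;
    bool  ref_new;
} ref_set_str;

// Values taken from the ptxas message: 0x4000 max, 0x6db8 used.
const std::size_t kLocalLimitBytes = 0x4000;  // 16384 bytes
const std::size_t kReportedBytes   = 0x6db8;  // 28088 bytes

// How many whole ref_set_str objects fit in the per-thread local limit.
std::size_t max_local_copies() {
    return kLocalLimitBytes / sizeof(ref_set_str);
}

// Whether `copies` thread-local ref_set_str objects stay within the limit
// (ignores any other local variables the kernel may need).
bool fits_in_local_limit(std::size_t copies) {
    return copies * sizeof(ref_set_str) <= kLocalLimitBytes;
}
```

Calling fits_in_local_limit(7) returns false here, which is in line with the error. Moving the large arrays into global memory (allocated once with cudaMalloc and indexed per thread), or compiling for the compute capability of the actual device, are the usual ways out.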