I am perplexed about a couple of issues with CUDA, and I wanted to know whether this is just my experience or whether others have faced similar issues, and if there are any workarounds. I have noticed that if there is an error while running a CUDA kernel, control returns to the host within tens of microseconds or so. Is there any way to capture this error, or to find out more about what happened? For instance, I could be copying more elements into an array than the memory I allocated with cudaMalloc can hold, or there might be issues related to register utilization, for instance trying to use more registers than are physically present. Has anyone else faced these issues?
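To show what I mean, here is roughly the checking pattern I have been experimenting with (the kernel, names, and sizes here are just made up for illustration). My understanding is that launch-time problems show up via cudaGetLastError right after the launch, while execution-time problems such as an out-of-bounds write only surface after a synchronization call:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes its global index into out[i].
__global__ void fill(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 256;
    int *d_out = nullptr;

    cudaError_t err = cudaMalloc(&d_out, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    fill<<<(n + 127) / 128, 128>>>(d_out, n);

    // Launch-time errors (invalid configuration, too many registers
    // per thread, ...) are reported immediately by cudaGetLastError.
    err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));

    // Execution-time errors (e.g. an out-of-bounds access inside the
    // kernel) only surface once the kernel has actually run, so we
    // have to synchronize before checking again.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return 0;
}
```

Does this look like the right approach, or is there a better way to get at what actually went wrong on the device?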
The second issue I have is that the global memory of the GPU sometimes seems to retain data from the previous computation. For example, the first time I ran the code it executed all right; when I then ran the program again, this time trying to use more elements than were allocated, control passed quickly back to the CPU, but the results I copied back still looked correct, apparently left over from the previous run. I am wondering whether this is again something only I have faced. Thanks for the replies.
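As far as I can tell, cudaMalloc does not initialize the memory it returns, so a fresh allocation can land on the same region a previous run used and still contain the old results, which would explain what I am seeing. A small sketch of the workaround I have been trying (the buffer name and sentinel value are arbitrary): clear the buffer to a known pattern before the launch, so that if the kernel fails, the stale-looking "correct" data is replaced by something obviously wrong.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int n = 256;
    int *d_buf = nullptr;
    int h_buf[n];

    cudaMalloc(&d_buf, n * sizeof(int));

    // cudaMalloc does not zero device memory, so d_buf may still hold
    // whatever an earlier allocation left behind. Filling it with a
    // sentinel makes leftover data easy to recognize: 0xFF in every
    // byte turns each int into -1.
    cudaMemset(d_buf, 0xFF, n * sizeof(int));

    // ... kernel launch would go here; if the launch or the kernel
    // fails, the buffer keeps the sentinel instead of stale results ...

    cudaMemcpy(h_buf, d_buf, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("first element: %d\n", h_buf[0]);  // -1 if nothing wrote to it

    cudaFree(d_buf);
    return 0;
}
```

Is explicitly clearing buffers like this the usual practice, or do people rely on the error checks alone?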