Hi:
I ran into a problem when running the CUDA sample vectorAddDrv. As I understand it, vectorAddDrv uses the CUDA driver API function cuLaunchKernel to launch the GPU kernel. With threadsPerBlock = 1024 and an element count of N = 50000000, the number of blocks (the gridDimX argument) is (50000000 + 1023) / 1024 = 48829, and cuLaunchKernel launches without error. But if I change N to 70000000, the number of blocks becomes (70000000 + 1023) / 1024 = 68360, and cuLaunchKernel returns CUDA_ERROR_INVALID_VALUE. I have been told that this parameter is limited to 65536. If I need a larger element count, how can I work around this limitation? Any help would be appreciated. Thank you.
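For reference, here is a minimal host-side sketch of the arithmetic I am describing. The helper names (blocksPerGrid, fitsInOneGridDimension, verifyPostNumbers) and the constant kAssumedMaxGridDimX are hypothetical and not part of the sample. The value 65535 is only my assumption about the per-dimension limit on my device; the real limit should be queried from the GPU.

```cpp
#include <cassert>
#include <cstdint>

// Assumed per-dimension limit on the number of blocks. This is a guess for
// illustration only; the actual value depends on the GPU.
constexpr std::uint64_t kAssumedMaxGridDimX = 65535;

// Ceiling division: number of blocks needed to cover n elements
// when each block handles threadsPerBlock elements.
constexpr std::uint64_t blocksPerGrid(std::uint64_t n, std::uint64_t threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// True if the required block count fits under the assumed limit.
constexpr bool fitsInOneGridDimension(std::uint64_t n, std::uint64_t threadsPerBlock) {
    return blocksPerGrid(n, threadsPerBlock) <= kAssumedMaxGridDimX;
}

// Checks the two cases from my post.
inline void verifyPostNumbers() {
    assert(blocksPerGrid(50000000, 1024) == 48829);   // launches fine
    assert(blocksPerGrid(70000000, 1024) == 68360);   // fails for me
    assert(fitsInOneGridDimension(50000000, 1024));
    assert(!fitsInOneGridDimension(70000000, 1024));
}
```

Under that assumed limit, the first case fits in one grid dimension and the second does not, which matches the behavior I am seeing.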