I created a few small lookup tables in shared memory. I got “Unspecified launch failure” until I changed the array size from 7 to 8. I’m confident that there was no out-of-bounds indexing. Can you confirm this as a bug? (I spent a lot of time before I found that workaround.) With local arrays, an array size of 7 caused no problems.
The shared arrays are declared in a device function called by the kernel, not in the kernel itself, but that should be fine, I suppose?
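For readers following along, here is a minimal sketch of the pattern described above: a small shared lookup table declared inside a device function rather than in the kernel. The names (lookup, kernel, TABLE_SIZE), the table contents, and the 1D launch are all assumptions for illustration, not the original poster's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TABLE_SIZE 7  // the array size that triggered the failure in the post

// Hypothetical device function: declares a small shared lookup table,
// fills it cooperatively, and returns one entry.
__device__ int lookup(int idx)
{
    __shared__ int table[TABLE_SIZE];

    // The first TABLE_SIZE threads of the block each write one entry.
    if (threadIdx.x < TABLE_SIZE)
        table[threadIdx.x] = threadIdx.x * threadIdx.x;
    __syncthreads();  // make the table visible to the whole block

    // The modulo keeps every access inside the table.
    return table[idx % TABLE_SIZE];
}

__global__ void kernel(int *out)
{
    out[threadIdx.x] = lookup(threadIdx.x);
}

int main()
{
    const int n = 32;  // one block of 32 threads (assumed)
    int h_out[n];
    int *d_out = nullptr;

    cudaMalloc(&d_out, n * sizeof(int));
    kernel<<<1, n>>>(d_out);

    // Errors raised while the kernel runs surface at the next synchronizing call.
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("out[10] = %d\n", h_out[10]);

    cudaFree(d_out);
    return 0;
}
```

Note that every thread in the block has to reach the __syncthreads() inside the device function, so the function should not be called from divergent code.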
That should be fine, but I’ve seen in some cases, where I have multiple calls to the same device function, that the shared memory usage goes through the ceiling (I think that was when unrolling).
An “Unspecified launch failure” would be consistent with that possibility.
Have you checked the ptxas output (compile with nvcc --ptxas-options=-v) to see how many resources your kernel uses?
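Besides the compile-time report, the same figures can be read at run time. The sketch below is only an illustration: the kernel and its table are hypothetical stand-ins for the real code. It queries static shared memory, registers, and local memory with cudaFuncGetAttributes, then checks for errors both right after the launch and after synchronizing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real one.
__global__ void kernel(float *out)
{
    __shared__ float table[8];
    if (threadIdx.x < 8)
        table[threadIdx.x] = 0.5f * threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = table[threadIdx.x % 8];
}

int main()
{
    // Per-kernel resource usage as recorded by the compiler.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, kernel);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("registers per thread: %d\n", attr.numRegs);
    printf("static shared memory: %zu bytes\n", attr.sharedSizeBytes);
    printf("local memory:         %zu bytes\n", attr.localSizeBytes);

    const int n = 64;  // one block of 64 threads (assumed)
    float *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));

    kernel<<<1, n>>>(d_out);

    // Launch-time problems (bad configuration, too many resources) show up here.
    err = cudaGetLastError();
    printf("launch:    %s\n", cudaGetErrorString(err));

    // Problems during execution (such as an unspecified launch failure) show up here.
    err = cudaDeviceSynchronize();
    printf("execution: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return 0;
}
```

If the shared memory figure is far larger than the tables you declared, that points toward the duplication described above. If it looks reasonable, running the program under compute-sanitizer (cuda-memcheck on older toolkits) is a quick way to double-check the out-of-bounds assumption.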