Code compiled with -m64 fails but with -m32 works just fine Are there any special moments to remembe

I have a kernel that has worked normally for almost 1.5 years and has been continuously maintained and improved over that time. The main project can be compiled for x86 and x64, so I use both the -m32 and -m64 switches.

After I added two code snippets to the kernel (the added code has been checked and rechecked a hundred times, and it is essentially the same as existing code that works fine), it began to fail with an “Unspecified launch failure” error. Surprisingly, the kernel with the added code works without any errors when compiled with -m32 but fails with -m64.

I have narrowed the problem down to a single line of code: commenting it out makes the kernel run without “unspecified launch failure” errors, yet this line has nothing to do with the snippets that were actually added. Specifically, commenting out one case in a switch statement makes the kernel work, but that case is never executed even once under the test conditions.

All of this makes me think it is a bug in nvcc. However, the goal is to make the kernel work, not to prove that nvcc has bugs. It also does not help that I can’t extract a small piece of code that reproduces the issue. I would appreciate any suggestions from anyone who has already fixed something similar (-m32 works and -m64 doesn’t).

Thanks in advance.

Have you tried debugging your kernel with a debugger? Check any pointer variables you have, since their size changes between -m32 and -m64. Maybe somewhere in the host code you allocate an array of device pointers, etc.
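
To make the array-of-device-pointers point concrete, here is a minimal host-side C++ sketch. All function names are hypothetical and nothing here comes from the original poster's code. It only shows how a table of pointers sized as if each entry were a 32-bit integer is large enough in a 32-bit build and too small in a 64-bit one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed for a table of `count` float pointers.
// sizeof(float*) follows the build: 4 bytes with -m32, typically 8 with -m64.
inline std::size_t pointer_table_bytes(std::size_t count) {
    return count * sizeof(float*);
}

// The mistake being illustrated: sizing each entry as a 32-bit integer.
inline std::size_t pointer_table_bytes_wrong(std::size_t count) {
    return count * sizeof(std::uint32_t);
}

// True when the wrong formula allocates fewer bytes than the table needs.
inline bool wrong_size_underallocates(std::size_t count) {
    return pointer_table_bytes_wrong(count) < pointer_table_bytes(count);
}

// Debug-build guard: aborts if an allocation is too small for the table.
inline void require_table_fits(std::size_t allocated_bytes, std::size_t count) {
    assert(allocated_bytes >= pointer_table_bytes(count));
}
```

If the code has a hard-coded 4 or a sizeof(int) next to an allocation or copy of a pointer table, a check like require_table_fits would pass with -m32 and fire with -m64.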

It’s Windows and I don’t have Nsight. All I can do is use printf and change the code to see how it affects the behavior.

What GPU?

Nsight is free and easy to install. You just need a second GPU for screen output, and then you can debug on the same machine.

Do you use pointers in your code? Any nontrivial operations like casting a pointer to int, etc.? Pointer arrays in shared memory? Does the program work in a debug build?
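
As an illustration of the pointer-to-int question, here is a small host-side C++ sketch with hypothetical helper names, not taken from the poster's code. It round-trips a pointer through a 32-bit integer, which is harmless in a 32-bit build and can silently drop the upper bits in a 64-bit one, and compares that with std::uintptr_t, which is sized to hold a pointer.

```cpp
#include <cassert>
#include <cstdint>

// Round-trips a pointer through a 32-bit integer, as code written for -m32
// sometimes does. Returns true if no address bits are lost on the way.
inline bool survives_32bit_round_trip(const void* p) {
    std::uintptr_t full = reinterpret_cast<std::uintptr_t>(p);
    std::uint32_t narrow = static_cast<std::uint32_t>(full);
    return static_cast<std::uintptr_t>(narrow) == full;
}

// Same round trip through uintptr_t, which is wide enough for a pointer.
inline bool survives_uintptr_round_trip(const void* p) {
    std::uintptr_t full = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<const void*>(full) == p;
}

// Debug-build guard: narrows a pointer to 32 bits, aborting if bits are lost.
inline std::uint32_t narrow_checked(const void* p) {
    assert(survives_32bit_round_trip(p));
    return static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(p));
}
```

Searching the kernel and host code for casts between pointers and int or unsigned int, and replacing the integer type with a pointer-sized one, is one way to rule this class of problem out.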

GPU: I have a GTX480 and two GTX295s installed; the code behaves the same on GF100 and GT200.

All the pointer-related stuff is double-checked - everything is OK and safe.

Nsight: if I have three cards in the system (the monitor is connected to only one of them), can I run and debug the kernel on one of the GPUs that is not connected to the monitor?

Yep, one GPU or integrated video for the monitor and another GPU for debugging. What I was suggesting is that the size of some array of pointers has changed, or something like that. Compiling without optimization could also help.
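
The same size change also affects any struct that contains a pointer. As a rough host-side C++ sketch (KernelParams and its fields are hypothetical, not the poster's actual data), a parameter block like the one below has a different size and different member offsets in 32-bit and 64-bit builds, so any hard-coded size or offset that was right for -m32 would be wrong for -m64.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical parameter block shared between host code and a kernel.
struct KernelParams {
    int    count;  // 4 bytes on common 32-bit and 64-bit builds
    float* data;   // 4 bytes with -m32, typically 8 with -m64
    int    mode;
};

// Offset of `data` inside the struct; it moves when pointer alignment changes.
inline std::size_t data_offset() { return offsetof(KernelParams, data); }

// Total size of the block, including any padding the compiler inserts.
inline std::size_t params_size() { return sizeof(KernelParams); }

// Debug-build guard: aborts if the layout differs from the size the caller
// assumed, for example a constant that was written for the 32-bit build.
inline void check_layout(std::size_t expected_size) {
    assert(sizeof(KernelParams) == expected_size);
}
```

Printing params_size() and data_offset() in both builds, and comparing them with whatever sizes the copy and indexing code assumes, fits the printf-only debugging that is available here.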

Heh … Nsight does not support WinXP?

Yes. But you are working with a 64-bit system, aren’t you?
