We have an application that spawns several threads, each of which does some computation on an S870. At the start of each thread, cudaSetDevice is called to select the device (and so the CUDA context) for that thread. The work performed during thread execution appears correct, and no error is returned after calling cudaThreadSynchronize. However, when the thread terminates, the program segfaults. I finally figured out that the segfault occurs in cudaThreadExit, but I have no idea why. If we set the application up to use only one device, the program executes as expected with no segfault; it's only when two or more devices are used that the segfault happens.
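The per-thread sequence described above can be sketched roughly as follows. This is a simplified illustration, not the actual application code: it uses std::thread in place of Boost threads, and the names worker, run_workers, and the USE_CUDA_RUNTIME macro are made up for the example. Without that macro, the three CUDA calls are replaced by host-only stand-ins so the threading skeleton can be compiled without the toolkit.

```cpp
#include <cassert>
#include <thread>
#include <vector>

#ifdef USE_CUDA_RUNTIME  // hypothetical macro: define it when building against the CUDA toolkit
#include <cuda_runtime.h>
#else
// Host-only stand-ins (assumption, for illustration only); they always succeed.
typedef int cudaError_t;
const cudaError_t cudaSuccess = 0;
inline cudaError_t cudaSetDevice(int) { return cudaSuccess; }
inline cudaError_t cudaThreadSynchronize() { return cudaSuccess; }
inline cudaError_t cudaThreadExit() { return cudaSuccess; }
#endif

// Lifecycle of one worker thread: bind to a device, do the work,
// synchronize, then tear down. Sets *failed to 1 if any call reports an error.
void worker(int device, int* failed) {
    *failed = 0;
    if (cudaSetDevice(device) != cudaSuccess) { *failed = 1; return; }

    // ... allocate device memory and launch kernels here ...

    if (cudaThreadSynchronize() != cudaSuccess) *failed = 1;

    // In the real application, this is the call where the segfault
    // shows up once two or more devices are in use.
    if (cudaThreadExit() != cudaSuccess) *failed = 1;
}

// Starts one thread per device, waits for all of them, and returns
// the number of threads that reported an error.
int run_workers(int device_count) {
    assert(device_count > 0);
    std::vector<int> failed(device_count, 0);
    std::vector<std::thread> threads;
    for (int d = 0; d < device_count; ++d)
        threads.emplace_back(worker, d, &failed[d]);
    for (auto& t : threads) t.join();  // join every thread before returning

    int total = 0;
    for (int f : failed) total += f;
    return total;
}
```

In this sketch every thread is joined before run_workers returns, so no worker is still inside cudaThreadExit while the process is shutting down. Whether our Boost-based code guarantees the same thing is one of the things I'm unsure about.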
Has anyone seen this type of behavior before, or have any hints on what could cause cudaThreadExit to segfault? We’re using Boost on Ubuntu to do the threading, and I’m not convinced that we’re doing everything correctly. Thanks.