Possible NVCC compiler bug: two 'breaks' in different loops :(

CUDA_user · December 19, 2007, 4:03pm

Hi,

I am new here.

I discovered this quite unpleasant nvcc compiler bug not long ago.

A kernel with two “break” operations in different loops works incorrectly.

Sometimes my system hangs (only the reset button can help), sometimes the kernel terminates as expected but the computation results are incorrect.

I am using:

SDK: NVIDIA_CUDA_SDK_1.0

TOOLKIT: NVIDIA_CUDA_Toolkit_1.0_suse_10.2_x86_64

DRIVER: NVIDIA-Linux-x86_64-100.14.11-pkg2

OS: openSUSE 10.2 (X86-64)

Video card: GeForce 8800GTS with 640 MB RAM

Hardware: 2048 MB RAM, Intel Q6600 2.4 GHz

On the other system, executing a kernel with two breaks in different loops leads to a reboot!

Recompiling the source code under CUDA 1.1 does not solve the problem.

Second system:

SDK: NVIDIA_CUDA_SDK_1.1

TOOLKIT: NVIDIA_CUDA_Toolkit_1.1_Suse10.2_x86_64

DRIVER: NVIDIA Driver for Linux with CUDA Support (169.04)

OS: openSUSE 10.3 (X86-64)

Video card: GeForce 8800GTX with 768 MB RAM

Hardware: 2048 MB RAM, Athlon 64 X2 4600+

Here is a code sample with the bug:

Code description:

I created a large array of ‘unsigned ints’, initialized it with 0s and copied the array into the device’s memory.

Each of the 48 threads I created gets its own part of the array to process.

If any thread finds an array member with value != 0, it writes a ‘1’ to an array in shared memory and exits the first loop with a ‘break’ statement.

The second loop tests whether any member of the array in shared memory is != 0.

If such a value is found, the loop is terminated by the second ‘break’ statement.

If I remove one or both breaks from this code, the kernel works perfectly!

I zipped the source code and the makefile into BugReport01.zip

Sorry for my English. I do my best. ;)

#ifndef _TEMPLATE_KERNEL_H_
#define _TEMPLATE_KERNEL_H_

#include <stdio.h>

__global__ void
MyKernel(unsigned int* g_input, unsigned int operand_length)
{
    // Shared memory (quite small - num_threads * sizeof(int))
    extern __shared__ int shared_array[];

    // Thread ID
    const unsigned int tid = threadIdx.x;

    // Number of threads
    const unsigned int num_threads = blockDim.x;

    // each thread gets its own portion of data
    unsigned int pointer_position;

    // if all members of array 'g_input' == 0 then is_not_null will be 0
    // else is_not_null will be 1
    unsigned int is_not_null;

    // each thread writes results to dedicated position of shared_array (shared_array[tid])
    shared_array[tid] = 0;

    for (unsigned int i = 0; i < operand_length; ++i)
    {
        // each thread gets its own portion of data
        pointer_position = tid * operand_length + i;

        // if at least one member of array is != 0 then
        if (g_input[pointer_position] != 0)
        {
            // write a 1 into shared_array[tid] ...
            shared_array[tid] = 1;

            // ... and exit the loop
            // This is one of two 'break' in my code that cause malfunctions
            // if you remove it the kernel will work perfectly
            break;
        }
    }

    __syncthreads();

    // The first thread (with tid = 0) collects all results from 'shared_array'
    // Other threads have nothing to do
    if (tid == 0)
    {
        is_not_null = 0;

        for (unsigned int i = 0; i < num_threads; ++i)
        {
            // If at least one thread detected a number != 0 in array 'g_input'
            // (at least one member of shared_array is 1)
            if (shared_array[i] != 0)
            {
                // at least one member of array is not null !
                is_not_null = 1;

                // exit the loop
                // This is one of two 'break' in my code that cause malfunctions
                // if you remove it the kernel will work perfectly
                break;
            }
        }
    }

    __syncthreads();
}

#endif // #ifndef _TEMPLATE_KERNEL_H_
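
For readers who want to try the setup from the code description (zeroed array, copy to the device, 48 threads), a minimal host-side sketch might look like the following. It is not the host code from the attached zip. The single-block launch, the value of operand_length, and the check() helper are assumptions made for illustration, and it uses current CUDA runtime calls (cudaDeviceSynchronize) rather than whatever the CUDA 1.0/1.1 original used. The kernel is repeated in compact form only so that the file builds on its own.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Compact copy of the kernel from the post, so this file builds on its own.
__global__ void MyKernel(unsigned int* g_input, unsigned int operand_length)
{
    extern __shared__ int shared_array[];
    const unsigned int tid = threadIdx.x;
    const unsigned int num_threads = blockDim.x;

    shared_array[tid] = 0;
    for (unsigned int i = 0; i < operand_length; ++i)
    {
        if (g_input[tid * operand_length + i] != 0)
        {
            shared_array[tid] = 1;
            break;  // first 'break'
        }
    }
    __syncthreads();

    if (tid == 0)
    {
        unsigned int is_not_null = 0;
        for (unsigned int i = 0; i < num_threads; ++i)
        {
            if (shared_array[i] != 0)
            {
                is_not_null = 1;
                break;  // second 'break'
            }
        }
        (void)is_not_null;  // the original kernel also discards this value
    }
    __syncthreads();
}

// Hypothetical helper: report a failed runtime call and exit.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess)
    {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main()
{
    const unsigned int num_threads = 48;           // from the post
    const unsigned int operand_length = 1u << 16;  // ASSUMPTION: size not given in the post
    const size_t count = (size_t)num_threads * operand_length;
    const size_t bytes = count * sizeof(unsigned int);

    // Host array initialized to 0, as in the code description.
    unsigned int* h_input = (unsigned int*)calloc(count, sizeof(unsigned int));
    if (h_input == NULL)
    {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }

    unsigned int* d_input = NULL;
    check(cudaMalloc((void**)&d_input, bytes), "cudaMalloc");
    check(cudaMemcpy(d_input, h_input, bytes, cudaMemcpyHostToDevice), "cudaMemcpy");

    // ASSUMPTION: one block of 48 threads, one int of dynamic shared memory per thread.
    MyKernel<<<1, num_threads, num_threads * sizeof(int)>>>(d_input, operand_length);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "kernel execution");

    printf("kernel finished without a reported error\n");

    check(cudaFree(d_input), "cudaFree");
    free(h_input);
    return 0;
}
```

Checking the return value of the synchronize call is what lets the host tell a kernel that completed apart from one that was aborted, which is useful when comparing runs with and without the two 'break' statements.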

BugReoprt01.zip (2.79 KB)

netllama · December 20, 2007, 12:40am

I’m attempting to replicate this problem; however, it’s not clear what the desired behavior of your test app should be.

When I build and attempt to run your test app on Linux-x86_64 against CUDA_1.1, it just runs indefinitely; however, there is never any instability.

Also, please generate and attach an nvidia-bug-report.log.

thanks,
Lonni

CUDA_user · January 16, 2008, 3:42pm

Hi,

here I am at last.

This code example does nothing useful. It’s part of a larger project; I simplified the code as much as possible to demonstrate the bug.

When executed, the code causes one system to reboot and the other to stop responding. That is the code’s only purpose :)

Sometimes (1 out of 10 tries) the application runs without any instability.

If you try to execute the app 4 or 5 times in a row, it will crash for sure (at least on both of my computers).

If you can’t replicate my problem, please try removing both “break” statements from the code (in BugReport01_kernel.cu) and compare the application’s execution time.

Without the "break"s the application runs much faster (in the rare cases when my PC survives the execution of this code).

Here is the bug report log. I compressed the file because the forum’s software doesn’t allow me to attach a .log file.

Thanks !
bug_report_logfile.zip (33.3 KB)

netllama · January 16, 2008, 5:05pm

I’ve run your test app 10 times in a row, and there was no instability at any time.

I’m not sure that I understand how removing the break statements from BugReport01_kernel.cu to make the application run faster relates to the instability you’ve reported here. Can you clarify?

Does this problem persist if X is NOT running while your app is running?
Have you verified that you’re using the latest motherboard BIOS?
Does this problem persist with a more recent kernel (2.6.18 is rather old)?
Does this problem persist if your kernel is not using vesafb?
