Nsight Visual Studio Edition Version 5.2.0.16321 Internal debugger error

zhuchw · July 5, 2018, 7:25am

Hi all,

I call thrust::sort() and get errors such as:

First-chance exception at 0x000007FEFD6BA06D in xxx.exe: Microsoft C++ exception: thrust::system::system_error at memory location 0x000000000021D600.

Unhandled exception at 0x000007FEFD6BA06D in xxx.exe: Microsoft C++ exception: thrust::system::system_error at memory location 0x000000000021D600.

invalid device pointer
{device_radix_sort_dispatch.cuh}

Internal debugger error occurred while attempting to launch _ZN6thrust6system4cuda6detail4cub_30DeviceRadixSortDownsweepKernelINS3_23DeviceRadixSortDispatchILb0EjNS3_8NullTypeEiE18PtxDownsweepPolicyELb0EjS6_iEEvPT1_SA_PT2_SC_PT3_SD_iibbNS3_13GridEvenShareISD_EE in
CUcontext 0x0a5cab00, CUmodule 0xa96699a0:
code patching failed for unknown reason.

All breakpoints for function _ZN6thrust6system4cuda6detail4cub_30DeviceRadixSortDownsweepKernelINS3_23DeviceRadixSortDispatchILb0EjNS3_8NullTypeEiE18PtxDownsweepPolicyELb0EjS6_iEEvPT1_SA_PT2_SC_PT3_SD_iibbNS3_13GridEvenShareISD_EE have been removed.

See Output View for additional messages of this type.

My environment:

NVIDIA Nsight Visual Studio Edition Version 5.2 (Build Number 5.2.0.16321)
CUDA 8.0 (listed requirement: CUDA Toolkit 7.5 or 8.0)
GTX 1080
Visual Studio 2013
Driver 376.51 (listed requirement: NVIDIA Display Driver version 376.09)

The code is:
thrust::device_vector<unsigned int> bucket_indices(nLevelSize);
xxx…
thrust::sort(bucket_indices.begin(), bucket_indices.end());

Can anyone tell me how to resolve this?
Is it a Thrust bug, an Nsight bug, or a bug in my code?

The first execution is fine. The error occurs on the second execution, for example inside a loop such as for 1:10.

Thanks very much!

zhuchw · July 6, 2018, 1:28am

Hi all,
Please help me! This is urgent!

harryz · July 6, 2018, 5:49am

Hello, this may be caused by one of the following:

1. TDR was triggered and the driver was reset. You can disable TDR in Nsight Monitor.
2. There may be a memory leak in Thrust. Try disabling the Nsight memory checker and run again.
3. There may be a memory leak in your own code. I think all of these errors come from corrupted data.

zhuchw · July 6, 2018, 6:47am

Hello harryz,

First, thanks very much for your reply.

I have (1) disabled TDR in Nsight Monitor and (2) disabled the Nsight memory checker,
but the error still occurs.
So the error must be caused by a memory leak in my own code.

What confuses me is that
the first execution (n = 0) succeeds and
the second execution (n = 1) fails inside a for loop:

for (int n = 0; n < 3; n++)
{
    // my code runs here
}

Why does this happen?
How can I quickly fix this memory leak?

I hope you can help. Thanks very much!

harryz · July 6, 2018, 7:14am

Can you give me a mini sample to reproduce the issue?

zhuchw · July 6, 2018, 7:14am

One addition:
The Nsight memory checker did not find any memory leak in my own code.

harryz · July 6, 2018, 7:23am

Can you give me a mini sample to have a try?

zhuchw · July 6, 2018, 8:02am

This is my mini sample code. Thanks very much!

My environment:

NVIDIA Nsight Visual Studio Edition Version 5.2 (Build Number 5.2.0.16321)
CUDA 8.0 (listed requirement: CUDA Toolkit 7.5 or 8.0)
GTX 1080
Visual Studio 2013
Driver 376.51 (listed requirement: NVIDIA Display Driver version 376.09)

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

#include <iostream>

#include <cuda_runtime.h>

#include <helper_functions.h>
#include <helper_cuda.h>

#include "device_launch_parameters.h"

using namespace std;

// thrust
#include <thrust/host_vector.h> // host vector
#include <thrust/device_vector.h> // device vector
#include <thrust/device_ptr.h> // device ptr
#include <thrust/iterator/counting_iterator.h> // iterator
#include <thrust/transform.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/generate.h>
#include <thrust/binary_search.h>
#include <thrust/random.h>
#include <thrust/functional.h>

struct point_to_bucket_index : public thrust::unary_function<float2, unsigned int>
{
// constructor function
__host__ __device__ point_to_bucket_index(unsigned int width, unsigned int height) : w(width), h(height) {}

__host__ __device__ unsigned int operator()(float2 p) const
{
	// find the raster indices of p's bucket
	unsigned int x = static_cast<unsigned int>(p.x * (w - 1));
	unsigned int y = static_cast<unsigned int>(p.y * (h - 1));

	// return the bucket's linear index
	return y * w + x;
}

// member variable
unsigned int w, h;

};

__global__ void TransToFloat2(const float* d_Input1, const float* d_Input2, float2* d_Output, const int nSize, const int nBlocksNumX)
{
const int tid = (blockIdx.y * nBlocksNumX + blockIdx.x) * blockDim.x + threadIdx.x;

if (tid < nSize)
{
	d_Output[tid] = make_float2(d_Input1[tid], d_Input2[tid]);
}

}

void MiniLevelCompute(const int nLevel)
{
int nMaxIterNum = 15;

int nWidth = 512;
int nHeight = 512;
int nSlice = 32;


int nLevelWidth = int((nWidth - 1) / pow(2, nLevel)) + 1;
int nLevelHeight = int((nHeight - 1) / pow(2, nLevel)) + 1;
int nLevelSlice = int((nSlice - 1) / pow(2, nLevel)) + 1;

int nLevelSize = nLevelWidth * nLevelHeight * nLevelSlice;

int nDataSizeByte = nLevelSize * sizeof(float);

float* d_ImageDataRef_tmp;
float* d_ImageDataMov_tmp;

checkCudaErrors(cudaMalloc((void**)&d_ImageDataRef_tmp, nDataSizeByte));
checkCudaErrors(cudaMalloc((void**)&d_ImageDataMov_tmp, nDataSizeByte));

checkCudaErrors(cudaMemset(d_ImageDataRef_tmp, 0, nDataSizeByte));
checkCudaErrors(cudaMemset(d_ImageDataMov_tmp, 0, nDataSizeByte));


int nHistBinsNum = 150;

thrust::device_vector<unsigned int> bucket_begin(nHistBinsNum * nHistBinsNum);
thrust::device_vector<unsigned int> bucket_end(nHistBinsNum * nHistBinsNum);
thrust::device_vector<unsigned int> bucket_indices(nLevelSize);
thrust::device_vector<unsigned int> bucket_sizes(nHistBinsNum * nHistBinsNum);

float2 *d_points;
cudaMalloc((void**)&d_points, sizeof(float2)* nLevelSize);


int nBlocksNumX = 1024;
int nThreadsNumPerBlock = 256;

dim3 nblocks;
nblocks.x = nBlocksNumX;
nblocks.y = ((1 + (nLevelSize - 1) / nThreadsNumPerBlock) - 1) / nBlocksNumX + 1;

for (int nIter = 0; nIter < nMaxIterNum; nIter++)
{

	TransToFloat2<<<nblocks, nThreadsNumPerBlock>>>(d_ImageDataRef_tmp, d_ImageDataMov_tmp, d_points, nLevelSize, nBlocksNumX);

	thrust::device_ptr<float2> points_t(d_points);

	thrust::transform(points_t, points_t + nLevelSize, bucket_indices.begin(), point_to_bucket_index(nHistBinsNum, nHistBinsNum));


	cout << "debug Iter : " << nIter << endl;

	thrust::sort(bucket_indices.begin(), bucket_indices.end());

	cout << "debug Iter : " << nIter << endl;
}

cudaFree(d_points);
cudaFree(d_ImageDataMov_tmp);
cudaFree(d_ImageDataMov_tmp);

}

int test_CUDA_error()
{
int nDevice = 0;

cudaSetDevice(nDevice);

int nDeviceCount;

cudaDeviceProp cDeviceProp;

cudaGetDeviceCount(&nDeviceCount);

cudaGetDeviceProperties(&cDeviceProp, nDevice);

if (1)
{
	cout << "Using device # " << nDevice << endl;

	cout << "Max threads per block: " << cDeviceProp.maxThreadsPerBlock << endl;

	cout << "Max Threads DIM: " << cDeviceProp.maxThreadsDim[0] << " x " << cDeviceProp.maxThreadsDim[1] << " x " << cDeviceProp.maxThreadsDim[2] << endl;

	cout << "Max Grid Size: " << cDeviceProp.maxGridSize[0] << " x " << cDeviceProp.maxGridSize[1] << " x " << cDeviceProp.maxGridSize[2] << endl;

	printf("Device %d: \"%s\" with Compute %d.%d capability\n", nDevice, cDeviceProp.name, cDeviceProp.major, cDeviceProp.minor);
}

for (int nLevel = 1; nLevel >= 0; nLevel--)
{
	MiniLevelCompute(nLevel);
}

return 0;

}
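
The launch configuration is one thing that can be ruled out on the host. The sketch below repeats the sample's size arithmetic for both levels. It uses integer shifts in place of pow, which gives the same result for these non-negative values, and the function names are hypothetical helpers, not part of the original program.

```cpp
// Host-only check of the sizes used in MiniLevelCompute (no CUDA needed).
// levelExtent, levelSize and gridRows are hypothetical helper names.

// Extent of one axis at a pyramid level: int((full - 1) / 2^level) + 1.
inline int levelExtent(int full, int level)
{
    return ((full - 1) >> level) + 1;
}

// Number of elements at a level for the 512 x 512 x 32 volume in the sample.
inline int levelSize(int level)
{
    return levelExtent(512, level) * levelExtent(512, level) * levelExtent(32, level);
}

// Value the sample assigns to nblocks.y for a grid that is blocksX wide.
inline int gridRows(int nSize, int threadsPerBlock, int blocksX)
{
    int blocksNeeded = 1 + (nSize - 1) / threadsPerBlock;
    return (blocksNeeded - 1) / blocksX + 1;
}
```

With the sample's values (256 threads per block, nBlocksNumX = 1024), levelSize(1) is 1048576 and gridRows gives 4, while levelSize(0) is 8388608 and gridRows gives 32. Both are far below the 65535 limit on gridDim.y, so the launch shape itself looks fine.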

harryz · July 6, 2018, 9:46am

I can reproduce it on my 1070 with CUDA 9.2. Your app crashes at line 92, thrust::device_vector<unsigned int> bucket_begin(nHistBinsNum * nHistBinsNum);, in the second level. It causes a GPU exception, so Nsight cannot work, and the program fails even without Nsight. I'm not familiar with Thrust programming, so I think you should post your question on the CUDA programming forum.
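
Since the failing statement allocates only a small table, running out of device memory seems unlikely. Below is a rough host-side estimate of what one level allocates explicitly, assuming 4-byte float and unsigned int and ignoring the temporary storage that thrust::sort uses internally. The function name is hypothetical.

```cpp
#include <cstddef>

// Approximate bytes allocated explicitly by one call to MiniLevelCompute.
// levelBytes is a hypothetical helper; thrust::sort's scratch space is not counted.
inline std::size_t levelBytes(std::size_t nLevelSize, std::size_t nHistBinsNum)
{
    const std::size_t bins = nHistBinsNum * nHistBinsNum;

    std::size_t bytes = 0;
    bytes += 2 * nLevelSize * sizeof(float);     // d_ImageDataRef_tmp, d_ImageDataMov_tmp
    bytes += nLevelSize * 2 * sizeof(float);     // d_points (float2 is two floats)
    bytes += nLevelSize * sizeof(unsigned int);  // bucket_indices
    bytes += 3 * bins * sizeof(unsigned int);    // bucket_begin, bucket_end, bucket_sizes
    return bytes;
}
```

For nHistBinsNum = 150 this gives about 21 MB at level 1 (1048576 elements) and about 168 MB at level 0 (8388608 elements), a small fraction of the 8 GB on a GTX 1080 or GTX 1070. The bucket_begin table itself is only 90000 bytes.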

zhuchw · July 6, 2018, 11:47am

Hello harryz,

Thanks again.

So this error is definitely related to Thrust and not to Nsight?

Do you mean that I should post my question on this board:
https://devtalk.nvidia.com/default/board/57/cuda-programming-and-performance/

Or is there a better place?

harryz · July 9, 2018, 2:07am

I'm not sure whether it is a Thrust issue, but it is definitely not an Nsight issue: your program also crashes without Nsight.

zhuchw · July 9, 2018, 2:16am

OK, thanks!

So it must be an issue in my program.
I am debugging it now.
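
One thing worth checking in the sample posted above, offered as a likely cause rather than a confirmed diagnosis: the cleanup at the end of MiniLevelCompute calls cudaFree(d_ImageDataMov_tmp) twice and never frees d_ImageDataRef_tmp. The second cudaFree fails, nothing checks or clears that error, and Thrust can then report the stale error from a later call. That would match a first pass that succeeds and a second pass that throws "invalid device pointer". Wrapping the cudaFree calls in checkCudaErrors would surface the problem at its source. A way to avoid the pattern altogether is to give each buffer an owner that frees it exactly once. The sketch below shows the idea on the host; fakeDeviceAlloc and fakeDeviceFree are hypothetical stand-ins for cudaMalloc and cudaFree so that it builds without the CUDA toolkit.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical stand-ins for cudaMalloc / cudaFree (assumption: host memory
// is used here only so the sketch builds without CUDA).
static int g_liveBuffers = 0;  // counts buffers allocated but not yet freed

inline void* fakeDeviceAlloc(std::size_t bytes)
{
    ++g_liveBuffers;
    return std::malloc(bytes);
}

inline void fakeDeviceFree(void* p)
{
    --g_liveBuffers;
    std::free(p);
}

// Deleter that releases a buffer; unique_ptr guarantees it runs once per buffer.
struct DeviceFree
{
    void operator()(void* p) const { fakeDeviceFree(p); }
};

using DeviceBuffer = std::unique_ptr<void, DeviceFree>;

inline DeviceBuffer makeDeviceBuffer(std::size_t bytes)
{
    return DeviceBuffer(fakeDeviceAlloc(bytes));
}

// Mirrors the three raw allocations in MiniLevelCompute and returns how many
// buffers are still live after the level finishes.
inline int liveBuffersAfterLevel(std::size_t nLevelSize)
{
    {
        DeviceBuffer ref = makeDeviceBuffer(nLevelSize * sizeof(float));
        DeviceBuffer mov = makeDeviceBuffer(nLevelSize * sizeof(float));
        DeviceBuffer pts = makeDeviceBuffer(nLevelSize * 2 * sizeof(float));
        // Kernels and Thrust calls would use ref.get(), mov.get(), pts.get() here.
    }   // each buffer is freed exactly once when it leaves scope
    return g_liveBuffers;
}
```

In the real program the deleter would call cudaFree. Alternatively, the raw buffers could become thrust::device_vector<float> objects, which release their storage when they go out of scope.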
