Should I worry about these invalid reads? Valgrind reports invalid reads in cudaMemcpy2D

Hi all,

My program does a lot of malloc/memcpy. I've checked my code and can't find anything wrong. Most of the cudaMemcpy2D calls are fine, but two of them trigger errors. The only thing that sets apart the host arrays involved in those two calls is that I had to convert them from 24-bit words to 32-bit words. However, I copy four such converted arrays and only two of them report memory errors.

I only see these two errors when running the device-emulation (deviceemu) build under valgrind. In release mode, every cudaMemcpy2D call reports invalid reads. The program seems to run fine, but I haven't been able to verify my outputs yet.

Is this something that could come back to haunt me later on?

EDIT: A reproducible case is in my reply.

Valgrind

copy size onto device

size of pitch: 4

size of cpitch: 0

copy refid

copy refloc

copy ilist

copy crefid

copy crefloc

copy cilist

malloc binayseqs

copy binayseqs

==29800== 

==29800== Invalid read of size 1

==29800==	at 0x4C27200: memcpy (mc_replace_strmem.c:402)

==29800==  Address 0x6c98cd7 is 1 bytes before a block of size 40 alloc'd

==29800==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==29800== 

==29800== Invalid read of size 1

==29800==	at 0x4C27208: memcpy (mc_replace_strmem.c:402)

==29800==  Address 0x6c98cd6 is 2 bytes before a block of size 40 alloc'd

==29800==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==29800== 

==29800== Invalid read of size 1

==29800==	at 0x4C27212: memcpy (mc_replace_strmem.c:402)

==29800==  Address 0x6c98cd5 is 3 bytes before a block of size 40 alloc'd

==29800==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==29800== 

==29800== Invalid read of size 1

==29800==	at 0x4C2721C: memcpy (mc_replace_strmem.c:402)

==29800==  Address 0x6c98cd4 is 4 bytes before a block of size 40 alloc'd

==29800==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

copy bseq

copy reg

==29800== 

==29800== Invalid read of size 1

==29800==	at 0x4C272B8: memcpy (mc_replace_strmem.c:402)

==29800==  Address 0x6c98ca8 is 0 bytes after a block of size 40 alloc'd

==29800==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==29800== 

==29800== Invalid read of size 1

==29800==	at 0x4C272BF: memcpy (mc_replace_strmem.c:402)

==29800==  Address 0x6c98ca9 is 1 bytes after a block of size 40 alloc'd

==29800==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==29800== 

==29800== Invalid read of size 1

==29800==	at 0x4C272C8: memcpy (mc_replace_strmem.c:402)

==29800==  Address 0x6c98caa is 2 bytes after a block of size 40 alloc'd

==29800==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==29800== 

==29800== Invalid read of size 1

==29800==	at 0x4C272D1: memcpy (mc_replace_strmem.c:402)

==29800==  Address 0x6c98cab is 3 bytes after a block of size 40 alloc'd

==29800==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

copy cbseq

copy creg

before 1kernel invocation: Total GPU Memory: 536150016, free memory: 480644864

before 1kernel invocation: Total GPU Memory: 536608768, free memory: 489701120

malloc/memcpy code

	cout<<"copy size onto device"<<endl;
	cutilSafeCall(cudaMalloc((void**) &d_size, sizeof(int)*num));
	cutilSafeCall(cudaMemcpy(d_size, h_size, sizeof(int)*num, cudaMemcpyHostToDevice));

	//direct chain
	cout<<"size of pitch: "<<sizeof(ref_id_t)*maxM<<endl;
	cout<<"size of cpitch: "<<sizeof(ref_id_t)*cmaxM<<endl;
	cutilSafeCall(cudaMallocPitch((void**) &d_refid, &p_refid, maxM*sizeof(ref_id_t), num));
	cout<<"copy refid"<<endl;
	cutilSafeCall(cudaMemcpy2D(d_refid, p_refid, h_refid, maxM*sizeof(ref_id_t), maxM*sizeof(ref_id_t), num, cudaMemcpyHostToDevice));
	cutilSafeCall(cudaMallocPitch((void**) &d_refloc, &p_refloc, maxM*sizeof(ref_loc_t), num));
	cout<<"copy refloc"<<endl;
	cutilSafeCall(cudaMemcpy2D(d_refloc, p_refloc, h_refloc, maxM*sizeof(ref_loc_t), maxM*sizeof(ref_loc_t), num, cudaMemcpyHostToDevice));

	cutilSafeCall(cudaMallocPitch((void**) &d_ilist, &p_ilist, maxM*sizeof(int), num));
	cout<<"copy ilist"<<endl;
	cutilSafeCall(cudaMemcpy2D(d_ilist, p_ilist, h_ilist, maxM*sizeof(int), maxM*sizeof(int), num, cudaMemcpyHostToDevice));

	//complementary chain
	cutilSafeCall(cudaMallocPitch((void**) &d_crefid, &p_crefid, cmaxM*sizeof(ref_id_t), num));
	cout<<"copy crefid"<<endl;
	cutilSafeCall(cudaMemcpy2D(d_crefid, p_crefid, h_crefid, cmaxM*sizeof(ref_id_t), cmaxM*sizeof(ref_id_t), num, cudaMemcpyHostToDevice));

	cutilSafeCall(cudaMallocPitch((void**) &d_crefloc, &p_crefloc, cmaxM*sizeof(ref_loc_t), num));
	cout<<"copy crefloc"<<endl;
	cutilSafeCall(cudaMemcpy2D(d_crefloc, p_crefloc, h_crefloc, cmaxM*sizeof(ref_loc_t), cmaxM*sizeof(ref_loc_t), num, cudaMemcpyHostToDevice));

	cutilSafeCall(cudaMallocPitch((void**) &d_cilist, &p_cilist, cmaxM*sizeof(int), num));
	cout<<"copy cilist"<<endl;
	cutilSafeCall(cudaMemcpy2D(d_cilist, p_cilist, h_cilist, cmaxM*sizeof(int), cmaxM*sizeof(int), num, cudaMemcpyHostToDevice));

	cout<<"malloc binayseqs"<<endl;
	// alloc binary seqs
	cutilSafeCall(cudaMallocPitch((void**) &d_bseq, &p_bseq, 12*FIXELEMENT*sizeof(bit32_t), num));
	cutilSafeCall(cudaMallocPitch((void**) &d_reg, &p_reg, 12*FIXELEMENT*sizeof(bit32_t), num));
	cutilSafeCall(cudaMallocPitch((void**) &d_cbseq, &p_cbseq, 12*FIXELEMENT*sizeof(bit32_t), num));
	cutilSafeCall(cudaMallocPitch((void**) &d_creg, &p_creg, 12*FIXELEMENT*sizeof(bit32_t), num));
	cout<<"copy binayseqs"<<endl;
	// copy binary seqs
	cutilSafeCall(cudaMemcpy2D(d_bseq, p_bseq, h_bseq, 12*FIXELEMENT*sizeof(bit32_t), 12*FIXELEMENT*sizeof(bit32_t), num, cudaMemcpyHostToDevice));
	cout<<"copy bseq"<<endl;
	cutilSafeCall(cudaMemcpy2D(d_reg, p_reg, h_reg, 12*FIXELEMENT*sizeof(bit32_t), 12*FIXELEMENT*sizeof(bit32_t), num, cudaMemcpyHostToDevice));
	cout<<"copy reg"<<endl;
	cutilSafeCall(cudaMemcpy2D(d_cbseq, p_cbseq, h_cbseq, 12*FIXELEMENT*sizeof(bit32_t), 12*FIXELEMENT*sizeof(bit32_t), num, cudaMemcpyHostToDevice));
	cout<<"copy cbseq"<<endl;
	cutilSafeCall(cudaMemcpy2D(d_creg, p_creg, h_creg, 12*FIXELEMENT*sizeof(bit32_t), 12*FIXELEMENT*sizeof(bit32_t), num, cudaMemcpyHostToDevice));
	cout<<"copy creg"<<endl;
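For reference when reading the calls above: cudaMallocPitch may pad each row. The pitch it returns (p_refid, p_bseq, and so on) can therefore be larger than the requested width, and row r of the allocation starts r * pitch bytes after the base pointer.

The host-only sketch below models that row-addressing rule, with an ordinary array standing in for device memory. `pitchedRow` and `pitchedAt` are hypothetical helper names, and bit32_t is assumed to be a 32-bit unsigned integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint32_t bit32_t;  // assumption: bit32_t is a 32-bit unsigned integer

// Returns a pointer to the first element of row `row` in a pitched 2D buffer.
// The pitch is a byte count, so the offset is applied to a char pointer
// before converting back to T*.
template <typename T>
T* pitchedRow(T* base, size_t pitch, size_t row) {
    assert(pitch % alignof(T) == 0);  // every row must start suitably aligned for T
    return reinterpret_cast<T*>(reinterpret_cast<char*>(base) + row * pitch);
}

// Element (row, col) of a pitched buffer: locate the row by pitch,
// then index within the row by element.
template <typename T>
T& pitchedAt(T* base, size_t pitch, size_t row, size_t col) {
    return pitchedRow(base, pitch, row)[col];
}
```

Take a hypothetical pitch of 64 bytes. Element (2, 1) of a bit32_t array then lies 2 * 64 + 1 * 4 = 132 bytes past the base pointer. In a tightly packed host array with 3 elements per row, it would lie (2 * 3 + 1) * 4 = 28 bytes in. Kernels that read d_bseq, d_reg and the other pitched arrays need the same pitch-based arithmetic.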

Code that converts the 24-bit words to 32-bit words

extern "C" void
cpyBinaySeq(bit24_t bseq[][FIXELEMENT], bit24_t reg[][FIXELEMENT], bit24_t cbseq[][FIXELEMENT], bit24_t creg[][FIXELEMENT], size_t size, int tt)
{
	bit32_t *temp1 = (bit32_t*)malloc(sizeof(bit32_t)*12*FIXELEMENT);
	bit32_t *temp2 = (bit32_t*)malloc(sizeof(bit32_t)*12*FIXELEMENT);
	bit32_t *temp3 = (bit32_t*)malloc(sizeof(bit32_t)*12*FIXELEMENT);
	bit32_t *temp4 = (bit32_t*)malloc(sizeof(bit32_t)*12*FIXELEMENT);

	for(int j=0; j<12; j++) {
		for(int i=0; i<FIXELEMENT; i++) {
			//memcpy(&temp1[j*FIXELEMENT + i], &bseq[i][j], sizeof(bit24_t));
			temp1[j*FIXELEMENT + i] = (bit32_t)bseq[i][j].a;
			temp2[j*FIXELEMENT + i] = (bit32_t)reg[i][j].a;
			temp3[j*FIXELEMENT + i] = (bit32_t)cbseq[i][j].a;
			temp4[j*FIXELEMENT + i] = (bit32_t)creg[i][j].a;
		}
	}
	h_bseq[tt] = temp1;
	h_reg[tt] = temp2;
	h_cbseq[tt] = temp3;
	h_creg[tt] = temp4;
	h_size[tt] = (int)size;
}
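cudaMemcpy2D copies `height` rows of `width` bytes out of a single source block, taking row r from src + r * spitch. cpyBinaySeq works differently. It gives every converted sequence its own malloc'd block and stores only the row pointer in h_bseq[tt] (likewise for h_reg, h_cbseq and h_creg). As a result, the memory that h_bseq points to holds pointers rather than sequence data.

One way to give cudaMemcpy2D the layout it expects is to convert straight into one contiguous host buffer, as sketched below in host-only C++. This sketch rests on several assumptions:
- bit24_t is modelled as a struct with a 24-bit unsigned bitfield `a`.
- FIXELEMENT = 16 is a placeholder value.
- `convertRow` and `ROW_ELEMS` are hypothetical names.

The transposed indexing is copied from cpyBinaySeq.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef uint32_t bit32_t;                  // assumption: 32-bit unsigned word
struct bit24_t { unsigned int a : 24; };   // assumption: 24-bit value stored in member `a`
const int FIXELEMENT = 16;                 // hypothetical placeholder (>= 12 keeps src[i][j] in bounds)
const size_t ROW_ELEMS = 12 * FIXELEMENT;  // words per converted sequence, as in cpyBinaySeq

// Converts one 24-bit sequence into row `tt` of a contiguous buffer that holds
// all sequences back to back, with a row stride of ROW_ELEMS words.
void convertRow(bit24_t src[][FIXELEMENT], std::vector<bit32_t>& packed, size_t tt) {
    assert(packed.size() >= (tt + 1) * ROW_ELEMS);
    bit32_t* row = packed.data() + tt * ROW_ELEMS;
    for (int j = 0; j < 12; j++)
        for (int i = 0; i < FIXELEMENT; i++)
            row[j * FIXELEMENT + i] = static_cast<bit32_t>(src[i][j].a);
}

// With `packed` sized num * ROW_ELEMS, the host side of the copy is one block
// whose pitch is the packed row width:
//   cudaMemcpy2D(d_bseq, p_bseq, packed.data(), ROW_ELEMS * sizeof(bit32_t),
//                ROW_ELEMS * sizeof(bit32_t), num, cudaMemcpyHostToDevice);
```

Each of the four arrays would then need one buffer of num * ROW_ELEMS words, filled by calling convertRow for every tt from 0 to num - 1 before the copy.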

Hi,

I've written some code that reproduces the error. It also causes segmentation faults.

EDIT: My mistake, the valgrind output was not correct. Please see the attachments for the FULL reproducible case.

Please help! I'm really stuck here.

test.cu

int myTest() {
	int jj, kk;
	cout<<"set host data"<<endl;
	bit32_t **h_data = (bit32_t**)malloc(sizeof(bit32_t*)*3);
	bit32_t **h_data2 = (bit32_t**)malloc(sizeof(bit32_t*)*3);
	for(jj = 0; jj<3; jj++) {
		h_data[jj] = (bit32_t*)malloc(sizeof(bit32_t)*3);
		h_data2[jj] = (bit32_t*)malloc(sizeof(bit32_t)*3);
		for(kk =0; kk<3; kk++) {
			h_data[jj][kk] = (bit32_t)kk;
			cout<<h_data[jj][kk];
		}
		cout<<endl;
	}
	bit32_t *dd_data;
	size_t pi;
	cutilSafeCall(cudaMallocPitch((void**) &dd_data, &pi, 3*sizeof(bit32_t), 3));
	cutilSafeCall(cudaMemcpy2D(dd_data, pi, h_data, 3*sizeof(bit32_t), 3*sizeof(bit32_t), 3, cudaMemcpyHostToDevice));
	cutilSafeCall(cudaMemcpy2D(h_data2, pi, dd_data, pi, pi, 3, cudaMemcpyDeviceToHost));
	cout<<"print copied back data"<<endl;
	for(jj=0; jj<3; jj++){
		for(kk = 0; kk<3; kk++)
			cout<<h_data2[jj][kk];
		cout<<endl;
		free(h_data[jj]);
		//cudaFree(dd_data[jj]);
	}
	free(h_data);
	cudaFree(dd_data);
	return 0;
}
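In myTest, h_data and h_data2 are tables of three row pointers (24 bytes on a 64-bit host), not 3x3 matrices. The two copies therefore touch the wrong memory:
- The host-to-device call reads 3 rows of 12 bytes (36 bytes) starting at the pointer table.
- The device-to-host call writes 3 rows of `pi` bytes into h_data2, using `pi` as the host pitch.

This is consistent with the log that follows. It shows reads past the 24-byte block allocated at test.cu:21, and writes past the 24-byte block from test.cu:22 and just before the row blocks from test.cu:24 and test.cu:25.

The host-only sketch below models the copy that cudaMemcpy2D performs. It runs the same round trip with contiguous buffers, using the host row width (3 * sizeof(bit32_t)) as both the host pitch and the copy width in each direction. `copy2DModel` and `roundTrip` are hypothetical names, a std::vector stands in for the pitched device buffer, and bit32_t is assumed to be a 32-bit unsigned integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

typedef uint32_t bit32_t;  // assumption: 32-bit unsigned word

// Host-only model of the copy cudaMemcpy2D performs: `height` rows of `width`
// bytes, row r read from src + r * spitch and written to dst + r * dpitch.
// A stand-in for checking parameter choices, not the CUDA runtime's code.
void copy2DModel(void* dst, size_t dpitch, const void* src, size_t spitch,
                 size_t width, size_t height) {
    assert(width <= dpitch && width <= spitch);  // cudaMemcpy2D documents the same rule
    for (size_t r = 0; r < height; ++r)
        std::memcpy(static_cast<char*>(dst) + r * dpitch,
                    static_cast<const char*>(src) + r * spitch, width);
}

// The myTest() round trip with contiguous host buffers.
// devPitch plays the role of `pi` returned by cudaMallocPitch.
std::vector<bit32_t> roundTrip(size_t devPitch) {
    const size_t rows = 3, cols = 3;
    const size_t rowBytes = cols * sizeof(bit32_t);  // host pitch: host rows are packed
    std::vector<bit32_t> h_data(rows * cols), h_data2(rows * cols, 0);
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            h_data[r * cols + c] = static_cast<bit32_t>(c);

    std::vector<char> dd_data(rows * devPitch);  // stand-in for the pitched device buffer

    // CUDA equivalent, assuming h_data is one contiguous block:
    //   cudaMemcpy2D(dd_data, pi, h_data, rowBytes, rowBytes, 3, cudaMemcpyHostToDevice)
    copy2DModel(dd_data.data(), devPitch, h_data.data(), rowBytes, rowBytes, rows);
    // CUDA equivalent; the host pitch and the width are rowBytes, not pi:
    //   cudaMemcpy2D(h_data2, rowBytes, dd_data, pi, rowBytes, 3, cudaMemcpyDeviceToHost)
    copy2DModel(h_data2.data(), rowBytes, dd_data.data(), devPitch, rowBytes, rows);
    return h_data2;
}
```

On a real GPU, dd_data and pi would still come from cudaMallocPitch as in the original. h_data and h_data2 would each be a single block of 9 bit32_t values, released with one free() each.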

valgrind

==32011== Memcheck, a memory error detector.

==32011== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.

==32011== Using LibVEX rev 1854, a library for dynamic binary translation.

==32011== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.

==32011== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.

==32011== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.

==32011== For more details, rerun with: -v

==32011== 

set host data

012

012

012

==32011== Invalid read of size 1

==32011==	at 0x4C27200: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C8738: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B5877: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x400FE7: myTest() (test.cu:35)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f3fb is 11 bytes after a block of size 24 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400EF1: myTest() (test.cu:21)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid read of size 1

==32011==	at 0x4C27208: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C8738: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B5877: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x400FE7: myTest() (test.cu:35)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f3fa is 10 bytes after a block of size 24 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400EF1: myTest() (test.cu:21)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid read of size 1

==32011==	at 0x4C27212: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C8738: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B5877: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x400FE7: myTest() (test.cu:35)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f3f9 is 9 bytes after a block of size 24 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400EF1: myTest() (test.cu:21)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid read of size 1

==32011==	at 0x4C2721C: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C8738: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B5877: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x400FE7: myTest() (test.cu:35)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f3f8 is 8 bytes after a block of size 24 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400EF1: myTest() (test.cu:21)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272BC: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7DA5: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f438 is 0 bytes after a block of size 24 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F01: myTest() (test.cu:22)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272C4: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7DA5: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f439 is 1 bytes after a block of size 24 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F01: myTest() (test.cu:22)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272CD: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7DA5: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f43a is 2 bytes after a block of size 24 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F01: myTest() (test.cu:22)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272D6: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7DA5: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f43b is 3 bytes after a block of size 24 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F01: myTest() (test.cu:22)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272BC: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7E8F: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f460 is 8 bytes before a block of size 12 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F16: myTest() (test.cu:24)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272C4: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7E8F: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f461 is 7 bytes before a block of size 12 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F16: myTest() (test.cu:24)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272CD: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7E8F: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f462 is 6 bytes before a block of size 12 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F16: myTest() (test.cu:24)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272D6: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7E8F: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f463 is 5 bytes before a block of size 12 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F16: myTest() (test.cu:24)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272BC: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7EA8: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f4a0 is 8 bytes before a block of size 12 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F24: myTest() (test.cu:25)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272C4: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7EA8: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f4a1 is 7 bytes before a block of size 12 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F24: myTest() (test.cu:25)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272CD: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7EA8: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f4a2 is 6 bytes before a block of size 12 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F24: myTest() (test.cu:25)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid write of size 1

==32011==	at 0x4C272D6: memcpy (mc_replace_strmem.c:402)

==32011==	by 0x54C7EA8: (within /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x54B58ED: cudaMemcpy2D (in /usr/local/cuda/lib/libcudart.so.2.0)

==32011==	by 0x401044: myTest() (test.cu:36)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f4a3 is 5 bytes before a block of size 12 alloc'd

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F24: myTest() (test.cu:25)

==32011==	by 0x401122: main (test.cu:15)

print copied back data

==32011== 

==32011== Use of uninitialised value of size 8

==32011==	at 0x5765103: (within /usr/lib/libstdc++.so.6.0.10)

==32011==	by 0x576BC16: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const (in /usr/lib/libstdc++.so.6.0.10)

==32011==	by 0x576BE46: std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const (in /usr/lib/libstdc++.so.6.0.10)

==32011==	by 0x577EA9B: std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long) (in /usr/lib/libstdc++.so.6.0.10)

==32011==	by 0x4010B3: myTest() (ostream:199)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Conditional jump or move depends on uninitialised value(s)

==32011==	at 0x576510E: (within /usr/lib/libstdc++.so.6.0.10)

==32011==	by 0x576BC16: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const (in /usr/lib/libstdc++.so.6.0.10)

==32011==	by 0x576BE46: std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const (in /usr/lib/libstdc++.so.6.0.10)

==32011==	by 0x577EA9B: std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long) (in /usr/lib/libstdc++.so.6.0.10)

==32011==	by 0x4010B3: myTest() (ostream:199)

==32011==	by 0x401122: main (test.cu:15)

000

==32011== 

==32011== Invalid free() / delete / delete[]

==32011==	at 0x4C252AF: free (vg_replace_malloc.c:323)

==32011==	by 0x4010DA: myTest() (test.cu:43)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x6a3f468 is 0 bytes inside a block of size 12 free'd

==32011==	at 0x4C252AF: free (vg_replace_malloc.c:323)

==32011==	by 0x4010D0: myTest() (test.cu:42)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Use of uninitialised value of size 8

==32011==	at 0x4010A7: myTest() (ostream:199)

==32011==	by 0x401122: main (test.cu:15)

012

==32011== 

==32011== Conditional jump or move depends on uninitialised value(s)

==32011==	at 0x4C25265: free (vg_replace_malloc.c:323)

==32011==	by 0x4010DA: myTest() (test.cu:43)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== Invalid read of size 4

==32011==	at 0x4010A7: myTest() (ostream:199)

==32011==	by 0x401122: main (test.cu:15)

==32011==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

==32011== 

==32011== Process terminating with default action of signal 11 (SIGSEGV)

==32011==  Access not within mapped region at address 0x0

==32011==	at 0x4010A7: myTest() (ostream:199)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== ERROR SUMMARY: 168 errors from 22 contexts (suppressed: 8 from 1)

==32011== malloc/free: in use at exit: 13,264 bytes in 42 blocks.

==32011== malloc/free: 61 allocs, 21 frees, 18,831 bytes allocated.

==32011== For counts of detected errors, rerun with: -v

==32011== searching for pointers to 42 not-freed blocks.

==32011== checked 633,032 bytes.

==32011== 

==32011== 

==32011== 24 bytes in 1 blocks are definitely lost in loss record 6 of 33

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F01: myTest() (test.cu:22)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== 

==32011== 36 (24 direct, 12 indirect) bytes in 1 blocks are definitely lost in loss record 8 of 33

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400EF1: myTest() (test.cu:21)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== 

==32011== 36 bytes in 3 blocks are definitely lost in loss record 11 of 33

==32011==	at 0x4C265AE: malloc (vg_replace_malloc.c:207)

==32011==	by 0x400F24: myTest() (test.cu:25)

==32011==	by 0x401122: main (test.cu:15)

==32011== 

==32011== LEAK SUMMARY:

==32011==	definitely lost: 84 bytes in 5 blocks.

==32011==	indirectly lost: 12 bytes in 1 blocks.

==32011==	  possibly lost: 0 bytes in 0 blocks.

==32011==	still reachable: 13,168 bytes in 36 blocks.

==32011==		 suppressed: 0 bytes in 0 blocks.

==32011== Reachable blocks (those to which a pointer was found) are not shown.

Thanks for checking!

Attached reproducible case.
test.txt (1.36 KB)
Makefile.txt (2.29 KB)