Weird issue with Unified Memory

Hi there, while I was studying the Unified Memory I am encountering an illegal memory access that I can’t really figure out.

That’s the code (heavily taken from code-samples/dataElem_um_c++_2.cu at master · NVIDIA-developer-blog/code-samples · GitHub):

It is running on Windows 10, with CUDA 11.4 on a GTX 1080 Ti


#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>
#include <iostream>
#include <cstring>



class Managed {
public:
	void *operator new(size_t len) {
		void *ptr;
		printf("Copying memory (managed class): size_it is %i\n", len);
		cudaMallocManaged(&ptr, len);
		cudaDeviceSynchronize();
		printf("Returning ptr!\n");
		return ptr;
	}

	void operator delete(void *ptr) {
		cudaDeviceSynchronize();
		cudaFree(ptr);
	}
};

__host__ __device__ bool cuStrcmp(const char *a, const char *b) {
	bool str_end_a = false;
	bool str_end_b = false;
	int i = 0;
	while (!(str_end_a || str_end_b)) {
		if (a[i] == '\0')
			str_end_a = true;
		if (b[i] == '\0')
			str_end_b = true;
		if (a[i] != b[i]) {
			return false;
		}

		i++;
	}
	return (str_end_a && str_end_b);
}




class cuString : public Managed {
private:
	char * s;
	short int length;
	cuString();

	void copyConstChar(const char * input) {
		length = (short int)strlen(input) + 1;
		printf("Copying char array of size %i\n", length);
		cudaMallocManaged(&s, length);
		memcpy(s, input, length);

	}
public:
	cuString(const std::string& s) {
		printf("String copy constructor\n");
		copyConstChar(s.c_str());
	}

	cuString(const cuString& other) {
		printf("cuString copy constructor\n");
		length = other.length;
		cudaMallocManaged(&s, length);
		memcpy(s, other.s, length);
	};

	cuString(const char * input) {
		copyConstChar(input);
	}

	__device__ __host__ unsigned short int get_length() const{
		return length;
	}

	__device__ __host__ const char * c_str() const{
		return s;
	}

	__device__ __host__ bool operator==(const cuString& other) const{
		if (length != other.length)
			return false;
		short int i = 0;
		for (; i < length; i++) 
			if (s[i] != other.s[i])
				return false;
		
		return true;
	}

	__device__ __host__ bool operator==(const char* other) const {
		return cuStrcmp(s, other);
	}

	~cuString() {
		printf("Freeing cuStr memory\n");
		cudaFree(s);
	};

};


class DataElem : public Managed {
public:
	const cuString str2;
	const double t;

	__host__ std::string get_str2() {
		return std::string("str2");
	}


	DataElem() :str2("prova2"), t(5.0) {};
};


__global__ void printKernel(DataElem* p) {
	printf("(device) PP is %f\n", p->t);
	printf("(device) PP is (%s; %f)\n", p->str2.c_str(), p->t);
	if (p->str2.c_str() == "teststring")
		printf("Hey, operator works!\n");
	else
		printf("Hey, operator also works!\n");
}

__host__ void printKernelHost(DataElem* p) {
	printf("(host) PP is (%s; %s; %f)\n", p->str2.c_str(), p->str2.c_str(), p->t);
	if (p->str2.c_str() == "teststring")
		printf("Hey, operator works!\n");
	else
		printf("Hey, operator also works!\n");
}





int main()
{

	std::string s = "teststring";
	// cuString * cu = new cuString(s);
	DataElem * p = new DataElem();
	printKernelHost(p);
	printKernel << <1, 1 >> > (p);
	cudaError_t cudaerr = cudaDeviceSynchronize();
	if (cudaerr != cudaSuccess)
		printf("kernel launch failed with error \"%s\".\n",
			cudaGetErrorString(cudaerr));


    return 0;
}


The output is:

Copying memory (managed class): size_it is 24
Returning ptr!
Copying char array of size 5
(host) PP is (test; test; 5.000000)
Hey, operator also works!
(device) PP is 0.000000
kernel launch failed with error "an illegal memory access was encountered".

The weird thing is that if, I swap the order of const cuString str2 and const double t; in the class DataElem, thus it becomes:

class DataElem : public Managed {
public:
	const double t;
	const cuString str2;

	__host__ std::string get_str2() {
		return std::string("str2");
	}


	DataElem() :str2("prova2"), t(5.0) {};
};

The output now is:

Copying memory (managed class): size_it is 24
Returning ptr!
Copying char array of size 5
(host) PP is (test; test; 5.000000)
Hey, operator also works!
(device) PP is 5.000000
(device) PP is (test; 5.000000)
Hey, operator also works!

And it works correctly. Can someone help me what’s happening here? Sorry if the code is messy.

Thank you for your help!

When I run your code on linux I get no errors.

It might be helpful if you indicate whether you are building a debug or release project on windows.

It will probably be helpful if you use the method described here to isolate the error to a specific line of kernel code.

I also doubt the code you have shown here lines up with your output exactly. The code you have shown here would print prova2;prova2, whereas your purported output shows test;test

Possibly a Windows-specific issue, because I see the following output when I build and run on my Windows 10 system (driver Version: 472.12):

C:\Users\Norbert\My Programs>unif_mem
Copying memory (managed class): size_it is 24
Returning ptr!
Copying char array of size 7
(host) PP is (prova2; prova2; 5.000000)
Hey, operator also works!
(device) PP is 0.000000
kernel launch failed with error "an illegal memory access was encountered".

.

I haven’t studied it carefully but you may be running into this, specifically:

  • All direct and indirect base classes B of T are empty and the type of the first field F of T uses B in its definition, such that B is laid out at offset 0 in the definition of F.

Where F/B referred to here would be your cuString class member type or one of its parents.

Thanks to everything for the help!

@Robert_Crovella thank you, I may have miscopied the output

by the way, the Windows exe was compiled in debug