Illegal memory access was encountered with pointers and objects

I have a list of pointers to objects that implement an abstract class with virtual functions.
Everything is correctly placed in GPU memory, but when I call a function and pass it a pointer to a variable, I get an illegal memory access.

__device__ vec3_t march(sphere **obj_ls, int count) {
	vec3_t next_orig;
	vec3_t color = {1,1,1};
	for (int i = 0; i < n_seg; i++) {
		next_orig = orig + delta * dir;
		for (int j = 0; j < count; j++) {
			if (obj_ls[j]->is_inside(next_orig, &color)) {
				test_func(&color);
				goto endLoop;
			}
		}
		orig = next_orig;
	}
	endLoop:
	return color;
}

The function is meant to implement ray marching.
sphere is a class that contains the following definition of is_inside:

__device__ bool is_inside(vec3_t point, vec3_t* col) override {
	float a, b, c;
	a = orig.x - point.x;
	b = orig.y - point.y;
	c = orig.z - point.z;
	if ((a * a + b * b + c * c) > radius_sqr)
		return false;

	*col = { 0,0,0 };
	return true;
}

This overrides a virtual function declared in the base class.
If I change the pointer to color into a reference it works fine; otherwise memcheck reports the following:
========= Invalid global read of size 8 bytes
========= at 0xea0 in C:/Users/vicin/Desktop/PoliTo/GPUProg/final_project/BlackHoleSim/BlackHoleSim/ray.h:51:ray::march(sphere **, int)
========= by thread (0,11,0) in block (0,0,0)
========= Address 0x7ff6a0998510 is out of bounds
========= and is 140,666,050,543,361 bytes after the nearest allocation at 0x742a00600 of size 16 bytes
========= Device Frame:C:/Users/vicin/Desktop/PoliTo/GPUProg/final_project/BlackHoleSim/BlackHoleSim/tracing.cu:25:render(cv::cuda::PtrStepSz<_vec3>, int, int, camera *, sphere **, int) [0x1d20]

I’m running this code on an RTX 3060 laptop GPU.

Hope someone can help me.

In the part of the code you have not shown, you have probably initialized the spheres incorrectly. Objects with virtual device functions need to be treated differently.
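One way to read "treated differently" is that such objects have to be constructed by device code. The following is a minimal sketch of that idea, not the thread author's code: Shape, Ball, build_scene, probe, destroy_scene and point_hits_scene are hypothetical stand-ins for the object/sphere classes, and float3 replaces vec3_t. It assumes the default device heap is large enough for two small objects.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-ins for the thread's object/sphere classes.
struct Shape {
    __device__ virtual bool is_inside(float3 p, float3* col) = 0;
    __device__ virtual ~Shape() {}
};

struct Ball : Shape {
    float3 c;   // center
    float r2;   // squared radius
    __device__ Ball(float3 center, float radius) : c(center), r2(radius * radius) {}
    __device__ ~Ball() override {}
    __device__ bool is_inside(float3 p, float3* col) override {
        float a = c.x - p.x, b = c.y - p.y, d = c.z - p.z;
        if (a * a + b * b + d * d > r2) return false;
        *col = make_float3(0.f, 0.f, 0.f);
        return true;
    }
};

// Launched with a single thread: the constructors run in device code, so each
// object's vtable pointer refers to the device vtable.
__global__ void build_scene(Shape** scene) {
    scene[0] = new Ball(make_float3(0.f, 0.f, 0.f), 0.3f);
    scene[1] = new Ball(make_float3(1.f, 0.f, 0.f), 0.2f);
}

// Virtual call through a base-class pointer, as in the marching loop.
__global__ void probe(Shape** scene, int count, float3 p, int* hit) {
    float3 col = make_float3(1.f, 1.f, 1.f);
    *hit = 0;
    for (int j = 0; j < count; ++j)
        if (scene[j]->is_inside(p, &col)) { *hit = 1; break; }
}

// Objects created with device-side new must be deleted in device code.
__global__ void destroy_scene(Shape** scene, int count) {
    for (int j = 0; j < count; ++j) delete scene[j];
}

// Host helper: 1 if the point is inside any ball, 0 if not, -1 on a CUDA error.
int point_hits_scene(float x, float y, float z) {
    Shape** scene = nullptr;
    int* d_hit = nullptr;
    int hit = -1;
    if (cudaMalloc(&scene, 2 * sizeof(Shape*)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_hit, sizeof(int)) != cudaSuccess) { cudaFree(scene); return -1; }
    build_scene<<<1, 1>>>(scene);
    probe<<<1, 1>>>(scene, 2, make_float3(x, y, z), d_hit);
    if (cudaMemcpy(&hit, d_hit, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) hit = -1;
    destroy_scene<<<1, 1>>>(scene, 2);
    cudaDeviceSynchronize();
    cudaFree(d_hit);
    cudaFree(scene);
    return hit;
}
```

Calling point_hits_scene(0.f, 0.f, 0.f) from host code should report a hit, while a point far from both balls should not.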

Thank you for the reply.
This is how I create the list of objects:

sphere** createScene() {
    int size = 2;

    sphere** scene = (sphere**)malloc(sizeof(sphere*)*size);
    sphere** scene_gpu;

    scene[0] = sphere(vec3_t{0,0,0}, 0.3f).allocGPU();
    scene[1] = sphere(vec3_t{1,0,0}, 0.2f).allocGPU();

    cudaMalloc(&scene_gpu, sizeof(sphere*) * size);
    cudaMemcpy(scene_gpu, scene, sizeof(sphere*) * size, cudaMemcpyHostToDevice);
    return scene_gpu;
}

And this is the class “sphere”:

#pragma once

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "structs.h"
#include "object.h"
#include <iostream>


class sphere : public object{
	public:
		__host__ __device__  sphere(vec3_t origin, float radius) :object(origin), radius(radius) { radius_sqr = radius * radius; }

		__device__ bool is_inside(vec3_t point, vec3_t* col) override;


		__host__ __device__ float get_radius() { return radius; }
		__host__ __device__ float get_radius_sqr() { return radius_sqr; }
		__host__ __device__ vec3_t get_origin() { return orig; }
		sphere *allocGPU()
		{
			sphere* gpu;
			cudaMalloc(&gpu, sizeof(sphere));
			cudaMemcpy(gpu, this, sizeof(sphere), cudaMemcpyHostToDevice);

			return gpu;
		}

	private:
		
		float radius, radius_sqr;

};

When I make it a “stand alone” class it works. The problem seems to come from the combination of passing a pointer and the function being an override. I’m still learning and would like to understand CUDA’s features as well as I can.

sphere *allocGPU()
	{
		sphere* gpu;
		cudaMalloc(&gpu, sizeof(sphere));
		cudaMemcpy(gpu, this, sizeof(sphere), cudaMemcpyHostToDevice); <----- error

		return gpu;
	}

The marked line copies not only the member variables of the sphere to the GPU, but also its vtable pointer.
However, that pointer refers to the host vtable of the original CPU sphere, whose entries point to host functions. Those addresses are invalid in GPU code and lead to the illegal memory access.

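The simplest solution would be to allocate the sphere with cudaMallocManaged and use the same pointer on host and device. Note that the object still has to be constructed in device code (for example with placement new inside a kernel) before its virtual functions can be called there.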

(You also have a memory leak in createScene() because you never free the host array scene.)

(I edited the answer)
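As a rough illustration of the cudaMallocManaged suggestion, here is a minimal sketch in which managed memory only provides the storage and a kernel constructs the object with placement new. Shape, Ball, construct, probe and managed_ball_contains are hypothetical names, float3 stands in for vec3_t, and the ball is assumed to be centered at the origin.

```cuda
#include <cuda_runtime.h>
#include <new>

// Hypothetical stand-ins for the thread's object/sphere classes.
struct Shape {
    __device__ virtual bool is_inside(float3 p) = 0;
};

struct Ball : Shape {
    float3 c;   // center
    float r2;   // squared radius
    __device__ Ball(float3 center, float radius) : c(center), r2(radius * radius) {}
    __device__ bool is_inside(float3 p) override {
        float a = c.x - p.x, b = c.y - p.y, d = c.z - p.z;
        return a * a + b * b + d * d <= r2;
    }
};

// Placement new in device code: the storage is managed memory, but the
// constructor (and therefore the vtable pointer setup) runs on the GPU.
__global__ void construct(Ball* storage, float3 center, float radius) {
    new (storage) Ball(center, radius);
}

__global__ void probe(Shape* s, float3 p, int* hit) {
    *hit = s->is_inside(p) ? 1 : 0;
}

// Host helper: 1 if (x, y, z) lies inside a ball of the given radius centered
// at the origin, 0 if not, -1 on a CUDA error.
int managed_ball_contains(float radius, float x, float y, float z) {
    Ball* ball = nullptr;
    int* hit = nullptr;
    if (cudaMallocManaged(&ball, sizeof(Ball)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&hit, sizeof(int)) != cudaSuccess) { cudaFree(ball); return -1; }
    construct<<<1, 1>>>(ball, make_float3(0.f, 0.f, 0.f), radius);
    probe<<<1, 1>>>(ball, make_float3(x, y, z), hit);
    // After synchronizing, the host can read the managed result directly.
    int result = (cudaDeviceSynchronize() == cudaSuccess) ? *hit : -1;
    cudaFree(hit);
    cudaFree(ball);
    return result;
}
```

The host can share the pointer with the device this way, but it should not call the virtual functions of an object that was constructed on the device.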

Is there a way to avoid unified memory and keep the same structure? I need to improve performance as much as I can.
And why does it work if I change the pointer to color into a reference to color?

What do you mean by “it works”? That it does not produce an illegal memory access, or that it produces the correct result?
If you post a complete minimal example that produces the correct result with a reference and a memory access error with a pointer, I could take a look.

I don’t think unified memory will make much of a difference. Once all objects have been migrated to the device, access is as fast as ordinary global memory.

You can fix the vtables on the device by copy-constructing the object from itself using placement-new. This requires knowing the derived class type of each sphere.
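A minimal sketch of that placement-new repair might look like the following. Shape, Ball, repair_vptr, probe and repaired_ball_contains are hypothetical names, and float3 stands in for vec3_t. The object is built on the host and copied with cudaMemcpy as in allocGPU(), then a kernel copy-constructs it in place so the vtable pointer is rewritten by device code. Copy-constructing an object from itself is not something the C++ standard guarantees to work, so treat this as a pragmatic workaround rather than a portable technique.

```cuda
#include <cuda_runtime.h>
#include <new>

// Hypothetical stand-ins for the thread's object/sphere classes.
struct Shape {
    __device__ virtual bool is_inside(float3 p) = 0;
};

struct Ball : Shape {
    float3 c;   // center
    float r2;   // squared radius
    __host__ __device__ Ball(float3 center, float radius)
        : c(center), r2(radius * radius) {}
    __device__ bool is_inside(float3 p) override {
        float a = c.x - p.x, b = c.y - p.y, d = c.z - p.z;
        return a * a + b * b + d * d <= r2;
    }
};

// The concrete type T must be known here: the copy constructor of T is what
// installs the device-side vtable pointer while keeping the data members.
template <class T>
__global__ void repair_vptr(T* obj) {
    new (obj) T(*obj);
}

__global__ void probe(Shape* s, float3 p, int* hit) {
    *hit = s->is_inside(p) ? 1 : 0;
}

// Host helper: 1 if (x, y, z) lies inside a ball of radius 0.3 at the origin,
// 0 if not, -1 on a CUDA error.
int repaired_ball_contains(float x, float y, float z) {
    Ball host_ball(make_float3(0.f, 0.f, 0.f), 0.3f);  // host vtable pointer
    Ball* d_ball = nullptr;
    int* d_hit = nullptr;
    int hit = -1;
    if (cudaMalloc(&d_ball, sizeof(Ball)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_hit, sizeof(int)) != cudaSuccess) { cudaFree(d_ball); return -1; }
    cudaMemcpy(d_ball, &host_ball, sizeof(Ball), cudaMemcpyHostToDevice);
    repair_vptr<<<1, 1>>>(d_ball);  // T is deduced as Ball
    probe<<<1, 1>>>(d_ball, make_float3(x, y, z), d_hit);
    if (cudaMemcpy(&hit, d_hit, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) hit = -1;
    cudaFree(d_hit);
    cudaFree(d_ball);
    return hit;
}
```

This keeps the cudaMalloc/cudaMemcpy structure of createScene() and adds one small kernel launch per object (or one launch over an array of objects of the same type).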

It produces the correct result.
I realized the problem was caused by the fact that the function overrides another function (by changing the argument type I changed the signature, so it was no longer an override), and not by the argument itself.
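The effect of the signature change can be sketched as follows. Base, ByReference, ByPointer, dispatch and dispatch_result are hypothetical names used only for illustration, and all objects are created in device code so that the virtual calls are valid. A function whose parameter types differ from the base version hides it instead of overriding it, and a call through the derived type is then resolved at compile time without going through the vtable. That is presumably why the reference version ran without the error when called through a sphere pointer.

```cuda
#include <cuda_runtime.h>

struct Base {
    __device__ virtual int is_inside(float3 p, float3* col) { return 0; }
};

struct ByReference : Base {
    // Different parameter type: this hides Base::is_inside and is not virtual.
    // Marking it `override` would be a compile error.
    __device__ int is_inside(float3 p, float3& col) { return 1; }
};

struct ByPointer : Base {
    // Same signature as the base version: a genuine override.
    __device__ int is_inside(float3 p, float3* col) override { return 2; }
};

__global__ void dispatch(int* out) {
    float3 p = make_float3(0.f, 0.f, 0.f);
    float3 col = p;
    ByReference r;
    ByPointer q;
    Base* br = &r;
    Base* bq = &q;
    out[0] = br->is_inside(p, &col);  // 0: no override exists, Base version runs
    out[1] = r.is_inside(p, col);     // 1: resolved statically, no vtable lookup
    out[2] = bq->is_inside(p, &col);  // 2: virtual dispatch to the override
}

// Host helper: returns out[i] from the kernel above (i in 0..2), -1 on error.
int dispatch_result(int i) {
    int* d_out = nullptr;
    int out[3] = {-1, -1, -1};
    if (cudaMalloc(&d_out, sizeof(out)) != cudaSuccess) return -1;
    dispatch<<<1, 1>>>(d_out);
    cudaMemcpy(out, d_out, sizeof(out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return out[i];
}
```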

I more or less understand my mistake and it works fine now. I’ll look more into what you said.
Thank you, you’ve been very helpful and accurate. I appreciate it.