Nested struct problem

Hello all,

I’m trying to port Radius-CUDA (a GPGPU raytracer) to work on Linux. The original windows code is available here. Radius-CUDA is a CUDA adaptation of Radius (which is a Linux raytracer to begin with), so the conversion itself shouldn’t be too painful… or so I thought. The problem is with a nested anonymous struct that appears in the CUDA code. The relevant code is as follows:

struct vec_t {

	float x,y,z;

  __device__ vec_t() {}

	__device__ vec_t(const float a, const float b, const float c) : x(a), y(b), z(c) {}

	__device__ vec_t(const float a) : x(a), y(a), z(a) {}

	__device__ vec_t operator+(const vec_t &v) const { return vec_t(x+v.x,y+v.y,z+v.z); }

	__device__ vec_t operator-(const vec_t &v) const { return vec_t(x-v.x,y-v.y,z-v.z); }

	__device__ vec_t operator-() const { return vec_t(-x,-y,-z); }

	__device__ vec_t operator*(const float d) const { return vec_t(x*d,y*d,z*d); }

	__device__ vec_t cross(const vec_t &v) const { return vec_t(y*v.z-z*v.y,z*v.x-x*v.z,x*v.y-y*v.x); }

	__device__ vec_t normalize() const { return *this * (1.f/sqrtf(magsqr())); }

	__device__ float norm() const { return sqrtf(magsqr()); }

	__device__ float dot(const vec_t &v) const { return x*v.x+y*v.y+z*v.z; }

	__device__ float magsqr() const { return dot(*this); }

	__device__ float get_min() const { return fminf(fminf(x,y),z); }

	__device__ float get_max() const { return fmaxf(fmaxf(x,y),z); }

	__device__ vec_t perm_x() const { return vec_t(x, y, z); }

	__device__ vec_t perm_y() const { return vec_t(y, z, x); }

	__device__ vec_t perm_z() const { return vec_t(z, x, y); }


struct sampler_t {

	vec_t top, dx, dy;

	__device__ vec_t map(const point_t &screen) const {

		return vec_t(top + dx*float(screen.x) + dy*float(screen.y));



struct camera_t {

  struct {

	  vec_t eye,dir, up, right;

	  float fovx;

	  int world_up_index;

	  sampler_t sampler;


  __device__ inline void look_at(const vec_t &target, const int up_idx = -1);

  __device__ inline void set_fovx(const float degree) { fovx = DEGRAD(degree) * .5f; }

  __device__ inline void set_eye(const vec_t &v) { eye = v; }

  __device__ inline float get_fovx() const { return RADDEG(fovx)*2.f; }

  __device__ inline const vec_t &get_eye() const { return eye; }

  __device__ inline const vec_t &get_up() const { return up; }

  __device__ inline const vec_t &get_dir() const { return dir; }

  __device__ inline const vec_t &get_right() const { return right; }

  __device__ inline void set_world_up_index(int idx) { world_up_index = idx; }

  __device__ inline int get_world_up_index() const { return world_up_index; }


The compiler chokes on the following declaration:

__device__ __constant__ camera_t cu_cam;

The compiler errors are:

I’m using gcc-4.2.4 with the 177.80 drivers and the CUDA-2.0 SDK. However, I’ve also tried compiling with gcc-4.1 and get the same errors. Obviously, this code compiles just swimmingly using MSVC in Visual Studio 2008. Any ideas?



I ported radius-cuda to Linux yesterday. Attached is a patch (sorry about the silly .txt extension - the forum would not let me upload it otherwise). After applying it, change into the src directory, point the CUDA_SDK line in the Makefile at your CUDA SDK installation directory and run make. If all goes well, you will end up with an executable called radius. To run it, change into the etc directory and run …/src/radius (it looks for the model file in the current directory, hence the need to change into etc first).

I tried very hard to keep the changes to a minimum. Here is a quick run-down: To get rid of the problems with the anonymous struct inside camera_t, I named the struct d and changed all references to it accordingly. Next, CUDA complained about vec_t having a non-empty constructor. I fixed this by removing the struct’s constructors altogether and initializing its members directly instead. As constructing a vec_t in this way just to extract the three floats it contains right away is a bit silly, I modified frgb_to_int to take three float arguments directly. The final change was to expose sys::cpu::get_cpu_frequency() and use it to retrieve the CPU frequency in sys::laps_t::bootstrap(). The frequency would normally be retrieved internally during sys::cpu::bootstrap() but the code in fails to call this method; without the initialization, the light source would never move and the rays per second would always be shown as zero.

For the record, the measurements I get using a 280GTX for the default view are as follows:

    With shadows: 7-14 Mray/s

    With shading: 35.1 Mray/s

    Without shading: 35.2 Mray/s

The figures reported by the ray tracer actually refer to eye rays only, so the total number of rays in the first case is higher as an additional shadow ray is cast whenever the eye ray hits an object.

I would be curious to hear back from others what kind of timings they get and whether the patch works for them in the first place.
patch_radius_cuda.txt (9.68 KB)


I’d actually solved the problem another way (passing the camera struct as an argument to the kernel). However, I’d be curious to try your patch. However, when I tried to apply the patch to a fresh copy of the radius-cuda source, it failed. I go the following errors:

patching file src/Makefile

patching file src/

Hunk #1 FAILED at 118.

Hunk #2 FAILED at 161.

Hunk #3 FAILED at 297.

Hunk #4 FAILED at 522.

Hunk #5 FAILED at 542.

Hunk #6 FAILED at 568.

Hunk #7 FAILED at 588.

Hunk #8 FAILED at 619.

Hunk #9 FAILED at 628.

9 out of 9 hunks FAILED – saving rejects to file src/

patching file src/

Hunk #1 FAILED at 36.

1 out of 1 hunk FAILED – saving rejects to file src/

patching file src/

Hunk #1 FAILED at 80.

1 out of 1 hunk FAILED – saving rejects to file src/

patching file src/sys_cpu.h

Hunk #1 FAILED at 56.

1 out of 1 hunk FAILED – saving rejects to file src/sys_cpu.h.rej

This is with a brand new version of the source available from… Is this the source against which you created the patch? It’s the only radius-cuda source I know about. Perhaps you could try regenerating it.



I was getting the same failures before but the version I uploaded works for me. It seems to be some kind of whitespace issue - maybe the text file is getting mangled somewhere in transfer. Here is the same patch zipped so that hopefully, no forum software or browser will feel compelled to mess with it. Let me know how that goes.

As for passing the camera as a parameter - I wanted to avoid this as it places the camera in shared instead of constant memory, possibly changing the kernel’s performance characteristics. (2.53 KB)


It seems that this link does not work:…

I tried to download it but it says requested url is not found.

Will anybody be willing to upload it?

Thank you.


The source code for the original version is not availabe on… anymore.

Can anyone repost it?

Thank you very much in advanced!!!