Attempting to use/understand how to use expression templates in CUDA code

I’ve been trying to write some expression template code in CUDA, but it doesn’t seem to be working correctly. On the surface, it just isn’t producing the correct answer. When I debug it, the expression types appear to be generated correctly by nvcc, but the problem shows up when the assignment kernel is launched. At that point, all I can see in cuda-gdb is an invalid access from thread 0/block 0.

I’ve found myself confused on a few issues as well:

  1. When should I use __host__ or __device__ for a constructor? If I don’t specify __device__, does that mean I can never have an instance of the class on the device?

  2. When I pass parameters to a kernel from host code, does it call the copy constructor or something else?

In any case, I’ve attached the code here. Any ideas or thoughts are welcome.

Thanks,

  - Justin

P.S. The two header files I’ve attached, “vector.cu” and “opreps.cu”, are actually supposed to be .cuh files; for some reason I’m not permitted to upload .cuh files.
opreps.cu (1.55 KB)
vector.cu (2.95 KB)
test.cu (300 Bytes)

It might be useful to take a look at existing libraries that use expression templates, for example:

https://github.com/jaredhoberock/newton

and

http://www.orangeowlsolutions.com/bluebird
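
For illustration only, here is a minimal sketch of the general pattern. It is not taken from the attached files or from either library above. All names in it (HD, ExprTag, VecView, Sum, assign_kernel, assign_host, example_sum) are hypothetical. The idea is that every expression node is marked for both host and device and stores its operands by value as plain pointer views. Kernel arguments are passed by value, so, as far as I understand it, a node that holds a reference or pointer to a host-side object would give an invalid access once the kernel dereferences it. The kernel is only compiled under nvcc and assumes the pointers inside the views come from device allocations such as cudaMalloc; the host loop lets the same expression be checked without a GPU.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical macro: CUDA qualifiers under nvcc, nothing for a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Empty tag so operator+ only applies to our expression types.
struct ExprTag {};

// Non-owning view of an array; cheap to copy by value into a kernel.
struct VecView : ExprTag {
    double* data;
    std::size_t n;
    HD VecView(double* d, std::size_t len) : data(d), n(len) {}
    HD double operator[](std::size_t i) const { return data[i]; }
    HD std::size_t size() const { return n; }
};

// Expression node for element-wise addition; operands are stored by value,
// so the node never refers back to a host-side object.
template <typename L, typename R>
struct Sum : ExprTag {
    L lhs;
    R rhs;
    HD Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    HD double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    HD std::size_t size() const { return lhs.size(); }
};

// Builds the expression tree; nothing is evaluated here.
template <typename L, typename R,
          typename = typename std::enable_if<
              std::is_base_of<ExprTag, L>::value &&
              std::is_base_of<ExprTag, R>::value>::type>
HD Sum<L, R> operator+(const L& l, const R& r) {
    return Sum<L, R>(l, r);
}

#ifdef __CUDACC__
// Assumption: out.data and every pointer inside e point to device memory.
template <typename E>
__global__ void assign_kernel(VecView out, E e) {
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < out.size()) out.data[i] = e[i];
}
#endif

// Host evaluation of the same expression, one element at a time.
template <typename E>
void assign_host(VecView out, const E& e) {
    for (std::size_t i = 0; i < out.size(); ++i) out.data[i] = e[i];
}

// Host-only usage sketch: evaluates a + b + c in a single pass.
std::vector<double> example_sum() {
    std::vector<double> a{1, 2, 3}, b{10, 20, 30}, c{100, 200, 300}, out(3);
    VecView va(a.data(), a.size()), vb(b.data(), b.size()), vc(c.data(), c.size());
    assign_host(VecView(out.data(), out.size()), va + vb + vc);
    return out;
}
```

If the host version gives the right answer and the kernel version does not, it may be worth checking whether any expression node in the original code stores a reference to a host object rather than a device pointer.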