problem running demos

Any demo that uses textures fails with:


cutilCheckMsg() CUTIL CUDA error: cudaBindTexture failed in file <>, line 54 : invalid texture reference.

For simpleGL, see the attached screenshot: in the middle of the black background there seems to be a little red point.

Here are some other runs:

$ cd ~/NVIDIA_CUDA_SDK/bin/linux/release/

$ ./alignedTypes 

Allocating memory...

Generating host input data array...

Uploading input data to GPU memory...

Testing misaligned types...


cutilCheckMsg() CUTIL CUDA error: testKernel() execution failed

 in file <>, line 223 : invalid device function .

$ ./template 

cutilCheckMsg() CUTIL CUDA error: Kernel execution failed in file <>, line 117 : invalid device function .

$ ls | grep est


$ ./bandwidthTest 

Running on......

	  device 0:GeForce 8600M GT

Quick Mode

Host to Device Bandwidth for Pageable memory


Transfer Size (Bytes)	Bandwidth(MB/s)

 33554432		1055.7

Quick Mode

Device to Host Bandwidth for Pageable memory


Transfer Size (Bytes)	Bandwidth(MB/s)

 33554432		565.8

Quick Mode

Device to Device Bandwidth


Transfer Size (Bytes)	Bandwidth(MB/s)

 33554432		8681.5

&&&& Test PASSED

Press ENTER to exit...

$ ./BlackScholes 

Initializing data...

...allocating CPU memory for options.

...allocating GPU memory for options.

...generating input data in CPU mem.

...copying input data to GPU mem.

Data init done.

Executing Black-Scholes GPU kernel (512 iterations)...

cutilCheckMsg() CUTIL CUDA error: BlackScholesGPU() execution failed in file <>, line 195 : invalid device function .

$ ./clock 


time = -1020950724

Press ENTER to exit...

$ ./convolutionFFT2D 

Input data size		   : 1000 x 1000

Convolution kernel size   : 7 x 7

Padded image size		 : 1006 x 1006

Aligned padded image size : 1024 x 1024

Allocating memory...

Generating random input data...

Creating FFT plan for 1024 x 1024...

Uploading to GPU and padding convolution kernel and input data...

...initializing padded kernel and data storage with zeroes...

...copying input data and convolution kernel from host to CUDA arrays

...binding CUDA arrays to texture references

cudaSafeCall() Runtime API error in file <>, line 241 : invalid texture reference.

$ ./dct8x8 

CUDA sample DCT/IDCT implementation


Loading test image: barbara.bmp... [512 x 512]... Success

Running Gold 1 (CPU) version... Success

Running Gold 2 (CPU) version... Success

cudaSafeCall() Runtime API error in file <>, line 245 : invalid texture reference.

Running CUDA 1 (GPU) version... Kernel execution failed in file <>, line 79 : invalid device function .

And after commenting out a line, I get:

$ bin/linux/release/bitonic 


Press ENTER to exit...

So, the question is: is my computer not able to run CUDA? Is it a driver problem? Some info on my system:

$ lspci | grep VGA

01:00.0 VGA compatible controller: nVidia Corporation GeForce 8600M GT (rev a1)

$ glxinfo | grep rendering

direct rendering: Yes

$ glxinfo | grep NVIDIA

server glx vendor string: NVIDIA Corporation

client glx vendor string: NVIDIA Corporation

OpenGL vendor string: NVIDIA Corporation

OpenGL version string: 2.1.2 NVIDIA 177.80

OpenGL shading language version string: 1.20 NVIDIA via Cg compiler

$ gcc --version

gcc (GCC) 4.2.4 (Ubuntu 4.2.4-3ubuntu4)

$ g++ --version

g++ (GCC) 4.2.4 (Ubuntu 4.2.4-3ubuntu4)

$ ldconfig -p | grep cuda

(ELF) => /usr/lib/ (libc6) => /usr/local/cuda/lib/ (libc6) => /usr/local/cuda/lib/ (libc6) => /usr/local/cuda/lib/ (libc6) => /usr/local/cuda/lib/ (libc6) => /usr/local/cuda/lib/ (libc6) => /usr/local/cuda/lib/ (libc6) => /usr/lib/ (libc6) => /usr/lib/ (libc6) => /usr/local/cuda/lib/ (libc6) => /usr/local/cuda/lib/ (libc6) => /usr/local/cuda/lib/ (libc6) => /usr/local/cuda/lib/

So, that's it. Have a nice day or night, and a great end and start of the year.

It looks like there is something wrong with your install; I would reinstall the driver & toolkit.

It seems it would be best if I tried SDK 2.0, but I'm not able to find it. What I have now:

The driver is the default one that comes with Ubuntu:

OpenGL version string: 2.1.2 NVIDIA 177.80

I think I can install the 180 driver; the problem is that when a kernel update comes in through “updates” and I don't notice it, it will break the system.

Also, I don't know “the correct” way of doing this and avoiding breakage on a kernel update.

I mean, right now I would do something like:

1.- Disable 177 from restricted drivers

2.- sudo gdm stop

3.- sudo ./

4.- restart the system

and when a kernel update comes, I will:

1.- Enable the default 177 restricted drivers from the repo (hoping this overwrites the manually installed 180)

2.- Hope that all goes OK :nuke:

After trying to install 180, I found that I was using gcc 4.2, but the 180 installer complained about the kernel being compiled with gcc 4.3. I can relink gcc to point to 4.3, but the question that raises is: if the kernel's interface is built for gcc 4.3 and SDK 2.1 doesn't work OK with 4.3… will this setup just work?

I would install the things from 2.1 beta if I were you (it is less of a beta, more of a preview in my opinion)

What I do in Fedora:

  • do not install the fedora drivers for the card.

  • install the nvidia driver

  • when there is a kernel update, I will generally not be running the nvidia driver after a reboot.

  • I stop X, re-install the nvidia driver and start X.

So, after installing the updates, I go to xorg.conf and set vesa as the driver? And then, after checking that everything runs OK with the default vesa driver, reinstall the newest driver?

By the way, the examples work after updating to:

$ glxinfo |grep NVIDIA

server glx vendor string: NVIDIA Corporation

client glx vendor string: NVIDIA Corporation

OpenGL vendor string: NVIDIA Corporation

OpenGL version string: 2.1.2 NVIDIA 180.06

OpenGL shading language version string: 1.20 NVIDIA via Cg compiler

So after this, hehe, where should I start? I want to first do some point interpolation and things like that, so any pointers are welcome.

It will have automagically loaded another driver in my case.

Start by understanding the SDK examples (reduction, scan, …). I believe there are even interpolation examples.

Thanks, I'm already looking into some of them.

I think I have understood the idea of writing code for one thread that runs on n of them, but the partitioning thing seems to be something I need to test by hand.

Also, I need to “remodel” my mind to write things in parallel… ummm. For example, if I need to calculate the Fibonacci function (I'm writing as I go, so I haven't tried searching), or other functions that depend on previous data where I don't have an input array or anything like that, how do I approach this type of problem? (At the moment I think of them as problems that generate content [don't have an initial array] and problems that process content [have an initial array].)

By the way, does something like ##cuda exist on IRC? Or should I stick to “General CUDA GPU Computing Discussion” and “CUDA Programming and Development” (though I don't see the line that separates them… ummm :wacko:)?

I don't know of better sources than here. General is about hardware and such, I think; Programming & Development is more about things encountered while programming. But that's just my idea ;)

It takes a while to ‘think parallel’, in my experience. I think I really got it after about 4 months, during my second project (the first project was embarrassingly parallel).

I don't really understand what you meant, but you can have a kernel that generates data, and after that another kernel that works on the data generated by the first, if that was your question ;)

I'm referring to the first part (where, at least for the moment, the problem seems too linear, dependent on calculating everything that comes before, and can't be broken up so that each block or unit of processing is handled independently)…

So, I know, for example, that the Fibonacci sequence is recursive and can be transformed into an iterative form, but if we follow the normal way (generating the sequence from the beginning up to n), how can this generation be parallelised?

I mean, “my problem” is this: say I want to generate only 1, 1, 2, 3, 5, 8, 13, 21. That is 8 numbers, so I divide the problem into a: 1 to 3 and b: 5 to 21, but I still need to generate a before starting on b. At longer ranges (say one part generates 512 numbers, the next part the following 512, and so on), how can this type of problem, which to me looks very linear and dependent on previously calculated data, be done in parallel?

So, as I see it, generating the sequence from the first elements 1, 1 and then continuing to add doesn't seem suitable for parallelising. Likewise, I can take Pascal's triangle (not a single line) and generate the pyramid with each line depending on the previous one (I mean, the calculation seems sequential and doesn't look like it can be done in parallel).

Sorry for not being able to explain it correctly, but I'm not a native English speaker and I'm new to this way of solving problems in parallel; I hope I managed it this time.

mmm, I'm doing this just for the “sake” of it, out of curiosity… :blink:

I don't think you can easily generate Fibonacci numbers in parallel. Those you would generate on the CPU, move to the GPU and process afterwards. There are some things not easily done on a GPU, I'm afraid ;)