Lower register count with CUDA toolkit 3.0 beta

I installed and compiled my kernel with the new 3.0 beta toolkit and observed a much lower register count.

The maximum register count is set to 42.

With CUDA 2.3: 40 registers were used.

With CUDA 3.0b: 24 registers are used.

1>ptxas info : Used 24 registers, 48+16 bytes smem, 12 bytes cmem[1]

I unrolled my kernel manually, so there is a lot of redundant code. CUDA 2.3 probably kept the calculated indexes in registers, whereas CUDA 3.0 seems to recalculate these indexes every time.

This means many calculations have to be repeated, but occupancy will be better.
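As a rough illustration of why fewer registers can improve occupancy, the sketch below computes the register-only upper bound on resident threads per multiprocessor. The hardware limits (16384 registers and 1024 resident threads per multiprocessor, as on compute capability 1.2/1.3 devices) are an assumption, since the post does not say which GPU is used, and the helper name `registerLimitedThreads` is made up for this example.

```cuda
#include <algorithm>
#include <cstdio>

// Upper bound on resident threads per multiprocessor imposed by registers
// alone. It ignores register allocation granularity, block size, and shared
// memory usage, all of which can lower the real figure.
int registerLimitedThreads(int regsPerThread, int regsPerSM, int maxThreadsPerSM)
{
    return std::min(regsPerSM / regsPerThread, maxThreadsPerSM);
}

// Prints the bound for the two register counts reported in the post.
void printRegisterLimits()
{
    const int regsPerSM = 16384;       // ASSUMPTION: compute capability 1.2/1.3 device
    const int maxThreadsPerSM = 1024;  // ASSUMPTION: same device class

    std::printf("40 registers/thread: at most %d threads per multiprocessor\n",
                registerLimitedThreads(40, regsPerSM, maxThreadsPerSM));
    std::printf("24 registers/thread: at most %d threads per multiprocessor\n",
                registerLimitedThreads(24, regsPerSM, maxThreadsPerSM));
}
```

Under these assumptions the bound rises from 409 to 682 threads per multiprocessor. Whether that outweighs the cost of the repeated index calculations depends on the kernel and has to be measured.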

Has anyone else observed this?

Emulation mode no longer works with the 3.0 beta. Device mode runs without errors.

The first CUDA call returns “unspecified launch failure in prior launch”.

I compile 32-bit code on Win XP x64. Could something be wrong there?

unspecified launch failure in prior launch

invalid argument

unspecified launch failure in prior launch

invalid argument

unspecified launch failure in prior launch

unspecified launch failure in prior launch

unspecified launch failure in prior launch

unspecified launch failure in prior launch

unspecified launch failure in prior launch

invalid argument

invalid argument

invalid device function

invalid argument

Elapsed CPU time test: 2885.3024277  msec

Press any key to continue . . .
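To find out which call first reports an error in a log like the one above, one option is to check the status of every runtime call and every kernel launch. Kernel launches are asynchronous, so a failure inside a kernel is usually reported by a later call, which is what the "in prior launch" wording refers to. The following is a minimal sketch, not the poster's code: `checkCuda`, `CHECK_CUDA`, `dummyKernel`, and `runOnce` are hypothetical names introduced for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the failing call with its source location and
// returns false if the status is anything other than cudaSuccess.
static bool checkCuda(cudaError_t status, const char *what, const char *file, int line)
{
    if (status == cudaSuccess)
        return true;
    std::fprintf(stderr, "%s:%d: %s failed: %s\n",
                 file, line, what, cudaGetErrorString(status));
    return false;
}

#define CHECK_CUDA(call) checkCuda((call), #call, __FILE__, __LINE__)

// Hypothetical placeholder kernel standing in for the real one.
__global__ void dummyKernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

// Launches the placeholder kernel and checks every step.
bool runOnce()
{
    int *dOut = 0;
    if (!CHECK_CUDA(cudaMalloc((void **)&dOut, 32 * sizeof(int))))
        return false;

    dummyKernel<<<1, 32>>>(dOut);
    bool ok = CHECK_CUDA(cudaGetLastError());        // launch and configuration errors
    ok = CHECK_CUDA(cudaDeviceSynchronize()) && ok;  // errors raised while the kernel ran
    ok = CHECK_CUDA(cudaFree(dOut)) && ok;
    return ok;
}
```

`cudaDeviceSynchronize` is the current name of the synchronization call; the toolkits discussed in this thread (2.3 and 3.0) provide `cudaThreadSynchronize` instead.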

I am having the same problem too, and it is so frustrating… I have to finish something soon and don’t have time to look into cuda-gdb, as I don’t have a computer science background.

Now that device emulation doesn’t work, I can’t test my algorithm to find out why I am not getting correct results… Until now I relied on simple printf’s in device emulation for debugging, and I was happy with that.
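One fallback that does not depend on emulation mode is to have the kernel write the values of interest into a debug buffer and print them on the host afterwards. The sketch below shows the idea only; the kernel `scaleWithDebug`, the helper `runAndCollectDebug`, and the computation itself are hypothetical stand-ins, not anyone's actual algorithm.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: alongside its real output, each thread stores one
// intermediate value in a debug buffer for inspection on the host.
__global__ void scaleWithDebug(const float *in, float *out, float *debug, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float intermediate = in[i] * 2.0f;  // value we want to inspect
        debug[i] = intermediate;            // record it for the host
        out[i] = intermediate + 1.0f;       // the kernel's actual result
    }
}

// Runs the kernel and returns the per-thread debug values.
// Returns an empty vector if any CUDA call fails.
std::vector<float> runAndCollectDebug(const std::vector<float> &input)
{
    const int n = static_cast<int>(input.size());
    const size_t bytes = n * sizeof(float);
    std::vector<float> debugHost;
    if (n == 0)
        return debugHost;

    float *dIn = 0, *dOut = 0, *dDebug = 0;
    bool ok = cudaMalloc((void **)&dIn, bytes) == cudaSuccess
           && cudaMalloc((void **)&dOut, bytes) == cudaSuccess
           && cudaMalloc((void **)&dDebug, bytes) == cudaSuccess
           && cudaMemcpy(dIn, &input[0], bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 128;
        const int blocks = (n + threads - 1) / threads;
        scaleWithDebug<<<blocks, threads>>>(dIn, dOut, dDebug, n);

        debugHost.resize(n);
        // This cudaMemcpy waits for the kernel to finish before copying.
        if (cudaMemcpy(&debugHost[0], dDebug, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
            debugHost.clear();
    }

    // cudaFree on a null pointer is a no-op, so this is safe after a partial failure.
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dDebug);
    return debugHost;
}
```

On the host, the returned vector can then be printed with ordinary `printf` calls, one line per thread index, much like the output emulation mode used to give.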

:(

N----I-----T--------I--------N,

Register as a developer (just enrol on the site; registration takes about 2 minutes. Check the cuPrintf thread that tmurray started), then get the cuPrintf package and be happy.

Really, you should just learn how to use gdb. cuda-gdb is basically the same as gdb, and I can’t imagine being anywhere close to as productive as I am currently without a decent debugger.

What’s the best way to learn cuda-gdb? It comes with a 14 page PDF which is a great start… there’s a 4 page walkthrough, but I want more of those walkthrough examples to get a feeling of how to use it to attack different kinds of problems. A video would also be useful… just a live capture of a demo on some apps so you can SEE the workflow.

Do you use gdb straight, with ddd, or in emacs?

:) lol … thanks… man.

I just registered and am waiting for the login details… I hope I get them soon (I am time-bound on a project).