CUDA Toolkit and SDK 2.3 betas available to registered developers

That’s understandable. Just to be clear, it will only suppress warnings within files in that directory, not globally. You wouldn’t be able to use libc without that option.

My program that uses FFT has become slow.

Windows Vista and XP, 8800 GTS, 512 MB, compiled for SM 1.0.

Great news! Could you provide us with some code samples?

Is it possible to allocate and memcpy a half_float* array?

CUDA doesn’t have a built-in half-float type, so basically you allocate the data using the unsigned short type (which is also 16 bits wide), and then convert it to float on reading using the new intrinsics.
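For example (an untested sketch; I’m assuming the conversion intrinsics are named __half2float() and __float2half_rn(), with unsigned short as the storage type):

__global__ void scale_halfs(unsigned short *data, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = __half2float(data[i]);   // fp16 -> fp32 on read
        data[i] = __float2half_rn(x * s);  // fp32 -> fp16 (round-to-nearest) on write
    }
}

// Host side: nothing special, it's just a 16-bit buffer.
unsigned short *d_data;
cudaMalloc((void **)&d_data, n * sizeof(unsigned short));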

If you want to use half-precision on the host, I would recommend using the half class included in the OpenEXR distribution (which is compatible with GPU halfs):
http://www.openexr.com/

I’m trying to run the profiler from the 2.3 beta (Fedora 10, x86_64). Whenever I try running one of my own programs, the profiler runs it successfully, but then reports “Error reading profiler output.” I can see a bunch of .csv files being produced, though. And if I try running one of the SDK examples, it is profiled successfully. Has anyone else seen a similar problem?

I’ve tried adding cudaThreadExit() to the end of my program, and I’ve also got libstdc++.so.6 in /usr/lib/. Also, $LD_LIBRARY_PATH points to the cudaprof bin/ directory.

Are you using the separately packaged profiler?

From the download site:

I would like to use fp32 on the host, copy the data into an fp16 array on the device, perform some computation on it, and then download the results back into fp32 on the host. How should I do that? (The idea behind this is to fit a table twice as big on the device.)

Another point: are the specs of fp16 available? Is its precision better within [-1, 1]?

I am now (2.3.09), but that hasn’t fixed the problem.

To be clear, current GPUs don’t natively perform any computation on fp16 (half) values; they can only convert quickly from fp16 to fp32, perform the computation in fp32, and then convert back to fp16. fp16 is really only useful as a storage format.
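To make the earlier upload/download question concrete, here is an untested sketch (the chunked staging buffer is just one way to do it; all names here are my own):

__global__ void float_to_half(const float *src, unsigned short *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = __float2half_rn(src[i]);
}

__global__ void half_to_float(const unsigned short *src, float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = __half2float(src[i]);
}

// Upload: cudaMemcpy each fp32 chunk from the host into a small fp32 staging
// buffer on the device, then run float_to_half to pack it into the big fp16
// table. Kernels that use the table read unsigned short and call __half2float.
// Download is the reverse: half_to_float into the staging buffer, then
// cudaMemcpy back to the host. The table costs only 2 bytes per element.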

There are details on half precision here:

http://en.wikipedia.org/wiki/Half_precision

http://www.nvidia.com/dev_content/nvopengl…float_pixel.txt

Got it! Thanks

Is there a text version of cudaprof, or is there a way to use it in text mode? (There was a text version some time in the past, I think)

THX

-JL

Yes, there has always been a way to profile from the command line. Just read the manual: /cuda_install_location/doc/CUDA_Profiler_2.2.txt
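If I remember right, it’s driven by environment variables (the text file above has the full list), along the lines of:

CUDA_PROFILE=1 CUDA_PROFILE_LOG=profile.log ./my_program
cat profile.log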

Any news on the Ocelot release?

Ben

Does CUDA 2.3 support texturing from fp16 arrays?
And linear memory?

Did anyone try this?
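For reference, here is roughly what I was planning to try (untested; I’m guessing that a 16-bit cudaChannelFormatKindFloat channel is how you declare a half texture, with the hardware promoting to fp32 on fetch):

texture<float, 1, cudaReadModeElementType> texHalf;

__global__ void fetch(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1D(texHalf, i + 0.5f);  // arrives as fp32
}

// Host side:
cudaChannelFormatDesc desc =
    cudaCreateChannelDesc(16, 0, 0, 0, cudaChannelFormatKindFloat);
cudaArray *arr;
cudaMallocArray(&arr, &desc, n, 1);
cudaMemcpyToArray(arr, 0, 0, h_halfs, n * sizeof(unsigned short),
                  cudaMemcpyHostToDevice);
cudaBindTextureToArray(texHalf, arr, desc);

I have no idea whether the same channel descriptor works with cudaBindTexture() on linear memory, hence the question.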

Hi Simon,

Could you please elaborate on where I can find some documentation on using the new intrinsics to convert from half (stored as unsigned short) to float? Is this part of CUDA 2.2?

Thanks,

Rohit

The 2.3 programming guide’s section B.9 (Time Function) is unchanged since the 2.0 docs.

This paragraph is confusing, mostly because it’s wrong: the result of clock() is not per-thread at all. A simpler paragraph would work better, something like:

There are two other unanswered questions about clock(). Do other blocks on the same SM affect clock(), or is it really per-block? (I think, but am not positive, that it’s per-block, which means the first sentence of the paragraph above should say “per block” and not “per multiprocessor”.)

Do memory latency stall waits (when all threads are paused) get counted in clock()? (I think, but am not positive, that no, there’s no increment during those waits.)
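For reference, the measurement pattern I’m testing with (adapted from the SDK’s clock sample; each block records its own start and stop tick):

__global__ void timed_kernel(clock_t *timer /*, ... */)
{
    if (threadIdx.x == 0)
        timer[blockIdx.x] = clock();               // start tick for this block
    __syncthreads();

    // ... the code being timed ...

    __syncthreads();
    if (threadIdx.x == 0)
        timer[blockIdx.x + gridDim.x] = clock();   // stop tick for this block
}

// On the host, elapsed ticks for block b = timer[b + numBlocks] - timer[b].
// Whether those ticks advance during memory stalls is exactly what I'm unsure about.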

Will CUFFT ever support asynchronous execution and streams? It would be really useful for improving throughput :)

Also, built-in support for FFTs (1D at least) of unlimited size (i.e., limited only by memory) would be sweet. As was mentioned on these forums before, you can do big FFTs using smaller ones and a matrix transpose, so could this be built into CUFFT?
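For anyone searching later, this is the decomposition I mean, as an untested sketch: for N = n1*n2, view the input as an n2 x n1 row-major matrix, then transpose, batch-FFT, twiddle, transpose, batch-FFT, transpose (the naive transpose and the single-precision twiddle angles are for brevity only):

#include <cufft.h>
#include <cuComplex.h>

__global__ void transpose(const cuComplex *in, cuComplex *out, int rows, int cols)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < rows && c < cols)
        out[c * rows + r] = in[r * cols + c];  // tile via shared memory in practice
}

__global__ void twiddle(cuComplex *a, int n1, int n2)
{
    // a is n1 rows x n2 cols; multiply element (r, c) by exp(-2*pi*i*r*c/(n1*n2))
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < n1 && c < n2) {
        // float angles lose accuracy for large r*c; use a table at big N
        float ang = -2.0f * 3.14159265f * r * c / (float)(n1 * n2);
        a[r * n2 + c] = cuCmulf(a[r * n2 + c], make_cuComplex(cosf(ang), sinf(ang)));
    }
}

// Full transform:
// 1. transpose to n1 x n2
// 2. n1 batched length-n2 FFTs: cufftPlan1d(&plan, n2, CUFFT_C2C, n1)
// 3. twiddle multiply
// 4. transpose to n2 x n1
// 5. n2 batched length-n1 FFTs
// 6. transpose, giving X in natural order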

Finally, is there an expected date for the 2.3 release?

Can Linux kernel 2.6.30 be supported, please? Some of the header files have changed, so the driver won’t compile.

thanks much,
nicholas

I’m not sure if this has already been updated in 2.3, but something I just noticed in 2.2 is that the syntax-highlighting file (usertype.dat) doesn’t highlight __threadfence() and __threadfence_block(). Kind of small, but there you go…

Hi Simon,

Could you please explain this in more detail? Also, is this supported by the runtime API?

So I have an array of 1000 halfs. I declare an unsigned short pointer and cudaMalloc 2000 bytes. Then, when I want to use this array in a kernel, what do I use for it in the function signature? unsigned short*? If so, how do I let CUDA know that I want it to treat this array as floats?

I am sorry to be boorish, but this question has a rather urgent business need behind it, and we will soon be a big customer. Thanks.