PTXAS Fatal: Memory Allocation Failure

I’m trying to build/run even the simplest CUDA app with no success.

I’ve installed the 7.5 toolkit on Win10, running VS 2013. I create a new CUDA project and paste any one of the Thrust example apps into it. It compiles just fine (a bunch of Thrust warnings, but it compiles and links). When I go to run it (again, this is ANY sample app), it takes forever and finally says “PTXAS Fatal: Memory Allocation Failure”. I’ve stepped into the code, and it happens on the first line that creates any variable.

Again, I’ve tried this with several different samples. Anyone have any idea what’s up?

In all my years of using CUDA, I don’t think I have ever seen that. My first thought was “corrupt installation”. Did you have a previous version of CUDA installed on this machine? I assume this is a 64-bit Windows system with plenty of system memory? Are you able to successfully run trivial CUDA samples that don’t use Thrust?

The fact that the error occurs when you run the app suggests that JIT compilation is being used, which would indicate that your build did not specify the target GPU architecture appropriate for your GPU. What is your GPU, and what target architecture do you specify in your build (e.g., the -arch switch)?

Depending on how much code is being JIT compiled (and I could imagine it’s a lot when you use Thrust), program startup could take a long time due to JIT compilation overhead. I am not aware of any code size limitation on the JIT compiler; what you see may indicate a bug, for which a bug report should be filed. Are you using the latest driver package (the JIT compiler is part of the CUDA driver)?
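As a side note: when JIT compilation does happen, the driver caches the compiled kernels between runs, so only the first launch should pay the full cost. The cache can be tuned with environment variables documented for the CUDA driver (a sketch; the values shown are examples, and on Windows you would use `set` instead of `export`):

```shell
# Enlarge the JIT compute cache (size in bytes) so large Thrust apps
# don't evict each other's compiled kernels:
export CUDA_CACHE_MAXSIZE=1073741824
# Optionally relocate the cache directory:
export CUDA_CACHE_PATH=/tmp/cuda-cache
# Or disable caching entirely while diagnosing JIT problems:
export CUDA_CACHE_DISABLE=1
```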

Are you building a debug project or a release project?

Are you building a 32-bit app or a 64-bit app?

What is the GPU you are running on?

Have you specified the correct GPU arch in your compilation command?

Thank you both for your help. More information:

I downloaded the very latest CUDA from NVIDIA and installed it on a virgin installation of VS 2013. I open up VS 2013, click on “New Project”, select “NVIDIA CUDA 7.5 Project”, and give it a name. It creates a new project, with a file called “kernel.cu” that has some basic CUDA sample code in it.

If I simply click on “run”, it builds, and runs fine. Does some simple demo and a printf. Works OK.

If, however, I select all in this .cu file and paste in ANY of the Thrust examples (from the Thrust GitHub repository), what I describe happens. It builds to completion, although there is a flurry of warnings from within Thrust that say “decorated name length exceeded, name was truncated”. I haven’t a clue what that means, but the names involved are “yuge”. For example, here’s one:

‘thrust::detail::cons<T0,thrust::detail::cons<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::pointer<int,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,thrust::permutation_iterator<thrust::detail::normal_iterator<thrust::device_ptr>,thrust::detail::normal_iterator<thrust::pointer<int,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>>,thrust::transform_iterator<thrust::system::cuda::detail::reduce_by_key_detail::tuple_and,thrust::zip_iterator<thrust::tuple<thrust::transform_iterator<thrust::detail::tail_flags<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::pointer<int,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,thrust::detail::normal_iterator<thrust::pointer<bool,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>,thrust::equal_to<thrust::tuple<int,bool,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>,bool,int>::tail_flag_functor,thrust::counting_iterator<IndexType,thrust::use_default,thrust::use_default,thrust::use_default>,thrust::use_default,thrust::use_default>,thrust::detail::normal_iterator<thrust::pointer<bool,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>,thrust::use_default,thrust::use_default>,thrust::permutation_iterator<thrust::detail::normal_iterator<thrust::device_ptr>,thrust::detail::normal_iterator<thrust::pointer<int,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>>,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>,thrust::detail::cons<thrust::detail::wrapped_function<thrust::detail::binary_transform_if_functor<thrust::plus,thrust::identity>,void>,thrust::detail::cons<int,thrust::detail::map_tuple_to_consthrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type::type>>>>::cons’

Again, lord only knows what that all means.

TXBob, this is, apparently, a 32-bit debug build of the app (I tried a 64-bit build; no change). I’m running the latest Win10 updates, Win10 Pro 64-bit OS on an x64 processor. It is a good-sized machine with an i7, 16 GB of memory, and two EVGA GeForce GTX 980s in it. I’m not specifying the GPU architecture; it’s the default, and the gencode string is -gencode=arch=compute_20,code=\"sm_20,compute_20\", which appears to match this architecture.

Thanks for any help.
Chris

Further research into this “decorated name length” warning suggests that it’s harmless and only causes debugging issues. Not sure why I’d be the first/only one to see it. No one else gets this?

GTX 980 is sm_52, not sm_20 (the compiler uses the least capable supported architecture as the default). By specifying the correct architecture, you will avoid JIT compilation from PTX.
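For example, on the nvcc command line the flag would look like the sketch below (this assumes a GTX 980, compute capability 5.2; in Visual Studio the equivalent value goes in the project’s CUDA C/C++ → Device → Code Generation property):

```shell
# Generate native machine code for sm_52 so no JIT happens at startup;
# the second -gencode also embeds compute_52 PTX for forward compatibility
# with GPUs newer than the 980.
nvcc -gencode arch=compute_52,code=sm_52 \
     -gencode arch=compute_52,code=compute_52 \
     kernel.cu -o app
```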

Thanks. So what exactly do I want there? compute_20,sm_52? Or…?

Thanks so much, I’m a newbie at CUDA – obviously :)

Cool! I changed to compute_52,sm_52 and it works!

Great. Thanks for your help. Weird that I got the error in the first place? But at least I’m moving ahead.

Further, note that I played around, and it seems that as long as I target 5.x or later, it works.

My issue has been that my eventual target architecture is the Jetson TX1, and I want to mimic that during development. So I’ll go for 5.3 (the TX1’s level).

Thanks SO much for your help.
Chris

Thanks! It’s great that this post helped me solve the same issue I had. I changed “compute_20,sm_20” to “compute_52,sm_52” and it still didn’t work. I failed to find the compute_ and sm_ values for my GPU (GTX850), so as a wild guess I changed to “compute_50,sm_50” instead, and the problem was solved!

So may I check how I can find the appropriate/optimal setting for my GPU? Thanks!

Cheers
Penney

IMHO, the easiest way to find your GPU’s compute capability is to consult the handy table in Wikipedia (https://en.wikipedia.org/wiki/CUDA), which shows the GTX 850M to be a device with compute capability 5.0.