CUDA 6.5 segfaults on ARMv8

Dear all,

I have an ARMv8 machine with an NVIDIA Tesla K40, running CentOS 7.1 with CUDA 6.5 (aarch64 version).
I have a problem when I call cudaHostAlloc(): it fails with cudaErrorUnknown and the program then segfaults. It seems to me CUDA is not able to allocate pinned memory on the host.
I see the same problem in the bandwidthTest CUDA Sample (log below) and in the simple test program below.
The page size on this CentOS install is 64K, and I wonder whether that could interfere with the call.
Does anyone have any idea what the origin of the problem could be?
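For reference, the 64K page size can be confirmed from userspace with this small C program (sysconf(_SC_PAGESIZE) is POSIX, so it runs on any Linux machine; the value it prints will of course differ between the x86 and aarch64 boxes):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* sysconf(_SC_PAGESIZE) reports the system page size in bytes */
    long page = sysconf(_SC_PAGESIZE);
    printf("page size: %ld bytes\n", page);
    return 0;
}

On the aarch64 machine this prints 65536; on a typical x86 machine it prints 4096.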

Thank you all.

bandwidthTest CUDA Sample log

[root@localhost bandwidthTest]# ./bandwidthTest
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Tesla K40t
Quick Mode

CUDA error at code=30(cudaErrorUnknown) "cudaHostAlloc((void **)&h_odata, memSize, (wc) ? cudaHostAllocWriteCombined : 0)"

Simple test program

#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
    int *in_h;
    const int length = 1000;
    //in_h = (int *) malloc(length*sizeof(int)); //works
    cudaError_t err = cudaHostAlloc((void**)&in_h, length*sizeof(int), cudaHostAllocDefault);
    printf("Error status is %s\n", cudaGetErrorString(err));
    for (int i = 0; i < length; i++)
        in_h[i] = 2; // Segfaults here
    return 0;
}

Output of simple test program

[root@localhost]# nvcc -target-cpu-arch ARM -o test test.cu
[root@localhost]# ./test
Error status is unknown error
Segmentation fault

This is expected, and it has nothing to do with CUDA. Since the memory allocation fails, in_h is an uninitialized pointer, and it is exceedingly likely that it points to memory that does not belong to the process. Therefore, accessing memory through that pointer will cause an access violation, which in Unix-like operating systems is reported as a segfault.

In any C-like language that provides explicit pointers that can point anywhere in memory space, it is the programmer’s responsibility to make sure pointers are valid before they are dereferenced. In particular, that means checking whether the allocations intended to initialize those pointers succeeded, and dealing with failures in an orderly fashion.
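To illustrate that discipline, here is a minimal host-only sketch (plain malloc, no CUDA needed, and my own code rather than anything from the thread) of checking an allocation before dereferencing it:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int length = 1000;
    int *in_h = (int *)malloc(length * sizeof(int));
    if (in_h == NULL) {            /* bail out instead of dereferencing */
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (int i = 0; i < length; i++)
        in_h[i] = 2;               /* safe: the pointer is known to be valid */
    printf("allocation succeeded, in_h[0] = %d\n", in_h[0]);
    free(in_h);
    return 0;
}

The same check applies verbatim to cudaHostAlloc(): if err is not cudaSuccess, the pointer must not be touched.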

cudaHostAlloc() is just a thin wrapper around OS API calls, in particular mmap() on Linux.
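That layer can be probed directly without CUDA. The sketch below (my own, not taken from the CUDA sources) mmaps an anonymous region and tries to mlock it, which exercises roughly the same OS machinery that pinned allocations depend on; if this fails, the problem is below CUDA:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096;  /* one x86 page; a 64K-page kernel rounds this up */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        fprintf(stderr, "mmap failed: %s\n", strerror(errno));
        return 1;
    }
    if (mlock(p, len) != 0)        /* fails if locked memory is restricted */
        printf("mlock failed: %s\n", strerror(errno));
    else
        printf("mlock succeeded\n");
    munlock(p, len);
    munmap(p, len);
    return 0;
}

Note this is only an approximation: the CUDA driver pins memory through its own kernel module, so success here does not guarantee cudaHostAlloc() will work, but a failure here would be telling.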

Hi Njuffa,

I know that cudaHostAlloc() allocates memory on the host that can then be mapped by the device for zero-copy access. I’m not very experienced with CUDA, so could you give me a correct piece of code using cudaHostAlloc() for testing?


See post #10.

Hi Njuffa,

thank you for your reply.
I understand that if the memory allocation fails then we get a segfault.
What I am trying to understand is why the memory allocation fails.
That piece of code works fine on an x86 machine but not on the aarch64 machine.
Therefore I am trying to figure out whether it could be a driver issue or a kernel issue.

I would have expected cudaHostAlloc() to work natively, without problems.
But this seems not to be the case since even the CUDA sample using cudaHostAlloc() fails.
As HuyLe pointed out, could you please help us get a correct piece of code for testing cudaHostAlloc()?


PS: Sorry for the mess of multiple messages, there was an issue with my browser.

I have not used CUDA on an ARM system. I can only speculate.

What specific ARM system do you have? Maybe you are on a system version not supported by CUDA (check the Getting Started document for your operating system)? Maybe the OS is configured not to allow pinned allocations? Maybe the specific type of pinned memory you are requesting (cudaHostAllocWriteCombined) is not available on ARM? Maybe cudaHostAlloc() is not supported at all on ARM platforms?

I have a system with an APM XGene1 ARM 64-bit CPU.
I tried all the specific types of pinned memory (default, portable, mapped, write-combined), but none of them works.
I wonder if the OS is configured not to allow pinned allocation: is it possible to retrieve such information from the OS environment itself?
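One piece of that configuration can be read from a program: the sketch below queries RLIMIT_MEMLOCK via getrlimit. (Whether the CUDA driver’s pinned allocations are actually governed by this limit is an assumption on my part; the driver may pin memory through its own kernel module outside this accounting.)

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    /* RLIMIT_MEMLOCK caps how much memory the process may lock with mlock() */
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_MEMLOCK: unlimited\n");
    else
        printf("RLIMIT_MEMLOCK: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
    return 0;
}

Equivalently, `ulimit -l` in the shell reports the same limit (in kilobytes).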

Thanks again.

As I stated, I do not have any experience with CUDA on ARM platforms. The CUDA 6.5 documentation states:

“ARM-64 support has been tested on systems from Cirrascale and E4 based on the AppliedMicro X-Gene-C1 processor”

If your system is one of the two listed above, one would think all CUDA APIs should be supported, unless the documentation specifically states otherwise. If your system is not one of the ones listed above, all bets are off. I would suggest checking with the system vendor, and forums dedicated to these ARM-64 platforms and their operating systems.