wonghoi
November 21, 2010, 10:45pm
1
I am writing a circular buffer class (actually a struct) to run on the device (i.e., in a kernel). How can I allocate memory in the constructor? It seems I'm not allowed to allocate memory once I'm running inside a kernel, so any suggestions on how to implement this?
Thanks.
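One common workaround (before in-kernel allocation existed) is to allocate the storage from the host with cudaMalloc and hand the pointer to a struct that only does index bookkeeping on the device. A minimal sketch, with illustrative names (RingBuffer, push, etc.) that are not from the original post:

```cuda
// Device-side circular buffer over host-allocated storage.
// No allocation happens inside device code.
struct RingBuffer {
    float *data;   // storage provided by the host via cudaMalloc
    int capacity;  // number of elements
    int head;      // next write position
    int count;     // number of valid elements

    __device__ void init(float *storage, int cap) {
        data = storage; capacity = cap; head = 0; count = 0;
    }
    __device__ void push(float v) {
        data[head] = v;
        head = (head + 1) % capacity;   // wrap around
        if (count < capacity) ++count;  // saturate at capacity
    }
};

__global__ void useBuffer(float *storage, int cap) {
    RingBuffer rb;
    // Give each thread its own slice of the host-allocated array.
    rb.init(storage + threadIdx.x * cap, cap);
    rb.push(1.0f);
}
```

The host would call cudaMalloc with nThreads * cap * sizeof(float) bytes before launching the kernel.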
CUDA 3.2 makes this possible with Fermi-class hardware. Check the in-kernel malloc() support that is new in 3.2.
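For reference, a minimal sketch of what in-kernel allocation looks like. It requires compute capability 2.x hardware and compilation with -arch=sm_20 or later; the kernel name and sizes are illustrative:

```cuda
// Each thread allocates its own scratch buffer from the device heap.
__global__ void ringbufKernel(int cap) {
    float *buf = (float *)malloc(cap * sizeof(float));
    if (buf == NULL) return;  // device heap exhausted
    for (int i = 0; i < cap; ++i)
        buf[i] = 0.0f;
    free(buf);
}
```

Note that the device heap defaults to 8 MB; in the 3.2 toolkit it can be enlarged from the host with cudaThreadSetLimit(cudaLimitMallocHeapSize, bytes) before the first kernel launch.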
wonghoi
November 22, 2010, 9:06am
5
I'm using a GTS 450 (Fermi) with CUDA Toolkit 3.2, and the host is MATLAB (so I'm compiling to a PTX file first). For the two lines in my kernel,
char* ptr = (char*)malloc(123);
free(ptr);
I got:
D:/Work/Research/MAF/simulation/gpu/sysID_branch.cu(146): error: calling a host function from a __device__/__global__ function is not allowed
D:/Work/Research/MAF/simulation/gpu/sysID_branch.cu(147): error: calling a host function from a __device__/__global__ function is not allowed
Did I miss something? The same thing happens when I tried printf().
The -arch=sm_20 command-line option to nvcc is probably needed. You can see how printf should work from the printf example in the SDK.
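A hypothetical compile line for the setup described above (file name taken from the error messages; output name and other options are guesses and may differ on your system):

```shell
# Emit PTX targeting Fermi so in-kernel malloc/free and printf resolve
# to the device-side versions instead of host functions.
nvcc -arch=sm_20 -ptx sysID_branch.cu -o sysID_branch.ptx
```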
wonghoi
November 23, 2010, 7:03pm
9
After I changed from -arch=sm_13 to -arch=sm_20, I got the following error when MATLAB tried to generate the kernel from the PTX file:
??? Error using ==> parallel.gpu.CUDAKernel
An error occurred during PTX compilation of .
The information log was:
: Considering profile 'compute_20' for gpu='sm_21' in 'cuModuleLoadDataEx_136'
: Retrieving binary for 'cuModuleLoadDataEx_136', for gpu='sm_21', usage mode=' '
: Considering profile 'compute_20' for gpu='sm_21' in 'cuModuleLoadDataEx_136'
: Control flags for 'cuModuleLoadDataEx_136' disable search path
: Ptx binary found for 'cuModuleLoadDataEx_136', architecture='compute_20'
: Ptx compilation for 'cuModuleLoadDataEx_136', for gpu='sm_21', ocg options='
The error log was:
The CUDA error code was: CUDA_ERROR_INVALID_IMAGE.
Any ideas?
I see messages mentioning sm_21. Do you have an sm_21 device?
wonghoi
November 24, 2010, 6:11am
13
I have no idea. It's a GTS 450. I changed to -arch=sm_21 and got this:
??? Error using ==> parallel.gpu.CUDAKernel
An error occurred during PTX compilation of .
The information log was:
: Considering profile 'compute_20' for gpu='sm_21' in 'cuModuleLoadDataEx_139'
: Retrieving binary for 'cuModuleLoadDataEx_139', for gpu='sm_21', usage mode=' '
: Considering profile 'compute_20' for gpu='sm_21' in 'cuModuleLoadDataEx_139'
: Control flags for 'cuModuleLoadDataEx_139' disable search path
: Ptx binary found for 'cuModuleLoadDataEx_139', architecture='compute_20'
: Ptx compilation for 'cuModuleLoadDataEx_139', for gpu='sm_21', ocg options='
The error log was:
The CUDA error code was: CUDA_ERROR_INVALID_IMAGE.
The deviceQuery example from the SDK can tell you the compute capability of your card. My guess is that your GPU is not 2.x-capable, so it cannot do in-kernel malloc on the device.
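Besides the SDK's deviceQuery sample, a minimal standalone check via the runtime API looks roughly like this (assuming device 0 is the card in question):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the name and compute capability of device 0.
int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}
```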
wonghoi
November 24, 2010, 7:40am
17
Here’s my GPU’s device query:
gpuDevice
ans =
parallel.gpu.CUDADevice handle
Package: parallel.gpu
Properties:
Name: 'GeForce GTS 450'
Index: 1
ComputeCapability: '2.1'
SupportsDouble: 1
DriverVersion: 3.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [65535 65535]
SIMDWidth: 32
TotalMemory: 1.0417e+009
FreeMemory: 996872192
MultiprocessorCount: 4
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
DeviceSupported: 1
DeviceSelected: 1
Seems like I have compute capability 2.1.
Thanks.
Then I would file a bug report with MathWorks (support@mathworks.com; their support is great).