Hey All!
I am trying to get a GPGPU instance up and running on Ubuntu 12.04. I’m using an Amazon EC2 G2 instance with a GRID card:
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GRID K520"
CUDA Driver Version / Runtime Version 6.0 / 6.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 4096 MBytes (4294770688 bytes)
( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock rate: 797 MHz (0.80 GHz)
Memory Clock rate: 2500 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 0 / 3
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = GRID K520
Result = PASS
I am able to compile and run the code included in the CUDA samples.
I am using PyCUDA 2013.1. As a regular user, I can run the samples included with that.
In a Python shell, I can enter a simple example line by line, and it loads, computes, and unloads from the GPU successfully.
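For context, the host-side data handling in my task is just a numpy round-trip: the image becomes a float32 array for the kernel, then comes back as uint8. That part alone, sketched with a tiny made-up array standing in for the PIL image, works fine:

```python
import numpy

# Tiny 2x2 RGB array standing in for numpy.array(image) on a PIL image.
px = numpy.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]])
px = px.astype(numpy.float32)   # the kernel expects float32 input
# ... the CUDA kernel would modify px on the device here ...
bwPx = numpy.uint8(px)          # back to 8-bit for image output
```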
My trouble comes when I attempt to do this in a task-oriented context. I am using Django as the web framework, Celery as the task manager, and a simple function decorated with the @task decorator:
import time
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy

@task
def color_shift_average(image, shift, direction="fromWhite", log=1):
    if log == 1:
        print("----------> CUDA CONVERSION")
    px = numpy.array(image)
    print px
    px = px.astype(numpy.float32)
    d_px = cuda.mem_alloc(px.nbytes)
    cuda.memcpy_htod(d_px, px)

    # Kernel grid and block size
    BLOCK_SIZE = 1024
    block = (1024, 1, 1)
    checkSize = numpy.int32(image.size[0] * image.size[1])
    grid = (int(image.size[0] * image.size[1] / BLOCK_SIZE) + 1, 1, 1)

    # Kernel text
    kernel = """
    #include <stdlib.h>
    #include <stdio.h>
    .........
    """

    # Compile and get kernel function
    mod = SourceModule(kernel)
    func = mod.get_function("foo")

    # image, L, a, b
    func(d_px, numpy.float32(-10.0), numpy.float32(0.0), numpy.float32(0.0),
         checkSize, block=block, grid=grid)

    # Get back data from GPU
    bwPx = numpy.empty_like(px)
    cuda.memcpy_dtoh(bwPx, d_px)
    bwPx = numpy.uint8(bwPx)
    .... (and so on) ...
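As an aside, the grid computation is just "enough 1024-thread blocks to cover every pixel". The same arithmetic in plain Python, written as a ceiling division rather than int(...)+1 (which launches one spare block whenever the pixel count is an exact multiple of the block size) — grid_for is a made-up helper name:

```python
BLOCK_SIZE = 1024

def grid_for(width, height, block_size=BLOCK_SIZE):
    # One thread per pixel; round the block count up so a partial final
    # block is still launched (the kernel bounds-checks against checkSize).
    n = width * height
    return ((n + block_size - 1) // block_size, 1, 1)
```

For a 640x480 image this gives (300, 1, 1), since 307200 is an exact multiple of 1024.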
I get a cuMemAlloc failed: not initialized error on the line containing
d_px = cuda.mem_alloc(px.nbytes)
Things I have tried:
- Doing the imports on a task-level basis instead of a global basis
- Decreasing concurrency to 1 thread
- Checking permissions on /dev/nv* (All 666)
- Explicitly initializing a CUDA context instead of using the autoinit.py.
import pycuda.driver as cuda

# Initialize CUDA
cuda.init()

from pycuda.tools import make_default_context
global context
context = make_default_context()
device = context.get_device()

def _finish_up():
    global context
    context.pop()
    context = None

    from pycuda.tools import clear_context_caches
    clear_context_caches()

import atexit
atexit.register(_finish_up)
- Reaching out to the PyCuda Developers/Mailing List
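One more thing I plan to try: since Celery's default prefork pool imports the task module in the parent process and then forks workers, I suspect whatever pycuda.autoinit sets up at import time is not valid in the children. The pattern I have in mind is lazy, per-process initialization inside the task body; a stdlib-only sketch with the actual CUDA calls stubbed out as comments (ensure_gpu_context is a made-up name):

```python
import os

_ctx_pid = None  # pid that owns the current (stubbed) context

def ensure_gpu_context():
    """(Re)initialize once per process: after a fork, inherited CUDA
    state belongs to the parent, so rebuild it in the worker."""
    global _ctx_pid
    if _ctx_pid != os.getpid():
        # Real code would call cuda.init() and make_default_context()
        # here, inside the worker, instead of at import time.
        _ctx_pid = os.getpid()
    return _ctx_pid
```

The task would then call ensure_gpu_context() at the top of its body, before any mem_alloc.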
Because my code runs in a stand-alone Python file (i.e., not in the Django/Celery/Apache-WSGI environment), I know it is not an issue with my kernel. I assume this is a permissions/threading/user issue, but I am unsure how to proceed with testing that assumption and fixing it. I could use some expertise here.
Thanks!
-Forrest