[SOLVED] Jetson TK1 Max GPU but still slow?

I made a lane detection system. I'm running it on video (video processing) and it only reaches 2 fps. I tried using pycuda; my tegrastats output looks like this:

RAM 1088/1892MB (lfb 3x4MB) cpu [9%,38%,15%,19%]@2065 EMC 9%@924 AVP 0%@204 VDE 120 GR3D 99%@852 EDP limit 0

I have to solve this problem. Please help me.

Example code with pycuda:

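# Assumed imports (not shown in the original post)
import cv2
import numpy as np
import pycuda.autoinit  # creates the CUDA context
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

hls = cv2.cvtColor(frame, cv2.COLOR_RGB2HLS).astype(np.uint8)  # converted to HLS color format and separated the channels
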
    hls_gpu = cuda.mem_alloc(hls.nbytes)
    cuda.memcpy_htod(hls_gpu, hls)
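    mod6 = SourceModule("""
        // hls is uint8 on the host, so the kernel takes unsigned char*
        __global__ void hls(unsigned char *hls_gpu)
        {
            // with block=(3,3,1) only 9 threads run, touching indices 0..8
            int idx7 = threadIdx.x + threadIdx.y * 3;
            hls_gpu[idx7] *= 1;
        }
        """)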

    func6=mod6.get_function("hls")
    func6(hls_gpu, block=(3,3,1))
    gpuhls = np.empty_like(hls)
    cuda.memcpy_dtoh(gpuhls , hls_gpu)

Does tegrastats always show 99% GPU usage? Does all the overhead lie in the CUDA memcpy and the kernel?
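
One way to answer that is to time each stage of the per-frame loop separately. Below is a minimal sketch, not taken from the original code: StageTimer, fake_upload and fake_kernel are hypothetical names, and the sleeps are placeholders for the real memcpy and kernel calls. GPU kernel launches are asynchronous, so with pycuda the timed function would need to call cuda.Context.synchronize() before returning, otherwise only the launch is measured.

```python
import time
from collections import defaultdict


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def measure(self, name, func, *args, **kwargs):
        # Run func, add its elapsed time to the stage total, pass the result through.
        start = time.perf_counter()
        result = func(*args, **kwargs)
        self.totals[name] += time.perf_counter() - start
        self.counts[name] += 1
        return result

    def report(self):
        # Mean milliseconds per call for each stage, slowest stage first.
        rows = [(name, 1000.0 * self.totals[name] / self.counts[name])
                for name in self.totals]
        return sorted(rows, key=lambda row: row[1], reverse=True)


def fake_upload():
    time.sleep(0.002)  # placeholder for cuda.memcpy_htod(...)


def fake_kernel():
    time.sleep(0.001)  # placeholder for the kernel call plus a synchronize


timer = StageTimer()
for _ in range(5):  # stands in for the per-frame while loop
    timer.measure("upload", fake_upload)
    timer.measure("kernel", fake_kernel)

for name, ms in timer.report():
    print("%-8s %.2f ms per frame" % (name, ms))
```

Wrapping the SourceModule(...) compile and the cv2 calls in measure() the same way would show whether they, rather than the memcpy, take most of each frame.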

Hi mustafamertunali,

Have you identified the cause and resolved the problem?
Is there any further information you can share?

Thanks

Of course, I can share. I solved the problem by restructuring the code: I removed my own helper functions and called the cv2 functions directly inside the while loop.

Old code:

def a(frame):
    hls = cv2.cvtColor(frame, cv2.COLOR_RGB2HLS).astype(np.uint8)
    return hls  # return the converted frame, not the function itself

result = a(frame)
cv2.imshow('Serit Takip Sistemi', result)

It was so slow… So I changed the code like this:

hls = cv2.cvtColor(frame, cv2.COLOR_RGB2HLS).astype(np.uint8)
cv2.imshow('Serit Takip Sistemi', hls)

This isn't exactly what I did, but it is an easy example: I had a lot of functions, so I removed them and called the cv2 functions directly, as in the last snippet. Sorry for my English, I thought this was a better way to explain. :)
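
To check how much of the difference comes from the function wrapper itself, the two variants can be timed side by side. This is only a sketch: convert is a hypothetical stand-in for cv2.cvtColor, and the list is a placeholder for a real frame, so the numbers will not match the lane detection code.

```python
import timeit


def convert(frame):
    # Hypothetical stand-in for cv2.cvtColor(...): fixed per-frame work.
    return [255 - v for v in frame]


def wrapped(frame):
    # Same work, reached through one extra Python function call.
    return convert(frame)


frame = list(range(256)) * 40  # placeholder "frame" of 10240 values

runs = 200
t_direct = timeit.timeit(lambda: convert(frame), number=runs) / runs
t_wrapped = timeit.timeit(lambda: wrapped(frame), number=runs) / runs

print("direct : %.3f ms per frame" % (t_direct * 1000))
print("wrapped: %.3f ms per frame" % (t_wrapped * 1000))
```

If the two timings come out close, the speedup probably came from something else that changed when the functions were removed, for example work that was being repeated inside them on every frame.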

Edit: It is video processing.

Thanks for sharing.