Hi, I am trying to feed two DNNs sequentially with batched input (batch size 6). The first model takes inputs from 6 different sources and outputs two tensors with shapes Nx128x128x128 and Nx256x64x64. Both DNN .bin files were created for batch size = 6. I then want to feed these two tensors to the second DNN as input. I can pass input with N (batch size) = 6 to the first model, but when passing the output of the first DNN to the second as input, I get a runtime error:
terminate called after throwing an instance of 'std::runtime_error'
  what(): [2020-09-01 11:19:25] CUDA Error an illegal memory access was encountered executing CUDA function: cudaMemcpy(m_dnnOutputsHost.get(), m_dnnOutputsDevice_fsDecoder, sizeof(float32_t) * m_totalSizesOutput, cudaMemcpyDeviceToHost) at /home/user/agx_ws/src/agx/driveworks_drivers/src/camera/SimpleCameraBasedDetection.cpp:506
The code section that I use to infer is as follows:
CHECK_DW_ERROR(dwDataConditioner_prepareData(m_dnnInputDevice, inputImgs, 6, &m_detectionRegion,
                                             cudaAddressModeClamp, m_dataConditioner));

// Run DNN on the output of data conditioner
cudaEvent_t start, start_2, stop_1, stop_2;
cudaEventCreate(&start);
cudaEventCreate(&stop_1);
cudaEventCreate(&stop_2);
cudaEventRecord(start);
CHECK_DW_ERROR(dwDNN_infer(m_dnnOutputsDevice_cnnBackbone, &m_dnnInputDevice, 6U, m_dnn));
cudaEventRecord(stop_1);
cudaEventSynchronize(stop_1);
cudaEventCreate(&start_2);
cudaEventRecord(start_2);
auto fslnDecoderInput = static_cast<const float32_t* const*>(m_dnnOutputsDevice_cnnBackbone);
CHECK_DW_ERROR(dwDNN_infer(m_dnnOutputsDevice_fsDecoder, fslnDecoderInput, 6U, m_dnn));
cudaEventRecord(stop_2);
cudaEventSynchronize(stop_2);
When I change the batch size to 3 for the second dwDNN_infer call, it works fine:
CHECK_DW_ERROR(dwDNN_infer(m_dnnOutputsDevice_fsDecoder, fslnDecoderInput, 3U, m_dnn));
Can you give me a clue on how to solve this problem? Thanks in advance.