Passing batched input through 2 DNNs sequentially

Hi, I am trying to feed two DNNs sequentially with batched input, batch size 6. The first model takes inputs from 6 different sources and outputs 2 tensors with shapes Nx128x128x128 and Nx256x64x64. Both DNN bin files were created for batch size = 6. I then want to feed these two tensors to the second DNN as input. I can pass input with N (batch size) = 6 through the first model, but while passing the output of the first DNN to the second as input I get a runtime error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  [2020-09-01 11:19:25] CUDA Error an illegal memory access was encountered executing CUDA function:
 cudaMemcpy(m_dnnOutputsHost[0].get(), m_dnnOutputsDevice_fsDecoder[0],sizeof(float32_t) * m_totalSizesOutput[2], cudaMemcpyDeviceToHost)
 at /home/user/agx_ws/src/agx/driveworks_drivers/src/camera/SimpleCameraBasedDetection.cpp:506
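
For reference, my back-of-the-envelope check of the two output buffer sizes at batch size 6 (assuming float32 elements; the constant names below are just for illustration):

#include <cstddef>
#include <cstdint>

// Expected element counts of the first DNN's two output blobs at batch size 6
constexpr uint32_t kBatchSize = 6U;
constexpr std::size_t kOut0Elems = kBatchSize * 128ULL * 128ULL * 128ULL; // 12,582,912 elements, ~48 MiB as float32
constexpr std::size_t kOut1Elems = kBatchSize * 256ULL * 64ULL * 64ULL;   //  6,291,456 elements, ~24 MiB as float32
// Each device output buffer needs at least kOut*Elems * sizeof(float) bytes
// to hold a full batch-6 output.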

The code section that I use for inference is as follows:

// Prepare the batched input (6 images) for the first DNN
CHECK_DW_ERROR(dwDataConditioner_prepareData(m_dnnInputDevice, inputImgs, 6, &m_detectionRegion,
                                             cudaAddressModeClamp, m_dataConditioner));

// Run the first DNN (backbone) on the output of the data conditioner
cudaEvent_t start, start_2, stop_1, stop_2;
cudaEventCreate(&start);
cudaEventCreate(&stop_1);
cudaEventCreate(&stop_2);
cudaEventRecord(start);

CHECK_DW_ERROR(dwDNN_infer(m_dnnOutputsDevice_cnnBackbone, &m_dnnInputDevice, 6U, m_dnn[0]));

cudaEventRecord(stop_1);
cudaEventSynchronize(stop_1);
cudaEventCreate(&start_2);
cudaEventRecord(start_2);

// Feed the backbone outputs directly to the second DNN (decoder) as its inputs
auto fslnDecoderInput = static_cast<const float32_t* const*>(m_dnnOutputsDevice_cnnBackbone);
CHECK_DW_ERROR(dwDNN_infer(m_dnnOutputsDevice_fsDecoder, fslnDecoderInput, 6U, m_dnn[1]));
cudaEventRecord(stop_2);
cudaEventSynchronize(stop_2);

When I change the batch size to 3 for the second dwDNN_infer call, it works fine:

CHECK_DW_ERROR(dwDNN_infer(m_dnnOutputsDevice_fsDecoder, fslnDecoderInput, 3U, m_dnn[1])); 

Could you give me some clues to help solve this problem? Thanks in advance.

Hi,

It looks like you are using the DriveWorks API.
May I know which system you are using? DRIVE or Jetson?

Thanks.

I am using DRIVE AGX, but I am getting the mentioned error on my host machine. After getting it running on the host machine, I will run it on the AGX.

Hardware Platform: [DRIVE AGX Xavier™ Developer Kit, DRIVE AGX Pegasus™ Developer Kit]
Software Version: [DRIVE Software 10]
Host Machine Version: [Ubuntu 18.04]
SDK Manager Version: [1.0.1.5538]

Thanks.

Moving this topic to the DRIVE forum for resolution.

Dear @mugurcal,
As per the error, you are accessing memory illegally. Could you check whether the allocated memory size and the size you are trying to copy in cudaMemcpy() are the same?
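
To make that check explicit, something like this minimal sketch could help (hypothetical sizes and names, plain CUDA runtime calls, not your actual code): derive the byte count once and use the same value for both cudaMalloc and cudaMemcpy.

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Report a CUDA runtime error, if any
static void checkCuda(cudaError_t status, const char* what)
{
    if (status != cudaSuccess)
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
}

int main()
{
    // Single source of truth for the buffer size: batch 6, one 256x64x64 output blob
    const size_t outBytes = 6UL * 256UL * 64UL * 64UL * sizeof(float);

    float* dOut = nullptr;
    checkCuda(cudaMalloc(&dOut, outBytes), "cudaMalloc");

    // ... inference would write into dOut here ...

    // Copy back with exactly the same byte count that was allocated
    std::vector<float> hOut(outBytes / sizeof(float));
    checkCuda(cudaMemcpy(hOut.data(), dOut, outBytes, cudaMemcpyDeviceToHost), "cudaMemcpy");
    checkCuda(cudaFree(dOut), "cudaFree");
    return 0;
}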

I have checked all the memory sizes; there is no difference between the allocated sizes and the sizes I am trying to copy with cudaMemcpy.

After adding another model (I now have 3 DNNs; the 2nd and 3rd take their inputs from the 1st DNN's outputs), I got another error with cudaErrorIllegalAddress. I am able to run the 3 DNNs sequentially with the 1st and 3rd at batch size = 6 and the 2nd at batch size = 3. When I tried to run all of them with batch size = 6, I got the following error messages:

[07-09-2020 16:23:32] engine.cpp (212) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
[07-09-2020 16:23:32] engine.cpp (212) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
[07-09-2020 16:23:32] Driveworks exception thrown:  Line: , error cudaErrorIllegalAddress: an illegal memory access was encountered/dvs/git/dirty/gitlab-master_av/dw/sdk/src/dwshared/dwfoundation/dw/cuda/GraphRecorder.hpp:154

terminate called after throwing an instance of 'std::runtime_error'
  what():  [2020-09-07 16:23:32] DW Error DW_CUDA_ERROR executing DW function:
 dwDNN_infer(m_dnnOutputsDevice_odDecoder, DecoderInput, 6U, m_dnn[2])
 at /home/usr/agx_ws/src/driveworks_drivers/src/camera/SimpleCameraBasedDetection.cpp:545

Maybe this will give some clue.
Thanks.

Dear @mugurcal,
Could you check running each DNN alone with dummy data and see if the memcpy hits the issue? Check copying each DNN's output after inference for verification. Please insert cudaGetLastError API calls at suspected places in the code to know exactly where it is failing.
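
For illustration only (the helper below is a placeholder, not DriveWorks code), the cudaGetLastError checks could be wrapped like this and called right after each dwDNN_infer:

#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: synchronize, then report any pending CUDA error
static void reportLastCudaError(const char* where)
{
    cudaDeviceSynchronize();              // make sure asynchronous work has finished
    cudaError_t err = cudaGetLastError(); // query the last error raised on this thread
    if (err != cudaSuccess)
        std::fprintf(stderr, "[%s] CUDA error: %s\n", where, cudaGetErrorString(err));
}

// Usage between the suspected calls, e.g.:
// dwDNN_infer(..., m_dnn[0]);
// reportLastCudaError("after DNN[0] infer");
// dwDNN_infer(..., m_dnn[1]);
// reportLastCudaError("after DNN[1] infer");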

Dear @mugurcal,
Any update on the issue?

Issue solved. Sorry, I forgot to update the post. Thanks for your interest, @SivaRamaKrishnaNV.