I am currently working with a segmentation model whose output blob dimensions are [640, 480, 26]. Each pixel contains 26 class probabilities. For every pixel I take the index of the maximum probability and then replace that class label with its RGB color, which converts the output to [640, 480, 3]. I do this by looping over the output pointer.
This is very inefficient, and I am also unable to convert the resulting 3D vector to a dwImageGL for rendering. The conversion I want is shown below:
std::vector<std::vector<std::vector<uint32_t>>> mask_vector <-> dwImageGL mask_gl;
I would like to know of a method that is fast and more tightly integrated into the DriveWorks framework.
Could you please confirm whether the platform you are using is the DRIVE AGX Orin Devkit?
As this post-processing is specific to your network, you need to implement it according to your requirements. I am assuming the post-processing step is happening on the CPU now. How about using a GPU kernel to convert [640, 480, 26] to [640, 480, 3]?
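To make the suggested conversion concrete, here is a minimal CPU reference sketch in C++ of the per-pixel work (argmax over the class scores, then a palette lookup). It is not DriveWorks code: the names Rgba, argmaxClass and colorizeMask are hypothetical, and it assumes a planar, class-major float buffer, i.e. scores[c * height * width + y * width + x]. In a GPU kernel, the body of the pixel loop would run once per thread. The result is written into one flat RGBA byte buffer rather than nested vectors, since a flat buffer is generally easier to hand to an image or texture upload.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one RGBA color per class.
struct Rgba { std::uint8_t r, g, b, a; };

// Index of the highest score for one pixel.
// Assumes planar (class-major) layout: scores[c * planeSize + pixel].
inline int argmaxClass(const float* scores, int numClasses,
                       std::size_t planeSize, std::size_t pixel)
{
    int best = 0;
    float bestScore = scores[pixel];
    for (int c = 1; c < numClasses; ++c) {
        const float s = scores[static_cast<std::size_t>(c) * planeSize + pixel];
        if (s > bestScore) { bestScore = s; best = c; }
    }
    return best;
}

// Converts class scores to a flat RGBA image (height * width * 4 bytes).
// Assumes palette.size() >= numClasses.
std::vector<std::uint8_t> colorizeMask(const float* scores, int numClasses,
                                       int height, int width,
                                       const std::vector<Rgba>& palette)
{
    const std::size_t planeSize = static_cast<std::size_t>(height) * width;
    std::vector<std::uint8_t> rgba(planeSize * 4);
    for (std::size_t p = 0; p < planeSize; ++p) {
        const Rgba color = palette[argmaxClass(scores, numClasses, planeSize, p)];
        rgba[4 * p + 0] = color.r;
        rgba[4 * p + 1] = color.g;
        rgba[4 * p + 2] = color.b;
        rgba[4 * p + 3] = color.a;
    }
    return rgba;
}
```

If the buffer turns out to be interleaved per pixel instead, only the index expression inside argmaxClass would need to change.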
Dear @SivaRamaKrishnaNV,
Before getting into the post-processing, I would like to know whether the output I get from dwDNN_inferRaw has the same dimensions as the output blob, or whether it is a linear vector that I will have to convert into a 3D array. I am asking because I only have an outputDevice pointer pointing to the output in CUDA memory, and I have no way to validate its dimensions.
My output blob dimensions are [26, 480, 640]. Is this a pointer to a dwImageCUDA or something similar?
The dOutput parameter of dwDNN_inferRaw is an array of output buffer pointers. If your network has two output buffers, this parameter holds two pointers. Each output pointer points to CUDA device pointer(
As I understand it, the output buffer in your case should be linear memory of size 26*480*640. You can access it directly inside a CUDA kernel for verification or copy to CPU buffer using
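As a small illustration of what "linear memory of size 26*480*640" implies for indexing, the sketch below computes the element count and the offset of element (c, y, x). The names BlobDims, elementCount and chwOffset are hypothetical, and the sketch assumes the buffer is tightly packed in [channel, row, column] order, which is an assumption to verify rather than something DriveWorks guarantees here.

```cpp
#include <cstddef>

// Hypothetical description of the blob shape, in [C, H, W] order.
struct BlobDims { int channels; int height; int width; };

// Total number of elements in a tightly packed blob.
constexpr std::size_t elementCount(BlobDims d)
{
    return static_cast<std::size_t>(d.channels) * d.height * d.width;
}

// Offset (in elements, not bytes) of (c, y, x) in a packed CHW buffer.
constexpr std::size_t chwOffset(BlobDims d, int c, int y, int x)
{
    return (static_cast<std::size_t>(c) * d.height + y) * d.width + x;
}
```

For the [26, 480, 640] blob this gives 7,987,200 elements, so a host buffer for the copy would need that many elements (times 4 bytes each if the output is 32-bit float, which is also an assumption).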
@SivaRamaKrishnaNV The output is stored in linear memory. But is that linear memory contiguous, or is some pitch/stride/offset applied to it? If there is, how can I find the pitch/stride/offset of the model output?
As I understand it, the output should be a single array of size 26*480*640. Please check this by accessing it inside a CUDA kernel or by copying it to the CPU for verification.
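One possible way to do that verification on a host copy is sketched below in C++. It relies on the statement earlier in the thread that each pixel holds 26 probabilities: if the network output is normalized per pixel (an assumption, e.g. a softmax output) and the buffer really is packed in [class, row, column] order, the class scores of every pixel should sum to roughly 1. The function name fractionNormalized and the tolerance are hypothetical choices. A low result hints that the assumed layout, size or stride is wrong; a high result is consistent with a packed layout but is not a formal proof.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Fraction of pixels whose class scores sum to ~1, reading the host copy
// as a packed planar buffer: host[c * height * width + y * width + x].
// Returns 0.0 if the buffer size does not match the assumed dimensions.
double fractionNormalized(const std::vector<float>& host, int numClasses,
                          int height, int width, float tolerance = 1e-3f)
{
    const std::size_t planeSize = static_cast<std::size_t>(height) * width;
    if (planeSize == 0 ||
        host.size() != planeSize * static_cast<std::size_t>(numClasses)) {
        return 0.0;
    }
    std::size_t normalized = 0;
    for (std::size_t p = 0; p < planeSize; ++p) {
        float sum = 0.0f;
        for (int c = 0; c < numClasses; ++c) {
            sum += host[static_cast<std::size_t>(c) * planeSize + p];
        }
        if (std::fabs(sum - 1.0f) <= tolerance) {
            ++normalized;
        }
    }
    return static_cast<double>(normalized) / static_cast<double>(planeSize);
}
```

For the blob in this thread the call would be fractionNormalized(hostCopy, 26, 480, 640), where hostCopy is a hypothetical vector holding the data copied back from the device.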