Xavier NX 16GB and 4 cameras with jetson-inference - some common questions

Hi!

Please take a look at my questions and try to give me some hints. I know this many questions is a lot, but maybe you can help me with a few of them?

I think these questions and answers are something other people might find useful as well.

For 8 months I have been developing a solution to detect defects on white wooden details (blanks). I have learned a lot about machine vision, Python, and Dusty's jetson-inference, but some questions have stayed on my mind; maybe someone can give me some hints.

  1. I'd like to use jetson-inference at 1920x1080 and 60 FPS with 4 cameras, but I'm getting this notice:

[gstreamer] gstBufferManager -- map buffer size was less than max size (1382400 vs 1382407)
[gstreamer] gstBufferManager recieve caps: video/x-raw, width=(int)1280, height=(int)720, framerate=(fraction)30/1, format=(string)NV12
[gstreamer] gstBufferManager -- recieved first frame, codec=raw format=nv12 width=1280 height=720 size=1382407

Any ideas how to solve this? With 1280x720 at 30 FPS and 4 cameras it works.

  2. If I watch the Jetson Power GUI while running detectnet.py (from dusty-nv/jetson-inference on GitHub), I don't see engines like "dla0", "dla1", etc. being used - they all appear to be offline. What does that mean? I'm wondering whether we are using all of the NX's resources or not (or whether these stats simply don't show the right information).

  3. In my script I only do detection - no saving, processing, or anything else - but I see GPU usage jump to 80% from time to time. It's not a problem, but I still wonder why that is:

import time
from datetime import datetime

import jetson_inference
import jetson_utils
from jetson_utils import cudaDeviceSynchronize, saveImageRGBA

# (snippet from a larger script - config, dsid, mvid, did, image_folder, etc. are defined elsewhere)

# let's configure the AI network
net = jetson_inference.detectNet(argv=['--model=/home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/ssd-mobilenet.onnx', '--labels=/home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes', '--confidence=0.7', '--input-width=1980', '--input-height=1080', '--input-rate=60'])

# let's configure the cameras
camera1 = jetson_utils.videoSource("csi://0")  # select camera 1 - capture a frame and return the cudaImage
camera2 = jetson_utils.videoSource("csi://4")  # select camera 2 - capture a frame and return the cudaImage
camera3 = jetson_utils.videoSource("csi://2")  # select camera 3 - capture a frame and return the cudaImage
camera4 = jetson_utils.videoSource("csi://1")  # select camera 4 - capture a frame and return the cudaImage

while config.run == 1:
    dsid += 1  # this tells which series of detection it is - if DSID is the same in multiple items, they were found on the same image set (images from multiple cameras)

    start_time = time.time()  # time now
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")

    try:
        img1 = camera1.Capture('rgba32f')   # let's capture an image from camera 1
        bimg1 = camera1.Capture('rgba32f')  # let's capture an image from camera 1 for saving to a file later
    except Exception:
        print("Camera 1 capture error")

    if config.detector == 1:
        dcounter = 0  # let's reset the detection counter
        detections1 = net.Detect(img1, overlay="box,labels,conf")  # overlay says how the defect is annotated on the final image
        for detection1 in detections1:
            dcounter += 1  # let's add one to the counter
            mvcam = 1  # camera ID 1
            dheight = int(detection1.Top)
            dright = int(detection1.Right)
            dleft = int(detection1.Left)
            dbottom = int(detection1.Bottom)
            dclassid = int(detection1.ClassID)
            class_name = net.GetClassDesc(dclassid)
            dconfidence = round(detection1.Confidence, 0)  # let's use only integers (no point in being too precise)
            if dclassid != 99 or emulate_empty != 1:  # save to the array if that's not an empty sighting
                filename = f'{image_folder}/{mvid}-1-{did}-{dsid}-{dclassid}.jpg'
                filename_to_db = f'/data/defect/{today}/{mvid}-1-{did}-{dsid}-{dclassid}.jpg'
                detection_array.append([mvid, mvcam, dsid, did, dcounter, dclassid, dconfidence, dleft, dheight, dright, dbottom, 0, 0, filename_to_db, "", img1, filename])  # let's add the detection to the array
                array_members = len(detection_array)  # let's find how many members are in the array
                if array_members > 500:  # let's clear if the array gets too big - don't know if this is a good number
                    detection_array.clear()  # let's clear the array
            if dclassid == 99 or emulate_empty == 1:
                empty += 1
                emptycounter += 1  # let's count how many empties we have found

    if saveimage == 1:  # if we found something, let's save the picture as well
        try:
            Mv_MakeFolders()
            cudaDeviceSynchronize()
            saveImageRGBA(filename, img1, 1280, 720)
            saveimage = 0
        except Exception:
            print(current_time + ": error saving image 1")
            continue
  4. If I use "cmake -DENABLE_NVMM=off ../", what disadvantages does that have? I don't do anything else besides detecting.

  5. I use an ONNX model. Are there any other scripts/solutions I could use for detection in Python? Basically I only want to detect frame by frame and get the coordinates, confidence % and class ID as output.

  6. Is there any good solution for testing camera parameters on Jetson with CSI cameras - brightness, saturation, resolution, etc. - while seeing the image in real time? It could also be fun to play with some application where I can toggle Jetson VPI functions like "erode" and "dilate" on and off, for example these: VPI - Vision Programming Interface: Algorithms (nvidia.com)

  7. At the moment I initialize one detection network, and in the while loop I feed it one frame from each of the 4 cameras. That way I don't use too much hardware resources. The question is: if I detect defects from different viewing angles, does that have any disadvantage? In other words, does detectNet somehow use previous detections to detect better (does it learn in real time)?

  8. Let's imagine we have 4 or even 6 Full HD 60 FPS streams - how do we detect objects from them without overloading the hardware? Any hints on how to speed up the process? I use a 512x512 detection resolution because some of the defects that need to be detected are quite small.

  9. Can you suggest how I can use NVENC0 and NVENC1 in Python to compose a real-time video stream? (Let's assume that I have a frame: img1 = camera1.Capture('rgba32f').) Any examples?

  10. Does SSD v1 or its training (I use the training that comes with jetson-inference) do some kind of augmentation? For example, does it adjust the image histogram, tilt, rotate, or apply other kinds of augmentation during training or during detection?

  11. Do you know of a carrier board for Jetson Orin or AGX that has 4 or 6 CSI connectors?

And here is an additional log:

detectNet -- loading detection network model from:
          -- prototxt     NULL
          -- model        /home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/ssd-mobilenet.onnx
          -- input_blob   'input_0'
          -- output_cvg   'scores'
          -- output_bbox  'boxes'
          -- mean_pixel   0.000000
          -- class_labels /home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/labels.txt
          -- class_colors NULL
          -- threshold    0.700000
          -- batch_size   1

[TRT]    TensorRT version 8.4.1
[TRT]    loading NVIDIA plugins...
[TRT]    Registered plugin creator - ::GridAnchor_TRT version 1
[TRT]    Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT]    Registered plugin creator - ::NMS_TRT version 1
[TRT]    Registered plugin creator - ::Reorg_TRT version 1
[TRT]    Registered plugin creator - ::Region_TRT version 1
[TRT]    Registered plugin creator - ::Clip_TRT version 1
[TRT]    Registered plugin creator - ::LReLU_TRT version 1
[TRT]    Registered plugin creator - ::PriorBox_TRT version 1
[TRT]    Registered plugin creator - ::Normalize_TRT version 1
[TRT]    Registered plugin creator - ::ScatterND version 1
[TRT]    Registered plugin creator - ::RPROI_TRT version 1
[TRT]    Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT]    Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT]    Registered plugin creator - ::BatchTilePlugin_TRT version 1
[TRT]    Could not register plugin creator -  ::FlattenConcat_TRT version 1
[TRT]    Registered plugin creator - ::CropAndResize version 1
[TRT]    Registered plugin creator - ::CropAndResizeDynamic version 1
[TRT]    Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1
[TRT]    Registered plugin creator - ::ProposalDynamic version 1
[TRT]    Registered plugin creator - ::Proposal version 1
[TRT]    Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT]    Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT]    Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT]    Registered plugin creator - ::Split version 1
[TRT]    Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT]    Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT]    Registered plugin creator - ::InstanceNormalization_TRT version 2
[TRT]    Registered plugin creator - ::CoordConvAC version 1
[TRT]    Registered plugin creator - ::DecodeBbox3DPlugin version 1
[TRT]    Registered plugin creator - ::GenerateDetection_TRT version 1
[TRT]    Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
[TRT]    Registered plugin creator - ::MultilevelProposeROI_TRT version 1
[TRT]    Registered plugin creator - ::NMSDynamic_TRT version 1
[TRT]    Registered plugin creator - ::PillarScatterPlugin version 1
[TRT]    Registered plugin creator - ::VoxelGeneratorPlugin version 1
[TRT]    Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 1
[TRT]    detected model format - ONNX  (extension '.onnx')
[TRT]    desired precision specified for GPU: FASTEST
[TRT]    requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]    [MemUsageChange] Init CUDA: CPU +181, GPU +0, now: CPU 320, GPU 4212 (MiB)
[TRT]    [MemUsageChange] Init builder kernel library: CPU +131, GPU +124, now: CPU 470, GPU 4352 (MiB)
[TRT]    native precisions detected for GPU:  FP32, FP16, INT8
[TRT]    selecting fastest native precision for GPU:  FP16
[TRT]    found engine cache file /home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/ssd-mobilenet.onnx.1.1.8401.GPU.FP16.engine
[TRT]    found model checksum /home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/ssd-mobilenet.onnx.sha256sum
[TRT]    echo "$(cat /home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/ssd-mobilenet.onnx.sha256sum) /home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/ssd-mobilenet.onnx" | sha256sum --check --status
[TRT]    model matched checksum /home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/ssd-mobilenet.onnx.sha256sum
[TRT]    loading network plan from engine cache... /home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/ssd-mobilenet.onnx.1.1.8401.GPU.FP16.engine
[TRT]    device GPU, loaded /home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/ssd-mobilenet.onnx
[TRT]    [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 358, GPU 4368 (MiB)
[TRT]    Loaded engine size: 17 MiB
[TRT]    Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[TRT]    Deserialization required 24516 microseconds.
[TRT]    [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +16, now: CPU 0, GPU 16 (MiB)
[TRT]    Total per-runner device persistent memory is 0
[TRT]    Total per-runner host persistent memory is 76480
[TRT]    Allocated activation device memory of size 13266944
[TRT]    [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +13, now: CPU 0, GPU 29 (MiB)
[TRT]    The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[TRT]
[TRT]    CUDA engine context initialized on device GPU:
[TRT]       -- layers       70
[TRT]       -- maxBatchSize 1
[TRT]       -- deviceMemory 13266944
[TRT]       -- bindings     3
[TRT]       binding 0
                -- index   0
                -- name    'input_0'
                -- type    FP32
                -- in/out  INPUT
                -- # dims  4
                -- dim #0  1
                -- dim #1  3
                -- dim #2  512
                -- dim #3  512
[TRT]       binding 1
                -- index   1
                -- name    'scores'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  1
                -- dim #1  8190
                -- dim #2  15
[TRT]       binding 2
                -- index   2
                -- name    'boxes'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  1
                -- dim #1  8190
                -- dim #2  4
[TRT]
[TRT]    binding to input 0 input_0  binding index:  0
[TRT]    binding to input 0 input_0  dims (b=1 c=3 h=512 w=512) size=3145728
[TRT]    binding to output 0 scores  binding index:  1
[TRT]    binding to output 0 scores  dims (b=1 c=8190 h=15 w=1) size=491400
[TRT]    binding to output 1 boxes  binding index:  2
[TRT]    binding to output 1 boxes  dims (b=1 c=8190 h=4 w=1) size=131040
[TRT]
[TRT]    device GPU, /home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw3/ssd-mobilenet.onnx initialized.
[TRT]    detectNet -- number of object classes: 15
[TRT]    detectNet -- maximum bounding boxes:   8190
[TRT]    loaded 15 class labels
[TRT]    detectNet -- number of object classes:  15
[TRT]    loaded 0 class colors
[TRT]    didn't load expected number of class colors  (0 of 15)
[TRT]    filling in remaining 15 class colors with default colors
[gstreamer] initialized gstreamer, version 1.16.3.0
[gstreamer] gstCamera -- attempting to create device csi://0
[gstreamer] gstCamera pipeline string:
[gstreamer] nvarguscamerasrc sensor-id=0 saturation=2 ispdigitalgainrange='1 4' exposurecompensation=0 exposuretimerange='134000 158733000' ee-mode=2 ee-strength=1 gainrange='1 3' ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, framerate=30/1, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
GST_ARGUS: NvArgusCameraSrc: Setting ISP Digital Gain Range : '1 4'
GST_ARGUS: NvArgusCameraSrc: Setting Exposure Time Range : '134000 158733000'
GST_ARGUS: NvArgusCameraSrc: Setting Gain Range : '1 3'
[gstreamer] gstCamera successfully created device csi://0
[video]  created gstCamera from csi://0
------------------------------------------------
gstCamera video options:
------------------------------------------------
  -- URI: csi://0
     - protocol:  csi
     - location:  0
  -- deviceType: csi
  -- ioType:     input
  -- codec:      raw
  -- width:      1280
  -- height:     720
  -- frameRate:  30.000000
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: rotate-180
  -- loop:       0
  -- rtspLatency 2000
------------------------------------------------
[gstreamer] gstCamera -- attempting to create device csi://4
[gstreamer] gstCamera pipeline string:
[gstreamer] nvarguscamerasrc sensor-id=4 saturation=2 ispdigitalgainrange='1 4' exposurecompensation=0 exposuretimerange='134000 158733000' ee-mode=2 ee-strength=1 gainrange='1 3' ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, framerate=30/1, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
GST_ARGUS: NvArgusCameraSrc: Setting ISP Digital Gain Range : '1 4'
GST_ARGUS: NvArgusCameraSrc: Setting Exposure Time Range : '134000 158733000'
GST_ARGUS: NvArgusCameraSrc: Setting Gain Range : '1 3'
[gstreamer] gstCamera successfully created device csi://4
[video]  created gstCamera from csi://4
------------------------------------------------
gstCamera video options:
------------------------------------------------
  -- URI: csi://4
     - protocol:  csi
     - location:  4
     - port:      4
  -- deviceType: csi
  -- ioType:     input
  -- codec:      raw
  -- width:      1280
  -- height:     720
  -- frameRate:  30.000000
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: rotate-180
  -- loop:       0
  -- rtspLatency 2000
------------------------------------------------
[gstreamer] gstCamera -- attempting to create device csi://2
[gstreamer] gstCamera pipeline string:
[gstreamer] nvarguscamerasrc sensor-id=2 saturation=2 ispdigitalgainrange='1 4' exposurecompensation=0 exposuretimerange='134000 158733000' ee-mode=2 ee-strength=1 gainrange='1 3' ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, framerate=30/1, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
GST_ARGUS: NvArgusCameraSrc: Setting ISP Digital Gain Range : '1 4'
GST_ARGUS: NvArgusCameraSrc: Setting Exposure Time Range : '134000 158733000'
GST_ARGUS: NvArgusCameraSrc: Setting Gain Range : '1 3'
[gstreamer] gstCamera successfully created device csi://2
[video]  created gstCamera from csi://2
------------------------------------------------
gstCamera video options:
------------------------------------------------
  -- URI: csi://2
     - protocol:  csi
     - location:  2
     - port:      2
  -- deviceType: csi
  -- ioType:     input
  -- codec:      raw
  -- width:      1280
  -- height:     720
  -- frameRate:  30.000000
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: rotate-180
  -- loop:       0
  -- rtspLatency 2000
------------------------------------------------
[gstreamer] gstCamera -- attempting to create device csi://1
[gstreamer] gstCamera pipeline string:
[gstreamer] nvarguscamerasrc sensor-id=1 saturation=2 ispdigitalgainrange='1 4' exposurecompensation=0 exposuretimerange='134000 158733000' ee-mode=2 ee-strength=1 gainrange='1 3' ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, framerate=30/1, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
GST_ARGUS: NvArgusCameraSrc: Setting ISP Digital Gain Range : '1 4'
GST_ARGUS: NvArgusCameraSrc: Setting Exposure Time Range : '134000 158733000'
GST_ARGUS: NvArgusCameraSrc: Setting Gain Range : '1 3'
[gstreamer] gstCamera successfully created device csi://1
[video]  created gstCamera from csi://1
------------------------------------------------
gstCamera video options:
------------------------------------------------
  -- URI: csi://1
     - protocol:  csi
     - location:  1
     - port:      1
  -- deviceType: csi
  -- ioType:     input
  -- codec:      raw
  -- width:      1280
  -- height:     720
  -- frameRate:  30.000000
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: rotate-180
  -- loop:       0
  -- rtspLatency 2000
------------------------------------------------
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> nvvconv0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter0
[gstreamer] gstreamer changed state from NULL to READY ==> nvarguscamerasrc0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvvconv0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter0
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvarguscamerasrc0
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer message new-clock ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvvconv0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvarguscamerasrc0
Tue Jan 24 13:20:15 2023 MV Web Server Starts - 10.199.1.178:8069
GST_ARGUS: Creating output stream
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer message stream-start ==> pipeline0
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3840 x 2160 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
   Camera index = 0
   Camera mode  = 1
   Output Stream W = 1920 H = 1080
   seconds to Run    = 0
   Frame Rate = 59.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
[gstreamer] gstCamera -- onPreroll
[gstreamer] gstBufferManager -- map buffer size was less than max size (1382400 vs 1382407)
[gstreamer] gstBufferManager recieve caps:  video/x-raw, width=(int)1280, height=(int)720, framerate=(fraction)30/1, format=(string)NV12
[gstreamer] gstBufferManager -- recieved first frame, codec=raw format=nv12 width=1280 height=720 size=1382407
RingBuffer -- allocated 4 buffers (1382407 bytes each, 5529628 bytes total)
RingBuffer -- allocated 4 buffers (8 bytes each, 32 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message async-done ==> pipeline0
[gstreamer] gstreamer message warning ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
RingBuffer -- allocated 4 buffers (14745600 bytes each, 58982400 bytes total)
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter3
[gstreamer] gstreamer changed state from NULL to READY ==> nvvconv1
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter2
[gstreamer] gstreamer changed state from NULL to READY ==> nvarguscamerasrc1
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline1
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter3
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvvconv1
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter2
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvarguscamerasrc1
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline1
[gstreamer] gstreamer message new-clock ==> pipeline1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter3
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvvconv1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter2
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvarguscamerasrc1
GST_ARGUS: Creating output stream
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer message stream-start ==> pipeline1
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3840 x 2160 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
   Camera index = 4
   Camera mode  = 1
   Output Stream W = 1920 H = 1080
   seconds to Run    = 0
   Frame Rate = 59.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
[gstreamer] gstCamera -- onPreroll
[gstreamer] gstBufferManager -- map buffer size was less than max size (1382400 vs 1382407)
[gstreamer] gstBufferManager recieve caps:  video/x-raw, width=(int)1280, height=(int)720, framerate=(fraction)30/1, format=(string)NV12
[gstreamer] gstBufferManager -- recieved first frame, codec=raw format=nv12 width=1280 height=720 size=1382407
RingBuffer -- allocated 4 buffers (1382407 bytes each, 5529628 bytes total)
RingBuffer -- allocated 4 buffers (8 bytes each, 32 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message async-done ==> pipeline1
[gstreamer] gstreamer message warning ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline1
RingBuffer -- allocated 4 buffers (14745600 bytes each, 58982400 bytes total)
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter5
[gstreamer] gstreamer changed state from NULL to READY ==> nvvconv2
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter4
[gstreamer] gstreamer changed state from NULL to READY ==> nvarguscamerasrc2
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline2
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter5
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvvconv2
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter4
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvarguscamerasrc2
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline2
[gstreamer] gstreamer message new-clock ==> pipeline2
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter5
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvvconv2
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter4
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvarguscamerasrc2
[gstreamer] gstreamer message stream-start ==> pipeline2
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3840 x 2160 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
   Camera index = 2
   Camera mode  = 1
   Output Stream W = 1920 H = 1080
   seconds to Run    = 0
   Frame Rate = 59.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
[gstreamer] gstCamera -- onPreroll
[gstreamer] gstBufferManager -- map buffer size was less than max size (1382400 vs 1382407)
[gstreamer] gstBufferManager recieve caps:  video/x-raw, width=(int)1280, height=(int)720, framerate=(fraction)30/1, format=(string)NV12
[gstreamer] gstBufferManager -- recieved first frame, codec=raw format=nv12 width=1280 height=720 size=1382407
RingBuffer -- allocated 4 buffers (1382407 bytes each, 5529628 bytes total)
RingBuffer -- allocated 4 buffers (8 bytes each, 32 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message async-done ==> pipeline2
[gstreamer] gstreamer message warning ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline2
RingBuffer -- allocated 4 buffers (14745600 bytes each, 58982400 bytes total)
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter7
[gstreamer] gstreamer changed state from NULL to READY ==> nvvconv3
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter6
[gstreamer] gstreamer changed state from NULL to READY ==> nvarguscamerasrc3
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline3
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter7
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvvconv3
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter6
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvarguscamerasrc3
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline3
[gstreamer] gstreamer message new-clock ==> pipeline3
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter7
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvvconv3
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter6
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvarguscamerasrc3
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer message stream-start ==> pipeline3
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3840 x 2160 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
   Camera index = 1
   Camera mode  = 1
   Output Stream W = 1920 H = 1080
   seconds to Run    = 0
   Frame Rate = 59.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
[gstreamer] gstCamera -- onPreroll
[gstreamer] gstBufferManager -- map buffer size was less than max size (1382400 vs 1382407)
[gstreamer] gstBufferManager recieve caps:  video/x-raw, width=(int)1280, height=(int)720, framerate=(fraction)30/1, format=(string)NV12
[gstreamer] gstBufferManager -- recieved first frame, codec=raw format=nv12 width=1280 height=720 size=1382407
RingBuffer -- allocated 4 buffers (1382407 bytes each, 5529628 bytes total)
RingBuffer -- allocated 4 buffers (8 bytes each, 32 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message async-done ==> pipeline3
[gstreamer] gstreamer message warning ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline3
RingBuffer -- allocated 4 buffers (14745600 bytes each, 58982400 bytes total)

Hi @raul.orav, I would try creating your videoSource interfaces like this:

camera1 = jetson_utils.videoSource("csi://0", argv=["--input-width=1920", "--input-height=1080", "--input-rate=60"])

(you can also remove those args from the detectNet constructor, or use this newer detectNet constructor for custom models shown here)
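For reference, a minimal sketch of that arrangement could look like this (the model/label paths are placeholders, and the same four sensor IDs from above are assumed):

import jetson_inference
import jetson_utils

# capture settings go on the videoSource instances, not on detectNet
camera_args = ["--input-width=1920", "--input-height=1080", "--input-rate=60"]

camera1 = jetson_utils.videoSource("csi://0", argv=camera_args)
camera2 = jetson_utils.videoSource("csi://4", argv=camera_args)
camera3 = jetson_utils.videoSource("csi://2", argv=camera_args)
camera4 = jetson_utils.videoSource("csi://1", argv=camera_args)

# the network itself only needs the model/label/blob options
net = jetson_inference.detectNet(argv=["--model=models/jw3/ssd-mobilenet.onnx",   # placeholder path
                                       "--labels=models/jw3/labels.txt",          # placeholder path
                                       "--input-blob=input_0",
                                       "--output-cvg=scores",
                                       "--output-bbox=boxes",
                                       "--confidence=0.7"])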

Note that for this many HD camera streams, I would recommend using DeepStream should you encounter performance issues.

Disabling NVMM memory will cause another memory copy, but can be more compatible if you are having problems with some video sources / codecs.

Sure, you can use PyTorch, TensorFlow, etc., but these typically aren't as fast as inferencing libraries that use TensorRT (like jetson-inference or DeepStream).

detectNet doesn’t do temporal tracking and doesn’t store data frame-to-frame, so you can safely re-use it across multiple streams. (actually in the dev branch of jetson-inference, I have implemented basic IoU tracking in detectNet, but you need to explicitly enable it)
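To illustrate that, a minimal sketch of one detectNet instance serving all four streams (assuming the cameras created above; error handling omitted):

cameras = [camera1, camera2, camera3, camera4]   # created earlier with jetson_utils.videoSource()

while True:
    for cam_id, cam in enumerate(cameras, start=1):
        img = cam.Capture('rgb8')            # one frame per camera per loop pass
        detections = net.Detect(img)         # the same network instance is safely reused for every stream
        for d in detections:
            print(f"camera {cam_id}: class={d.ClassID} conf={d.Confidence:.2f} "
                  f"box=({d.Left:.0f},{d.Top:.0f},{d.Right:.0f},{d.Bottom:.0f})")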

By default, jetson-inference will use the GPU for inferencing and not DLA. I’m not sure if SSD-Mobilenet is compatible with DLA or not. I would also probably recommend using DeepStream to maximize utilization of all the compute hardware on Jetson like the GPU + 2xDLA.

These engines are for video encoding. In CUDA you can use this cudaOverlay() function - https://github.com/dusty-nv/jetson-inference/blob/master/docs/aux-image.md#overlay

Also, given the amount of data you are capturing, you might want to consider using the rgb8 image format instead. And typically DNNs don't really need full-HD resolution, because it gets downscaled to 300x300 (or 512x512 in your case) anyway, but you can tell in your scenario whether the greater pixel coverage is beneficial.
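As a rough sketch that combines the two suggestions above (composing frames with cudaOverlay() and capturing in rgb8); the output file name is just an example, and writing H.264/H.265 through videoOutput goes through the hardware encoder:

import jetson_utils

output = jetson_utils.videoOutput("composite.mp4")   # example output URI; newer versions also accept RTSP sinks

# 2x2 mosaic of four 1920x1080 frames (you may want to downscale this before encoding)
mosaic = jetson_utils.cudaAllocMapped(width=3840, height=2160, format='rgb8')

while True:
    imgs = [cam.Capture('rgb8') for cam in (camera1, camera2, camera3, camera4)]
    jetson_utils.cudaOverlay(imgs[0], mosaic, 0,    0)
    jetson_utils.cudaOverlay(imgs[1], mosaic, 1920, 0)
    jetson_utils.cudaOverlay(imgs[2], mosaic, 0,    1080)
    jetson_utils.cudaOverlay(imgs[3], mosaic, 1920, 1080)
    output.Render(mosaic)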

Yes, train_ssd.py does data augmentation during training; you can find it here: https://github.com/dusty-nv/pytorch-ssd/blob/21383204c68846bfff95acbbd93d39914a77c707/vision/ssd/data_preprocessing.py#L4

I would recommend looking for Jetson ecosystem partner carrier boards here:

Thank you Dusty :). I don't know if I could somehow help you, but if you need anything, I'm here :)

I managed to run it at 60 FPS and Full HD :) Thank you. It uses about 80% of the CPU and 40-80% of the GPU while detecting.

IoU tracking in detectNet seems interesting.

Maybe it's a stupid question, but why are there both jetson-inference and DeepStream? :)

Can you give me a hint on how to enable this tracking, please?

OK great! Glad you were able to get your cameras working at full resolution.

jetson-inference predates DeepStream and was an easy-to-follow tutorial for getting started with deep learning. I always aim to keep realtime performance with jetson-inference, but DeepStream has maximum performance for multi-stream applications and fully utilizing the hardware.

You would need to clone/build/install the dev branch of jetson-inference and run detectnet/detectnet.py with the --tracking flag. There are also these additional command-line options you can experiment with:

--tracking               flag to enable default tracker (IOU)
--tracker-min-frames=N   the number of re-identified frames for a track to be considered valid (default: 3)
--tracker-lost-frames=N  number of consecutive lost frames before a track is removed (default: 15)
--tracker-overlap=N      how much IOU overlap is required for a bounding box to be matched (default: 0.5)
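
Assuming the dev branch parses these options the same way detectNet parses its other flags, they can presumably also be passed from Python through argv (untested sketch; the paths are placeholders):

net = jetson_inference.detectNet(argv=["--model=models/jw3/ssd-mobilenet.onnx",   # placeholder path
                                       "--labels=models/jw3/labels.txt",          # placeholder path
                                       "--input-blob=input_0",
                                       "--output-cvg=scores",
                                       "--output-bbox=boxes",
                                       "--tracking",
                                       "--tracker-min-frames=3",
                                       "--tracker-lost-frames=15",
                                       "--tracker-overlap=0.5"])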

Thank you! :) I'll try it. It's good that you have made jetson-inference; it is exactly the right thing for developers like me, and your work has helped us take a big step forward. :) Thanks.

I'll test this tracking - a very useful development. At the moment I have built some elementary tracking myself, just by comparing coordinates in a Python array.

I think besides tracking it could be useful to have some kind of module that allows keeping track of the same object across different cameras.

For example, I take a shot of a white wooden detail with defects on it from 4 different angles, so I could see the same defect from 3 cameras. At the moment I'm handling this with another hack using Python arrays. :) Just an idea.

The tracking I've implemented is very basic IOU tracking as described in the paper High-Speed Tracking-by-Detection Without Using Image Information. I'm also testing the KLT tracker from the VPI library, but I need to work on it more. Again, DeepStream has much better tracking algorithms available for production applications.
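For reference, the IOU (intersection-over-union) that the --tracker-overlap threshold refers to is just the area overlap ratio of two boxes; a small standalone illustration:

def iou(a, b):
    # a, b are boxes given as (left, top, right, bottom)
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# boxes from consecutive frames are matched when the IOU reaches the overlap threshold (default 0.5)
print(iou((100, 100, 200, 200), (120, 110, 210, 205)))   # ~0.63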


thnx

Still, I think the more elegant solution is the one that is easier to use :) So I think we'll keep using jetson-inference as long as possible. I like it - it's easy, understandable, and practical.

Hi everyone. I have some more questions. Maybe the answers will help me and someone else too.

The background:

  • net = jetson_inference.detectNet(argv=['--model=/home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw10/ssd-mobilenet.onnx', '--labels=/home/visioline/install/jetson-inference/python/training/detection/ssd/models/jw10/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes', '--confidence=0.6', '--input-width=1980', '--input-height=1080', '--input-rate=60'])
  • camera1 = jetson_utils.videoSource("csi://0", argv=["--input-width=1920", "--input-height=1080", "--input-rate=60"]) # select camera 1 - capture a frame and return the cudaImage
  • And I find detections like this: detections1 = net.Detect(img1, overlay="box,labels,conf")

At the moment I use a separate Ubuntu machine for training: pytorch-ssd$ python3 train_ssd_gpus.py --dataset-type=voc --data=data/jw9 --model-dir=models/jw9 --batch-size=56 --workers=22 --epochs=200 --resolution=512 --use-cuda=True --lr=0.001 --gpu-devices 0 1

Let’s imagine I train with one GPU.

The questions are:

  1. What is the learning rate? I used 0.01 (which is the default). What is the difference in the detection or learning process if I use 0.001? In practice? How should I understand it?
  2. Half of the images are in format='rgba32f' and half of them are rgb8 (the default) - should all the images be in the same format?
    2a) And if not, what's the difference if I'd like to detect gray stripes on a white surface?
    2b) Did I ruin my dataset by mixing rgba32f images in with the rgb8 ones?
  3. Let's assume that training time isn't a problem. I can't understand the difference between batch sizes 8, 16, 24 and 32 - how does it affect detection?
    3a) Should I also configure the batch size in detectnet somehow?
  4. Workers. I use 22 workers. Does it change the outcome of the model? I see that the number of workers only takes CPU time.
  5. Is there any safe selection of parameters? Or is that a very stupid question... :P
  6. How would you suggest I test different parameters?
  7. If I export with onnx_export.py - the default is 512x512 (width and height), which should be OK. But the default batch size is 1 (I looked inside the script); what does that mean?
    7a) Or should I use the same batch size as during training?
    7b) Or should I use a batch size that is manageable on my Xavier NX 16?
  8. What should I think about model optimizers? When is there a point in using those?
  9. The ONNX exporter selects the model based on losses - is this reliable, or should I do it manually somehow?
    9a) Or are there some nice tools available?
  10. And finally - are there any differences in model performance when training with one or 2 CPUs?

Learning rate changes how quickly the model weights are updated during training and impacts how quickly the model converges. If the learning rate is too high, the gradients can overshoot the target behavior, and if it's too low the training can take too long. I would recommend referring to articles that explain learning rate in more depth.
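As a toy illustration of what the learning rate controls in SGD (the general idea only, not the exact optimizer configuration train_ssd.py uses):

weight = 5.0        # current parameter value
gradient = 2.0      # gradient of the loss w.r.t. that parameter
for lr in (0.01, 0.001):
    print(lr, weight - lr * gradient)   # the update step is lr * gradient
# 0.01  -> 4.98   (bigger step: faster convergence, but can overshoot)
# 0.001 -> 4.998  (smaller step: more stable, but slower)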

Are your images stored on disk as .jpg? If so, they are probably all rgb8, which should be fine. The time you would probably want to use the floating-point formats is if you had special HDR cameras (like 10- or 12-bit), which I haven't used/tested with jetson-inference before anyway.

A higher batch size will speed up the training by utilizing the GPU more (especially on dGPU; on Jetson the speed-ups can be less noticeable depending on how utilized its GPU already is). On the inferencing side, detectnet only supports a batch size of 1 (users of jetson-inference typically only have one camera). DeepStream supports inferencing with higher batch sizes (which is useful if you are processing multiple camera streams simultaneously).

The --workers option changes how many CPU threads are used to load/preprocess the dataset during training. It doesn’t change the outcome of the model.

The batch size in onnx_export.py is the batch size to be used during inferencing, and with detectnet that’s 1. The batch size that you used during training can be different.

I have been meaning to try the onnx-simplifier tool to see if that works and improves performance: https://github.com/daquexian/onnx-simplifier
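If you want to try it, the onnx-simplifier Python interface looks roughly like this (a hedged sketch; check the project's README for the current API, and the file names here are placeholders):

import onnx
from onnxsim import simplify

model = onnx.load("ssd-mobilenet.onnx")                 # the model exported by onnx_export.py
model_simplified, ok = simplify(model)
assert ok, "simplified model failed the validation check"
onnx.save(model_simplified, "ssd-mobilenet-simplified.onnx")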

So you could try that. Also the TAO Toolkit can do optimization/pruning while it is training models. TAO is frequently used to train models that are deployed with DeepStream.

With train_ssd.py, the model with the lowest loss is the best performing one, which is why it picks the lowest one. Although you could validate this notion with the --validate-mean-ap option that I added to train_ssd.py.

Do you mean 2 GPUs? I don't believe there is. Personally I train with just 1 GPU (a GeForce 1070). The models I typically train aren't huge, although sometimes I train on larger datasets like MS COCO (in which case I just let it run for a while).


OK, thanks. :) There is a lot to learn, starting from the learning rate, parameters for SGD, the Cosine Annealing scheduler, the scheduler for SGD, etc. I would like to use some kind of configuration that is foolproof :) - I'll let it train for a week and later pick the best result :) Or does it already have this feature built in and I just don't know it? :)

So I think I'll play and learn more and keep this forum quiet for some time.

I turned on --validate-mean-ap and will play with it...
We'll see what the results are after 30, 60 or even 100 or 200 epochs...

I used:

python3 train_ssds.py --dataset-type=voc --data=data/jw9 --model-dir=models/jw12-b48-lrdefault-w22-e200-5512-v-2gpu --batch-size=48 --workers=22 --epochs=300 --resolution=512 --use-cuda=True validation_mean_ap=True --gpu-devices 0 1

2023-01-27 21:07:16 - VOC Labels read from file: ('BACKGROUND', 'Vaigupesa', 'Lohe', 'UVseisak', 'Pahtliseisak', 'Oksaauk', 'Postriip', 'Negtriip', 'Neguvtriip', 'Sormjatkudefekt', 'Kumavus', 'Muljumine', 'Posdefekt', 'Triibud', 'Karedus', 'Klambritriip', 'hooveldusviga', 'positiivneuvtriip')
2023-01-27 21:07:16 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2023-01-27 21:07:17 - Took 0.02 seconds to load the model.
2023-01-27 21:07:17 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2023-01-27 21:07:17 - Uses CosineAnnealingLR scheduler.
2023-01-27 21:07:17 - Start training from epoch 0.
/usr/local/lib/python3.8/dist-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
2023-01-27 21:08:37 - Epoch: 0, Step: 10/95, Avg Loss: 18.5589, Avg Regression Loss 5.4735, Avg Classification Loss: 13.0854
2023-01-27 21:08:45 - Epoch: 0, Step: 20/95, Avg Loss: 10.8616, Avg Regression Loss 4.9293, Avg Classification Loss: 5.9322
2023-01-27 21:09:35 - Epoch: 0, Step: 30/95, Avg Loss: 9.8073, Avg Regression Loss 4.2703, Avg Classification Loss: 5.5370
2023-01-27 21:09:43 - Epoch: 0, Step: 40/95, Avg Loss: 9.1577, Avg Regression Loss 4.0422, Avg Classification Loss: 5.1156
2023-01-27 21:10:49 - Epoch: 0, Step: 50/95, Avg Loss: 8.9712, Avg Regression Loss 4.0127, Avg Classification Loss: 4.9584
2023-01-27 21:10:57 - Epoch: 0, Step: 60/95, Avg Loss: 8.6861, Avg Regression Loss 3.8574, Avg Classification Loss: 4.8288
2023-01-27 21:11:28 - Epoch: 0, Step: 70/95, Avg Loss: 8.2826, Avg Regression Loss 3.7746, Avg Classification Loss: 4.5080
2023-01-27 21:11:34 - Epoch: 0, Step: 80/95, Avg Loss: 8.2244, Avg Regression Loss 3.7573, Avg Classification Loss: 4.4671
2023-01-27 21:11:42 - Epoch: 0, Step: 90/95, Avg Loss: 7.8739, Avg Regression Loss 3.5314, Avg Classification Loss: 4.3426
2023-01-27 21:11:45 - Epoch: 0, Training Loss: 9.8720, Training Regression Loss 4.1390, Training Classification Loss: 5.7331
2023-01-27 21:12:45 - Epoch: 0, Validation Loss: 8.2319, Validation Regression Loss 3.6052, Validation Classification Loss: 4.6267
2023-01-27 21:30:19 - Epoch: 0, Average Precision Per-class:
2023-01-27 21:30:19 -     Vaigupesa: 0.05384596125790713
2023-01-27 21:30:19 -     Lohe: 0.044271289936223326
2023-01-27 21:30:19 -     UVseisak: 0.061337409636432874
2023-01-27 21:30:19 -     Pahtliseisak: 0.11164539926561612
2023-01-27 21:30:19 -     Oksaauk: 0.3255230824542966
2023-01-27 21:30:19 -     Postriip: 0.0
2023-01-27 21:30:19 -     Negtriip: 0.04812037170203135
2023-01-27 21:30:19 -     Neguvtriip: 0.0076787662205599965
2023-01-27 21:30:19 -     Sormjatkudefekt: 0.03322681659477197
2023-01-27 21:30:19 -     Kumavus: 0.06821402872103974
2023-01-27 21:30:19 -     Muljumine: 0.002589604523232847
2023-01-27 21:30:19 -     Posdefekt: 0.016658379429687252
2023-01-27 21:30:19 -     Triibud: 0.0019715185121682286
2023-01-27 21:30:19 -     Karedus: 1.1292353382906765e-05
2023-01-27 21:30:19 -     Klambritriip: 0.003908750690342988
2023-01-27 21:30:19 -     hooveldusviga: 0.0
2023-01-27 21:30:19 -     positiivneuvtriip: 9.711472162065048e-06
2023-01-27 21:30:19 - Epoch: 0, Mean Average Precision (mAP):  0.0458242578099915
2023-01-27 21:30:19 - Saved model models/jw12-b48-lrdefault-w22-e200-5512-v-2gpu/mb1-ssd-Epoch-0-Loss-8.231881548229017.pth
2023-01-27 21:31:39 - Epoch: 1, Step: 10/95, Avg Loss: 8.9479, Avg Regression Loss 3.7262, Avg Classification Loss: 5.2216
2023-01-27 21:31:47 - Epoch: 1, Step: 20/95, Avg Loss: 7.7680, Avg Regression Loss 3.3489, Avg Classification Loss: 4.4191
2023-01-27 21:32:31 - Epoch: 1, Step: 30/95, Avg Loss: 7.7795, Avg Regression Loss 3.3939, Avg Classification Loss: 4.3856
2023-01-27 21:32:50 - Epoch: 1, Step: 40/95, Avg Loss: 7.5209, Avg Regression Loss 3.2786, Avg Classification Loss: 4.2423
2023-01-27 21:33:14 - Epoch: 1, Step: 50/95, Avg Loss: 7.4934, Avg Regression Loss 3.3628, Avg Classification Loss: 4.1307
2023-01-27 21:33:34 - Epoch: 1, Step: 60/95, Avg Loss: 7.2824, Avg Regression Loss 3.0689, Avg Classification Loss: 4.2135
2023-01-27 21:33:49 - Epoch: 1, Step: 70/95, Avg Loss: 7.3344, Avg Regression Loss 3.2392, Avg Classification Loss: 4.0952
2023-01-27 21:33:56 - Epoch: 1, Step: 80/95, Avg Loss: 7.4790, Avg Regression Loss 3.3395, Avg Classification Loss: 4.1395
2023-01-27 21:34:11 - Epoch: 1, Step: 90/95, Avg Loss: 7.1963, Avg Regression Loss 3.1919, Avg Classification Loss: 4.0044
2023-01-27 21:34:13 - Epoch: 1, Training Loss: 7.6446, Training Regression Loss 3.3951, Training Classification Loss: 4.2495
2023-01-27 21:34:58 - Epoch: 1, Validation Loss: 7.3598, Validation Regression Loss 3.4532, Validation Classification Loss: 3.9067
2023-01-27 21:52:27 - Epoch: 1, Average Precision Per-class:
2023-01-27 21:52:27 -     Vaigupesa: 0.12605222686629014
2023-01-27 21:52:27 -     Lohe: 0.029132832056642705
2023-01-27 21:52:27 -     UVseisak: 0.2918351079324864
2023-01-27 21:52:27 -     Pahtliseisak: 0.18397149908020852
2023-01-27 21:52:27 -     Oksaauk: 0.38457965359059143
2023-01-27 21:52:27 -     Postriip: 0.0
2023-01-27 21:52:27 -     Negtriip: 0.013918457970302098
2023-01-27 21:52:27 -     Neguvtriip: 0.01030999226539954
2023-01-27 21:52:27 -     Sormjatkudefekt: 0.1754421559547955
2023-01-27 21:52:27 -     Kumavus: 0.14696048422727304
2023-01-27 21:52:27 -     Muljumine: 0.007580780081372639
2023-01-27 21:52:27 -     Posdefekt: 0.027454118083067934
2023-01-27 21:52:27 -     Triibud: 0.003935721125022458
2023-01-27 21:52:27 -     Karedus: 0.00034502712389700103
2023-01-27 21:52:27 -     Klambritriip: 0.025893779882844924
2023-01-27 21:52:27 -     hooveldusviga: 3.5696118939468306e-05
2023-01-27 21:52:27 -     positiivneuvtriip: 0.0
2023-01-27 21:52:27 - Epoch: 1, Mean Average Precision (mAP):  0.08396750190347843
2023-01-27 21:52:27 - Saved model models/jw12-b48-lrdefault-w22-e200-5512-v-2gpu/mb1-ssd-Epoch-1-Loss-7.359846556814094.pth
2023-01-27 21:53:54 - Epoch: 2, Step: 10/95, Avg Loss: 9.1909, Avg Regression Loss 4.4712, Avg Classification Loss: 4.7196
2023-01-27 21:54:01 - Epoch: 2, Step: 20/95, Avg Loss: 7.4987, Avg Regression Loss 3.4303, Avg Classification Loss: 4.0684
2023-01-27 21:54:55 - Epoch: 2, Step: 30/95, Avg Loss: 7.2270, Avg Regression Loss 3.1784, Avg Classification Loss: 4.0486
2023-01-27 21:55:02 - Epoch: 2, Step: 40/95, Avg Loss: 7.1913, Avg Regression Loss 3.1573, Avg Classification Loss: 4.0340
2023-01-27 21:55:27 - Epoch: 2, Step: 50/95, Avg Loss: 7.0678, Avg Regression Loss 3.1404, Avg Classification Loss: 3.9274
2023-01-27 21:56:00 - Epoch: 2, Step: 60/95, Avg Loss: 7.0038, Avg Regression Loss 3.0093, Avg Classification Loss: 3.9945
2023-01-27 21:56:06 - Epoch: 2, Step: 70/95, Avg Loss: 6.7761, Avg Regression Loss 2.8699, Avg Classification Loss: 3.9061
2023-01-27 21:56:19 - Epoch: 2, Step: 80/95, Avg Loss: 7.0852, Avg Regression Loss 3.1417, Avg Classification Loss: 3.9435
2023-01-27 21:56:26 - Epoch: 2, Step: 90/95, Avg Loss: 6.7371, Avg Regression Loss 2.8840, Avg Classification Loss: 3.8531
2023-01-27 21:56:28 - Epoch: 2, Training Loss: 7.2055, Training Regression Loss 3.2061, Training Classification Loss: 3.9994
2023-01-27 21:57:13 - Epoch: 2, Validation Loss: 6.8990, Validation Regression Loss 3.1598, Validation Classification Loss: 3.7392
2023-01-27 22:13:40 - Epoch: 2, Average Precision Per-class:
2023-01-27 22:13:40 -     Vaigupesa: 0.17699904803730845
2023-01-27 22:13:40 -     Lohe: 0.1048441271533414
2023-01-27 22:13:40 -     UVseisak: 0.30869524274909177
2023-01-27 22:13:40 -     Pahtliseisak: 0.29111211930897996
2023-01-27 22:13:40 -     Oksaauk: 0.42010490838579445
2023-01-27 22:13:40 -     Postriip: 0.0
2023-01-27 22:13:40 -     Negtriip: 0.13784000911312008
2023-01-27 22:13:40 -     Neguvtriip: 0.008365056636282144
2023-01-27 22:13:40 -     Sormjatkudefekt: 0.1942475284827371
2023-01-27 22:13:40 -     Kumavus: 0.12233463372446894
2023-01-27 22:13:40 -     Muljumine: 0.026536805164587604
2023-01-27 22:13:40 -     Posdefekt: 0.032208548000670786
2023-01-27 22:13:40 -     Triibud: 0.004021597578609632
2023-01-27 22:13:40 -     Karedus: 0.003794602540957285
2023-01-27 22:13:40 -     Klambritriip: 0.005370501794433427
2023-01-27 22:13:40 -     hooveldusviga: 0.03164983164983165
2023-01-27 22:13:40 -     positiivneuvtriip: 0.0
2023-01-27 22:13:40 - Epoch: 2, Mean Average Precision (mAP):  0.10988968001883614
2023-01-27 22:13:40 - Saved model models/jw12-b48-lrdefault-w22-e200-5512-v-2gpu/mb1-ssd-Epoch-2-Loss-6.898978765387284.pth

Are the following parameters meant for tuning? For example, I could have a defect that is about 5 px x 100 px.

specs = [
    SSDSpec(32, 16, SSDBoxSizes(20, 35), [2, 3]),
    SSDSpec(16, 32, SSDBoxSizes(35, 50), [2, 3]),
    SSDSpec(8, 64, SSDBoxSizes(50, 65), [2, 3]),
    SSDSpec(4, 100, SSDBoxSizes(195, 240), [2, 3]),
    SSDSpec(2, 150, SSDBoxSizes(240, 285), [2, 3]),
    SSDSpec(1, 300, SSDBoxSizes(285, 512), [2, 3])
]

I mean, can I change those? Even add more layers?

So, batch size doesn't affect the result, it only makes a difference in training time. Good to know. Also, the number of workers doesn't matter besides training time. On the internet, reading other forums, it seemed like changing the batch size could somehow impact the model's performance or precision. It's not the first time I've relied on false information :)

What do you think about the average loss? I understood that if you train for too long, the model no longer "thinks" (or detects) but rather "remembers" - so it's good to stop once the average loss becomes stable; right after the drop in the graph line is the "right" model.

By stable I mean: if it falls from 18 to 6, is there no point in waiting until it gets to an average loss of 3? For example, in my case it went from 18 to 5-6 in 20-30 epochs, and then it takes about 5-12 hours until it reaches an average loss of 2-3. Is there such a thing as training it too much? In my case I have 5000 images, about 5 GB. From your experience, what precision is reasonable to chase?

If I have one or two classes that have a lot more images/annotations, how does that ruin the model? I read that every class should have almost the same amount of images - I'm beginning to think that's only useful in some situations...

The rgba32f ones I got with img1 = camera1.Capture(format='rgba32f') and later saved with saveImageRGBA(test_filename, img1, 1920, 1080). I haven't had time to figure out whether the .jpg files use different formats or how to test it. If those .jpg files are rgba32f and the others are rgb8, then I think I should write a Python script to convert the new images to rgb8. I'll look into it.
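If it turns out some of them do need converting, a minimal sketch using Pillow (the dataset path is a placeholder; JPEG files are 8-bit anyway, so this mostly just normalizes the channel layout):

from pathlib import Path
from PIL import Image

for path in Path("data/jw9/JPEGImages").glob("*.jpg"):   # placeholder dataset path
    img = Image.open(path).convert("RGB")                # drop any alpha channel, force 8-bit RGB
    img.save(path, quality=95)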

Actually I looked at the TAO Toolkit some time ago but noticed that it hadn't been updated for a long time (if I'm not wrong), so I thought it would be better to find something fresher :) - a stupid idea on my part :)

Thanks, I think I should read a little and study those parameters for some time before asking any more questions :) - parameters for SGD, parameters for Cosine Annealing, parameters for the Multi-step Scheduler, the scheduler, etc.

Any suggestions on what courses to take or what documents to read? There's always a point where one should decide whether to practice or dig into the theory... I think if I get a grip on those topics, it could be useful to write about them here as well; maybe someone else will move forward faster...


P.S. With learning rate 0.001 I got:

visioline@mv:~/install/pytorch-ssd/models/jw10-lr0001-batch16-workers22$ ls
labels.txt                                     mb1-ssd-Epoch-243-Loss-3.6913315512687492.pth
mb1-ssd-Epoch-0-Loss-7.844803249035623.pth     mb1-ssd-Epoch-244-Loss-3.5762189921557694.pth
mb1-ssd-Epoch-100-Loss-3.699006062935603.pth   mb1-ssd-Epoch-245-Loss-3.448338948374502.pth
mb1-ssd-Epoch-101-Loss-3.6935708741838433.pth  mb1-ssd-Epoch-246-Loss-3.4074024033630694.pth
mb1-ssd-Epoch-102-Loss-3.6866183879097445.pth  mb1-ssd-Epoch-247-Loss-3.6559912703483772.pth
mb1-ssd-Epoch-103-Loss-3.7020039482588483.pth  mb1-ssd-Epoch-248-Loss-3.381274731757363.pth
mb1-ssd-Epoch-104-Loss-3.697287795400451.pth   mb1-ssd-Epoch-249-Loss-3.356470147628245.pth
mb1-ssd-Epoch-105-Loss-3.68723178794443.pth    mb1-ssd-Epoch-24-Loss-5.010350614048988.pth
mb1-ssd-Epoch-106-Loss-3.700966738138098.pth   mb1-ssd-Epoch-250-Loss-3.377645708225641.pth
mb1-ssd-Epoch-107-Loss-3.6937036071986276.pth  mb1-ssd-Epoch-251-Loss-3.3108525234060657.pth
mb1-ssd-Epoch-108-Loss-3.6714194608661397.pth  mb1-ssd-Epoch-252-Loss-3.4701667329026615.pth
mb1-ssd-Epoch-109-Loss-3.6933456132774216.pth  mb1-ssd-Epoch-253-Loss-3.347730076355142.pth
mb1-ssd-Epoch-10-Loss-5.522492399485288.pth    mb1-ssd-Epoch-254-Loss-3.571896643604912.pth
mb1-ssd-Epoch-110-Loss-3.7102897921215097.pth  mb1-ssd-Epoch-255-Loss-3.2905009129864586.pth
mb1-ssd-Epoch-111-Loss-3.706566136211894.pth   mb1-ssd-Epoch-256-Loss-3.3402130553242175.pth
mb1-ssd-Epoch-112-Loss-3.6986740894957904.pth  mb1-ssd-Epoch-257-Loss-3.275691715230369.pth
mb1-ssd-Epoch-113-Loss-3.67913736176575.pth    mb1-ssd-Epoch-258-Loss-3.377435946633032.pth
mb1-ssd-Epoch-114-Loss-3.6831407075214724.pth  mb1-ssd-Epoch-259-Loss-3.253418833543892.pth
mb1-ssd-Epoch-115-Loss-3.699193856320196.pth   mb1-ssd-Epoch-25-Loss-4.916048883549316.pth
mb1-ssd-Epoch-116-Loss-3.690158523013651.pth   mb1-ssd-Epoch-260-Loss-3.246812629194226.pth
mb1-ssd-Epoch-117-Loss-3.690278866265772.pth   mb1-ssd-Epoch-261-Loss-3.1805609527830523.pth
mb1-ssd-Epoch-118-Loss-3.6857616088415623.pth  mb1-ssd-Epoch-262-Loss-3.2596950926966044.pth
mb1-ssd-Epoch-119-Loss-3.7051202273621575.pth  mb1-ssd-Epoch-263-Loss-3.148422231101316.pth
mb1-ssd-Epoch-11-Loss-5.4967269804789405.pth   mb1-ssd-Epoch-264-Loss-3.1471123274139297.pth
mb1-ssd-Epoch-120-Loss-3.708761480166297.pth   mb1-ssd-Epoch-265-Loss-3.182611842458745.pth
mb1-ssd-Epoch-121-Loss-3.7443698825768785.pth  mb1-ssd-Epoch-266-Loss-3.1396695800889085.pth
mb1-ssd-Epoch-122-Loss-3.739069850621712.pth   mb1-ssd-Epoch-267-Loss-3.1359034995729425.pth
mb1-ssd-Epoch-123-Loss-3.6849114111371257.pth  mb1-ssd-Epoch-268-Loss-3.3426146599934716.pth
mb1-ssd-Epoch-124-Loss-3.786229124760038.pth   mb1-ssd-Epoch-269-Loss-3.1627476122691016.pth
mb1-ssd-Epoch-125-Loss-3.7050361161518435.pth  mb1-ssd-Epoch-26-Loss-5.153917679938327.pth
mb1-ssd-Epoch-126-Loss-3.714161709846119.pth   mb1-ssd-Epoch-270-Loss-3.1389441797674333.pth
mb1-ssd-Epoch-127-Loss-3.76122463423456.pth    mb1-ssd-Epoch-271-Loss-3.079378834461576.pth
mb1-ssd-Epoch-128-Loss-3.725922709219026.pth   mb1-ssd-Epoch-272-Loss-3.0738436396467392.pth
mb1-ssd-Epoch-129-Loss-3.702946438806217.pth   mb1-ssd-Epoch-273-Loss-3.0893833555096872.pth
mb1-ssd-Epoch-12-Loss-5.368018547132243.pth    mb1-ssd-Epoch-274-Loss-3.063586101093899.pth
mb1-ssd-Epoch-130-Loss-3.7289963825852626.pth  mb1-ssd-Epoch-275-Loss-3.0518799350455454.pth
mb1-ssd-Epoch-131-Loss-3.8207651625252446.pth  mb1-ssd-Epoch-276-Loss-3.0142651783703913.pth
mb1-ssd-Epoch-132-Loss-3.799161578656928.pth   mb1-ssd-Epoch-277-Loss-3.0440778433223494.pth
mb1-ssd-Epoch-133-Loss-3.8829803020288582.pth  mb1-ssd-Epoch-278-Loss-3.015008947663931.pth
mb1-ssd-Epoch-134-Loss-3.793763907553872.pth   mb1-ssd-Epoch-279-Loss-3.0484737319996835.pth
mb1-ssd-Epoch-135-Loss-3.8235681532971006.pth  mb1-ssd-Epoch-27-Loss-4.95171338425087.pth
mb1-ssd-Epoch-136-Loss-3.771383668845618.pth   mb1-ssd-Epoch-280-Loss-3.0413278528742573.pth
mb1-ssd-Epoch-137-Loss-3.7599116487974835.pth  mb1-ssd-Epoch-281-Loss-2.989393106618955.pth
mb1-ssd-Epoch-138-Loss-3.794572434240011.pth   mb1-ssd-Epoch-282-Loss-3.0226815492441292.pth
mb1-ssd-Epoch-139-Loss-3.959332571855282.pth   mb1-ssd-Epoch-283-Loss-3.0725651440266586.pth
mb1-ssd-Epoch-13-Loss-5.356737934658469.pth    mb1-ssd-Epoch-284-Loss-3.0162332700335095.pth
mb1-ssd-Epoch-140-Loss-3.811802827006094.pth   mb1-ssd-Epoch-285-Loss-2.982861387434781.pth
mb1-ssd-Epoch-141-Loss-3.9795849601287303.pth  mb1-ssd-Epoch-286-Loss-2.995698247907861.pth
mb1-ssd-Epoch-142-Loss-3.770634977640617.pth   mb1-ssd-Epoch-287-Loss-2.9773487844652506.pth
mb1-ssd-Epoch-143-Loss-3.764160690375015.pth   mb1-ssd-Epoch-288-Loss-2.9726809713950004.pth
mb1-ssd-Epoch-144-Loss-3.941951754657624.pth   mb1-ssd-Epoch-289-Loss-2.964711414630337.pth
mb1-ssd-Epoch-145-Loss-3.818326480397066.pth   mb1-ssd-Epoch-28-Loss-4.922726889802373.pth
mb1-ssd-Epoch-146-Loss-3.787470794398035.pth   mb1-ssd-Epoch-290-Loss-2.964172022502751.pth
mb1-ssd-Epoch-147-Loss-3.896992497225111.pth   mb1-ssd-Epoch-291-Loss-2.9690968792345838.pth
mb1-ssd-Epoch-148-Loss-3.9047439734421854.pth  mb1-ssd-Epoch-292-Loss-2.9587392194111017.pth
mb1-ssd-Epoch-149-Loss-3.801616705348551.pth   mb1-ssd-Epoch-293-Loss-2.9590987179810084.pth
mb1-ssd-Epoch-14-Loss-5.31876806909541.pth     mb1-ssd-Epoch-294-Loss-2.9636004430245175.pth
mb1-ssd-Epoch-150-Loss-3.8390737085376108.pth  mb1-ssd-Epoch-295-Loss-2.9723690687135758.pth
mb1-ssd-Epoch-151-Loss-3.8341963375415062.pth  mb1-ssd-Epoch-296-Loss-2.965184763758427.pth
mb1-ssd-Epoch-152-Loss-3.886494739316799.pth   mb1-ssd-Epoch-297-Loss-2.9498843672839996.pth
mb1-ssd-Epoch-153-Loss-3.860163895485679.pth   mb1-ssd-Epoch-298-Loss-2.9619937703382.pth
mb1-ssd-Epoch-154-Loss-3.938610915160432.pth   mb1-ssd-Epoch-299-Loss-2.958359249489046.pth
mb1-ssd-Epoch-155-Loss-4.00973360530058.pth    mb1-ssd-Epoch-29-Loss-4.8469047466352215.pth
mb1-ssd-Epoch-156-Loss-3.896275334981642.pth   mb1-ssd-Epoch-2-Loss-6.674325160339949.pth
mb1-ssd-Epoch-157-Loss-3.8947098579507835.pth  mb1-ssd-Epoch-300-Loss-2.954099058473068.pth
mb1-ssd-Epoch-158-Loss-3.9470720796618783.pth  mb1-ssd-Epoch-301-Loss-2.955025663013593.pth
mb1-ssd-Epoch-159-Loss-3.9763583155487114.pth  mb1-ssd-Epoch-302-Loss-2.96732164250667.pth
mb1-ssd-Epoch-15-Loss-5.2782613169599335.pth   mb1-ssd-Epoch-303-Loss-2.960101403954172.pth
mb1-ssd-Epoch-160-Loss-4.054554889143145.pth   mb1-ssd-Epoch-304-Loss-2.9525538843006633.pth
mb1-ssd-Epoch-161-Loss-4.192322373811432.pth   mb1-ssd-Epoch-305-Loss-2.954244538457149.pth
mb1-ssd-Epoch-162-Loss-4.118057604813323.pth   mb1-ssd-Epoch-306-Loss-3.0256930016797337.pth
mb1-ssd-Epoch-163-Loss-4.006347869815759.pth   mb1-ssd-Epoch-307-Loss-2.953462362710663.pth
mb1-ssd-Epoch-164-Loss-4.082696682572786.pth   mb1-ssd-Epoch-308-Loss-2.966559061102648.pth
mb1-ssd-Epoch-165-Loss-4.027016346951677.pth   mb1-ssd-Epoch-309-Loss-2.963052592092184.pth
mb1-ssd-Epoch-166-Loss-4.303224370673352.pth   mb1-ssd-Epoch-30-Loss-4.786364819893989.pth
mb1-ssd-Epoch-167-Loss-4.358200672237275.pth   mb1-ssd-Epoch-310-Loss-2.9636663651297876.pth
mb1-ssd-Epoch-168-Loss-4.255158622357533.pth   mb1-ssd-Epoch-311-Loss-2.967023590638865.pth
mb1-ssd-Epoch-169-Loss-5.067095943137529.pth   mb1-ssd-Epoch-312-Loss-2.9585464853701238.pth
mb1-ssd-Epoch-16-Loss-5.0935263911742625.pth   mb1-ssd-Epoch-313-Loss-2.951885333120191.pth
mb1-ssd-Epoch-170-Loss-4.0984607092483305.pth  mb1-ssd-Epoch-314-Loss-3.002850062013094.pth
mb1-ssd-Epoch-171-Loss-4.096603227588397.pth   mb1-ssd-Epoch-315-Loss-2.9496211146297386.pth
mb1-ssd-Epoch-172-Loss-4.095799999607746.pth   mb1-ssd-Epoch-316-Loss-2.9506621815711784.pth
mb1-ssd-Epoch-173-Loss-4.07028457469738.pth    mb1-ssd-Epoch-31-Loss-4.642630913653559.pth
mb1-ssd-Epoch-174-Loss-3.8917217052446236.pth  mb1-ssd-Epoch-32-Loss-4.63385819209338.pth
mb1-ssd-Epoch-175-Loss-4.424966495365641.pth   mb1-ssd-Epoch-33-Loss-4.651152633104223.pth
mb1-ssd-Epoch-176-Loss-4.061723209522638.pth   mb1-ssd-Epoch-34-Loss-4.561925447029275.pth
mb1-ssd-Epoch-177-Loss-4.008765029401745.pth   mb1-ssd-Epoch-35-Loss-4.676771238077656.pth
mb1-ssd-Epoch-178-Loss-4.115264038315089.pth   mb1-ssd-Epoch-36-Loss-4.534164722311202.pth
mb1-ssd-Epoch-179-Loss-4.245942492367101.pth   mb1-ssd-Epoch-37-Loss-4.53293393610223.pth
mb1-ssd-Epoch-17-Loss-5.155223617284121.pth    mb1-ssd-Epoch-38-Loss-4.529368832760059.pth
mb1-ssd-Epoch-180-Loss-4.309604926159862.pth   mb1-ssd-Epoch-39-Loss-4.459713729868508.pth
mb1-ssd-Epoch-181-Loss-4.259074957126442.pth   mb1-ssd-Epoch-3-Loss-6.372005097015165.pth
mb1-ssd-Epoch-182-Loss-4.182449917489985.pth   mb1-ssd-Epoch-40-Loss-4.444898235924252.pth
mb1-ssd-Epoch-183-Loss-4.114398039693125.pth   mb1-ssd-Epoch-41-Loss-4.400280089344658.pth
mb1-ssd-Epoch-184-Loss-4.364920032319247.pth   mb1-ssd-Epoch-42-Loss-4.380910423113685.pth
mb1-ssd-Epoch-185-Loss-4.060626204780471.pth   mb1-ssd-Epoch-43-Loss-4.372802652655558.pth
mb1-ssd-Epoch-186-Loss-4.052608737254732.pth   mb1-ssd-Epoch-44-Loss-4.455169289356407.pth
mb1-ssd-Epoch-187-Loss-4.037063216573358.pth   mb1-ssd-Epoch-45-Loss-4.365654075103598.pth
mb1-ssd-Epoch-188-Loss-4.217447706751605.pth   mb1-ssd-Epoch-46-Loss-4.346745607288482.pth
mb1-ssd-Epoch-189-Loss-3.9521583685184116.pth  mb1-ssd-Epoch-47-Loss-4.3586179527714055.pth
mb1-ssd-Epoch-18-Loss-5.0379230858159145.pth   mb1-ssd-Epoch-48-Loss-4.314381283079356.pth
mb1-ssd-Epoch-190-Loss-3.953404034826865.pth   mb1-ssd-Epoch-49-Loss-4.202310864579973.pth
mb1-ssd-Epoch-191-Loss-4.023098120841037.pth   mb1-ssd-Epoch-4-Loss-6.267493958186766.pth
mb1-ssd-Epoch-192-Loss-4.221324980048324.pth   mb1-ssd-Epoch-50-Loss-4.197987907345641.pth
mb1-ssd-Epoch-193-Loss-4.189583466246777.pth   mb1-ssd-Epoch-51-Loss-4.365126366328856.pth
mb1-ssd-Epoch-194-Loss-4.1018959991502255.pth  mb1-ssd-Epoch-52-Loss-4.278510636659899.pth
mb1-ssd-Epoch-195-Loss-3.982418603273668.pth   mb1-ssd-Epoch-53-Loss-4.195528481116143.pth
mb1-ssd-Epoch-196-Loss-4.05930337046566.pth    mb1-ssd-Epoch-54-Loss-4.155250171890528.pth
mb1-ssd-Epoch-197-Loss-4.1383189893864065.pth  mb1-ssd-Epoch-55-Loss-4.4274492419650615.pth
mb1-ssd-Epoch-198-Loss-3.9729718353217565.pth  mb1-ssd-Epoch-56-Loss-4.093779110234533.pth
mb1-ssd-Epoch-199-Loss-4.0136233033224045.pth  mb1-ssd-Epoch-57-Loss-4.125031690294246.pth
mb1-ssd-Epoch-19-Loss-5.056397146976457.pth    mb1-ssd-Epoch-58-Loss-4.1064425837446015.pth
mb1-ssd-Epoch-1-Loss-7.478294910053482.pth     mb1-ssd-Epoch-59-Loss-4.00749178741509.pth
mb1-ssd-Epoch-200-Loss-3.965183096302693.pth   mb1-ssd-Epoch-5-Loss-6.149081731432318.pth
mb1-ssd-Epoch-201-Loss-3.8797379075006546.pth  mb1-ssd-Epoch-60-Loss-4.030072903464624.pth
mb1-ssd-Epoch-202-Loss-3.936633770541673.pth   mb1-ssd-Epoch-61-Loss-4.006369186374408.pth
mb1-ssd-Epoch-203-Loss-4.432676501914385.pth   mb1-ssd-Epoch-62-Loss-3.9926272752849457.pth
mb1-ssd-Epoch-204-Loss-3.892888594431928.pth   mb1-ssd-Epoch-63-Loss-3.957493271928794.pth
mb1-ssd-Epoch-205-Loss-4.053726827719187.pth   mb1-ssd-Epoch-64-Loss-3.9553439865685185.pth
mb1-ssd-Epoch-206-Loss-3.830054516084624.pth   mb1-ssd-Epoch-65-Loss-3.929611106222173.pth
mb1-ssd-Epoch-207-Loss-3.751714853852882.pth   mb1-ssd-Epoch-66-Loss-3.9470394870838934.pth
mb1-ssd-Epoch-208-Loss-3.8991387075754442.pth  mb1-ssd-Epoch-67-Loss-3.952488877326777.pth
mb1-ssd-Epoch-209-Loss-3.791562867248859.pth   mb1-ssd-Epoch-68-Loss-3.894740044860031.pth
mb1-ssd-Epoch-20-Loss-4.979706742317011.pth    mb1-ssd-Epoch-69-Loss-3.871377524975753.pth
mb1-ssd-Epoch-210-Loss-5.160919652389554.pth   mb1-ssd-Epoch-6-Loss-6.135876489612323.pth
mb1-ssd-Epoch-211-Loss-3.949747069564388.pth   mb1-ssd-Epoch-70-Loss-3.879801269554839.pth
mb1-ssd-Epoch-212-Loss-3.918283830683139.pth   mb1-ssd-Epoch-71-Loss-3.846202814958121.pth
mb1-ssd-Epoch-213-Loss-3.8751271027979497.pth  mb1-ssd-Epoch-72-Loss-3.8629668942609863.pth
mb1-ssd-Epoch-214-Loss-3.9347812535484774.pth  mb1-ssd-Epoch-73-Loss-3.904229204562022.pth
mb1-ssd-Epoch-215-Loss-3.7550916844458966.pth  mb1-ssd-Epoch-74-Loss-3.83929830136653.pth
mb1-ssd-Epoch-216-Loss-3.8810523091272415.pth  mb1-ssd-Epoch-75-Loss-3.796804022452014.pth
mb1-ssd-Epoch-217-Loss-4.007471275835071.pth   mb1-ssd-Epoch-76-Loss-3.7874465431004447.pth
mb1-ssd-Epoch-218-Loss-4.029544852226446.pth   mb1-ssd-Epoch-77-Loss-3.834847791034847.pth
mb1-ssd-Epoch-219-Loss-3.8131336854119184.pth  mb1-ssd-Epoch-78-Loss-3.800401662348016.pth
mb1-ssd-Epoch-21-Loss-5.030396945484957.pth    mb1-ssd-Epoch-79-Loss-3.750266995531089.pth
mb1-ssd-Epoch-220-Loss-3.7892867512079516.pth  mb1-ssd-Epoch-7-Loss-5.880293455224997.pth
mb1-ssd-Epoch-221-Loss-3.7841227252576037.pth  mb1-ssd-Epoch-80-Loss-3.790931124148015.pth
mb1-ssd-Epoch-222-Loss-3.7276114823540194.pth  mb1-ssd-Epoch-81-Loss-3.7494613926318006.pth
mb1-ssd-Epoch-223-Loss-3.799472652138754.pth   mb1-ssd-Epoch-82-Loss-3.728058026030712.pth
mb1-ssd-Epoch-224-Loss-3.7750495785959197.pth  mb1-ssd-Epoch-83-Loss-3.7277570735439394.pth
mb1-ssd-Epoch-225-Loss-3.824506605050589.pth   mb1-ssd-Epoch-84-Loss-3.7323043923495938.pth
mb1-ssd-Epoch-226-Loss-3.683006483337483.pth   mb1-ssd-Epoch-85-Loss-3.7278818085842333.pth
mb1-ssd-Epoch-227-Loss-3.742463223925749.pth   mb1-ssd-Epoch-86-Loss-3.7244291667803435.pth
mb1-ssd-Epoch-228-Loss-3.671506966381949.pth   mb1-ssd-Epoch-87-Loss-3.7267692308122613.pth
mb1-ssd-Epoch-229-Loss-3.64356760169929.pth    mb1-ssd-Epoch-88-Loss-3.703605293806366.pth
mb1-ssd-Epoch-22-Loss-4.9297245715616445.pth   mb1-ssd-Epoch-89-Loss-3.7260948225803174.pth
mb1-ssd-Epoch-230-Loss-3.688968066191926.pth   mb1-ssd-Epoch-8-Loss-5.755764618358006.pth
mb1-ssd-Epoch-231-Loss-3.6833032719238066.pth  mb1-ssd-Epoch-90-Loss-3.69775080765094.pth
mb1-ssd-Epoch-232-Loss-3.5915347173441425.pth  mb1-ssd-Epoch-91-Loss-3.7238078403809887.pth
mb1-ssd-Epoch-233-Loss-3.5670561554575135.pth  mb1-ssd-Epoch-92-Loss-3.7012678806015122.pth
mb1-ssd-Epoch-234-Loss-3.5655705284314103.pth  mb1-ssd-Epoch-93-Loss-3.6890994954867415.pth
mb1-ssd-Epoch-235-Loss-3.561608221000159.pth   mb1-ssd-Epoch-94-Loss-3.7001089211487517.pth
mb1-ssd-Epoch-236-Loss-3.5979444761579535.pth  mb1-ssd-Epoch-95-Loss-3.6902907084239245.pth
mb1-ssd-Epoch-237-Loss-3.582918860887049.pth   mb1-ssd-Epoch-96-Loss-3.688935249517326.pth
mb1-ssd-Epoch-238-Loss-3.716347982942426.pth   mb1-ssd-Epoch-97-Loss-3.6946485358497703.pth
mb1-ssd-Epoch-239-Loss-3.650979427061317.pth   mb1-ssd-Epoch-98-Loss-3.6937970567507796.pth
mb1-ssd-Epoch-23-Loss-5.419604926985481.pth    mb1-ssd-Epoch-99-Loss-3.694662232281041.pth
mb1-ssd-Epoch-240-Loss-3.577769852358545.pth   mb1-ssd-Epoch-9-Loss-5.686954400564672.pth
mb1-ssd-Epoch-241-Loss-3.5249817468252282.pth  tensorboard
mb1-ssd-Epoch-242-Loss-3.5374658579539915.pth
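
Since every checkpoint encodes its epoch and loss in the filename, a small helper can pick out the lowest-loss one (a rough sketch - it assumes the mb1-ssd-Epoch-<N>-Loss-<value>.pth naming stays the same as in the listing above):

# Sketch: find the checkpoint with the lowest recorded loss in a model directory.
import re
from pathlib import Path

pattern = re.compile(r"Epoch-(\d+)-Loss-([\d.]+)\.pth$")

def best_checkpoint(model_dir):
    best = None
    for ckpt in Path(model_dir).glob("*.pth"):
        m = pattern.search(ckpt.name)
        if m:
            loss, epoch = float(m.group(2)), int(m.group(1))
            if best is None or loss < best[0]:
                best = (loss, epoch, ckpt)
    return best   # (loss, epoch, path) or None

print(best_checkpoint("models/jw10-lr0001-batch16-workers22"))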

I found TensorBoard :)

What do you think, how should I change the SSD parameters to detect even smaller objects using SSD 512?

In mobilenetv1_ssd_config.py:

specs = [
    SSDSpec(32, 32, SSDBoxSizes(10, 15), [1, 2]),
    SSDSpec(32, 16, SSDBoxSizes(20, 35), [2, 3]),
    SSDSpec(16, 32, SSDBoxSizes(35, 50), [2, 3]),
    SSDSpec(8, 64, SSDBoxSizes(50, 65), [2, 3]),
    SSDSpec(4, 100, SSDBoxSizes(195, 240), [2, 3]),
    SSDSpec(2, 150, SSDBoxSizes(240, 285), [2, 3]),
    SSDSpec(1, 300, SSDBoxSizes(285, 512), [2, 3])
]

And:

def set_image_size(size=512, min_ratio=10, max_ratio=90)

How should I change those values?

The raw loss values depend on the dataset and how challenging it is, so for some datasets a loss of 0.5 is good, while for others a loss of 2 may be fine (just as an example). I added the --validate-mean-ap option so it is easier to quantify the accuracy versus the loss. I would train it until it reaches an acceptable mean Average Precision (mAP) on your test set. The mAP takes longer to compute, which is why it only runs on your test set at the end of each epoch.

Yes, having an uneven class distribution can bias the model - you could try the --balance-data option to train_ssd.py to see if it helps. But if the per-class mAP accuracies are okay for you, you should be fine.

I think JPG files are always stored as rgb8 regardless of the image format that you input to it.
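
A quick way to double-check that on a saved file (just a sketch - the filename is a placeholder) is to look at the mode Pillow reports after decoding:

from PIL import Image

img = Image.open("test.jpg")
print(img.mode)   # an 8-bit, 3-channel JPEG decodes as 'RGB'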


I’m not an expert on training, and I haven’t tried train_ssd.py at a resolution greater than 512x512, but that set_image_size() function already re-calculates the SSD specs/priors for variable resolutions. You should just need to specify the --resolution option when you run train_ssd.py (this is only supported for mb1-ssd models currently)
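
For example, the invocation could look roughly like this (the dataset/model paths are placeholders, and the flag spellings are the ones mentioned in this thread - worth confirming against train_ssd.py --help):

python3 train_ssd.py --dataset-type=voc --data=data/my-dataset --model-dir=models/my-model \
    --resolution=512 --balance-data --validation-mean-ap=True --batch-size=16 --epochs=300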


Thanks again, Dusty! :)

Are you saying that with SSD MobileNet V2 I can use whatever resolution I want during training? I didn't know that :) - I thought in those examples it was only possible to use a resolution as high as 512x512 with MobileNet V1.

Then I think I should get something like this "mb2-ssd-lite-mp-0_686.pth" from somewhere - but I haven't looked into whether there is a difference between "SSD v2 lite" and "SSD v2". Is there any difference? In my case I want to use as high a resolution as possible (as I mentioned before, I need to detect very small objects or stripes).

By the way, I noticed that the augmentation part uses Resize(self.size), but this resize scales down to 300 pixels; I think in the 512 configuration I should increase it to 512 - at first I followed: How train jetson-inference ssd512 model - Jetson & Embedded Systems / Jetson TX2 - NVIDIA Developer Forums

I'll test it and write here later how the results compare to the initial training. Maybe someone is interested. I added some configuration in data_preprocessing.py:

class TrainAugmentation:
    def __init__(self, size, mean=0, std=1.0):
        """
        Args:
            size: the size of the final image.
            mean: mean pixel value per channel.
        """
        self.mean = mean
        self.size = size
        self.augment = Compose([
            ConvertFromInts(),
            PhotometricDistort(),
            Expand(self.mean),
            RandomSampleCrop(),
            RandomMirror(),
            ToPercentCoords(),
            RandomSaturation(),   # ← added this
            RandomHue(),          # ← added this
            RandomContrast(),     # ← added this
            RandomBrightness(),   # ← added this
            Resize(self.size),
            SubtractMeans(self.mean),
            lambda img, boxes=None, labels=None: (img / std, boxes, labels),
            ToTensor(),
        ])

Also, I tweaked transforms.py a little (so it doesn't augment too aggressively) - see the sketch below.
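As an illustration of what "not too aggressively" could mean, narrowing the ranges of the photometric transforms might look roughly like this - assuming the classes in transforms.py take the same lower/upper/delta constructor arguments as the original ssd.pytorch code this repo derives from (the values are examples, not the ones I actually used):

RandomSaturation(lower=0.8, upper=1.2),   # narrower than the usual 0.5-1.5
RandomHue(delta=9.0),                     # smaller than the usual 18.0
RandomContrast(lower=0.8, upper=1.2),     # narrower than the usual 0.5-1.5
RandomBrightness(delta=16),               # smaller than the usual 32

These would simply replace the bare RandomSaturation()/RandomHue()/RandomContrast()/RandomBrightness() calls in the Compose list above.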

Also changed mobilenetv1_ssd_config.py to:

specs = [
    SSDSpec(32, 32, SSDBoxSizes(10, 15), [2, 3]),   # ← added this (I think it adds one more layer)
    SSDSpec(32, 16, SSDBoxSizes(20, 35), [2, 3]),
    SSDSpec(16, 32, SSDBoxSizes(35, 50), [2, 3]),
    SSDSpec(8, 64, SSDBoxSizes(50, 65), [2, 3]),
    SSDSpec(4, 100, SSDBoxSizes(195, 240), [2, 3]),
    SSDSpec(2, 150, SSDBoxSizes(240, 285), [2, 3]),
    SSDSpec(1, 300, SSDBoxSizes(285, 512), [2, 3])
]

And now I'm testing with the same dataset, a lowered learning rate of 0.005 (and, just in case, some extra epochs), and I'll keep an eye on TensorBoard.

python3 train_ssd_gpus.py --dataset-type=voc --learning-rate=0.005 --data=data/jwt --model-dir=models/jwt --batch-size=48 --workers=24 --epochs=400 --resolution=512 --use-cuda=True --validation-mean-ap=True --gpu-devices 0 1

I'll let you know - I don't know how many people are reading this thread, but I'll write up the results.

I've only implemented the variable resolutions for ssd-mobilenet-v1. And I did some testing with ssd-mobilenet-v2 on Xavier/Orin, and its runtime performance was lower than ssd-mobilenet-v1's, so I just stick with -v1. Not all of the network architectures in train_ssd.py work with ONNX export/import into TensorRT.

Yes, it only supports square resolutions, but you should be able to use sizes other than 512x512 (although I don't believe I have tested it thoroughly)

The TrainAugmentation object gets initialized with the image size from the configuration here:

train_transform = TrainAugmentation(config.image_size, config.image_mean, config.image_std)

And that config.image_size gets set by the set_image_size() function for ssd-mobilenet-v1. Or are you saying that even with ssd-mobilenet-v1, the augmentation Resize() transform is still getting initialized to 300x300?


Yes, but in transforms.py I see:

class Resize(object):
    def __init__(self, size=300):   # ← I changed this 300
        self.size = size

    def __call__(self, image, boxes=None, labels=None):
        image = cv2.resize(image, (self.size, self.size))
        return image, boxes, labels

maybe I rushed or didn't understand it correctly…

Sorry - I changed that 300 to 512.
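
For what it's worth, since TrainAugmentation constructs the transform as Resize(self.size), the default of 300 should only matter when no size is passed in, so as long as config.image_size is 512 the hard-coded default shouldn't need changing. A tiny sanity check (a sketch, assuming the usual pytorch-ssd module layout):

# Sketch: verify that set_image_size() actually updates the size that reaches Resize().
from vision.ssd.config import mobilenetv1_ssd_config as config

config.set_image_size(512)
print(config.image_size)   # should print 512; TrainAugmentation(config.image_size, ...) passes this on to Resize()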