Make full use of the swapfile

Hi, I’m trying to convert an ONNX model to a TensorRT engine file, but the conversion failed due to insufficient memory.
I have 8 GB of swap space, but only about 100 MB of it was used while building the TensorRT engine. Is there a way to make full use of the swap file?

Hi,

TensorRT needs a GPU-accessible buffer to work.
However, swap can only be used by the CPU.
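You can confirm this by watching physical memory and swap while the build runs. A minimal sketch (it only reads the standard Linux /proc/meminfo, nothing Jetson-specific): you will see MemAvailable drop while SwapFree barely moves, because GPU allocations come from physical RAM.

def meminfo():
    # /proc/meminfo reports values in kB
    info = {}
    with open('/proc/meminfo') as f:
        for line in f:
            key, value = line.split(':')
            info[key.strip()] = int(value.split()[0])
    return info

m = meminfo()
print('MemAvailable: {:.1f} MiB'.format(m['MemAvailable'] / 1024))
print('SwapTotal:    {:.1f} MiB'.format(m['SwapTotal'] / 1024))
print('SwapFree:     {:.1f} MiB'.format(m['SwapFree'] / 1024))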

Thanks.


Hi, thanks for replying!
I have tried hard to free up memory; are there other ways to reclaim more of it?
Even so, I still failed to convert the model. Could you help me out?
I converted the ONNX file to a TRT engine file with this code:

import os
import tensorrt as trt

os.environ["CUDA_VISIBLE_DEVICES"] = '0'

TRT_LOGGER = trt.Logger()
onnx_file_path = 'end2end.onnx'
engine_file_path = 'end2end.trt'

EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28  # 256 MiB
    builder.max_batch_size = 1

    # Parse the model file
    if not os.path.exists(onnx_file_path):
        print('ONNX file {} not found, please run yolov3_to_onnx.py first to generate it.'.format(onnx_file_path))
        exit(1)
    print('Loading ONNX file from path {}...'.format(onnx_file_path))
    with open(onnx_file_path, 'rb') as model:
        print('Beginning ONNX file parsing')
        if not parser.parse(model.read()):
            print('ERROR: Failed to parse the ONNX file.')
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            exit(1)

    # Fix the input to a static shape
    network.get_input(0).shape = [1, 3, 512, 512]
    print('Completed parsing of ONNX file')
    print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))

    profile = builder.create_optimization_profile()
    config.add_optimization_profile(profile)
    trt_model_engine = builder.build_engine(network, config)

    print("Completed creating Engine")
    with open(engine_file_path, "wb") as f:
        f.write(trt_model_engine.serialize())

and encountered this runtime error:

Loading ONNX file from path end2end.onnx...
Beginning ONNX file parsing
[05/19/2023-14:54:09] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/19/2023-14:54:09] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
Completed parsing of ONNX file
Building an engine from file end2end.onnx; this may take a while...
[05/19/2023-14:58:46] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::100] Error Code 1: Cuda Runtime (the launch timed out and was terminated)
[05/19/2023-14:58:46] [TRT] [W] GPU error during getBestTactic: ArgMax_372 : the launch timed out and was terminated
[05/19/2023-14:58:46] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::100] Error Code 1: Cuda Runtime (the launch timed out and was terminated)
[05/19/2023-14:58:47] [TRT] [E] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node ArgMax_372.)
Completed creating Engine
Traceback (most recent call last):
  File "onnx2trt.py", line 37, in <module>
    f.write(trt_model_engine.serialize())
AttributeError: 'NoneType' object has no attribute 'serialize'
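I notice the final AttributeError just means build_engine() returned None after the build failed, so a small guard at the end of the script (a sketch of the relevant change only) makes the real TensorRT error the visible failure instead of the crash:

# build_engine() returns None when the build fails
trt_model_engine = builder.build_engine(network, config)
if trt_model_engine is None:
    print('ERROR: Engine build failed, see the TensorRT log above.')
    exit(1)
print("Completed creating Engine")
with open(engine_file_path, "wb") as f:
    f.write(trt_model_engine.serialize())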

Could you help me out?

Hi,

Unfortunately, the amount of memory the GPU can use is fixed.

Based on the TensorRT log you shared, the model is too complex to build on Nano.
We have other devices that are equipped with larger memory.
You may want to switch to one of those.

Thanks.

I’ll add some explanation to that…

In the kernel itself, where drivers are accessed, memory is reached via actual physical addresses. User space applications use virtual addresses, of which swap is one category, and the GPU cannot use an address which is virtual. It is difficult, but since the GPU shares memory with the rest of the o/s, you can sometimes reserve a larger contiguous block of memory if it is requested on the kernel’s command line during load (I’m the wrong person to ask how to do that; I don’t know the specific arguments which would apply to a Nano). That would tend to provide only a bit more memory and isn’t a particularly good solution.
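To make that concrete: the memory CUDA reports as available to the GPU is the fixed physical pool, and swap never appears in it. A quick sketch, assuming PyCUDA is installed (it is not part of the script above):

import pycuda.autoinit  # creates a CUDA context on the default device
import pycuda.driver as cuda

# On a Jetson, "total" is the physical RAM shared with the o/s.
# Swap space never shows up here, no matter how large it is.
free, total = cuda.mem_get_info()
print('GPU-visible memory: {:.0f} MiB free of {:.0f} MiB total'.format(
    free / 2**20, total / 2**20))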

Newer Jetson models do have more memory (and are accordingly more expensive). On a desktop PC, one of the big reasons people pay for a lot of VRAM (discrete GPUs have their own RAM and don’t share with the o/s) is precisely this: they too are limited in things like CAD textures and AI training, and cannot swap out as easily as a normal user space program can. This is why more expensive video cards (even ones with lower performance) can have 24 GB or 48 GB of VRAM, and people will pay for that. Incidentally, if you are training, you could do that on a desktop system (actually running the trained model will likely take less memory).


Thanks for your comprehensive response!
