Tensor volume exceeds (2^31)-1

My configuration is as follows:

• Hardware Platform (Jetson / GPU)
Jetson NX
• DeepStream Version
6.0
• JetPack Version (valid for Jetson only)
4.6.2
• TensorRT Version
8.2.1.8

When I tried to run a customized YOLOv5s model without an existing .engine file, the following error occurred:

(py3) shisun@nx:/opt/nvidia/deepstream/deepstream/source/DeepStream-Yolo$ deepstream-app -c deepstream_app_config.txt

Using winsys: x11
ERROR: Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-6.0/source/DeepStream-Yolo/5s_ghost.engine open error
0:00:03.172938201 4960 0x13963670 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1889> [UID = 1]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-6.0/source/DeepStream-Yolo/5s_ghost.engine failed
0:00:03.201935543 4960 0x13963670 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1996> [UID = 1]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-6.0/source/DeepStream-Yolo/5s_ghost.engine failed, try rebuild
0:00:03.202025590 4960 0x13963670 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 1]: Trying to create engine from model files

Loading pre-trained weights
Loading weights of 5s_ghost complete
Total weights read: 3702174
Building YOLO network

  layer          input                  output                 weightPtr

(0) conv_silu    3 x 640 x 640          32 x 320 x 320         3584
(1) conv_silu    32 x 320 x 320         256 x 320 x 320        12800
(2) maxpool      256 x 320 x 320        256 x 320 x 320        12800
(3) maxpool      256 x 320 x 320        256 x 320 x 320        12800
(4) maxpool      256 x 320 x 320        256 x 320 x 320        12800
(5) route        -                      1024 x 320 x 320       12800
(6) conv_silu    1024 x 320 x 320       512 x 320 x 320        539136
(7) upsample     512 x 320 x 320        512 x 640 x 640        -
(8) route        -                      1024 x 640 x 640       539136
(9) upsample     1024 x 640 x 640       1024 x1280 x1280       -
ERROR: [TRT]: 4: [graphShapeAnalyzer.cpp::processCheck::581] Error Code 4: Internal Error ((Unnamed Layer* 19) [Concatenation]_output: tensor volume exceeds (2^31)-1, dimensions are [2048,1280,1280])
deepstream-app: utils.cpp:147: int getNumChannels(nvinfer1::ITensor*): Assertion `d.nbDims == 3' failed.
Aborted (core dumped)

The problem is here:
Error ((Unnamed Layer* 19) [Concatenation]_output: tensor volume exceeds (2^31)-1, dimensions are [2048,1280,1280])
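
To make the limit in the error message concrete, here is a minimal Python sketch that multiplies out the shapes printed in the log above and compares them with (2^31)-1. The helper names (tensor_volume, exceeds_limit) are made up for illustration and are not part of TensorRT or DeepStream.

```python
from math import prod

INT32_MAX = 2**31 - 1  # the limit quoted in the TensorRT error message


def tensor_volume(dims):
    """Number of elements in a tensor with the given dimensions."""
    return prod(dims)


def exceeds_limit(dims, limit=INT32_MAX):
    """True if the element count is larger than the limit."""
    return tensor_volume(dims) > limit


# Output shapes copied from the layer table and the error message in the log.
logged_outputs = [
    ("(8) route", (1024, 640, 640)),
    ("(9) upsample", (1024, 1280, 1280)),
    ("Layer 19 concatenation", (2048, 1280, 1280)),
]

for name, dims in logged_outputs:
    print(name, tensor_volume(dims), exceeds_limit(dims))
```

Only the concatenation output (2048 x 1280 x 1280 = 3,355,443,200 elements) is above the limit. The shapes in the table keep growing (640 to 1280 on a 640 x 640 input), which is consistent with the later remark in this thread that the generated cfg file was wrong.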

How can I fix this, please?

I guess you also cannot run the model directly with trtexec. Did you try it?

Thank you for the reply.
I failed to convert my model to ONNX and then to an engine file, and when I tried to convert it to .wts via tensorrtx (wangxinyu), I got a wrong cfg file. However, a correct engine file can be exported with export.py from yolov5 v6.1, and that engine file runs correctly in yolov5.

Has your problem been resolved?

Not yet. I reflashed the NX with JetPack 4.6 and 4.6.1, and when DeepStream loaded my engine file, another error occurred, something like:
number of detection mismatch, configured 1, the network 0.
I guess the problem is that the engine file is parsed incorrectly.
Now I am trying to use tensorrtx (wangxinyu) to redefine my network and to produce the .wts and .cfg files so that DeepStream can build the right engine file. Please advise: is that the right way?

in yolov5, common.py:
class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))

in tensorrtx, common.hpp, I added the corresponding definition:
ILayer* Ghostconv(INetworkDefinition *network, std::map<std::string, Weights>& weightMap, ITensor& input, int outch, int ksize, int s, int g, std::string lname) {
    // hidden channels: each branch produces half of the output channels
    int c_ = outch / 2;
    auto cv1 = convBlock(network, weightMap, input, c_, ksize, s, g, lname + ".cv1");
    // 5x5 depth-wise conv on cv1's output (groups = c_), following yolov5's GhostConv
    auto cv2 = convBlock(network, weightMap, *cv1->getOutput(0), c_, 5, 1, c_, lname + ".cv2");

    // concatenate both branches along the channel axis: c_ + c_ = outch
    ITensor* inputTensors[] = { cv1->getOutput(0), cv2->getOutput(0) };
    auto cat = network->addConcatenation(inputTensors, 2);
    assert(cat);
    return cat;
}

common.py:
class DWConv(Conv):
    # Depth-wise convolution class
    def __init__(self, c1, c2, k=1, s=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__(c1, c2, k, s, g=math.gcd(c1, c2), act=act)
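
As a rough illustration of what g=math.gcd(c1, c2) means for the exported weights, the sketch below computes the group count and the resulting convolution weight shape. It relies on the PyTorch convention that an nn.Conv2d weight has shape (out_channels, in_channels / groups, kH, kW); the function names and the 64-channel example are made up for illustration.

```python
import math


def dwconv_groups(c1, c2):
    """Group count that DWConv passes to Conv: gcd of input and output channels."""
    return math.gcd(c1, c2)


def conv2d_weight_shape(c1, c2, k, g):
    """Weight shape of a grouped k x k convolution in PyTorch's layout."""
    if c1 % g or c2 % g:
        raise ValueError("channels must be divisible by the group count")
    return (c2, c1 // g, k, k)


def weight_count(shape):
    """Number of scalar weights in a tensor of the given shape."""
    return math.prod(shape)


# Hypothetical example: 64 -> 64 channels with a 3x3 kernel.
g = dwconv_groups(64, 64)
depthwise = conv2d_weight_shape(64, 64, 3, g)
standard = conv2d_weight_shape(64, 64, 3, 1)
print(g, depthwise, weight_count(depthwise))
print(1, standard, weight_count(standard))
```

If the group count set on the TensorRT layer does not match the one used in training, the number of values read for the ".conv.weight" entry will not match what the layer expects, so this is worth checking when writing the .wts file.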

common.hpp:
ILayer* DWConv(INetworkDefinition *network, std::map<std::string, Weights>& weightMap, ITensor& input, int outch, int ksize, int s, int g, std::string lname) {
    // ksize, s and g are taken from the arguments; for a depth-wise conv the
    // caller passes g = gcd(c1, c2), as in common.py
    Weights emptywts{ DataType::kFLOAT, nullptr, 0 };
    int p = ksize / 3;
    IConvolutionLayer* conv1 = network->addConvolutionNd(input, outch, DimsHW{ ksize, ksize }, weightMap[lname + ".conv.weight"], emptywts);
    assert(conv1);
    conv1->setStrideNd(DimsHW{ s, s });
    conv1->setPaddingNd(DimsHW{ p, p });
    conv1->setNbGroups(g);
    IScaleLayer* bn1 = addBatchNorm2d(network, weightMap, *conv1->getOutput(0), lname + ".bn", 1e-3);

    // silu = x * sigmoid(x)
    auto sig = network->addActivation(*bn1->getOutput(0), ActivationType::kSIGMOID);
    assert(sig);
    auto ew = network->addElementWise(*bn1->getOutput(0), *sig->getOutput(0), ElementWiseOperation::kPROD);
    assert(ew);
    return ew;
}
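
One thing that may be worth checking before compiling: the C++ code above pads with ksize / 3, while yolov5's autopad uses k // 2 when no padding is given. The sketch below compares the two rules with the standard convolution output-size formula (dilation 1). The 80-pixel feature map and the function names are made up for illustration.

```python
def conv_out_size(size, k, s, p):
    """Output height/width of a convolution with kernel k, stride s, padding p."""
    return (size + 2 * p - k) // s + 1


def autopad(k):
    """Padding chosen by yolov5's autopad for an integer kernel size."""
    return k // 2


def pad_div3(k):
    """Padding rule used in the C++ code above (integer division by 3)."""
    return k // 3


# Hypothetical 80 x 80 feature map, stride 1.
for k in (1, 3, 5):
    same = conv_out_size(80, k, 1, autopad(k))
    div3 = conv_out_size(80, k, 1, pad_div3(k))
    print(k, same, div3)
```

The two rules agree for 1x1 and 3x3 kernels, but for the 5x5 kernel used in Ghostconv the ksize / 3 rule yields a 78 x 78 map instead of 80 x 80. If the helper that builds that layer pads the same way, the two inputs of the concatenation would no longer have matching spatial sizes.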

common.py:
class Bottleneck(nn.Module):
    # Standard bottleneck
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))

common.hpp:
ILayer* Ghostbottleneck(INetworkDefinition *network, std::map<std::string, Weights>& weightMap, ITensor& input, int c1, int c2, int k, int s, std::string lname) {
    // hidden channels; assumption: c2 / 2 replaces the undefined expansion factor e
    int c_ = c2 / 2;
    auto cv1 = Ghostconv(network, weightMap, input, c_, 1, 1, 1, lname + ".cv1");
    // depth-wise conv on the shortcut path: groups = gcd(c1, c1) = c1
    auto cv4 = DWConv(network, weightMap, input, c1, 3, 1, c1, lname + ".cv4");
    // declared before the if/else so that they are still in scope for the sum below
    ILayer* cv3 = nullptr;
    ILayer* cv5 = nullptr;
    if (s == 2) {
        // groups = gcd(c_, c2) = c_ for an even c2
        auto cv2 = DWConv(network, weightMap, *cv1->getOutput(0), c2, 3, 1, c_, lname + ".cv2");
        cv3 = Ghostconv(network, weightMap, *cv2->getOutput(0), c2, 1, 1, 1, lname + ".cv3");
        // ".cv5" is an assumed weight name; the original reused ".cv4" here
        cv5 = convBlock(network, weightMap, *cv4->getOutput(0), c2, 1, 1, 1, lname + ".cv5");
    } else {
        cv3 = Ghostconv(network, weightMap, *cv1->getOutput(0), c2, 1, 1, 1, lname + ".cv3");
        cv5 = cv4;
    }

    auto ew = network->addElementWise(*cv3->getOutput(0), *cv5->getOutput(0), ElementWiseOperation::kSUM);
    assert(ew);
    return ew;
}

common.py:
class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))
        # self.m = nn.Sequential(*[CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)])

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))

common.hpp:
ILayer* C3Ghost(INetworkDefinition *network, std::map<std::string, Weights>& weightMap, ITensor& input, int c1, int c2, int n, bool shortcut, int g, float e, std::string lname) {
    // n, g and e are parameters and must not be redeclared here;
    // the caller passes n = 1, g = 1, e = 0.5 for the default configuration
    int c_ = (int)((float)c2 * e);  // hidden channels
    auto cv1 = convBlock(network, weightMap, input, c_, 1, 1, 1, lname + ".cv1");
    auto cv2 = convBlock(network, weightMap, input, c_, 1, 1, 1, lname + ".cv2");
    ITensor *y1 = cv1->getOutput(0);
    for (int i = 0; i < n; i++) {
        // Ghostbottleneck takes (c1, c2, k, s, lname); k = 3 and s = 1 as in yolov5's C3Ghost
        auto b = Ghostbottleneck(network, weightMap, *y1, c_, c_, 3, 1, lname + ".m." + std::to_string(i));
        y1 = b->getOutput(0);
    }

    ITensor* inputTensors[] = { y1, cv2->getOutput(0) };
    auto cat = network->addConcatenation(inputTensors, 2);

    auto cv3 = convBlock(network, weightMap, *cat->getOutput(0), c2, 1, 1, 1, lname + ".cv3");
    return cv3;
}

I have not compiled common.hpp yet, and
I am not sure whether it works or not.

Please refer to the DeepStream SDK FAQ (Intelligent Video Analytics / DeepStream SDK, NVIDIA Developer Forums).

Thank you, I am going to read this.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.
Thanks

Hello Chen,
I am facing the same issue
[graphShapeAnalyzer.cpp::processCheck::581] Error Code 4: Internal Error (StatefulPartitionedCall/sequential/lstm/PartitionedCall/while_loop:7: tensor volume exceeds (2^31)-1, dimensions are [2147483647,1,16])
[06/29/2022-12:26:20] [E] Error[2]: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[06/29/2022-12:26:20] [E] Engine could not be created from network
[06/29/2022-12:26:20] [E] Building engine failed
[06/29/2022-12:26:20] [E] Failed to create engine from model.
[06/29/2022-12:26:20] [E] Engine set up failed

Could you please guide me in resolving this?
