Please provide the following info (tick the boxes after creating this topic):
Software Version
[*] DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other
Target Operating System
[*] Linux
QNX
other
Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
[*] DRIVE AGX Orin Developer Kit (not sure of its part number)
other
SDK Manager Version
1.9.2.10884
other
Host Machine Version
[*] native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other
1. If “processorTypes” for a pass is set to “DLA” in node.json, the corresponding mapped resource in the automatically generated STM YAML file is “CUDLA_STREAM”. However, according to Nvidia-STM-Userguide.pdf, the mapped resource for DLA should be “DLA_HANDLE”.
2. The descriptionScheduleYamlGenerator.py tool may not have been updated to support the “PVA” processor type yet; it currently supports “VPI”. Are the two equivalent?
3. How are MUTEX software resources declared in app.json assigned to the corresponding passes? Are there any default allocation rules?
4. We did not find any examples of using “passDependencies” in app.json. Could you please provide some examples?
Yes, the Vision Programming Interface (VPI) is used to perform computer vision operations on the PVA (Programmable Vision Accelerator) hardware engine. Can you please share the file or document where you encountered the term “VPI”? This will help us provide more accurate information.
{
    "comment": "Generated by the nodedescriptor tool based on data provided by the C++ API of the node class",
    "generated": true,
    "library": "libdwcgf_helloworld.so",
    "name": "dw::framework::HelloWorldNode",
    "inputPorts": {},
    "outputPorts": {
        "VALUE_0": {
            "type": "int",
            "bindingRequired": true
        },
        "VALUE_1": {
            "type": "int",
            "bindingRequired": true
        }
    },
    "parameters": {
        "name": {
            "type": "std::string"
        }
    },
    "passes": [
        {
            "name": "SETUP",
            "processorTypes": [
                "CPU"
            ]
        },
        {
            "name": "PROCESS",
            "processorTypes": [
                "GPU"
            ]
        },
        {
            "name": "PROCESS2",
            "processorTypes": [
                "DLA"
            ]
        },
        {
            "name": "PROCESS3",
            "processorTypes": [
                "PVA"
            ]
        },
        {
            "name": "TEARDOWN",
            "processorTypes": [
                "CPU"
            ]
        }
    ]
}
1. I did not see any information about DLA_HANDLE in the CGF. The description of DLA_HANDLE in Nvidia-STM-Userguide.pdf is as follows:
Resource Type: CUDA Stream, DLA Handle, PVA Stream
CUDA streams, DLA handles, and PVA streams are client-specific software resources that are mapped to corresponding hardware engines (GPU, DLA, and VPU respectively). To specify these resources, the resource types should be set to CUDA_STREAM, DLA_HANDLE, or PVA_STREAM respectively. The hardware engine mapping is conveyed to the compiler when specifying the resource instances as shown in the example below. The specified hardware resource instances should be specified under the corresponding hardware resource type in the Global Resources section. Note that the compiler will throw an error if the limits on the mapped resource (as specified in section 3.1.1.6) are violated.
Clients:
  - Client0:
      Resources:
        CUDA_STREAM:
          - CUDA_STREAM0: GPU0 # CUDA_STREAM0 mapped to GPU0
          - CUDA_STREAM1: GPU0 # CUDA_STREAM1 mapped to GPU0
        DLA_HANDLE:
          - DLA_HANDLE0: DLA1 # DLA_HANDLE0 mapped to DLA1
        PVA_STREAM: # A client can have one unique stream per VPU
          - PVA_STREAM0: VPU0 # PVA_STREAM0 mapped to VPU0
Resource Type: Local Scheduling Mutex
Resource types other than those specified in section 3.1.1.3 above are treated as local scheduling mutexes. These cannot be mapped to a hardware resource.
Clients:
  - Client0:
      Resources:
        LOCAL_SCHED_MUTEX:
          - LOCAL_SCHED_MUTEX0
        LOCAL_RESOURCE_MUTEX:
          - RESOURCE_MUTEX0
According to the description above, CUDLA_STREAM is not one of the recognized resource types, so it will be treated as a local scheduling mutex rather than a hardware-mapped resource. This is inconsistent with my expectations, so I am confused.
2. The description of the Pass processor type in driveworks-5.10/tools/schema/node.schema.json:
"processorTypes": {
"description": "The processor types used by the pass (support is limited to a single processor type atm)",
"type": "array",
"minItems": 1,
"maxItems": 1,
"items": {
"enum": [
"CPU",
"GPU",
"DLA",
"PVA"
]
}
and in driveworks-5.10/tools/descriptionScheduleYamlGenerator/descriptionScheduleYamlGenerator.py:
def __getGlobalResource(hyperepochs, machineName):
    ret = {}
    for hyperepoch, hyperepochDesc in hyperepochs.items():
        for res in hyperepochDesc.get("resources", {}).keys():
            ids = res.split(".")
            # ids[0] could be machine name, client name or resource name
            if ids[0] == machineName:
                resourceName = ids[1]
                if ":" in resourceName:
                    raise RuntimeError("Hardware resource name cannot be mapped")
                resType = ScheduleDescription.__determineResourceType(resourceName)
                if not resType in ("CPU", "GPU", "VPI", "DLA"):
                    raise RuntimeError("Only hardware resources can be specified with machine name")
I suggest upgrading the descriptionScheduleYamlGenerator.py tool and keeping it in sync with STM regarding the above two issues.
3. Mutex resources in STM are used to prevent concurrent execution of runnables (passes in CGF) that own the same mutex resource:

Runnables:
  ......
  - dwcgfHelloworld_helloWorldNode_pass_1:
      ......
      Resources:
        - CPU
        - CUDA_MUTEX_LOCK
  - dwcgfHelloworld_helloWorldNode_pass_2:
      Resources:
        - CPU
        - CUDA_MUTEX_LOCK
I did not find any way in the CGF *.app.json to configure the association between mutex resources and passes. I need to know how to precisely assign mutex resources to the relevant passes.
Earlier, when DLA was used, we had to choose one of the two engines explicitly (DLA_0 vs. DLA_1). Now we have switched to using only cuDLA in all the nodes, so the former should no longer be used.
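To illustrate, a rough sketch of the difference (the CUDLA_STREAM declaration here is an assumption by analogy with the user guide example above, not copied from a generated file):

Clients:
  - Client0:
      Resources:
        # Old approach per Nvidia-STM-Userguide.pdf: pin work to a specific engine
        DLA_HANDLE:
          - DLA_HANDLE0: DLA1 # explicit choice between DLA_0 and DLA_1
        # New approach: work is submitted through cuDLA, so no explicit engine is
        # selected and the stream is declared without a hardware mapping (assumed)
        CUDLA_STREAM:
          - CUDLA_STREAM0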
We have updated descriptionScheduleYamlGenerator.py to support the PVA processor type in the next release.
For each CUDA stream there is a corresponding CUDA mutex lock, which is listed as a resource in a hyperepoch.
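As a rough sketch of how this can look in the hyperepoch section of an app.json (the names hyperepoch0, machine0, and client0 are hypothetical, and the layout is only inferred from the __getGlobalResource() excerpt above, where machine-qualified entries must be hardware resources while software resources are client-qualified):

"hyperepochs": {
    "hyperepoch0": {
        "comment": "hypothetical sketch, not taken from a shipped app.json",
        "resources": {
            "machine0.CPU": [],
            "machine0.GPU": [],
            "client0.CUDA_STREAM0": [],
            "client0.CUDA_MUTEX_LOCK": []
        }
    }
}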
I don’t have an example in hand or in the documentation. I will check and update you. Basically, it allows adding additional scheduling dependencies to influence the graph STM uses to determine the execution order.
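Until then, a purely hypothetical sketch of the idea, reusing the pass names from the HelloWorldNode descriptor above (the key layout is a guess, not taken from the schema):

"passDependencies": [
    {
        "comment": "hypothetical: force PROCESS2 to start only after PROCESS, even without a data connection between them",
        "pass": "helloWorldNode.PROCESS2",
        "dependencies": [
            "helloWorldNode.PROCESS"
        ]
    }
]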
Thank you for bringing the issue to our attention. We will work on updating the documents to clarify this.