Dear all,
I have been trying to connect six switches in an octahedron network topology. I modified the opensm.conf file, started opensm with it, and checked whether any errors were reported.
The opensm.conf file I used differs from the default opensm.conf in the following lines:
max_op_vls 8
routing_engine nue
avoid_throttled_links TRUE
nue_max_num_vls 8
qos TRUE
# QoS default options
qos_max_vls 8
qos_high_limit 6
qos_vlarb_high 0:4,1:4,2:4,3:192,4:16,5:32,6:64,7:128
qos_vlarb_low 0:64,1:64,2:64,3:64,4:64,5:64,6:64,7:64
qos_sl2vl 0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7
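The QoS lines above can be sanity-checked locally. The sketch below parses them and confirms three things: every VL they reference is below qos_max_vls, each arbitration weight fits in 8 bits (0..255), and qos_sl2vl has one entry per SL (16). It is a helper written for illustration, not part of opensm. It cannot tell whether the ports in the fabric actually run with 8 operational VLs.

```python
"""Sketch: local consistency check of the QoS lines from the opensm.conf excerpt."""

# Values copied from the opensm.conf excerpt in this post.
QOS_MAX_VLS = 8
QOS_VLARB_HIGH = "0:4,1:4,2:4,3:192,4:16,5:32,6:64,7:128"
QOS_VLARB_LOW = "0:64,1:64,2:64,3:64,4:64,5:64,6:64,7:64"
QOS_SL2VL = "0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7"


def parse_vlarb(table):
    """Turn a 'VL:weight,VL:weight,...' string into a list of (vl, weight) ints."""
    pairs = []
    for item in table.split(","):
        vl, weight = item.split(":")
        pairs.append((int(vl), int(weight)))
    return pairs


def check_qos(max_vls, vlarb_high, vlarb_low, sl2vl):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for name, table in (("qos_vlarb_high", vlarb_high), ("qos_vlarb_low", vlarb_low)):
        for vl, weight in parse_vlarb(table):
            if not 0 <= vl < max_vls:
                problems.append(f"{name}: VL {vl} is outside 0..{max_vls - 1}")
            if not 0 <= weight <= 255:  # arbitration weights are 8-bit values
                problems.append(f"{name}: weight {weight} for VL {vl} is outside 0..255")
    vls = [int(v) for v in sl2vl.split(",")]
    if len(vls) != 16:  # one entry per SL, SL0..SL15
        problems.append(f"qos_sl2vl: expected 16 entries, found {len(vls)}")
    for sl, vl in enumerate(vls):
        if not 0 <= vl < max_vls:
            problems.append(f"qos_sl2vl: SL{sl} maps to VL {vl}, outside 0..{max_vls - 1}")
    return problems


if __name__ == "__main__":
    issues = check_qos(QOS_MAX_VLS, QOS_VLARB_HIGH, QOS_VLARB_LOW, QOS_SL2VL)
    print("\n".join(issues) if issues else "no inconsistencies found")
```

For the values in this post the script reports no inconsistencies. That fits the rest of the thread: the QoS tables are self-consistent, and the warning comes from comparing the requested VL count against what the ports support.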
However, the log contains messages like the following:
get_max_num_vls: WRN NUE47: user requested maximum #VLs is larger than supported #VLs
Am I not supposed to set max_op_vls this high?
Many thanks!!
This message is issued when nue_max_num_vls is larger than the operational VL count (OpVLs in PortInfo) of any port in the fabric.
ibdiagnet (the db_csv file) shows the OpVLs configured on each fabric port. You need to ensure that nue_max_num_vls (8) is <= the OpVLs of every one of them.
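The comparison described above can be sketched as follows. This is a minimal illustration, not the opensm/Nue source. It assumes the per-port OpVLs values have already been collected, for example by reading them off ibdiagnet's db_csv output. The port names are made up.

The PortInfo OpVLs field is an encoded value, not a plain count: 1 = VL0, 2 = VL0-1, 3 = VL0-3, 4 = VL0-7, 5 = VL0-14. If max_op_vls in opensm.conf uses the same encoding (an assumption worth confirming in the opensm documentation), then max_op_vls 4 would already request VL0-7, which is 8 data VLs.

```python
"""Sketch of the check behind the NUE47 warning (illustrative, not opensm code).

Assumption: per-port OpVLs values were collected beforehand, e.g. from
ibdiagnet's db_csv; the port names used below are hypothetical.
"""

# PortInfo OperationalVLs / VLCap encoding from the InfiniBand specification:
# 1 = VL0, 2 = VL0-1, 3 = VL0-3, 4 = VL0-7, 5 = VL0-14
ENCODED_TO_DATA_VLS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 15}


def data_vls(encoded):
    """Convert an encoded OpVLs value into a number of data VLs."""
    if encoded not in ENCODED_TO_DATA_VLS:
        raise ValueError(f"invalid OpVLs encoding: {encoded}")
    return ENCODED_TO_DATA_VLS[encoded]


def ports_below(requested_vls, port_opvls):
    """Return the ports whose operational VLs are fewer than requested_vls."""
    return sorted(port for port, enc in port_opvls.items()
                  if data_vls(enc) < requested_vls)


def supported_vls(port_opvls):
    """Per the description above, the port with the fewest operational VLs sets the limit."""
    return min(data_vls(enc) for enc in port_opvls.values())


if __name__ == "__main__":
    # Hypothetical fabric: one port was brought up with only VL0-3.
    fabric = {"sw1/port1": 4, "sw1/port2": 4, "sw2/port1": 3}
    print(supported_vls(fabric))    # 4
    print(ports_below(8, fabric))   # ['sw2/port1']
```

In this made-up fabric, a single port running VL0-3 is enough to make a request for 8 VLs exceed what the fabric supports.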
Have you tried reducing the number of VLs to see whether the issue still reproduces?
Regardless, the Nue routing engine isn't maintained by NVIDIA. For issues with it, you will need to contact its developers.
Thanks for your response!!
Yes, reducing max_op_vls to 4 works. We also tried max_op_vls 8 and max_op_vls 7, and neither of them works.
We are running an application with the following message size distribution:
| Message Sizes summary for all ranks
|-----------------------------------------------------------------------------------------------------
| Message size(B)   Volume(MB)    Volume(%)      Transfers     Time(sec)    Time(%)
|-----------------------------------------------------------------------------------------------------
                0         0.00         0.00    24399260211    4305559.54      87.02
                1       269.42         0.03      282512124     505425.38      10.21
              785        17.79         0.00          23760      17107.65       0.35
                3       292.57         0.03      102260551      15810.23       0.32
               19       715.04         0.08       39461907      14210.46       0.29
             8640        17.81         0.00           2161       5834.82       0.12
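A few derived numbers from the summary above help show where the time goes. The tuples below are copied from that table. Treating Volume(MB) as MiB (2**20 bytes) is an inference from the numbers, since size × transfers / 2**20 reproduces every Volume(MB) value. The tool does not state this.

```python
"""Sketch: derive a few numbers from the message-size summary above."""

# (message size in bytes, Volume(MB), Transfers, Time(sec), Time(%))
ROWS = [
    (0, 0.00, 24399260211, 4305559.54, 87.02),
    (1, 269.42, 282512124, 505425.38, 10.21),
    (785, 17.79, 23760, 17107.65, 0.35),
    (3, 292.57, 102260551, 15810.23, 0.32),
    (19, 715.04, 39461907, 14210.46, 0.29),
    (8640, 17.81, 2161, 5834.82, 0.12),
]


def volume_mib(size_bytes, transfers):
    """Bytes moved for one message size, expressed in MiB (assumed unit of Volume(MB))."""
    return size_bytes * transfers / 2**20


def time_pct_at_most(rows, max_size):
    """Sum of Time(%) over all listed message sizes <= max_size bytes."""
    return sum(pct for size, _vol, _n, _t, pct in rows if size <= max_size)


if __name__ == "__main__":
    for size, vol, transfers, _time, _pct in ROWS:
        print(f"{size:>5} B: reported {vol:8.2f} MB, "
              f"size*transfers = {volume_mib(size, transfers):8.2f} MiB")
    print(f"time in messages <= 19 B: {time_pct_at_most(ROWS, 19):.2f}%")
```

By this count, messages of 19 bytes or less account for about 97.8% of the reported time, and 0-byte messages alone account for 87.02%. The larger sizes carry very little volume. That suggests the run is dominated by very small transfers. This is a reading of this one summary, not a measurement of the fabric.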
Our main objective is to learn how to tune QoS to get better performance.
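A useful first step in tuning is to see what the current arbitration weights imply. The sketch below computes each VL's share of its table under a deliberately simplified model, which is an assumption and not opensm's actual scheduling. The model assumes every VL in a table is always backlogged and that each entry may send weight × 64 bytes per turn. It ignores qos_high_limit and the interaction between the high- and low-priority tables.

```python
"""Sketch: rough per-VL shares implied by the qos_vlarb_* weights (simplified model)."""

# Weights copied from the opensm.conf excerpt in this thread.
QOS_VLARB_HIGH = "0:4,1:4,2:4,3:192,4:16,5:32,6:64,7:128"
QOS_VLARB_LOW = "0:64,1:64,2:64,3:64,4:64,5:64,6:64,7:64"


def shares(table):
    """Map each VL to weight / total weight within one arbitration table.

    Assumes all listed VLs always have data to send; each weight unit is
    64 bytes, so the byte share equals the weight share under that assumption.
    """
    weights = {}
    for item in table.split(","):
        vl, weight = (int(x) for x in item.split(":"))
        weights[vl] = weights.get(vl, 0) + weight  # a VL may appear in several entries
    total = sum(weights.values())
    return {vl: w / total for vl, w in sorted(weights.items())}


if __name__ == "__main__":
    for name, table in (("high", QOS_VLARB_HIGH), ("low", QOS_VLARB_LOW)):
        print(name, {vl: round(s, 3) for vl, s in shares(table).items()})
```

Under this model, VL3 would get roughly 43% of the high-priority table (192 of 444 weight units), and VLs 0-2 would get under 1% each. The low-priority table splits evenly, at 12.5% per VL. With the qos_sl2vl mapping in this post, SL i and SL i+8 share VL i. Whether any of this helps therefore depends on which SLs the application's traffic is actually assigned to, and the summary above does not show that.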