I’m new to this community, but I have been working with Mellanox InfiniBand (IB) for a while.
I recently installed UFM 4.0 with a remote monitoring history database (~200-node cluster).
Everything works fine except for a limitation in the graphical display of the history.
The GUI states a limit of 40 ports per monitoring session, but currently I can only run sessions with 20 ports.
Am I missing something?
Thank you for your answer.
I understand the load problems with a live session, especially on large-scale fabrics where the UFM server is already under quite a load.
But if I’m right, with the new scheme in UFM 4.0 and the history feature, the whole fabric is polled continuously. In my case, I have a schedule every 30 seconds, I’m no longer running live sessions, and the history database is on a second server (not on the UFM one).
The graphical history display is important to me for getting a quick overview of the congestion counters.
I’ll be happy to give you feedback if you have someone looking into it.
I think this limit stems from a limitation the product has with live monitoring sessions.
With a live monitoring session, sampling too many ports at the same time consumes too many resources and can also put a heavier load on the fabric (every counter poll introduces traffic). This is why the team limited the number of ports you can sample at the same time.
With the history feature, the behavior is the same, but I am not sure about the amount of load. I can have somebody look into this.
Thanks for your insight. I can forward this to some folks who look into UFM’s future development, and hopefully they can improve this area.
Can you please describe which counters you were selecting and how many ports you were requesting data for (how many devices, how many ports)?
What was the error you received? (A screenshot would be great!)
The team at Mellanox will look into this.
Sorry for the late answer.
I think you can bypass this limitation by exporting the history data to a CSV file (you can export up to 1 GB of data) and then analyzing as many ports as you like offline, away from the UFM system.
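As a rough illustration of the offline route, here is a minimal sketch that reads an exported history CSV and reports the peak value of one counter per port, optionally restricted to a chosen set of ports. The column layout ('timestamp', 'port', 'counter', 'value') and the port names are hypothetical; the real UFM export format may differ, so adjust the column names before using it.

```python
import csv
import io
from collections import defaultdict


def peak_counter_values(csv_text, counter, ports=None):
    """Return {port: peak value of `counter`} from a history CSV export.

    HYPOTHETICAL layout: one row per sample with the columns
    'timestamp', 'port', 'counter', 'value'. Adjust to the real export.
    If `ports` is given, only those ports are kept (narrowing the analysis).
    """
    peaks = defaultdict(int)  # counters are non-negative, so 0 is a safe start
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["counter"] != counter:
            continue
        if ports is not None and row["port"] not in ports:
            continue
        port = row["port"]
        peaks[port] = max(peaks[port], int(row["value"]))
    return dict(peaks)


# Hypothetical sample data; real exports will contain many more ports and rows.
sample = """timestamp,port,counter,value
1,node01_p1,PortXmitWait,10
2,node01_p1,PortXmitWait,25
1,node02_p1,PortXmitWait,7
1,node02_p1,PortRcvErrors,3
"""

print(peak_counter_values(sample, "PortXmitWait"))
# {'node01_p1': 25, 'node02_p1': 7}
```

Because there is no per-session port cap offline, the same function works for 20 ports or 400; passing `ports={"node02_p1"}` limits the summary to the ports you suspect were involved.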
Putting the numbers together: a server has 2 ports (per HCA), so 20 servers potentially means 40 ports.
A 36-port switch is under the 40-port limit, so there is no issue with that part.
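The arithmetic can be checked with a short sketch. It assumes, as in the reasoning above, that every selected server contributes two ports that count against the 40-port limit shown by the GUI; the helper names are hypothetical.

```python
PORT_LIMIT = 40  # per-session limit quoted by the UFM GUI in this thread


def ports_selected(num_servers, ports_per_server=2):
    # Assumption: each selected server contributes 2 ports to the count.
    return num_servers * ports_per_server


def fits(num_ports, limit=PORT_LIMIT):
    # A selection is accepted only if it stays at or under the limit.
    return num_ports <= limit


print(ports_selected(20), fits(ports_selected(20)))  # 40 True
print(ports_selected(21), fits(ports_selected(21)))  # 42 False
print(fits(36))  # True: a 36-port switch fits in one session
```

Under this assumption, 20 nodes is exactly where a selection of dual-port servers reaches 40 ports, which would explain the popup on the 21st node.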
I guess CSV will work for your purpose, but here is another piece of advice (sorry for not being more specific):
From my experience, if you use the history feature to investigate an event that took place in the past, you usually have some idea of which elements were involved in the issue. Try to narrow your data analysis down to the relevant ports as much as possible. It will make your work more efficient too.
Thanks for updating,
In fact, when starting a history session, I can select 20 ports (i.e., 20 nodes). Then I can select 4 counters.
I would like to select more ports (21 or more), but I’m blocked: a popup says that I can’t display more than 40 ports.
A screenshot may not be easy, but I can give you more details and the exact message on Monday.
My answer is late too, sorry about that.
I’ll try to use the CSV export functionality.
The fact is that UFM’s built-in display is really convenient for fast analysis.
Maybe I’m misunderstanding the UFM terminology:
When I launch a monitoring history session, I can choose between 2 modes (graphical and CSV).
So in graphical mode, I can select up to 20 nodes (compute nodes) per session.
If I run a new session, I can select a switch with 36 ports and the session runs without problems.
Am I missing some vocabulary? Is there a reason for having a graphical limitation on ports and a CSV limitation on nodes?