Improving GPU Performance by Reducing Instruction Cache Misses

Originally published at: https://developer.nvidia.com/blog/improving-gpu-performance-by-reducing-instruction-cache-misses-2/

GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large number of compute resources, called streaming multiprocessors (SMs), and an array of facilities to keep them fed with data: high bandwidth to memory, sizable data caches, and the capability to switch to other teams of workers (warps)…

This post combines multiple features of the Nsight Compute tool to analyze the performance of a particular workload. Please let us know if you have questions about the presentation or the specifics of using the tool.

This is a question regarding homomorphic encryption (HE) in the federated learning framework developed by NVIDIA in Clara 4.0:

In a scenario where the goal is to foster collaboration among competing companies in a market, companies participating as clients in Federated Learning (FL) each hold their own decryption keys to access the model updates they receive from the server. However, I’m curious how updates encrypted by other clients are handled, given that no client possesses the keys to decrypt another client’s updates. Could someone please clarify this? Thank you!

@khaliliamir90 – Did you mean to post this on the Federated Learning with Homomorphic Encryption post?

Yes, my bad. I posted my message there after realizing my mistake. Thanks!