Working with FP4 when it appears impossible to get a 5090

hexexpert5 · March 18, 2025, 5:07am

Today at GTC I saw a session on TensorRT and using FP4. How am I supposed to leverage this knowledge when I can’t upgrade my 2.5-year-old 4090 to a 5090? I will say that Newegg has sold “bundles” at $6000 which have a 5090 in them, and they actually last a few minutes. I can’t afford that. I have been unable to just order a 5090. Because I’m not a gamer but only an AI and content creator, I don’t use the NVIDIA app, so I’m not even sure NVIDIA knows I’m a long-time 4090 user, and thus I am probably being rejected by the priority access program. I’ve accomplished so much with the 4090 and published my findings, but I am now blocked. Please help.

njuffa · March 18, 2025, 6:18pm

I think you are trying to make the point that you should have priority access to scarce new hardware. If your expectation is that someone from NVIDIA would intervene on your behalf after reading your post (not sure there is a realistic chance of that; the forums would be flooded with “Pick me!” posts if it worked like that), that point would be stronger if you linked your publication(s) here, or at least cited them.

hexexpert5 · March 18, 2025, 11:52pm

I appreciate your response. By “published” I mean I have recorded evidence of things I’ve done, as opposed to someone who simply claims things. For instance, videos are now all the rage in the Stable Diffusion community, and I believe I can make a case that I was, more than likely, the first person on the planet to demonstrate “real-time” generation of videos and real-time Stable Diffusion inferencing. Thus, here are a few highlights, all related to Stable Diffusion performance on the 4090.

Preface Aug 2022: Having just retired from MSFT, and about a week before Stable Diffusion (SD) even had a Wikipedia page, I learned of SD and was hooked, not for the creative beauty that could be rendered but for the AI aspects. Having spent 40+ years focused on software performance in SQL databases, I used that experience to take on SD performance as my hobby.

Am I just another gamer trying to get a 5090 or …

Nov/Dec 2022: Got my new system with a 4090/i9-13900K. I was doing SD on my laptop before that.
Dec/Jan: Learning Python but already posting about SD perf and helping others get performance. A small sample of my posts:
** 2023-01-05: Woo Hoo! .496 seconds for a 512x512 image at 20 steps
** 2023-01-16: Requested a pytorch change which tripled 4090 SD inference speed
** 2023-01-19: Showed users how to get a 3X perf improvement on a 4090
** 2023-03-19: Used torch.compile to hit 51 it/s
** 2023: Even TomsHardware 4090 benchmark comparison was flawed. I let them know.
** 2023: Explained to non-technical image creators why CPU performance actually made a difference in the early days of the 4090 with batchsize=1 512x512 images.
** 2023-06-21: Allowed users to access my home system to show how fast a 4090 can be
** 2023-07-21: Explained why the often-used it/s was a useless measure of performance
2023-10-22: Latent Consistency Models (LCM) paper published and announced on reddit by Simian Luo. What used to take 20+ inference steps was now doable in 4 or fewer steps. I saw this on reddit just hours after it was posted. Having already created high-perf inference pipelines, I immediately realized that with Simian’s LCM I could generate real-time 512x512 videos at 15 fps. I also coined the term RTSD (real-time SD), where one could move sliders for various SD parameters and get instant image updates (this is an interesting story in itself). This has only evolved since then, to the point where I can do 23 fps at 1280x1024 with my multi-modal interactive system with voice input, panning, zooming and other effects.
** First post of many documenting my evolution of real-time videos.
2023-11-18: Crude early “REAL-TIME” deep fake. This is me on camera with a prompt of Biden.
2023-12-02: Twitter post announcing 77 images/sec on a 4090
2023-12-05: Twitter post where I upped the perf to 149 images/sec
2024-03-26: Twitter post hitting 294 images/sec optimizing the heck out of everything
2024-03-31: First time showing my RT videos on Twitter
You can see many demos here along with my multi-modal RT gen app: https://x.com/Dan50412374/
Emad Mostaque (former StabilityAI CEO) saw one of these and gave me a “good job” comment.
