I apologize in advance that parts of this post may not seem positive, but I will present some solutions at the end. I believe this has become a long-standing issue, and I have ended up in the role of a mediator by accident - so the stance should now be stronger, to communicate where the borderline lies for customers (for me, mainly EU citizens). Even if it does not sound positive at the start, I will try to present some options at the end to prevent further escalation, to show willingness to solve this in a win-win manner, and to de-escalate, so we can all continue living a peaceful life:
@Robert_Crovella Hi Robert, no, you are not right. You are lying now, and I am very sensitive when someone lies (Asperger).
Next week I'm launching an innovative STEM education AI portal with assistants, so I was not able to find all the places where you mention that cuFile is not part of GDS and Magnum IO, but I found at least two relevant things:
There is a clear mention that cuFile is a standalone, independent user library, and there is even a mention of a future plan to merge it into cuda.h. There is no mention of GDS, and the same information is spread across three web pages on your domain.
Here you can see libcufile*.deb and no GDS - that is separate. So: no mention in the DOCS, no connection in the DEPLOY packages.
Please, I know I came here angry, but now I am trying to find a solution. I will propose one at the end of this post, but first we should analyze what is happening, so that we all have a better understanding of the context and of the potential risks of escalation. We should de-escalate. So please - I understand you need to represent stakeholders and protect their privileges - this was just an unlucky example. It seems to be a business-oriented approach rather than a scientific analysis.
Magnum IO was the first implementation, but recently your company has evolved toward new proposals for standalone GPUDirect functionality, which targets a different market than Magnum IO.
In the current deployment, libcufile.so is located at /usr/local/cuda-12.5/targets/x86_64-linux/lib.
Therefore it is located under the CUDA 12.5 toolkit. There is no mention of Magnum IO and no connection with GDS. Furthermore, the cuFile license and changelog nowhere mention Magnum IO or GDS; the only contact given is the cudatools team:
libcufile (1.10.1.7-1) stable; urgency=low

  * Automatic Debian package build

 -- cudatools <cudatools@nvidia.com>  Thu, 06 Jun 2024 10:53:01 +0000
The documentation, the license, and the comments in the GDS header files clearly state that GDS is only a wrapper around cuFile. From a software engineering point of view, if we use OOP and design patterns, GDS could have several implementations and wrappers and could extend functionality with a new implementation, rather than wrapping roughly four method calls around cuFile.
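To make those "roughly four method calls" concrete, here is a minimal sketch of the core cuFile sequence on Linux, put together from the public cuFile API reference (the file path and buffer size are made up for illustration, and error handling is omitted):

#define _GNU_SOURCE                            /* for O_DIRECT */
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include "cufile.h"

int main(void) {
    cuFileDriverOpen();                        /* 1. open the GDS driver          */

    int fd = open("/data/input.bin", O_RDONLY | O_DIRECT);
    CUfileDescr_t descr = {0};
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    descr.handle.fd = fd;                      /* the Linux side of the union     */
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);         /* 2. register the file handle     */

    void *devPtr;
    cudaMalloc(&devPtr, 1 << 20);
    cuFileBufRegister(devPtr, 1 << 20, 0);     /* 3. register the GPU buffer      */

    cuFileRead(fh, devPtr, 1 << 20, 0, 0);     /* 4. DMA the file straight to GPU */

    cuFileBufDeregister(devPtr);
    cuFileHandleDeregister(fh);
    cudaFree(devPtr);
    close(fd);
    cuFileDriverClose();
    return 0;
}

This is essentially the whole surface a wrapper - or a Windows port - would have to cover.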
I would also like to point out something I read somewhere - I don't remember exactly where it was stated - that cuFile uses POSIX for reads and writes, and that this is actually a bottleneck with a performance impact. So, if I had the opportunity to implement Windows support for you, for free, I would not use this POSIX approach, and my implementation, with further enhancements, would in fact be faster than the Linux implementation.
Next, in cuFile.h you have already implemented some kind of Windows support:
CU_FILE_HANDLE_TYPE_OPAQUE_WIN32 = 2, /*!< Windows based handle */

union {
    int fd;        /* Linux   */
    void *handle;  /* Windows */
} handle;
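For context, in the public cuFile.h that union sits inside the descriptor structure consumed by cuFileHandleRegister(), roughly as follows (reproduced from the published header from memory; the field comments are mine, and the exact layout should be checked against the shipped header):

typedef struct CUfileDescr_t {
    CUfileHandleType type;         /* CU_FILE_HANDLE_TYPE_OPAQUE_FD or ..._WIN32 */
    union {
        int fd;                    /* Linux file descriptor                      */
        void *handle;              /* Windows handle, already declared here      */
    } handle;
    const CUfileFSOps_t *fs_ops;   /* optional custom filesystem callbacks       */
} CUfileDescr_t;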
I don't want to be offensive; I am just doing analysis and pointing out facts that are already there. Now let us continue with the license information:
You claim that creating the SDK cost you quite significant financial expense, that it is commercial computer software with commercial documentation, and that you are giving it out for free.
Here I would like to point out that I now need to do the same, and instead of doing actual work or spending time with my family, I have to solve a lot of issues only with you. Other PyTorch CUDA dependencies and corporations like Microsoft or Intel do not create so many technical issues. So I would also do it for free, and I am willing to share it solely with you, under your license. But it will cost me around $5000.
I would like to point out that your license contains a list of third parties, and it seems much of your code is taken from universities, so in the end you might not have such significant financial losses; in the naive case it could be just Ctrl+C and Ctrl+V and gluing everything together:
Licensee’s use of the GDB third party component is
subject to the terms and conditions of GNU GPL v3
Licensee’s use of the Thrust library is subject to the
terms and conditions of the Apache License Version 2.0
In addition, Licensee acknowledges the following notice:
Thrust includes source code from the Boost Iterator,
Tuple, System, and Random Number libraries.
Licensee’s use of the LLVM third party component is
subject to the following terms and conditions:
University of Illinois/NCSA
Open Source License
Licensee’s use (e.g. nvprof) of the PCRE third party
component is subject to the following terms and
conditions:
University of Cambridge Computing Service,
Cambridge, England.
Copyright (c) 1997-2012
STACK-LESS JUST-IN-TIME COMPILER
Copyright(c) 2009-2012 Zoltan Herczeg
All rights reserved. (Hungary)
THE C++ WRAPPER FUNCTIONS
-------------------------
Contributed by: Google Inc.
Copyright (c) 2007-2012, Google Inc.
All rights reserved.
Some of the cuBLAS library routines were written by or
derived from code written by Vasily Volkov and are subject
to the Modified Berkeley Software Distribution License as
follows:
Copyright (c) 2007-2009, Regents of the University of California
Some of the cuBLAS library routines were written by or
derived from code written by Davide Barbieri and are
subject to the Modified Berkeley Software Distribution
License as follows:
Copyright (c) 2008-2009 Davide Barbieri @ University of Rome Tor Vergata.
Some of the cuBLAS library routines were derived from
code developed by the University of Tennessee and are
subject to the Modified Berkeley Software Distribution
License as follows:
Copyright (c) 2010 The University of Tennessee.
Some of the cuBLAS library routines were written by or
derived from code written by Jonathan Hogg and are subject
to the Modified Berkeley Software Distribution License as
follows:
Copyright (c) 2012, The Science and Technology Facilities Council (STFC).
All rights reserved.
Some of the cuBLAS library routines were written by or
derived from code written by Ahmad M. Abdelfattah, David
Keyes, and Hatem Ltaief, and are subject to the Apache
License, Version 2.0, as follows:
-- (C) Copyright 2013 King Abdullah University of Science and Technology
Some of the cuSPARSE library routines were written by or
derived from code written by Li-Wen Chang and are subject
to the NCSA Open Source License as follows:
Copyright (c) 2012, University of Illinois.
Some of the cuRAND library routines were written by or
derived from code written by Mutsuo Saito and Makoto
Matsumoto and are subject to the following license:
Copyright (c) 2009, 2010 Mutsuo Saito, Makoto Matsumoto and Hiroshima
University. All rights reserved.
Some of the cuRAND library routines were derived from
code developed by D. E. Shaw Research and are subject to
the following license:
Copyright 2010-2011, D. E. Shaw Research.
Some of the Math library routines were written by or
derived from code developed by Norbert Juffa and are
subject to the following license:
Copyright (c) 2015-2017, Norbert Juffa
All rights reserved.
Licensee’s use of the lz4 third party component is
subject to the following terms and conditions:
Copyright (C) 2011-2013, Yann Collet.
BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
The NPP library uses code from the Boost Math Toolkit,
and is subject to the following license:
Boost Software License - Version 1.0 - August 17th, 2003
Portions of the Nsight Eclipse Edition is subject to the
following license:
The Eclipse Foundation makes available all content in this plug-in
("Content"). Unless otherwise indicated below, the Content is provided
to you under the terms and conditions of the Eclipse Public License
Version 1.0 ("EPL").
Some of the cuBLAS library routines uses code from
OpenAI, which is subject to the following license:
The MIT License
Licensee’s use of the Visual Studio Setup Configuration
Samples is subject to the following license:
The MIT License (MIT)
Copyright (C) Microsoft Corporation. All rights reserved.
Licensee’s use of linmath.h header for CPU functions for
GL vector/matrix operations from lunarG is subject to the
Apache License Version 2.0.
The DX12-CUDA sample uses the d3dx12.h header, which is
subject to the MIT license.
I don't want to show disrespect, but it seems a lot of open-source licenses have been used in your SDK, and I think there are not that many files in contrast to this number of licenses. Maybe it would also be helpful to document which parts of the code are actually yours alone.
So, I am done with nitpicking, but it was fair and needed in order to provide some analysis for all parties here.
IMPORTANT FACT: Anyone who asks about cuFile gets no response for 3-5 years. I am glad some discussion was opened here, but it seems there is a will to steer the conversation. We never got to the point: NVIDIA has never provided an explanation or a reason why cuFile has such a special place in your SDK and is so super secret that universities, your open-source community, or even commercial entities cannot have access to it in the same manner as the other CUDA Toolkit files. In cuFile there is already a hint of Windows support, so it would not take much effort. Nobody here knows the reason, but if it is not viable for you, or you need to focus on something else, I think that could be accepted. Perhaps someone from the community could help you finish it under some NDA, contract, or sublicensing arrangement, and share the result of the work with you, so you could benefit from reaching more markets and would not have tens of angry scientists, Ph.D. teachers, and developers.
Imagine that these people invest mostly in an RTX 4090, and now they share photos showing they are upgrading to an RTX 5090. So it is not cheap, and these people are not teenagers. But they feel it is not fair that, because of one cuFile, they cannot use your expensive hardware to its full potential. Nobody needs the GDS wrapper - you can keep it. Everyone wants only cuFile.
In the docs you mention that GDS and cuFile are aimed at datacenters and clouds. We fall into that category: we have university or on-premise clouds, mostly for initial development and observations, and they run Windows Server. Then, if we need to scale up, we go to a cloud from a third-party provider, and at that point it is not a big problem to use Linux.
Why is there so much tension in the professional community regarding Windows + WSL2 for CUDA? Simply because, when you develop, imagine you have 20 Python venv or conda virtual environments. You don't want PyTorch inside WSL2 because:
- the shared filesystem is a problem
- when some Python PyTorch app spawns a web UI on some port, it is extra work to make it accessible to the host OS, which is very inconvenient
- on top of that, if PyTorch provides some GUI with dialogs (we are still in the development and testing phase), it is very cumbersome to render a Linux GUI from WSL2 on Windows. Mostly it means you cannot have a headless Ubuntu, and then you cannot even use WSL2. The standard approach, from what I know, is that you need to purchase X410 and use Hyper-V instead of WSL2; then you are able to redirect just one window from X11 to Windows using sockets.
- during development on Windows you also provide help with, or analyze, third-party solutions, and these mostly use an older Python or an older CUDA Toolkit. In Windows you can easily switch the priority of the CUDA version in the Environment Variables dialog. In Linux I don't know how difficult it would be, but if WSL2 uses direct DMA GPU access to the host OS, it means you now need to make these changes in two places: in Windows as well as in Linux
Therefore, if everyone could develop using VS Code and a local PyTorch, and just route the calls from libraries that call DLLs into the WSL2 container to its .so shared libraries, it would be much better. If the WSL2 CUDA setup shipped with all the proper dependencies, that would mean a performance gain of around 20-30%. So the WSL2 Ubuntu container should not mean you have to move your whole development environment inside it. The container you provide with WSL2 Ubuntu has cuFile, and because of WSL2 it was never actually meant to be used in the cloud, right? This approach, design, and architecture is therefore wrong and misleading. WSL2 Ubuntu seems to be a special or edge case, and it should act as a supporting backend only - used from the outside, with the CUDA Toolkit as its only purpose.
Now I have a few options in mind, and you can choose the one that would be best for you with the least effort:
- The people I talk with mostly work at universities across Europe and even have Patreon or YouTube channels to share knowledge and to spread and promote NVIDIA technologies and products. I personally am working on a STEM education platform - like ChatGPT, but with a lot of individual assistants, including a real-time avatar with microphone communication in many European languages. Would it be possible, if we have some association of universities, to ask you to license cuFile for non-profit use?
- Microsoft GPUDirect: would it be possible to have a 1:1 replacement of cuFile, and possibly GDS, with Microsoft GPUDirect? Windows 11 shows that our cards have it enabled, and it is supported out of the box.
- Until we find a solution such as licensing cuFile, adding it to the CUDA Toolkit for Windows, or letting someone finish the Windows implementation, I have an idea for how to keep everyone from getting angry and complaining here on the forums or on GitHub. I brainstormed various solutions with AI, and one of them could look like this:
ChatGPT (GPT-4o) would read the API documentation of the header outline on your website and generate stubs. These stubs would then be cross-compiled for both Linux and Windows, but placed only on Windows, somewhere on the search path of the PyTorch virtual environment. When someone calls any of these libraries from the Windows dev environment (VS Code + PyTorch), the stub would act as a proxy (the proxy design pattern), and the function call with all its parameters would be routed to the WSL2 container (either over SSH or via shared memory). The result would then be returned to the host OS (Windows) and back to PyTorch. Developers would therefore focus on Windows only, and the CUDA Toolkit in WSL2 would act as a background service. This would provide a seamless workflow; they should not even have to know about it, because it would be implemented that way. So nobody would get angry anymore :-D
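To illustrate, here is a very rough sketch of what one generated stub could look like. Everything in it - the opcode, the wire format, port 9999, and the name proxy_cufile_read - is a hypothetical illustration of the proxy pattern, not an existing NVIDIA, Microsoft, or cuFile API; a real bridge would also need a small server inside WSL2 that performs the actual cuFileRead() and streams the bytes back:

#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical wire format for one forwarded call (padding and
 * endianness handling are left out to keep the sketch short). */
struct call_msg {
    uint32_t opcode;       /* which remote function to invoke */
    uint64_t size;         /* number of bytes to read         */
    uint64_t file_offset;  /* offset within the remote file   */
};

#define OP_CUFILE_READ 1u

/* Windows-side stub: forward a read request to a WSL2-side service on
 * localhost:9999 (assumed port) and receive the bytes back into buf.
 * On Windows this would use Winsock; POSIX sockets keep the sketch short. */
ssize_t proxy_cufile_read(void *buf, uint64_t size, uint64_t file_offset) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9999);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(s, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(s);
        return -1;
    }

    struct call_msg msg = { OP_CUFILE_READ, size, file_offset };
    send(s, &msg, sizeof msg, 0);

    /* The WSL2 service would run the real cuFileRead() into GPU memory,
     * copy the result to host memory, and stream it back over the socket. */
    ssize_t total = 0, n;
    while ((uint64_t)total < size &&
           (n = recv(s, (char *)buf + total, size - total, 0)) > 0)
        total += n;
    close(s);
    return total;
}

Shared memory between Windows and the WSL2 VM would cut the copy overhead further, but a socket keeps the first prototype simple.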
The high-level outline of this solution is here: Project: Phantom CUDA Bridge · NANOTRIK-AI · Discussion #1 · GitHub
Thank you, and have a nice day! And my apologies that I probably made you stand up off the chair a few times :-D
But now you know why it is not good to ask me where I read some claim. I have a good memory, but it is difficult to find some nested page, and I am also a perfectionist, so I get stuck in a loop and it takes me 2-3 hours to gather some data and write a response.
The purpose of the more aggressive stance was to express the various frustrations of several people. It is midnight, and I need to work on some AI now.