Using nvcuvid with several boards

amillet · October 20, 2010, 7:19am

Hi,

I am a developer at Civolution France, a company specializing in video applications.
We are new to CUDA development, but we aim to develop an application that decodes an H.264 stream, inserts a logo into it, and re-encodes the video to a new H.264 stream.
As we need high speed for this application, we want to use CUDA to speed up the decoding and encoding process.

We have identified an H.264 encoder SDK, provided by MainConcept, which is based on CUDA. For the decoding process, we tried the NVCUVID API from NVIDIA, which seems to be quite fast at decoding H.264 streams.
So, for test purposes, I started from the cudaDecodeGL sample provided by NVIDIA and integrated into it the calls to the MainConcept SDK to encode to a new H.264 stream.
When using only one board (tested with a GeForce 470 and a Tesla C2050), the results obtained by this test program are quite good.

My problem comes when I start several instances of the program simultaneously on different boards. For these tests, we had access to a server with 8 Tesla C2050 boards inside (running Windows 7 Professional). Our test program takes a command-line option that defines which card to use for the process. So I created a .bat file which runs 8 instances of the test program (one instance for each Tesla board). When I ran the .bat file, the PC froze, and I got a blue screen and a reboot.

When I run only one instance of the program, everything is fine, whichever Tesla board I use. I then did another test: I introduced a pause command in the .bat file between each launch of the test program, so that the instances do not all start at the same moment. I did not get any blue screen then, but some of the test program instances returned an error code or crashed (it can work with up to 4 or 5 instances, but all the others return errors or crash).
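The staggered launch described above can be sketched as a small Python launcher rather than a .bat file, which also makes it easy to record which instance failed. This is only an illustration: the executable name `decode_test.exe` and the `--device` flag are hypothetical stand-ins for the test program and its command-line option, and the 2-second delay is an arbitrary choice.

```python
import subprocess
import time


def build_commands(exe, num_boards, device_flag="--device"):
    """Return one argument list per board, e.g. [exe, "--device", "0"].

    `exe` and `device_flag` are hypothetical placeholders for the test
    program and its board-selection option.
    """
    return [[exe, device_flag, str(i)] for i in range(num_boards)]


def launch_staggered(commands, delay_s=2.0):
    """Start each command delay_s seconds after the previous one.

    All instances keep running concurrently once started. Returns the
    exit codes in launch order after every instance has finished.
    """
    procs = []
    for i, cmd in enumerate(commands):
        if i > 0:
            time.sleep(delay_s)  # plays the role of the pause in the .bat file
        procs.append(subprocess.Popen(cmd))
    return [p.wait() for p in procs]
```

Calling `launch_staggered(build_commands("decode_test.exe", 8))` would start the eight instances one after another and return a list of eight exit codes, so the boards whose instance failed can be read off by index.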

So I repeated my tests using the unmodified cudaDecodeGL sample from NVIDIA (changed only so that I can specify which device to run on), and I got the same results.
I also tried removing the OpenGL code from this sample (as we do not need display in our case), but with the same result (one instance works fine, several instances do not; it can be fine with up to 4 or 5 instances, but no more).
I have no idea where the problem is.

Does anyone know if there is some limitation in the nvcuvid API? Is it supposed to run simultaneously on several NVIDIA boards?
If yes, can anyone provide me with simple sample code doing this? (As we do not need display, we do not need the OpenGL code.)

Thanks,
Regards

Sarnath · October 20, 2010, 7:52am

Just a suggestion:

Set compute-exclusive mode: it will automatically take care of reserving one card for one application. On Linux, we use nvidia-smi to do it. There must be something on Windows too, I would presume…

Try an increasing number of cards… 8 cards at a time could be overkill…
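As a rough sketch of the suggestion above, the following Python helper builds one nvidia-smi command per board. It assumes the `-i` (GPU index) and `-c` (compute mode) options and the `EXCLUSIVE_PROCESS` mode name found in current nvidia-smi releases; the tool shipped in 2010 may have used different flags, and whether the setting can be changed on a given Windows driver is not confirmed here. The function only builds the commands; running them typically needs administrator rights.

```python
def exclusive_mode_commands(num_boards, mode="EXCLUSIVE_PROCESS"):
    """Return one nvidia-smi argument list per board.

    Assumes the `-i <index>` and `-c <mode>` options of current
    nvidia-smi releases; check `nvidia-smi --help` on the target machine.
    """
    return [["nvidia-smi", "-i", str(i), "-c", mode] for i in range(num_boards)]


def as_command_lines(commands):
    """Join each argument list into a printable command line."""
    return [" ".join(cmd) for cmd in commands]
```

For the 8-board server, `as_command_lines(exclusive_mode_commands(8))` yields eight lines of the form `nvidia-smi -i 0 -c EXCLUSIVE_PROCESS`, which can be reviewed before being run by hand or through `subprocess.run`.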

amillet · October 20, 2010, 8:24am

It is always the same application that I use, but I need to run it on several boards at the same time (one instance per board).
We may also need to run several instances (2, maybe 3) of the program on the same board.

I tried an incremental test (using the pause command in my .bat file), and it seems that up to 4 or 5 instances can run at the same time.

Regards

Sarnath · October 20, 2010, 8:45am

Compute-exclusive mode understands “instances” of the same application. Multiple instances of the same application are still different processes and hence will benefit from compute-exclusive mode…
