HDMI output flickers on the Nano demo kit when stress-testing the GPU

Hi,
We are running a CUDA load on the Jetson Nano demo kit and find that the HDMI output flickers.

Our test setup:
1. Install phoronix-test-suite_9.4.
2. Open a terminal and run “export TOTAL_LOOP_TIME=99999999999999999999”
3. Run “phoronix-test-suite stress-run pts/cuda-mini-nbody”
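
The steps above can be sketched as a small Python helper. This is only a sketch: the function name build_stress_run is made up for illustration, and it assumes phoronix-test-suite is on the PATH and the test profile is already installed.

```python
import os


def build_stress_run(test="pts/cuda-mini-nbody",
                     loop_minutes="99999999999999999999"):
    """Return (command, environment) for a phoronix-test-suite stress run.

    Hypothetical helper: it only prepares the call, it does not start it.
    """
    env = dict(os.environ)
    # stress-run reports TOTAL_LOOP_TIME as a duration in minutes.
    env["TOTAL_LOOP_TIME"] = str(loop_minutes)
    cmd = ["phoronix-test-suite", "stress-run", test]
    return cmd, env
```

The pair can be passed to subprocess.run(cmd, env=env). The test still prompts interactively for its options (for example "1,2"), so it needs a terminal.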

Bill

Hi Bill,

I followed your steps to run the stress test, but got the error below:

$ phoronix-test-suite stress-run pts/cuda-mini-nbody
STRESS-RUN ENVIRONMENT VARIABLES:
PTS_CONCURRENT_TEST_RUNS: Set the PTS_CONCURRENT_TEST_RUNS environment variable to specify how many tests should be run concurrently during the stress-run process. If not specified, defaults to 2.
TOTAL_LOOP_TIME set; running tests for 99999999999999999999 minutes
[PROBLEM] pts/cuda-mini-nbody-1.1.1 is not installed.

We don’t have experience using cuda-mini-nbody on Jetson platforms.
Please share how we can install it. Thanks!

Hi

$ phoronix-test-suite install pts/cuda-mini-nbody-1.1.1

works

Thanks HuiW.

Hi Bill,

After running the command below, I was able to install cuda-mini-nbody-1.1.1.

$ phoronix-test-suite install pts/cuda-mini-nbody-1.1.1

Then I ran:

$ phoronix-test-suite stress-run pts/cuda-mini-nbody
STRESS-RUN ENVIRONMENT VARIABLES:
PTS_CONCURRENT_TEST_RUNS: Set the PTS_CONCURRENT_TEST_RUNS environment variable to specify how many tests should be run concurrently during the stress-run process. If not specified, defaults to 2.
TOTAL_LOOP_TIME set; running tests for 99999999999999999999 minutes

CUDA Mini-Nbody 2015-11-10:
pts/cuda-mini-nbody-1.1.1
Graphics Test Configuration
1: Original
2: SOA Data Layout
3: Flush Denormals To Zero
4: Cache Blocking
5: Loop Unrolling
6: Test All Options
** Multiple items can be selected, delimit by a comma. **

I selected item 6 (Test All Options) and ran it for about 1 hour, but I did not see the HDMI flicker issue.
Did you enable max performance mode before running?
sudo nvpmodel -m 0 ; sudo jetson_clocks

Dear Carolyuu,

Select options “1,2” and let it run for 1~2 days; the HDMI output will then show the flicker.

Bill

Hi,

Please share your tegrastats when you run this test.
Also, which release are you using? Do you use sdcard image or sdkmanager?
Can you reproduce this issue on more than 2 jetson nano modules?

Dear WayneWWW,

Please share your tegrastats when you run this test.
--> Updated with the test result.
Also, which release are you using?
--> We use JetPack 4.3 (R32.3.1).
Do you use sdcard image or sdkmanager?
--> We tested the SD card version on the demo kit and also our own carrier board (with an eMMC Nano), and the same issue happens.
Can you reproduce this issue on more than 2 jetson nano modules?
--> We have tested three Nano modules and found the same issue.

Hi bill_tu,

We can see the black line in your screenshot, but could you share the tegrastats output as a file instead of a picture?
Please always share logs as text files instead of pictures unless you cannot share them as text.

Also, does this always need to run 1~2 days to see the issue?
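
For sharing tegrastats as text rather than a screenshot, the output can be written straight to a file. A minimal sketch, assuming the tegrastats build on this release accepts the --interval (milliseconds) and --logfile options; the helper names are made up for illustration.

```python
import subprocess


def tegrastats_log_command(logfile, interval_ms=1000):
    """Build a tegrastats command that writes samples to a text file.

    Assumes tegrastats supports --interval and --logfile on this release.
    """
    return ["tegrastats", "--interval", str(interval_ms), "--logfile", logfile]


def capture_tegrastats(logfile, seconds, interval_ms=1000):
    """Run tegrastats for `seconds`, then stop it; samples stay in `logfile`."""
    proc = subprocess.Popen(tegrastats_log_command(logfile, interval_ms))
    try:
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Normal path: tegrastats runs until it is told to stop.
        proc.terminate()
        proc.wait()
```

For example, capture_tegrastats("tegrastats.log", 3600) would collect one hour of samples that can be attached to a post as a plain text file.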

Hi Bill,

We left a Nano B01 with r32.3.1 running over the weekend (2 days), but still can’t reproduce the flicker issue.

tegrastats:
RAM 2099/3956MB (lfb 105x4MB) SWAP 0/1978MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU [35%@1479,100%@1479,0%@1479,0%@1479] EMC_FREQ 3%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34C CPU@39.5C PMIC@100C GPU@36C AO@42.5C thermal@38C POM_5V_IN 6239/6311 POM_5V_GPU 3298/3299 POM_5V_CPU 1033/1066

RAM 2099/3956MB (lfb 105x4MB) SWAP 0/1978MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU [1%@1479,100%@1479,0%@1479,0%@1479] EMC_FREQ 3%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34C CPU@39.5C PMIC@100C GPU@36C AO@42.5C thermal@37.75C POM_5V_IN 6279/6309 POM_5V_GPU 3298/3299 POM_5V_CPU 1033/1064
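
The tegrastats lines above can be reduced to the fields that matter for this issue (GPU load, temperatures, power). A minimal parsing sketch: the function name and dictionary keys are made up for illustration, and the units (MHz for the GR3D clock, mW as current/average for the POM_* rails) are assumptions based on the usual tegrastats format.

```python
import re


def parse_tegrastats(line):
    """Extract GPU load/clock, temperatures and power from one tegrastats line."""
    stats = {}
    # GPU field looks like "GR3D_FREQ 99%@921" (load percent @ clock).
    gpu = re.search(r"GR3D_FREQ (\d+)%@(\d+)", line)
    if gpu:
        stats["gpu_load_pct"] = int(gpu.group(1))
        stats["gpu_mhz"] = int(gpu.group(2))
    # Temperature fields look like "GPU@36C" or "thermal@37.75C".
    stats["temps_c"] = {
        name: float(value)
        for name, value in re.findall(r"(\w+)@(\d+(?:\.\d+)?)C\b", line)
    }
    # Power rails look like "POM_5V_IN 6239/6311" (assumed current/average).
    stats["power_mw"] = {
        rail: (int(cur), int(avg))
        for rail, cur, avg in re.findall(r"(POM_\w+) (\d+)/(\d+)", line)
    }
    return stats
```

Applied to every line of a saved log, this makes it easy to check whether the GPU load or a temperature changes around the time the flicker appears.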

I set max performance mode before running:
sudo nvpmodel -m 0 ; sudo jetson_clocks

Hi Carol,

This week we ran the CUDA load again on the Nano demo kit (SD card version).
The flicker happened and the system hung (e.g. keyboard and mouse not working). We had to unplug the power adapter and plug it back in to restart the system.

Our test configuration:
1. use max performance mode.
2. use 25 temp .
3. run CUDE loading.
4. use Nano demo kit (SD card version).

Bill

Hi,

  1. use 25 temp .
  2. run CUDE loading.

What are 25 temp and CUDE?

Dear Wayne,

"Use 25 temp" means room temperature (25 °C).
"Run CUDE loading" means installing phoronix-test-suite and keeping "phoronix-test-suite stress-run pts/cuda-mini-nbody" running.

Bill

Hi Bill,

Yes, that is how we test.
We cannot reproduce your case.

How many devices have you tried? Does this issue only happen on a specific Nano?
And please reply to my question above in #6.

Hi WayneWWW,

I have the same issue.
I uploaded the tegrastats and log information from my case (see attached).

The settings:
Nano B01 EVB
JetPack 4.3 (R32.3.1)
flashed with sdkmanager
tested on two Nanos

Thank you,

display_start_stats.log (87.4 KB) dispaly_error_3.log (128.4 KB)

Hi HuiW,
Thanks for sharing the log. We are trying to reproduce this issue again.

According to the video file in your mail, you are using the terminal on screen, while our test ran over remote access with ssh. The tegrastats lines showing up on your screen cause more GPU rendering in your case than in ours. I guess that is why we didn’t reproduce the error.

To state this bug more clearly: it is a rendering issue that occurs when the GPU is under a stress test (99%).

Hi HuiW,

This error log indicates the problem comes from the userspace tool rather than the nvgpu driver. Could you try another tool to run the GPU stress test?

  [ 9378.548115] nvgpu: 57000000.gpu gk20a_fifo_tsg_unbind_channel_verify_status:2200 [ERR]  Channel 491 to be removed from TSG 12 has NEXT set!
  [ 9378.560736] nvgpu: 57000000.gpu          gk20a_tsg_unbind_channel:164  [ERR]  Channel 491 unbind failed, tearing down TSG 12
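
When comparing long runs, error lines like the two above can be pulled out of a saved kernel log automatically. A minimal sketch, assuming the log keeps the same layout as the lines quoted here; the pattern and function name are illustrative, not part of any NVIDIA tool.

```python
import re

# Matches kernel log lines shaped like the ones quoted above:
# [ <time>] nvgpu: <device> <function>:<line> [ERR]  <message>
NVGPU_ERR = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+nvgpu:\s+\S+\s+"
    r"(?P<func>\w+):(?P<line>\d+)\s+\[ERR\]\s+(?P<msg>.*)"
)


def find_nvgpu_errors(log_text):
    """Return (timestamp, function, message) for each nvgpu [ERR] line."""
    errors = []
    for raw in log_text.splitlines():
        match = NVGPU_ERR.search(raw)
        if match:
            errors.append((
                float(match.group("time")),
                match.group("func"),
                match.group("msg").strip(),
            ))
    return errors
```

Given the text of a saved dmesg capture, the returned timestamps (seconds since boot) can be lined up against the time the flicker was first seen.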

Dear Wayne,

We tried running a CPU load and the same issue happens.
CPU load commands:
1. export TOTAL_LOOP_TIME=99999999999999999999
2. phoronix-test-suite stress-run pts/stress-ng

Bill

Our team is still checking the code of phoronix. Will update later.

Hi WayneWWW,

Thank you for your great support.
Is there any update from your team?

Thanks

Hi HuiW,

Sorry, we have some experimental patches but have not yet fixed this issue.