Fq2bam on GCP

Hi! I am having an issue running fq2bam on GCP using Nextflow, and the error message was not clear.

process PARABRICKS_FQ2BAM {
    tag "$meta.id"
    accelerator 4, type: 'nvidia-tesla-t4'
    cpus 32
    memory '196 GB'
    machineType 'n1-standard-32'
    disk 375.GB, type: 'local-ssd'

    container "nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1"

    shell = ['/bin/bash', '-euo', 'pipefail']

    input:
    tuple val(meta), path(R1_reads), path(R2_reads)
    tuple val(meta2), path(index_files)
    tuple val(meta3), path(fasta), path(fasta_fai), path(ref_dict)

    output:
    tuple val(meta), path("*.bam")               , emit: bam
    tuple val(meta), path("*.bai")                  , emit: bai
    path("*_duplicate_metrics.txt")            , emit: duplicate_metrics

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = meta.prefix ? "${meta.prefix}" : "${meta.id}"

    """
    # Build one --in-fq argument per lane, deriving the read group from the FASTQ file name
    in_fq_command=""
    for fq in `ls $R1_reads`
    do
        filename=\$(basename \$fq _R1.fq.gz)
        # The read group ID/PU is the last two underscore-separated fields (flowcell_lane)
        read_group=\$(echo \$filename | awk -F'_' '{print \$(NF-1) "_" \$NF}')
        sample_name=${prefix}
        read_group_string="@RG\\tID:\${read_group}\\tLB:\${sample_name}\\tPL:Illumina\\tSM:\${sample_name}\\tPU:\${read_group}"
        in_fq_command+="--in-fq \$fq \${filename}_R2.fq.gz \"\${read_group_string}\" "
    done

    nvidia-smi

    pbrun \\
        fq2bam \\
        --tmp-dir /tmp \\
        --num-gpus $task.accelerator.request \\
        --ref $fasta \\
        --bwa-options="-Y -K 100000000" \\
        --fix-mate \\
        --optical-duplicate-pixel-distance 2500 \\
        --out-duplicate-metrics ${prefix}_duplicate_metrics.txt \\
        --out-bam ${prefix}.bam \\
        \${in_fq_command} \\
        --monitor-usage \\
        --low-memory
"""

The log is below:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:05.0 Off |                    0 |
| N/A   62C    P8             11W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:00:06.0 Off |                    0 |
| N/A   70C    P8             12W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla T4                       Off |   00000000:00:07.0 Off |                    0 |
| N/A   71C    P8             11W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla T4                       Off |   00000000:00:08.0 Off |                    0 |
| N/A   70C    P8             12W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

[PB Info 2024-Dec-09 21:02:05] ------------------------------------------------------------------------------
[PB Info 2024-Dec-09 21:02:05] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2024-Dec-09 21:02:05] ||                              Version 4.4.0-1                             ||
[PB Info 2024-Dec-09 21:02:05] ||                      GPU-PBBWA mem, Sorting Phase-I                      ||
[PB Info 2024-Dec-09 21:02:05] ------------------------------------------------------------------------------
[PB Info 2024-Dec-09 21:02:05] Mode = pair-ended-gpu
[PB Info 2024-Dec-09 21:02:05] Running with 4 GPU(s), using 1 stream(s) per device with 16 worker threads per GPU
[PB Info 2024-Dec-09 21:02:15] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0 CPUMem%: 4.4 CPUUsage%: 9.3
[PB Info 2024-Dec-09 21:02:25] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0 CPUMem%: 4.4 CPUUsage%: 10.2
[PB Info 2024-Dec-09 21:02:35] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0 CPUMem%: 4.4 CPUUsage%: 9.7
[PB Info 2024-Dec-09 21:02:45] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0 CPUMem%: 4.4 CPUUsage%: 9.4
[PB Info 2024-Dec-09 21:02:55] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0 CPUMem%: 4.5 CPUUsage%: 13.0
[PB Info 2024-Dec-09 21:03:05] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0 CPUMem%: 4.5 CPUUsage%: 12.4
[PB Info 2024-Dec-09 21:03:15] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0 CPUMem%: 4.5 CPUUsage%: 12.1
[PB Info 2024-Dec-09 21:03:25] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0 CPUMem%: 4.6 CPUUsage%: 11.7
[PB Info 2024-Dec-09 21:03:29] Read 3171 alt contigs
[PB Info 2024-Dec-09 21:03:35] #  9  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0 CPUMem%: 10.5 CPUUsage%: 26.2
[PB Info 2024-Dec-09 21:03:45] # 32  0  2  0  0   0 pool:  0 0 bases/GPU/minute: 0.0 CPUMem%: 20.1 CPUUsage%: 58.1
[PB Info 2024-Dec-09 21:03:55] # 48  0  2  0  0   0 pool:  0 341374699 bases/GPU/minute: 512062048.5 CPUMem%: 29.7 CPUUsage%: 87.0
[PB Info 2024-Dec-09 21:04:05] # 68  0  2  0  0   0 pool:  0 675766023 bases/GPU/minute: 501586986.0 CPUMem%: 39.7 CPUUsage%: 88.9
[PB Info 2024-Dec-09 21:04:15] # 88  0  1  0  0   0 pool:  0 1030785278 bases/GPU/minute: 532528882.5 CPUMem%: 49.2 CPUUsage%: 90.1
[PB Info 2024-Dec-09 21:04:25] # 100  0  2  0  0   0 pool:  3 1434780644 bases/GPU/minute: 605993049.0 CPUMem%: 58.2 CPUUsage%: 95.0
[PB Info 2024-Dec-09 21:04:35] # 100  0  2  0  0   0 pool:  3 1828946899 bases/GPU/minute: 591249382.5 CPUMem%: 63.2 CPUUsage%: 91.1
[PB Info 2024-Dec-09 21:04:45] # 100  0  2  0  0   0 pool:  1 2109108972 bases/GPU/minute: 420243109.5 CPUMem%: 68.3 CPUUsage%: 83.3
[PB Info 2024-Dec-09 21:04:55] # 100  0  2  0  0   0 pool:  4 2583584845 bases/GPU/minute: 711713809.5 CPUMem%: 72.2 CPUUsage%: 98.5
[PB Info 2024-Dec-09 21:05:05] # 100  0  2  0  0   0 pool:  2 3007016021 bases/GPU/minute: 635146764.0 CPUMem%: 75.6 CPUUsage%: 91.0
[PB Info 2024-Dec-09 21:05:15] # 100  0  2  0  0   0 pool:  1 3460339491 bases/GPU/minute: 679985205.0 CPUMem%: 78.2 CPUUsage%: 91.4
[PB Info 2024-Dec-09 21:05:25] # 100  0  2  0  0   0 pool:  0 3943597657 bases/GPU/minute: 724887249.0 CPUMem%: 79.8 CPUUsage%: 93.0
[PB Info 2024-Dec-09 21:05:35] # 100  0  2  0  0   0 pool:  0 4406859521 bases/GPU/minute: 694892796.0 CPUMem%: 81.6 CPUUsage%: 93.0
[PB Info 2024-Dec-09 21:05:45] # 100  0  2  0  0   0 pool:  2 4902733488 bases/GPU/minute: 743810950.5 CPUMem%: 83.9 CPUUsage%: 92.9
[PB Info 2024-Dec-09 21:05:55] # 100  0  0  0  0   0 pool:  2 5346113478 bases/GPU/minute: 665069985.0 CPUMem%: 85.6 CPUUsage%: 92.5
[PB Info 2024-Dec-09 21:06:05] # 100  0  2  0  0   0 pool:  4 5821960589 bases/GPU/minute: 713770666.5 CPUMem%: 86.8 CPUUsage%: 91.3
[PB Info 2024-Dec-09 21:06:15] # 100  0  2  0  0   0 pool:  2 6255280423 bases/GPU/minute: 649979751.0 CPUMem%: 88.5 CPUUsage%: 93.3
[PB Info 2024-Dec-09 21:06:25] # 100  0  1  0  0   0 pool:  2 6728588316 bases/GPU/minute: 709961839.5 CPUMem%: 90.1 CPUUsage%: 95.4
[PB Info 2024-Dec-09 21:06:35] # 100  0  2  0  0   0 pool:  2 7193228388 bases/GPU/minute: 696960108.0 CPUMem%: 91.9 CPUUsage%: 94.7
[PB Info 2024-Dec-09 21:06:45] # 100  0  2  0  0   0 pool:  2 7687705002 bases/GPU/minute: 741714921.0 CPUMem%: 93.8 CPUUsage%: 95.9
[PB Info 2024-Dec-09 21:06:55] # 100  0  2  0  0   0 pool:  2 8160975405 bases/GPU/minute: 709905604.5 CPUMem%: 95.6 CPUUsage%: 95.2
[PB Info 2024-Dec-09 21:07:05] # 100  0  2  0  0   0 pool:  3 8625665750 bases/GPU/minute: 697035517.5 CPUMem%: 96.8 CPUUsage%: 93.7
[PB Info 2024-Dec-09 21:07:15] # 100  0  2  0  0   0 pool:  3 9090324929 bases/GPU/minute: 696988768.5 CPUMem%: 98.1 CPUUsage%: 95.2
[PB Info 2024-Dec-09 21:07:25] # 100  0  2  0  0   0 pool:  2 9543563826 bases/GPU/minute: 679858345.5 CPUMem%: 99.0 CPUUsage%: 95.2
[PB Info 2024-Dec-09 21:07:36] # 100  0  2  0  0   0 pool:  3 9715519477 bases/GPU/minute: 257933476.5 CPUMem%: 99.3 CPUUsage%: 95.1
[PB Info 2024-Dec-09 21:07:47] # 100  0  1  0  0   0 pool:  3 9796550219 bases/GPU/minute: 121546113.0 CPUMem%: 99.5 CPUUsage%: 94.3
[PB Info 2024-Dec-09 21:07:59] # 100  0  2  0  0   0 pool:  4 9857649015 bases/GPU/minute: 91648194.0 CPUMem%: 99.6 CPUUsage%: 97.3
[PB Info 2024-Dec-09 21:08:17] # 100  0  2  0  0   0 pool:  4 9857649015 bases/GPU/minute: 0.0 CPUMem%: 99.8 CPUUsage%: 97.9
[PB Info 2024-Dec-09 21:08:32] # 100  0  2  0  0   0 pool:  3 9857649015 bases/GPU/minute: 0.0 CPUMem%: 100.0 CPUUsage%: 98.8

[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Read group created for /fusion/gs/fq2bam-test/fqs/FDSANTIAGO_000032_22F3KCLT4_5_R1.fq.gz and
/fusion/gs/fq2bam-test/fqs/FDSANTIAGO_000032_22F3KCLT4_5_R2.fq.gz
[Parabricks Options Mesg]: @RG\tID:22F3KCLT4_5\tLB:FDSANTIAGO_000032\tPL:Illumina\tSM:FDSANTIAGO_000032\tPU:22F3KCLT4_5
[Parabricks Options Mesg]: Read group created for /fusion/gs/fq2bam-test/fqs/FDSANTIAGO_000032_22F3KCLT4_6_R1.fq.gz and
/fusion/gs/fq2bam-test/fqs/FDSANTIAGO_000032_22F3KCLT4_6_R2.fq.gz
[Parabricks Options Mesg]: @RG\tID:22F3KCLT4_6\tLB:FDSANTIAGO_000032\tPL:Illumina\tSM:FDSANTIAGO_000032\tPU:22F3KCLT4_6
[Parabricks Options Mesg]: Read group created for /fusion/gs/fq2bam-test/fqs/FDSANTIAGO_000032_22F52VLT4_2_R1.fq.gz and
/fusion/gs/fq2bam-test/fqs/FDSANTIAGO_000032_22F52VLT4_2_R2.fq.gz
[Parabricks Options Mesg]: @RG\tID:22F52VLT4_2\tLB:FDSANTIAGO_000032\tPL:Illumina\tSM:FDSANTIAGO_000032\tPU:22F52VLT4_2
[Parabricks Options Mesg]: Read group created for /fusion/gs/fq2bam-test/fqs/FDSANTIAGO_000032_22GCWHLT4_5_R1.fq.gz and
/fusion/gs/fq2bam-test/fqs/FDSANTIAGO_000032_22GCWHLT4_5_R2.fq.gz
[Parabricks Options Mesg]: @RG\tID:22GCWHLT4_5\tLB:FDSANTIAGO_000032\tPL:Illumina\tSM:FDSANTIAGO_000032\tPU:22GCWHLT4_5
[Parabricks Options Mesg]: Read group created for /fusion/gs/fq2bam-test/fqs/FDSANTIAGO_000032_22GF53LT4_5_R1.fq.gz and
/fusion/gs/fq2bam-test/fqs/FDSANTIAGO_000032_22GF53LT4_5_R2.fq.gz
[Parabricks Options Mesg]: @RG\tID:22GF53LT4_5\tLB:FDSANTIAGO_000032\tPL:Illumina\tSM:FDSANTIAGO_000032\tPU:22GF53LT4_5
[Parabricks Options Mesg]: Using --low-memory sets the number of streams in bwa mem to 1.
For technical support visit https://docs.nvidia.com/clara/index.html#parabricks
Exiting...
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation



Could not run fq2bam
Exiting pbrun ...

I am not sure what the issue is, as the error message did not provide any details other than exiting.
Can you advise on what the issue could be?

Hello @zih-hua.fang, based on your log, at 2024-Dec-09 21:08:32 you can see that CPU memory usage reaches 100% (CPUMem%: 100.0). When that happens, the OS kills the process and we then print the Exiting... message. Is it possible to run on a system with more host memory?
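
On GCP, one way to do that (a sketch only; adapt the machine type to your project and quotas) is to move the process to a high-memory machine type in the Nextflow directives, for example:

    accelerator 4, type: 'nvidia-tesla-t4'
    cpus 32
    memory '208 GB'                     // n1-highmem-32 provides 208 GB of host RAM
    machineType 'n1-highmem-32'
    disk 375.GB, type: 'local-ssd'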

Hi @dpuleri,

Thanks a lot for the reply.
I am surprised that fq2bam used that much CPU memory.
From the website, I assumed that 196 GB would be enough…

  • A 4 GPU system should have at least 196GB CPU RAM and at least 32 CPU threads.

Also, per this WDL, 180 GB was enough for clara-parabricks:4.3.0-1. Does clara-parabricks:4.4.0-1 require substantially more CPU memory?

Increasing CPU memory is not a problem since I run on GCP.
Is there any benchmarking of the computational resources required for Illumina 30X WGS data?

Thanks!

Hi @zih-hua.fang, 196 GB is the minimum amount of memory that we recommend for 4 GPU systems, but the amount of memory used can depend on the input.

Also, I see that for the first minute or so of your run, data is still being loaded. Are you using an SSD for storage? All parts of the system can affect overall speed, so if we are waiting for data to be read or written, the internal queues can fill up and increase memory usage.
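
If it helps to pin down the actual usage, one minimal sketch (the field list here is just an example) is to enable Nextflow's trace report and check peak_rss for the PARABRICKS_FQ2BAM task:

    trace {
        enabled = true
        file    = 'pipeline_trace.txt'
        fields  = 'task_id,name,status,exit,realtime,%cpu,peak_rss,peak_vmem,rchar,wchar'
    }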

Hi @dpuleri ,

I just tested and was able to run on clara-parabricks:4.3.0-1 with the same computational resources that clara-parabricks:4.4.0-1 failed on:

    accelerator 4, type: 'nvidia-tesla-t4'
    cpus 32
    memory '196 GB'
    machineType 'n1-standard-32'
    disk 375.GB, type: 'local-ssd'

I used the Fusion file system to read GCP storage objects, and my VM had one NVMe local SSD (375 GB) attached. I assume this follows the performance suggestions?
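
For context, the Fusion setup I am referring to is roughly the following in nextflow.config (a sketch; the bucket name is a placeholder, and Fusion requires Wave to be enabled):

    workDir = 'gs://my-bucket/work'

    wave {
        enabled = true
    }

    fusion {
        enabled = true
    }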

@zih-hua.fang, yes, v4.3.0 generally used less memory. The new version of fq2bam introduced in v4.3.1 uses slightly more memory for better performance, especially on Ampere and later GPUs.

I used the Fusion file system to read GCP storage objects

I looked into it, and it seems the short period at the start where file I/O takes longer is likely due to pulling the files from object storage to the local SSD. That should be fine.

@dpuleri, so for people who are running on T4s, would you recommend staying with v4.3.0?

@zih-hua.fang, I would not recommend going back to an older version, but if you cannot provision a system with more memory then that is a viable workaround.

The newer version has more features and will continue to be improved in the future. Thank you for highlighting the differences in host memory use for your samples and use-case. We will work on improvements in upcoming releases.
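
If staying on the older release as a stopgap, a minimal sketch of pinning only this module in nextflow.config (assuming the process name shown above) would be:

    process {
        withName: 'PARABRICKS_FQ2BAM' {
            container = 'nvcr.io/nvidia/clara/clara-parabricks:4.3.0-1'
        }
    }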

@dpuleri, thank you for the suggestion. Since I run the analyses on the cloud, I will compare the two versions and their associated costs and pick the most cost-effective option.

For the most cost-effective runs, customers have typically found L4 nodes to be best.

See the note in our FAQ here: NVIDIA Clara for Genomics.
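
As a rough sketch of what an L4 request could look like on GCP (the G2 machine type, its CPU/RAM counts, and the accelerator string are assumptions to verify against the GCP and Nextflow docs; note that 192 GB is slightly below the 196 GB guidance above):

    accelerator 4, type: 'nvidia-l4'
    cpus 48
    memory '192 GB'
    machineType 'g2-standard-48'        // on G2 machine types the L4 GPUs come bundled
    disk 375.GB, type: 'local-ssd'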

Thanks! I want to report that I successfully ran clara-parabricks:4.4.0-1 with the following settings on GCP:

    accelerator 4, type: 'nvidia-tesla-t4'
    cpus 32
    memory '208 GB'
    machineType 'n1-highmem-32'
    disk 375.GB, type: 'local-ssd'

I will try L4 nodes next.
