Hi, I can't get the PTX code of cublas64_50_35.dll with the cuobjdump tool. My command is cuobjdump.exe -ptx cublas64_50_35.dll. I am a student and want to study all the CUBLAS functions in the CUDA 5.0 release. Can anyone help me? Thank you very much.
My OS is Windows 7 and my GPU is a GTX 640.
That is odd! I was able to get it. Are your machine and OS 64-bit? Can you get PTX from vecAdd, one of the CUDA samples?
Yes, my machine and OS are 64-bit. I can get PTX from vecAdd and from cublas32_50_35.dll, but not from cublas64_50_35.dll.
I installed the CUDA 5.0 release package named ‘cuda_5.0.35_winvista_win7_win8_general_64’.
Could you paste your cuobjdump output for the 64-bit CUBLAS DLL? That may help us.
Hi, the output of cuobjdump on cublas64_50_35.dll is below. There is some sm_35 PTX code, but none for sm_30. As you know, there are many kernels in cublas64_50_35.dll.
Fatbin elf code:
================
arch = sm_10
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu
Fatbin elf code:
================
arch = sm_13
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu
Fatbin elf code:
================
arch = sm_20
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu
Fatbin elf code:
================
arch = sm_30
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu
Fatbin elf code:
================
arch = sm_35
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu
Fatbin ptx code:
================
arch = sm_35
code version = [3,1]
producer = cuda
host = windows
compile_size = 64bit
compressed
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu
ptxasOptions =
.version 3.1
.target sm_35
.address_size 64
.file 1 "C:/DOCUME~1/BUILDM~1.COM/LOCALS~1/Temp/tmpxft_000013e4_00000000-39_cublas_v1.compute_35.cpp3.i"
.file 2 "d:\bld\rel\gpgpu\toolkit\r5.0\cuda\tools\cudart\../cnprt/cuda_device_runtime_api.h"
.weak .func (.param .b32 func_retval0) cudaMalloc(
.param .b64 cudaMalloc_param_0,
.param .b64 cudaMalloc_param_1
)
{
.reg .s32 %r<2>;
mov.u32 %r1, 30;
st.param.b32 [func_retval0+0], %r1;
.loc 2 66 3
ret;
}
.weak .func (.param .b32 func_retval0) cudaFuncGetAttributes(
.param .b64 cudaFuncGetAttributes_param_0,
.param .b64 cudaFuncGetAttributes_param_1
)
{
.reg .s32 %r<2>;
mov.u32 %r1, 30;
st.param.b32 [func_retval0+0], %r1;
.loc 2 71 3
ret;
}
Fatbin elf code:
================
arch = sm_10
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu
Fatbin elf code:
================
arch = sm_13
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu
Fatbin elf code:
================
arch = sm_20
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu
Fatbin elf code:
================
arch = sm_30
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu
Fatbin elf code:
================
arch = sm_35
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu
Fatbin ptx code:
================
arch = sm_35
code version = [3,1]
producer = cuda
host = windows
compile_size = 64bit
compressed
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu
ptxasOptions =
.version 3.1
.target sm_35
.address_size 64
.file 1 "C:/DOCUME~1/BUILDM~1.COM/LOCALS~1/Temp/tmpxft_00000d08_00000000-39_cublas.compute_35.cpp3.i"
.file 2 "d:\bld\rel\gpgpu\toolkit\r5.0\cuda\tools\cudart\../cnprt/cuda_device_runtime_api.h"
.weak .func (.param .b32 func_retval0) cudaMalloc(
.param .b64 cudaMalloc_param_0,
.param .b64 cudaMalloc_param_1
)
{
.reg .s32 %r<2>;
mov.u32 %r1, 30;
st.param.b32 [func_retval0+0], %r1;
.loc 2 66 3
ret;
}
.weak .func (.param .b32 func_retval0) cudaFuncGetAttributes(
.param .b64 cudaFuncGetAttributes_param_0,
.param .b64 cudaFuncGetAttributes_param_1
)
{
.reg .s32 %r<2>;
mov.u32 %r1, 30;
st.param.b32 [func_retval0+0], %r1;
.loc 2 71 3
ret;
}
Fatbin elf code:
================
arch = sm_10
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu
Fatbin elf code:
================
arch = sm_13
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu
Fatbin elf code:
================
arch = sm_20
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu
Fatbin elf code:
================
arch = sm_30
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu
Fatbin elf code:
================
arch = sm_35
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu
Fatbin ptx code:
================
arch = sm_35
code version = [3,1]
producer = cuda
host = windows
compile_size = 64bit
compressed
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu
ptxasOptions =
.version 3.1
.target sm_35
.address_size 64
.file 1 "C:/DOCUME~1/BUILDM~1.COM/LOCALS~1/Temp/tmpxft_000000a8_00000000-39_xerbla.compute_35.cpp3.i"
.file 2 "d:\bld\rel\gpgpu\toolkit\r5.0\cuda\tools\cudart\../cnprt/cuda_device_runtime_api.h"
.weak .func (.param .b32 func_retval0) cudaMalloc(
.param .b64 cudaMalloc_param_0,
.param .b64 cudaMalloc_param_1
)
{
.reg .s32 %r<2>;
mov.u32 %r1, 30;
st.param.b32 [func_retval0+0], %r1;
.loc 2 66 3
ret;
}
.weak .func (.param .b32 func_retval0) cudaFuncGetAttributes(
.param .b64 cudaFuncGetAttributes_param_0,
.param .b64 cudaFuncGetAttributes_param_1
)
{
.reg .s32 %r<2>;
mov.u32 %r1, 30;
st.param.b32 [func_retval0+0], %r1;
.loc 2 71 3
ret;
}
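A listing this long is easier to read if the section headers are tallied per source file. The following is a minimal Python sketch, assuming only the header format visible in the dump above (a "Fatbin elf code:" or "Fatbin ptx code:" line followed by "key = value" lines such as arch and identifier). summarize_fatbin is a hypothetical helper written for this post, not part of the CUDA toolkit.

```python
import re
from collections import defaultdict

# Section header lines as printed by cuobjdump in the dump above.
HEADER = re.compile(r"^Fatbin (elf|ptx) code:\s*$")
# "key = value" metadata lines; the value may be empty ("ptxasOptions =").
FIELD = re.compile(r"^(\w+)\s*=\s*(.*)$")


def summarize_fatbin(listing: str) -> dict:
    """Map each source identifier to {"elf": [arches], "ptx": [arches]}.

    Assumes every section starts with a 'Fatbin elf code:' or
    'Fatbin ptx code:' header and carries 'arch' and 'identifier' fields.
    """
    summary = defaultdict(lambda: {"elf": [], "ptx": []})
    kind = None      # "elf" or "ptx" for the section being read
    fields = {}      # metadata collected for the current section

    def flush():
        # Record the current section once it has both required fields.
        if kind and "arch" in fields and "identifier" in fields:
            summary[fields["identifier"]][kind].append(fields["arch"])

    for raw in listing.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            flush()
            kind, fields = header.group(1), {}
            continue
        field = FIELD.match(line)
        # Keep the first value seen for each key; PTX body lines never match.
        if kind and field and field.group(1) not in fields:
            fields[field.group(1)] = field.group(2).strip()
    flush()
    return dict(summary)


sample = """Fatbin elf code:
================
arch = sm_30
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu
Fatbin ptx code:
================
arch = sm_35
compressed
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu
ptxasOptions =
.version 3.1
"""
print(summarize_fatbin(sample))
# {'D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu': {'elf': ['sm_30'], 'ptx': ['sm_35']}}
```

Fed the full dump above, it should report ELF for sm_10, sm_13, sm_20, sm_30 and sm_35 but PTX only for sm_35, for each of cublas_v1.cu, cublas.cu and xerbla.cu.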
There is no PTX code for the sgemm and dgemm kernels.
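One way to double-check this is to scan the PTX text for declarations: in PTX, kernels are declared with .entry and device functions with .func (optionally preceded by a return-parameter list, as in the stubs above). This is a rough Python sketch under that assumption; ptx_declarations and find_kernels are hypothetical helpers, and searching for "gemm" assumes the (possibly mangled) kernel names contain that substring.

```python
import re

# ".entry NAME(" for kernels, ".func [(.param ...)] NAME(" for device functions.
DECL = re.compile(r"\.(entry|func)\s+(?:\([^)]*\)\s*)?([A-Za-z_$%][\w$]*)")


def ptx_declarations(ptx: str) -> list:
    """Return (kind, name) pairs for every .entry/.func declared in the PTX text."""
    return [(m.group(1), m.group(2)) for m in DECL.finditer(ptx)]


def find_kernels(ptx: str, needle: str) -> list:
    """Names of .entry kernels whose name contains needle (case-insensitive)."""
    needle = needle.lower()
    return [name for kind, name in ptx_declarations(ptx)
            if kind == "entry" and needle in name.lower()]


stub = """.weak .func (.param .b32 func_retval0) cudaMalloc(
.param .b64 cudaMalloc_param_0,
.param .b64 cudaMalloc_param_1
)
"""
print(ptx_declarations(stub))      # [('func', 'cudaMalloc')]
print(find_kernels(stub, "gemm"))  # []
```

On the PTX sections in the dump above, this finds only the two .func stubs (cudaMalloc and cudaFuncGetAttributes) per file and no .entry at all, which matches the observation that no gemm kernel PTX is present.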
Maybe something is wrong here. Would you be willing to file a bug? Here are the steps:
- If you are not registered, please register first, then apply for the “CUDA/GPU Computing Registered Developer Program” under registered developer programs;
- Open the page https://developer.nvidia.com/rdp/bugs/cudagpu-bug-reporting;
- On the bug report page, fill in the required items. Other items are optional, but detailed information will help us a lot in targeting and fixing the issue;
- If necessary, upload an attachment;
- On Linux systems, it is better to attach an nvidia-bug-report;
- If the issue is related to a specific code pattern, please provide sample code and instructions for compiling it so we can reproduce it.
If you have problems filing a bug, or want me to file it for you, please feel free to let me know.