Cannot get the PTX code of cublas64_50_35.dll in the CUDA 5.0 release

Hi, I can’t get the PTX code of cublas64_50_35.dll with the cuobjdump tool. My command is cuobjdump.exe -ptx cublas64_50_35.dll. I am a student and want to study all the cuBLAS functions in the CUDA 5.0 release. Can anyone help me? Thank you very much.

My OS is Windows 7. My GPU is a GTX640.

That is odd! I could get it. Are your machine and OS 64-bit? Can you get PTX from vecAdd, one of the CUDA samples?

Yes, my machine and OS are 64-bit. I can get PTX from vecAdd and from cublas32_50_35.dll, but not from cublas64_50_35.dll.

I installed the CUDA 5.0 release from the package named ‘cuda_5.0.35_winvista_win7_win8_general_64’.

Could you paste the cuobjdump output for the 64-bit cuBLAS DLL? That may help us.

Hi, the output of cuobjdump on cublas64_50_35.dll is as follows. There is some PTX code for sm_35 but none for sm_30. As you know, there are many kernels in cublas64_50_35.dll.

Fatbin elf code:
================
arch = sm_10
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu

Fatbin elf code:
================
arch = sm_13
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu

Fatbin elf code:
================
arch = sm_20
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu

Fatbin elf code:
================
arch = sm_30
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu

Fatbin elf code:
================
arch = sm_35
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu

Fatbin ptx code:
================
arch = sm_35
code version = [3,1]
producer = cuda
host = windows
compile_size = 64bit
compressed
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas_v1.cu
ptxasOptions = 

.version 3.1
.target sm_35
.address_size 64

.file	1 "C:/DOCUME~1/BUILDM~1.COM/LOCALS~1/Temp/tmpxft_000013e4_00000000-39_cublas_v1.compute_35.cpp3.i"
.file	2 "d:\bld\rel\gpgpu\toolkit\r5.0\cuda\tools\cudart\../cnprt/cuda_device_runtime_api.h"

.weak .func (.param .b32 func_retval0) cudaMalloc(
.param .b64 cudaMalloc_param_0,
.param .b64 cudaMalloc_param_1
)
{
.reg .s32 %r<2>;

mov.u32 %r1, 30;
st.param.b32	[func_retval0+0], %r1;
.loc 2 66 3
ret;
}

.weak .func (.param .b32 func_retval0) cudaFuncGetAttributes(
.param .b64 cudaFuncGetAttributes_param_0,
.param .b64 cudaFuncGetAttributes_param_1
)
{
.reg .s32 %r<2>;

mov.u32 %r1, 30;
st.param.b32	[func_retval0+0], %r1;
.loc 2 71 3
ret;
}

Fatbin elf code:
================
arch = sm_10
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu

Fatbin elf code:
================
arch = sm_13
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu

Fatbin elf code:
================
arch = sm_20
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu

Fatbin elf code:
================
arch = sm_30
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu

Fatbin elf code:
================
arch = sm_35
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu

Fatbin ptx code:
================
arch = sm_35
code version = [3,1]
producer = cuda
host = windows
compile_size = 64bit
compressed
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu
ptxasOptions = 

.version 3.1
.target sm_35
.address_size 64

.file	1 "C:/DOCUME~1/BUILDM~1.COM/LOCALS~1/Temp/tmpxft_00000d08_00000000-39_cublas.compute_35.cpp3.i"
.file	2 "d:\bld\rel\gpgpu\toolkit\r5.0\cuda\tools\cudart\../cnprt/cuda_device_runtime_api.h"

.weak .func (.param .b32 func_retval0) cudaMalloc(
.param .b64 cudaMalloc_param_0,
.param .b64 cudaMalloc_param_1
)
{
.reg .s32 %r<2>;

mov.u32 %r1, 30;
st.param.b32	[func_retval0+0], %r1;
.loc 2 66 3
ret;
}

.weak .func (.param .b32 func_retval0) cudaFuncGetAttributes(
.param .b64 cudaFuncGetAttributes_param_0,
.param .b64 cudaFuncGetAttributes_param_1
)
{
.reg .s32 %r<2>;

mov.u32 %r1, 30;
st.param.b32	[func_retval0+0], %r1;
.loc 2 71 3
ret;
}

Fatbin elf code:
================
arch = sm_10
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu

Fatbin elf code:
================
arch = sm_13
code version = [1,2]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu

Fatbin elf code:
================
arch = sm_20
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu

Fatbin elf code:
================
arch = sm_30
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu

Fatbin elf code:
================
arch = sm_35
code version = [1,6]
producer = cuda
host = windows
compile_size = 64bit
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu

Fatbin ptx code:
================
arch = sm_35
code version = [3,1]
producer = cuda
host = windows
compile_size = 64bit
compressed
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/xerbla.cu
ptxasOptions = 

.version 3.1
.target sm_35
.address_size 64

.file	1 "C:/DOCUME~1/BUILDM~1.COM/LOCALS~1/Temp/tmpxft_000000a8_00000000-39_xerbla.compute_35.cpp3.i"
.file	2 "d:\bld\rel\gpgpu\toolkit\r5.0\cuda\tools\cudart\../cnprt/cuda_device_runtime_api.h"

.weak .func (.param .b32 func_retval0) cudaMalloc(
.param .b64 cudaMalloc_param_0,
.param .b64 cudaMalloc_param_1
)
{
.reg .s32 %r<2>;

mov.u32 %r1, 30;
st.param.b32	[func_retval0+0], %r1;
.loc 2 66 3
ret;
}

.weak .func (.param .b32 func_retval0) cudaFuncGetAttributes(
.param .b64 cudaFuncGetAttributes_param_0,
.param .b64 cudaFuncGetAttributes_param_1
)
{
.reg .s32 %r<2>;

mov.u32 %r1, 30;
st.param.b32	[func_retval0+0], %r1;
.loc 2 71 3
ret;
}

There is no PTX code for the SGEMM and DGEMM kernels.
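
A dump this long is hard to check by eye, so here is a rough Python sketch that summarizes it: for each source identifier it lists which architectures have ELF images and which have PTX, and it collects the symbols the PTX defines (.entry for kernels, .func for device functions). It is not part of the toolkit. It assumes the text layout shown above, and the names summarize and SAMPLE are made up for illustration.

```python
import re
from collections import defaultdict

# Section headers and fields as they appear in the cuobjdump text above.
SECTION_RE = re.compile(r"^Fatbin (elf|ptx) code:\s*$")
FIELD_RE = re.compile(r"^(arch|identifier)\s*=\s*(.*?)\s*$")
# PTX symbol definitions: kernels use .entry, device functions use .func.
ENTRY_RE = re.compile(r"\.entry\s+([A-Za-z_$][\w$]*)")
FUNC_RE = re.compile(r"\.func\s*(?:\([^)]*\)\s*)?([A-Za-z_$][\w$]*)")


def summarize(dump_text):
    """Return (images, kernels, funcs) for a cuobjdump text dump.

    images maps each identifier to {"elf": [archs], "ptx": [archs]}.
    """
    images = defaultdict(lambda: {"elf": [], "ptx": []})
    kernels, funcs = [], []
    kind = arch = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        m = SECTION_RE.match(line)
        if m:
            kind, arch = m.group(1), None
            continue
        m = FIELD_RE.match(line)
        if m and kind:
            key, value = m.groups()
            if key == "arch":
                arch = value
            elif arch is not None:
                # "identifier" follows "arch" inside each section header.
                images[value][kind].append(arch)
                arch = None
            continue
        kernels += ENTRY_RE.findall(line)
        funcs += FUNC_RE.findall(line)
    return dict(images), kernels, funcs


# Shortened excerpt in the same layout as the output pasted above.
SAMPLE = """\
Fatbin elf code:
================
arch = sm_30
code version = [1,6]
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu

Fatbin ptx code:
================
arch = sm_35
code version = [3,1]
identifier = D:/Bld/rel/gpgpu/toolkit/r5.0/cublas/src/cublas.cu
ptxasOptions =

.version 3.1
.target sm_35
.weak .func (.param .b32 func_retval0) cudaMalloc(
.param .b64 cudaMalloc_param_0,
.param .b64 cudaMalloc_param_1
)
"""

if __name__ == "__main__":
    images, kernels, funcs = summarize(SAMPLE)
    for ident, kinds in images.items():
        print(ident, "elf:", kinds["elf"], "ptx:", kinds["ptx"])
    print("kernels (.entry):", kernels)
    print("device functions (.func):", funcs)
```

To run it on the real output, redirect the dump to a file (for example cuobjdump.exe -ptx cublas64_50_35.dll > dump.txt) and pass the file contents to summarize. For the dump above, I would expect the kernel list to be empty, since the PTX shown defines only the cudaMalloc and cudaFuncGetAttributes stubs.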

Maybe there is something wrong here. Are you willing to file a bug? Here are the steps:

  1. If you are not registered, please register first, then apply for the “CUDA/GPU Computing Registered Developer Program” under the registered developer programs;
  2. Open the page https://developer.nvidia.com/rdp/bugs/cudagpu-bug-reporting;
  3. On the bug report page, fill in the required items. The other items are optional, but detailed information helps us locate and fix the issue;
  4. Upload an attachment if necessary;
  5. On a Linux system, it is best to attach an nvidia-bug-report;
  6. If the issue is related to a specific code pattern, sample code and instructions to compile it are needed for reproduction.

If you have a problem filing the bug, or you want me to file it for you, please feel free to let me know.