You may suggest talking to the cuGraph or RMM developers, but my other colleagues are not having a problem like this, even though we have the same CUDA and cuGraph versions. This made me think the problem might be in my settings.
CUDA version: 12.4
Compute capability: 8.6 (sm_86)
Device: RTX 3090
OS: Ubuntu 22.04 LTS
Driver: 550.127.08
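For anyone comparing their own setup, the same numbers can be queried programmatically. A minimal sketch, assuming CuPy is installed in the same conda environment (any equivalent device query works):

```python
import cupy as cp

# Query the device properties listed above.
props = cp.cuda.runtime.getDeviceProperties(0)
print("Device:", props["name"].decode())                            # NVIDIA GeForce RTX 3090
print("Compute capability:", cp.cuda.Device(0).compute_capability)  # '86' -> 8.6
print("CUDA driver version:", cp.cuda.runtime.driverGetVersion())   # e.g. 12040 -> 12.4
print("CUDA runtime version:", cp.cuda.runtime.runtimeGetVersion())
```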
The examples that I tried (these examples are tested and working on other GPUs and in the GitHub pipeline):
A short description of the problem:
If I am working with a small dataset like karate, there is no problem: the run starts and finishes successfully. But when I work with big datasets (ca-hollywood-2009, soc-livejournal), it initializes, runs for ~30-40 seconds, and then crashes.
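For context, here is a rough sketch of the kind of run that fails. The dataset path, column names, and the choice of Louvain are placeholders for illustration, not the exact example scripts linked above:

```python
import cudf
import cugraph

# Small graphs (e.g. karate) finish fine; large ones (ca-hollywood-2009,
# soc-livejournal) crash roughly 30-40 seconds after initialization.
edges = cudf.read_csv(
    "soc-livejournal.csv",                 # placeholder path to the edge list
    sep=" ",
    header=None,
    names=["src", "dst", "wgt"],
    dtype=["int32", "int32", "float32"],
)

G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst", edge_attr="wgt")

# A weighted algorithm such as Louvain exercises cuGraph's edge-weight-sum
# kernels, which is where the sanitizer backtrace below points.
parts, modularity = cugraph.louvain(G)
print(modularity)
```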
I also ran it under compute-sanitizer and got these results:
Program hit cudaErrorLaunchOutOfResources (error 701) due to "too many resources requested for launch" on CUDA API call to cudaLaunchKernel_ptsz.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: [0x4466f5]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame:cudaLaunchKernel_ptsz [0x547fd]
========= in /home/yigithan/miniconda3/envs/cugraph_dev/lib/libcudart.so.12
========= Host Frame:cudaLaunchKernel in /home/yigithan/miniconda3/envs/cugraph_dev/targets/x86_64-linux/include/cuda_runtime_api.h:14030 [0xe5ecef1]
========= in /home/yigithan/miniconda3/envs/cugraph_dev/lib/libcugraph.so
========= Host Frame:_ZL736__device_stub__ZN7cugraph6detail35per_v_transform_reduce_e_mid_degreeILb1ENS_12graph_view_tIiiLb0ELb0EvEENS0_52edge_partition_endpoint_dummy_property_device_view_tIiEES5_NS0_42edge_partition_edge_property_device_view_tIiPKffEENS6_IiPKjbEEPfZNS_71_GLOBAL__N__3530e449_32_graph_weight_utils_sg_v32_e32_cu_4d8abc56_2573119compute_weight_sumsILb1EiifLb0ELb0EEEN3rmm14device_uvectorIT2_EERKN4raft8handle_tERKNS2_IT0_T1_XT3_EXT4_EvEENS_20edge_property_view_tISP_PKSI_N6thrust15iterator_traitsISV_E10value_typeEEEEUnvdl0_PFNSH_IfEESN_RKS3_NST_IiS8_fEEESF_ILb1EiifLb0ELb0EE2_NS_9reduce_op4plusIfEEfEEvNS_28edge_partition_device_view_tINSO_11vertex_typeENSO_9edge_typeEXsrSO_12is_multi_gpuEvEES1C_S1C_SP_SI_T3_NSW_8optionalIT4_EET5_T6_T8_S1L_T7_RN7cugraph28edge_partition_device_view_tIiiLb0EvEEiiRNS_6detail52edge_partition_endpoint_dummy_property_device_view_tIiEES6_RNS3_42edge_partition_edge_property_device_view_tIiPKffEERN6thrust8optionalINS7_IiPKjbEEEEPfR17__nv_dl_wrapper_tI11__nv_dl_tagIPFN3rmm14device_uvectorIfEERKN4raft8handle_tERKNS_12graph_view_tIiiLb0ELb0EvEENS_20edge_property_view_tIiS9_fEEEXadL_ZNS_71_GLOBAL__N__3530e449_32_graph_weight_utils_sg_v32_e32_cu_4d8abc56_2573119compute_weight_sumsILb1EiifLb0ELb0EEENSN_IT2_EESS_RKNST_IT0_T1_XT3_EXT4_EvEENSX_IS16_PKS13_NSC_15iterator_traitsIS1B_E10value_typeEEEEELj2EEJEEffRNS_9reduce_op4plusIfEE in /tmp/tmpxft_00006475_00000000-6_graph_weight_utils_sg_v32_e32.cudafe1.stub.c:233 [0xe5ee3f6]
========= in /home/yigithan/miniconda3/envs/cugraph_dev/lib/libcugraph.so
...
Yes, I am aware that that is your filed issue. I'm posting this here for others who may come across this thread. That issue is the mechanism for contacting the RAPIDS development team.