Hello Everyone,
Previously I tried to run NAMD on a C870 card, but it didn't work; it failed with the error "Charm++ fatal error:
FATAL ERROR: CUDA error allocating force table: feature is not yet implemented".
Now I am using a Tesla desktop supercomputing system with four Tesla C1060 cards. The motherboard has an nForce 780a SLI chipset with onboard graphics, and I am not sure whether it is CUDA-enabled, but my application (NAMD) detects the nForce 780a, and instead of running only on the four C1060 cards it uses the nForce too. When I try to run on one card, it uses only the nForce 780a.
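To see exactly which devices the CUDA runtime enumerates on this box (and in what order), I could use a small deviceQuery-style check. This is just a minimal sketch built on the standard runtime calls (cudaGetDeviceCount / cudaGetDeviceProperties), compiled with nvcc; nothing in it is NAMD-specific:

// list_devices.cu -- build with: nvcc list_devices.cu -o list_devices
// Prints name, memory and compute capability of every CUDA device,
// so the Tesla C1060s can be told apart from the onboard nForce 780a.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("%d CUDA device(s) found\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("device %d: %s, %lu MB, compute capability %d.%d\n",
                    i, prop.name,
                    (unsigned long)(prop.totalGlobalMem / (1024 * 1024)),
                    prop.major, prop.minor);
    }
    return 0;
}

The indices it prints should be the same device numbers that show up in the NAMD output below.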
The command I give is like this:
$ charmrun namd2 +p1 …/…/NAMD-with-cuda/NAMD-Data/apoa1/apoa1.namd 2>&1 | tee namd2_apoa1_1P
Running on 1 processors: namd2 …/…/NAMD-with-cuda/NAMD-Data/apoa1/apoa1.namd
charmrun> /usr/bin/setarch x86_64 -R mpirun -np 1 namd2 …/…/NAMD-with-cuda/NAMD-Data/apoa1/apoa1.namd
Charm++> Running on MPI version: 2.1 multi-thread support: MPI_THREAD_SINGLE (max supported: MPI_THREAD_SINGLE)
Did not find +devices i,j,k,… argument, defaulting to (pe + 1) % deviceCount
Pe 0 binding to CUDA device 1 on samir-desktop: ‘nForce 780a SLI’ Mem: 125MB Rev: 1.1
Charm++> cpu topology info is being gathered!
Charm++> 1 unique compute nodes detected!
...
Pe 0 has 144 local and 0 remote patches and 3888 local and 0 remote computes.
allocating 51 MB of memory on GPU
FATAL ERROR: CUDA error malloc everything: out of memory
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error malloc everything: out of memory
[0] Stack Traceback:
[0] CmiAbort+0x2b [0x8600e1]
[1] _Z8NAMD_diePKc+0x56 [0x4ec846]
[2] _Z13cuda_errcheckPKc+0x5e [0x5ef46e]
[3] _Z21cuda_bind_patch_pairsPK10patch_pairiPK10force_listiiii+0x1e4 [0x7d58b4]
[4] _ZN20ComputeNonbondedCUDA6doWorkEv+0x3c2 [0x5f02e2]
[5] _ZN19CkIndex_WorkDistrib30_call_enqueueCUDA_LocalWorkMsgEPvP11WorkDistrib+0xd [0x7a973d]
[6] CkDeliverMessageFree+0x38 [0x80a82f]
[7] _Z15_processHandlerPvP11CkCoreState+0x183 [0x80de9d]
[8] CmiHandleMessage+0x27 [0x8617b6]
[9] CsdScheduleForever+0x64 [0x8632c8]
[10] CsdScheduler+0xd [0x86335f]
[11] _ZN9ScriptTcl3runEPc+0xe1 [0x77f2d1]
[12] _Z18after_backend_initiPPc+0x25d [0x4f0cad]
[13] main+0x24 [0x4f0d94]
[14] __libc_start_main+0xe6 [0x7ffff5ee8466]
[15] namd2 [0x4ebae9]
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
mpirun has exited due to process rank 0 with PID 8486 on
node samir-desktop exiting without calling “finalize”. This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
With two threads it detects one Tesla C1060 and the nForce card, and gives the same error.
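From the "Did not find +devices i,j,k,… argument" line in the output, it looks like the devices can also be listed explicitly on the command line. Assuming the enumeration shows the four C1060s as devices 0,1,2,3 (those indices are only placeholders), I suppose the run could be pinned to them with something like:

$ charmrun namd2 +p4 +devices 0,1,2,3 …/…/NAMD-with-cuda/NAMD-Data/apoa1/apoa1.namd

but I have not verified yet whether that is the right way to keep it off the nForce card.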
I want to know whether this is a memory problem with the nForce card, whether it simply does not support CUDA, or whether something else is wrong.
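On the memory side: the run above was trying to allocate 51 MB on a device with only 125 MB in total, and presumably the display is already using a good part of that. A quick way to check free vs. total memory on that card would be the standard cudaMemGetInfo call; again only a sketch, and the device index 1 is just an assumption taken from the "binding to CUDA device 1" line above:

// free_mem.cu -- build with: nvcc free_mem.cu -o free_mem
// Reports free vs. total memory on one CUDA device.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int dev = 1;  // assumption: the nForce 780a showed up as device 1 in the NAMD log
    cudaSetDevice(dev);

    size_t freeMem = 0, totalMem = 0;
    cudaError_t err = cudaMemGetInfo(&freeMem, &totalMem);
    if (err != cudaSuccess) {
        std::printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("device %d: %lu MB free of %lu MB total\n",
                dev,
                (unsigned long)(freeMem / (1024 * 1024)),
                (unsigned long)(totalMem / (1024 * 1024)));
    return 0;
}

If the free figure comes back well under 51 MB, the out-of-memory error would make sense even if the chip is technically CUDA-capable.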
If anybody has run NAMD on any GPU card, please let me know how it went.
Thanks in advance.
Regards,
Deepti