Ashing
December 14, 2016, 5:17am
1
Hi all,
I tried to follow the tutorial below to train a network with Caffe.
A Practical Introduction to Deep Learning with Caffe and Python // Adil Moujahid // Data Analytics and more
First, I got around the memory size issue by creating 4 GB of swap on the extended SD card.
But when I try to train the network, it stops at iteration 0 and prints no error message.
What is wrong, and how can I deal with it?
Hi,
Could you try the default MNIST training example?
cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
./examples/mnist/train_lenet.sh 2>&1 | tee res.out
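To see how far a run got, the last iteration number can be pulled out of the saved log. This is only a sketch: `last_iteration` is a hypothetical helper name, not part of Caffe, and it assumes res.out really contains Caffe's "Iteration N" log lines (Caffe logs through glog, which writes to stderr, so `2>&1` may be needed before the pipe).

```shell
# last_iteration: hypothetical helper, not part of Caffe.
# Prints the number from the last "Iteration N" entry found in the
# given log file, or nothing if the file has no such entry.
last_iteration() {
    grep -o 'Iteration [0-9][0-9]*' "$1" | tail -n 1 | awk '{print $2}'
}
```

Calling `last_iteration res.out` a few times during training should show a number that keeps growing if the run is making progress.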
Ashing
December 15, 2016, 2:44pm
3
Yes, the MNIST training example works fine.
Hi,
Could you check the GPU memory utilization?
Please run this program in a separate SSH session while the training is running.
It prints the GPU memory usage once per second.
#include <cstdlib>
#include <iostream>
#include <unistd.h>
#include <cuda_runtime.h>

int main()
{
    // Print GPU memory usage once per second until interrupted (Ctrl+C).
    size_t free_byte;
    size_t total_byte;
    while (true)
    {
        cudaError_t cuda_status = cudaMemGetInfo(&free_byte, &total_byte);
        if (cudaSuccess != cuda_status) {
            std::cout << "Error: cudaMemGetInfo fails, " << cudaGetErrorString(cuda_status) << std::endl;
            exit(1);
        }
        double free_db = (double)free_byte;
        double total_db = (double)total_byte;
        double used_db = total_db - free_db;
        // Convert bytes to MB for display.
        std::cout << "GPU memory usage: used = " << used_db/1024.0/1024.0 << " MB, free = "
                  << free_db/1024.0/1024.0 << " MB, total = " << total_db/1024.0/1024.0 << " MB" << std::endl;
        sleep(1);
    }
    return 0;
}
nvcc test.cu -o test
./test
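If the monitor output is saved to a file (for example with `./test | tee gpu_mem.log`, where the file name is just an assumption), the lowest free-memory reading seen during training can be extracted afterwards. A minimal sketch, with `min_free_mb` as a hypothetical helper that relies on the "free = ... MB" text printed by the program above:

```shell
# min_free_mb: hypothetical helper for the monitor output above.
# Pulls the number after "free = " from every line of the given log
# and prints the smallest value (in MB).
min_free_mb() {
    sed -n 's/.*free = \([0-9.]*\) MB.*/\1/p' "$1" | sort -n | head -n 1
}
```

A value close to zero from `min_free_mb gpu_mem.log` would suggest the training run is exhausting GPU memory.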