Using Caffe to train a network, but it stops at iteration 0

Ashing · December 14, 2016, 5:17am

Hi all,

I tried to follow the tutorial below to train a network with Caffe.
A Practical Introduction to Deep Learning with Caffe and Python // Adil Moujahid // Data Analytics and more

First, I worked around a memory size issue by creating a 4 GB swap on an external SD card.
But when I try to train the network, it stops at iteration 0 and prints no error message.
What is wrong, and how can I deal with it?

AastaLLL · December 14, 2016, 1:14pm

Hi,

Could you try the default MNIST training example?

cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
./examples/mnist/train_lenet.sh | tee res.out

Ashing · December 15, 2016, 2:44pm

Yes, the MNIST training example works fine.

AastaLLL · December 20, 2016, 3:10am

Hi,

Could you check the GPU memory usage?

Please run this program in a separate SSH session while the training is running.
It prints the GPU memory usage once per second.

#include <cstdlib>
#include <iostream>
#include <unistd.h>
#include <cuda_runtime.h>

int main()
{
    // Print GPU memory usage once per second until interrupted (Ctrl+C)
    size_t free_byte;
    size_t total_byte;

    while (true)
    {
        // cudaMemGetInfo reports free and total device memory in bytes
        cudaError_t cuda_status = cudaMemGetInfo(&free_byte, &total_byte);

        if (cudaSuccess != cuda_status) {
            std::cout << "Error: cudaMemGetInfo fails, " << cudaGetErrorString(cuda_status) << std::endl;
            std::exit(1);
        }

        // Convert to double so the MB values keep their fractional part
        double free_db = (double)free_byte;
        double total_db = (double)total_byte;
        double used_db = total_db - free_db;

        std::cout << "GPU memory usage: used = " << used_db/1024.0/1024.0 << " MB, free = "
                  << free_db/1024.0/1024.0 << " MB, total = " << total_db/1024.0/1024.0 << " MB" << std::endl;
        sleep(1);
    }

    return 0;
}

nvcc test.cu -o test
./test