cuBLAS cublasSdot does not work as described in the documentation

I want to use cuBLAS cublasSdot to compute the dot product of two float buffers (using CUDA 9.0).

In the documentation, cublasSdot is described as follows:

  1. cublasStatus_t cublasSdot (cublasHandle_t handle, int n, const float *x, int incx, const float *y, int incy, float *result)
  2. [img]http://m.qpic.cn/psb?/V118zfRo4BEH8z/ZvyGLWfFoPX02AdLTkWJITY3*iAjYRdqmZUEH9BO0EE!/b/dDYBAAAAAAAA&bo=TwJaAQAAAAADBzQ!&rf=viewer_4[/img]

    That is to say, result can be either a host float* or a device float*.

    This is my code:

        cublasHandle_t handle_cosw;
        cublasCreate(&handle_cosw);
        float *out_p;
        float *cos_w;
        float *result_cos;
        cudaMalloc((void**)&out_p, batch_update * inSize * sizeof(float));
        cudaMalloc((void**)&cos_w, inSize * sizeof(float));
        cudaMalloc((void**)&result_cos, batch_update * inSize * sizeof(float));  // result buffer in device memory
        //result_cos = (float*)malloc(batch_update * inSize * sizeof(float));    // alternative: result buffer in host memory
        cudaMemcpy(out_p, (float*)OutFeatures0, batch_update * inSize * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(cos_w, (float*)data_cos, inSize * sizeof(float), cudaMemcpyHostToDevice);

        for (int i = 0; i < batch_update; i++)
        {
            cublasSdot(handle_cosw, inSize, out_p, 1, cos_w, 1, result_cos);  // the segmentation fault occurs inside this call
        }
    

    If I use this code, it crashes at run time with a segmentation fault:

    Segmentation fault.
    0x0000007fa782d3b4 in cublasSdot_v2 ()
         from /use/local/cuda-9.0/lib64/libcublas.so.9.0
    

    But when I use the commented-out malloc line instead of the cudaMalloc call for result_cos, there is no error.

    Am I misunderstanding the documentation, or is this a bug in the cuBLAS library itself?