Why is the Titan V slower than the RTX 2080 Ti?

The kernel:

void CSR(int i,unsigned int N,
	unsigned int *xadj,unsigned int *adjncy,
	double *dataxx,double *datayy,double *datazz,
	double *Cspin,
	double *CHDemag,double *CH)
{ 
	if(i < N)
	{
		double dot[3]={0,0,0};
		for(int n = xadj[i] ; n < xadj[i+1]; n++)
		{
			unsigned int neigh=adjncy[n];
			printf("%d\n",n);
			printf("%f,%f,%f\n",dataxx[n],datayy[n],datazz[n]);
			double val[3] = {dataxx[n],datayy[n],datazz[n]};
			for(unsigned int co = 0 ; co < 3 ; co++)
			{
				dot[co]+=(val[co]*Cspin[3*neigh+co]);
			}
		}
		double a=CHDemag[3*i];
		double b=CHDemag[3*i+1];
		double c=CHDemag[3*i+2];
		CH[3*i]=a+dot[0];
		CH[3*i+1]=b+dot[1];
		CH[3*i+2]=c+dot[2];
		// CH[3*i]=CHDemag[3*i]+dot[0];
		// CH[3*i+1]=CHDemag[3*i+1]+dot[1];
		// CH[3*i+2]=CHDemag[3*i+2]+dot[2];
	}
}

Under the same code and machine (except for the GPU):
Titan V: 490 ms
RTX 2080 Ti: 380 ms

The Titan V's double-precision capability should be better than the RTX 2080 Ti's, but the result doesn't show it. Should I compile the code for double precision using some compiler argument?

Thank you.

The code above is wrong; the correct code is:

__global__ void CSpMV_CSR(unsigned int N,
	unsigned int *xadj,unsigned int *adjncy,
	double *dataxx,double *datayy,double *datazz,
	double *Cspin,
	double *CHDemag,double *CH)
{ 
	// one thread handles one row of the CSR matrix
	unsigned int i = blockDim.x*blockIdx.x + threadIdx.x;
	if(i < N)
	{
		double dot[3]={0,0,0};
		// xadj holds the row offsets, adjncy the column indices of the nonzeros
		for(unsigned int n = xadj[i] ; n < xadj[i+1]; n++)
		{
			unsigned int neigh=adjncy[n];
			double val[3] = {dataxx[n],datayy[n],datazz[n]};
			for(unsigned int co = 0 ; co < 3 ; co++)
			{
				dot[co]+=(val[co]*Cspin[3*neigh+co]);
			}
		}
		// add the precomputed demag field and write the result
		CH[3*i]=CHDemag[3*i]+dot[0];
		CH[3*i+1]=CHDemag[3*i+1]+dot[1];
		CH[3*i+2]=CHDemag[3*i+2]+dot[2];
	}
}
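
A minimal sketch of the host-side launch (the block size of 256 and the exact call site are assumptions, not the poster's actual configuration):

	// hypothetical launch configuration: one thread per row, 256 threads per block
	unsigned int block = 256;
	unsigned int grid = (N + block - 1) / block;
	CSpMV_CSR<<<grid, block>>>(N, xadj, adjncy, dataxx, datayy, datazz,
		Cspin, CHDemag, CH);
	cudaDeviceSynchronize(); // wait for the kernel so timing measurements are meaningful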

Please give more complete information: how did you compile this code for the two architectures? (Compiler options)

I compile my code using nvcc bigData.cu, without any other options.

That seems incredibly wrong, as this would target the oldest supported GPU architectures, such as sm_20.
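
For example (standard nvcc switches; bigData.cu is the file name from your post): nvcc -arch=sm_70 bigData.cu targets the Titan V (Volta, compute capability 7.0), nvcc -arch=sm_75 bigData.cu targets the RTX 2080 Ti (Turing, compute capability 7.5), and a single binary covering both can be built with -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75.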

What should I do? I don't know how to improve the Titan V's performance.

The Titan V is more expensive than the RTX 2080 Ti; why is the performance the opposite?

Can anybody help me?

[1] Compile for the correct GPU target architecture (learn about the -arch and -gencode switches of nvcc)
[2] Familiarize yourself with the CUDA profiler, profile your kernel and use the results to guide optimizations
[3] Learn about the restrict modifier and how it can help the compiler generate better code
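
To illustrate item [3], here is a sketch of how the kernel signature above could be qualified, assuming none of the arrays alias each other (the body stays unchanged):

	// declaration only: const and __restrict__ promise the compiler the arrays
	// do not alias, so it can keep values in registers and route loads through
	// the read-only cache
	__global__ void CSpMV_CSR(unsigned int N,
		const unsigned int * __restrict__ xadj,
		const unsigned int * __restrict__ adjncy,
		const double * __restrict__ dataxx,
		const double * __restrict__ datayy,
		const double * __restrict__ datazz,
		const double * __restrict__ Cspin,
		const double * __restrict__ CHDemag,
		double * __restrict__ CH);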

In my experience, questions of the sort “GPU X is faster than GPU Y, why?” based on perceived notions or simplistic mental models of GPU performance are not fruitful. The output of the CUDA profiler is a much better way to zero in on the factors that are crucial to the performance of a particular kernel. If necessary the relative performance of two GPUs can then be discussed based on salient differences in profiler output.

At first glance your kernel would appear to be memory bound, with some potentially disadvantageous access patterns because of indirection caused by the use of the adjacency matrix.
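
As a concrete starting point for item [2], you could run something like nvprof --metrics gld_efficiency,gst_efficiency,dram_read_throughput ./yourapp (substitute your executable name). Low global-load efficiency would confirm that the indirect accesses through adjncy and Cspin are scattered.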

The -arch and -gencode options did not help. The Titan V's double-precision capability should be about 10 times that of the RTX 2080 Ti, but the running result doesn't reflect it.

If the performance of the code is bound by memory throughput (as I think it is), the computational throughput is largely irrelevant.
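
A back-of-the-envelope estimate supports this: each inner-loop iteration reads one adjncy index (4 bytes), three matrix values (24 bytes), and three Cspin values (24 bytes), about 52 bytes in total, while performing only 6 floating-point operations (3 multiplies, 3 adds). That is roughly 0.12 FLOP/byte, far below what would be needed to keep the FP64 units of either GPU busy, so the Titan V's much higher double-precision throughput never comes into play.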

But the Titan V's memory throughput is also more powerful than the RTX's.

I would suggest following up on item [2] in post #9. Achieved bandwidth is also a function of access patterns; blanket statements like "the Titan V's memory throughput is also more powerful than the RTX's" are not really actionable.

There may be other issues affecting your application-level performance due to code you haven't shown, your kernel configuration(s) may be sub-optimal, your performance methodology may not be sound, etc.

Thank you very much, I will work on it! May I know how to improve it?