Problem with basic cufft

I made a quick program to make sure I could use the cufft library correctly. When I run a batch size of “1” I get the result I expect. However, as I increase the batch size, I get what appears to be random bytes at the end of my data buffer. If the batch size is 2, the last three entries are noise. If the batch size is 3, I get noise in the last six entries at the end of the buffer, as well as in the three entries at the end of what should be the results from the second of the three transforms in the batch.

#define NX 1024
#define BATCH 2

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>
#include <cufft.h>

int main()
{
	cufftHandle plan;
	cufftComplex *deviceData;
	cufftComplex *hostData;
	FILE* output;
	
	int i, j;
	
	cudaMalloc((void**)&deviceData, NX * BATCH * sizeof(cufftComplex));
	hostData = (cufftComplex*)malloc(NX * BATCH * sizeof(cufftComplex));

	for (j = 0; j < BATCH; j++)
	{
		for (i = 0; i < NX; i++)
		{
			hostData[i + j*BATCH].x = sin(i*(j+1) / (float)10);
			hostData[i + j*BATCH].y = 0;
		}
	}

	cudaMemcpy(deviceData, hostData, NX * BATCH * sizeof(cufftComplex), cudaMemcpyHostToDevice);
	cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);
	cufftExecC2C(plan, deviceData, deviceData, CUFFT_FORWARD);
	cudaThreadSynchronize();
	cudaMemcpy(hostData, deviceData, NX * BATCH * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
	cufftDestroy(plan);
	cudaFree(deviceData);

	output = fopen("outputFile.txt", "w");
	for (j = 0; j < BATCH; j++)
		for (i = 0; i < NX; i++)
			fprintf(output, "%f\t%f\n", hostData[i + j*BATCH].x, hostData[i + j*BATCH].y);
	fclose(output);
}

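You are addressing the elements incorrectly. The i’th element of the j’th batch, where each batch has NX elements, has index i + j * NX, not i + j * BATCH. With a stride of BATCH, the batches overlap and most of the buffer past the first NX elements is never initialized or printed, which is why uninitialized values show up in the output.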
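To make the difference concrete, here is a minimal host-only sketch in plain C (no CUDA needed) that counts how many slots of the NX * BATCH buffer the nested fill loop touches for a given stride. The helper names batch_index and slots_written are made up for this illustration and are not part of cuFFT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: flat index of element i of batch j when
   consecutive batches start `stride` elements apart. For tightly
   packed batches of nx elements, stride must equal nx. */
size_t batch_index(size_t i, size_t j, size_t stride)
{
    return i + j * stride;
}

/* Hypothetical helper: count how many distinct slots of an
   nx * batch buffer the nested fill loop writes for a given stride.
   Returns 0 if the scratch allocation fails. */
size_t slots_written(size_t nx, size_t batch, size_t stride)
{
    size_t total = nx * batch;
    size_t count = 0;
    unsigned char *touched;

    assert(nx > 0 && batch > 0);
    touched = (unsigned char *)calloc(total, 1);
    if (touched == NULL)
        return 0;

    for (size_t j = 0; j < batch; j++) {
        for (size_t i = 0; i < nx; i++) {
            size_t k = batch_index(i, j, stride);
            /* Mark each in-range slot the first time it is written. */
            if (k < total && !touched[k]) {
                touched[k] = 1;
                count++;
            }
        }
    }

    free(touched);
    return count;
}
```

With NX = 1024 and BATCH = 2, slots_written(1024, 2, 1024) returns 2048, so every slot is filled, while slots_written(1024, 2, 2) returns only 1026, leaving the other 1022 slots holding whatever malloc returned. The same i + j * NX index has to be used in the loop that prints the results.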