Testing built-in R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating R2C & C2R FFT plans for 2048 x 2048
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1289.075120 MPix/s (3.103000 ms)
...reading back GPU convolution results
...running reference CPU convolution
...comparing the results: rel L2 = 1.308637E-03 (max delta = 1.054192E-01)
TEST FAILED
...shutting down

Testing custom R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating C2C FFT plan for 2048 x 1024
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1297.016815 MPix/s (3.084000 ms)
...reading back GPU FFT results
...running reference CPU convolution
...comparing the results: rel L2 = 5.094591E-04 (max delta = 5.422376E-02)
TEST FAILED
...shutting down

Testing updated custom R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating C2C FFT plan for 2048 x 1024
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1634.654661 MPix/s (2.447000 ms)
...reading back GPU FFT results
...running reference CPU convolution
...comparing the results: rel L2 = 5.721548E-04 (max delta = 5.010304E-02)
TEST FAILED
...shutting down

Press ENTER to exit...