I’ve gathered that an “unspecified launch failure” is generic, but often caused by errant memory access. Below I have some partial code, and I’ve isolated the problem to one line:
extern __shared__ signed int sharedMem[];
// [... some code ..]
__global__ void kernel(signed int *timeSeriesArray, [...]) {
    signed int * globalReferenceTimeSeries = (signed int *)(timeSeriesArray + referenceTimeSeriesIndex);
    signed int * sharedReferenceTimeSeries = (signed int *)(sharedMem);
    // [... some code ..]
    for(unsigned int i = 0; i < timeSeriesLength; i++)
        // Problem line below:
        sharedReferenceTimeSeries[i] = globalReferenceTimeSeries[i];
}
But when I do
for(unsigned int i = 0; i < timeSeriesLength; i++)
    signed int foo = globalReferenceTimeSeries[i];
OR
for(unsigned int i = 0; i < timeSeriesLength; i++)
    sharedReferenceTimeSeries[i] = i;
OR even
for(unsigned int i = 0; i < timeSeriesLength; i++) {
    signed int foo = globalReferenceTimeSeries[i];
    sharedReferenceTimeSeries[i] = i;
}
OR
for(unsigned int i = 0; i < timeSeriesLength; i++) {
    signed int foo = 12;
    sharedReferenceTimeSeries[i] = foo;
}
No problem. Then I do:
for(unsigned int i = 0; i < timeSeriesLength; i++) {
    signed int foo = globalReferenceTimeSeries[i];
    sharedReferenceTimeSeries[i] = foo;
}
And I’m back to the “unspecified launch failure” error with kernel termination.
Any ideas on how this is even possible? (My C/C++ is 15 years old, so consider me a n00b if it’s an obvious pointer issue or some such thing.)
In all but the first and last case the compiler will optimize away the access to globalReferenceTimeSeries[i] because its value is not actually used. So this strongly hints at a problem with timeSeriesArray or referenceTimeSeriesIndex.
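One way to narrow down a problem with timeSeriesArray or referenceTimeSeriesIndex is to guard the index arithmetic inside the kernel and to check the error state after the launch. The following is only a minimal sketch, not the original code: the kernel name copyReferenceSeries, the extra totalElements parameter, the status and checksum outputs, the CUDA_CHECK macro, and the referenceChecksum wrapper are all hypothetical, and it assumes a single-thread launch with the dynamic shared memory size passed as the third launch parameter.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

extern __shared__ signed int sharedMem[];

// Hypothetical kernel: copies one series into shared memory and sums it.
__global__ void copyReferenceSeries(const signed int *timeSeriesArray,
                                    unsigned int totalElements,  // assumed extra parameter
                                    unsigned int referenceTimeSeriesIndex,
                                    unsigned int timeSeriesLength,
                                    int *status, int *checksum)
{
    // Reject a null pointer or a range that runs past the end of the array.
    // Note: this cannot detect a host pointer passed by mistake.
    if (timeSeriesArray == nullptr ||
        referenceTimeSeriesIndex > totalElements ||
        timeSeriesLength > totalElements - referenceTimeSeriesIndex) {
        *status = 1;
        return;
    }
    const signed int *globalReferenceTimeSeries =
        timeSeriesArray + referenceTimeSeriesIndex;
    int sum = 0;
    for (unsigned int i = 0; i < timeSeriesLength; i++) {
        sharedMem[i] = globalReferenceTimeSeries[i];
        sum += sharedMem[i];  // use the value so the copy is not dead code
    }
    *checksum = sum;
    *status = 0;
}

// Hypothetical host wrapper: fills the array with 0..63, copies elements
// 16..23 on the device, and returns their sum (or -1 if the guard fired).
int referenceChecksum()
{
    const unsigned int totalElements = 64, index = 16, length = 8;
    signed int host[totalElements];
    for (unsigned int i = 0; i < totalElements; i++) host[i] = (signed int)i;

    signed int *devSeries = nullptr;
    int *devOut = nullptr;  // devOut[0] = status, devOut[1] = checksum
    CUDA_CHECK(cudaMalloc(&devSeries, sizeof(host)));
    CUDA_CHECK(cudaMalloc(&devOut, 2 * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(devSeries, host, sizeof(host), cudaMemcpyHostToDevice));

    // Third launch parameter: bytes of dynamic shared memory for sharedMem[].
    copyReferenceSeries<<<1, 1, length * sizeof(signed int)>>>(
        devSeries, totalElements, index, length, devOut, devOut + 1);
    CUDA_CHECK(cudaGetLastError());       // launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    int out[2];
    CUDA_CHECK(cudaMemcpy(out, devOut, sizeof(out), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(devSeries));
    CUDA_CHECK(cudaFree(devOut));
    return out[0] == 0 ? out[1] : -1;
}

int main()
{
    printf("checksum=%d\n", referenceChecksum());
    return 0;
}
```

Checking cudaDeviceSynchronize() right after the launch ties the failure to this kernel rather than to whichever later runtime call happens to report it.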
If foo is used later, then probably all but the last read of globalReferenceTimeSeries[i] are optimized away. This would point to a (usually less common) problem at the beginning of the array.
You can verify that theory by checking that
signed int foo = 0;
for(unsigned int i = 0; i < timeSeriesLength; i++) {
    foo += globalReferenceTimeSeries[i];
    sharedReferenceTimeSeries[i] = i;
}
fails but
signed int foo = globalReferenceTimeSeries[timeSeriesLength-1];
for(unsigned int i = 0; i < timeSeriesLength; i++) {
    sharedReferenceTimeSeries[i] = i;
}
works (assuming timeSeriesLength>0).
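Another way to keep the compiler from dropping the reads during such an experiment is to go through a volatile pointer, which as far as I know forces every reference to compile to an actual memory read even when the value is unused. The following is a rough sketch under that assumption; the kernel name probeReads, the host wrapper runProbe, and the array sizes are made up for illustration and are not from the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical probe kernel: reads every element through a volatile pointer
// so the loads are performed even though the values are never used.
__global__ void probeReads(const signed int *timeSeriesArray,
                           unsigned int referenceTimeSeriesIndex,
                           unsigned int timeSeriesLength)
{
    const volatile signed int *globalReferenceTimeSeries =
        timeSeriesArray + referenceTimeSeriesIndex;
    for (unsigned int i = 0; i < timeSeriesLength; i++) {
        signed int foo = globalReferenceTimeSeries[i];  // volatile load, result unused
        (void)foo;
    }
}

// Hypothetical host wrapper: allocates a zeroed device array, runs the probe
// with one thread, and returns the first error encountered.
cudaError_t runProbe(unsigned int totalElements, unsigned int index,
                     unsigned int length)
{
    signed int *devSeries = nullptr;
    cudaError_t err = cudaMalloc(&devSeries, totalElements * sizeof(signed int));
    if (err != cudaSuccess) return err;

    err = cudaMemset(devSeries, 0, totalElements * sizeof(signed int));
    if (err == cudaSuccess) {
        probeReads<<<1, 1>>>(devSeries, index, length);
        err = cudaGetLastError();  // launch configuration errors
    }
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();  // faults raised while the kernel ran
    }
    cudaFree(devSeries);
    return err;
}

int main()
{
    cudaError_t err = runProbe(64, 16, 8);
    printf("probe: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

If the probe alone reproduces the launch failure with the real pointer and index, the reads are at fault and the shared memory writes can be ruled out.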
EDIT: Missed your last post. Glad you found the problem already. Unfortunately I don’t know of a tool that would catch such a problem at compile time. Sounds like it would be a fun project to adapt sparse to this.
Ocelot should have been able to catch this at runtime, though.