Acer CB5-311 TK1 ChrUbuntu running the CUDA example - error code=46

Dear All,

I have installed ChrUbuntu on my Acer CB5-311 (TK1, 4GB, Full HD).
I installed Ubuntu and Linux4Tegra 21.4.0 following this guide:
http://www.clifford.at/blog/index.php?/archives/131-Installing-Ubuntu-on-an-Acer-Chromebook-13-Tegra-K1.html
Afterwards I installed the CUDA 6.5 Toolkit.

I then downloaded the CUDA samples (NVIDIA_CUDA-6.5_Samples) and compiled them.

If I run deviceQuery, I get all the information about the GK20 GPU.
However, if I run any other sample that calls cudaMalloc(), I get the following error:

code=46(cudaErrorDeviceUnavailable)

Could anybody help me understand what is wrong?

The same software installation on my Jetson TK1 board works fine without any problem.

Thanks in advance for the help.

Best Regards
Stefano

Yes, per this thread [url]https://devtalk.nvidia.com/default/topic/774354/jetson-tk1/l4t-on-acer-chromebook-13-cb5/3[/url], a recent ChromeOS kernel removed some CUDA hooks. The script on this page grabs a slightly older ChromeOS kernel (from R41, ~March 2015) and builds it. There's also a command to unthrottle the GPU. I haven't run this yet, but follow-up posts say it works fine. I don't know whether heat will be a problem if you run a long CUDA job (I've read the TK1 uses about 11 watts running all 192 CUDA units), but reportedly it had no heat problems running the demos. I suggest making a few small edits, though:
  1. Check your destination. The script has a line setting “root=/dev/mmcblk0p7”, which should be mmcblk1p7 if you installed to SD like I did. In the instructions it prints at the end for installing the kernel, you’d want mmcblk1 instead of mmcblk0 too if you installed to SD.

  2. It downloads Linux4Tegra 21.3 (but doesn’t untar it or anything). Go ahead and comment that out, since you’ve already got 21.4.
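For edit 1, here is a minimal sketch of the substitution, assuming you only need to swap mmcblk0 for mmcblk1 in text you pipe through it. The function name retarget_to_sd is made up for illustration; it is not part of the script from the thread.

```shell
# Rewrite eMMC device references (mmcblk0) to the SD card (mmcblk1).
# Reads text on stdin and writes the edited text to stdout.
# retarget_to_sd is a hypothetical helper name, not from the original script.
retarget_to_sd() {
    sed 's/mmcblk0/mmcblk1/g'
}

# Example: the kernel command-line fragment mentioned in edit 1.
echo 'root=/dev/mmcblk0p7' | retarget_to_sd
```

Review the result by eye before using it; a blanket substitution like this also changes any mmcblk0 reference you meant to keep.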

I must admit I’m impressed by this Chromebook – just got it a few days ago planning to see if I could do some CUDA development (and Android development with Eclipse & Android Studio hopefully). I’m impressed by ChromeOS, and both impressed and relieved at how well Chrubuntu worked out (I installed gnome-session-flashback.) Have fun!

I want to add a few more tweaks I’ve found.

I placed these in /etc/rc.local:
mount / -o remount,commit=600
echo deadline > /sys/block/mmcblk1/queue/scheduler
echo 6 > /sys/block/mmcblk1/queue/iosched/writes_starved
echo runnable > /sys/devices/system/cpu/cpuquiet/current_governor

Explanation:
The first three lines affect the disk scheduling; I've gotten *FAR* better I/O performance using these tweaks. The last line enables some additional power management.

The first line changes the journal commit interval from the default of 5 seconds to 600 seconds, which is what ChromeOS uses by default. Each commit apparently does a *synchronous* (sync mode) write of any journal data not already written to disk, stalling any other writes or reads while it does so. That's what caused the random stalls I'd see now and then.
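To confirm the remount took effect, something like the sketch below may help. It assumes the kernel lists a non-default commit= value in /proc/mounts for the filesystem; commit_interval is a made-up helper name.

```shell
# Print the commit= interval (seconds) for a mount point, or "default" if the
# option is not listed. Prints nothing if the mount point is not found.
# $1 = mount point, $2 = mounts table (defaults to /proc/mounts).
# commit_interval is a hypothetical helper name.
commit_interval() {
    awk -v mp="$1" '
        $2 == mp {
            val = "default"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] ~ /^commit=/) { val = substr(opts[i], 8) }
            }
        }
        END { if (val != "") print val }
    ' "${2:-/proc/mounts}"
}

# Check the root filesystem, if the mounts table is readable.
if [ -r /proc/mounts ]; then
    commit_interval /
fi
```

After the rc.local line has run, this should report 600 for / if the option was accepted.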

The second and third lines change the disk I/O scheduler. cfq has worked great for me in general, but on the SD card it seems to perform quite poorly. I monitored actual disk performance with gkrellm. The deadline scheduler's main tunables are read_expire (default 500), write_expire (default 5000), and writes_starved (default 2). It tries to fulfill read requests within read_expire ms and writes within write_expire ms, but as I understand it, it will starve writes for up to writes_starved intervals (so 10000ms) to keep reads going. Setting writes_starved to 6 allows deferring writes up to 30 seconds in favor of reads. Strictly, the kernel documentation defines writes_starved as a count of how many times reads are preferred over writes rather than a time multiplier, so treat these figures as rough estimates. In practice, it seems to stall writes entirely to let reads complete ASAP, then starts flushing out writes as soon as your reads complete. It doesn't actually do the writes any faster than cfq, but it prevents the writes from stalling your reads.
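To see which scheduler is active and what the deadline tunables are currently set to, a small sketch along these lines should work. The function names are made up for illustration, and mmcblk1 is assumed to be the SD card as in the rc.local lines above.

```shell
# Print the active I/O scheduler: the name the kernel shows in [brackets].
# $1 = path to a queue/scheduler file, e.g. /sys/block/mmcblk1/queue/scheduler
# (active_scheduler and show_deadline_tunables are hypothetical helper names.)
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Print each deadline tunable that exists under an iosched directory ($1).
show_deadline_tunables() {
    for name in read_expire write_expire writes_starved; do
        if [ -r "$1/$name" ]; then
            echo "$name=$(cat "$1/$name")"
        fi
    done
}

# mmcblk1 is the SD card in the setup above; adjust for your device.
queue=/sys/block/mmcblk1/queue
if [ -r "$queue/scheduler" ]; then
    active_scheduler "$queue/scheduler"
    show_deadline_tunables "$queue/iosched"
fi
```

The iosched directory only contains these files while the deadline scheduler is selected, which is why each one is checked before it is read.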

As for the last line: I found that in ChromeOS the system actually powers off CPUs that aren't needed, while on the Ubuntu side your tasks are evenly divided between all 4 cores. I saw about 15 hours of battery life in ChromeOS, but ~10 hours in Ubuntu (good enough, but still). That governor defaults to "userspace", but there's *no* userspace application controlling it, so it just runs all 4 cores. "runnable" turns on 1 core per runnable task; you can run top (and press "1") and see the CPUs appear and disappear with load. The one downside: top may be the *only* way to gauge CPU usage, since this throws off typical CPU meters... is that "75%" 75% usage of 1 core or 4?
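If you'd rather not watch top, a rough alternative is to count the cores the kernel reports as online. This sketch assumes the governor takes idle cores fully offline so that /sys/devices/system/cpu/online changes with load; count_cpus is a made-up helper name.

```shell
# Count the CPUs in a kernel CPU list such as "0-3" or "0,2-3".
# count_cpus is a hypothetical helper name.
count_cpus() {
    total=0
    old_ifs=$IFS
    IFS=,
    for range in $1; do
        first=${range%-*}   # "2-3" -> 2, "0" -> 0
        last=${range#*-}    # "2-3" -> 3, "0" -> 0
        total=$((total + last - first + 1))
    done
    IFS=$old_ifs
    echo "$total"
}

# Report how many cores are online right now.
if [ -r /sys/devices/system/cpu/online ]; then
    count_cpus "$(cat /sys/devices/system/cpu/online)"
fi
```

Run it a few times while idle and under load; with the "runnable" governor the number should move between 1 and 4.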

OK, one more – this one’s good. Long story short, I ran

sudo tune2fs -E stride=16,stripe_width=1024 /dev/mmcblk1p7

(replace mmcblk1p7 as needed) and saw far more writes running at around 5MB/sec and far fewer at around 32KB/sec. With these options, the ext2/3/4 filesystem tries to allocate new files within an SD card write block and erase block instead of allocating them wherever. This should also greatly reduce card wear.

So, what’s this do? These settings are designed for RAID. The stride is the size of a basic RAID block: when you write a 512-byte block, the RAID system must read the enclosing RAID block, modify it, and write it back (and write some parity for RAID5). The SD card has a minimum block size where it must do this too, typically 16-64KB; I chose 64KB. The stripe width is stride * number of data disks; since you have x data + 1 or 2 parity disks, keeping writes within a stripe speeds things up somewhat. On an SD card, you have an erase block/allocation unit size of around 2-4MB, and keeping writes within this block greatly improves write speed; I chose 4MB. When you run mke2fs (or mkfs.ext2/3/4) with -E stride=x,stripe_width=y, it tries to make sure the metadata is spread between disks (probably useless for an SD card). But you can turn it on even after the fact, and it should try to allocate new files so they fall within write block/erase block boundaries on the SD card. In practice it seems to help quite a bit.
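Both values are given in filesystem blocks, so the arithmetic can be sketched as below. This assumes a 4KB filesystem block size, which is what makes stride=16 come out to 64KB and stripe_width=1024 come out to 4MB; ext4_stripe_opts is a made-up helper name.

```shell
# Convert SD card geometry into the -E option string for tune2fs/mke2fs.
# $1 = filesystem block size in bytes (4096 is an assumption here)
# $2 = card write block size in KB
# $3 = card erase block size in KB
# ext4_stripe_opts is a hypothetical helper name.
ext4_stripe_opts() {
    stride=$(( $2 * 1024 / $1 ))
    stripe_width=$(( $3 * 1024 / $1 ))
    echo "stride=${stride},stripe_width=${stripe_width}"
}

# 64KB write block and 4MB erase block, as chosen above.
ext4_stripe_opts 4096 64 4096
```

If your card has a 2MB erase block instead, pass 2048 as the third argument and use the resulting string with tune2fs -E.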

This site has some info on how to determine your write block and erase block size, plus a table with a bunch of cards listed. I didn’t measure my card; the bigger cards mostly had a 64KB write size and a 4MB erase block, so I picked those.
https://wiki.linaro.org/WorkingGroups/KernelArchived/Projects/FlashCardSurvey