16 Sep 2017 Running RancherOS on Scaleway Cloud, the right(-ish) way.

There are some posts out there showing how to install RancherOS on SCW using the installer plus syslinux/grub. That effectively side-steps the SCW infrastructure and forces the user to manage the whole stack, from iPXE to a booted system. I think we can do better. After all, there is a rootfs available and, based on some SCW articles, this should be enough, right? Well, almost.

Set up a [temporary] server with the RancherOS rootfs.

Create a new server instance and change the bootscript (it’s under “Advanced” options) to one whose name ends with rescue. This runs the instance with the system loaded into RAM and leaves the disk free for us to tinker with. The actual system chosen doesn’t matter, we’re gonna destroy it anyway in just a minute.

After the machine boots up, connect to it through ssh.

The first step is to zero out the partition table and create a new filesystem. I’m doing it the RancherOS installer way, although it shouldn’t be strictly necessary. Just remember not to create any partitions: the whole drive should be formatted as is, otherwise the Scaleway scripts won’t pick it up properly.

# dd if=/dev/zero of=/dev/vda bs=512 count=2048
# mkfs.ext4 -F -i 4096 -L RANCHER_STATE /dev/vda
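Before moving on, it can be worth confirming that the filesystem really sits on the bare drive and carries the expected label. A minimal sketch, assuming blkid is available in the rescue system; check_state_disk is a hypothetical helper name, not part of any Scaleway or RancherOS tooling:

```shell
# check_state_disk DEV
# Hypothetical helper: succeeds only if DEV itself (not a partition on it)
# holds an ext4 filesystem labelled RANCHER_STATE.
check_state_disk() {
    dev="$1"
    fstype=$(blkid -s TYPE -o value "$dev")
    label=$(blkid -s LABEL -o value "$dev")
    if [ "$fstype" = "ext4" ] && [ "$label" = "RANCHER_STATE" ]; then
        echo "ok: $dev is ext4 labelled RANCHER_STATE"
    else
        echo "unexpected: type='$fstype' label='$label'" >&2
        return 1
    fi
}
```

Calling check_state_disk /dev/vda right after mkfs should print the ok line.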

Now mount it.

# mount /dev/vda /mnt

Download and unpack the RancherOS rootfs image to the disk.

# curl -Lo - https://github.com/rancher/os/releases/download/v1.0.4/rootfs.tar.gz | tar xz -C /mnt
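To try a different release, the version can be pulled out into a parameter. A small sketch, assuming other releases publish rootfs.tar.gz under the same URL layout as v1.0.4 (I have not verified that for every version); rootfs_url and fetch_rootfs are hypothetical names:

```shell
# rootfs_url VERSION
# Builds the download URL; assumes the v1.0.4 release layout holds for VERSION.
rootfs_url() {
    echo "https://github.com/rancher/os/releases/download/$1/rootfs.tar.gz"
}

# fetch_rootfs VERSION TARGET_DIR
# Streams the archive straight into tar, like the one-liner above.
# -f makes curl fail on HTTP errors instead of piping an error page into tar.
fetch_rootfs() {
    curl -fL "$(rootfs_url "$1")" | tar xz -C "$2"
}
```

With the disk mounted as above, fetch_rootfs v1.0.4 /mnt does the same as the original command.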

Adjust rootfs to work with Scaleway infrastructure.

Move to the partition mount directory, for easier access.

# cd /mnt

The RancherOS image stores its init script at the root of the filesystem, but the Scaleway boot script expects it under /sbin, so symlink it.

# ln -s ../init sbin/init
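A quick sanity check at this point can save a failed boot later. This is only a sketch: check_init is a hypothetical helper that confirms the unpacked rootfs has an init at its top level and that sbin/init is a symlink, then prints where the link points:

```shell
# check_init ROOT
# Hypothetical helper: verifies ROOT/init exists and ROOT/sbin/init is a symlink.
check_init() {
    root="$1"
    if [ ! -f "$root/init" ]; then
        echo "missing: $root/init" >&2
        return 1
    fi
    if [ ! -L "$root/sbin/init" ]; then
        echo "missing symlink: $root/sbin/init" >&2
        return 1
    fi
    # Show the raw link target so a wrong relative path is easy to spot.
    echo "sbin/init -> $(readlink "$root/sbin/init")"
}
```

From the mount directory, check_init . prints the link target, which should lead back to the init file at the top of the rootfs.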

Since the rootfs is just a filesystem image, it contains neither a kernel nor, more importantly, kernel modules. Scaleway images use a custom script that downloads the appropriate modules from their servers on first run. Unfortunately, this script requires a Debian-like environment, with a full-blown shell, GNU Wget, depmod, etc. We don’t have that in our barebones RancherOS image.

Therefore, I’ve written a different script which, bundled with the proper (more on this in just a second) busybox, can do this without any other dependencies.

The proper busybox. We cannot use the one that’s on the image, as it lacks many of the applets we need (wget, depmod, …). We also cannot use the “stock” one, downloaded from the busybox website, because it uses a “simplified” depmod, which produces output incompatible with the regular module utilities.

So I downloaded the sources and built a custom busybox binary that contains everything needed, properly configured. The .config file used is available as a gist.

There is still one more gotcha here, though. For name resolution in wget to work, we need to link against uclibc(-ng). Statically linking against glibc will not work (its resolver loads NSS libraries at run time, even in a static binary) and we, quite obviously, don’t want a dynamic link. There are two ways to tackle this. Either use a distro that’s built around uclibc, which you probably don’t have (unless you’re running an x86 version of OpenWrt somewhere [TIL]), or build a cross toolchain. I took the second path and used crosstool-NG as my guide.

With all that out of the way, the following commands download the script and the proper busybox binary and put them in /usr/local/sbin, as expected by the Scaleway boot script.

# mkdir -p usr/local/sbin
# curl -Lo usr/local/sbin/scw-sync-kernel-modules \
https://gist.github.com/KenjiTakahashi/d3660cf8120c38d514d43436af9c2f90/raw/a99f9d568dcb609f4d16b10b5c53e1e103d55d7e/scw-sync-kernel-modules
# curl -Lo usr/local/sbin/busybox https://img.kenji.sx/busybox
# chmod +x usr/local/sbin/busybox usr/local/sbin/scw-sync-kernel-modules
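If you build or fetch your own busybox instead, it helps to check up front that the applets mentioned above are compiled in. A minimal sketch, relying on busybox’s --list option; missing_applets is a hypothetical helper, and the applet names passed to it are only the two the text names (the full set the script needs is not spelled out here):

```shell
# missing_applets BUSYBOX APPLET...
# Hypothetical helper: prints every requested applet that BUSYBOX does not list.
# Prints nothing when all of them are present.
missing_applets() {
    bb="$1"
    shift
    list=$("$bb" --list) || return 1
    for applet in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string.
        printf '%s\n' "$list" | grep -qxF "$applet" || echo "$applet"
    done
}
```

For example, missing_applets usr/local/sbin/busybox wget depmod should print nothing for a suitable binary. Note that this only shows the applets exist; it cannot tell whether depmod is the “simplified” variant.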

Last, but not least, you should add a cloud-config file with your ssh key. Otherwise, you won’t be able to log in to the system in any way. Setting a hostname might be a good idea, too.

# mkdir -p var/lib/rancher/conf/cloud-config.d
# cat <<EOF > var/lib/rancher/conf/cloud-config.d/user-config.yml
#cloud-config
hostname: rancheros
ssh_authorized_keys:
    - <YOUR_SSH_PUBLIC_KEY_HERE>
EOF

Remember to change the above command to contain your real ssh (public) key!
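Since a forgotten placeholder means a machine you cannot log in to, a tiny guard may be worth the few lines. A sketch only: key_is_set is a hypothetical helper that fails while the placeholder is still in the file and otherwise checks that an ssh_authorized_keys entry is present:

```shell
# key_is_set FILE
# Hypothetical helper: fails if FILE still contains the placeholder,
# or if it has no ssh_authorized_keys section at all.
key_is_set() {
    if grep -qF '<YOUR_SSH_PUBLIC_KEY_HERE>' "$1"; then
        echo "placeholder still present in $1" >&2
        return 1
    fi
    grep -q '^ssh_authorized_keys:' "$1"
}
```

Run it against the user-config.yml you just wrote, before unmounting the disk.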

Boot up time.

Unmount the disk and disconnect from the server.

# cd
# umount /mnt
# ^D

Power off (“archive”) the machine.

There are two ways to go from here.

If you want an image that can be used to bootstrap many servers automatically, you should now create a disk snapshot and then create an image from that snapshot. The downside is that you need to keep the snapshot around (the image is tied to it), and snapshot storage costs money.

If not, you can just change the bootscript of the existing server to x86_64 <kernel_version> rancher #1, boot it back up, and it should work.

Note that the bootscript type is important: the rancher one contains all the modules necessary for RancherOS’s system-docker to work as expected. If you went the first path, you can also set it as the default bootscript for the image, to ease future deployments.