Running the Netbird client on Synology DSM

Introduction

I recently acquired an Arm-powered Synology NAS and looked for a way to securely connect to it from outside my home network. I settled on Netbird, a nice, self-hostable wrapper over WireGuard that also happens to have a mobile application.

I could also have used Tailscale, which has a native app for DSM, the Linux based operating system the NAS comes with.

Disclaimer

Running the Netbird client involves running many commands as root, which can damage your DSM installation and/or prevent a smooth upgrade. Also, the Netbird client container will run with nearly full privileges and could be seen as an additional attack vector. And the firewall commands I use might not be enough to properly secure access to the NAS.

Besides, the situation (kernel version, Netbird client version, ...) might have changed by the time you read this, and most of these steps might no longer be needed or might do more harm than good.

Follow along at your own risk and use your own judgement.

Running Netbird

The Netbird client is primarily written in Go, and a Docker image is provided for aarch64, which is great as that is the architecture of my NAS.
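You can confirm the architecture from an SSH session on the NAS; uname -m should print aarch64 on a 64-bit Arm model:

uname -m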

Pre-requisites

  1. Install Container Manager.

  2. Fetch the Netbird image as per Netbird's documentation (either from the Container Manager UI or from the command line, as shown right after this list).

  3. Generate a setup key on the Netbird's website.

  4. I'm assuming you have already connected a peer or two to your network (not strictly required, but I expect you to have tested that it works).
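If you prefer doing step 2 from the command line, pulling the image from a root SSH session should work just as well as going through the Container Manager UI:

docker pull netbirdio/netbird:latest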

Running

Netbird's documentation recommends this command:

docker run --rm --name PEER_NAME --hostname PEER_NAME --cap-add=NET_ADMIN --cap-add=SYS_ADMIN --cap-add=SYS_RESOURCE -d -e NB_SETUP_KEY=<SETUP KEY> -v netbird-client:/etc/netbird netbirdio/netbird:latest

And while you can map every option of the above command to the right toggle in the Container Manager's UI, do not bother: it does not work. Running the command as root from an SSH session does not work either.

Instead, you get the following error:

INFO client/internal/connect.go:119: starting NetBird client version 0.28.6 on linux/arm64
INFO iface/module_linux.go:76: couldn't access device /dev/net/tun, go error stat /dev/net/tun: no such file or directory, will attempt to load tun module, if running on container add flag --cap-add=NET_ADMIN
ERRO client/internal/engine.go:302: failed creating wireguard interface instance wt0: [couldn't check or load tun module]
ERRO client/internal/connect.go:263: error while starting Netbird Connection Engine: new wg interface: couldn't check or load tun module

Missing module

Netbird says it'll try to load the tun module, but that does not seem to work from inside the container despite the NET_ADMIN capability being set. Thankfully, loading it beforehand on the host just works.

But that's not all. As mentioned in the error message, Netbird also needs access to the device /dev/net/tun. So we'll need to add --device /dev/net/tun to the command.
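As a quick sanity check, once the module is loaded on the host (with modprobe tun, as in the commands below), you can verify that it is present and that the device node exists; the exact output will vary:

lsmod | grep '^tun'
ls -l /dev/net/tun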

With these new commands, you should be able to run Netbird, and it should be able to connect to other peers:

modprobe tun
docker run --rm --name PEER_NAME --hostname PEER_NAME --cap-add=NET_ADMIN --cap-add=SYS_ADMIN --cap-add=SYS_RESOURCE -e NB_SETUP_KEY=<SETUP KEY> -v netbird-client:/etc/netbird --device /dev/net/tun -d netbirdio/netbird:latest

Routing traffic

While other peers can connect to Netbird this way, they cannot access any of the services running on the NAS. The main reason is that they connect to the container, which is running in a virtual network, and therefore has no direct access to the services running on the host.
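You can see which address the container got on Docker's default bridge network with the same inspect call the boot script below relies on (assuming the container is named netbird):

docker inspect netbird -f '{{.NetworkSettings.Networks.bridge.IPAddress}}'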

In theory, Netbird sets up iptables rules to direct the traffic it receives from the tunnel back to the host, but this fails.

Introducing iptables

It turns out the iptables binary that comes with the Netbird Docker image is too "recent". Or rather, it expects the new kernel API for manipulating the packet routing and filtering tables, known as nftables. And indeed, if we run iptables -V on the host we get:

sh-4.4# iptables -V
iptables v1.8.3 (legacy)

Whereas in the Netbird container (here named netbird), we get:

sh-4.4# docker exec netbird iptables -V
iptables v1.8.10 (nf_tables)

Actually, the host kernel seems to support nftables, but I've not found any way to make it work (the nft binary is missing). Nor did I try very hard. But it's fairly annoying, as we seem to be stuck in a situation where the iptables LOG target does not work.

sh-4.4# iptables -A INPUT -p tcp --dport 443 -j LOG
iptables: No chain/target/match by that name.

Even though I tried to add xt_LOG to /proc/sys/net/netfilter/nf_log:

sh-4.4# cat /proc/sys/net/netfilter/nf_log/2
NONE
sh-4.4# modprobe xt_LOG
sh-4.4# echo xt_LOG >/proc/sys/net/netfilter/nf_log/2
sh: echo: write error: No such file or directory
sh-4.4# cat /proc/sys/net/netfilter/nf_log/2
NONE

Well, that's not mandatory, but it would be useful for diagnosis. If someone finds a workaround, do let me know.

Running the host's iptables inside netbird's container

Anyway, the iptables present in the Netbird container won't do. So the solution is to run the host's version of iptables from inside the Netbird container and roughly reproduce the configuration Netbird intends to set up. Well, since I have full control over it, I'll actually change the configuration a bit, as I would like to expose only a select few services instead of everything, including the DSM's admin panel.

But how do you run a host binary from inside a Docker container? Well, the obvious solution is to mount /bin and maybe /lib inside the container, and perhaps mess around with LD_LIBRARY_PATH. But that would be cheating. And it sounds annoying.

Anatomy of a container

On Linux, containers are mainly composed of two things: cgroups, which limit access to physical resources (in the broad sense, with many exceptions), and namespaces, which isolate processes along some given axis.

I cannot really expand too much on the topic myself, as I only scratched the surface while searching for a solution to my problem. I'm sure you can find good resources online that go well beyond the vague and inaccurate description I just gave.

However, what I can tell is that containers are isolated from one another at the network level using namespaces. So we need to run iptables in the same network namespace as the container.
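You can actually observe this isolation from the host: the container's process has its own network namespace under /proc, distinct from the host's. A quick check (assuming the container is named netbird; the inode numbers will differ on your machine):

# network namespace of PID 1, i.e. the host
readlink /proc/1/ns/net
# network namespace of the netbird container's main process
readlink "/proc/$(docker inspect netbird -f '{{.State.Pid}}')/ns/net"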

Enter nsenter

Here comes nsenter(1) to save the day. On another machine, I tried the following:

(host)# docker run -d --rm --name httpd docker.io/library/httpd
<snip>
(host)# docker inspect httpd -f '{{.State.Pid}}'
87082
(host)# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
(host)# nsenter --net=/proc/87082/ns/net bash
(httpd)# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
(httpd)# iptables -A INPUT -m tcp -p tcp --dport 80 -j ACCEPT
(httpd)# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
(httpd)# exit
(host)# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
(host)# docker stop httpd

As we can see, adding rules to iptables after nsenter does modify the filtering table, but those changes are invisible when we are back on the host system. Therefore, we must have modified the container's filtering table.

Perfect, that'll do the job nicely. Let's try on the NAS:

sh-4.4# nsenter
sh: nsenter: command not found

Welp.

But the associated syscall must exist, otherwise Docker would not be able to work.

Zig to the rescue

I've been dabbling in Zig from time to time, and it seems like the perfect fit for this issue. Basically, the code I need is already written, albeit in C, at the end of the setns(2) man page. setns is the syscall used by nsenter, as its man page nicely mentions.

But I also need to cross-compile it for aarch64 from x86_64 and, ideally, without linking to libc, so as to avoid symbol lookup errors when I try to run the binary on the NAS.

Here is my code. But beware that I barely know Zig and there might be a few mistakes and non-idiomatic code in there.

const std = @import("std");
const builtin = @import("builtin");

pub fn print_usage(pname: [*:0]u8) void {
    std.log.err("Usage: {s} /proc/PID/ns/NS [CMD ARGS...]", .{pname});
}

pub fn my_streql(lhs: [*:0]u8, rhs: []const u8) bool {
    for (rhs, 0..) |c, i| {
        if (c != lhs[i]) {
            return false;
        }
    }
    return lhs[rhs.len] == 0;
}

pub fn main() !void {
    var buffer = [_]u8{0} ** (64 * 1024);
    var fb_allocator = std.heap.FixedBufferAllocator.init(&buffer);
    var arena_allocator = std.heap.ArenaAllocator.init(fb_allocator.allocator());
    defer arena_allocator.deinit();

    const allocator = arena_allocator.allocator();

    if (std.os.argv.len < 2) {
        print_usage(std.os.argv[0]);
        std.log.err("Missing argument /proc/PID/ns/NS", .{});
        std.process.exit(1);
        return;
    }
    if (my_streql(std.os.argv[1], "-h")) {
        print_usage(std.os.argv[0]);
        std.process.cleanExit();
        return;
    }

    var cmd: [][]u8 = undefined;
    if (std.os.argv.len == 2) {
        if (std.process.getEnvVarOwned(allocator, "SHELL")) |shell| {
            cmd = try allocator.alloc([]u8, 1);
            cmd[0] = shell;
        } else |_| {
            std.log.err("No command specified and could not access the SHELL env var", .{});
            std.process.exit(1);
            return;
        }
    } else {
        cmd = try allocator.alloc([]u8, std.os.argv.len - 2);
        for (cmd, 0..) |*e, i| {
            e.* = std.mem.sliceTo(std.os.argv[i + 2], 0);
        }
    }

    const fname = std.os.argv[1];
    // Open the namespace file (e.g. /proc/PID/ns/net). The raw syscall wrapper
    // returns errors as a negative errno encoded in a (very large) usize.
    const fd = std.os.linux.openat(std.os.linux.AT.FDCWD, fname, .{}, 0);
    if (fd > (1 << 32)) {
        std.log.err("Error opening file at {s}", .{fname});
        std.process.exit(1);
    }

    // setns(fd, 0): join the namespace referred to by fd, whatever its type.
    const ret = std.os.linux.syscall2(.setns, fd, 0);
    if (ret != 0) {
        std.log.err("Failed to move to namespace described by {s}", .{fname});
        std.process.exit(1);
        return;
    }

    std.process.execv(allocator, cmd) catch {
        std.log.err("Failed to execute command:", .{});
        for (cmd) |e| {
            std.log.err(" {s}", .{e});
        }
        std.log.err("\n", .{});
        std.process.exit(1);
    };
}

And a single

$ zig build -Dtarget=aarch64-linux-musl --release=safe

later, and I have my own nsenter.
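If you do not have a build.zig around, a single-file cross-compile with zig build-exe should produce an equivalent binary (assuming the source file is named my_nsenter.zig):

zig build-exe my_nsenter.zig -target aarch64-linux-musl -O ReleaseSafe

Either way, usage mirrors the earlier nsenter session: pass the namespace file and, optionally, a command, e.g. my_nsenter /proc/PID/ns/net iptables -S.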

Routing traffic (for real this time)

After getting slightly side-tracked, let's get back to the matter at hand: we need to route the traffic that comes out of the tunnel to the host. However, as I've mentioned, I do not want to route everything; I only want to expose a select few services.

There are apparently many ways to do this. One could, for example, set up another tunnel, with proper filtering on the host this time, most likely in the DOCKER-USER chain.

I chose to use the SNAT and DNAT features of iptables, as they seem the easiest. But that won't suffice: I also need to tell the host how to reply to the peers over the VPN. So I'll also add a route on the host towards the Netbird container for this.

Basically, this is the mess I created for myself:

A sequence diagram describing the routing of a network packet:

  1. The packet comes in from the internet and reaches the eth0 interface of the NAS (192.168.0.142).

  2. It is then NATed through the docker0 interface to reach the Netbird container (172.17.0.3), arriving on the container's eth0 interface.

  3. WireGuard decodes it and sends the resulting packet out on the wt0 interface, with a source IP in 100.72.0.0/16 and the destination IP 100.72.184.98.

  4. This packet is DNATed by iptables to 172.17.0.1, i.e. the host's docker0 interface, where it is finally processed by the target application.

  5. The reply is sent through docker0 towards 100.72.0.0/16 thanks to a dedicated route on the host.

  6. It arrives on the container's eth0 interface, where it is SNATed to appear to come from 100.72.184.98 (instead of 172.17.0.1) before being routed through wt0, where WireGuard encrypts it and sends it through the tunnel.

  7. The packet then makes its way back to the internet by traversing the NAT on the host the other way.

Which is achieved by running the following on the host:

ip route add 100.72.0.0/16 via 172.17.0.3

And the following in the network namespace of the container:

my_nsenter /proc/$(docker inspect netbird -f '{{.State.Pid}}')/ns/net bash
iptables --wait -t nat -N MINE-IN
iptables --wait -t nat -N MINE-OUT
iptables --wait -t nat -A MINE-IN -s 100.72.0.0/16 -d 100.72.184.98/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 172.17.0.1:443
iptables --wait -t nat -A MINE-OUT -s 172.17.0.1/32 -d 100.72.0.0/16 -p tcp -m tcp --sport 443 -j SNAT --to-source 100.72.184.98:443
iptables --wait -t nat -A PREROUTING -j MINE-IN
iptables --wait -t nat -A INPUT -j MINE-OUT
iptables --wait -t nat -A OUTPUT -j MINE-IN
iptables --wait -t nat -A POSTROUTING -j MINE-OUT

Which allows peers connecting over HTTPS to my NAS's Netbird IP to reach the web server running on my NAS. And nothing else (hopefully).
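To convince yourself the plumbing works, you can check the route on the host and then hit the service from another peer over the VPN; 172.17.0.3 and 100.72.184.98 are the addresses from the diagram above and will differ on your setup:

# on the NAS: the route towards the Netbird subnet should point at the container
ip route | grep 172.17.0.3
# on another peer: only HTTPS on the NAS's Netbird IP should answer
curl -kI https://100.72.184.98/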

The complete script

In order to have the Netbird client run and connect when the NAS starts up, I ended up adding the following script, triggered on boot:

#!/usr/bin/env bash

set -euo pipefail

die() {
        echo "$@"
        exit 1
}

is_container_running() {
        if docker inspect "${CONTAINER_NAME}" -f '{{.State.Running}}' 2>/dev/null | grep -q true; then
                return 0
        else
                return 1
        fi
}

CONTAINER_NAME=netbird

# setup environment for the container to run properly
modprobe tun

# wait for docker command to be available
count=0
until docker ps -q -l >/dev/null 2>/dev/null; do
        if [ $count -gt 20 ]; then
                die "docker still not available"
        fi
        count=$((count+1))
        sleep 6
done

docker run --rm --name "${CONTAINER_NAME}" --hostname "${CONTAINER_NAME}" --cap-add=NET_ADMIN --cap-add=SYS_ADMIN --cap-add=SYS_RESOURCE -v /volume1/docker/netbird:/etc/netbird --device /dev/net/tun -d netbirdio/netbird:latest

container_ip=$(docker inspect "${CONTAINER_NAME}" -f '{{.NetworkSettings.Networks.bridge.IPAddress}}')
container_ip=${container_ip%/*}

# retrieve the netbird_ip, which requires a connection to the coordinator and might take time
netbird_ip=
while [ -z "$netbird_ip" ] && is_container_running; do
        netbird_ip=$(docker exec "${CONTAINER_NAME}" ip a show dev wt0 | awk '/inet /{print $2}' || true)
        netbird_ip=${netbird_ip%/*}
        [ -z "$netbird_ip" ] && sleep 5
done

[ -z "$netbird_ip" ] && die "Could not find netbird's IP (did the container die?)"

netbird_pid=$(docker inspect "${CONTAINER_NAME}" -f '{{.State.Pid}}')
[ -z "$netbird_pid" ] && die "Could not find netbird's PID"

host_docker_ip=${container_ip%.*}.1
nb_netmask=${netbird_ip%.*.*}.0.0/16

nb_iptables() {
        /volume1/admin/bin/my_nsenter "/proc/${netbird_pid}/ns/net" iptables "$@" --wait
}

nb_iptables -t nat -N MINE-IN
nb_iptables -t nat -N MINE-OUT

for port in 443 3434; do
        nb_iptables -t nat -A MINE-IN -s "${nb_netmask}" -d "${netbird_ip}"/32 -p tcp -m tcp --dport "${port}" -j DNAT --to-destination "${host_docker_ip}":"${port}"
        nb_iptables -t nat -A MINE-OUT -s "${host_docker_ip}"/32 -d "${nb_netmask}" -p tcp -m tcp --sport "${port}" -j SNAT --to-source "${netbird_ip}":"${port}"
done

nb_iptables -t nat -A PREROUTING -j MINE-IN
nb_iptables -t nat -A INPUT -j MINE-OUT
nb_iptables -t nat -A OUTPUT -j MINE-IN
nb_iptables -t nat -A POSTROUTING -j MINE-OUT

if ! ip route | grep -q "$container_ip"; then
        # need to create route to netbird's subnet via netbird's container
        ip route add "${nb_netmask}" via "${container_ip}"
fi

You'll notice a loop whose purpose is to wait for docker to be available. Apparently, the "boot trigger" from the DSM interface fires before applications are "mounted", so to speak. And since docker is part of the Container Manager application, it is not available until some time after the script has started.

There is also a reference to my_nsenter; this is the binary produced from the Zig code above, installed in an admin/bin folder on the NAS.
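For reference, getting the cross-compiled binary onto the NAS could look like the following; the zig-out/bin path is where zig build puts its output (the binary name depends on your build.zig), and the admin user and nas host name are placeholders for your own setup:

scp zig-out/bin/my_nsenter admin@nas:/volume1/admin/bin/my_nsenter
# just in case the executable bit did not survive the copy
ssh admin@nas chmod +x /volume1/admin/bin/my_nsenter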