Denial of service via NTP client

Issue #290 new
greentomatodesu created an issue

Summary:

The NTP client is currently launched by the init[1] process, which leads to a possible denial of service. The unencrypted NTP protocol allows an on-path adversary to tamper with NTP traffic. Such an adversary can place themselves on-path at any point between the FreshTomato router and the NTP servers being used.

Additional Details:

It’s possible to tamper with the NTP client, as well as the BURST/IBURST parameters, such that a side channel is introduced into router-managed crypto (i.e., the side channel is injected into OpenSSL/Sodium/OpenVPN). Tampering with the NTP communications also adversely affects the performance of the FreshTomato router via BURST/IBURST.

In addition to this side channel and performance impact, it is also possible to crash/segfault the NTP client by tampering with its (unencrypted) client-to-server traffic. When the NTP client crashes after this traffic is manipulated, it tears down the init[1] process. Essentially, because NTP is launched as part of the init[1] process, a crash/segfault in NTP causes the router to reboot itself. This leads to a denial of service in which the router can be remotely crashed/rebooted by an on-path adversary who targets the NTP communications over time.

Mitigation:

Set the NTP client (Basic > Time > Auto Update Time) to Never, or Only at startup. Update the time manually (Tools > System Commands) using the date MMDDhhmm command. The “Only at startup” option can be used to track the time of a crash.

tl;dr

When the NTP client crashes, it should not tear down the init[1] process. It should be possible to restart the NTP client after a crash like any other service. It would be useful to have a log entry generated when this happens. To avoid the possibility of side channels being introduced into router-managed crypto, the additional parameters such as BURST/IBURST should be exposed and configurable in the GUI.

Thank you for your fine work on this alternative firmware. I hope this helps.

Comments (10)

  1. pedro repo owner

    Understood, but what can I do with this?

    ntpd is a part of busybox https://git.busybox.net/busybox/tree/networking/ntpd.c and you should rather report this issue to them.

    I have nothing to do with ntpd, and I can’t run it as a user other than root (ntpd doesn’t allow such a thing).

    Of course there is a

    -k FILE Key file / -p [keyno:NUM:]PEER
    

    option with ntp.keys file etc. but it’s useless in our case.

  2. greentomatodesu reporter

    I agree about the use of a keyfile. Message authentication is enabled by default, and creates more overhead under tampering. The sync messages are still unencrypted and can be tampered with in either direction of traffic.

    I had thought of reporting the issue upstream. The problem is ntp is allowed to fail under tampering. It’s expected to fail under tampering. So I instead reported it here because these routers have limited resources and, at the very least, a crash in ntp(d) shouldn’t take down the init process and reboot the router.

    The problem appears to be the defaults used by ntpd itself. It attempts to maintain an accurate clock to the extent of treating system resources as if they were unlimited. An on-path adversary exploits ntpd in its default configuration. So one solution might be to expose the ntpd parameters: allow fine tuning, and default to only setting the time once at startup (not using ntpd unless the option is chosen and configured). This, of course, will cause problems for routers handling SSH, VPN, etc., due to the drift. Additionally, this still does not impose limits on ntpd resource consumption.

    Maybe another solution is to not use ntpd at all. Set the time periodically, but use the same method used in the “Only at startup” option. That is, sync once, then exit. At fixed (configurable) times, rerun the one-time sync. No ntpd, no fancy error detection/correction algorithms to break. An extension of this solution might be to run the one-time ntp sync from a temporary shell that has ulimit restrictions in place. It would then be possible to run a single sync with limits on memory (<free), scheduling priority (lowest), and runtime (<15s). If the process is killed due to the limits, create a log entry. Running only the simpler one-time sync with limits should prevent init[1] from crashing. Running a one-time sync periodically should keep the clock accurate enough for router-managed crypto. How often to run the one-time sync will depend on drift severity.

    These are just some initial thoughts on how to approach an improvement. For me the simple solution of manually updating the time works, but I don’t do router managed crypto, and my drift is a tiny bit ahead.

    I’m mostly concerned with resource contention and the possibility of an init[1] crash causing reboots. I think the second option could be tested with a custom scheduler command: launch shell, activate limits, run one-time sync.
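
    A minimal sketch of that second option, assuming a POSIX shell with the ulimit built-in. The function name run_limited and the NTP_LOG override are hypothetical and not part of FreshTomato. The command runs in a subshell so the limits never apply to the calling shell, and a log entry is written if the command was killed by a signal (exit status above 128).

```shell
#!/bin/sh
# Hypothetical helper: run a command under a CPU-time limit in a subshell,
# so the limit never applies to the calling shell.
# NTP_LOG is an assumed override for the logging command (default: logger).
NTP_LOG=${NTP_LOG:-logger}

run_limited() {
    cpu_secs=$1
    shift
    # The subshell confines the ulimit settings to this one command.
    (
        ulimit -c 0            # no core dumps
        ulimit -t "$cpu_secs"  # cap total CPU time in seconds
        "$@"
    )
    status=$?
    # Shells report death-by-signal as 128 + signal number.
    if [ "$status" -gt 128 ]; then
        $NTP_LOG "limited command killed by signal $((status - 128)): $*"
    fi
    return "$status"
}

# Intended use on the router (not run here):
#   run_limited 15 ntpd -q
```

    Further limits (memory, open files) could be added inside the subshell in the same way. They are left out here because suitable values depend on the router.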

  3. greentomatodesu reporter

    It’s quite stellar of you to consider others in the open-source firmware community. Thanks again for your efforts. I’ll be sure to report the issue as you mention.

    Regarding ntpd priority, that’s excellent news. I hadn’t noticed that ntpd was being run with the high priority switch. That should help immensely with system stability.

    I’ve begun testing ntpd stability with a couple changes to system configuration:

    • I read that attacks on ntpd can be mitigated by using stratum servers from different pools. The client will then use multiple responses, potentially from different pools to set time. The round-robin dns used by ntp pool servers means a single resolution may not always resolve to the same server (good). The FreshTomato UI only allows three entries for custom NTP servers. In order to set stratum servers from multiple pools I had to modify the ntp configuration file using Tools > System Commands. The ntpd that comes with busybox doesn’t support configuration options driftfile, logconfig, and pool. So we’re limited to the server directive. The net result is 3 pools having 4 stratum servers.
    cat > /etc/ntp.conf <<EOF
    server 0.europe.pool.ntp.org
    server 1.europe.pool.ntp.org
    server 2.europe.pool.ntp.org
    server 3.europe.pool.ntp.org
    server 0.pool.ntp.org
    server 1.pool.ntp.org
    server 2.pool.ntp.org
    server 3.pool.ntp.org
    server 0.ubuntu.pool.ntp.org
    server 1.ubuntu.pool.ntp.org
    server 2.ubuntu.pool.ntp.org
    server 3.ubuntu.pool.ntp.org
    EOF

    • The shell built-in ulimit is used to activate limits before running the sync process. Core dumps are ignored, niceness is set to (near) lowest priority, lockable memory/max memory/stack size are (arbitrarily) low at 4096kb, open files are limited to 20, total cpu time cannot exceed 15s, and the process forking is limited (for later when daemon is retested). So far the ntp sync process completes within a second, but I intend to test this under heavy load.

    ulimit -c 0 -e 15 -l 4096 -m 4096 -n 20 -s 4096 -t 15 -u 3 -v 4096

    • I run the one-time sync every 10m, to avoid as much of the error detection and correction logic as possible. The 10m static interval keeps drift within 1s. I’ll check the impact on router-hosted SSH/VPN too. With this choice I wanted to create a predictable window of opportunity for attacks while keeping time synced.

    Putting those three together I run a custom scheduler command every 10m. The logger commands are present purely to see if the process indeed takes (reportedly) up to 10s to sync. Thus far the sync process is complete in less than a second.

    logger "Begin NTP sync"
    ulimit -c 0 -e 15 -l 4096 -m 4096 -n 20 -s 4096 -t 15 -u 3 -v 4096
    ntpd -q
    logger "Finished NTP sync"
    

    So I’ll leave that to “soak” for a bit. The next steps will include testing clock accuracy under heavy load (when low priority may cause problems), and testing the impact on router-hosted SSH/VPN. If all goes well, daemon mode should be possible with similar changes/limits.
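
    The two logger lines in the scheduler command above only bracket the sync with timestamps. As a small sketch, the duration could also be measured directly, assuming date +%s is available (one-second resolution). The function name timed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command, then print how many whole seconds it
# took and its exit status. Resolution is one second (date +%s).
timed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed=$((end - start))s status=$status"
    return "$status"
}

# Intended use on the router (not run here):
#   logger -t ntpsync "$(timed ntpd -q)"
```

    Since the sync reportedly finishes in under a second, this would mostly log elapsed=0s. It is more useful for spotting the occasional slow run under load.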

  4. greentomatodesu reporter

    Hello again pedro, I’m very happy to report success. I was going to wait until 90 days had passed before updating. Since you've requested an update I see no harm in reporting in a little early 😁. I started the daemon test shortly after my last comment. The router has not rebooted itself at all since making the above changes. Not even once. My previous best result under default settings was a router crash/reboot every 1-2 weeks, with a maximum 1 month. I’m now sitting at 85+ days of uptime. It’s 100% stable.

  5. greentomatodesu reporter

    With that said, I’ll detail the changes I made to track ntpd behavior during testing. In case anyone wishes to debug ntp on FreshTomato.

    • I found the option to enable NTP logging didn’t do anything. I read the manual and discovered ntp from busybox has limited logging capability. It’s enabled with the -S option. I created a script to be triggered by the -S option for tracking ntp state. I ran the following from Tools > System Commands:
    cat > /etc/ntp-state <<EOF
    #!/bin/sh
    logger -t ntpd "Action=\$@ Stratum=\$stratum Drift/ppm=\$freq_drift_ppm Interval=\$poll_interval Offset=\$offset"
    EOF

    • Made the script executable with:
    chmod a+x /etc/ntp-state
    

    • Since ntp state logging is limited, I additionally wanted to track the process resource use. I added a custom scheduler command at Administration > Scheduler, to be run every minute. With this command scheduled I can track important metrics like memory use, cpu time, and priority:
    ulimit -c 0 -e 19
    sleep=$(grep -oE "[0-9]+%" /proc/$(pidof ntpd)/status)
    utime=$(cut -d' ' -f14 /proc/$(pidof ntpd)/stat)
    stime=$(cut -d' ' -f15 /proc/$(pidof ntpd)/stat)
    pr=$(cut -d' ' -f18 /proc/$(pidof ntpd)/stat)
    vmem=$(cut -d' ' -f23 /proc/$(pidof ntpd)/stat)
    logger -t status -p user.warning "NTPD sleep/avg=$sleep utime/ticks=$utime stime/ticks=$stime priority=$pr memory/bytes=$vmem"
    

    • Another limitation of ntp state logging is (without further modification/recompile) it’s difficult to correlate state with ntp network traffic. So I added a rule to iptables to separately log ntp network traffic. I ran the following from Tools > System Commands:
    iptables -A OUTPUT -p udp --dport 123 -j LOG --log-prefix "NTPD " --log-level 4
    

    • Ntp is designed to only perform DNS queries at process start, and subsequently only if a time server stops responding. I wanted to log these DNS queries so I enabled “DHCP Client” (dnsmasq) at Administration > Logging. You should see dnsmasq running with log-async from Tools > System Commands like this:
    top -bn1
    ...
    dnsmasq -c 4096 --log-async
    

    • For some reason the above only enables logging without actually logging queries. That needs to be separately added to the dnsmasq configuration. Located at Advanced > DHCP/DNS you will find “Dnsmasq Custom configuration”. To enable the desired query logging I added the following:
    log-queries
    

    • At this point FreshTomato is instrumented to log ntp internal state, process state, network traffic, and dns queries. You’ll be able to correlate dns queries with ntp server failures, and network traffic with ntp internal state. You’ll also be logging the process resource consumption on a minute-to-minute basis. It produces a wealth of data that is useful to track anomalies in ntp behavior.
    • With all that completed, make sure remote logging is configured at Administration > Logging.
    • Make sure ntp is configured to run “Never” or “Only at startup” at Basic > Time.
    • Exit the existing ntpd from Tools > System Commands:
    kill $(pidof ntpd)
    

    • Restart ntpd, making sure to include the -S option, and with low priority from Tools > System Commands:
    ulimit -c 0 -e 15 -l 4096 -m 4096 -n 32 -s 4096 -u 2 -v 4096
    ntpd -S /etc/ntp-state
    

    • Check your logs to confirm priority and let ntp stabilize over a couple days. Your polling interval will eventually increase from the minimum of 64 seconds.
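
    The per-minute scheduler command in the list above calls pidof and cut several times. A possible alternative, sketched here, reads the stat line once and picks the same fields (14, 15, 18 and 23, as documented in proc(5)). The function name parse_stat is hypothetical. Stripping everything up to the closing parenthesis of the process name is an extra precaution that ntpd itself does not need.

```shell
#!/bin/sh
# Hypothetical helper: extract utime, stime, priority and vsize from one line
# of /proc/PID/stat. Field numbers follow proc(5): 14, 15, 18 and 23.
parse_stat() {
    line=$1
    # Drop "pid (comm) " so a process name containing spaces cannot shift fields.
    rest=${line##*") "}
    # Word-split the remainder; $1 is now field 3 (state), so field N is $((N-2)).
    set -- $rest
    echo "utime/ticks=${12} stime/ticks=${13} priority=${16} memory/bytes=${21}"
}

# Intended use on the router (not run here), reading the file only once:
#   pid=$(pidof ntpd) && logger -t status "NTPD $(parse_stat "$(cat /proc/$pid/stat)")"
```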

    Now ntp cannot de-stabilize FreshTomato. It looks confirmed. The default setting of up to 3 time servers makes ntp susceptible to tampering in ways that can cause a router crash/reboot. There appears to be no downside to the changes. There shouldn’t be any problem combining low priority with the local ntp server option at Basic > Time. The changes effectively keep ntp on the fast-path of confirming the digital signature of server responses. Ntp will rarely need to perform error correction because it’s faster to ask for consensus.
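
    For anyone reproducing the setup, the state script can be dry-run without ntpd. The sketch below builds the same message from the action argument and the environment variables used in /etc/ntp-state. The function name ntp_state_msg and the sample values are made up. To my understanding, BusyBox ntpd passes an event name such as step, stratum, periodic or unsync as the first argument.

```shell
#!/bin/sh
# Hypothetical stand-in for /etc/ntp-state: build the same log message from
# the action argument and the environment variables the script relies on.
ntp_state_msg() {
    echo "Action=$* Stratum=$stratum Drift/ppm=$freq_drift_ppm Interval=$poll_interval Offset=$offset"
}

# Dry run with made-up values, no ntpd involved:
#   stratum=2 freq_drift_ppm=-3 poll_interval=64 offset=0.001234 ntp_state_msg periodic
```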

    Thank you again for your efforts in maintaining FreshTomato!

  6. greentomatodesu reporter

    One more thing…

    If your internet (WAN) lease ever expires, and it will, dnsmasq restarts because it handles DNS and DHCP. When this happens you’ll see “lease lost, entering init state” in your logs. This event resets the iptables rules, so you’ll need to re-add the rule to continue logging ntp network traffic. Just go back to Tools > System Commands and re-run:

    iptables -A OUTPUT -p udp --dport 123 -j LOG --log-prefix "NTPD " --log-level 4
    

    That’s it. Hopefully this helps anyone who runs into the same problem.
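
    Re-running the append by hand can leave duplicate LOG rules if the original rule is still in place. A small sketch that avoids this deletes any existing copy first and then appends once. The function name ensure_rule and the IPT override are hypothetical.

```shell
#!/bin/sh
# IPT is an assumed override so the function can be exercised without root;
# on the router it would simply be iptables.
IPT=${IPT:-iptables}

# Hypothetical helper: (re-)add a rule without creating a duplicate.
# Deleting first is harmless if the rule is absent; then append it once.
ensure_rule() {
    "$IPT" -D "$@" 2>/dev/null || true
    "$IPT" -A "$@"
}

# Intended use on the router (not run here):
#   ensure_rule OUTPUT -p udp --dport 123 -j LOG --log-prefix "NTPD " --log-level 4
```

    If your build has a firewall script hook (I believe Administration > Scripts > Firewall, but check your version), placing the call there might re-apply the rule automatically after each reset.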
