dnsmasq restarting every 150 seconds on Starlink Internet (every time WAN lease renews)

Issue #287 resolved
Techie007 created an issue

Apparently, FreshTomato has had a behavior for an extended period of time where dnsmasq exits (due to SIGTERM) and restarts each time udhcpc obtains a lease. For most ISPs issuing 24-hour leases, this has never been much of a problem. However, Starlink issues short, 300 second (5 minute) leases, which renew at the 150 second (2.5 minute) mark, resulting in dnsmasq being reset every 150 seconds. This appears to be causing two problems:

  1. dnsmasq’s effectiveness as a DNS cache server is greatly reduced, as the cache is being constantly cleared. I have not found a viable workaround.
  2. These restarts appear to be causing DNS rebind attack notices from dnsmasq in relation to Windows online connectivity probes of dns.msftncsi.com. Whatever is happening as part of the rebind attack prevention is causing all PCs on the network to report no Internet connectivity in their network icon, despite Internet access working just fine on them. I was able to workaround this issue by disabling DNS rebind protection.

Perhaps a solution would be to differentiate between udhcpc obtaining a new lease (where we would continue to restart dnsmasq), versus udhcpc simply succeeding in renewing an existing lease (where we would avoid restarting dnsmasq). Perhaps another solution would be to track the gateway IP, and instead of restarting dnsmasq every time a new lease is negotiated, only restart it when the gateway IP changes.

May 10 15:54:53 Router daemon.info udhcpc[23413]: sending renew to server 100.64.0.1
May 10 15:54:53 Router daemon.info udhcpc[23413]: lease of 100.75.168.186 obtained from 100.64.0.1, lease time 300
May 10 15:54:53 Router daemon.info dnsmasq[15897]: exiting on receipt of SIGTERM
May 10 15:54:53 Router daemon.info dnsmasq[18618]: started, version 2.89 cachesize 8192
May 10 15:54:53 Router daemon.info dnsmasq[18618]: compile time options: IPv6 GNU-getopt no-RTC no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset Tomato-helper auth cryptohash DNSSEC no-ID loop-detect inotify dumpfile
May 10 15:54:53 Router daemon.info dnsmasq[18618]: asynchronous logging enabled, queue limit is 5 messages
May 10 15:54:53 Router daemon.info dnsmasq-dhcp[18618]: DHCP, IP range 10.0.0.200 -- 10.0.0.229, lease time 7d
May 10 15:54:53 Router daemon.info dnsmasq[18618]: using only locally-known addresses for _dns.resolver.arpa
May 10 15:54:53 Router daemon.info dnsmasq[18618]: using only locally-known addresses for use-application-dns.net
May 10 15:54:53 Router daemon.info dnsmasq[18618]: reading /etc/resolv.dnsmasq
May 10 15:54:53 Router daemon.info dnsmasq[18618]: using nameserver 208.67.222.222#53
May 10 15:54:53 Router daemon.info dnsmasq[18618]: using nameserver 208.67.220.220#53
May 10 15:54:53 Router daemon.info dnsmasq[18618]: using only locally-known addresses for _dns.resolver.arpa
May 10 15:54:53 Router daemon.info dnsmasq[18618]: using only locally-known addresses for use-application-dns.net
May 10 15:54:53 Router daemon.info dnsmasq[18618]: read /etc/hosts - 6 names
May 10 15:54:53 Router daemon.info dnsmasq[18618]: read /etc/hosts.dnsmasq - 46 names

May 10 15:57:23 Router daemon.info udhcpc[23413]: sending renew to server 100.64.0.1
May 10 15:57:23 Router daemon.info udhcpc[23413]: lease of 100.75.168.186 obtained from 100.64.0.1, lease time 300
May 10 15:57:23 Router daemon.info dnsmasq[18618]: exiting on receipt of SIGTERM
May 10 15:57:24 Router daemon.info dnsmasq[22888]: started, version 2.89 cachesize 8192
May 10 15:57:24 Router daemon.info dnsmasq[22888]: compile time options: IPv6 GNU-getopt no-RTC no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset Tomato-helper auth cryptohash DNSSEC no-ID loop-detect inotify dumpfile
May 10 15:57:24 Router daemon.info dnsmasq[22888]: asynchronous logging enabled, queue limit is 5 messages
May 10 15:57:24 Router daemon.info dnsmasq-dhcp[22888]: DHCP, IP range 10.0.0.200 -- 10.0.0.229, lease time 7d
May 10 15:57:24 Router daemon.info dnsmasq[22888]: using only locally-known addresses for _dns.resolver.arpa
May 10 15:57:24 Router daemon.info dnsmasq[22888]: using only locally-known addresses for use-application-dns.net
May 10 15:57:24 Router daemon.info dnsmasq[22888]: reading /etc/resolv.dnsmasq
May 10 15:57:24 Router daemon.info dnsmasq[22888]: using nameserver 208.67.222.222#53
May 10 15:57:24 Router daemon.info dnsmasq[22888]: using nameserver 208.67.220.220#53
May 10 15:57:24 Router daemon.info dnsmasq[22888]: using only locally-known addresses for _dns.resolver.arpa
May 10 15:57:24 Router daemon.info dnsmasq[22888]: using only locally-known addresses for use-application-dns.net
May 10 15:57:24 Router daemon.info dnsmasq[22888]: read /etc/hosts - 6 names
May 10 15:57:24 Router daemon.info dnsmasq[22888]: read /etc/hosts.dnsmasq - 46 names

May 10 15:59:54 Router daemon.info udhcpc[23413]: sending renew to server 100.64.0.1
May 10 15:59:54 Router daemon.info udhcpc[23413]: lease of 100.75.168.186 obtained from 100.64.0.1, lease time 300
May 10 15:59:54 Router daemon.info dnsmasq[22888]: exiting on receipt of SIGTERM
May 10 15:59:54 Router daemon.info dnsmasq[27115]: started, version 2.89 cachesize 8192
May 10 15:59:54 Router daemon.info dnsmasq[27115]: compile time options: IPv6 GNU-getopt no-RTC no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset Tomato-helper auth cryptohash DNSSEC no-ID loop-detect inotify dumpfile
May 10 15:59:54 Router daemon.info dnsmasq[27115]: asynchronous logging enabled, queue limit is 5 messages
May 10 15:59:54 Router daemon.info dnsmasq-dhcp[27115]: DHCP, IP range 10.0.0.200 -- 10.0.0.229, lease time 7d
May 10 15:59:54 Router daemon.info dnsmasq[27115]: using only locally-known addresses for _dns.resolver.arpa
May 10 15:59:54 Router daemon.info dnsmasq[27115]: using only locally-known addresses for use-application-dns.net
May 10 15:59:54 Router daemon.info dnsmasq[27115]: reading /etc/resolv.dnsmasq
May 10 15:59:54 Router daemon.info dnsmasq[27115]: using nameserver 208.67.222.222#53
May 10 15:59:54 Router daemon.info dnsmasq[27115]: using nameserver 208.67.220.220#53
May 10 15:59:54 Router daemon.info dnsmasq[27115]: using only locally-known addresses for _dns.resolver.arpa
May 10 15:59:54 Router daemon.info dnsmasq[27115]: using only locally-known addresses for use-application-dns.net
May 10 15:59:54 Router daemon.info dnsmasq[27115]: read /etc/hosts - 6 names
May 10 15:59:54 Router daemon.info dnsmasq[27115]: read /etc/hosts.dnsmasq - 46 names
May 10 16:00:00 Router syslog.info root: -- MARK --
May 10 16:02:24 Router daemon.info udhcpc[23413]: sending renew to server 100.64.0.1
May 10 16:02:24 Router daemon.info udhcpc[23413]: lease of 100.75.168.186 obtained from 100.64.0.1, lease time 300
May 10 16:02:24 Router daemon.info dnsmasq[27115]: exiting on receipt of SIGTERM
May 10 16:02:24 Router daemon.info dnsmasq[31340]: started, version 2.89 cachesize 8192
May 10 16:02:24 Router daemon.info dnsmasq[31340]: compile time options: IPv6 GNU-getopt no-RTC no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset Tomato-helper auth cryptohash DNSSEC no-ID loop-detect inotify dumpfile
May 10 16:02:24 Router daemon.info dnsmasq[31340]: asynchronous logging enabled, queue limit is 5 messages
May 10 16:02:24 Router daemon.info dnsmasq-dhcp[31340]: DHCP, IP range 10.0.0.200 -- 10.0.0.229, lease time 7d
May 10 16:02:24 Router daemon.info dnsmasq[31340]: using only locally-known addresses for _dns.resolver.arpa
May 10 16:02:24 Router daemon.info dnsmasq[31340]: using only locally-known addresses for use-application-dns.net
May 10 16:02:24 Router daemon.info dnsmasq[31340]: reading /etc/resolv.dnsmasq
May 10 16:02:24 Router daemon.info dnsmasq[31340]: using nameserver 208.67.222.222#53
May 10 16:02:24 Router daemon.info dnsmasq[31340]: using nameserver 208.67.220.220#53
May 10 16:02:24 Router daemon.info dnsmasq[31340]: using only locally-known addresses for _dns.resolver.arpa
May 10 16:02:24 Router daemon.info dnsmasq[31340]: using only locally-known addresses for use-application-dns.net
May 10 16:02:24 Router daemon.info dnsmasq[31340]: read /etc/hosts - 6 names
May 10 16:02:24 Router daemon.info dnsmasq[31340]: read /etc/hosts.dnsmasq - 46 names

May 10 16:04:54 Router daemon.info udhcpc[23413]: sending renew to server 100.64.0.1
May 10 16:04:54 Router daemon.info udhcpc[23413]: lease of 100.75.168.186 obtained from 100.64.0.1, lease time 300
May 10 16:04:54 Router daemon.info dnsmasq[31340]: exiting on receipt of SIGTERM
May 10 16:04:54 Router daemon.info dnsmasq[3064]: started, version 2.89 cachesize 8192
May 10 16:04:54 Router daemon.info dnsmasq[3064]: compile time options: IPv6 GNU-getopt no-RTC no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset Tomato-helper auth cryptohash DNSSEC no-ID loop-detect inotify dumpfile
May 10 16:04:54 Router daemon.info dnsmasq[3064]: asynchronous logging enabled, queue limit is 5 messages
May 10 16:04:54 Router daemon.info dnsmasq-dhcp[3064]: DHCP, IP range 10.0.0.200 -- 10.0.0.229, lease time 7d
May 10 16:04:54 Router daemon.info dnsmasq[3064]: using only locally-known addresses for _dns.resolver.arpa
May 10 16:04:54 Router daemon.info dnsmasq[3064]: using only locally-known addresses for use-application-dns.net
May 10 16:04:54 Router daemon.info dnsmasq[3064]: reading /etc/resolv.dnsmasq
May 10 16:04:54 Router daemon.info dnsmasq[3064]: using nameserver 208.67.222.222#53
May 10 16:04:54 Router daemon.info dnsmasq[3064]: using nameserver 208.67.220.220#53
May 10 16:04:54 Router daemon.info dnsmasq[3064]: using only locally-known addresses for _dns.resolver.arpa
May 10 16:04:54 Router daemon.info dnsmasq[3064]: using only locally-known addresses for use-application-dns.net
May 10 16:04:54 Router daemon.info dnsmasq[3064]: read /etc/hosts - 6 names
May 10 16:04:54 Router daemon.info dnsmasq[3064]: read /etc/hosts.dnsmasq - 46 names

In another attempt to workaround this issue, I have tried using these options in the DHCPC Options box: "-l 86400" and "-x lease:86400". The former took it offline, and the latter didn't seem to change anything. Neither resulted in the desired behavior of increasing the lease time to 24 hours. There is a thread on the forum about this issue: DNSmasq restarting every 2.5 mins (150 seconds) - just started happening overnight

Comments (8)

  1. Techie007 reporter

    Oh sorry, I created this issue two months ago and it just got published a couple days ago. In the meantime, Sean helped me enough to figure out what was happening (before becoming unhelpful), and I have since fixed the bug and compiled it myself. I am in the process of learning how to properly do a commit; and since the issue causing this bug occurs in another function too, writing a bugfix for that also and testing it (which I can only do on a certain customer’s site).

    I am curious to know what you came up with though…

  2. Techie007 reporter

    tmp2—yep, you found the bug and fixed it. When I pointed the memory sharing issue out in private messaging (so wish I had kept the conversation public in hindsight, so that more experienced developers like you and Pedro could’ve seen our discussion and chimed in!), Sean was adamant that this was impossible because strlcat_r “is a reentrant function” and tmp “is merely a buffer, not the storing variable for the data”, and staunchly refused to compile a build containing my “blatantly incorrect fix”. I believe the original code may have worked at one time, but stopped working due to a change/update in the compiler: I discovered that trying to duplicate this issue in a debugger is unfruitful because the debugger’s compiler evaluates the parameters to the nvram_set function right to left (allowing the code to work) instead of left to right as the compiler for the router does (which causes the “name to set” in tmp being overwritten with “name to read” by the time it’s needed).

    I have three hesitations with changing the restart logic itself to resolve this issue:

    1. The underlying bug needs to be fully understood and fixed, not have a bandaid slapped over it (e.g. by hiding the visible dnsmasq restarts, while leaving the route change detection broken and constantly refreshing). This bug exists in the first place because some developer didn’t comprehend exactly how the functions interacted. What other code did they write? By understanding this, we may find other bugs to fix. For example, Sean just wanted to comment out line 307. You correctly identified and resolved the underlying issue, so this concern is alleviated with your patch.
    2. There may be edge cases where a modem or captive portal redirects DNS requests to itself until Internet access becomes available/is granted. Obviously, these should not be cached; but if for some reason they are (failure on the modem/captive portal to specify a low TTL, or cache time/negative caching overridden by dnsmasq custom config), I think it is probably a good idea to take every available hint that something related to the environment has changed to dump the DNS cache. Simply fixing the (broken) route detection accomplishes this; dnsmasq is restarted once when Starlink establishes a connection, and that’s it; the constant 2.5 minute restarts no longer occur, and route change detection never triggers unless I manually tweak it by unsetting wan_routes1 (to which it responds by running once and restarting dnsmasq upon the next lease renewal).
    3. Is it possible that something with IPv6 could change (requiring a dnsmasq restart), and we’re relying on an IPv4 route change as a “catch all” to trigger this restart?

  3. M_ars

    using tmp2 here is the safer way (experience yes… 🙃 - i had a similar case/issue somewhere else in the past).

    adjusted also ppp.c file now, to make sure.

    IPv6 is a different and/or parallel path and could restart dnsmasq any time if the IPv6/Prefix changes for example. If only IPv6 DNS changed, we do not restart dnsmasq and use function “dns_to_resolv()”.

    → i applied this logic now also to IPv4

    removing

    changed |= routes_changed;
    

    If IP/Gateway/Netmask changes are detected → we will call function bound() (again)

    My suggestion would be to use the new “changed_dns”. If you/we find problems because of this we know what to change

    I think we can close this issue 🍻

    Looking forward to see some more fixes/improvements from your side, you are a developer in my opinion 😎 👍

  4. Log in to comment