dnsmasq - possible bug where an AAAA query never gets a reply

Issue #54 new
dreamcat4 NA created an issue

Hello there!

Hopefully this is the right place to start a conversation about this. It is just information about a possible bug in dnsmasq that may be useful for you to know about, or for others to find when searching / googling.

Common Problems:

  • dnsmasq may not always handle AAAA queries properly. It receives the request, then tries to contact the upstream servers to pass the AAAA request on. Under normal circumstances that works.
  • However, it does not always answer the AAAA request when there is a Static DHCP entry for that name.
  • Upstream dnsmasq does not seem to have fixed this in many years.
  • Client computers run lots of software that sends AAAA requests, and this cannot be stopped.
  • The usual symptom is a client-side timeout of 5, 10, 15 or 20 seconds.
  • And no, setting options such as single-request-reopen in the client resolver does not solve this issue! And the client software cannot be stopped from sending the AAAA request either!
  • Just to be clear: the normal `A` record is returned immediately and is fine. But the client requests both `A` and `AAAA`, and it is this second, AAAA-type DNS request that dnsmasq never replies to. So we must put `aaaa` at the end of our dig command to reproduce this issue!
  • AAAA is the DNS query type for IPv6 addresses, the IPv6 counterpart of the traditional type `A` record (it is defined in RFC 3596, so it is not actually new). Most (all?) modern Linux glibc versions send it alongside the `A` query, and it cannot be switched off in the client. And although it is meant for IPv6 addresses, the request is also sent over IPv4. So switching off IPv6 everywhere (I already tried!) has no effect and cannot help us.

This combination of problems results in unnecessary waits on the client side of up to 15 or 20 seconds, for no reason at all.

So how do you check whether this bug occurs?

I have researched this issue thoroughly, and it seems the issue may occur if:

  • You put a dot (.) in the Static DHCP name, for example `myhost.localtld`, where `myhost` is the name of the client device on the LAN
  • You do not declare the part after the dot (for example `localtld`) as a LAN domain
  • Then run the following command:

On a LAN DNS client machine:

    time dig @192.168.1.1 myhost.localtld aaaa

And also run on that client machine:

    tcpdump -nt -i eth0 udp port 53

where eth0 is the client's LAN interface, over which the DNS query is sent to the FreshTomato router.
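
If `dig` is not available on a client, a rough equivalent of `time dig @192.168.1.1 myhost.localtld aaaa` can be sketched in Python with only the standard library. This is a minimal sketch, not a replacement for dig: it hand-builds one AAAA query (record type 28), sends it over UDP to port 53 and prints how long the reply took, or reports that nothing arrived within a timeout. The function names and the 5-second timeout (picked to match the first client-side timeout mentioned above) are my own assumptions; the server and name default to the example values used here.

```python
import random
import socket
import struct
import sys
import time

QTYPE_AAAA = 28  # DNS record type code for AAAA (IPv6 address)
QCLASS_IN = 1    # Internet class


def build_query(name: str, qtype: int, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet with recursion desired (RD) set."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


def time_aaaa_query(server: str, name: str, timeout: float = 5.0):
    """Send one AAAA query over UDP; return seconds until a reply, or None."""
    packet = build_query(name, QTYPE_AAAA, random.randint(0, 0xFFFF))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(packet, (server, 53))
        try:
            sock.recvfrom(4096)  # any reply counts; the ID is not checked here
        except socket.timeout:
            return None
        return time.monotonic() - start


if __name__ == "__main__":
    server = sys.argv[1] if len(sys.argv) > 1 else "192.168.1.1"
    name = sys.argv[2] if len(sys.argv) > 2 else "myhost.localtld"
    elapsed = time_aaaa_query(server, name)
    if elapsed is None:
        print(f"no reply to AAAA {name} from {server} within the timeout")
    else:
        print(f"AAAA {name}: reply from {server} after {elapsed:.3f}s")
```

Run it as `python3 aaaa_check.py 192.168.1.1 myhost.localtld` (the script filename is hypothetical). A reply within milliseconds means dnsmasq answered; "no reply" matches the symptom described in this report.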

The fix (on my LAN) was to add the following line to the free-text box where optional extra lines are appended to `dnsmasq.conf` (in Tomato: Advanced → DNS page):

    domain=localtld

in addition to the other existing local domains (such as `.local` and `.lan`).

Then dnsmasq knows it is a local domain under its own control, and it returns a response. The problem probably has to do with the way dnsmasq parses and processes the entries generated by the Static DHCP page, such as this line in `dnsmasq.conf`:

    dhcp-host=9C:04:04:03:02:01,192.168.0.3,infinite

and this line in a separate file, `dnsmasq/hosts/hosts`:

    192.168.0.3 myhost.localtld

It is the same Static DHCP client entry, split across two different files.
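
To find Static DHCP names that might trigger this, one could compare the names in `dnsmasq/hosts/hosts` with the `domain=` lines in `dnsmasq.conf`. The Python sketch below is hypothetical (its function names are made up) and deliberately simple: it only reads plain `domain=` lines, keeps the part before the first comma, and ignores other dnsmasq options such as `local=` that can also mark a domain as local. It lists dotted hostnames whose suffix is not one of the declared domains.

```python
def declared_domains(conf_text: str) -> set[str]:
    """Collect domain names from 'domain=' lines in dnsmasq.conf text."""
    domains = set()
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("domain="):
            # Keep only the domain itself, dropping anything after a comma
            value = line[len("domain="):].split(",")[0].strip().lower().rstrip(".")
            if value and value != "#":
                domains.add(value)
    return domains


def undeclared_dotted_names(hosts_text: str, domains: set[str]) -> list[str]:
    """Return dotted hostnames from hosts-file text not under a declared domain."""
    flagged = []
    for line in hosts_text.splitlines():
        # hosts format: address, then one or more names; '#' starts a comment
        fields = line.split("#", 1)[0].split()
        for name in fields[1:]:
            name = name.lower().rstrip(".")
            if "." not in name:
                continue  # plain names without a dot are not what this checks
            if not any(name.endswith("." + d) for d in domains):
                flagged.append(name)
    return flagged


if __name__ == "__main__":
    conf = "domain=lan\n"
    hosts = "192.168.0.3 myhost.localtld\n"
    print(undeclared_dotted_names(hosts, declared_domains(conf)))
    # prints ['myhost.localtld']; with "domain=localtld" added it prints []
```

With only `domain=lan` declared, `myhost.localtld` is flagged; a name like `myhost.localtld.lan` would not be, because it sits under the declared `.lan` domain.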

The proper solution would probably be to report it upstream (if they have not fixed it yet), as it looks like buggy behaviour, even if it does technically follow the RFC specs. It looks buggy because the dot (.) separator is accepted as a valid character in the hostname field, but the resulting name is neither rejected nor recognized as belonging to a local domain.

So basically I did this expecting the name to be interpreted as a third-level domain, like `myhost.localtld.lan`, so that `localtld` is not actually a TLD (because it is not). It sits under the `.lan` TLD, which is already declared by default in `dnsmasq.conf`.

Hopefully that is a clear enough report of this buggy behaviour. If anything is unclear, try the test case / instructions above, or ignore it if you are not affected. My test system is old, but I do not have the luxury of upgrading my own router firmware just yet. This was found on AdvancedTomato v3.5-140.

Here are some relevant links with other useful information about this issue. You can also google ‘aaaa + 5 seconds 10 seconds 15 seconds’ etc. to find many rather confused reports of similar issues.

https://github.com/ovh/overthebox-openwrt/commit/2c46395abf00fd47c14909d5ac844c2e571dcc7a

https://gist.github.com/bearice/7d3dc0e63e003d752622

https://www.mail-archive.com/dnsmasq-discuss@lists.thekelleys.org.uk/msg09079.html

https://www.mail-archive.com/dnsmasq-discuss@lists.thekelleys.org.uk/msg04667.html

https://www.mail-archive.com/dnsmasq-discuss@lists.thekelleys.org.uk/msg08793.html

(Quick note on the link above: enabling the dnsmasq option `domain-needed` did not fix this issue.)

https://unix.stackexchange.com/questions/290987/resolving-hostname-takes-5-seconds

https://nil.uniza.sk/using-tcpdump-diagnostics-dns-debian/

And finally…

https://serverfault.com/a/705712

This last link might be helpful for fixing dnsmasq, since it seems to suggest that dnsmasq should reply with an empty answer section and only an SOA record in the authority section, instead of not replying at all.
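
For reference, the kind of reply that answer describes is usually called a NODATA response: RCODE 0 (NOERROR), an empty answer section, and an SOA record in the authority section, so the client learns immediately that no AAAA record exists instead of waiting for a timeout. The hypothetical Python helper below reads only the fixed 12-byte header of a captured reply and reports which case it is. It does not decode the records, so it cannot confirm that an authority record really is an SOA.

```python
import struct

RCODE_NAMES = {1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def classify_reply(packet: bytes) -> str:
    """Classify a DNS reply using only its 12-byte header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than a DNS header")
    _id, flags, _qd, ancount, nscount, _ar = struct.unpack("!HHHHHH", packet[:12])
    if not flags & 0x8000:
        raise ValueError("QR bit not set: this is a query, not a reply")
    rcode = flags & 0x000F  # response code lives in the low 4 bits
    if rcode != 0:
        return RCODE_NAMES.get(rcode, f"RCODE {rcode}")
    if ancount > 0:
        return "answer"
    # NOERROR with an empty answer section is NODATA; normally an SOA
    # record is present in the authority section (NSCOUNT > 0)
    return "NODATA with authority" if nscount > 0 else "NODATA, no authority"
```

A reply from the router that this helper reports as "NODATA with authority" would be the behaviour that answer suggests dnsmasq should produce for the AAAA query.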

If you, the new maintainers of FreshTomato, also agree, then perhaps it should be raised upstream with the author of dnsmasq on the thekelleys mailing list. The other option is to prevent the use of the . (period / dot) character in the input. However, as I explained, perhaps it is supposed to be permitted for third- or fourth-level domains etc.? I am not sure, because I do not keep up with all the details of those RFCs and standards; it is just what other experts seem to be saying. I have my workaround! So hopefully this bug report can help others who run into the same problems. Kind regards.

Comments (3)

  1. pedro

    Thanks for reporting this issue.

    Do I understand correctly that you have already reported this bug on their mailing list?

  2. dreamcat4 NA reporter

    Not yet because:

    The version of dnsmasq on my router is not the newest, so it would be nice to confirm that the bug is still there, ideally also from a different user, so that we have stronger evidence when we report it upstream. I am also not sure about the correct format (valid, permitted input string) for a local domain name in the file `dnsmasq/hosts/hosts`, i.e. whether the dot character '.' is meant to be valid there as a separator for a local third-level domain. That is a matter of input validation. I suppose that if the file is meant to follow the same conventions as the `/etc/hosts` file, then its host declaration syntax (what is valid or not) is already well defined.

    Perhaps when we report it to the mailing list, we can just email them a link back to here? It would save me having to write all these details again in two places.
