Failover between WAN port (wan0) and LAN1 (wan1) is not working in the current trunk; traffic is still load balanced

Issue #299 new
TheHiman created an issue

Hi.

I have a static WAN IP setup on the regular WAN port (wan0) and an additional WAN1 configured on the LAN1 port.
The weight is set to 0 on both interfaces, which means failover. As I understand it, the first WAN (0) is then the preferred one.
I have tested the keep-alive checks at 30 and 60 seconds, switching between traceroute and ping, and load balancing is still
active. Both links are fully working, and the destinations google/1.1.1.1 are always reliably reachable on both links.

Right after a cold start the first link is used; 1-2 minutes later most of the traffic goes out on WAN1 and randomly back to WAN0,
but mostly it stays balanced.

Can someone please check why there is still some kind of load balancing in a failover-only setup?

I assume there is a problem when both interfaces are set to 0 (failover); this currently does not work on an ARM EA6700 device.
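
To make the expected behavior concrete, here is a minimal Python sketch of how the weights are understood in this report. It is only a model of the expectation described above, not the firmware's actual logic; the `Wan` class and `expected_egress` function are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Wan:
    name: str     # e.g. "wan0"
    weight: int   # 0 = failover only, >0 = takes part in load balancing
    alive: bool   # result of the latest keep-alive check


def expected_egress(wans: List[Wan]) -> List[str]:
    """Return the WAN names that are expected to carry new connections.

    Assumption (the reporter's reading, not confirmed firmware behavior):
    links with weight > 0 share the traffic; if every live link has
    weight 0, only the first live link in list order is used.
    """
    alive = [w for w in wans if w.alive]
    balanced = [w.name for w in alive if w.weight > 0]
    if balanced:
        return balanced
    # All live links are failover-only: prefer the first one.
    return [alive[0].name] if alive else []


if __name__ == "__main__":
    both_up = [Wan("wan0", 0, True), Wan("wan1", 0, True)]
    print(expected_egress(both_up))
```

With both links alive and both weights at 0, this model yields only wan0; wan1 would appear only once wan0 fails its check. What is observed on the EA6700 instead is traffic on both links.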

Comments (4)

  1. TheHiman reporter

    I forgot to mention that I have CTF enabled for higher throughput.

    Furthermore, I am not sure whether this is still true: I read in the documentation that failover should(?) ONLY work with DHCP and others, but NOT when both interfaces are statically
    configured. But here is the problem: when the router in front hands out a usable DHCP lease (instead of a static config with the same interface values), there is still no information about
    whether the front-end router itself has an outside connection, because the received IP address belongs to a “transfer net”, not to the direct outside connection.
    There will never be a “0.0.0.0” or “front-end router is down” as the criterion for a dead interface. In other words, the transfer network is always there, no matter whether it is
    received via DHCP (internal) or configured statically, including the next-hop gateway for each WAN port.

    Maybe the failover routine could be revised to be useful for a fully static configuration: ignore the static IP configured on the interface, just do source-based routing through this network, and try to ping/traceroute an outside destination via this interface?
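
The kind of check suggested above could look roughly like the following Python sketch: ping an outside target through one specific interface, so that the result reflects the upstream path and not just the always-present transfer network. It assumes a `ping` that accepts `-c`, `-W` and `-I` (as iputils and BusyBox do); the function names and the interface name `vlan2` are illustrative assumptions, and this is not how the firmware watchdog is actually implemented.

```python
import subprocess
from typing import Callable, List, Sequence


def build_ping_command(interface: str, target: str,
                       count: int = 3, timeout_s: int = 2) -> List[str]:
    """Build a ping command that is bound to one outgoing interface."""
    return ["ping", "-c", str(count), "-W", str(timeout_s),
            "-I", interface, target]


def link_is_up(interface: str, targets: Sequence[str],
               runner: Callable = subprocess.run) -> bool:
    """Treat the link as up if at least one outside target answers."""
    for target in targets:
        result = runner(build_ping_command(interface, target),
                        stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            return True
    return False


if __name__ == "__main__":
    # "vlan2" is an assumed interface name for the regular WAN port.
    print(link_is_up("vlan2", ["8.8.8.8", "1.1.1.1"]))
```

Checking more than one target avoids declaring a link dead just because a single destination is temporarily unreachable.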

  2. TheHiman reporter

    Here is further information on how my test config is set up:

    regular WAN: static 192.168.11.2/24, Gateway: 1 (VLAN 2, untagged, default)
    WAN1: uses LAN port 1, static 192.168.12.2/24, Gateway: 1 (VLAN 12, untagged, only this port is added.)

    WAN(0) = carrier-grade NAT for IPv4, IPv6 fully routed by the ISP (FTTH)
    WAN(1) = dual stack IPv4/IPv6 from the ISP (VDSL2)

    Using traceroute as the WAN check is not really reliable for WAN0/CGN, so I switched to ping for further testing.
    To rule out possible DNS issues, I use the IP addresses 8.8.8.8 and 1.1.1.1 for the checks.

    CTF is enabled because of the fast 1GE WAN connection on the WAN0 port!

    The weight is “0” on both WAN ports, which should disable any kind of load balancing, but it is still
    not really working. Balancing is always active. Most of the traffic goes straight to WAN1, the slow second ISP connection (for whatever reason…).

    The Info/Overview page shows both links as fully working. So there must be a problem with the routing table or the default route in use?
    WAN1 (VDSL2) should ONLY be used when the IPv4/CGN connection is really broken or the ISP has routing issues, but 2-3 minutes
    after a cold boot the traffic goes out only via WAN1. The WAN0 link is still working and still being checked, but it is no longer used as
    the primary default gateway.
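
One way to narrow down the routing-table question is to look at the default route entries and count the next hops. The Python sketch below parses text in the format printed by `ip route show`; the sample output, the gateway addresses (192.168.11.1 and 192.168.12.1) and the interface names (vlan2, vlan12) are assumptions based on the config above, not output captured from the router. It also assumes the balancing would be visible as a multipath default route, which may not be the case if the firmware balances through policy routing or firewall marks.

```python
import re
from typing import List, Tuple

# Matches "via <gateway> dev <interface>" on a route line.
_HOP = re.compile(r"via\s+(\S+)\s+dev\s+(\S+)")


def default_next_hops(route_output: str) -> List[Tuple[str, str]]:
    """Return (gateway, interface) pairs of all default-route next hops."""
    hops = []
    in_default = False
    for line in route_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("default"):
            in_default = True
        elif not stripped.startswith("nexthop"):
            # Any other route ends the current default entry.
            in_default = False
        if in_default:
            match = _HOP.search(stripped)
            if match:
                hops.append((match.group(1), match.group(2)))
    return hops


# Hypothetical sample: a multipath default route across both WANs.
SAMPLE = """\
default
    nexthop via 192.168.11.1 dev vlan2 weight 1
    nexthop via 192.168.12.1 dev vlan12 weight 1
192.168.11.0/24 dev vlan2 proto kernel scope link src 192.168.11.2
"""

if __name__ == "__main__":
    hops = default_next_hops(SAMPLE)
    print(len(hops), "default next hop(s):", hops)
```

In a pure failover setup one would expect a single default next hop at any time; two next hops with both links up would point toward balancing in the routing table itself.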

  3. TheHiman reporter

    Experimenting with different check intervals such as 30/60/120 seconds doesn't change anything.
    Manually pinging 8.8.8.8 and 1.1.1.1 from the shell to test both ports clearly shows that
    both WAN ports are working; manual testing reveals no issues.

    Possibly the watchdog has issues when one of the links is behind CGN instead of real IPv4 at the front-end router?
    Traceroute on a CGN connection does work when no DNS names are resolved, but it is
    often blocked at the first hops on the ISP side, so the ping method is much more reliable in such a mixed scenario.
    When traceroute is repeated frequently here, some of the first ISP routers usually delay or block the entire trace,
    and a false positive on reaching the destination can happen. Ping, by contrast, always works.
