T2/T4 TCP Coalescing Issues with VLC and Samba under Windows

Issue #402 resolved
Wade Coxon created an issue

Linux Kernel 3.14 introduced TCP coalescing features in order to reduce the number of small packets transmitted in some circumstances.

This new feature appears to cause issues with playing .ts files from a Windows 7 PC running VLC via a Samba network connection. Other file formats/codecs appear to be affected, but no solid pattern was observed (high bitrate may be a factor).

This feature also causes an SSH connection to the T2 or T4 to "lag" noticeably, by 100ms or so.

Disabling TCP coalescing in the kernel appears to solve the issue.

This can be done by setting the following sysctl parameter: sysctl net.ipv4.tcp_autocorking=0

I propose that this line be added to: /etc/sysctl.conf
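
A note on persistence: the sysctl command above only changes the running kernel, and in /etc/sysctl.conf the entry is written as a bare `key = value` line without the `sysctl` prefix. A minimal Python sketch of adding or updating that entry follows. The helper name `set_sysctl_entry` is hypothetical, and it only transforms the file's text, so writing the result back and reloading it (for example with `sysctl -p`) is left to the caller.

```python
def set_sysctl_entry(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key = value` set, replacing any existing entry for key."""
    new_line = f"{key} = {value}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Comments and blank lines are passed through untouched.
        if stripped and not stripped.startswith(("#", ";")):
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                if not replaced:
                    out.append(new_line)
                    replaced = True
                continue  # drop any duplicate entries for the same key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical file contents, used only to exercise the helper.
    example = "# example sysctl.conf contents\nnet.ipv4.tcp_autocorking = 1\n"
    print(set_sysctl_entry(example, "net.ipv4.tcp_autocorking", "0"), end="")
```

Working on the text rather than the file keeps the sketch side-effect free; a real change to /etc/sysctl.conf needs root and should be backed up first.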

Comments (18)

  1. Peter Urbanec

    I'm having trouble noticing any difference with tcp_autocorking enabled or disabled.

    My setup is fairly simple. T4 <-----> dumb GigE switch <-----> Linux workstation with GigE

    I see no lag in ssh sessions.

  2. Wade Coxon reporter

    The lag is noticeable on my setup, on two different Windows 7 PCs, though my switch is managed. The slight response delay may not be apparent unless you have a T3 session running in parallel.

    If I execute something as simple as 'ls', the T4 will stop for 100ms or so then produce a result, whereas the T3 delivers the result immediately. With corking off, T4 responds as quickly as the T3.

    I noted the SSH lag not because it was a problem as such, but as another noticeable effect in addition to the actual problem at hand.

    Of course the real benefit is that VLC under Windows 7 can now play .ts files via Samba. The issue only affects the current public release 2.2.1 of VLC. The latest v3 nightly does not suffer from this issue.

    When I have a moment, I will re-test with my Windows PC connected back-to-back to the T4 to rule out all potential network interference.

  3. Wade Coxon reporter

    Wireshark log of an SSH session to the T3 (192.168.0.80) and the T4 (192.168.0.180) from a PC (192.168.0.81).

    Executed 'ls<enter>' on T4 first, then 'ls<enter>' on T3 second. Packets 30-39 are the command being sent to the T4. Packet 44, containing the response, arrives 200ms later.

    Packets 72-80 are the command being sent to the T3. Packets 81 and 83, containing the response, arrive 9ms later.

  4. Peter Urbanec

    I'm not doubting that the issue is real. I would like to be able to reproduce and observe the behaviour to understand it better. I'd also like to be able to observe the effectiveness of the proposed fix.

    From all the reading I have done on the autocorking feature and performance regressions, the fix is usually a fix for the application, which almost always results in a performance improvement in the end. I'm not all that keen on disabling autocorking as a shipping default, unless the problem is widespread and it is the only option.
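
As a rough illustration of what an application-side fix can look like (an assumption about the general pattern, not something taken from VLC, Samba or any other code discussed here): a sender that issues many tiny writes can assemble them into one buffer and hand the kernel a single write. `send_coalesced` is a hypothetical helper name, and the local socket pair only stands in for a real TCP connection.

```python
import socket


def send_coalesced(sock: socket.socket, chunks: list[bytes]) -> int:
    """Send all chunks with a single sendall() call instead of one write per chunk."""
    payload = b"".join(chunks)
    sock.sendall(payload)
    return len(payload)


if __name__ == "__main__":
    # A local socket pair stands in for a real connection in this sketch.
    left, right = socket.socketpair()
    try:
        sent = send_coalesced(left, [b"HEADER ", b"body ", b"trailer"])
        print(sent, right.recv(1024))
    finally:
        left.close()
        right.close()
```

With one write per logical message, the kernel has less reason to hold small segments back while waiting to merge them.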

  5. Peter Urbanec

    BTW: Text copies of packet captures are not very useful. A binary capture that I can load into Wireshark is more helpful.

  6. Wade Coxon reporter

    Apologies, my bad. I originally intended to capture something that would be quick to copy and paste in-line into the post just to illustrate the observed 200ms delay, but ended up attaching the file instead.

    When I am home tonight I can re-capture as binary on both T3, and T4, then I can switch off autocorking and re-test on the T4.

    I can re-run the tests from a PC running Fedora 22 while directly attached to the T4. If you think that would be valuable.

    I'm fairly confident that you'd be able to see similar packet timing if you were to perform the same test with Wireshark, as it looks like quite normal behaviour for that new autocorking feature to exhibit. It's just that it throws VLC for a loop.

    The VLC issue does seem to be specifically limited to: Windows (7; I don't have 8 to test); Samba; VLC 2.1.x (even the latest nightlies of the 2.x series suffer the same issue); and .ts files plus certain other (but not all other) media files.

    While the conditions are quite specific, I fear that the configuration would be common among end-users. It would be interesting to see if anyone else sees it now that the T4 is in the wild.

  7. Wade Coxon reporter

    New capture from my Windows PC attached.

    T3 first.

    "ls<enter>", <enter> is pressed at packet 168, response is at 170, 30ms later.

    T4 next.

    "ls<enter>", <enter> is pressed at packet 220, response is at 224, 200ms later.

    Then I turned off corking on the T4.

    "ls<enter>", <enter> is pressed at packet 917, response is at 224, 40ms later.

  8. Wade Coxon reporter

    Well ok, this is very interesting. SSH from the Fedora PC doesn't trigger corking! Perhaps that's why you aren't seeing the lag on your Linux PC.

    T3 first again:

    pressed <enter> at packet 15, response received at 18, 15ms later.

    T4 next with autocorking re-enabled:

    pressed <enter> at packet 37, response received at 40, 3ms(!!) later.

    Note that the T4's response came back in two packets rather than the single packet the Windows PC was getting with autocorking enabled.

    I re-ran the test from Windows to verify that I was indeed still getting the large return packets at 200ms.

    I should also note that the Fedora 22 PC was plugged into the same Gig switch as the Windows PC; however, it is only equipped with a 100Mbps NIC. Conditions were otherwise the same.

  9. Peter Urbanec

    Can you see if changing the following line in your /etc/samba/smb.conf improves Samba performance?

    socket options = IPTOS_LOWDELAY TCP_NODELAY
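
For context, these two Samba socket options correspond to ordinary socket-level settings: `TCP_NODELAY` disables Nagle's algorithm, and `IPTOS_LOWDELAY` sets the low-delay bit in the IP type-of-service field. The Python sketch below applies the same two settings to a plain TCP socket. It assumes a Linux-like host, the function name is hypothetical, and the value 0x10 for `IPTOS_LOWDELAY` is the usual `<netinet/ip.h>` definition written out as a constant. It is not a description of how smbd itself is implemented.

```python
import socket

IPTOS_LOWDELAY = 0x10  # value from <netinet/ip.h>, defined here as a plain constant


def open_low_delay_socket() -> socket.socket:
    """Create an unconnected TCP socket with TCP_NODELAY and IPTOS_LOWDELAY set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm so small writes are not held back waiting for ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Mark outgoing packets as low-delay in the IP TOS field.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, IPTOS_LOWDELAY)
    return sock


if __name__ == "__main__":
    s = open_low_delay_socket()
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    print(hex(s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))
    s.close()
```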
    

    As far as the ssh packet capture is concerned, the two things I can see that could make a difference are that the Linux machine uses TCP timestamps and has a small (251 byte) TCP window size. The Windows machine does not have timestamping enabled and uses a 16kB TCP window.

    However, I don't think you are reading the packet trace correctly. As far as I can see, the Windows client sends packet 220 and 1ms later receives a response from the T4 in packet 221. Then, 198ms later, your client sends packet 223 and receives a response from the T4 in packet 224 less than 1ms later.
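
One way to check this reading of a trace is to pair each server packet with the client packet immediately before it and look at the gap. The Python sketch below does that for a list of `(packet number, time in ms, sender)` records. The timestamps are approximations reconstructed from the description above (1 ms, 198 ms, under 1 ms), not values exported from the capture, and the record layout and function name are hypothetical.

```python
def response_gaps(packets):
    """Return (client_no, server_no, gap_ms) for each server packet that
    directly follows a client packet in the list."""
    gaps = []
    for prev, cur in zip(packets, packets[1:]):
        prev_no, prev_ms, prev_src = prev
        cur_no, cur_ms, cur_src = cur
        if prev_src == "client" and cur_src == "server":
            gaps.append((prev_no, cur_no, cur_ms - prev_ms))
    return gaps


# Illustrative records only: times are relative to packet 220 and are
# approximated from the prose description, not read from the capture file.
# Packet 222 is not described in the thread, so it is left out.
EXAMPLE = [
    (220, 0.0, "client"),
    (221, 1.0, "server"),
    (223, 199.0, "client"),
    (224, 199.5, "server"),
]

if __name__ == "__main__":
    for client_no, server_no, gap_ms in response_gaps(EXAMPLE):
        print(f"packet {client_no} -> {server_no}: {gap_ms} ms")
```

Read this way, both server responses follow their client packet within about a millisecond, and the long gap sits between packet 221 and the client's packet 223.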

    Could it be a poor ssh client implementation? Do you get latency issues with plain telnet?

  10. Wade Coxon reporter

    Ahh, I did have mine filtered to SSH only, and was not seeing the TCP ACKs. That is interesting timing there.

    I can try the Samba tweak later tonight and see how I go with it.

    It could possibly be the SSH client, but I am using up-to-date versions of PuTTY on both Windows and Fedora, so I would expect much of the underlying code to be the same.

    I haven't tried telnet but will do.

    I wonder why the T4 would be waiting for an ACK in one instance and not the other though, as I can see that with corking turned off, it doesn't seem to wait for that ACK.

  11. Wade Coxon reporter

    More interesting results.

    The Samba config change made no difference (I just rebooted the T4 after making the config change and before testing just to make sure).

    Telnet to the T4 has the same 200ms delay, but interestingly, the T3 also shows the same delay with telnet. I tried with both PuTTY and the Windows telnet client.

    I think that the telnet/SSH delay might be a red herring then. It just happens to go away when autocorking is turned off.

    I understand that autocorking is a desirable feature to have turned on. Should disabling autocorking end up being the only solution, would it be possible to add it as a user-configurable parameter in the GUI under Network Setup, along with WoL etc?

    Perhaps if it comes to that, unless there is a Samba fix, it is just not worth fixing at all just for the sake of a particular application and platform.

  12. Peter Urbanec

    Let's assume for the moment that the latency you are seeing with telnet/ssh is a separate issue.

    Is VLC over Samba doing something silly, such as trying to do I/O in transport stream packet sized chunks? That would look like I/O in 188 byte chunks, or thereabouts. Probably about as inefficient as you can get.
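
To put a number on why 188-byte I/O would be inefficient, the sketch below counts how many read requests are needed to move a fixed amount of data at different request sizes. The 188-byte figure is the transport stream packet size mentioned above; the 1 MiB transfer size and the 64 KiB comparison buffer are arbitrary assumptions chosen for illustration, and nothing here is measured from VLC.

```python
TS_PACKET_SIZE = 188  # bytes per MPEG transport stream packet


def requests_needed(total_bytes: int, request_size: int) -> int:
    """Number of fixed-size read requests needed to cover total_bytes."""
    # Integer ceiling division avoids floating point rounding.
    return -(-total_bytes // request_size)


if __name__ == "__main__":
    one_mib = 1024 * 1024  # assumed transfer size, for illustration only
    for size in (TS_PACKET_SIZE, 64 * 1024):
        print(f"{size:>6} byte requests: {requests_needed(one_mib, size)}")
```

Under these assumptions the 188-byte case needs several hundred times as many requests, each with its own protocol overhead and round trip.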

    I'm not a great fan of making system-wide changes to address an issue in a specific application. I would prefer to leave the kernel at its default and solve the issue closer to the problem source. At this point, I'm still unable to observe the issue, so it's hard to tell where things are going wrong. Perhaps a packet capture of VLC attempting playback via Samba, together with annotations of the symptoms, would help, preferably done in such a way that I can correlate the network I/O packets to specific symptoms.

  13. Wade Coxon reporter

    No worries, I have attached some captures here, but I am at a loss to analyse these in any detail.

    The files are:

    Play VLC ts files corking On.pcapng

    Play VLC ts file corking Off.pcapng

    Play VLC non-ts file corking On.pcapng

    I have uploaded them to Google Drive as bitbucket choked on the 34MB capture file and compression didn't reduce the file size all that much:

    https://drive.google.com/open?id=0Bw0F3epTsY5JfmwxRG5BZ29RVVhSeDQ5bDJQV2xOT0pWWmVJem0tNFllRFZZSnYwRTk4aFU

    I captured these in the following fashion:

    Browsed to the T4's shared folder in Windows explorer and highlighted the recording file for "Every Which Way but Loose".

    Started Wireshark capture.

    Waited two seconds, then double clicked on the recording to launch it in VLC (default file association).

    Attempted to play file for 10s.

    Closed VLC.

    In the attempt with autocorking turned on, VLC did not close cleanly and was left resident as a hung process, so I waited another 10s, then killed the process manually from Task Manager.

    You can see in the autocorking off example that VLC seems to settle into streaming at around packet 4677. Packets appear to have a size of 1488.

    The autocorking on example never settles into that same pattern. It looks like I closed VLC at about packet 4428, as the pattern suddenly changes at that point.

    The last example was with a non .ts file (Star Wars The Force Awakens Official Teaser.mp4) with autocorking enabled. This played ok, but I can see from the capture that the packets were a lot larger.

  14. Peter Urbanec

    It seems like you only posted the link to Play VLC non-ts file corking On.pcapng, which is probably the least interesting file of the three captures.
