SBS ElementTree error

Issue #125 resolved
Trade Surplus created an issue

I’ve been getting the following error recently with SBS downloads:

  File "~/.local/bin/webdl/./grabber.py", line 62, in <module>
    main()
  File "~/.local/bin/webdl/./grabber.py", line 55, in main
    if not n.download():
  File "~/.local/bin/webdl/sbs.py", line 38, in download
    hls_url = self.get_hls_url(release_url)
  File "~/.local/bin/webdl/sbs.py", line 47, in get_hls_url
    video = doc.xpath("//smil:video", namespaces=NS)
  File "src/lxml/etree.pyx", line 2309, in lxml.etree._ElementTree.xpath
  File "src/lxml/etree.pyx", line 1887, in lxml.etree._ElementTree._assertHasRoot
AssertionError: ElementTree not initialized, missing root

Is anyone else experiencing this?

Comments (34)

  1. Fred

    SBS changed something, so we’re now waiting for someone with Python skills to publish a patch. Don’t think the original developer is around any more.

  2. Jim Thross

    (+1) The crash seems to be caused by some downloads only (i.e. some vids will still download OK, others won’t). Sorry, I’m not a Python programmer, but here is a start. The lxml library (see https://lxml.de/) raises the error at line 1887 of etree.pyx. Reading the etree.pyx source reveals: “This can happen if ElementTree() is called without any argument and the caller 'forgets' to call parse() afterwards, so this is a bug in the caller program.”

  3. Trade Surplus reporter

    That’s interesting. Would you give me an example of a video you were able to download successfully, please? Initially I thought the same thing but I wasn’t able to replicate it.

  4. Jim Thross

    Sure Trade Surplus. Just tested and working: SBS/Genre/Factual/ Cryptoland S1 Ep1 - Where Are The Earliest Bitcoin Investors Now? (Seemed appropriate for your alias ;-)

    A broken example, just tested: SBS/Genre/Factual/ Chernobyl: The New Evidence S1 Ep1 - Situation Critical

  5. Trade Surplus reporter

    Excellent. Thanks. The sbs.py file uses the SBS API to download JSON files for each video. In those JSON files are links that are supposed to provide data in SMIL format that can be navigated to find the source videos. In the Cryptoland example the SMIL data is there but in the Chernobyl example a file in a completely different format is provided and the sbs.py file can’t do anything with it. Why those links provide completely different data, I have no idea.
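
    A minimal sketch of the check described above: does a release link return SMIL data or something else? The namespace is an assumption and should be compared with the NS constant in sbs.py, the sample payloads are made up, and the function name is hypothetical. It uses the standard library's xml.etree instead of lxml so that it runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; check it against the NS constant in sbs.py.
SMIL_NS = {"smil": "http://www.w3.org/2005/SMIL21/Language"}


def smil_video_sources(body):
    """Return the src of every <smil:video> in body, or [] if body is not SMIL."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML at all (for example JSON or an HTML error page).
        return []
    videos = root.iterfind(".//smil:video", SMIL_NS)
    return [v.get("src") for v in videos if v.get("src")]


# Made-up sample payloads, for illustration only.
GOOD = (
    '<smil xmlns="http://www.w3.org/2005/SMIL21/Language"><body><seq>'
    '<video src="https://example.invalid/master.m3u8"/>'
    "</seq></body></smil>"
)
BAD = '{"error": "not smil"}'

print(smil_video_sources(GOOD))
print(smil_video_sources(BAD))
```

    An empty list means "this link did not give us SMIL", which is easier to act on than the AssertionError in the traceback at the top of this issue.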

  6. Fred

    From what I’ve seen, existing shows before a certain point in time still use the old format. It’s the new shows which break. Care to share how you identified and downloaded the json files?

  7. Trade Surplus reporter

    Sure, but first I may have a workaround. Each JSON file contains a link (in SMIL format) to a video source that seems to be designed for mobiles. In the sbs.py file, if you change the line

    release_url = player_params["releaseUrls"]["html"]

    to

    release_url = player_params["releaseUrls"]["htmlandroid"]

    the resulting download has the same resolution (1280x720) but the video bitrate is lower, so the file size is smaller. I’d appreciate it if people would test whether this works for their busted shows.

  8. Fred

    The workaround is usable, but is of course of lesser quality.

    Of interest is that the video bitrate is exactly the same as that for yt-dlp (which I’ve been using in the interim) at about 1500 kbps. In fact the saved files are exactly the same size as those from yt-dlp.

    This compares to the 2000 kbps (approx) for previous downloads for the same show (Letters and Numbers, which is reasonably uniform in hosted content and so is a good candidate for comparing before and after the change).

    Note that to my mind it simply means that yt-dlp is also prevented from downloading at the best available bitrate, not that yt-dlp is a viable alternative to your workaround.

    (Edit: Windows says the new downloads are about 800 kbps/1500 kbps respectively, not 1500 and 2000 as VLC previously indicated).
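
    Since different players report different numbers, one tool-independent cross-check is to work out the average bitrate from file size and running time. A small sketch with made-up figures (the function name is hypothetical). Note that this gives the total bitrate, so it includes audio and container overhead and will sit a little above a video-only figure.

```python
def average_kbps(size_bytes, duration_seconds):
    """Average total bitrate (video, audio and container overhead) in kbit/s."""
    return size_bytes * 8 / duration_seconds / 1000


# Made-up figures: a 30-minute episode saved at two different file sizes.
print(round(average_kbps(337_500_000, 30 * 60)))  # 1500
print(round(average_kbps(450_000_000, 30 * 60)))  # 2000
```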

  9. Jim Thross

    Working for me too - thanks Trade Surplus. Well done.

    An explanation of how you captured the JSON / SMIL data would indeed be appreciated.

  10. foobar

    I’m getting identical (approx. 2000 kbps) versions from player_params → releaseUrls → htmlandroid (via webdl) and player_params → releaseUrls → html (via yt-dlp) for Chernobyl. By identical I mean same byte size, same codecs, same frame rates and bitrates of about 2000 kbps per VLC and ffmpeg; there is only a trivial difference in header info. For Crypto I tried html and htmlandroid with both webdl and yt-dlp: same story, all 4 are about 2000 kbps. I checked some downloads from before the problem and they are also about 2000 kbps.

    I am not a Python native, but this could perhaps be fixed with a try/except that tries html first and falls back to htmlandroid. Alternatively, just use yt-dlp to handle the downloading, or take a closer look at the rest of player_params for clues. I don’t have the spare time to do it myself, and @Trade Surplus’s solution has no downside for me.

    ffmpeg 5.1.2 python 3.11 and streamlink 1.7.0 yt-dlp 2023.03.04
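
    A rough sketch of the try/except idea, not the actual sbs.py code. The function name first_working_url and the stand-in fake_resolve are hypothetical; in webdl the resolve argument would be something like self.get_hls_url. The sample data is made up and only mirrors the player_params["releaseUrls"] lookup quoted earlier in this thread.

```python
def first_working_url(player_params, resolve, order=("html", "htmlandroid")):
    """Try each releaseUrls entry in order; return (key, result) for the first that works.

    resolve is whatever turns a release URL into a stream URL. It is passed in
    so that this sketch stays self-contained.
    """
    release_urls = player_params.get("releaseUrls", {})
    for key in order:
        url = release_urls.get(key)
        if not url:
            continue
        try:
            result = resolve(url)
        except (AssertionError, ValueError, OSError):
            # AssertionError is what lxml raised in the traceback in this issue.
            continue
        if result:
            return key, result
    return None, None


# Stand-in for get_hls_url: pretends the "html" link returns non-SMIL data.
def fake_resolve(url):
    if url.endswith("/html"):
        raise AssertionError("ElementTree not initialized, missing root")
    return url + "/master.m3u8"


sample = {
    "releaseUrls": {
        "html": "https://example.invalid/html",
        "htmlandroid": "https://example.invalid/htmlandroid",
    }
}
print(first_working_url(sample, fake_resolve))
```

    Keeping "html" first means the higher-bitrate source is still preferred whenever it works, and the fallback only kicks in for the shows that break.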

  11. Trade Surplus reporter

    There's a lot of discussion about the reduction of SBS video quality on Whirlpool. As mentioned by Fred, users of apps like yt-dlp are also experiencing the same issue. It seems that SBS has changed their streaming method to two files that get combined after downloading rather than one file that contains everything. This may be a permanent state of affairs and it may be that higher bitrate versions of new shows are no longer available. I think I'll run a test every week or so to check whether they revert to their previous streaming method, but if they don't, the change I suggested above will have to become permanent.

    As far as JSON/SMIL is concerned, everything you need is in the sbs.py file. You just have to step through the code manually, the same way the computer does. Long story short, the JSON for all available shows is here:

    https://www.sbs.com.au/api/video_feed/f/Bgtm9B/sbs-section-programs/ (built from constants defined in sbs.py)

    The JSON for Cryptoland is here:

    https://www.sbs.com.au/api/video_pdkvars/id/2102748227651?form=json (built from constants defined in sbs.py, plus the id, which I got by playing the video on the SBS On Demand site)

    The JSON for Chernobyl is here:

    https://www.sbs.com.au/api/video_pdkvars/id/2019334723997?form=json

    In a browser, scroll down the JSON till you get to a paragraph called "releaseUrls" and you will find links to the source videos. The "htmlandroid" option is now the only one that works reliably. Previously, sbs.py used the "html" option.
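
    The same lookup can be scripted. This is only a sketch: the URL pattern is copied from the links above, the helper names are hypothetical, and because the exact nesting of the real JSON is not shown here, it searches for "releaseUrls" recursively rather than assuming a path. The sample document is made up and no network request is made.

```python
import json

# URL pattern copied from the links above.
PDKVARS_URL = "https://www.sbs.com.au/api/video_pdkvars/id/{video_id}?form=json"


def pdkvars_url(video_id):
    """Build the per-video JSON URL for a given SBS video id."""
    return PDKVARS_URL.format(video_id=video_id)


def find_key(node, wanted):
    """Depth-first search of parsed JSON for the first value stored under wanted."""
    if isinstance(node, dict):
        if wanted in node:
            return node[wanted]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, wanted)
        if found is not None:
            return found
    return None


# Made-up document; the real nesting may differ, hence the recursive search.
sample = json.loads(
    '{"player": {"releaseUrls": {"html": "https://example.invalid/a",'
    ' "htmlandroid": "https://example.invalid/b"}}}'
)

print(pdkvars_url("2102748227651"))
print(find_key(sample, "releaseUrls")["htmlandroid"])
```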

  12. Fred

    Looks like they further broke SBS downloads the last couple of days.

    Now getting 404 errors with yt-dlp, and, while the video can still be downloaded using webdl (with the htmlandroid workaround), the side-project wrapper webdl_srt (https://bitbucket.org/temper8ur/webdl_srt/src/master/) can now only pull truncated caption files.

    They really don’t like downloaders, do they?

  13. Trade Surplus reporter

    My fear is that they’ll change their streaming tech to something like what Seven or Nine use and which, for whatever reason, this app hasn’t been programmed to download from.

  14. Trade Surplus reporter

    I tested sbs.py with the “html” option and got a different result. Instead of the above ElementTree error, it now tries to resolve a URL and fails. Tried manually, the URL returns an Access Denied “You don't have permission to access … on this server.” page. Not sure whether this is progress or not.

  15. Trade Surplus reporter

    Since the issues encountered by webdl clearly aren’t the result of a problem with the code but rather at the SBS end, I’m going to resolve this issue.

  16. Larry

    Hi Trade Surplus,

    Would you be able to reopen this? I understand what you are saying, but if webdl can't download SBS videos, it is not working properly and we need to get it updated. I'm hoping to have a bit of free time to go through youtube-dl and work out what needs to be done to get the code working again and will post here then.

  17. Larry

    (@delx - If it is you rather than Atlassian blocking my access, could you reinstate my access?)

    And a second round - I've got a hack fix for this, but Atlassian or delx is preventing me opening issues, so the fix I've done in sbs.py is:

    Add an API v3 constant at line 12:

    # Quick and dirty fix for SBS v3 api
    API_V3_URL = BASE + "/api/v3/video_smil?id="
    # end fix 
    

    And replace line 35 (hls_url = self.get_hls_url(release_url)) with the following block:

            # Quick and dirty fix for SBS v3 api
            # Try v3 api first
            hls_url = self.get_hls_url(API_V3_URL + self.video_id)        
            if not hls_url:
                # Fall back to v2 api. I suspect this is already redundant.
                hls_url = self.get_hls_url(release_url)
            # end fix 
    

    This seems to be working properly now.

  18. Fred

    Thanks for the patch, Larry. Unfortunately it looks like SBS may have permanently decreased maximum bitrates, meaning crap video quality on the desktop. They’re probably targeting mobile devices instead of PCs now.

    So far my test downloads using your patch resulted in exactly the same file size as those done using the “htmlandroid” workaround suggested by Trade Surplus. Adding insult to injury, the webdl_fu add-on (for caption downloads) hasn’t been patched to fix the occasional truncated srt, and probably won’t ever be. The webdl_fu page is no longer accessible.

  19. Larry

    Comments on whirlpool suggest that bitrate is in flux at the moment. Hopefully they'll revert to higher rates in the long term.

    I would not rely on the htmlandroid workaround working in the long term - I suspect it will last only as long as it takes SBS to roll out a new client using the new API (I did some checks and it looks as though the new API works for both new and old videos).

  20. Fred

    Thanks for the update, Larry. I’m staying with your sbs.py patch, and have also hacked webdl_fu to wrap around it, and so far autograbber-cron.sh is working (as well as can be expected given the SBS-imposed limitations).

  21. Trade Surplus reporter
    • changed status to open

    I've reopened this at Larry's request so that he can propose a long-term fix to the problem.

  22. Trade Surplus reporter

    My ability to create issues has been removed as well. Just to let you know, this account was subject to an issue-spamming incident a few weeks ago. A dozen or more bogus issues were raised which clearly have now been deleted. Perhaps the solution to that problem was to remove the ability for anyone to raise new issues.

  23. Larry

    > this account was subject to an issue-spamming incident a few weeks ago.

    I figured this was the case - if delx is reading comments, I guess they'll respond at some point.

  24. Trade Surplus reporter

    Thanks for the code fix Larry. Do you think it’s worth doing a pull request for it? I’m happy to do that if you think so.

  25. Larry

    > Do you think it’s worth doing a pull request for it?

    My fix is a quick hack based on work done on the yt-dlp codebase. It may or may not have long-term legs^. I'd suggest we give delx time to come back to this, and if it's still idle in a few months' time, we can fork a longer-term version maintained by active users.

    ^ In particular, given the other things happening on the front end, I suspect SBS may be updating their catalogue format - which would not be a bad thing (hopefully smaller and faster), but would require rework of other parts of sbs.py.

  26. Trade Surplus reporter

    I’ve only used yt-dlp to download YouTube videos. Is there a reason to stick with webdl if yt-dlp can handle SBS as well?

  27. Larry

    > Is there a reason to stick with webdl if yt-dlp can handle SBS as well?

    You'd need to wrap a batch function around it - I like webdl's catalogue function. And it also does ABC as well!

    (I've been running it largely fire and forget for a lot of years now - it picks up whatever we aren't interested in and I postprocess for later viewing in plex).

  28. Fred

    Some more reasons:

    Subtitle download support (if using webdl_fu)
    Download history is saved (autograbber)
    Save to predetermined folders (autograbber)
    Range downloading
    Wildcard support
    Exclusion support

    And probably some other features I can’t remember.

  29. Trade Surplus reporter

    It’s good to see there’s still interest in it. It’s a great piece of software that I use every day. I’ll set myself a reminder to check back in July. Hopefully we’ll have a better idea of what SBS is doing with their streaming by then as well.

  30. delx repo owner

    Sorry for the slow reply, life has been busy lately. Thanks for this fix, I’ve committed it :-)

  31. Fred

    Just got the above message as an email, and this jogged my memory.

    Regarding the webdl_fu truncated caption issue: Truncated subtitle files occur because the _fu code is unable to read some characters, notably pound symbols (£). It will just drop the ball as soon as it encounters such symbols. For those interested, my workaround is now to use yt-dlp to save failing captions in DFXP format, then manually convert them to SRT. Below is an example using an Alone Australia URL, identified as per the notes above:

    yt-dlp --ignore-config -v --write-sub --sub-lang en --sub-format dfxp --skip-download "https://www.sbs.com.au/api/v3/video_smil?context=tv&id=2204846147868"
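
    The manual DFXP to SRT step can be scripted too. This is a minimal sketch, not a full converter: all function names are hypothetical, the sample subtitle is made up, and it assumes begin/end values are clock times of the form HH:MM:SS or HH:MM:SS.fraction. Other time expressions (frame counts, offsets such as "1.5s") are rejected rather than guessed at. It matches tags by local name, so the XML namespace the file uses should not matter.

```python
import re
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so different DFXP/TTML namespaces all match."""
    return tag.rsplit("}", 1)[-1]


def to_srt_time(clock):
    """Convert HH:MM:SS(.fraction) to SRT's HH:MM:SS,mmm."""
    match = re.fullmatch(r"(\d+):(\d{2}):(\d{2})(?:\.(\d+))?", clock.strip())
    if not match:
        raise ValueError(f"unsupported time expression: {clock!r}")
    hours, minutes, seconds, fraction = match.groups()
    millis = int(((fraction or "") + "000")[:3])
    return f"{int(hours):02d}:{minutes}:{seconds},{millis:03d}"


def cue_text(p):
    """Flatten a <p> element to text, turning <br/> into line breaks."""
    parts = [p.text or ""]
    for child in p:
        parts.append("\n" if local(child.tag) == "br" else cue_text(child))
        parts.append(child.tail or "")
    lines = [line.strip() for line in "".join(parts).splitlines()]
    return "\n".join(line for line in lines if line)


def dfxp_to_srt(dfxp):
    """Return SRT text for every timed <p> element in a DFXP document."""
    root = ET.fromstring(dfxp)
    cues = []
    for p in root.iter():
        if local(p.tag) == "p" and p.get("begin") and p.get("end"):
            number = len(cues) + 1
            timing = f"{to_srt_time(p.get('begin'))} --> {to_srt_time(p.get('end'))}"
            cues.append(f"{number}\n{timing}\n{cue_text(p)}\n")
    return "\n".join(cues)


# Made-up sample, including a pound symbol like the ones that trip up webdl_fu.
SAMPLE = (
    '<tt xmlns="http://www.w3.org/ns/ttml"><body><div>'
    '<p begin="00:00:01.000" end="00:00:03.500">It cost £5<br/>back then</p>'
    '<p begin="00:00:04" end="00:00:06.25">Next line</p>'
    "</div></body></tt>"
)
print(dfxp_to_srt(SAMPLE))
```

    When converting a real file, reading it in binary mode and passing the bytes to dfxp_to_srt lets the XML parser honour the file's own encoding declaration, and the output should be written as UTF-8 so symbols like £ survive.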
