SBS ElementTree error
I’ve been getting the following error recently with SBS downloads:
  File "~/.local/bin/webdl/./grabber.py", line 62, in <module>
    main()
  File "~/.local/bin/webdl/./grabber.py", line 55, in main
    if not n.download():
  File "~/.local/bin/webdl/sbs.py", line 38, in download
    hls_url = self.get_hls_url(release_url)
  File "~/.local/bin/webdl/sbs.py", line 47, in get_hls_url
    video = doc.xpath("//smil:video", namespaces=NS)
  File "src/lxml/etree.pyx", line 2309, in lxml.etree._ElementTree.xpath
  File "src/lxml/etree.pyx", line 1887, in lxml.etree._ElementTree._assertHasRoot
AssertionError: ElementTree not initialized, missing root
Is anyone else experiencing this?
Comments (34)
-
(+1) The crash seems to be caused by only some downloads (i.e. some videos still download fine, others won't). Sorry, I'm not a Python programmer, but here is a start. The lxml library (see https://lxml.de/) raises the assertion at line 1887 of etree.pyx. Reading the etree.pyx source reveals: “This can happen if ElementTree() is called without any argument and the caller 'forgets' to call parse() afterwards, so this is a bug in the caller program.”
-
reporter That’s interesting. Would you give me an example of a video you were able to download successfully, please? Initially I thought the same thing but I wasn’t able to replicate it.
-
Sure, Trade Surplus. Just tested and working: SBS/Genre/Factual/ Cryptoland S1 Ep1 - Where Are The Earliest Bitcoin Investors Now? (Seemed appropriate for your alias ;-)
A broken example, just tested: SBS/Genre/Factual/ Chernobyl: The New Evidence S1 Ep1 - Situation Critical
-
reporter Excellent. Thanks. The sbs.py file uses the SBS API to download JSON files for each video. In those JSON files are links that are supposed to provide data in SMIL format that can be navigated to find the source videos. In the Cryptoland example the SMIL data is there but in the Chernobyl example a file in a completely different format is provided and the sbs.py file can’t do anything with it. Why those links provide completely different data, I have no idea.
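The mismatch described above can be detected before the XPath call ever runs. Here is a minimal sketch, using the standard library's xml.etree.ElementTree rather than lxml so that it runs on its own. The namespace URI and the function name smil_video_sources are assumptions for illustration and are not taken from sbs.py:

```python
import xml.etree.ElementTree as ET

# Assumed SMIL namespace; check it against the NS mapping in sbs.py.
SMIL_NS = {"smil": "http://www.w3.org/2005/SMIL21/Language"}


def smil_video_sources(body):
    """Return the src attribute of every <video> element in a SMIL document.

    Raises ValueError with a readable message when the body is not SMIL,
    instead of failing later inside an XPath call.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        raise ValueError(f"response is not XML: {exc}") from exc
    videos = root.findall(".//smil:video", SMIL_NS)
    if not videos:
        raise ValueError("response is XML but contains no SMIL <video> element")
    return [video.get("src") for video in videos]
```

With a check like this, a show such as the Chernobyl example would produce a clear "not SMIL" message rather than the ElementTree assertion.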
-
From what I’ve seen, shows published before a certain point in time still use the old format. It’s the new shows that break. Care to share how you identified and downloaded the JSON files?
-
reporter Sure, but first I may have a workaround. Each JSON file contains a link (in SMIL format) to a video source that seems to be designed for mobiles. In the sbs.py file, if you change the line
release_url = player_params["releaseUrls"]["html"]
to
release_url = player_params["releaseUrls"]["htmlandroid"]
the resulting download has the same resolution (1280x720), but the video bitrate is lower, so the file is smaller. I’d appreciate it if people would test whether this works for their busted shows.
-
The workaround is usable, but is of course of lesser quality.
Of interest is that the video bitrate is exactly the same as that for yt-dlp (which I’ve been using in the interim) at about 1500 kbps. In fact it will save files in exactly the same file size as yt-dlp.
This compares to the 2000 kbps (approx) for previous downloads for the same show (Letters and Numbers, which is reasonably uniform in hosted content and so is a good candidate for comparing before and after the change).
Note that to my mind it simply means that yt-dlp is also prevented from downloading at the best available bitrate, not that yt-dlp is a viable alternative to your workaround.
(Edit: Windows says the new downloads are about 800 kbps/1500 kbps respectively, not 1500 and 2000 as VLC previously indicated).
-
Working for me too - thanks Trade Surplus. Well done.
An explanation of how you captured the JSON / SMIL data would indeed be appreciated.
-
I’m getting identical (approx. 2000 kbps) versions from player_params → releaseUrls → htmlandroid (via webdl) and player_params → releaseUrls → html (via yt-dlp) for Chernobyl. By identical I mean the same byte size, codecs and frame rates, with bitrates of about 2000 kbps per VLC and ffmpeg; there is only a trivial difference in header info. For Cryptoland I tried html and htmlandroid with both webdl and yt-dlp, and it's the same story: all four are about 2000 kbps. I checked some downloads from before the problem and they are also about 2000 kbps.
I am not a Python native, but this could be fixed with a try/except that tries html first and falls back to htmlandroid, or by using yt-dlp to handle the downloading, or by taking a closer look at the rest of player_params for clues. I don’t have the spare time to do it myself, and @Trade Surplus's solution has no downside for me.
Versions: ffmpeg 5.1.2, Python 3.11, streamlink 1.7.0, yt-dlp 2023.03.04.
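The try/except idea could look roughly like the sketch below. It is not code from webdl: first_working_url is a hypothetical helper, and get_hls_url stands in for whatever callable resolves a release URL (in sbs.py that would be the method of the same name). A candidate counts as failed if the call raises or returns nothing.

```python
def first_working_url(candidates, get_hls_url):
    """Return (name, result) for the first candidate that get_hls_url handles.

    `candidates` is an ordered list of (name, url) pairs, for example the
    "html" and "htmlandroid" entries of releaseUrls, in order of preference.
    """
    errors = []
    for name, url in candidates:
        try:
            result = get_hls_url(url)
        except Exception as exc:  # e.g. the AssertionError in the traceback
            errors.append(f"{name}: {exc!r}")
            continue
        if result:
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("no release URL worked: " + "; ".join(errors))


# Possible use inside download(), assuming player_params is already loaded:
#   urls = player_params["releaseUrls"]
#   candidates = [(k, urls[k]) for k in ("html", "htmlandroid") if k in urls]
#   name, hls_url = first_working_url(candidates, self.get_hls_url)
```

Catching the exception matters here: the reported failure is an AssertionError, not a falsy return value, so a plain "if not hls_url" check would never reach the fallback.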
-
reporter There's a lot of discussion about the reduction of SBS video quality on Whirlpool. As mentioned by Fred, users of apps like yt-dlp are also experiencing the same issue. It seems that SBS has changed their streaming method to two files that get combined after downloading rather than one file that contains everything. This may be a permanent state of affairs and it may be that higher bitrate versions of new shows are no longer available. I think I'll run a test every week or so to check whether they revert to their previous streaming method, but if they don't, the change I suggested above will have to become permanent.
As far as JSON/SMIL is concerned, everything you need is in the sbs.py file. You just have to step through the code manually, the same way the computer does. Long story short, the JSON for all available shows is here:
https://www.sbs.com.au/api/video_feed/f/Bgtm9B/sbs-section-programs/ -> constant variables defined in sbs.py
The JSON for Cryptoland is here:
https://www.sbs.com.au/api/video_pdkvars/id/2102748227651?form=json -> constant variables defined in sbs.py plus I got the id by playing the video at the SBS ondemand site
The JSON for Chernobyl is here:
https://www.sbs.com.au/api/video_pdkvars/id/2019334723997?form=json
In a browser, scroll down the JSON till you get to a paragraph called "releaseUrls" and you will find links to the source videos. The "htmlandroid" option is now the only one that works reliably. Previously, sbs.py used the "html" option.
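For anyone who would rather script this than scroll in a browser, here is a small sketch. The URL pattern is the one quoted above; the helper names are made up, and because the exact nesting of "releaseUrls" inside the JSON isn't stated here, the sketch simply searches for the key wherever it sits.

```python
import json

PDKVARS_URL = "https://www.sbs.com.au/api/video_pdkvars/id/{video_id}?form=json"


def pdkvars_url(video_id):
    """Build the per-video JSON URL from an SBS video id."""
    return PDKVARS_URL.format(video_id=video_id)


def find_release_urls(node):
    """Depth-first search for the first "releaseUrls" mapping in parsed JSON."""
    if isinstance(node, dict):
        found = node.get("releaseUrls")
        if isinstance(found, dict):
            return found
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_release_urls(child)
        if found is not None:
            return found
    return None


# Made-up sample that only mimics the key names; the real nesting may differ.
sample = json.loads(
    '{"playerParams": {"releaseUrls": {"html": "https://example.invalid/h",'
    ' "htmlandroid": "https://example.invalid/a"}}}'
)
print(pdkvars_url("2019334723997"))
print(find_release_urls(sample)["htmlandroid"])
```

Fetching the real JSON is left out on purpose; any HTTP client will do, and the parsed result can be passed straight to find_release_urls.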
-
Looks like they further broke SBS downloads the last couple of days.
Now getting 404 errors with yt-dlp, and while the video can still be downloaded using webdl (with the htmlandroid workaround), the side-project wrapper webdl_srt (https://bitbucket.org/temper8ur/webdl_srt/src/master/) can now only pull truncated caption files.
They really don’t like downloaders, do they?
-
reporter My fear is that they’ll change their streaming tech to something like what Seven or Nine use and which, for whatever reason, this app hasn’t been programmed to download from.
-
reporter I tested sbs.py with the “html” option and got a different result. Instead of the above ElementTree error, it now tries to resolve a URL and fails. Tried manually, the URL returns an Access Denied “You don't have permission to access … on this server.” page. Not sure whether this is progress or not.
-
reporter Since the issues encountered by webdl clearly aren’t the result of a problem with the code but rather at the SBS end, I’m going to resolve this issue.
-
reporter - changed status to resolved
The problem wasn't with the code but rather the way videos are now streamed from SBS.
-
Hi Trade Surplus,
Would you be able to reopen this? I understand what you are saying, but if webdl can't download SBS videos, it is not working properly and we need to get it updated. I'm hoping to have a bit of free time to go through youtube-dl and work out what needs to be done to get the code working again and will post here then.
-
(@delx - If it is you rather than Atlassian blocking my access, could you reinstate my access?)
And a second round - I've got a hack fix for this, but Atlassian or delx is preventing me opening issues, so the fix I've done in sbs.py is:
Add an API v3 constant at line 12:
    # Quick and dirty fix for SBS v3 api
    API_V3_URL = BASE + "/api/v3/video_smil?id="
    # end fix
And replace line 35 (hls_url = self.get_hls_url(release_url)) with the following block:
    # Quick and dirty fix for SBS v3 api
    # Try v3 api first
    hls_url = self.get_hls_url(API_V3_URL + self.video_id)
    if not hls_url:
        # Fall back to v2 api. I suspect this is already redundant.
        hls_url = self.get_hls_url(release_url)
    # end fix
This seems to be working properly now.
-
Thanks for the patch, Larry. Unfortunately it looks like SBS may have permanently decreased maximum bitrates, meaning crap video quality on the desktop. They’re probably targeting mobile devices instead of PCs now.
So far my test downloads using your patch resulted in exactly the same file size as those done using the “htmlandroid” workaround suggested by Trade Surplus. Adding insult to injury, the webdl_fu add-on (for caption downloads) hasn’t been patched to fix the occasional truncated srt, and probably won’t ever be. The webdl_fu page is no longer accessible.
-
Comments on whirlpool suggest that bitrate is in flux at the moment. Hopefully they'll revert to higher rates in the long term.
I would not rely on the htmlandroid fix working in the long term - I suspect it will last only as long as it takes SBS to roll out a new client using the new API (I did some checks and it looks as though the new API works for both new and old videos).
-
Thanks for the update, Larry. I’m staying with your sbs.py patch, and have also hacked webdl_fu to wrap around it, and so far autograbber-cron.sh is working (as well as can be expected given the SBS-imposed limitations).
-
reporter - changed status to open
I've reopened this at Larry's request so that he can propose a long-term fix to the problem.
-
reporter My ability to create issues has been removed as well. Just to let you know, this account was subject to an issue-spamming incident a few weeks ago. A dozen or more bogus issues were raised which clearly have now been deleted. Perhaps the solution to that problem was to remove the ability for anyone to raise new issues.
-
this account was subject to an issue-spamming incident a few weeks ago.
I figured this was the case - if delx is reading comments, I guess they'll respond at some point.
-
reporter Thanks for the code fix Larry. Do you think it’s worth doing a pull request for it? I’m happy to do that if you think so.
-
Do you think it’s worth doing a pull request for it?
My fix is a quick hack based on work done on the yt-dlp codebase. It may or may not have long term legs^. I'd suggest we give delx time to come back to this, and if it's still idle in a few months' time, we can fork a longer term version maintained by active users.
^ In particular, given the other things happening on the front end, I suspect SBS may be updating their catalogue format - which would not be a bad thing (hopefully smaller and faster), but would require rework of other parts of sbs.py.
-
reporter I’ve only used yt-dlp to download YouTube videos. Is there a reason to stick with webdl if yt-dlp can handle SBS as well?
-
Is there a reason to stick with webdl if yt-dlp can handle SBS as well?
You'd need to wrap a batch function around it - I like webdl's catalogue function. And it also does ABC as well!
(I've been running it largely fire and forget for a lot of years now - it picks up whatever we aren't interested in and I postprocess for later viewing in plex).
-
Some more reasons:
Subtitle download support (if using webdl_fu)
Download history is saved (autograbber)
Save to predetermined folders (autograbber)
Range downloading
Wildcard support
Exclusion support
And probably some other features I can’t remember.
-
reporter It’s good to see there’s still interest in it. It’s a great piece of software that I use every day. I’ll set myself a reminder to check back in July. Hopefully we’ll have a better idea of what SBS is doing with their streaming by then as well.
-
Sounds good!
-
repo owner Sorry for the slow reply, life has been busy lately. Thanks for this fix, I’ve committed it :-)
-
repo owner - changed status to resolved
-
Just got the above message as an email, and this jogged my memory.
Regarding the webdl_fu truncated caption issue: Truncated subtitle files occur because the _fu code is unable to read some characters, notably pound symbols (£). It will just drop the ball as soon as it encounters such symbols. For those interested, my workaround is now to use yt-dlp to save failing captions in DFXP format, then manually convert them to SRT. Below is an example using an Alone Australia URL, identified as per the notes above:
yt-dlp --ignore-config -v --write-sub --sub-lang en --sub-format dfxp --skip-download "https://www.sbs.com.au/api/v3/video_smil?context=tv&id=2204846147868"
-
SBS changed something so now waiting for someone with python skills to publish a patch. Don’t think the original developer is around any more.