metaTED is a tool that makes it easy to download all of the TED talks. It does so by generating over 1,500 metalinks of TED talks, which differ in quality level and in how the talks are grouped into directories. Features include:
- Saves talks under informative file names - e.g. Unconventional Explanations/Hans Rosling on HIV - New facts and stunning data visuals.mp4 instead of the original HansRosling_2009_480.mp4.
- Provides subtitles for talks in over 85 supported languages. New languages and translations are added daily through the TED Open Translation Project, and you can help out by becoming a translator today.
- Tries hard to get all of the talks, or at least most of them - and reports a good reason for any that have failed.
- More choice - creates one metalink per available quality level (currently low, standard and high).
- More choice - creates one metalink per available talk grouping, with all talks belonging to the same group placed inside a common directory. The possible talk groupings are extracted from talks metadata (currently filming year, publishing year, talk theme, event name and author).
- Aggressive caching throughout the project, to avoid expensive network/CPU operations as much as possible. Proper cache invalidation included.
- High levels of fault tolerance.
- Simple, yet powerful homegrown web crawler.
- Flexible and extensible software design with changes in mind.
- Provides both the console script and a public API.
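The file naming and grouping features listed above can be illustrated with a minimal sketch. This is not metaTED's actual API: the talk record, its field names, and the safe_name and grouped_path helpers are hypothetical, and the character-replacement rule is an assumption chosen to reproduce the example file name.

```python
import re

# Hypothetical talk record; metaTED extracts the real metadata from TED's site.
talks = [
    {
        "author": "Hans Rosling",
        "title": "Hans Rosling on HIV: New facts and stunning data visuals",
        "theme": "Unconventional Explanations",
        "filming_year": 2009,
        "original_name": "HansRosling_2009_480.mp4",
    },
]


def safe_name(text):
    """Make text usable as a file or directory name (assumed rule)."""
    text = re.sub(r"\s*:\s*", " - ", text)  # "a: b" becomes "a - b"
    return re.sub(r'[\\/*?"<>|]', "", text).strip()


def grouped_path(talk, group_by):
    """Build '<group>/<informative name>.<ext>' for one talk."""
    extension = talk["original_name"].rsplit(".", 1)[-1]
    group = safe_name(str(talk[group_by]))
    return "{0}/{1}.{2}".format(group, safe_name(talk["title"]), extension)


# One path per grouping; metaTED creates one metalink per grouping instead.
for key in ("theme", "filming_year", "author"):
    print(grouped_path(talks[0], key))
```

Each grouping yields a different directory for the same talk, which is why a separate metalink is produced per grouping and per quality level.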
Downloading TED talks
If you just want to download TED talks, you don't need to install this package, or even Python. All you need to do is get a download client that supports the Metalink standard and choose one of the daily updated metalinks.
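To give a rough idea of what a download client reads from a metalink, here is a small sketch that lists the target paths and source URLs in one. It assumes the Metalink 4 format (RFC 5854); the metalinks metaTED produces may use a different Metalink version, and the URL and the list_downloads helper are placeholders made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative Metalink 4 (RFC 5854) document; the URL is a placeholder.
METALINK = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="Unconventional Explanations/Hans Rosling on HIV - New facts and stunning data visuals.mp4">
    <url>http://example.com/HansRosling_2009_480.mp4</url>
  </file>
</metalink>
"""

NS = {"ml": "urn:ietf:params:xml:ns:metalink"}


def list_downloads(metalink_xml):
    """Return (target path, [source URLs]) pairs described by a metalink."""
    root = ET.fromstring(metalink_xml)
    return [
        (entry.get("name"), [url.text.strip() for url in entry.findall("ml:url", NS)])
        for entry in root.findall("ml:file", NS)
    ]


for path, urls in list_downloads(METALINK):
    print(path, "<-", ", ".join(urls))
```

The name attribute may contain a relative directory, which is how a single metalink can place every talk of a group inside a common directory.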
Installing and running metaTED
You can install metaTED with pip via pip install metaTED. You can run it with metaTED, or metaTED -h to get help and the list of all available options.
The project itself is hosted on Bitbucket, where you can get the code and report bugs.
New in metaTED 2.0.0
metaTED 2.0.0 has been in the works for some time. It's the biggest change yet in the project's short-lived 2.5 year history. Cool stuff has been added and bugs were fixed, but the API has seen some backwards incompatible changes as well.
- Fixed issue #4 - Added talk subtitles support, as per popular request. Major thanks to Randall Mason for the initial implementation.
- Added parallelism to the crawler, leading to substantial performance improvements.
- Added filming year, publishing year and event name to talks metadata, which automagically added new possible talk groupings.
- Added parallelism to the metalink generator, leading to substantial performance improvements.
- Updated talk theme markers to match TED's updated HTML layout, and improved error handling.
- Updated video download markers and download URLs detection code as TED updated their HTML layout.
- Updated author markers and detection code as TED updated their HTML layout.
- talk_info metadata cache is written to disk as soon as possible to minimize data loss on errors.
- Removed setup.cfg as we no longer need it.
- Removed the dreamy-trac project reference from LICENSE.
- Switched packaging from setuptools to distribute.
- Removed crawler based page caching as it's no longer used or needed.
- Switched from BeautifulSoup to lxml and removed custom crawler code in favor of lxml.html.parse.
- Minimal Python version bumped from 2.4+ to 2.6+.
- Major refactoring to modernize the existing codebase, while improving code style, optimizing performance and getting rid of accumulated technical debt. The existing API has changed in backwards incompatible ways as a result.
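The crawler parallelism mentioned in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not metaTED's implementation: fetch_talk_info and crawl are hypothetical names, the fetch step is a stand-in that performs no network access, and concurrent.futures requires a newer Python than the 2.6+ that metaTED 2.0.0 targets.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_talk_info(url):
    """Stand-in for an expensive network call (hypothetical, no real I/O)."""
    return {"url": url, "title": url.rsplit("/", 1)[-1].replace("_", " ")}


def crawl(urls, fetch=fetch_talk_info, max_workers=8):
    """Fetch talk info concurrently; record failures instead of aborting."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = dict((url, pool.submit(fetch, url)) for url in urls)
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as error:  # keep going, report the reason later
                failures[url] = str(error)
    return results, failures


results, failures = crawl([
    "http://example.com/talks/first_talk",
    "http://example.com/talks/second_talk",
])
print(len(results), "fetched,", len(failures), "failed")
```

Collecting failures per URL, rather than stopping at the first error, is one way to get most of the talks while still being able to report why the rest failed.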