metaTED is a tool that makes it easy to download all of the `TED talks`_. It
does so by creating over 1,500 `metalinks`_ of TED talks, which vary both in
quality level and in how the talks are grouped into directories. Features
include:
* Saves talks under informative file names, e.g.
``Unconventional Explanations/Hans Rosling on HIV - New facts and stunning data visuals.mp4``
instead of the original ``HansRosling_2009_480.mp4``.
* Provides subtitles for talks in over 85 supported languages. New
languages and translations are added daily through the
`TED Open Translation Project`_, and you can help out by
`becoming a translator today`_.
* Tries hard to get all of the talks, or at least most of them, and reports
a good reason for any that fail.
* More choice - creates one metalink per available quality level
(currently low, standard and high).
* More choice - creates one metalink per available talk grouping, with all
talks belonging to the same group placed inside a common directory. The
possible talk groupings are extracted from talk metadata (currently
filming year, publishing year, talk theme, event name and author).
* Aggressive caching throughout the project to avoid expensive network/CPU
operations as much as possible, with proper cache invalidation included.
* High levels of fault tolerance.
* Simple, yet powerful homegrown web crawler.
* Flexible and extensible software design with changes in mind.
* Provides both a console script and a public API.
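The informative file names and directory groupings described above can be illustrated with a minimal sketch that turns one talk's metadata into a ``Group/Author on Title.mp4`` path. The metadata keys (``author``, ``title``, ``theme``, ``filmed``), the ``safe_name`` and ``talk_path`` helpers and the set of characters treated as unsafe are all hypothetical, chosen for illustration rather than taken from metaTED's code.

```python
import re

# Hypothetical set of characters treated as unsafe in file/directory names.
_UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_name(text):
    """Replace characters that cannot appear in a file or directory name."""
    return _UNSAFE_CHARS.sub("-", str(text)).strip()


def talk_path(talk, grouping, extension="mp4"):
    """Return 'Group/Author on Title.ext' for one talk.

    ``talk`` is a dict of metadata; ``grouping`` names the metadata key
    (e.g. "theme" or "filmed") whose value becomes the directory name.
    """
    directory = safe_name(talk[grouping])
    file_name = "%s on %s.%s" % (
        safe_name(talk["author"]), safe_name(talk["title"]), extension)
    # Join with "/" so the same relative path is produced on every platform.
    return "%s/%s" % (directory, file_name)


# Hypothetical metadata; these keys are assumptions, not metaTED's data model.
talk = {
    "author": "Hans Rosling",
    "title": "HIV - New facts and stunning data visuals",
    "theme": "Unconventional Explanations",
    "filmed": 2009,
}
by_theme = talk_path(talk, "theme")
by_year = talk_path(talk, "filmed")
print(by_theme)
print(by_year)
```

In this sketch, switching the ``grouping`` key is enough to produce a different directory layout from the same talks.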
.. _becoming a translator today: http://www.ted.com/translate/forted
.. _metalinks: http://en.wikipedia.org/wiki/Metalink
.. _TED talks: http://www.ted.com/
.. _TED Open Translation Project: http://www.ted.com/pages/view/id/287
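A metalink is an XML document listing files together with the URLs they can be downloaded from. The sketch below writes one Metalink 4.0 document (the format specified in RFC 5854) per quality level using only the Python standard library. The talk records, the ``example.com`` URLs and the ``build_metalink`` helper are hypothetical, and metaTED's real generator may target a different Metalink version or emit additional elements.

```python
import xml.etree.ElementTree as ET

METALINK_NS = "urn:ietf:params:xml:ns:metalink"


def build_metalink(files):
    """Return a Metalink 4.0 document as UTF-8 bytes.

    ``files`` is an iterable of (file_name, url) pairs; ``file_name`` may
    contain "/"-separated directories, which clients recreate on disk.
    """
    # Serialize the Metalink namespace as the default (unprefixed) one.
    ET.register_namespace("", METALINK_NS)
    root = ET.Element("{%s}metalink" % METALINK_NS)
    for file_name, url in files:
        file_el = ET.SubElement(root, "{%s}file" % METALINK_NS, name=file_name)
        ET.SubElement(file_el, "{%s}url" % METALINK_NS).text = url
    return ET.tostring(root, encoding="utf-8")


# Hypothetical talks, each with download URLs for some quality levels.
talks = [
    {"path": "2009/Author A on Talk One.mp4",
     "urls": {"low": "http://example.com/talk1_low.mp4",
              "high": "http://example.com/talk1_high.mp4"}},
    {"path": "2010/Author B on Talk Two.mp4",
     "urls": {"high": "http://example.com/talk2_high.mp4"}},
]

# One metalink per quality level; talks lacking that quality are skipped.
metalinks = {}
for quality in ("low", "high"):
    metalinks[quality] = build_metalink(
        (talk["path"], talk["urls"][quality])
        for talk in talks if quality in talk["urls"])
```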
Downloading TED talks
If you just want to `download TED talks`_, you don't need to install this
package, or even Python. All you need to do is get a
`download client that supports the Metalink standard`_ and choose one of the
`daily updated metalinks`_.
.. _download TED talks: http://metated.petarmaric.com/
.. _download client that supports the Metalink standard:
.. _daily updated metalinks: http://metated.petarmaric.com/
Installing and running metaTED
You can install metaTED with `pip`_ via ``pip install metaTED``. You can run it
with ``metaTED``, or with ``metaTED -h`` to get help and the list of all
available options.
The project itself is `hosted on bitbucket`_, from where you can get the code
and report bugs.
.. _pip: http://pip.openplans.org/
.. _hosted on bitbucket: http://bitbucket.org/petar/metated/
New in metaTED 2.0.0
metaTED 2.0.0 has been in the works for some time. It's the biggest change yet
in the project's short, 2.5-year history. New features have been added and bugs
fixed, but the API has also seen some backwards incompatible changes.
* Fixed `issue #4`_ - Added talk subtitles support, as per popular request.
Major thanks to Randall Mason for the initial implementation.
* Added parallelism to the crawler, leading to substantial performance
improvements.
* Added filming year, publishing year and event name to talks metadata,
which automagically added new possible talk groupings.
* Added parallelism to the metalink generator, leading to substantial
performance improvements.
* Updated talk theme markers as TED updated their HTML layout and improved
* Updated video download markers and download URL detection code as TED
updated their HTML layout.
* Updated author markers and detection code as TED updated their HTML
layout.
* The ``talk_info`` metadata cache is now written to disk as soon as possible
to minimize data loss on errors.
* Removed ``setup.cfg`` as we no longer need it.
* Removed the ``dreamy-trac`` project reference from ``LICENSE``.
* Switched from using ``setuptools`` to ``distribute`` for packaging.
* Removed crawler-based page caching, as it is neither used nor needed anymore.
* Switched from ``BeautifulSoup`` to ``lxml`` and removed custom crawler code in
favor of ``lxml.html.parse``.
* Minimum Python version bumped from 2.4+ to 2.6+.
* Major refactoring to modernize the existing codebase, while improving code
style, optimizing performance and getting rid of accumulated technical
debt. As a result, parts of the existing API have changed in backwards
incompatible ways.
.. _issue #4: https://bitbucket.org/petar/metated/issue/4/include-subtitles
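The crawler parallelism, the fault tolerance and the eager ``talk_info`` cache writes listed above can be combined roughly as in the minimal sketch below. It assumes modern Python 3 with ``concurrent.futures`` (which the project's Python 2.6 baseline did not ship with), and the ``crawl``, ``save_cache`` and ``fake_fetch`` functions, the cache file name and its JSON format are hypothetical stand-ins rather than metaTED's real code.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def save_cache(cache, path):
    """Write the cache atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(cache, tmp)
    os.replace(tmp_path, path)


def crawl(talk_urls, fetch_talk_info, cache_path, max_workers=8):
    """Fetch metadata for many talks in parallel.

    Returns (cache, failures): successful results are kept in ``cache``
    and written to disk after every talk, while ``failures`` maps each
    failed URL to the reason it failed instead of aborting the whole run.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    failures = {}
    # Only talks missing from the cache cost a network round trip.
    pending = [url for url in talk_urls if url not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_talk_info, url): url for url in pending}
        # Results are handled in this (main) thread, so cache writes never race.
        for future in as_completed(futures):
            url = futures[future]
            try:
                cache[url] = future.result()
            except Exception as exc:  # keep going; report the reason later
                failures[url] = repr(exc)
            else:
                # Persist immediately to minimize data loss on later errors.
                save_cache(cache, cache_path)
    return cache, failures


def fake_fetch(url):
    """Hypothetical fetcher standing in for real page parsing."""
    if "broken" in url:
        raise ValueError("video download marker not found")
    return {"url": url, "title": url.rsplit("/", 1)[-1]}


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, "talk_info.json")
    cache, failures = crawl(
        ["http://example.com/talks/a", "http://example.com/talks/broken"],
        fake_fetch, cache_file)
    with open(cache_file) as f:
        on_disk = json.load(f)
```

Collecting exceptions per talk, rather than letting the first one abort the run, is what allows a crawl to finish with most talks and a reason for each one that failed.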