gnd / ETL_Findings

Working with developers hired through oDesk, I've been running a series of comparative tests between the two main open source ETL engines: Talend Open Studio and Pentaho Kettle.

Pentaho

Pros

  • Wide set of standard built-in functions
  • I believe it offers open source web-based processing (jobs can be run as a web service)
  • Also contains the Weka data mining tool

Cons

  • The Pentaho website is very thin, with little content

Talend

Pros

  • Has a "richer" embedded Java component that breaks processing down into setup, per-row and teardown phases
  • There appear to be more downloadable components, well organised in TalendForge
  • Lots of good quality training
  • Eclipse based, nice and familiar
  • Coherent, tidy, thorough UI (makes Kettle appear a little amateurish)
  • tMemorize allows easy access to previous rows, without custom Java code
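
To make the setup / per-row / teardown split and the "previous row" idea concrete, here is a minimal plain-Java sketch. It is not Talend-generated code and does not use any Talend API; the class name RowLifecycleSketch and the method deltasFromPreviousRow are hypothetical, chosen only for illustration. It shows the kind of hand-written logic that a previous-row lookup would otherwise need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowLifecycleSketch {

    // Hypothetical helper (not a Talend API): for each row, returns its
    // difference from the previous row.
    public static List<Integer> deltasFromPreviousRow(List<Integer> rows) {
        // Setup phase: runs once, before any row is processed.
        List<Integer> deltas = new ArrayList<>();
        Integer previous = null; // no previous row yet

        // Per-row phase: runs once for each incoming row.
        for (Integer current : rows) {
            // The first row has nothing to compare against, so its delta is 0.
            deltas.add(previous == null ? 0 : current - previous);
            previous = current; // remember this row for the next iteration
        }

        // Teardown phase: runs once, after the last row.
        System.out.println("Processed " + rows.size() + " rows");
        return deltas;
    }

    public static void main(String[] args) {
        // Illustrative input values only.
        System.out.println(deltasFromPreviousRow(Arrays.asList(10, 12, 9)));
    }
}
```

In this sketch the variable previous plays the role of the remembered row; the point of a component such as tMemorize is that this bookkeeping does not have to be written by hand.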

Cons

  • Appears to have a more limited function set
  • Customer engagement feels heavily focussed on up-sell

Maybe

  • Generates Java code before running a job. This may be a 'con' because of fragility or performance, or a 'pro' because the generated Java jobs are transferable.
