ETL_Findings
Working with developers hired through oDesk, I've been running a series of comparative tests between the two main open-source ETL engines: Talend Open Studio and Pentaho Kettle.
Pentaho
Pros
- Wide standard inbuilt function set
- I believe it offers open-source web-based processing (jobs can be run as a web service)
- Also contains the Weka data mining tool
Cons
- The Pentaho website is very thin, with little content
Talend
Pros
- Has a richer embedded Java component that breaks processing down into setup, per-row and teardown phases
- There appear to be more downloadable components, well organised in TalendForge
- Lots of good quality training
- Eclipse based, nice and familiar
- Coherent, tidy, thorough UI (makes Kettle appear a little amateurish)
- tMemorize allows easy access to previous rows, without custom Java code
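
To make the setup / per-row / teardown split and the "previous row" idea concrete, here is a rough sketch in plain Java. It is not Talend-generated code and does not use any Talend or Kettle API; the class name `RowPhasesSketch` and the method `deltas` are made up for illustration, and the rows are assumed to be simple integers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not output from Talend or Kettle.
final class RowPhasesSketch {

    // For each row, computes the change from the previous row's value.
    static List<Integer> deltas(List<Integer> rows) {
        // Setup phase: runs once, before any row is seen.
        List<Integer> out = new ArrayList<>();
        Integer previous = null; // remembered value of the prior row

        // Per-row phase: runs once for each incoming row.
        for (Integer current : rows) {
            int delta = (previous == null) ? 0 : current - previous;
            out.add(delta);
            previous = current; // memorise this row for the next iteration
        }

        // Teardown phase: runs once, after the last row.
        System.out.println("Processed " + rows.size() + " rows");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(deltas(List.of(10, 13, 11)));
    }
}
```

The `previous` variable is the part that, as I understand it, tMemorize saves you from writing by hand.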
Cons
- Appears to have a more limited function set
- Customer engagement feels heavily focussed on up-sell
Maybe
- Generates Java code before running a job. This may be a 'con' due to fragility or performance, or a 'pro' because the generated Java jobs are transferable.