riak / www / mapreduce.html

File www/mapreduce.html

 </p>
 
 <p>
-A "reduce phase" is conceptually simpler.  As it receives inputs from the preceding phase, it collates those inputs along with the ones already received and continually "reduces" the input set until it receives notification that it will receive no further data, at which point the entire reduced set will be streamed to the next phase. Note that this makes a reduce phase a concurrency barrier, as opposed to map phases which can be processing in parallel.  In order for this process to make any sense, a reduce phase's function must be commutative, associative, and idempotent.  Good examples are <code>sum</code> and <code>set-union</code>.  As Riak's core focus is on decentralized data storage and not on compute farming, reduce phases are generally run on a single cluster -- there is no data-locality gain to be had in reduce.
+A "reduce phase" is conceptually simpler.  As it receives inputs from the preceding phase, it collates those inputs along with the ones already received and continually "reduces" the input set until it receives notification that it will receive no further data, at which point the entire reduced set will be streamed to the next phase. Note that this makes a reduce phase a concurrency barrier, as opposed to map phases which can be processing in parallel.  In order for this process to make any sense, a reduce phase's function must be commutative, associative, and idempotent.  Good examples are <code>sum</code> and <code>set-union</code>.  As Riak's core focus is on decentralized data storage and not on compute farming, reduce phases are generally run on a single node -- there is no data-locality gain to be had in reduce.
 </p>
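The requirement that a reduce function be commutative, associative, and idempotent can be illustrated with a small sketch. This is not Riak's implementation or API: the driver `run_reduce_phase` and the two reduce functions are hypothetical names, and "idempotent" is read here as "re-reducing an already reduced result leaves it unchanged", which is an assumption about the paragraph's intent. Under that reading, `sum` and `set-union` both qualify when written as list-in, list-out functions.

```python
def reduce_sum(values):
    """Collapse a list of numbers into a one-element list holding their sum."""
    return [sum(values)]


def reduce_set_union(values):
    """Collapse a list of sets into a one-element list holding their union."""
    return [frozenset().union(*values)]


def run_reduce_phase(reduce_fn, batches):
    """Hypothetical driver: as each batch arrives from the preceding phase,
    re-reduce it together with everything reduced so far."""
    acc = []
    for batch in batches:
        # The previous output is fed back in as input, which is only safe
        # when reduce_fn gives the same answer for any grouping and order.
        acc = reduce_fn(acc + list(batch))
    return acc


if __name__ == "__main__":
    batches = [[1, 2], [3], [4, 5]]
    print(run_reduce_phase(reduce_sum, batches))
    print(run_reduce_phase(reduce_sum, list(reversed(batches))))
    print(run_reduce_phase(reduce_set_union, [[{1, 2}], [{2, 3}]]))
```

A function such as an arithmetic mean would not fit this scheme as written, because averaging a previous average with new inputs generally differs from averaging all inputs at once.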
 
 <h3>A perfect fit for the Web</h3>