mycloud

Leverage small clusters of machines to increase your productivity.

mycloud requires no prior setup: if you can SSH to your machines, it works out of the box. It currently exposes a simple MapReduce API with several common input formats, and adding support for your own format is easy.

usage

Starting your cluster:

import mycloud

# list each machine and the number of cores to use
cluster = mycloud.Cluster([('machine1', 4),
                           ('machine2', 4)],
                          tmp_prefix='/path/to/store/results')

Invoke a function over a list of inputs:

result = cluster.map(my_expensive_function, range(1000))
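Before sending work to the cluster, it can help to check the function on a few inputs locally. The following is a minimal sketch of a serial baseline using Python's builtin map. The body of my_expensive_function is a hypothetical stand-in, and whether cluster.map returns results in the same order as the builtin is an assumption, not something this README confirms.

```python
def my_expensive_function(x):
  # Hypothetical stand-in for real work: sum of squares below x.
  return sum(i * i for i in range(x))

# Serial baseline: apply the function to each input in order.
inputs = range(5)
serial_result = list(map(my_expensive_function, inputs))
print(serial_result)  # [0, 0, 1, 5, 14]
```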

Use the MapReduce interface to easily handle processing of larger datasets:

import mycloud.mapreduce
from mycloud.resource import CSV
input_desc = [CSV('/path/to/my_input_%d.csv' % i) for i in range(100)]
output_desc = [CSV('/path/to/my_output_file.csv')]

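def map_identity(k, v, output):
  # emit the key with the first element of the value, converted to int
  output(k, int(v[0]))

def reduce_sum(k, values, output):
  # emit the sum of all values collected for this key
  output(k, sum(values))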

mr = mycloud.mapreduce.MapReduce(cluster,
                                 map_identity,
                                 reduce_sum,
                                 input_desc,
                                 output_desc)

result = mr.run()

for k, v in result[0].reader():
  print k, v
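
To make the mapper and reducer contract concrete, here is a minimal in-process sketch of the same callback style: the mapper emits (key, value) pairs through output, values are grouped by key, and the reducer is called once per key. The run_local helper and the sample records are hypothetical and only illustrate the calling convention. This is not how mycloud schedules or shuffles work across machines.

```python
from collections import defaultdict

def map_identity(k, v, output):
  # Same mapper as above: emit the key with the first field parsed as an int.
  output(k, int(v[0]))

def reduce_sum(k, values, output):
  # Same reducer as above: emit the sum of all values seen for this key.
  output(k, sum(values))

def run_local(mapper, reducer, records):
  """Hypothetical helper: run mapper and reducer in-process over (key, value) records."""
  grouped = defaultdict(list)
  for k, v in records:
    # Group every value the mapper emits under its key.
    mapper(k, v, lambda mk, mv: grouped[mk].append(mv))
  results = []
  for k in sorted(grouped):
    # Call the reducer once per key with all of that key's values.
    reducer(k, grouped[k], lambda rk, rv: results.append((rk, rv)))
  return results

# Hypothetical records: keys are labels, values are rows of string fields.
records = [('a', ['1']), ('b', ['10']), ('a', ['2']), ('b', ['20'])]
totals = run_local(map_identity, reduce_sum, records)
print(totals)  # [('a', 3), ('b', 30)]
```

Running the callbacks locally like this is a cheap way to catch parsing errors, such as a non-numeric first field, before launching a job on the cluster.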