1. Nobue Itoh
  2. ruffus-ex

Welcome

The Ruffus-Ex project is a slightly improved version (a fork) of the Ruffus Python module.

For more information about Ruffus itself, please refer to the official documentation.

Ruffus-Ex is licensed under the MIT License, the same license as the original Ruffus project.

Ruffus-Ex features

  • Fully functional support for dictionaries as input and output arguments of the @files decorator
  • Automatic serialization and deserialization of input and output files through the @autoserialization decorator, with the format chosen by file extension:
    • Strings are stored as plain-text files (file extension must be '.txt')
    • NumPy arrays are stored using the numpy.save and numpy.load functions (file extension must be '.npy')
    • All other objects are stored using the pickle module (use any other extension)
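
The extension rules above can be illustrated with a small stand-alone sketch. It is not the Ruffus-Ex implementation; the helper names save_by_extension, load_by_extension and roundtrip are hypothetical and exist only to show how a serializer could be chosen from the file extension.

```python
import os
import pickle
import tempfile


def save_by_extension(path, obj):
    """Hypothetical helper: choose a serializer from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".txt":
        # Strings are written as plain text.
        with open(path, "w") as handle:
            handle.write(obj)
    elif ext == ".npy":
        # NumPy is imported lazily so the sketch runs without it
        # unless a '.npy' path is actually used.
        import numpy
        numpy.save(path, obj)
    else:
        # Any other extension falls back to pickle.
        with open(path, "wb") as handle:
            pickle.dump(obj, handle)


def load_by_extension(path):
    """Hypothetical helper: inverse of save_by_extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".txt":
        with open(path) as handle:
            return handle.read()
    if ext == ".npy":
        import numpy
        return numpy.load(path)
    with open(path, "rb") as handle:
        return pickle.load(handle)


def roundtrip(filename, obj):
    """Save obj in a temporary directory and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, filename)
        save_by_extension(path, obj)
        return load_by_extension(path)
```

Calling roundtrip("note.txt", "hello") or roundtrip("items.pkl", ["ABC", "123"]) returns an object equal to the one passed in, which is the behaviour a task body relies on when it receives deserialized inputs.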

Notes

Examples

Here is an example with all of the currently implemented Ruffus-Ex features:

import numpy

from ruffus import (pipeline_run,
                    autoserialization,
                    files)

######################################
# First task
######################################

inp_params = { 'filein': 'testdata/1.txt' }

out_params = { 'txt_out':     'testdata/1_out.txt',
               'list_out': 'testdata/1_list_out.pkl',
               'array_out' : 'testdata/1_numpy_out.npy' } 

@files(inp_params, out_params)
@autoserialization
def zero_task(inp, outp):
    
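    # input: inp['filein'] is already the content of testdata/1.txt as a string

    deserialized_file = inp['filein']

    # output: one object per key of out_params; each object is saved
    # according to the extension of its target file (.txt, .pkl, .npy)

    some_list = ['ABC', '123', 'GHF']
    some_array = numpy.arange(5)

    output_objs = {
                   'txt_out': deserialized_file,
                   'list_out': some_list,
                   'array_out': some_array,
                   }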
    
    return output_objs

########################################    
# Second task (depends on the first one)
########################################

input_params = { 'func_input': zero_task, # dependency: the outputs of zero_task
                 'txt_input' : 'testdata/1.txt', }

output_params = { 'txt_out_2': 'testdata/2_out.txt' }


@files(input_params, output_params)
@autoserialization
def first_task(inp, outp):
    
    # input: from the zero_task
    
    deserialized_file_1 = inp['func_input']['txt_out']
    list_example = inp['func_input']['list_out']
    numpy_array = inp['func_input']['array_out']

    # input: from the txt file

    deserialized_file_2 = inp['txt_input']

    # output
    
    output_string = deserialized_file_1 + \
                    deserialized_file_2 + \
                    list_example[2] + \
                    str(numpy_array)
    
    output_objs = {
                   'txt_out_2': output_string,
                   }
    
    return output_objs

######################################
# Run!
######################################

pipeline_run([first_task], verbose = 3)
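
To make the data flow of the example more concrete, here is a minimal sketch of what a decorator in the spirit of @autoserialization might do: load every input path (including nested dictionaries that come from an upstream task), call the task body with the loaded objects, then save each returned object to the matching output path. This page does not show how Ruffus-Ex implements this internally, so everything here is an assumption. The names autoserialize_sketch, _load, _save, _load_tree, join_task and run_demo are hypothetical, and '.npy' handling is left out so the sketch uses only the standard library.

```python
import functools
import os
import pickle
import tempfile


def _load(path):
    # '.txt' is read as a string; everything else is unpickled.
    if path.endswith(".txt"):
        with open(path) as handle:
            return handle.read()
    with open(path, "rb") as handle:
        return pickle.load(handle)


def _save(path, obj):
    # Mirror of _load: text for '.txt', pickle otherwise.
    if path.endswith(".txt"):
        with open(path, "w") as handle:
            handle.write(obj)
    else:
        with open(path, "wb") as handle:
            pickle.dump(obj, handle)


def _load_tree(value):
    # Input dictionaries may be nested, for example when one entry
    # stands for all outputs of an upstream task.
    if isinstance(value, dict):
        return {key: _load_tree(item) for key, item in value.items()}
    return _load(value)


def autoserialize_sketch(func):
    """Hypothetical stand-in, not the real Ruffus-Ex decorator."""
    @functools.wraps(func)
    def wrapper(inp, outp):
        results = func(_load_tree(inp), outp)
        # Save each returned object under the path with the same key.
        for key, path in outp.items():
            _save(path, results[key])
    return wrapper


@autoserialize_sketch
def join_task(inp, outp):
    # When the body runs, inp holds objects rather than file paths.
    return {"joined": inp["text"] + inp["upstream"]["items"][2]}


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, "in.txt")
        items_path = os.path.join(tmp, "items.pkl")
        out_path = os.path.join(tmp, "out.txt")
        _save(text_path, "abc-")
        _save(items_path, ["ABC", "123", "GHF"])
        join_task({"text": text_path, "upstream": {"items": items_path}},
                  {"joined": out_path})
        return _load(out_path)
```

In this sketch run_demo() writes two input files, runs the decorated task once, and returns the text stored in the output file. Scheduling, dependency tracking and up-to-date checks are handled by Ruffus itself and are deliberately not modelled here.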
      
