Background:

Data Elevator transparently moves data between the storage layers of a hierarchical HPC storage system. It supports two I/O interfaces: HDF5 and MPI-IO. This file focuses mainly on instructions to compile, install, and use the HDF5 external VOL plugin.

For more information/help, contact:
        Bin Dong, Suren Byna, or Kesheng Wu [dbin@lbl.gov, sbyna@lbl.gov, or kwu@lbl.gov].

To use Data Elevator on the Cori system @ NERSC, unload darshan and load the data-elevator module:

       > module unload darshan
       > module load data-elevator/0.2

For any questions on using Data Elevator on Cori @ NERSC, contact: Jialin Liu [jalnliu@lbl.gov]

Sections 1 and 2 below describe how to build Data Elevator on your own.

Some configuration parameters used in the instructions:

        DE_DIR                : directory of the unpacked Data Elevator source code
        DE_DIR/build          : directory where Data Elevator is installed
        H5_DIR                : directory of the HDF5 source code
        H5_DIR/build          : directory where HDF5 (with VOL support) is installed
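For convenience, the instructions below can be followed with these names set as shell variables. A minimal sketch, assuming hypothetical clone locations under $HOME (adjust the paths to your own system):

```shell
# Hypothetical locations; adjust to wherever you clone/install the code.
export DE_DIR=$HOME/dataelevator
export H5_DIR=$HOME/hdf5
echo "Data Elevator source: $DE_DIR, install prefix: $DE_DIR/build"
echo "HDF5 source: $H5_DIR, install prefix: $H5_DIR/build"
```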
      
1, Preparation
    
    1.1 Download the Data Elevator code 

       > git clone https://bitbucket.org/sbyna/dataelevator.git DE_DIR
   
    1.2 Download the HDF5 "develop" branch (Note: once the VOL functionality
       is released in HDF5, one can use an HDF5 release directly; until then, the HDF5 develop branch is needed)

       > git clone https://bitbucket.hdfgroup.org/scm/hdffv/hdf5.git  H5_DIR

    1.3 automake/autoconf may be needed if you hit errors during configuration.

        > module load automake/1.15 
        > module load autoconf/2.69 


2, Installation

    Note: run "module unload darshan" before the following steps on the Cori system at NERSC

    2.1 Compile HDF5 

        >  cd H5_DIR
        >  ./autogen.sh  (skip this step if ./configure exists)
        >  ./configure --prefix=$PWD/build --enable-parallel CC=cc
            Note: (add --disable-shared on Cori@NERSC) 
        >  make install

    2.2 Compile Data Elevator 

        > cd DE_DIR
        > ./configure --prefix=$PWD/build CC=cc --disable-shared 
            Note: (try ./bootstrap if errors are seen with this step)
        > make install
            
        Note: After a successful installation, $PWD/build contains three sub-directories: bin, include, and lib.
              The Data Elevator job executable is $PWD/build/bin/dejob.
              The test scripts in Section 3 show how dejob is launched.
 
3, Test

    > cd DE_DIR/examples
       Note: edit "./Makefile" by setting the two variables below to the right locations,
       and replace mpicc with your compiler (e.g., cc at NERSC)
       -- H5_HOME = H5_DIR/build
       -- DE_HOME = DE_DIR/build
    
    > make 
    Note: the "test" executable is created in this step.

    To test with a Slurm batch script on NERSC machines, jump to 3.3.

    3.1  Test the HDF5 file write functionality  
       Start two terminals and run the following two commands in order  
       
        On Terminal 1 
        > ./run.app.sh write 
                      ......  
                      [Test Passed] Read data back on BB , no error!
                      [Test Passed] Read data back on BB , no error!
       
        On Terminal 2 
        > ./run.de.sh  write 
                       =====================================
                       Summary Information
                       =====================================
                        Total files        = 3
                        .....
                        Average write rate = 0.119557 GB/s
                       =====================================
        
        "./run.app.sh write" tests the write and read functions of Data Elevator.
        It writes three HDF5 files, which are redirected to a burst buffer,
        and then reads the data back from the burst buffer for verification.
        "./run.de.sh  write" moves the files from the burst buffer to the destination file system for archiving.

        One can also manually check the correctness of the files in the burst buffer and in the destination file system:
                //Compare the file on the burst buffer against a previously written file in the "examples" directory.
                > h5diff h5file-0.h5.on.bb  h5file-0.h5.right
              
                //Compare the file on the destination file system against the verified correct file
                > h5diff h5file-0.h5  h5file-0.h5.right
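Since the write test produces three files, the two h5diff checks above can be run over all of them with a small loop. A hypothetical helper sketch, assuming h5diff is in PATH and the files produced by "run.app.sh write" are in the current directory (pairs that are not present are reported as SKIP):

```shell
# Hypothetical helper loop: compare every written file against its reference
# copy. Requires h5diff in PATH and the files produced by the write test;
# pairs that are not present are reported as SKIP.
: > de_check.log
for i in 0 1 2; do
    for f in "h5file-$i.h5.on.bb" "h5file-$i.h5"; do
        if command -v h5diff >/dev/null 2>&1 && [ -f "$f" ]; then
            h5diff "$f" "h5file-$i.h5.right" && echo "OK: $f" >> de_check.log
        else
            echo "SKIP: $f" >> de_check.log
        fi
    done
done
cat de_check.log
```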

    3.2  Test the HDF5 Read functionality
        This test is done using two terminals.
        On terminal 1: 
        
        > ./run.app.sh prefetch
                      [Test Passed] Prefetch data from Disk to BB , no error!
                      [Test Passed] Prefetch data from Disk to BB , no error!
         
        On terminal 2: 
        > ./run.de.sh  prefetch
                      =====================================
                      Summary Information for Prefetch
                      =====================================
                      Total prefetch batch      =  51
                      Total prefetch time (max)    = 0.071189 seconds, ave = 0.001396
                      Total prefetch time (min)    = 0.000546 seconds, ave = 0.000011
                      Prediction time, sum = 0.070649, predict_counts = 51, ave = 0.001385 s
                      =====================================

        "./run.app.sh prefetch" tests the prefetch function in Data Elevator that supports reads.
        It reads the file "prefetch-100by100.h5" from the file system in 10 by 10 chunks.
        "./run.de.sh  prefetch" reads chunks from "prefetch-100by100.h5" and stores them in the burst buffer.

    3.3  Test on Cori @ NERSC using Slurm batch scripts in the "examples" directory.

         //For write test
         > sbatch test-write-cori.sh   
                  //Check "de-test-bb-write.*.out" file for [Test Passed]
                 
         //For read test
         > sbatch test-read-cori.sh   
                  //Check "de-test-bb-read.*.out" file for [Test Passed]
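The shipped scripts are authoritative; roughly, such a Cori batch script combines a burst-buffer reservation with launching both the application and the Data Elevator job. The sketch below writes out an illustrative script: every #SBATCH/#DW value and srun line is an assumption, not the contents of test-write-cori.sh.

```shell
# Write out a sketch of a Cori burst-buffer batch script. All values are
# illustrative assumptions; see examples/test-write-cori.sh for the real one.
cat > de-write-sketch.sh <<'EOF'
#!/bin/bash
#SBATCH -N 2                   # nodes (assumed)
#SBATCH -t 00:10:00            # wall time (assumed)
#SBATCH -C haswell             # Cori partition (assumed)
#DW jobdw capacity=100GB access_mode=striped type=scratch   # burst-buffer request (assumed)
module unload darshan
srun -n 4  ./dejob &           # Data Elevator job draining the burst buffer (invocation assumed)
srun -n 32 ./test h5file.h5    # the example application writing via the VOL (invocation assumed)
wait
EOF
echo "wrote de-write-sketch.sh"
```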
    
4, Using Data Elevator with an application code.
Please refer to "examples/test.c" for a reference on how to add the Data Elevator external VOL connector.

To use the HDF5 Data Elevator, the following code has to be added to an application, before creating/opening an HDF5 file, to register the VOL connector.
          
            #include <data_elevator_vol.h>
            .....
            int mpi_size, mpi_rank;
            MPI_Init(&argc, &argv);
            MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
            MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

            /* Create an MPI-IO file access property list */
            hid_t de_fapl = H5Pcreate(H5P_FILE_ACCESS);
            H5Pset_fapl_mpio(de_fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

            /* Register the Data Elevator VOL connector */
            hid_t de_vol_id = H5VLregister_connector(&H5VL_data_elevator_g, H5P_DEFAULT);

            /* Stack the Data Elevator connector on top of the current VOL */
            H5VL_data_elevator_t under_vol;
            hid_t under_vol_id;
            void *under_vol_info;
            H5Pget_vol_id(de_fapl, &under_vol_id);
            H5Pget_vol_info(de_fapl, &under_vol_info);
            under_vol.under_vol_id = under_vol_id;
            under_vol.under_object = under_vol_info;
            H5Pset_vol(de_fapl, de_vol_id, &under_vol);
            ..... 

Then, use "de_fapl" as the file access property list to either create or open a file. 

            file_id  = H5Fcreate(file_name, H5F_ACC_TRUNC, H5P_DEFAULT, de_fapl);     
            file_id  = H5Fopen(file_name, H5F_ACC_RDONLY, de_fapl);

Finally, unregister the connector and close the property list.

            H5VLunregister_connector (de_vol_id);  
            H5Pclose(de_fapl);  

Note: for the parallel case, you may need the environment variable below to disable HDF5 file locking:

            export HDF5_USE_FILE_LOCKING=FALSE


5, Other 
   One may need the following commands to regenerate the "configure" file for HDF5. 

            aclocal 
            autoheader
            automake --add-missing --copy --force-missing
            autoconf

6, Citations:

    Bin Dong, Suren Byna, Kesheng Wu, Prabhat, Hans Johansen, Jeffrey N. Johnson, 
    and Noel Keen, "Data Elevator: Low-contention Data Movement in Hierarchical Storage 
    System", HiPC 2016. 
    Paper: https://sdm.lbl.gov/~sbyna/research/papers/201612_DataElevator_HiPC2016_Bin_Byna.pdf
    Presentation: https://sdm.lbl.gov/~sbyna/research/papers/201612_DataElevator_HiPC2016_slides.pdf 
    

    Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, and Suren Byna, "ARCHIE: Data Analysis 
    Acceleration with Array Caching in Hierarchical Storage", IEEE Big Data 2018
    Paper: https://sdm.lbl.gov/pdc/pubs/201812-BigData-ARCHIE.pdf