
HEP-Frame

HEP-Frame (Highly Efficient Pipelined Framework) is a software engineered framework that aids the development of parallel scientific code with a pipeline structure. In the context of High Energy Physics, HEP-Frame aims to improve physicists' coding productivity and robustness, while ensuring efficient parallel execution of the resulting application on a wide set of multicore computing platforms. The framework is custom-designed for this problem domain to provide a user-friendly interface without sacrificing code execution efficiency.

Installation

Requirements

It is necessary to have the following packages installed:

Procedure

If you're using clang, some compilation options may need to be changed. Edit the Makefile's CXXFLAGS variable as needed.

Download and unzip the HEP-Frame folder to a directory of your choosing. Be sure to have ROOT properly installed and configured. Enter the scripts directory and execute the install.sh script as follows:

  • ./install.sh, which also installs the Boost library

  • ./install.sh /My/Boost/Dir, if Boost is already installed in a custom directory

Do not forget to add the Boost library directory to the LD_LIBRARY_PATH environment variable in your bash session, e.g. export LD_LIBRARY_PATH=/Boost/Dir/lib/:$LD_LIBRARY_PATH

It may take a long time to install all tools.

Updating HEP-Frame

In the scripts folder, run the update.sh script and the framework will update automatically:

  • ./update.sh, which also installs the Boost library

  • ./update.sh /My/Boost/Dir, if Boost is installed in a custom directory

A New Analysis

Creating an analysis

Executing the newAnalysis.sh script without any parameters prints a message with the expected inputs and their order. Run it as ./newAnalysis.sh AnalysisName /root/tuple/dir Tree, where Tree is the name of a specific tree; if it is left unset, the script will use the first tree in the file. If the ROOT tuple was generated with Delphes, replace Tree with delphes. A folder with the analysis name is created inside the Analysis directory, containing all required files inside src. The relevant generated files are:

  • AnalysisName.cxx - write all the code for the analysis here (cuts, initialization, etc.), following the template in the file. Warning: the name cannot contain special characters (-, &, ~, >, etc.)

  • AnalysisName_Event.h - the event information, as stored in the input ROOT file, is available here; you can also add your own variables

  • AnalysisName_cfg.cxx - add the event variables that you want to save per cut in the specified section of this file (use pdfs at your own risk - not fully tested); they are later written to a ROOT file with a TTree per cut

Don't forget to update AnalysisName.cxx with the number of cuts in the analysis! Otherwise, the event variables won't be stored properly. The variable to edit is in the main function.

IMPORTANT

If you want to run with ROOT tuples generated by Delphes, open lib/Makefile, uncomment the line after "# With delphes inputs", and create your analysis with ./newAnalysis.sh AnalysisName /root/tuple/dir delphes

Coding an analysis

Create cuts following the sample cut provided in the analysis template. Code what you want to perform on an event; the cuts will be automatically applied to all events in the input files. Add each cut to the analysis with anl.addCut("cut_name", cut_function);. Cuts are executed in the order they are added. Do not forget to update the number_of_cuts variable in the main function so that the framework executes all cuts properly. You can code as many auxiliary functions as you want.

Event variables are accessed as if they were in global memory. Writing cout << lep1 << endl; will print the lep1 variable of the current event being processed. HEP-Frame takes care of applying the code to all events.

TH1D, TH2D, and TH3D histograms are loaded automatically into global memory. Histograms at the top level of the input ROOT file are accessed as th1d["root"]["hist_name"]->Print();. Histograms inside directories of the input ROOT file are also loaded, and are accessed as th1d["directory_name"]["hist_name"]->Print();.

Compiling an analysis

To compile the code just type make, which cleans and rebuilds the code. It may take some time to compile. The code will be automatically parallelized, and the variables you want to store (defined in the _cfg.cxx file) will be saved for every event per cut. To execute the code sequentially, type export SEQ=yes in your terminal and recompile. To revert this, type unset SEQ.

Executing an analysis

After compilation, the analysis binary is stored in the bin folder. It receives at least one input parameter:

  • -f - the input ROOT file to process - mutually exclusive with -d

  • -d - a directory with input ROOT files to process (all must contain a TTree with the same name) - mutually exclusive with -f

  • -r - The name of the file to store the event variables saved per cut

  • -s - The name of the file with the signal pdfs (optional - beta feature)

  • -b - The name of the file with the background pdfs (optional - beta feature)

  • Usage example: ./analysis -r rec_vars_file -d ../dir/somanyfiles/

Advanced usage of the framework

Some pro-tips:

  • export HEPF_NUM_THREADS=X - defines the number of threads to use

  • Add as many .cxx files as you want to your analysis src directory; they will be automatically compiled

  • PseudoRandomGenerator prn; - define as many PRNGs as you want. Initialize each before use by calling prn.init(YOUR_NUMBER_OF_THREADS); and prn.initialize(avg, stddev);

  • PRNG functions prn.uniformTrand(); and prn.gaussianTrand(); use ROOT TRandom3

  • PRNG functions prn.uniform(); and prn.gaussian(); use the PCG PRNG (faster and statistically better, see www.pcg-random.org)

  • If you have compute-intensive cuts and only need to store variables of the events that pass all cuts, consider using the improved scheduler (export HEPF_SCHEDULER=yes, experimental). This scheduler will find the best order for your cuts and update it at runtime. If you have dependencies between cuts, define them in the main function after adding all cuts, and before anl.run(), with anl.addCutDependency("cut_name1", "cut_name2");. This ensures that "cut_name2" will be executed after "cut_name1"

  • export HEPF_THREAD_BALANCE=yes before make to enable simultaneous loading/initialization of the input data and processing

  • export HEPF_MPI=yes before make to enable compilation with MPI

  • export HEPF_DEBUG=yes before make will compile the framework and your analysis with debug information

  • export HEPF_INTEL=yes before make will compile the framework and your analysis with the Intel compiler

  • export HEPF_GPU=yes before make will enable NVIDIA GPU support for PRNG (much faster for PRN-intensive analyses)

Recording Variables per Pipeline Stage

User and dataset variables can be automatically stored by HEP-Frame if indicated by the user. To store a variable, edit the specific section of the AnalysisName_cfg.cxx file:

// Write here the variables and expressions to record per cut

#ifdef RecordVariables

#endif

You should insert the variables you wish to store per pipeline stage. Let's assume that a given dataset has the following variables:

int v1;

int a1[5];

Class c1;

vector<float> a2;

These variables can be stored as follows:

// Write here the variables and expressions to record per cut

#ifdef RecordVariables

v1 -> stores the scalar

a1 -> stores every position of the array a1 [0 to 4]

c1.getValue() -> stores the result of the getValue() method

a2[0] -> stores the position 0 of the a2 vector

#endif

Moreover, HEP-Frame also supports arithmetic:

// Write here the variables and expressions to record per cut

#ifdef RecordVariables

v1 + a1[2] -> stores the scalar v1, a1[2], and the result of v1 + a1[2]

a2[0] / a2[1] - a2[2] -> stores a2[0], a2[1], a2[2], and the result of a2[0] / a2[1] - a2[2]; you must ensure that the vector has at least 3 elements, otherwise it will crash

#endif

The variables to store must be declared in the AnalysisName_Event.h file (you can add your own variables there). You should not edit or remove any other section of the AnalysisName_cfg.cxx file.

These variables will automatically be stored for each pipeline stage for each dataset element. A ROOT output file will be created with this data.

List of publications

  • A. Pereira, A. Onofre, and A. Proenca, “Removing Inefficiencies from Scientific Code: The Study of the Higgs Boson Couplings to Top Quarks,” in Proceedings of the 14th International Conference on Computational Science and Its Applications. Springer International Publishing, 2014, pp. 576–591.

  • A. Pereira, A. Onofre, and A. Proenca, “Tuning Pipelined Scientific Data Analyses for Efficient Multicore Execution,” in Proceedings of the International Conference on High Performance Computing Simulation (HPCS). IEEE, 2016, pp. 751–758.

  • A. Pereira, A. Onofre, and A. Proenca, “HEP-Frame: A Software Engineered Framework to Aid the Development and Efficient Multicore Execution of Scientific Code,” in Proceedings of the 2015 International Conference on Computational Science and Computational Intelligence. IEEE, 2015, pp. 615–620.

  • A. Pereira and A. Proenca, “Efficient Use of Parallel PRNGs on Heterogeneous Servers,” in Proceedings of the International Conference on Mathematical Applications. Institute of Knowledge and Development, 2018, pp. 7–12.
