README

This is the RADIO secure summation protocol (RASSP). RASSP implements the privacy-preserving data mining protocol described by Zamani et al. (2016).

RASSP is implemented as an Akka cluster. The cluster must contain at least 10 nodes, and all nodes are organized in groups of 10. If any node in a group is unresponsive, that whole group is excluded from the query result.
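The group rule can be illustrated with a minimal Java sketch. It assumes that nodes are identified by plain strings, that the node count is a multiple of 10, and that groups are formed by slicing a list in order; in a real deployment group membership comes from each node's configuration. The class and method names are hypothetical and are not part of the RASSP code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Illustrative only: hypothetical helper, not part of the RASSP code base. */
final class GroupFilter {
    static final int GROUP_SIZE = 10;

    /** Splits nodes into consecutive groups of GROUP_SIZE; rejects incomplete groups. */
    static List<List<String>> partition(List<String> nodes) {
        if (nodes.size() < GROUP_SIZE || nodes.size() % GROUP_SIZE != 0) {
            throw new IllegalArgumentException(
                    "node count must be a positive multiple of " + GROUP_SIZE);
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i += GROUP_SIZE) {
            groups.add(new ArrayList<>(nodes.subList(i, i + GROUP_SIZE)));
        }
        return groups;
    }

    /** Keeps only the groups in which no node is unresponsive. */
    static List<List<String>> contributingGroups(List<List<String>> groups,
                                                 Set<String> unresponsive) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> group : groups) {
            boolean allUp = true;
            for (String node : group) {
                if (unresponsive.contains(node)) {
                    allUp = false;
                    break;
                }
            }
            if (allUp) {
                result.add(group);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            nodes.add("node" + i);
        }
        List<List<String>> groups = partition(nodes);
        // node13 is down, so the second group (node11..node20) is dropped.
        List<List<String>> ok = contributingGroups(groups, Collections.singleton("node13"));
        System.out.println(ok.size() + " of " + groups.size() + " groups contribute");
    }
}
```

With 20 nodes and one unresponsive node, only one of the two groups contributes to the result, which is why a single failure removes ten nodes' worth of data from a query.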

To run a query in the system, a RASSP client must be used.

This repository has three modules:

  • The implementation of the RASSP protocol is in directory proto. This code must run on every node that participates in a RASSP cluster, and provides the primitive computation functions.

  • The implementation of complex statistical functions using the RASSP primitives above is in directory stats. Use this code to develop Java client applications that perform a computation on a RASSP cluster.

  • The implementation of an R interface to the statistical functions above is in directory RStats. Use this code to develop R client applications that perform a computation on a RASSP cluster.

SET UP

Follow these instructions to set up the system on Linux:

  • Install Java (JDK and JRE), version 1.8 or later

  • Set JAVA_HOME and PATH environment variables properly:

    export JAVA_HOME=<path-to-java>
    export PATH=$JAVA_HOME/bin:$PATH

  • Install Apache Maven, v3.3 or higher. Tested with v3.3.9.

  • Install R. Tested with v3.2.3.

    apt-get update
    apt-get install r-base

CLIENT SET UP

  • Install RStudio:

  • Go to the RStudio website, download the version that fits your system, and install it.

  • For instance, if you download a .deb file, you can install it by typing:

    dpkg -i <path-to-file>/<name-of-file.deb>

  • Go to R (or to RStudio) and install the rJava CRAN package (v0.9-8):

    install.packages('rJava')

    Alternatively, go to your system's console and type:

    apt-get update
    apt-get install r-cran-rjava

  • Go to the R console (or to RStudio) and install the gridExtra, gtable and grid packages, in that order:

    install.packages('gridExtra')
    install.packages('gtable')
    install.packages('grid')

INSTALLATION

The following steps show how to install the project successfully.

git clone https://bitbucket.org/dataengineering/rassp
cd rassp
git checkout feat-multicluster
mvn clean install && mvn dependency:copy-dependencies

CLIENT INSTALLATION

  1. If you use RStudio, do the following:

    • Open RStudio and load the RStats project:

      File -> Open Project -> <path_to_dir> -> rassp -> RStats -> radioStatistics.Rproj

    • Build the RStats project by pressing the Build and Reload button in the Build tab

    • Load the radioStatistics package, as well as the other packages (rJava, gridExtra, gtable, grid) that you have already installed:

      • Go to the Packages tab and tick the packages that you need to load

      • Otherwise, execute the library() command in the R console. For instance, to load the radioStatistics package:

      library('radioStatistics')

    • Load and run the sourceFiles script:

      • Go to the R console of RStudio and source the sourceFiles script:

      source('<dir-of-project>/rassp/RStats/R/sourceFiles.R')

      • Then run it:

      sourceFiles()

  2. If you use the R console directly, do the following:

    • Start R within your package directory, <dir_of_project>/rassp/RStats

    • Install devtools by typing:

      install.packages("devtools")

    • Then load the devtools package:

      library(devtools)

    • Then, to build the package, type:

      build()

    • And finally install it:

      install()

    • To use our package, type:

      library(radioStatistics)

You are now ready to execute the secure statistics from the rassp package. For further information on running the secure statistics, see README_stats.md in the ./RStats directory.

RASSP NODE CONFIGURATION

RASSP is configured as an Akka cluster. All properties of the configuration are the same as in a typical Akka cluster configuration except the following:

cluster {
...
   roles = ["private-data-worker"]
...
}

All RASSP nodes must have the private-data-worker role.

rassp {
  peers = ["akka.ssl.tcp://benaloh@ip1:port1","akka.ssl.tcp://benaloh@ip2:port2",...,"akka.ssl.tcp://benaloh@ip20:port20"]
  group-proxies = ["akka.ssl.tcp://benaloh@seedip1:seedport1","akka.ssl.tcp://benaloh@seedip2:seedport2"]
}

In the peers field, define the IPs and ports of all the RASSP nodes in the node's group (including the node itself). In the group-proxies field, define all the nodes that act as group proxy for the group. A group proxy is responsible for collecting the query results of all nodes in its group, and every group must have one group proxy.
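The constraints on these two fields can be checked before a node is started. The sketch below is a minimal illustration in Java: it assumes addresses always have the form akka.ssl.tcp://system@host:port with an IPv4 address or host name, and its class and method names are hypothetical rather than part of RASSP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: hypothetical sanity check for the peers and group-proxies lists. */
final class PeerListCheck {
    // Assumed address shape: akka.ssl.tcp://<system>@<host>:<port> (no IPv6 literals).
    private static final Pattern ADDRESS =
            Pattern.compile("akka\\.ssl\\.tcp://([^@/:]+)@([^@/:]+):(\\d{1,5})");

    static boolean isValidAddress(String address) {
        Matcher m = ADDRESS.matcher(address);
        if (!m.matches()) {
            return false;
        }
        int port = Integer.parseInt(m.group(3));
        return port >= 1 && port <= 65535;
    }

    /** Returns a list of human-readable problems; an empty list means the lists look sane. */
    static List<String> problems(String self, List<String> peers, List<String> groupProxies) {
        List<String> problems = new ArrayList<>();
        List<String> all = new ArrayList<>(peers);
        all.addAll(groupProxies);
        for (String address : all) {
            if (!isValidAddress(address)) {
                problems.add("malformed address: " + address);
            }
        }
        // The peers list must include the node that owns the configuration.
        if (!peers.contains(self)) {
            problems.add("peers must include the node itself: " + self);
        }
        // Every group needs a group proxy.
        if (groupProxies.isEmpty()) {
            problems.add("group-proxies must name at least one group proxy");
        }
        return problems;
    }

    public static void main(String[] args) {
        String self = "akka.ssl.tcp://benaloh@10.0.10.10:2552";
        // 10.0.10.11 is a made-up address used only for this example.
        List<String> peers = Arrays.asList(self, "akka.ssl.tcp://benaloh@10.0.10.11:2552");
        List<String> proxies = Arrays.asList(self);
        System.out.println(problems(self, peers, proxies)); // prints []
    }
}
```

The check deliberately does not enforce a list length, since the source only states the group size in prose.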

An example configuration can be found at proto/src/resources/application.conf. For each node the user must define the following values:

  • hostname: the node's IP
  • port: the port for the Akka service
  • key-store: the keystore location
  • key-store-password: the keystore password
  • trust-store: the truststore location
  • trust-store-password: the truststore password
  • seed-nodes: the URL of at least one seed node, e.g. akka.ssl.tcp://benaloh@10.0.10.10:2552
  • peers: the URLs of all the nodes in that node's group
  • group-proxies: the URLs of all the group proxies in the cluster

The path to the configuration file is set at runtime with -Dconfig.file=/path/to/application.conf.

DEPLOY AND RUN

RASSP nodes are executed as Akka cluster nodes. All communication between the nodes is encrypted. Before running the nodes, a key pair must be created for each node and inserted in that node's local keystore. Access to this keystore is specified by the key-store and key-store-password configuration properties.

In addition, the public keys of all nodes must be inserted in each node's local truststore. Access to this truststore is specified by the trust-store and trust-store-password configuration properties.
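Before starting a node it can help to confirm that the keystore and truststore open with the configured passwords and contain the expected entries. The following is a minimal sketch using only the standard java.security.KeyStore API; it assumes the stores use the JVM's default keystore type, and the class name StoreCheck is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Collections;
import java.util.List;

/** Illustrative only: hypothetical helper that lists the aliases in a keystore or truststore. */
final class StoreCheck {
    /** Opens the store with the given password and returns the aliases it contains. */
    static List<String> aliases(Path file, char[] password)
            throws IOException, GeneralSecurityException {
        // Assumption: the store was created with the JVM's default keystore type.
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(file)) {
            // load() fails with an IOException if the password is wrong or the file is corrupt.
            store.load(in, password);
        }
        return Collections.list(store.aliases());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: StoreCheck <store-file> <password>");
            return;
        }
        for (String alias : aliases(Paths.get(args[0]), args[1].toCharArray())) {
            System.out.println(alias);
        }
    }
}
```

Run against a truststore, the output should list one alias per node whose public key was imported; run against a keystore, it should list the node's own key pair.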

It is recommended to first start the nodes that are seeds for the Akka cluster. The seed nodes are configured contact points for the initial, automatic join of the cluster (see the Akka documentation).

Each node can be started by issuing:

cd rassp/proto/target && java -cp "rassp-proto-0.0.1-SNAPSHOT.jar:dependency/*" -Ddb.file=/path/to/values.json -Dconfig.file=/path/to/application.conf gr.demokritos.iit.radio.home.protocols.RASSP

Gossip will then start between the nodes and the cluster will be set up.
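The start command relies on two system properties, config.file and db.file, both of which must point to existing files. A small pre-flight check such as the sketch below can report a missing or unreadable file before the node tries to join the cluster. This is an illustration only; the class name Preflight is hypothetical and RASSP itself may handle these errors differently.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical check of the -D properties used to start a node. */
final class Preflight {
    /** Returns the property names that are unset or do not point to a readable regular file. */
    static List<String> missing(String... propertyNames) {
        List<String> problems = new ArrayList<>();
        for (String name : propertyNames) {
            String value = System.getProperty(name);
            if (value == null) {
                problems.add(name);
                continue;
            }
            Path path = Paths.get(value);
            if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
                problems.add(name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = missing("config.file", "db.file");
        if (problems.isEmpty()) {
            System.out.println("config.file and db.file look usable");
        } else {
            System.err.println("unset or unreadable: " + problems);
        }
    }
}
```

It would be run with the same -Dconfig.file and -Ddb.file arguments as the node itself.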

RASSP CLIENT CONFIGURATION AND EXECUTION

The RASSP client also uses Akka to connect to the RASSP nodes. The RASSP node that will be the initial contact for the client must be defined in the configuration file.

cluster {

  client {
     ...
     initial-contacts = ["akka.ssl.tcp://benaloh@contact_ip:contact_port/system/receptionist"]
     ...
  }
}

An example configuration can be found at proto/src/resources/application-client.conf. For each node the user must define the following values:

  • hostname: the node's IP
  • port: the port for the Akka service
  • key-store: the keystore location
  • key-store-password: the keystore password
  • trust-store: the truststore location
  • trust-store-password: the truststore password
  • initial-contacts: the URL of the initial contact, which can be any member of the RASSP cluster

The path to the configuration file is set at runtime in the file RStats/R/onAttach.R, in the call .jcall("java/lang/System","S","setProperty","config.file","/path/to/application.conf"). By default the configuration file is expected at /root/application.conf.

The RASSP cluster can now be queried using the RASSP client. More info on how to query along with examples can be found at RStats/README_stats.md.

DATASET

Each RASSP node reads values from a JSON file. The file must have the following form:

{
  "var1" : "value1",
  "var2" : "value2"
}

The location of the JSON file is set at runtime with -Ddb.file=/path/to/values.json.

We offer a dataset to test and evaluate our statistics. The dataset is located in the folder dbFiles under ./rassp/stats/src/test/resources/.

The dataset includes a set of 'User_k_.json' files, where each file is a database whose schema is appropriate for the queries in this example.

Each file represents the sensitive data of one node. Each file should be deployed at one of the RASSP nodes.

Description of the dataset

  • Dependent Variable (DV) = Value (2 levels: Before/After)

  • Independent Variables (IV):

    • Between Participants

      • Age (2 levels: Young/Old)
      • Sex (2 levels: Male/Female)
    • Within Participants

      • Time (2 levels: Before/After)

Therefore, a representative example of a participant's values in .json format is:

{
  "age": "old",
  "sex": "F",
  "value_time_before": 9.5,
  "value_time_after": 7.1,
  "value_time_avg": 8.3,      /*average of value_time_after and value_time_before of the same participant*/
  "value_age_old_time_before": 9.5,
  "value_age_old_time_after": 7.1,
  "value_age_old_time_avg": 8.3   /*average of value_age_old_time_after and value_age_old_time_before of the same participant*/ 
}
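The derived fields in this record follow directly from the two measurements. The Java sketch below builds such a record from the raw before and after values and renders it as JSON. It is a minimal illustration: the class ParticipantRecord is hypothetical, the key pattern value_age_<age>_time_* is assumed to generalize from the "old" example to other age levels, and string values are assumed to contain no characters that need JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: hypothetical builder for one participant's values file. */
final class ParticipantRecord {
    /** Builds the record, deriving both average fields from the before and after values. */
    static Map<String, Object> build(String age, String sex, double before, double after) {
        double avg = (before + after) / 2.0;
        Map<String, Object> record = new LinkedHashMap<>(); // keeps insertion order
        record.put("age", age);
        record.put("sex", sex);
        record.put("value_time_before", before);
        record.put("value_time_after", after);
        record.put("value_time_avg", avg);
        // Assumed naming pattern for the age-specific copies of the same values.
        String prefix = "value_age_" + age + "_time_";
        record.put(prefix + "before", before);
        record.put(prefix + "after", after);
        record.put(prefix + "avg", avg);
        return record;
    }

    /** Renders a flat record as JSON; numbers are written bare, everything else quoted. */
    static String toJson(Map<String, Object> record) {
        StringBuilder sb = new StringBuilder("{\n");
        int written = 0;
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            Object value = entry.getValue();
            String rendered = (value instanceof Number) ? value.toString() : "\"" + value + "\"";
            sb.append("  \"").append(entry.getKey()).append("\": ").append(rendered);
            sb.append(++written < record.size() ? ",\n" : "\n");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(build("old", "F", 9.5, 7.1)));
    }
}
```

Unlike the annotated example above, the generated file contains no comments, since comments are not valid JSON.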

REFERENCES

A Peer-to-Peer Protocol and System Architecture for Privacy-Preserving Statistical Analysis. Katerina Zamani, Angelos Charalambidis, Stasinos Konstantopoulos, Maria Dagioglou and Vangelis Karkaletsis. In Proceedings of the Workshop on Privacy Aware Machine Learning for Health Data Science (PAML 2016), Salzburg, Austria, 31 August - 2 September 2016. DOI: 10.1007/978-3-319-45507-5_16. Full text at https://zenodo.org/record/61017