Project: Shortcuts through Colocation Facilities

Network overlays, running on top of the existing Internet substrate, are of perennial value to Internet end-users in the context of, e.g., real-time applications. Such overlays can employ traffic relays to yield path latencies lower than the direct paths, a phenomenon known as Triangle Inequality Violation (TIV). Past studies identify the opportunities of reducing latency using TIVs. However, they do not investigate the gains of strategically selecting relays in Colocation Facilities (Colos). In this work, we answer the following questions: (i) how Colo-hosted relays compare with other relays as well as with the direct Internet, in terms of latency (RTT) reductions; (ii) what are the best locations for placing the relays to yield these reductions. To this end, we conduct a large-scale one-month measurement of inter-domain paths between RIPE Atlas (RA) nodes as endpoints, located in eyeball networks. We employ as relays PlanetLab nodes, other RA nodes, and machines in Colos. We examine the RTTs of the overlay paths obtained via the selected relays, as well as the direct paths. We find that Colo-based relays perform the best and can achieve latency reductions against direct paths, ranging from a few to 100s of milliseconds, in 76% of the total cases; ∼75% (58% of total cases) of these reductions require only 10 relays in 6 large Colos.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes (e.g., for replicating the measurement study results during a different time frame).

Prerequisites and Installation

First of all, the entire software was written in python3, which needs to be pre-installed on your system.

Install pip3:

sudo apt-get install python3-pip

In order to isolate the following installation and runs from other parts of the system, we can run everything in a python3 venv environment. This can be done according to the instructions in the python3 venv tutorial.

Please follow the aforementioned guide to set up such an environment on your system.

Then, install the required python3 packages within the venv:

pip3 install -r requirements.txt

In case a required dependency is missing please contact vkotronis at ics.forth.gr.

Running the software

The software is composed of different modules, categorized in 3 categories:

1) Measurement, 2) Analysis, and 3) Visualization. In this README we do not go into detail on all modules comprising the code, but provide the needed information to have a basic setup up and running, as well as process and visualize the datasets that have been already collected during this study. If your intention is only the latter one, then you can skip the first category (Measurement).

Note that offline steps that precede the basic RIPE Atlas measurements such as:

  1. discovery and verification of Colo IPs
  2. setup of PlanetLab nodes
  3. search of eyeball ASes based on user coverage
  4. manual verification of eyeball ASes

are not included in the current README for brevity. For more details on those steps, please check the original IMC publication, or contact vkotronis at ics.forth.gr.

Measurement

The main measurement execution script is super_script.py, which runs on a server of your choice and orchestrates all measurements for all measurement rounds, performing the following functions per measurement round:

  1. fetch all updated RIPE Atlas nodes belonging to eyeballs, after proper classification
  2. fetch all updated pingable alive PlanetLab nodes, and geolocate them
  3. fetch all verified pingable Colo IPs, and geolocate them (verification is an offline step)
  4. generate sample of RIPE Atlas nodes for RAE endpoints based on (CC,ASN) (1 per country, only for eyeball probes, step (1) of the workflow of Section 2.5)
  5. generate sample of RIPE Atlas nodes for RAR relays based on (CC,ASN) (1 per country, both for eyeball and all probes)
  6. generate sample of colo IPs for COR relays based on facility information (1-3 IPs per facility)
  7. generate sample of PL nodes for PLR relays based on site information (1-3 nodes per site)
  8. issue and fetch (RAE,RAE) measurements to retrieve direct RTTs (step (2) of the workflow of Section 2.5)
  9. for each (RAE,RAE) pair calculate the feasible RAR, PLR and COR relays based on the direct RTT and the geo_latency (step (3) of the workflow of Section 2.5)
  10. issue and fetch all (RAE,RAE), (RAE,RAR), (RAE,COR), (RAE,PLR) measurements (step (4) of the workflow of Section 2.5)
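Step 9 of the list above can be pictured as a geographic filter. The following is a minimal sketch under stated assumptions, not the code used by super_script.py: it assumes that geo_latency is a lower bound on the RTT derived from the great-circle distance at roughly two thirds of the speed of light, and that a relay counts as feasible when this bound for the src-relay-dst path is below the measured direct RTT. The function names and the propagation constant are hypothetical; please check the IMC publication for the exact criterion.

```python
import math

# Assumption: signals travel at about 2/3 of c in fiber, i.e. ~200 km per ms.
KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geo_rtt_ms(a, b):
    """Lower bound on the round-trip time (ms) between two points."""
    return 2 * great_circle_km(a, b) / KM_PER_MS


def is_feasible(src, dst, relay, direct_rtt_ms):
    """True if the relayed path could, in principle, beat the direct RTT."""
    return geo_rtt_ms(src, relay) + geo_rtt_ms(relay, dst) < direct_rtt_ms
```

With such a filter, relays that are geographically too far away to ever beat the direct path are never measured, which saves RIPE Atlas credits.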

The script is parameterized in-code, by adjusting the following parameters:

ALL_MSM_DATA_DIR = <PATH_TO_MSM_DIR_WHERE_ALL_DATA_IS_STORED>
ALL_MSM_SCRIPT_DIR = os.path.dirname(os.path.realpath(__file__))
MSM_INTERVAL = 12 * 3600  # interval (seconds) between the 10-step measurement rounds
MIN_STEP = 1 # numeric id of the first step to be done per round, always set to 1
MAX_STEP = 10 # numeric id of the last step to be done per round, always set to 10
MAX_INTER_STEP_DURATION = 1 * 3600  # maximum duration (seconds) to wait between consecutive steps, if the previous fails (e.g., if the script dies at a certain point, and then after re-initiation by the monitoring script it still dies at a certain step, then the max waiting interval to ditch this round and proceed to the normal rotation is this parameter)
EYEBALL_CUTOFF = 10.0 # user coverage cutoff percentage, to consider an AS as an eyeball in a certain country
MAX_IPS_PER_FAC = 3 # maximum number of Colo IPs sampled per facility
MAX_PLS_PER_SITE = 2 # maximum number of PLR IPs sampled per site
PING_INTERVAL = 5 * 60  # interval (seconds) between consecutive pings for a specific pair of nodes within a measurement window
PING_PKTS = 6 # number of pings for a specific pair of nodes per measurement window
RUN_MSM = True # flag to run the actual measurements, instead of simply testing the code
CHECK_NET = True # flag to instruct the server which runs the measurements to check its network connectivity before proceeding to any steps requiring Internet connection (such as step (1))
PY3_BIN = '/usr/bin/python3' # the default location of the python executable
END_DATE_TIMESTAMP = 1496275199  # the final timestamp (unix epochs), after which no more measurements are done on the server
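To get a feeling for how MSM_INTERVAL and END_DATE_TIMESTAMP interact, the sketch below enumerates the start times of the rounds that would fit before the end date. It is only an illustration: the helper round_start_times and the start time used in the example are hypothetical, and the actual scheduling logic lives in super_script.py.

```python
MSM_INTERVAL = 12 * 3600          # seconds between rounds, as in the listing above
END_DATE_TIMESTAMP = 1496275199   # last allowed UNIX timestamp, as in the listing above


def round_start_times(first_start, interval=MSM_INTERVAL, end=END_DATE_TIMESTAMP):
    """Yield the UNIX start time of every round that begins on or before `end`."""
    t = first_start
    while t <= end:
        yield t
        t += interval


# Hypothetical first round at 2017-05-01 00:00:00 UTC.
starts = list(round_start_times(1493596800))
print(len(starts), "rounds would be scheduled")
```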

To run within venv:

python3 -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate
python3 super_script.py
deactivate

Note that for running RIPE Atlas measurements yourself, you need a list of API keys present at the location:

<home>/.atlas/auth.json.

Also, note that we include a bash script, monitor_super_script.sh, intended to run as a Linux cron job that constantly monitors the super_script.py process and revives it in case it dies (e.g., because the measurement server crashes or reboots during the long collection period). The revival is soft, meaning that the script will potentially resume from the failed step; in the worst case, e.g., if the server has been dead for multiple hours/days, the current measurement round is aborted to avoid inconsistent/stale data, and the measurement continues at the time of the next expected measurement round.

Analysis

For the analysis part, you need to have a folder where all measurement data (collected by yourself using the measurement script or collected by us for the study) reside. For simplicity, let's name this folder ALL_MSM_DATA_DIR. It will contain all tar.gz files of all measurement rounds, code-named as follows: msm_<msm_round_id>.tar.gz.
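If you want to check which rounds are present in the folder, or unpack one of them for manual inspection, something along the following lines should work. This is a small sketch that relies only on the msm_<msm_round_id>.tar.gz naming stated above and assumes the round ID is numeric; the helper names are hypothetical and are not part of the repository.

```python
import re
import tarfile
from pathlib import Path

ROUND_ARCHIVE = re.compile(r"^msm_(\d+)\.tar\.gz$")


def find_round_archives(data_dir):
    """Return {round_id: path} for every msm_<id>.tar.gz found in data_dir."""
    rounds = {}
    for path in Path(data_dir).iterdir():
        match = ROUND_ARCHIVE.match(path.name)
        if match:
            rounds[int(match.group(1))] = path
    return rounds


def extract_round(archive_path, target_dir):
    """Unpack one round archive into target_dir (only use on trusted archives)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)
```

For example, `sorted(find_round_archives("<ALL_MSM_DATA_DIR>"))` lists the available round IDs in increasing order.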

For this study, you can download the datasets here.

The format of the measurement data is described in detail later (see Online Resources --> Measurement Datasets).

The next steps suffice to generate what is stated in the paper. We will include more analysis commands that are available in the software at a later stage.

Step 0:

cd analysis

Step 1: retrieve all valid ping medians from measurements

python3 get_valid_ping_medians.py -m <ALL_MSM_DATA_DIR>

This will produce, within ALL_MSM_DATA_DIR, json files named msm_<round_num>_valid_ping_medians.json, as dictionaries with format:

  1. keys: (RAE_src, RAE_dst) tuple
  2. values:
    1. rae2plr: key = PLR relay name, value = median latency of relayed path, cc, continent, asn
    2. rae2cor: key = COR relay IP, value = median latency of relayed path, cc, continent, asn, facility
    3. rae2rar_eye: key = RAR_eye probe/anchor ID, value = median latency of relayed path, cc, continent, asn
    4. rae2rar_other: key = RAR_other probe/anchor ID, value = median latency of relayed path, cc, continent, asn
    5. rae2rae: key = RAE probe/anchor ID, value = median latency of direct path, cc, continent, asn

This step is prerequisite for all following steps (including visualization).
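As an illustration of how these dictionaries can be used, the sketch below picks the best relay of a given type for one RAE pair and compares it with the direct path. It works on a toy in-memory entry: the field name "median", the IDs, the IPs and the latencies are all made up for the example, and the way the (RAE_src, RAE_dst) keys are serialized in the actual json files may differ.

```python
def best_improvement(pair_entry, relay_type):
    """Return (relay, gain_ms) for the lowest-latency relay of one type.

    gain_ms is the direct RTT minus the relayed RTT, so a positive value
    means that the relay beats the direct path. Returns None if the pair
    has no direct RTT or no relay of the requested type.
    """
    direct = [v["median"] for v in pair_entry.get("rae2rae", {}).values()]
    relays = pair_entry.get(relay_type, {})
    if not direct or not relays:
        return None
    relay, info = min(relays.items(), key=lambda kv: kv[1]["median"])
    return relay, direct[0] - info["median"]


# Toy entry for one (RAE_src, RAE_dst) pair; all values are hypothetical.
example = {
    "rae2rae": {"6002": {"median": 180.0}},
    "rae2cor": {"192.0.2.10": {"median": 150.0}, "192.0.2.20": {"median": 165.0}},
    "rae2plr": {"node1.example.org": {"median": 190.0}},
}

print(best_improvement(example, "rae2cor"))
print(best_improvement(example, "rae2plr"))
```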

Step 2: retrieve statistics for valid measurements (pings), used in first paragraph of Sec. 3

python3 get_valid_ping_stats.py -m <ALL_MSM_DATA_DIR>

This will generate terminal output of the following form:

...
all_msm_data/msm_17 - Is valid: True - Last step: 10
Removed rae2rae pairs with common rae2rar_eye relays: 60004
Pairs without responses: 234015 - 16.01%
Pairs with less than three responses: 1729 - 0.12%
Pairs with more than three responses: 1225773 - 83.87%
Rae2rae pairs: 90223
Rae2rae pairs with cor: 90223
Rae2rae pairs with plr: 90203
Rae2rae pairs with eye: 90223
Rae2rae pairs with other: 90204
Rae2rae pairs with all: 90203
Total pings: 8744146

Step 3 (optional): retrieve results related to the node set sizes employed in the measurements

python3 get_node_set_sizes.py -m <ALL_MSM_DATA_DIR> -o <ALL_MSM_DATA_DIR>/node_set_sizes.json 

This will generate, within ALL_MSM_DATA_DIR, a node_set_sizes.json, as a dictionary with the following format:

  1. key = ID of measurement round
  2. values = dictionary with following key-values:
    1. planetlab: sampled, feasible, total nodes
    2. colos: sampled, feasible, total nodes
    3. ripe_atlas: eyeball_total, other_total, total, feasible (rar_eye, rar_other), sampled (rae, rar_eye, rar_other)
    4. start_timestamp (in UNIX epochs)
    5. end_timestamp (in UNIX epochs)

Step 4 (optional): retrieve results related to the timing of the measurements

python3 get_msm_times.py -m <ALL_MSM_DATA_DIR> -o <ALL_MSM_DATA_DIR>/msm_times.json

This will generate, within ALL_MSM_DATA_DIR, a msm_times.json, as a dictionary with the following format:

  1. key = ID of measurement round
  2. value = dictionary with key = number of round step, and value = start, end timestamps (in UNIX epochs)

Step 5: calculate top relay gain (in terms of percentages of improved pairs) vs number of relays

python3 get_relay_gain_vs_num.py -i <ALL_MSM_DATA_DIR> -p ../colo_tools/data/pdb_facs.json -o <ALL_MSM_DATA_DIR>/relay_gain.json

This will generate, within ALL_MSM_DATA_DIR, a relay_gain.json, as a dictionary with the following format:

  1. keys = rae2rar_other, rae2rar_eye, rae2plr, rae2cor
  2. values = cumulative percentage of improved pairs using 1 relay, 2 relays, etc.

Note that there is an all_instances extra key with value the total number of pairwise communications encountered.
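A typical question to ask of relay_gain.json is how many top relays are needed to reach a certain share of improved pairs. The sketch below answers it for one relay type, assuming that the value is a list whose i-th element is the cumulative percentage for the top i+1 relays; the helper name and the toy numbers are hypothetical.

```python
def relays_needed(cumulative_pct, target_pct):
    """Smallest number of top relays whose cumulative gain reaches target_pct.

    cumulative_pct[i] is the percentage of improved pairs when using the
    top i+1 relays. Returns None if the target is never reached.
    """
    for count, pct in enumerate(cumulative_pct, start=1):
        if pct >= target_pct:
            return count
    return None


# Toy cumulative curve (made-up numbers, not results from the study).
toy_curve = [30.5, 41.0, 47.2, 52.0, 55.1]
print(relays_needed(toy_curve, 50.0))
```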

Step 6: calculate top relay gains (in terms of percentages of improved pairs surpassing a certain latency threshold) vs number of relays

python3 get_relay_gain_vs_num_ms.py -i <ALL_MSM_DATA_DIR> -p ../colo_tools/data/pdb_facs.json -o <ALL_MSM_DATA_DIR>/relay_gain_ms.json

This will generate, within ALL_MSM_DATA_DIR, a relay_gain_ms.json, as a dictionary with the following format:

  1. keys = rae2rar_other, rae2rar_eye, rae2plr, rae2cor
  2. values = dictionary with following key-values:
    1. improvements: list of best latency improvements (msec) using 1 relay, 2 relays, etc.
    2. best_improvements: list of best latency improvements (msec) using all available relays
    3. instances: list of instances (pairwise communications), where 1 relay was used, 2 relays were used, etc.

Note that there is an all_instances extra key with value the total number of pairwise communications encountered.
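The thresholded view can be sketched as follows: given the latency improvements (msec) of the improved cases and the total number of pairwise communications (all_instances), compute the percentage of all cases whose improvement surpasses a threshold. This is an illustration only; it assumes a flat list of per-case improvements, which may not match the exact nesting used in relay_gain_ms.json, and the helper name is hypothetical.

```python
def pct_above_threshold(improvements_ms, all_instances, threshold_ms):
    """Percentage of all pairwise cases whose improvement exceeds threshold_ms."""
    if all_instances <= 0:
        raise ValueError("all_instances must be positive")
    hits = sum(1 for gain in improvements_ms if gain > threshold_ms)
    return 100.0 * hits / all_instances


# Toy data: 4 improved cases out of 8 pairwise communications in total.
toy_gains = [5.0, 12.0, 40.0, 110.0]
print(pct_above_threshold(toy_gains, 8, 10.0))
```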

Step 7: calculate top-N relays (in terms of percentages of improved pairs) over all measurement data, and collect their features (paragraph "Features of Top Facility Relays" of Sec. 3)

python3 get_top_relays.py -i <ALL_MSM_DATA_DIR> -p ../colo_tools/data/pdb_facs.json -n <N>

This will generate terminal output of the form:

----TYPE rae2cor----
top-1 RELAY = 195.2.24.177:
USES = 27543 (30.45478167604684 % of all cases, 39.87809115654138 % of all good cases)
REL COUNTRY = NL
REL FAC = Equinix Amsterdam South East (AM7) (62)
top-2 RELAY = 195.2.30.202:
...
---TOP FACS---
---TOP-1---
ID = 34, NAME = Telehouse London (Docklands North), CITY = London, COUNTRY = GB
NET COUNT = 361
RELAY LIST = ['41.188.60.215', '41.188.60.194', '197.149.9.7']
USES = 32122 (35.51786286889506 % of all cases, 46.50778942491458 % of all good cases)
TOP COUNTRIES
KW (1881))
PE (1591))
DO (1544))
PR (1471))
ZA (1441))
...

Visualization

This step assumes that the analysis of the previous steps is already done. Here, we include only the code snippets to generate the figures presented in the paper. We will include more commands that are available in the software at a later stage.

Step 0

cd viz_tools

Step 1, Fig. 1: Number of covered ASes/countries (log-scale) worldwide vs. the cutoff Internet user coverage (coverage for each AS in its respective country of operation).

--> please contact gnomikos at ics.forth.gr for generating this figure using matlab.

Step 2, Fig. 2: CDF of latency differences (RTT) vs. direct paths for the best relays (inducing minimal latency) per type per RAE pair. Improvements between 1 and 200ms are shown (83% of total cases).

python3 plot_min_latency_diffs_cdf.py -i <ALL_MSM_DATA_DIR> --max 200 -o Figures/diffs_cdf_min.eps
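For readers who want to post-process the latency differences themselves, the sketch below shows how an empirical CDF restricted to a value range (such as 1 to 200 ms) can be computed. It is a generic illustration with a hypothetical helper name, and it does not reproduce the internals of plot_min_latency_diffs_cdf.py.

```python
def empirical_cdf(values, lo=None, hi=None):
    """Return the (x, F(x)) points of the empirical CDF of `values`.

    Values outside [lo, hi] are dropped before the CDF is computed,
    so F reaches 1.0 at the largest retained value.
    """
    kept = sorted(
        v for v in values
        if (lo is None or v >= lo) and (hi is None or v <= hi)
    )
    n = len(kept)
    return [(x, (i + 1) / n) for i, x in enumerate(kept)]


# Toy latency improvements in ms (made-up values).
print(empirical_cdf([0.5, 10, 250, 20], lo=1, hi=200))
```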

Step 3, Fig. 3: % of total cases (pairwise communications) where relayed paths improve latency against direct paths, vs. number of top relays (cut at top-100 relays for clarity).

python3 plot_relay_gain_vs_num.py -i <ALL_MSM_DATA_DIR>/relay_gain.json -o Figures/relay_gain_vs_num.eps

Step 4, Fig. 4: % of total cases (pairwise communications) where relayed paths improve latency against direct paths (top-10/all relays), vs. improvement threshold (cut at 100 ms). The best performance of each relay set is considered per case.

python3 plot_relay_gain_ms_cumul.py -i <ALL_MSM_DATA_DIR>/relay_gain_ms.json -n 10 -o Figures/relay_gain_ms_top.eps

Step 5, paragraph "Changing Countries and Paths" of Sec. 3:

python3 plot_min_latency_diffs_box_criterion.py -i <ALL_MSM_DATA_DIR>  -o Figures/diffs_box_cc_min.eps -c cc

Step 6, paragraph "Stability over Time" of Sec. 3:

python3 plot_cv_cdf.py -i <ALL_MSM_DATA_DIR> -o Figures/CV_cdf.eps --max 40
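The stability metric can be illustrated with the coefficient of variation (CV), i.e., the standard deviation of a series of RTT samples divided by their mean. The sketch below expresses it as a percentage and uses the population standard deviation; both choices are assumptions made for the example and may differ from what plot_cv_cdf.py does.

```python
import statistics


def coefficient_of_variation(samples):
    """CV in percent: population standard deviation divided by the mean."""
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * statistics.pstdev(samples) / mean


# Toy per-round median RTTs (ms) of one relayed path (made-up values).
print(round(coefficient_of_variation([90.0, 100.0, 110.0]), 2))
```

A low CV across measurement rounds indicates that the latency of a path is stable over time.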

Contributing

Please contact vkotronis at ics.forth.gr for details on how to contribute to the project (e.g., in the form of pull requests), or if you find any issues/bugs with the current software. Also, please let us know if you find a problem with the current README instructions, or if you have ideas for edits/extensions/simplifications (e.g., if you believe that something which should be included in the wiki is omitted).

Versioning

Currently using the default mechanisms of Bitbucket. Other mechanisms will be determined at a later point by the authors.

Authors

Other Contributions

  • Lefteris Manassakis leftman at ics.forth.gr (manual verification of eyeball ASes, used in the study and co-author of IMC publication)
  • Xenofontas Dimitropoulos fontas at ics.forth.gr (father ERC project (NetVolution) PI and co-author of IMC publication)

License

This project is licensed under the FreeBSD License - see the LICENSE.txt file for details

Online Resources

IMC Publication

IMC Publication

Measurement Datasets

Datasets can be downloaded from here.

Note that some data (e.g., the list of verified Colo IPs) is available in the software repository; in this OneDrive link we include the datasets of Periscope measurements, a .xls with all verified eyeball ASes used in the study, and the main measurements that support Section 3 of the paper.

ATTENTION: The datasets are to be used in conjunction with the software provided within this repository.

The dataset format is as follows:

  1. set of msm_<num>.tar.gz files: these are all zipped files, each pertaining to a certain measurement round with ID=<num>. In particular, each unzipped folder contains:
    1. metadata.json: metadata pertaining to the round such as validity, total spent RA credits, last finished step, and timing per step (start, stop and duration).
    2. initial_ra_files: contains the classified eyeball RA probes/anchors (eyeball_<timestamp>.json), the unknown (other) RA probes/anchors (unknown_<timestamp>.json) and metadata on the classification process (ra_classifier_ret_dict.json), like total number of probes, etc. For each probe we preserve the following information: ID, latitude, longitude, is_anchor, asn_v4, cc, prefix_v4, address_v4.
    3. initial_pl_files: contains a list of alive PL nodes (alive_planetlab_nodes.json), a list of alive and pingable PL nodes (pingable_alive_planetlab_nodes.json) and a dictionary mapping the PL nodes to longitude, latitude and site ID (geolocated_pingable_alive_planetlab_nodes.json).
    4. initial_colo_files: contains the pingable checked (verified) Colo IPs (pingable_checked_colo_ips.json), following the format from the initial dataset of Giotsas et al, i.e., a dictionary keyed by IP, with values the neighbor IXP(s) and ASN, as well as the facility PeeringDB ID. Also, it contains a more complete json representation of the verified Colo IPs, in the format of a dictionary keyed by IP, with values: fac (PDB facility ID), longitude, latitude, city, country.
    5. sampled_rae.json: contains all sampled RAE endpoints, in the format of a dictionary keyed by ID, with values: latitude, longitude, is_anchor, asn_v4, cc, dns name (optional), prefix_v4, address_v4.
    6. sampled_rar_eye.json: contains all sampled RAR_eye relays, in the format of a dictionary keyed by ID, with values: latitude, longitude, is_anchor, asn_v4, cc, dns name (optional), prefix_v4, address_v4.
    7. sampled_rar_other.json: contains all sampled RAR_other relays, in the format of a dictionary keyed by ID, with values: latitude, longitude, is_anchor, asn_v4, cc, dns name (optional), prefix_v4, address_v4.
    8. sampled_plr.json: contains all sampled PLR relays, in the format of a dictionary keyed by PLR node name, with values: latitude, longitude, site ID.
    9. sampled_cor.json: contains all sampled COR relays, in the format of a dictionary keyed by COR IP address, with values: latitude, longitude, facility ID, city and country.
    10. fetched_msms: all raw msm data fetched during this round. Each file in this folder is named after the respective msm_id, and is a json containing measurement results from all issued source RAE, to a certain destination (RAE, RAR_other, RAR_eye, PLR or COR). For each particular src-dst result, we keep the average/minimum/maximum RTT, the source probe_id, the measurement timestamp, the src/dst IP address, and the public IP address of the src.
    11. rae2rae_prepare: contains 3 files: issue.json (a list of issued measurement tuples [<msm_id>, dst, number of src probes, packets per ping]), results.json (a list of all measurement results in the format described in item (10)), and valid_direct_rtts.json (a dictionary keyed by src, with first-level values the dst, and second-level values the actual median RTTs). These are essentially the results of step (2) of the workflow of Section 2.5.
    12. feasible_rar_other.json: for each RAE, contains a list of all feasible RAR_other relays (step (3) of the workflow of Section 2.5). If a RAR_other is feasible for two RAE nodes, then it can be used to form a relayed path for this RAE node pair.
    13. feasible_rar_eye.json: for each RAE, contains a list of all feasible RAR_eye relays (step (3) of the workflow of Section 2.5). If a RAR_eye is feasible for two RAE nodes, then it can be used to form a relayed path for this RAE node pair.
    14. feasible_plr.json: for each RAE, contains a list of all feasible PLR relays (step (3) of the workflow of Section 2.5). If a PLR is feasible for two RAE nodes, then it can be used to form a relayed path for this RAE node pair.
    15. feasible_cor.json: for each RAE, contains a list of all feasible COR relays (step (3) of the workflow of Section 2.5). If a COR is feasible for two RAE nodes, then it can be used to form a relayed path for this RAE node pair.
    16. rae2rae_results: contains 2 files: issue.json (a list of issued measurement tuples [<msm_id>, dst, number of src probes, packets per ping]), and results.json (a list of all measurement results in the format described in item (10)). This corresponds to step (4) of the workflow of Section 2.5.
    17. rae2rar_eye_results: contains 2 files: issue.json (a list of issued measurement tuples [<msm_id>, dst, number of src probes, packets per ping]), and results.json (a list of all measurement results in the format described in item (10)). This corresponds to step (4) of the workflow of Section 2.5.
    18. rae2rar_other_results: contains 2 files: issue.json (a list of issued measurement tuples [<msm_id>, dst, number of src probes, packets per ping]), and results.json (a list of all measurement results in the format described in item (10)). This corresponds to step (4) of the workflow of Section 2.5.
    19. rae2plr_results: contains 2 files: issue.json (a list of issued measurement tuples [<msm_id>, dst, number of src probes, packets per ping]), and results.json (a list of all measurement results in the format described in item (10)). This corresponds to step (4) of the workflow of Section 2.5.
    20. rae2cor_results: contains 2 files: issue.json (a list of issued measurement tuples [<msm_id>, dst, number of src probes, packets per ping]), and results.json (a list of all measurement results in the format described in item (10)). This corresponds to step (4) of the workflow of Section 2.5.
  2. periscope_extra: contains the following in relation to the geolocation verification of Colo IPs, using the respective Periscope LGs in the candidate city of presence of each facility:
    1. periscope_created_msm_ids.json: dictionary of Colo IPs mapped to measurement IDs.
    2. periscope_results.tar.gz: contains all measurement results from Periscope, in the form of periscope_results_<checked_colo_IP>.json, each result follows the format of the official API, found here.
  3. ASN_coverage.xls: excel file which contains the following information for all verified eyeball ASes:
    1. User coverage (%)
    2. ASN
    3. Information from bgpview.io
    4. CC (country code)
    5. Country (full name)
    6. Company name
    7. URL of website
    8. Information from Wikipedia (where available)
    9. Information from other sources (where available)
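The feasible_*.json files described in the list above state that a relay can form a relayed path for an RAE pair if it is feasible for both RAE nodes. In code, this boils down to a set intersection, as sketched below. The sketch assumes that each file loads as a dictionary keyed by RAE ID with a list of relay identifiers as value; the helper name, the IDs and the IPs are hypothetical.

```python
def common_feasible_relays(feasible, src, dst):
    """Relays that are feasible for both endpoints of an RAE pair."""
    return sorted(set(feasible.get(src, [])) & set(feasible.get(dst, [])))


# Toy content in the assumed shape of feasible_cor.json (made-up values).
toy_feasible = {
    "1001": ["192.0.2.10", "192.0.2.20"],
    "1002": ["192.0.2.20", "192.0.2.30"],
}

print(common_feasible_relays(toy_feasible, "1001", "1002"))
```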

Acknowledgments

  • This work has been funded by the EU Research Council Grant Agreement no. 338402.
  • Thanks to the RIPE Atlas community for providing access to their RIPE Atlas probes and measurement interfaces.
  • Thanks to the PlanetLab community for maintaining a publicly available global testbed with rich features.
  • Thanks to Christos Papachristos from FORTH-ICS for helping with the setup of the PlanetLab nodes that were used.
  • Thanks to CAIDA and Vasileios Giotsas for making the dataset related to facility detection publicly available.
  • Thanks to Ioanna Papafili from COSMOTE for the initial work and inspiration of this research and measurement effort.
  • Thanks to Pavlos Sermpezis from FORTH-ICS for his helpful feedback on the publication.