# DownGlacier

DownGlacier is an empirical statistical downscaling (ESD) tool developed to retrieve glacier Surface Energy and Mass Balance (SEB/SMB) fluxes from large-scale atmospheric data. It is first described and used in the following publication:

Maussion, F., Gurgiser, W., Großhauser, M., Kaser, G., and Marzeion, B.: ENSO influence on surface energy and mass balance at Shallap Glacier, Cordillera Blanca, Peru, The Cryosphere, 9, 1663-1683, doi:10.5194/tc-9-1663-2015, 2015

Built on top of solid machine learning libraries (statsmodels and scikit-learn), DownGlacier provides a framework for non-statisticians and non-programmers. Its purpose is to extend measured (or modelled) SEB/SMB time series to longer periods in a semi-automated manner. It is inspired by similar tools used in the climate research community (e.g. [Wilby.etal.2002]), but is developed specifically for glaciological applications.

The SEB/SMB fluxes are either downscaled (predictands) or computed (diagnostic variables). The user has to provide the calibration data as well as the candidate predictor timeseries. The chosen statistical model and other run-time parameters are entered in a configuration file. The rest of the process is fully automated:

$ downglacier configfile.cfg

## Workflow

A DownGlacier run works as follows:

1. Set-up: read the configuration file, read the data and create the working directory
2. Check input: test the predictors for collinearity, the predictands for autocorrelation, and verify that the surface energy balance input data is well understood (i.e. that DownGlacier computes the diagnostic mass balance as expected)
3. Screening: for each single predictand, check the predictors and select the "best" model
4. Diagnostics: compute the diagnostic variables from the downscaled estimates
5. Scores: perform out-of-sample cross-validation and compute the skill scores
6. Plots: plot regression and/or validation and/or lasso paths
7. Prediction: compute downscaled and diagnostic variables for the whole predictors' period

## Input data

The input data can be provided as NetCDF or CSV files.

### Calibration data

DownGlacier requires gap-free monthly time series of the full surface energy and mass balance budget fluxes. Since some of these fluxes cannot be measured directly, a SEB/SMB model is usually needed to compute variables such as melt, refreezing, etc. Currently, DownGlacier accepts input from one specific SEB/SMB model [Mölg.etal.2012], but it is possible to incorporate new models easily [*].

The energy and mass balance on a glacier surface can be written as follows:

$$SW_{in} + SW_{out} + LW_{in} + LW_{out} + Q_{s} + Q_{l} = F$$

$$MB = PRCP_{solid} - F_{(T_{s}=0)} / l_{melt} - Q_{l} / l_{sub} + M_{sub}$$

With:

- $$SW$$, $$LW$$: short- & long-wave radiation
- $$Q_{s}$$, $$Q_{l}$$: sensible & latent heat fluxes
- $$F$$: energy residual available for ice warming or melting
- $$l_{melt}$$, $$l_{sub}$$: latent heat for melt & sublimation
- $$M_{sub}$$: subsurface mass fluxes (e.g. refreezing)

$$SW_{in}$$, $$SW_{out}$$, $$LW_{in}$$, $$LW_{out}$$, $$Q_{s}$$ and $$Q_{l}$$ are downscaled variables; the other variables are computed diagnostic variables.
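As an illustration, the two budget equations above translate directly into code. The sketch below is ours, not part of DownGlacier's API: the function names and the latent heat constants are illustrative, and fluxes are assumed positive towards the surface, in W m-2.

```python
# Sketch only: diagnostic mass balance from the SEB fluxes, following
# the two budget equations above. Names and constants are illustrative,
# not part of DownGlacier's API.

L_MELT = 3.34e5    # latent heat of fusion [J kg-1]
L_SUB = 2.834e6    # latent heat of sublimation [J kg-1]

def energy_residual(sw_in, sw_out, lw_in, lw_out, q_s, q_l):
    """F: energy left over for ice warming or melting [W m-2]."""
    return sw_in + sw_out + lw_in + lw_out + q_s + q_l

def mass_balance(prcp_solid, f_at_melt, q_l, m_sub):
    """MB [kg m-2 s-1]: solid accumulation minus melt and sublimation,
    plus subsurface mass fluxes. f_at_melt is F when T_s = 0 deg C."""
    return prcp_solid - f_at_melt / L_MELT - q_l / L_SUB + m_sub
```

Note the sign conventions: a positive residual `F` at a melting surface removes mass, and a negative latent heat flux `Q_l` (sublimation) does too.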
It is possible to provide the calibration SEB/SMB as point data valid at a specific position on the glacier, or as several altitude slices to compute the distributed glacier mass balance.

[*] Please contact me if you're interested in using DownGlacier with your own SEB/SMB data.

### Predictors

The candidate predictor time series must be provided by the user. These can be extracted from global reanalysis datasets or obtained elsewhere, but they must comply with certain conditions:

- DownGlacier relies on the hypothesis that a substantial part of the SEB variability can be explained by the local weather, which is in turn linked to the large-scale circulation. Therefore, a certain causal relationship is expected to exist between the predictors and the predictands.
- Predictors may be collinear to a certain point, but highly collinear predictors will increase noise without skill gain (or even with skill loss in certain cases). Depending on the chosen model, highly collinear predictors should be removed beforehand.
- All implemented models are designed for and can deal with high-dimensional problems with a large number of predictors $$p$$ and a smaller number of observations $$n$$. However, increasing $$p$$ will always increase noise, making the job of the shrinkage algorithms more difficult, and might considerably affect the out-of-sample skill of the model. Appropriate care should be given to the choice of the candidate predictors.

## Available models

Currently, DownGlacier integrates several regression models:

### Stepwise regression

Stepwise regression is an iterative search for a subset of predictors to use in an ordinary least squares (OLS) regression. Predictors are added one by one according to a certain rule. Two rules are implemented, at choice: (i) partial correlation (statistically significant at a chosen threshold) or (ii) improvement of the in-sample cross-validation RMSE (at a chosen threshold).
After a predictor is added, it is verified that all previously selected predictors still follow the chosen rule (adding a new predictor to the combination can make previous predictors insignificant). The "bad" predictors are removed, and the process is repeated until no more predictors can be added or removed, thus coming close to (but not being guaranteed to be) a "best-subset" selection [Hastie.etal.2009].

### Lasso

The Least Absolute Shrinkage and Selection Operator (LASSO) is a shrinkage (or regularisation) method designed to address some problems of least-squares regression [Tibshirani.1996]. Among other advantages, it prevents overfitting by penalising the coefficients. The penalisation coefficient $$\lambda$$ is determined using in-sample cross-validation. In our test cases, as well as in many examples from the scientific literature, LASSO proved to be more efficient and stable than stepwise regression.

### Relaxed Lasso

The relaxed Lasso [Meinshausen.2007] is an extension of the regular Lasso. It can be useful for noisy high-dimensional problems: Lasso is run once to select potential candidates, and then a second time without the noisy competitors.

### Constrained Lasso

The Lasso penalisation coefficient is chosen so that at most a predefined number of predictors are selected. This allows easier inference, but it mostly performs worse (especially with collinear problems).

### Lasso OLS

Lasso is run once to select potential candidates, which are then used as predictors in a standard OLS model.

### Principal Component Regression

All the models above can be used on the most important principal components of the predictor time series. This reduces the problem of collinearity, but the results depend strongly on the chosen threshold of explained variance at which to select the principal components.

Each of these models can be selected in the configuration file without having to modify the code.
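For orientation, this is what a cross-validated LASSO fit looks like in scikit-learn, the library DownGlacier builds on. The data and variable names are made up for the example; this is not DownGlacier's own interface:

```python
# Sketch: LASSO with the penalty chosen by cross-validation, on a
# synthetic "many predictors, few observations" problem.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_obs, n_pred = 60, 20
X = rng.standard_normal((n_obs, n_pred))
# Only predictors 0 and 3 carry signal; the rest is noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(n_obs)

# LassoCV picks the penalty (the lambda above) by cross-validation,
# then refits on the full data; zero coefficients mean "not selected".
model = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)
```

In this setting the L1 penalty typically shrinks the pure-noise coefficients to exactly zero, which is what makes LASSO a variable-selection method and not only a regulariser.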
The design of DownGlacier makes it easy to incorporate new models if needed.

## Dependencies

DownGlacier is tested with Python 3.3+ and 2.7. A certain number of packages are required before install, all of them available via pip or conda:

- numpy
- scipy
- scikit-learn
- statsmodels
- netCDF4
- pandas
- matplotlib
- seaborn
- configobj

If you wish to reproduce the analyses and plots of the TCD paper, you will also need:

- ipython
- jupyter
- runipy

## Installation

Install with pip:

$ pip install git+https://bitbucket.org/fmaussion/downglacier.git#egg=DownGlacier


Test:

$ python /path_to_your_python_install/site-packages/downglacier/test/test_downglacier.py

Uninstall:

$ pip uninstall DownGlacier


Or you could clone the repository directly:

$ git clone https://fmaussion@bitbucket.org/fmaussion/downglacier.git

And install DownGlacier in "editable mode" for development:

$ pip install -e path/to/DownGlacier_clone


## User guide

A notebook explaining the basics of DownGlacier is found in the repository's examples directory.

All parameters necessary for a run are gathered in a configuration file (.cfg). The different options are documented in the CONFIGFILE.rst file in the docs directory.
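To give an idea of the shape of such a file, here is a purely hypothetical sketch in configobj syntax. None of the section or option names below should be taken literally; the real options are documented in CONFIGFILE.rst:

```ini
# Hypothetical example only -- see CONFIGFILE.rst for the real options.
[io]
calibration_file = my_glacier_seb.nc   # calibration SEB/SMB fluxes
predictors_file = my_predictors.csv    # candidate predictor time series
working_dir = ~/downglacier_runs/my_glacier

[model]
type = lasso      # one of the regression models listed above
cv_folds = 5      # cross-validation folds for the penalty choice
```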

In the project's sandbox package you will find a tcd_utils.py module which contains a few utility routines that should work out of the box. run_all_tcd() will create a working directory in your home folder and run all experiments (about one hour with multiprocessing). run_all_figs() generates all plots of the TCD paper.