The UniverseMachine

Most code: Copyright (C)2011-2018 Peter Behroozi

License: GNU GPLv3

Science/Documentation Paper: https://arxiv.org/abs/1806.07893

Overview

The UniverseMachine applies simple empirical models of galaxy formation to dark matter halo merger trees. For each model, it generates an entire mock universe, which it then observes in the same way as the real Universe to calculate a likelihood function. It includes an advanced MCMC algorithm to explore the allowed parameter space of empirical models that are consistent with observations.

Precalculated Data Products

The latest data release is available at http://www.peterbehroozi.com/data.html. The UniverseMachine can generate many data products (e.g., stellar mass functions, stellar mass-halo mass relations, and star formation histories), and the list is growing continuously. This section describes some of the available data products, which are all in the data subdirectory of the data release tarball. Except where otherwise specified, errors represent the uncertainties in the model posterior distribution.

  1. Observed versus True Stellar Masses and Star Formation Rates

    The UniverseMachine keeps track of two stellar masses: the "true" stellar mass and the "observed" stellar mass. The true stellar mass is the physically self-consistent stellar mass given by the integral of past star formation minus stellar mass loss. The observed stellar mass includes systematic offsets as well as scatter, both of which change as a function of redshift. Which one is more useful depends on the purpose: those wanting physically self-consistent star formation histories should use the true stellar mass; those wanting to compare with other observations should use the observed stellar mass. A similar distinction applies to star formation rates. The observed SFRs include additional scatter and systematic offsets, but are the closest match to other observations; the true SFRs are, by contrast, guaranteed to be physically self-consistent with the true stellar masses.

  2. Meaning of Values and Uncertainties

    Almost all data products include the bestfit value as well as the 68% confidence interval from the model posterior space. The column Err+ gives the difference between the 84th-percentile model and the bestfit model; the column Err- gives the difference between the bestfit model and the 16th-percentile model. Hence, if the best-fit value, Err+, and Err- columns are A, B, and C, respectively, the 68% confidence interval ranges from A-C to A+B. Exceptions to this rule are always noted in the data files.
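
    The interval rule above can be written out as a short Python sketch. The function name and the numbers are illustrative only (they are not taken from any data file), and the sketch assumes that Err+ and Err- are both stored as positive magnitudes, as the description above implies.

```python
def confidence_interval(best_fit, err_plus, err_minus):
    """Return (lower, upper) bounds of the 68% confidence interval.

    best_fit, err_plus, err_minus correspond to A, B, and C in the text,
    so the interval runs from A - C to A + B.
    """
    return best_fit - err_minus, best_fit + err_plus


# Illustrative values only.
lower, upper = confidence_interval(10.5, 0.2, 0.3)
print(f"68% interval: [{lower:.2f}, {upper:.2f}]")
```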

    As a general rule, Bolshoi-Planck becomes increasingly incomplete for satellite galaxies with peak halo masses Mpeak < 10^10.5M⊙ and for central galaxies with peak halo masses Mpeak < 10^10M⊙. The incompleteness for massive halos varies with redshift; most data files include galaxy/halo counts so that it is clear where the statistics become unreliable.

  3. Correlation Functions

    Found in data/corrs. Correlation functions are in corr_sm*; the filename gives the observed stellar mass range and the scale factor at which the correlation function was calculated. Correlation functions for all, star-forming, and quenched galaxies are included, as is the star-forming x quenched cross-correlation function. Values for, e.g., π_max, redshift errors, etc., depend on the parameter file used, but are documented in each file. Ratios of correlation functions are in corr_ratios_sm*. The meaning of the filename is the same (stellar mass range and the scale factor at which the correlation function was calculated). Ratios of quenched to star-forming, star-forming to all, quenched to all, and the cross-correlation to all galaxies are included.

  4. Cosmic Star Formation Rates

    Found in data/csfrs. This includes the total observed CSFR, the observed CSFR for galaxies with M1500<-17 (AB), and the true CSFR. For Bolshoi-Planck, the M1500<-17 CSFR is almost identical to the total CSFR, as the simulation becomes increasingly incomplete for M1500>-19. This also suggests that the "total" CSFRs at z>8 are underestimates of the true total CSFRs.

  5. Ex-Situ Fractions

    Found in data/ex_situ. Ex-situ fractions (i.e., fractions of mass accreted in mergers) as a function of observed stellar mass are in ex_situ_a*, where the filename includes the scale factor. Ex-situ fractions as a function of Mpeak are in ex_situ_hm_a*.
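
    Many filenames in the data release encode the scale factor a rather than the redshift; the two are related by z = 1/a - 1. The sketch below shows one way to recover the redshift from a filename. The exact filename pattern ("_a" followed by a decimal number) and the example filename are assumptions made for illustration; check the actual names in the data release before relying on the regular expression.

```python
import re


def scale_factor_from_filename(filename):
    """Extract the scale factor from a name like 'ex_situ_a0.500000.dat'.

    ASSUMPTION: the scale factor appears as '_a' followed by a decimal
    number. This pattern is hypothetical and may need adjusting.
    """
    match = re.search(r"_a(\d+\.\d+)", filename)
    if match is None:
        raise ValueError(f"no scale factor found in {filename!r}")
    return float(match.group(1))


def redshift_from_scale_factor(a):
    """Convert a cosmological scale factor to redshift: z = 1/a - 1."""
    return 1.0 / a - 1.0


a = scale_factor_from_filename("ex_situ_a0.500000.dat")  # hypothetical name
z = redshift_from_scale_factor(a)
print(a, z)
```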

  6. Halo Mass Functions

    Found in data/hmfs. Both the total mass function (including satellites) as well as the satellite fractions are in hmf_a*, where the filename includes the scale factor. While central halos are invariant for a given simulation, the satellite fraction can vary as a result of the orphan threshold (see the UniverseMachine paper).

  7. Infall and Quenching Distribution Statistics for Satellites

    Found in data/infall_stats. The time distributions since satellite first infall (i.e., the 50th, 84th, and 16th percentiles) as a function of observed satellite stellar mass are in infall_delay_times_a*, where the filename includes the scale factor. These percentiles refer to the distribution for individual galaxies; each percentile is accompanied by uncertainties across the model posterior distribution. I.e., the median time since infall is reported for the best-fit model, followed by the 68% confidence interval on the median across model posterior space, followed by the 84th percentile time since infall, followed by the 68% confidence interval on the 84th percentile across model posterior space, and so on. The time distributions are recorded for satellites of Milky Way--mass hosts, group-mass hosts, and cluster-mass hosts; the definitions for each host mass are given in the file.

    Quenching delay times (i.e., the time delay between satellite infall and satellite quenching for quenched satellites) as a function of observed satellite stellar mass are in quenching_delay_times_a*. As with infall delay times, the 50th, 84th, and 16th percentiles of the distribution for individual satellites are given, for satellites of Milky Way--mass hosts, group-mass hosts, and cluster-mass hosts. Similarly, infall SSFR distributions as a function of observed satellite stellar mass are given in infall_ssfrs_a*. Only a restricted set of scale factors is available for the latter, to reduce the disk space necessary for postprocessing.

  8. Observations and Best-fit Model

    Found in data/obs.txt. The top line contains the best-fit model, which may be used to generate new catalogs with the make_sf_catalog command (Section 4.3). The remaining lines contain one data point per line, including both the observed value and the modeled value. The data point type can be one of the following:

    • smf: observed stellar mass function (i.e., galaxy number density); units of comoving Mpc^-3 dex^-1.
    • uvlf: M1500,UV luminosity function; units of comoving Mpc^-3 mag^-1.
    • qf: quenched fraction as a function of observed stellar mass, using the Moustakas/PRIMUS quenching threshold.
    • qf_uvj: UVJ quenched fraction as a function of observed stellar mass.
    • ssfr: average observed specific star formation rate (for all galaxies) as a function of observed stellar mass; units of yr^-1.
    • csfr: total observed cosmic star formation rate; units of M⊙ yr^-1 comoving Mpc^-3.
    • csfr_(uv): total observed cosmic star formation rate with M1500<-17 (AB); same units as csfr.
    • correlation: projected autocorrelation functions; units of comoving Mpc.
    • conformity: central galactic conformity, currently unused.
    • lensing: weak lensing, currently unused.
    • cdens_fsf: fraction of star-forming central galaxies, as a function of environment density.
    • cdens_ssfr_sf: observed specific star formation rates for star-forming central galaxies, as a function of environmental density; currently unused.
    • uvsm: median stellar mass as a function of M1500; units of M⊙.

    The data point subtype is typically "a" for "all galaxies," but for correlation functions, it can also be "s" for star-forming galaxies and "q" for quenched galaxies. The columns Z1 and Z2 contain the redshift range of the observation; step1 and step2 are the corresponding range of simulation snapshot numbers used. SM1 and SM2 are the stellar mass bin, except for csfr (where the stellar mass bin is meaningless), uvlf (where the UV magnitude bin is given instead), and cdens_fsf (where the bin for the number of neighbors is given instead). smb1 and smb2 give the internal stellar mass/UV/environment bin indices used by the UniverseMachine. R1 and R2 are the range of radii used, which is only relevant for correlation functions.

    The observed data point is given by the Val column, with +/- uncertainties given in the Err_h and Err_l columns, respectively. All values and uncertainties are given in log10 units, except for quenched and star-forming fractions (qf, qf_uvj, cdens_fsf), which are in linear units. The best-fit model result is given in the Model_Val column, with the +/- 68% confidence interval of the model posteriors in the MV+ and MV- columns. The best-fit χ^2 and 68% range are given in the next three columns. Here, χ^2 values may be zero if the model result is within the calculation error tolerance of the observed value (to prevent over-fitting). For correlation functions and other observations that use covariance matrices, the χ^2 values may not relate directly to the model-minus-observed differences for individual data points.
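
    Because most data point types are stored in log10 units while fractions are stored in linear units, converting a row to linear units requires checking the type first. The following Python sketch illustrates this; the function name is hypothetical, the numbers are not taken from obs.txt, and the sketch assumes that Err_h and Err_l are both stored as positive magnitudes.

```python
# Data point types stored in linear units, per the description above.
LINEAR_TYPES = {"qf", "qf_uvj", "cdens_fsf"}


def linear_range(data_type, val, err_h, err_l):
    """Return (low, central, high) in linear units for one data point.

    ASSUMPTION: err_h and err_l are positive magnitudes, so the range
    in the stored units is [val - err_l, val + err_h].
    """
    low, high = val - err_l, val + err_h
    if data_type in LINEAR_TYPES:
        return low, val, high
    # All other types are stored as log10 values.
    return 10.0 ** low, 10.0 ** val, 10.0 ** high


# Illustrative values only.
print(linear_range("smf", -2.0, 0.1, 0.1))
print(linear_range("qf", 0.5, 0.1, 0.1))
```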

  9. Stellar Mass--Halo Mass Relations

    1. Median Measurements from the Simulation

      Direct measurements of the median relations binned on halo peak mass are found in data/smhm/median_raw. The median SMHM ratios are in smhm_a*, and are available for both observed and true stellar masses, as well as subsamples (e.g., centrals, satellites, quenched, star-forming, etc.); the filename includes the scale factor. The halo mass column gives log10(Mpeak/M⊙), the SMHM ratio columns give the median log10(M⭑ / Mpeak), and the error columns give the +/- uncertainties in dex. Errors on the observed stellar mass ratios should be interpreted as statistical errors; errors on the true stellar mass ratios should be interpreted as statistical+systematic errors. Ratios of subsamples (e.g., central galaxies vs. all galaxies) are in ratios_a*. As the model currently applies the same offset between observed and true stellar masses to all galaxies, the stellar mass ratios are the same for observed and true stellar masses; hence, no distinction is made in the file. Finally, measurements of the scatter in log10(M⭑ / Mpeak) (in dex) are in smhm_scatter_a*.
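
      Since both the halo mass and the SMHM ratio are tabulated in log10 units, recovering a median stellar mass is an addition in log space, and the errors in dex carry over unchanged. A minimal Python sketch follows; the function names and numbers are illustrative and not taken from the data release.

```python
def log10_stellar_mass(log10_mpeak, log10_smhm_ratio):
    """Return log10(M*/Msun) from log10(Mpeak/Msun) and log10(M*/Mpeak)."""
    return log10_mpeak + log10_smhm_ratio


def log10_stellar_mass_range(log10_mpeak, log10_smhm_ratio, err_plus, err_minus):
    """Return (low, central, high) log10 stellar masses.

    err_plus and err_minus are the +/- uncertainties in dex, taken here
    to be positive magnitudes.
    """
    central = log10_stellar_mass(log10_mpeak, log10_smhm_ratio)
    return central - err_minus, central, central + err_plus


# Illustrative values only: a 10^12 Msun halo with log10(M*/Mpeak) = -1.8.
print(log10_stellar_mass_range(12.0, -1.8, 0.05, 0.04))
```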

    2. Median Fits

      Fits to the measurements above are found in data/smhm/median_fits in pretabulated form for both smhm_a* and ratios_a*; the filename includes the scale factor. Residuals with the direct measurements are found in smhm_residuals*; these should be examined if using the fits for a rare population (e.g., high-redshift quiescent galaxies).

      The fit parameters listed in the paper are found in data/smhm/params, along with a Python script to generate SMHM ratios at arbitrary redshifts.

    3. Average Measurements from the Simulation

      Averages of log10(M⭑ / Mpeak) (both for observed and true stellar mass) as a function of peak halo mass are found in data/smhm/averages/sm_averages_a*; the filename includes the scale factor. Average halo masses as a function of observed stellar mass are found in data/smhm/averages/hm_averages_a*. Here, several different averages are available, including the linear average peak halo mass (⟨Mpeak⟩), the log average peak halo mass (⟨log10(Mpeak)⟩), and the weak lensing-averaged halo mass (⟨Mpeak^2/3⟩^3/2).
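
      The three halo mass averages differ in how they weight massive halos. The Python sketch below computes each one for a list of peak halo masses; the function name and the sample masses are illustrative, and this is not the code the UniverseMachine itself uses.

```python
import math


def halo_mass_averages(mpeak_values):
    """Return the three averages described above for masses in Msun.

    Returns a tuple of:
      linear   = <Mpeak>
      log_avg  = <log10(Mpeak)>
      lensing  = <Mpeak^(2/3)>^(3/2)
    """
    n = len(mpeak_values)
    linear = sum(mpeak_values) / n
    log_avg = sum(math.log10(m) for m in mpeak_values) / n
    lensing = (sum(m ** (2.0 / 3.0) for m in mpeak_values) / n) ** 1.5
    return linear, log_avg, lensing


# Illustrative values only.
linear, log_avg, lensing = halo_mass_averages([1e11, 1e12, 1e13])
print(f"{linear:.3e} {10 ** log_avg:.3e} {lensing:.3e}")
```

      For any sample with a spread in mass, the lensing-weighted average falls between the log average (converted back to linear units) and the linear average.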

  10. Quenched Fractions

    Found in data/qfs. The following statistics are available:

    • Basic quenched fractions according to three different quenching definitions (Moustakas/PRIMUS, SSFR < 10^-11 yr^-1, and UVJ) are found as a function of observed stellar mass in qf_a* and as a function of peak halo mass in qf_hm_a*; the filename includes the scale factor.
    • Statistics for the quenched fraction of all centrals and all satellites, as well as satellites of Milky Way-mass hosts, group-mass hosts, and cluster-mass hosts, are found as a function of observed stellar mass in qf_groupstats_a* and of peak (satellite) halo mass in qf_hm_groupstats_a*.
    • The fractions of quenched satellites that were quenched after infall for Milky Way-mass, group-mass, and cluster-mass hosts are found as a function of observed stellar mass in qf_groupstats_infall_a* and of peak (satellite) halo mass in qf_hm_groupstats_infall_a*.
    • The fractions of quenched satellites that were quenched due to infall (calculated as f_{q,sat} - f_{q,cen}) for all satellites as well as satellites of Milky Way-mass hosts, group-mass hosts, and cluster-mass hosts are found as a function of observed stellar mass in qf_quenched_infall_a* and of peak (satellite) halo mass in qf_hm_quenched_infall_a*.
    • The fractions of galaxies' most-massive progenitors that were quenched as a function of cosmic time, for both currently star-forming and currently quenched galaxies, are found for bins of observed stellar mass in qf_sm_histories_sm* and for bins of peak halo mass in qf_hm_histories_hm*.

    The exact range of stellar masses or halo masses in each bin is detailed in the file header.

  11. Rejuvenation Statistics

    Found in data/rejuvenation. The fractions of galaxies that rejuvenated (i.e., were quenched for at least 300 Myr and then were star-forming for at least 300 Myr thereafter) are found as a function of observed stellar mass in rejuv_a* and as a function of peak halo mass in rejuv_hm_a*; the filename includes the scale factor.
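
    To make the rejuvenation definition concrete, the Python sketch below applies it to a single galaxy's history, sampled as a list of times and quenched/star-forming flags. This is only an illustration of the definition: the function name and input format are hypothetical, the choice to end each run at the first snapshot where the state changes is an assumption, and the UniverseMachine's internal calculation may differ.

```python
def has_rejuvenated(times_myr, quenched, min_duration_myr=300.0):
    """Return True if a quenched phase of at least min_duration_myr is
    immediately followed by a star-forming phase of at least the same length.

    times_myr: snapshot times in Myr, in ascending order.
    quenched:  one boolean per snapshot (True = quenched).
    """
    # Collapse the history into runs of [is_quenched, start_time, end_time].
    runs = []
    for t, q in zip(times_myr, quenched):
        if runs and runs[-1][0] == q:
            runs[-1][2] = t
        else:
            if runs:
                # ASSUMPTION: a run ends at the snapshot where the state flips.
                runs[-1][2] = t
            runs.append([q, t, t])

    # Look for a long quenched run directly followed by a long star-forming run.
    for prev, nxt in zip(runs, runs[1:]):
        if (prev[0] and not nxt[0]
                and prev[2] - prev[1] >= min_duration_myr
                and nxt[2] - nxt[1] >= min_duration_myr):
            return True
    return False


# Illustrative history sampled every 100 Myr.
times = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
flags = [False, False, True, True, True, True, False, False, False, False]
print(has_rejuvenated(times, flags))
```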

  12. Average Star Formation Histories

    Found in data/sfhs. Average star formation histories for all galaxies, centrals, satellites, star-forming, and quenched galaxies are found in bins of observed stellar mass in sfh_sm* and in bins of peak halo mass in sfh_hm*; the filename includes the mass bin and the scale factor.

  13. Stellar Mass Functions and Satellite Fractions

    Found in data/smfs. The stellar mass function (i.e., galaxy number density) as well as the satellite fraction as a function of observed stellar mass are found in smf_a*, where the filename includes the scale factor. The Bolshoi-Planck simulation is incomplete for low-mass galaxies and halos. This incompleteness is significant below 10^7M⊙ at z=0 and 10^8.5M⊙ at z=8.

  14. Average Specific Star Formation Rates

    Found in data/ssfrs. The average linear ratio of observed SFR to observed stellar mass as a function of observed stellar mass is found in ssfr_a*, where the filename includes the scale factor.

  15. UV Luminosity Functions

    Found for z≥4 in data/uvlfs and z<4 in data/uvlfs_uncalibrated; the directory naming reflects that z<4 UV luminosities do not have proper dust calibration in the model and are likely incorrect. Galaxy number densities as a function of UV magnitude (M1500,AB) are found in uvlf_a*; the filename includes the scale factor. The Bolshoi-Planck simulation becomes increasingly incomplete for M1500>-19.

  16. UV--Stellar Mass Relations

    Found for z≥4 in data/uvsm and z<4 in data/uvsm_uncalibrated; the directory naming reflects that z<4 UV luminosities do not have proper dust calibration in the model and are likely incorrect. Median observed stellar masses as a function of UV magnitude (M1500,AB) are found in uvsm_z*; the filename includes the redshift range. The Bolshoi-Planck simulation becomes increasingly incomplete for M1500>-19.

  17. Weak Lensing

    Found in data/weak_lensing. Galaxy-galaxy weak lensing shear predictions are found for all galaxies, star-forming galaxies, and quenched galaxies in wl_sm*. The filename includes the scale factor and the lower observed stellar mass threshold; i.e., only galaxies with masses above the threshold in the filename are included. Ratios of weak lensing predictions for quenched to star-forming, star-forming to all galaxies, and quenched to all galaxies are found in wl_ratios_sm*.

Mock Catalogs and Lightcones

Mock catalogs and lightcones are available for the best-fit model (see http://www.peterbehroozi.com/data.html), as described below.

  1. Halo and Galaxy Properties

    Halo and galaxy properties at every simulation snapshot are available at SFR/sfr_catalog_* and SFR_ASCII/sfr_catalog_*; the filename includes the scale factor. Halo properties include the halo ID (for cross-matching to Bolshoi-Planck halo catalogs), descendant ID, parent ID (for satellites), position, velocity, current mass and v_max, peak mass, and v_max at peak mass. Galaxy properties include the true stellar mass in the galaxy, true stellar mass in the intrahalo light, observed stellar mass, observed SFR, observed SSFR, true stellar mass / halo mass ratio, and observed UV luminosity (valid for z>4). Both binary (SFR) and text (SFR_ASCII) versions are available. The binary version is consecutive catalog_halo structures; Python and C loaders are provided in the same directory. See halo.h for the structure definition and print_sm_catalog.c in the UniverseMachine source code for a more advanced example of how to read the binary version. You can also use HaloTools to load the binary catalogs.
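
    Because the binary files are consecutive fixed-size structures, they can be read by repeatedly unpacking one record layout. The Python sketch below shows the general technique using only the standard library. The record layout in it is HYPOTHETICAL and chosen purely for illustration; the real catalog_halo layout (field order, types, and any padding) is defined in halo.h, and the loaders provided with the catalogs should be preferred in practice.

```python
import struct

# HYPOTHETICAL layout for illustration only: two 64-bit integers and one
# 32-bit float, little-endian, no padding. The real catalog_halo structure
# is defined in halo.h and must be mirrored exactly, including padding.
RECORD = struct.Struct("<qqf")


def read_records(path):
    """Return a list of tuples, one per fixed-size record in the file."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % RECORD.size != 0:
        raise ValueError("file size is not a multiple of the record size")
    return list(RECORD.iter_unpack(data))
```

    A file-size check like the one above is a cheap way to catch a mismatched record layout before misinterpreting the data.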

  2. Star Formation Histories

    Catalogs with star formation histories are available at specific redshifts (e.g., z=0, 1, 2) in SFH/sfh_catalog_*; the filename includes the scale factor. Besides the halo and galaxy properties in Section 3.1, these files contain star formation histories for the present-day stellar population in the galaxy (i.e., including all merged progenitors), star formation histories for the present-day population in the intrahalo light, the main progenitor galaxy's stellar mass history, the main progenitor's intrahalo light history, the main progenitor's halo mass history, the main progenitor's SFR history (i.e., excluding any mergers), the main progenitor's v_Mpeak (i.e., v_max at peak mass), and the main progenitor's Δv_max rank (expressed in units of standard deviations). These files are split into many pieces (144 for Bolshoi-Planck) to make them easier to analyze in parallel.

  3. CANDELS Lightcones

    Lightcones for the CANDELS fields (EGS, COSMOS, UDS, GOODS-N/S) are available in CANDELS_Lightcones/survey_*. The filenames include the field, the redshift range, the width of the lightcone in arcminutes ("x"), the height of the lightcone in arcminutes ("y"), and the lightcone index. Eight lightcones are available for each field; these are separate realizations of each field to aid in estimating sample variance. These lightcones contain the galaxy sky position (RA, Dec, z), the halo ID, lightcone 3D position, velocity, halo mass and v_max, galaxy true/observed stellar mass, intrahalo light, true/observed SFR, observed SSFR, true stellar mass to halo mass ratio, observed UV luminosity (only valid at z>4), and UV attenuation (only valid at z>4).

Running Basic Analyses

  1. Compiling

    If you use the GNU C compiler version 4.0 or above on a 64-bit machine, compiling should be as simple as typing "make" at the command prompt. If you use the Intel C compiler, uncomment the lines CC=icc and OPT_FLAGS=-fast in the Makefile before running "make".

    The UniverseMachine does not support compiling on 32-bit machines and has not been tested with other compilers. Additionally, it does not support non-Unix environments (Mac OS X is fine; Windows is not). If you use the code to convert new merger trees to UniverseMachine format, you will need the GNU Scientific Library (GSL) installed; to compile this code, you should run "make treereg".

  2. Making New Lightcones

    Lightcones for arbitrary fields can be generated with the lightcone command after compiling. You will need the binary SFR catalogs (Section 3.1), the config file, and the list of snapshots (snaps.txt). After downloading, you'll have to edit the config file so that INBASE is the directory path where you've downloaded snaps.txt and OUTBASE is the directory where the binary SFR catalogs are located. Running the lightcone command without arguments gives a brief usage statement. z_low and z_high give the redshift range over which to generate the lightcone, x_arcmin gives the width of the lightcone in arcminutes, y_arcmin gives the height of the lightcone in arcminutes, samples gives the number of lightcone realizations to generate, and id_tag gives optional text to add to the output filename. do_collision_test, if specified as 1, will ensure that the lightcone doesn't overlap with itself. This is inadvisable except with very small lightcones, because most volumes of interest will be of comparable size to Bolshoi-Planck. ra and dec give the center of the lightcone on the sky; theta gives the additional rotation (in degrees) of the lightcone around this central axis. Finally, rseed allows specifying the random seed for generating lightcone positions and orientations within the simulation. This is helpful if you want to generate a lightcone using the same halos for a different UniverseMachine model.

  3. Making New Catalogs

    The script scripts/make_sf_catalog.pl will generate new catalogs (as in Section 3.1) as well as, optionally, new star formation histories for a specified model. You will need the base simulation data (base/*.*), the config file, as well as the model parameters (e.g., those in the data release, in data/obs.txt). You will need to edit the config file so that INBASE is the directory where the base simulation data is located and OUTBASE is the directory to which the binary SFR catalogs (as in Section 3.1) and text SFHs (if specified) should be written. The script will use as many threads as you specify to generate the final catalog; the memory used is about 3GB per thread for Bolshoi-Planck. The final SFH catalogs will be in text format, but the final halo/galaxy property catalogs will be in binary format. To convert the latter to text format, you can use the print_sm_catalog command. For testing purposes, you can also use the make_sf_catalog command directly; this will generate a single piece of the catalog at a time. Run make_sf_catalog without options to see usage information.
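
    Given the figure of about 3GB per thread for Bolshoi-Planck, a reasonable thread count can be estimated from the memory and cores available. The Python helper below is a hypothetical convenience sketch, not part of the UniverseMachine; the 3GB default comes from the text above and should be adjusted for other simulations.

```python
def max_threads(available_memory_gb, cpu_cores, gb_per_thread=3.0):
    """Return the largest thread count that fits in memory and on the CPU.

    gb_per_thread defaults to the approximate Bolshoi-Planck figure.
    """
    by_memory = int(available_memory_gb // gb_per_thread)
    return max(0, min(cpu_cores, by_memory))


# Illustrative machine: 64 GB of free memory and 32 cores.
print(max_threads(64, 32))
```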

Advanced Parallel Analysis

Pending.