Merge pull request #888 from aphearin/cam_overhaul_pr

Bin-free conditional abundance matching
astropy · Mar 12, 2018 · 5b351c8 · 5b351c8
2 parents 25eabab + fe21a9f
commit 5b351c8
Show file tree

Hide file tree

Showing 33 changed files with 2,619 additions and 110 deletions.
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -15,6 +15,8 @@
 
 - Renamed old implementation of `conditional_abunmatch` to `conditional_abunmatch_bin_based`
 
+- Added new bin-free implementation of `conditional_abunmatch`.
+
 
 0.6 (2017-12-15)
 ----------------

diff --git a/docs/_static/cam_example_assembias_clf.png b/docs/_static/cam_example_assembias_clf.png
diff --git a/docs/_static/cam_example_bulge_disk_ratio.png b/docs/_static/cam_example_bulge_disk_ratio.png
diff --git a/docs/_static/cam_example_complex_sfr.png b/docs/_static/cam_example_complex_sfr.png
diff --git a/docs/_static/cam_example_complex_sfr_dmdt_correlation.png b/docs/_static/cam_example_complex_sfr_dmdt_correlation.png
diff --git a/docs/_static/cam_example_complex_sfr_recovery.png b/docs/_static/cam_example_complex_sfr_recovery.png
diff --git a/docs/function_usage/utility_functions.rst b/docs/function_usage/utility_functions.rst
@@ -58,3 +58,10 @@ Probabilistic binning
 .. autosummary::
 
     fuzzy_digitize
+
+Estimating two-dimensional PDFs
+===============================================================
+
+.. autosummary::
+
+    sliding_conditional_percentile
diff --git a/docs/notebooks/cam_modeling/cam_complex_sfr_tutorial.ipynb b/docs/notebooks/cam_modeling/cam_complex_sfr_tutorial.ipynb
diff --git a/docs/notebooks/cam_modeling/cam_decorated_clf.ipynb b/docs/notebooks/cam_modeling/cam_decorated_clf.ipynb
diff --git a/docs/notebooks/cam_modeling/cam_disk_bulge_ratios_demo.ipynb b/docs/notebooks/cam_modeling/cam_disk_bulge_ratios_demo.ipynb
diff --git a/docs/quickstart_and_tutorials/index.rst b/docs/quickstart_and_tutorials/index.rst
@@ -49,7 +49,7 @@ Galaxy-Halo Modeling
    tutorials/model_building/preloaded_models/index
    tutorials/model_building/index
    ../source_notes/empirical_models/factories/index
-   ../quickstart_and_tutorials/tutorials/model_building/cam_tutorial
+   ../quickstart_and_tutorials/tutorials/model_building/cam_tutorial_pages/cam_tutorial
 
 Galaxy/Halo Catalog Analysis
 ---------------------------------------------------------

diff --git a/docs/quickstart_and_tutorials/tutorials/model_building/cam_tutorial.rst b/docs/quickstart_and_tutorials/tutorials/model_building/cam_tutorial.rst
diff --git a/...t_and_tutorials/tutorials/model_building/cam_tutorial_pages/cam_complex_sfr.rst b/...t_and_tutorials/tutorials/model_building/cam_tutorial_pages/cam_complex_sfr.rst
@@ -0,0 +1,95 @@
+.. _cam_complex_sfr:
+
+Modeling Complex Star-Formation Rates
+==============================================
+
+In this example, we will show how to use Conditional Abundance Matching to model
+a correlation between the mass accretion rate of a halo and the specific
+star-formation rate of the galaxy living in the halo.
+The code used to generate these results can be found here:
+
+    **halotools/docs/notebooks/cam_modeling/cam_complex_sfr_tutorial.ipynb**
+
+Observed star-formation rate distribution
+------------------------------------------
+
+We will work with a distribution of star-formation
+rates that would be difficult to model analytically, but that is well-sampled
+by some observed galaxy population. The particular form of this distribution
+is not important for this tutorial, since our CAM application will directly
+use the "observed" population to define the distribution that we recover.
+
+.. image:: /_static/cam_example_complex_sfr.png
+
+The plot above shows the specific star-formation rates of the
+toy galaxy distribution we have created for demonstration purposes.
+Briefly, there are separate distributions for quenched and star-forming galaxies.
+For the quenched galaxies, we model sSFR using an exponential power law;
+for star-forming galaxies, we use a log-normal;
+implementation details can be found in the notebook.
+
+
+Modeling sSFR with CAM
+------------------------------------------
+
+We will start out by painting stellar mass onto subhalos
+in the Bolshoi simulation, which we do using
+the stellar-to-halo mass relation from Moster et al 2013.
+
+.. code:: python
+
+    from halotools.sim_manager import CachedHaloCatalog
+    halocat = CachedHaloCatalog()
+
+    from halotools.empirical_models import Moster13SmHm
+    model = Moster13SmHm()
+
+    halocat.halo_table['stellar_mass'] = model.mc_stellar_mass(
+        prim_haloprop=halocat.halo_table['halo_mpeak'], redshift=0)
+
+
+Algorithm description
+~~~~~~~~~~~~~~~~~~~~~~
+
+We will now use CAM to paint star-formation rates onto these model galaxies.
+The way the algorithm works is as follows. For every model galaxy,
+we find the observed galaxy with the closest stellar mass.
+We set up a window of ~200 observed galaxies bracketing this matching galaxy;
+this window defines :math:`{\rm Prob(< sSFR | M_{\ast})}`, which allows us to
+calculate the rank-order sSFR-percentile for each galaxy in the window.
+Similarly, we set up a window of ~200 model galaxies; this window
+defines :math:`{\rm Prob(< dM_{vir}/dt | M_{\ast})}`, which allows us to
+calculate the rank-order accretion-rate-percentile of our model galaxy,
+:math:`r_1`. Then we simply search the observed window for the
+observed galaxy whose rank-order sSFR-percentile equals
+:math:`r_1`, and map its sSFR value onto our model galaxy.
+We perform that calculation for every model galaxy with the following syntax:
+
+.. code:: python
+
+    from halotools.empirical_models import conditional_abunmatch
+    x = halocat.halo_table['stellar_mass']
+    y = halocat.halo_table['halo_dmvir_dt_100myr']
+    x2 = galaxy_mstar
+    y2 = np.log10(galaxy_ssfr)
+    nwin = 201
+    halocat.halo_table['log_ssfr'] = conditional_abunmatch(x, y, x2, y2, nwin)
+
+
+Results
+~~~~~~~~~~~~~~~~~~~~~~
+
+Now let's inspect the results of our calculation. First we show that the
+distribution specific star-formation rates of our model galaxies
+matches the observed distribution across the range of stellar mass:
+
+
+.. image:: /_static/cam_example_complex_sfr_recovery.png
+
+Next we can see that these sSFR values are tightly correlated
+with halo accretion rate at fixed stellar mass:
+
+.. image:: /_static/cam_example_complex_sfr_dmdt_correlation.png
+
+
+
diff --git a/...and_tutorials/tutorials/model_building/cam_tutorial_pages/cam_decorated_clf.rst b/...and_tutorials/tutorials/model_building/cam_tutorial_pages/cam_decorated_clf.rst
@@ -0,0 +1,92 @@
+.. _cam_decorated_clf:
+
+
+Modeling Central Galaxy Luminosity with Assembly Bias
+==========================================================================
+
+In this example, we will show how to use Conditional Abundance Matching to
+map central galaxy luminosity onto halos in a way that simultaneously correlates
+with halo :math:`M_{\rm vir}` and halo :math:`V_{\rm max}`.
+The code used to generate these results can be found here:
+
+    **halotools/docs/notebooks/cam_modeling/cam_decorated_clf.ipynb**
+
+
+Baseline mass-to-light model
+------------------------------------------
+
+The approach we will demonstrate in this tutorial is very similar to the ordinary
+Conditional Luminosity Function model (CLF) of central galaxy luminosity.
+In the CLF, a parameterized form is chosen for the median luminosity
+of central galaxies as a function of host halo mass. The code below
+shows how to calculate this median luminosity for every host halo in the Bolshoi simulation:
+
+.. code:: python
+
+    from halotools.sim_manager import CachedHaloCatalog
+    halocat = CachedHaloCatalog(simname='bolplanck')
+    host_halos = halocat.halo_table[halocat.halo_table['halo_upid']==-1]
+    from halotools.empirical_models import Cacciato09Cens
+    model = Cacciato09Cens()
+    host_halos['median_luminosity'] = model.median_prim_galprop(
+        prim_haloprop=host_halos['halo_mvir'])
+
+To generate a Monte Carlo realization of the model,
+one typically assumes that luminosities are distributed
+as a log-normal distribution centered about this median relation.
+While there is already a convenience function
+`~halotools.empirical_models.Cacciato09Cens.mc_prim_galprop` for the
+`~halotools.empirical_models.Cacciato09Cens` class that handles this,
+it is straightforward to do this yourself
+using the `~scipy.stats.norm` function in `scipy.stats`.
+You just need to generate uniform random numbers and pass the result to the
+`scipy.stats.norm.isf` function:
+
+.. code:: python
+
+    from scipy.stats import norm
+    loc = np.log10(host_halos['median_luminosity'])
+    uran = np.random.rand(len(host_halos))
+    host_halos['luminosity'] = 10**norm.isf(1-uran, loc=loc, scale=0.2)
+
+The *isf* function analytically evaluates the inverse CDF of the normal distribution,
+and so this Monte Carlo method is based on inverse transformation sampling.
+It is probably more common to use `numpy.random.normal` for this purpose,
+but doing things with `scipy.stats.norm` will make it easier
+to see how CAM works in the next section.
+
+
+Correlating scatter in luminosity with halo :math:`V_{\rm max}`
+----------------------------------------------------------------
+
+As described in :ref:`cam_basic_idea`, we can generalize the inverse transformation sampling
+technique so that the modeled variable is not purely stochastic, but is instead
+correlated with some other variable. In this example, we will choose to
+correlate the scatter with :math:`V_{\rm max}`. To do so, we need to calculate
+:math:`{\rm Prob}(<V_{\rm max}\vert M_{\rm vir})`, which we can do using
+the `~halotools.utils.sliding_conditional_percentile` function.
+
+.. code:: python
+
+    from halotools.utils import sliding_conditional_percentile
+    x = host_halos['halo_mvir']
+    y = host_halos['halo_vmax']
+    nwin = 301
+    host_halos['vmax_percentile'] = sliding_conditional_percentile(x, y, nwin)
+
+Now that :math:`{\rm Prob}(<V_{\rm max}\vert M_{\rm vir})` has been calculated,
+we simply use these values instead of uniform randoms as the argument to the
+`scipy.stats.norm.isf` function. This way, below-average values of :math:`V_{\rm max}`
+will be assigned below-average values of luminosity, and conversely.
+
+.. code:: python
+
+    loc = np.log10(host_halos['median_luminosity'])
+    u = host_halos['vmax_percentile']
+    host_halos['luminosity'] = 10**norm.isf(1-u, loc=loc, scale=0.2)
+
+The plot below illustrates our results. The gray points show the Monte Carlo realization
+of central galaxy luminosity as a function of host halo mass. Each of the different
+curves shows the median relation calculated for halos with different values of :math:`V_{\rm max}`.
+
+.. image:: /_static/cam_example_assembias_clf.png
-Original file line number
+Diff line change
@@ Expand Up / @@ -15,6 +15,8 @@ @@
     - Renamed old implementation of `conditional_abunmatch` to `conditional_abunmatch_bin_based`
+    - Added new bin-free implementation of `conditional_abunmatch`.
 .6 (2017-12-15)
     ----------------
@@ Expand Down @@