Skip to content

Commit

Permalink
Merge pull request #888 from aphearin/cam_overhaul_pr
Browse files Browse the repository at this point in the history
Bin-free conditional abundance matching
  • Loading branch information
aphearin authored Mar 12, 2018
2 parents 25eabab + fe21a9f commit 5b351c8
Show file tree
Hide file tree
Showing 33 changed files with 2,619 additions and 110 deletions.
2 changes: 2 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@

- Renamed old implementation of `conditional_abunmatch` to `conditional_abunmatch_bin_based`

- Added new bin-free implementation of `conditional_abunmatch`.


0.6 (2017-12-15)
----------------
Expand Down
Binary file added docs/_static/cam_example_assembias_clf.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/cam_example_bulge_disk_ratio.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/cam_example_complex_sfr.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/cam_example_complex_sfr_recovery.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
7 changes: 7 additions & 0 deletions docs/function_usage/utility_functions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -58,3 +58,10 @@ Probabilistic binning
.. autosummary::

fuzzy_digitize

Estimating two-dimensional PDFs
===============================================================

.. autosummary::

sliding_conditional_percentile
362 changes: 362 additions & 0 deletions docs/notebooks/cam_modeling/cam_complex_sfr_tutorial.ipynb

Large diffs are not rendered by default.

223 changes: 223 additions & 0 deletions docs/notebooks/cam_modeling/cam_decorated_clf.ipynb

Large diffs are not rendered by default.

252 changes: 252 additions & 0 deletions docs/notebooks/cam_modeling/cam_disk_bulge_ratios_demo.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/quickstart_and_tutorials/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ Galaxy-Halo Modeling
tutorials/model_building/preloaded_models/index
tutorials/model_building/index
../source_notes/empirical_models/factories/index
../quickstart_and_tutorials/tutorials/model_building/cam_tutorial
../quickstart_and_tutorials/tutorials/model_building/cam_tutorial_pages/cam_tutorial

Galaxy/Halo Catalog Analysis
---------------------------------------------------------
Expand Down

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
.. _cam_complex_sfr:

Modeling Complex Star-Formation Rates
==============================================

In this example, we will show how to use Conditional Abundance Matching to model
a correlation between the mass accretion rate of a halo and the specific
star-formation rate of the galaxy living in the halo.
The code used to generate these results can be found here:

**halotools/docs/notebooks/cam_modeling/cam_complex_sfr_tutorial.ipynb**

Observed star-formation rate distribution
------------------------------------------

We will work with a distribution of star-formation
rates that would be difficult to model analytically, but that is well-sampled
by some observed galaxy population. The particular form of this distribution
is not important for this tutorial, since our CAM application will directly
use the "observed" population to define the distribution that we recover.

.. image:: /_static/cam_example_complex_sfr.png

The plot above shows the specific star-formation rates of the
toy galaxy distribution we have created for demonstration purposes.
Briefly, there are separate distributions for quenched and star-forming galaxies.
For the quenched galaxies, we model sSFR using an exponential power law;
for star-forming galaxies, we use a log-normal;
implementation details can be found in the notebook.


Modeling sSFR with CAM
------------------------------------------

We will start out by painting stellar mass onto subhalos
in the Bolshoi simulation, which we do using
the stellar-to-halo mass relation from Moster et al 2013.

.. code:: python
from halotools.sim_manager import CachedHaloCatalog
halocat = CachedHaloCatalog()
from halotools.empirical_models import Moster13SmHm
model = Moster13SmHm()
halocat.halo_table['stellar_mass'] = model.mc_stellar_mass(
prim_haloprop=halocat.halo_table['halo_mpeak'], redshift=0)
Algorithm description
~~~~~~~~~~~~~~~~~~~~~~

We will now use CAM to paint star-formation rates onto these model galaxies.
The way the algorithm works is as follows. For every model galaxy,
we find the observed galaxy with the closest stellar mass.
We set up a window of ~200 observed galaxies bracketing this matching galaxy;
this window defines :math:`{\rm Prob(< sSFR | M_{\ast})}`, which allows us to
calculate the rank-order sSFR-percentile for each galaxy in the window.
Similarly, we set up a window of ~200 model galaxies; this window
defines :math:`{\rm Prob(< dM_{vir}/dt | M_{\ast})}`, which allows us to
calculate the rank-order accretion-rate-percentile of our model galaxy,
:math:`r_1`. Then we simply search the observed window for the
observed galaxy whose rank-order sSFR-percentile equals
:math:`r_1`, and map its sSFR value onto our model galaxy.
We perform that calculation for every model galaxy with the following syntax:

.. code:: python
from halotools.empirical_models import conditional_abunmatch
x = halocat.halo_table['stellar_mass']
y = halocat.halo_table['halo_dmvir_dt_100myr']
x2 = galaxy_mstar
y2 = np.log10(galaxy_ssfr)
nwin = 201
halocat.halo_table['log_ssfr'] = conditional_abunmatch(x, y, x2, y2, nwin)
Results
~~~~~~~~~~~~~~~~~~~~~~

Now let's inspect the results of our calculation. First we show that the
distribution specific star-formation rates of our model galaxies
matches the observed distribution across the range of stellar mass:


.. image:: /_static/cam_example_complex_sfr_recovery.png

Next we can see that these sSFR values are tightly correlated
with halo accretion rate at fixed stellar mass:

.. image:: /_static/cam_example_complex_sfr_dmdt_correlation.png



Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
.. _cam_decorated_clf:


Modeling Central Galaxy Luminosity with Assembly Bias
==========================================================================

In this example, we will show how to use Conditional Abundance Matching to
map central galaxy luminosity onto halos in a way that simultaneously correlates
with halo :math:`M_{\rm vir}` and halo :math:`V_{\rm max}`.
The code used to generate these results can be found here:

**halotools/docs/notebooks/cam_modeling/cam_decorated_clf.ipynb**


Baseline mass-to-light model
------------------------------------------

The approach we will demonstrate in this tutorial is very similar to the ordinary
Conditional Luminosity Function model (CLF) of central galaxy luminosity.
In the CLF, a parameterized form is chosen for the median luminosity
of central galaxies as a function of host halo mass. The code below
shows how to calculate this median luminosity for every host halo in the Bolshoi simulation:

.. code:: python
from halotools.sim_manager import CachedHaloCatalog
halocat = CachedHaloCatalog(simname='bolplanck')
host_halos = halocat.halo_table[halocat.halo_table['halo_upid']==-1]
from halotools.empirical_models import Cacciato09Cens
model = Cacciato09Cens()
host_halos['median_luminosity'] = model.median_prim_galprop(
prim_haloprop=host_halos['halo_mvir'])
To generate a Monte Carlo realization of the model,
one typically assumes that luminosities are distributed
as a log-normal distribution centered about this median relation.
While there is already a convenience function
`~halotools.empirical_models.Cacciato09Cens.mc_prim_galprop` for the
`~halotools.empirical_models.Cacciato09Cens` class that handles this,
it is straightforward to do this yourself
using the `~scipy.stats.norm` function in `scipy.stats`.
You just need to generate uniform random numbers and pass the result to the
`scipy.stats.norm.isf` function:

.. code:: python
from scipy.stats import norm
loc = np.log10(host_halos['median_luminosity'])
uran = np.random.rand(len(host_halos))
host_halos['luminosity'] = 10**norm.isf(1-uran, loc=loc, scale=0.2)
The *isf* function analytically evaluates the inverse CDF of the normal distribution,
and so this Monte Carlo method is based on inverse transformation sampling.
It is probably more common to use `numpy.random.normal` for this purpose,
but doing things with `scipy.stats.norm` will make it easier
to see how CAM works in the next section.


Correlating scatter in luminosity with halo :math:`V_{\rm max}`
----------------------------------------------------------------

As described in :ref:`cam_basic_idea`, we can generalize the inverse transformation sampling
technique so that the modeled variable is not purely stochastic, but is instead
correlated with some other variable. In this example, we will choose to
correlate the scatter with :math:`V_{\rm max}`. To do so, we need to calculate
:math:`{\rm Prob}(<V_{\rm max}\vert M_{\rm vir})`, which we can do using
the `~halotools.utils.sliding_conditional_percentile` function.

.. code:: python
from halotools.utils import sliding_conditional_percentile
x = host_halos['halo_mvir']
y = host_halos['halo_vmax']
nwin = 301
host_halos['vmax_percentile'] = sliding_conditional_percentile(x, y, nwin)
Now that :math:`{\rm Prob}(<V_{\rm max}\vert M_{\rm vir})` has been calculated,
we simply use these values instead of uniform randoms as the argument to the
`scipy.stats.norm.isf` function. This way, below-average values of :math:`V_{\rm max}`
will be assigned below-average values of luminosity, and conversely.

.. code:: python
loc = np.log10(host_halos['median_luminosity'])
u = host_halos['vmax_percentile']
host_halos['luminosity'] = 10**norm.isf(1-u, loc=loc, scale=0.2)
The plot below illustrates our results. The gray points show the Monte Carlo realization
of central galaxy luminosity as a function of host halo mass. Each of the different
curves shows the median relation calculated for halos with different values of :math:`V_{\rm max}`.

.. image:: /_static/cam_example_assembias_clf.png
Loading

0 comments on commit 5b351c8

Please sign in to comment.