Missing subvolume(s) in Bolshoi-Planck z=0 catalog #598

aphearin · 2016-07-20T17:19:11Z

The following Halotools-provided halo catalog is missing a substantial number of halos:

simname = bolplanck
redshift = 0
halo_finder = rockstar
version_name = halotools_alpha_version2

The missing halos appear to be isolated to x, y > 200 Mpc/h. All scientific results deriving from the halotools_alpha_version2 catalog are invalid.

Until this is resolved with the v0.4 release, users should download the latest ASCII data from http://www.slac.stanford.edu/~behroozi/BPlanck_Hlists/hlist_1.00231.list.gz and reprocess it themselves using RockstarHlistReader.

CC @andrew-zentner, @duncandc, @vdbosch69.


aphearin · 2016-08-10T18:23:05Z

This figure shows that the v0.4 catalogs currently up on the Yale website resolve the missing-subvolume problem first pointed out by @johannesulf. The quick way to read these plots is just to notice that there are no "holes" in any of them. In a little more detail, each panel shows a 2-d scatter plot of the positions of 1e4 randomly selected halos from a single snapshot; the left column shows x-y scatter plots, the middle column x-z, the right column y-z. From top to bottom, the rows show bolshoi, bolshoi-planck, consuelo and multidark. Within each panel, results for all four redshifts are shown.

aphearin · 2016-08-10T18:43:49Z

To help protect against this problem in the future, Peter Behroozi now includes a file containing the result of running sha1sum on all halo catalogs posted on SLAC. Before processing these snapshots, I verified that the sha1sum run on each downloaded catalog agrees with his tabulated values, which should guarantee that the download of each catalog proceeded without interruption.

This check should be done every time any rockstar halo catalog is downloaded, either by halotools developers or users. The reason this is so important for large-scale structure statistics is that the rows of publicly available rockstar catalogs are chunked by spatial subvolume, so silently-incomplete downloads are systematically missing spatial sections of the snapshot. The above plot, and the ones to follow, provide further testing on the updated catalogs.
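For anyone who wants to script this check, here is a minimal sketch using only the Python standard library. It assumes the checksum file is in the usual `sha1sum` output format (`<hex digest>  <filename>` per line); the function names `sha1_of_file`, `parse_checksum_table` and `verify_download` are hypothetical helpers written for illustration and are not part of Halotools.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_table(text):
    """Parse text in the assumed sha1sum format into {filename: digest}."""
    table = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            # sha1sum prefixes the filename with '*' in binary mode
            table[name.strip().lstrip("*")] = digest.lower()
    return table


def verify_download(path, expected_digest):
    """Return True only if the local file matches the tabulated digest."""
    return sha1_of_file(path) == expected_digest.lower()
```

A download that was silently cut short will produce a different digest, so `verify_download` returning `False` is the signal to fetch the file again before reprocessing it.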

aphearin · 2016-08-10T22:29:33Z

The above plot compares the 3d clustering of halos of a fixed mass of Mvir~1e12. Different colored curves show different simulations. Different panels show results for different redshifts.

In the plot below, I show the ratio of each of bolshoi-planck and consuelo to bolshoi, so that values on the vertical axis less than unity correspond to a situation in which bolshoi-planck (consuelo) has weaker clustering than bolshoi.

Notice that Milky Way-mass halos in bolshoi-planck show 10-15% weaker clustering than in bolshoi. That's the sense of the effect that should be expected from the shift in M*, but the magnitude is a bit surprising. This has been confirmed by @johannesulf in an independently downloaded catalog. @vandenbosch69 and/or @surhudm - does this level of difference also seem a bit high to you?

aphearin · 2016-08-10T22:38:10Z

The plot below is the same as the one above, except here I go to a slightly higher mass, logMvir ~ 12.5, so that I can include multidark.

Even though the z=2 panel shows a larger discrepancy for multidark than for the bolshoi-planck ratio, this is reasonable since I've made no attempt to compare halo clustering at fixed peak height, and this mass range is way above collapse mass at z=2, where bias is a more rapidly varying function of mass. The fact that the multidark discrepancy increases with redshift is comforting, and note that the bolshoi-planck ratio does not show this redshift-dependence.

aphearin · 2016-08-10T22:51:16Z

Another simple check is to compute counts-in-subvolume statistics. I divide each snapshot into the same subvolumes used to chunk the data hosted on SLAC, and count the number of host halos with peak mass greater than 300 particles, Mpeak > 300 mp. I then compute the minimum counts divided by the median counts and plot the result below. I show results for all redshifts and simulations. At each of the four redshifts 0, 0.5, 1 and 2, I slightly stagger each simulation's bar plot to make it easier to read.

In the previous buggy catalogs, the z = 0 value of bolshoi-planck would have been zero. The bolshoi(-planck) subvolumes are 50 comoving Mpc/h in size, and there's still quite a lot of cosmic structure on these scales: the typical value of the vertical axis for Poisson statistics would be ~0.95.
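The min/median statistic described above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the code used to make the plot: the helper name `subvolume_count_ratio` is hypothetical, the grid is assumed to be a regular lattice of cubic cells, and the input is assumed to be the positions of halos that already pass the Mpeak > 300 mp cut.

```python
from collections import Counter
from statistics import median


def subvolume_count_ratio(positions, lbox, cell_size):
    """Return min(counts) / median(counts) over a regular grid of cubic cells.

    positions : iterable of (x, y, z) tuples, in the same units as lbox
    lbox      : box side length (assumed periodic)
    cell_size : subvolume side length, e.g. 50 for the bolshoi(-planck) chunks
    """
    ncell = int(round(lbox / cell_size))
    counts = Counter()
    for x, y, z in positions:
        # Wrap into [0, lbox), then map to an integer cell index;
        # min() guards against floating-point rounding at the upper edge.
        idx = tuple(min(int((c % lbox) / cell_size), ncell - 1) for c in (x, y, z))
        counts[idx] += 1
    # Empty cells never appear in the Counter, so enumerate the full grid.
    all_counts = [
        counts.get((i, j, k), 0)
        for i in range(ncell)
        for j in range(ncell)
        for k in range(ncell)
    ]
    med = median(all_counts)
    if med == 0:
        raise ValueError("median count is zero; too few halos for this grid")
    return min(all_counts) / med
```

For a bolshoi-planck snapshot one would call this with `cell_size=50` and the box length in Mpc/h (250, assuming the standard Bolshoi box size). A catalog with a missing subvolume returns exactly 0.0, which makes the failure easy to detect automatically.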

surhudm · 2016-08-11T02:08:56Z

Hi Andrew,

The Tinker 2010 bias for BolshoiP is about 4 percent lower at z=0 for 1.E12
Msun/h halos (for the 200m definition), which would result in an 8 percent
difference in clustering, since the correlation function scales as the bias
squared. Can you check what the rough difference is for the virial
definition?

Cheers,
Surhud

aphearin · 2016-08-11T13:19:07Z

Many thanks for the sanity check, Surhud. I figured you had code for that Tinker bias estimate at the ready. That's slightly lower than what I'm seeing here, but close enough to chalk up the remaining residual to a combination of sample variance and fitting function error, so I'm not so worried about this anymore. I think this is convincing evidence that the v0.4 catalogs have been processed properly.

aphearin added bug sim-manager labels Jul 20, 2016

aphearin added this to the v0.4 milestone Jul 20, 2016

aphearin self-assigned this Jul 20, 2016

aphearin mentioned this issue Jul 30, 2016

Reprocess sims #614

Merged

aphearin closed this as completed Aug 11, 2016
