
Feature/benchmark #1457

Merged: jhamman merged 7 commits into pydata:master from feature/benchmark on Jul 26, 2017

Conversation

jhamman (Member) commented Jun 16, 2017

  • Closes #1257 (PERF: Add benchmarking?)
  • Tests added / passed
  • Passes git diff upstream/master | flake8 --diff
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

This is a very bare-bones addition of the asv benchmarking tool to xarray. I have added four very rudimentary benchmarks in the dataset_io.py module.

Usage of asv is pretty straightforward but I'll outline the steps for those who want to try this out:

cd xarray
conda install asv -c conda-forge
asv run  # this will install some conda environments in ./.asv/envs
asv publish  # this collates the results
asv preview  # this will launch a web server so you can visually compare the tests
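
For anyone who hasn't used asv before: each module in the benchmark suite defines classes whose setup() method prepares the test data and whose time_* methods are what asv times. A minimal sketch in that style (the class name, file name, and array size below are illustrative and assume the netCDF4 library is installed; this is not the exact contents of dataset_io.py):

import numpy as np
import xarray as xr


class IOSingleNetCDF:
    def setup(self):
        # asv runs setup() before timing each benchmark method
        self.filepath = 'test_single_file.nc'  # illustrative file name
        ds = xr.Dataset({'foo': (('x', 'y'), np.random.randn(1000, 1000))})
        ds.to_netcdf(self.filepath, engine='netcdf4')

    def time_load_dataset_netcdf4(self):
        # methods prefixed with time_ are picked up and timed by asv
        xr.open_dataset(self.filepath, engine='netcdf4').load()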

Before I go any further, I want to get some input from @pydata/xarray on what we want to see in this PR. In previous projects, I have found designing tests after the fact can end up being fairly arbitrary and I want to avoid that if at all possible. I'm guessing that we will want to focus our efforts for now on I/O and dask related performance but how we do that is up for discussion.

cc @shoyer, @rabernat, @MaximilianR, @Zac-HD

Zac-HD (Contributor) commented Jun 16, 2017

I like the idea of benchmarks, but have some serious concerns. For Dask and IO-bound work in general, benchmark results will vary widely depending on the hardware and (if relevant) network properties. Results will be noncomparable between SSD and HDD, local and remote network access, and in general depend heavily on the specific IO patterns and storage/compute relationship of the computer.

This isn't a reason not to benchmark though, just a call for very cautious interpretation - it's clearly useful to catch some of the subtle-but-pathological performance problems that have cropped up. In short, I think benchmarks should have a very clear warnings section in the documentation, and no decision should be taken to change code without benchmarking on a variety of computers (SSD/HDD, PC/cluster, local/remote data...).

Also JSON cannot include comments, and there are a number of entries that you need to update, but that's a passing concern.

shoyer (Member) commented Jun 16, 2017

@wesm just set up a machine for dedicated benchmarking of pandas and possibly other pydata/scipy projects (if there's extra capacity, as expected). @TomAugspurger has been working on getting it set up. So that's potentially an option, at least for single-machine benchmarks.

The lore I've heard is that benchmarking on shared cloud resources (e.g., Travis-CI) can have reproducibility issues due to resource contention and/or jobs getting scheduled on slightly different machine types. I don't know how true this still is, or if there are good workarounds for particular cloud platforms. I suspect this should be solvable, though. I can certainly make an internal inquiry about benchmarking on GCP if we can't find answers on our own.

Zac-HD (Contributor) commented Jun 16, 2017

The tests for Hypothesis take almost twice as long to run on Travis at certain times of day, so I certainly wouldn't use it for benchmarking anything!

I'm also concerned that a dedicated benchmarking machine may lead to software being (accidentally!) optimized for a particular architecture or balance of machine resources without due consideration. Maybe @wesm could investigate fault injection to (e.g.) slow down disk access or add latency for some sets of benchmarks?

max-sixty (Collaborator) commented Jun 16, 2017

This is a great start! Thanks @jhamman !

Our most common performance problems come from handling pandas 'oddities', like non-standard indexes: typically an operation that is usually vectorized becomes un-vectorized and starts looping in Python. But that's probably not a big use case for most.

Where have others seen performance issues? Are there ever issues with the standard transform operations, such as merge?

(Addendum, I just saw the comments above:)
I think there's real benefit in benchmarks that ensure we don't add code that slows operations down by an order of magnitude, i.e. outside the bounds of reasonable error. That's broader than optimizing around the benchmarks themselves, particularly since xarray is all Python and shouldn't be doing performance-intensive work internally.


def time_load_dataset_netcdf4(self):
    xr.open_dataset(self.filepath, engine='netcdf4').load()
    pass

Collaborator (review comment on the diff above):

FYI the pass statements don't do anything

jhamman (Member, Author) commented Jun 16, 2017

Keep the comments coming! I think we can distinguish between benchmarking for regressions and benchmarking for development and introspection.

The former will require some thought as to what machines we want to rely on and how to achieve consistency throughout the development track. It sounds like there are a number of options that we could pursue toward those ends.

The latter use of benchmarking is useful on a single machine with only a few commits of history. For the four benchmarks in my sample dataset_io.py, we get the following interesting results (for one environment):

--[  0.00%]  Benchmarking conda-py2.7-bottleneck-dask-netcdf4-numpy-pandas-scipy
---[  3.12%]  Running dataset_io.IOSingleNetCDF.time_load_dataset_netcdf4       134.34ms
---[  6.25%]  Running dataset_io.IOSingleNetCDF.time_load_dataset_scipy         82.60ms
---[  9.38%]  Running dataset_io.IOSingleNetCDF.time_write_dataset_netcdf4      57.71ms
---[ 12.50%]  Running dataset_io.IOSingleNetCDF.time_write_dataset_scipy        267.29ms

So the relative performance is useful information in deciding how to use and/or develop xarray. (Granted, the exact factors will change depending on machine, architecture, and dataset.)

jhamman (Member, Author) commented Jul 13, 2017

@rabernat - do you have any thoughts on this?

@pydata/xarray - I'm trying to decide if this is worth spending any more time on. What sort of coverage would we want before we merge this first PR?

shoyer (Member) commented Jul 14, 2017 via email

rabernat (Contributor) commented:

I think this is a great start!

I would really like to see a performance test for open_mfdataset, since this is my own personal bottleneck.

Regarding the dependence on hardware, I/O speeds, etc., we should be able to resolve this by running on specific instance types on a cloud platform. We could configure environments with local SSD storage, network storage, etc., in order to cover different scenarios.

TomAugspurger (Contributor) commented:

About hardware, we should be able to run these on the machine running the pandas benchmarks. Once it's merged I should be able to add it easily to https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml and the benchmarks will be run and published (to https://tomaugspurger.github.io/asv-collection/ right now; not the permanent home)

jhamman (Member, Author) commented Jul 21, 2017

Thanks @TomAugspurger - see asv-runner/asv-runner#1.

All, I added a series of multi-file benchmarks. I think for a first PR, this is ready to fly and we can add more benchmarks as needed.
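
For context, a multi-file benchmark along these lines would write several small files in setup() and then time xr.open_mfdataset over them. A rough sketch (the class name, file count, sizes, and time coordinates are illustrative, and it assumes netCDF4 and dask are installed), with the open step timed separately from the load step:

import numpy as np
import xarray as xr


class IOMultipleNetCDF:
    def setup(self):
        # write a handful of small files with consecutive time coordinates
        # so open_mfdataset can combine them along 'time'
        self.filenames_list = ['test_multifile_%d.nc' % i for i in range(10)]
        for i, path in enumerate(self.filenames_list):
            ds = xr.Dataset(
                {'foo': (('time', 'x'), np.random.randn(10, 100))},
                coords={'time': np.arange(10) + 10 * i})
            ds.to_netcdf(path, engine='netcdf4')

    def time_open_dataset_netcdf4(self):
        # cost of opening the files, reading indices, and combining/aligning
        xr.open_mfdataset(self.filenames_list, engine='netcdf4')

    def time_load_dataset_netcdf4(self):
        # cost of actually reading the data into memory
        xr.open_mfdataset(self.filenames_list, engine='netcdf4').load()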

rabernat (Contributor) left a review comment:

This is awesome! Just a tiny suggestion...

            format=self.format)

    def time_load_dataset_scipy(self):
        xr.open_mfdataset(self.filenames_list, engine='scipy').load()

rabernat (Contributor) commented on the diff above:

I would create a separate function just for the open_mfdataset call, without load, i.e.

def time_open_dataset_scipy(self):
    ds = xr.open_mfdataset(self.filenames_list, engine='scipy')

This would allow us to track the cost of reading the indices and aligning. (This is my personal bottleneck with big datasets.)

Then load could be called in a different function.

jhamman (Member, Author) replied:

done.

rabernat (Contributor) commented:

I will merge by the end of the day if no one has any more comments.

@jhamman jhamman merged commit 96e6e8f into pydata:master Jul 26, 2017
@jhamman jhamman deleted the feature/benchmark branch July 26, 2017 16:17

TomAugspurger (Contributor) commented:

These are now being run and published to https://tomaugspurger.github.io/asv-collection/xarray/

I plan to find a more permanent home for publishing the results than my personal GitHub Pages site, but it may take a while before I can get to it.

shoyer (Member) commented Jul 27, 2017

Awesome, thanks @TomAugspurger !

wesm (Member) commented Jul 27, 2017

Cool, are these numbers coming off the pandabox?

TomAugspurger (Contributor) commented Jul 27, 2017 via email

jhamman (Member, Author) commented Jul 27, 2017

Yes! Thanks @wesm and @TomAugspurger.
