
Update integrate_1d to use variadic autodiff stuff internally in preparation for closures #2397

Merged
merged 28 commits into develop from feature/variadic-integrate-1d
Mar 31, 2021

Conversation

bbbales2
Member

Summary

This should make it easy to hook up integrate_1d with closures (#2384)

Release notes

  • integrate_1d internal interface updated in preparation for closures

Checklist

  • Math issue: Implement closures #2197

  • Copyright holder: Columbia University

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@bbbales2
Member Author

@nhuurre this doesn't change the integrate_1d external interface but it should make it easy to hook up in the closure pull.

bbbales2 mentioned this pull request Feb 27, 2021
@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
|------|-----------|-----------|-------|----------------------------------|
| gp_pois_regr/gp_pois_regr.stan | 3.45 | 3.42 | 1.01 | 0.74% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | 0.49% faster |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.02 | 1.71% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.95 | -5.01% slower |
| irt_2pl/irt_2pl.stan | 5.3 | 5.29 | 1.0 | 0.28% faster |
| performance.compilation | 91.29 | 88.57 | 1.03 | 2.98% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.61 | 8.64 | 1.0 | -0.37% slower |
| pkpd/one_comp_mm_elim_abs.stan | 29.52 | 29.66 | 1.0 | -0.47% slower |
| sir/sir.stan | 128.15 | 131.45 | 0.97 | -2.58% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.03 | 3.25% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.01 | 2.99 | 1.0 | 0.45% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.4 | 0.94 | -6.47% slower |
| arK/arK.stan | 1.78 | 1.81 | 0.98 | -1.94% slower |
| arma/arma.stan | 0.71 | 0.74 | 0.96 | -3.81% slower |
| garch/garch.stan | 0.57 | 0.56 | 1.01 | 1.43% faster |

Mean result: 0.994570523186

Commit hash: 132cd2c


Machine information:
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

bbbales2 mentioned this pull request Mar 3, 2021
@SteveBronder
Collaborator

Is this waiting for a review?

@bbbales2
Member Author

bbbales2 commented Mar 7, 2021

@SteveBronder yeah have at it. It's some changes to make it easy to do closures with integrate_1d.

@bbbales2
Member Author

Binds removed!

@SteveBronder (Collaborator) left a comment

A couple of small comments; overall the code looks good!

One thing: I think we can use a double nested reverse pass to avoid copying the args every iteration. Can you take a look at the code here and see if it makes sense? It's passing all the tests, but I'm not super familiar with nested autodiff, so I'm not sure whether this would break in some edge case we're not testing right now.

feature/variadic-integrate-1d...review/variadic-integrate-1d#diff-79b7fa556075dab8ffc8caa4be8fc909b1efbcfe4e63d254793a966f63391a1cR143

@@ -57,6 +58,8 @@ inline double integrate(const F& f, double a, double b,
bool used_two_integrals = false;
size_t levels;
double Q = 0.0;
// if a or b is infinite, set xc argument to NaN (see docs above for user
// function for xc info)
Collaborator:
[optional] We can do this some other time, but it would be nice to factor the error handling in the if (used_two_integrals) branches into a function that's called in the places we use two integrals

Member Author:
I split these into lambda functions. Does that look better, or is used_two_integrals clearer?

Comment on lines 59 to 66
apply(
[&](auto &&... args) {
accumulate_adjoints(adjoints.data(),
std::forward<decltype(args)>(args)...);
},
std::move(args_tuple_local_copy));

gradient = adjoints.coeff(n);
Collaborator:
I forgot to make this comment in my review, but optionally we could figure out which arg holds the adjoint value we need in a clever way that doesn't require copying all of them. We could have some function to get the Nth var in a tuple, but I'm also fine with not doing that in this PR.

SteveBronder (Collaborator) commented Mar 26, 2021:
We could have some internal function like

template <typename... Args>
double get_nth_adjoint(size_t n, const std::tuple<Args...>& tuple_arg) {
  size_t accum_vars = 0;
  bool stop_checking = false;
  // for_each visits the tuple elements from left to right
  std::array<double, sizeof...(Args)> possible_adjs
      = for_each([n, &accum_vars, &stop_checking](auto&& arg) {
          if (stop_checking) return 0.0;
          size_t num_vars = count_vars(arg);
          if (accum_vars + num_vars <= n) {
            // The nth var lives in a later arg; keep moving along
            accum_vars += num_vars;
            return 0.0;
          } else {
            // We reached the first arg that contains the nth var
            stop_checking = true;
            // index into this arg's adjoints at position n - accum_vars
            return get_the_adj(arg, n - accum_vars);
          }
        }, tuple_arg);
  // Scan possible_adjs until we hit a nonzero value (or return zero if
  // every adjoint really is zero)
  return get_nonzero_value(possible_adjs);
}

Member Author:
Yeah it's awkward. I also want to leave it for now.

The way to speed this up is writing our own quadratures, or talking to the Boost people about an interface where we integrate multiple functions over the same domain together. Right now we compute all three of these integrals completely separately:

\int f(x, a, b) dx
\int df(x, a, b)/da dx
\int df(x, a, b)/db dx

But any time we compute df(x, a, b)/da we also get df(x, a, b)/db, so the efficiency gain would come from taking advantage of that (what we're doing here throws away a ton of gradient info).

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
|------|-----------|-----------|-------|----------------------------------|
| gp_pois_regr/gp_pois_regr.stan | 3.39 | 3.38 | 1.0 | 0.27% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.92 | -8.65% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.11 | 1.08 | 7.77% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.01 | 1.16% faster |
| irt_2pl/irt_2pl.stan | 5.4 | 5.37 | 1.01 | 0.55% faster |
| performance.compilation | 90.49 | 88.9 | 1.02 | 1.75% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.55 | 8.62 | 0.99 | -0.89% slower |
| pkpd/one_comp_mm_elim_abs.stan | 29.61 | 29.73 | 1.0 | -0.41% slower |
| sir/sir.stan | 125.64 | 123.57 | 1.02 | 1.65% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 0.99 | -1.07% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.21 | 3.27 | 0.98 | -1.61% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.41 | 0.38 | 1.09 | 8.22% faster |
| arK/arK.stan | 1.98 | 2.02 | 0.98 | -1.68% slower |
| arma/arma.stan | 0.95 | 0.95 | 1.0 | 0.34% faster |
| garch/garch.stan | 0.51 | 0.51 | 1.01 | 0.75% faster |

Mean result: 1.00691268244

Commit hash: a6cebfc



@bbbales2
Member Author

> One thing, I think we can use a double nested reverse pass to avoid copying the args every iteration.

I think this is right. I didn't like doing a weird thing with nested autodiff across functions like that, so I just put the contents of gradient_of_f inline and got rid of that function.

SteveBronder previously approved these changes Mar 29, 2021
@SteveBronder (Collaborator) left a comment

I think it looks good! I had a few small comments I just put into a PR that you can merge if you like. I also figured out how to do the thing where we only grab the one adjoint we need instead of making a vector and copying all the adjoints into it each iteration.

feature/variadic-integrate-1d...review/integrate-1d-variadic-2

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
|------|-----------|-----------|-------|----------------------------------|
| gp_pois_regr/gp_pois_regr.stan | 3.34 | 3.47 | 0.96 | -3.89% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.01 | 0.59% faster |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 0.95 | -5.34% slower |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.01 | 1.1% faster |
| irt_2pl/irt_2pl.stan | 5.39 | 5.43 | 0.99 | -0.71% slower |
| performance.compilation | 92.01 | 88.91 | 1.03 | 3.36% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.68 | 8.86 | 0.98 | -2.05% slower |
| pkpd/one_comp_mm_elim_abs.stan | 30.55 | 31.17 | 0.98 | -2.05% slower |
| sir/sir.stan | 128.96 | 128.33 | 1.0 | 0.49% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.04 | 0.98 | -2.04% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.98 | 2.98 | 1.0 | -0.17% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.39 | 0.38 | 1.03 | 2.82% faster |
| arK/arK.stan | 1.99 | 1.99 | 1.0 | -0.14% slower |
| arma/arma.stan | 0.63 | 0.65 | 0.98 | -2.35% slower |
| garch/garch.stan | 0.51 | 0.51 | 0.99 | -1.27% slower |

Mean result: 0.992789341343

Commit hash: 77164cc



@bbbales2
Member Author

@SteveBronder I merged the stuff. There was another way to do the nth-adjoint thing that was less code, so I went with that; we should be fast.

@SteveBronder (Collaborator) left a comment

I reverted the changes I made to `get`, so this should be good now.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
|------|-----------|-----------|-------|----------------------------------|
| gp_pois_regr/gp_pois_regr.stan | 3.41 | 3.39 | 1.01 | 0.59% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -2.7% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.02 | 1.74% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.98 | -1.67% slower |
| irt_2pl/irt_2pl.stan | 5.35 | 5.37 | 1.0 | -0.39% slower |
| performance.compilation | 91.12 | 88.98 | 1.02 | 2.35% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.77 | 8.57 | 1.02 | 2.21% faster |
| pkpd/one_comp_mm_elim_abs.stan | 31.14 | 30.75 | 1.01 | 1.25% faster |
| sir/sir.stan | 130.88 | 129.2 | 1.01 | 1.28% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.03 | 1.02 | 2.12% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.04 | 2.96 | 1.02 | 2.43% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.41 | 0.98 | -2.51% slower |
| arK/arK.stan | 1.98 | 1.97 | 1.01 | 0.56% faster |
| arma/arma.stan | 0.64 | 0.65 | 0.98 | -2.03% slower |
| garch/garch.stan | 0.51 | 0.52 | 0.99 | -0.7% slower |

Mean result: 1.00334018087

Commit hash: 49ba955



SteveBronder merged commit 0956b10 into develop Mar 31, 2021
rok-cesnovar deleted the feature/variadic-integrate-1d branch March 31, 2021 13:41