
Less restrictive monotone constraints #2305

Conversation

CharlesAuguste
Contributor

@CharlesAuguste CharlesAuguste commented Aug 1, 2019

+@aldanor @redditur

Currently, LightGBM is too strict in the way it enforces monotonic constraints. This PR introduces two methods that enforce monotonic constraints in a looser way, thereby improving the performance of the resulting model: one that is just as computationally efficient as the current implementation, and another that is up to twice as slow.
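To make "too strict" concrete, here is a minimal sketch (not the PR's code) of the bounds placed on the two children of a single split on a feature with an increasing constraint. The strict variant assumes the existing behavior separates the children at the midpoint of their outputs. The looser variant only requires each child to stay on the correct side of its sibling's output, which is valid only while the sibling remains a leaf. `Bounds`, `ChildBounds`, `StrictChildBounds` and `LooserChildBounds` are hypothetical names, and deeper constraint propagation is deliberately left out.

```cpp
#include <cassert>
#include <limits>

// Hypothetical bounds on the output value a leaf is allowed to take.
struct Bounds {
  double min = -std::numeric_limits<double>::infinity();
  double max = std::numeric_limits<double>::infinity();
};

// Bounds for the two children created by one split on a feature with an
// increasing constraint (left child = smaller feature values).
struct ChildBounds {
  Bounds left;
  Bounds right;
};

// Strict variant (assumed to mirror the existing behavior): both children are
// separated at the midpoint of their outputs for the rest of training.
ChildBounds StrictChildBounds(double left_output, double right_output) {
  assert(left_output <= right_output);  // the split itself respects the constraint
  const double mid = (left_output + right_output) / 2.0;
  ChildBounds b;
  b.left.max = mid;
  b.right.min = mid;
  return b;
}

// Looser variant: each child only has to stay on the correct side of its
// sibling's current output. These bounds stay valid only while the sibling is
// a leaf; once either child is split again, the other child's bound has to be
// recomputed from the new leaves, which is the extra bookkeeping a less
// restrictive method needs.
ChildBounds LooserChildBounds(double left_output, double right_output) {
  assert(left_output <= right_output);
  ChildBounds b;
  b.left.max = right_output;
  b.right.min = left_output;
  return b;
}
```

With outputs 0 and 1, the strict variant leaves both subtrees a gap of only 0.5 to work with, while the looser one lets each subtree move all the way up to (or down to) its sibling's value.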

Moreover, we introduce a novel way of penalizing monotonic splits, which aims to reduce the loss in performance caused by greedily splitting at the top of a decision tree without regard for the constraints that such a split may impose lower down the tree.
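The exact penalty is described in the attached report. As a rough illustration only, here is one hypothetical shape such a penalty could take. It is a multiplicative factor on the gain of a monotone split: effectively zero on the first floor(p) levels for a penalty p, and growing towards 1 deeper in the tree. `MonotoneSplitGainFactor`, `PenalizedGain` and the meaning given to `penalty` are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical gain multiplier for a monotone split at a given depth (root = 0).
// A penalty p >= 0 effectively forbids monotone splits on the first floor(p)
// levels, then lets the factor grow towards 1 as depth increases, so monotone
// splits are preferred lower in the tree, where the constraints they impose
// affect fewer leaves. The function is continuous in p.
double MonotoneSplitGainFactor(int depth, double penalty, double epsilon = 1e-10) {
  assert(depth >= 0 && penalty >= 0.0);
  if (penalty >= depth + 1.0) {
    return epsilon;  // too close to the root: effectively forbid the split
  }
  if (penalty <= 1.0) {
    return 1.0 - penalty / std::pow(2.0, depth) + epsilon;
  }
  return 1.0 - std::pow(2.0, penalty - 1.0 - depth) + epsilon;
}

// Gain that would be compared against the other candidate splits.
double PenalizedGain(double gain, bool is_monotone_split, int depth, double penalty) {
  return is_monotone_split ? gain * MonotoneSplitGainFactor(depth, penalty) : gain;
}
```

Non-monotone splits keep their gain unchanged, so with a penalty of 2 the first two levels of the tree would in practice only use unconstrained features.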

In the attached report, we show that these methods can produce models with significantly improved performance, and we therefore recommend replacing LightGBM's current approach to enforcing monotonic constraints. The report contains details about the methods we implemented and why we think they should replace the current one, including results we generated on a public dataset to compare the methods. Please let us know if anything in the report is unclear or if you would like more information about anything.

Broken files
Regarding the implementation, monotone constraints should be fully working in the serial tree learner (serial_tree_learner.cpp and serial_tree_learner.h). However, we barely modified the other tree learners, so the following files compile but are broken:

  • feature_parallel_tree_learner.cpp
  • voting_parallel_tree_learner.cpp
  • parallel_tree_learner.h

And the following files do not compile (and are broken too):

  • gpu_tree_learner.h
  • gpu_tree_learner.cpp

If you agree to go forward with the pull request, we think that, given the work we have already done, it shouldn't be too hard to adapt everything to the other tree learners. However, we are not familiar with them, and it would represent a lot of work for us alone, so we hoped you could help with the implementation.
We think there are 3 options:

  • If you do not wish to work on monotone constraints but still want to merge our work, then we can make sure that the other tree learners keep working with the current LightGBM constraining method, as they do now. This can easily be done by copy-pasting a few functions in feature_histogram.hpp and adding back what we removed from leaf_splits.hpp and split_info.hpp, but that would not be very clean. If you wish to go with this option, please advise on the best way to do it;
  • If you are ready to work with our code but don't want to spend too much time on it, then we can try to make the tree learners work with the fast method only;
  • If you are ready to work with our code and to spend some time on it, then we can try to make both our slow and fast methods work for all tree learners.

Interactions between parameters
Finally, here is the summary of how the new parameters interact with the existing program:

  • When there are no constraints, the program should behave the same with or without the code we added;
  • For “monotone_precise_mode” to be set to True or for "monotone_penalty" to be non-0, actual constraints have to exist;
  • When there are constraints and “monotone_precise_mode” is set to False, then the way of handling missing values and categorical features is the same as before;
  • When there are constraints and “monotone_precise_mode” is set to True, missing value handling has to be deactivated (we believe it would be possible to make it work, but we did not pursue this because it is of no interest to us). Also, even if “monotone_precise_mode” is set to True, categorical features won’t be split using the smart method (we believe it should be possible to modify FindBestThresholdCategorical in a very similar way to what we did with FindBestThresholdNumerical in feature_histogram.hpp, but once again this is of no interest to us).
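The rules above amount to a small consistency check on the configuration. The sketch below is a hypothetical illustration of such a check. `MonotoneConfig`, its `use_missing` field, `HasMonotoneConstraints` and `ValidateMonotoneConfig` are made up for the example and are not LightGBM's config code. Only the names `monotone_precise_mode` and `monotone_penalty` come from this PR. Whether the PR rejects such a configuration or silently disables missing value handling is not stated, so the sketch simply reports the violation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subset of the training configuration (illustration only).
struct MonotoneConfig {
  std::vector<std::int8_t> monotone_constraints;  // -1, 0 or +1 per feature
  bool monotone_precise_mode = false;
  double monotone_penalty = 0.0;
  bool use_missing = true;  // whether missing values get special handling
};

// True if at least one feature carries a non-zero constraint.
bool HasMonotoneConstraints(const MonotoneConfig& c) {
  for (std::int8_t t : c.monotone_constraints) {
    assert(t >= -1 && t <= 1);
    if (t != 0) return true;
  }
  return false;
}

// Returns an empty string when the combination of parameters is allowed,
// otherwise a description of the first rule that is violated.
std::string ValidateMonotoneConfig(const MonotoneConfig& c) {
  const bool constrained = HasMonotoneConstraints(c);
  if ((c.monotone_precise_mode || c.monotone_penalty != 0.0) && !constrained) {
    return "monotone_precise_mode / monotone_penalty require monotone constraints";
  }
  if (constrained && c.monotone_precise_mode && c.use_missing) {
    return "monotone_precise_mode requires missing value handling to be disabled";
  }
  return "";
}
```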

PR-monotone-constraints-report.pdf

@msftclas

msftclas commented Aug 1, 2019

CLA assistant check
All CLA requirements met.

@CharlesAuguste CharlesAuguste changed the title Less-restrictive-monotone-constraints Less restrictive monotone constraints Aug 2, 2019
@CharlesAuguste
Contributor Author

Has anyone started to / is anyone planning to look into this PR?

It took us some time to come up with and code our methods in LightGBM, and we do think this can be a major improvement (to those using monotone constraints at least) for the library. So we really are serious about this PR.

This might be a bit heavier than the usual PR, so please let me know if you want me to clarify anything, or if you have ideas on how to improve the code. Thanks!

@aldanor

aldanor commented Aug 13, 2019

@StrikerRUS @guolinke I'm wondering what your thoughts are on this.

We think this is a novel method of handling monotone constraints, more efficient than in any of the existing GBM libraries including lightgbm/xgboost/catboost. We chose LightGBM because that’s what we currently use most frequently.

The presented implementation is backed by months of extensive research, testing and benchmarking, both on public and private data.

If we decide to go forward with this, here are the steps:

  • Someone needs to get familiar with core concepts and ideas presented in the pdf
  • Someone needs to read through the implementation of these ideas for serial tree learner
  • For non-serial learners, there are two options: leave the old method as is, or implement the new method(s) for all of them. This will be mostly copy/paste, but should preferably be done by someone very familiar with those parts of the code, such as the GPU learner.

We’d be glad to discuss any of the above or provide any help with familiarising reviewers with this PR.

@StrikerRUS
Collaborator

@aldanor I think this is great contribution! I'm so sorry, but this is beyond the area of my expertise. The appropriate pings will be @guolinke and @chivee. And of course, everyone interested in this is welcome to the discussion!

@guolinke
Collaborator

Sorry, I am on vacation. I will check this when I have time.

@guolinke
Collaborator

@CharlesAuguste @aldanor
This PR is very welcome! Please go ahead, and ping me directly if you need any help.

// add parent node information
std::vector<int> node_parent_;
// Keeps track of the monotone splits above the leaf
std::vector<bool> leaf_is_in_monotone_subtree_;
Collaborator

Do we really need to store these states in the tree model, or could they live in other structures used only for MC?
Since some users don't use MC, this would change the model format and cause compatibility problems.

Collaborator

Another suggestion: there seem to be lots of changes in FeatureHistogram, which will hurt the readability of the current code. Could we decouple MC from the other parts, for example as an independent class that is called from FeatureHistogram?

This will hurt the readability of current codes. Could we decouple MC with other parts?

We were very careful to implement this in a way that limited the impact on readability and, in my opinion, we did a good job in this regard, given the complexity of what's being implemented. In fact, I think the current implementation of monotonic constraints isn't particularly readable. If anything, we've made it more readable, although, obviously, more complex.

If you have any specific examples where you feel readability has been impacted let us know and we can discuss it on an example-by-example basis.

Could we decouple MC with other parts?

We could, but I'm not convinced this would make things substantially more readable, and it would be some effort on our part to make such changes. It could even hurt readability due to increased indirection.

Collaborator

Yes, actually I mean maintainability: all of the MC implementation would be in one place, not distributed across the project. And I think good maintainability also makes code more readable.

Contributor Author

I will not be able to work on this this week or next week, sorry about that. But I will make the necessary changes to improve the code after that.

Contributor Author

@guolinke, about the new states in the Tree class that would cause compatibility issues: first, these states are absolutely necessary for monotone constraints to work as we implemented them, and we cannot do without them. And I think they are sensible states that belong in the Tree class (I could even see them being reused for other purposes).

One alternative I can think of would be to create a separate class that looks like Tree but holds only these states, and to use it wherever the Tree class is currently used. But that would create some redundancy and would probably not be good for readability.

Since these fields are currently not used when monotone constraints are not used, is it not possible to solve the compatibility problem instead? I am not sure when these issues would arise, but maybe there is a way to specify that if the formats don't match, these fields should remain empty or be filled with dummy values. What do you think?

Collaborator

@CharlesAuguste I see. Could we keep these states in the Tree class, but not save them to or load them from the model file? If this is possible, that is okay as well.

Collaborator

I just double-checked the code, and it seems the new states are not written to the model file. So I think it is okay.
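The arrangement agreed on here can be sketched as follows. The monotone bookkeeping lives in the tree object during training but is left out when the tree is written out, so models trained with or without constraints serialize identically. `TinyTree`, `MarkMonotoneSubtree` and its `ToString` output format are hypothetical stand-ins, not LightGBM's `Tree` class or its model format.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified tree used only to illustrate the idea.
class TinyTree {
 public:
  explicit TinyTree(int max_leaves)
      : leaf_value_(max_leaves, 0.0),
        leaf_is_in_monotone_subtree_(max_leaves, false) {}

  void SetLeafValue(int leaf, double value) {
    assert(leaf >= 0 && leaf < static_cast<int>(leaf_value_.size()));
    leaf_value_[leaf] = value;
  }

  // Training-time bookkeeping: remember that a monotone split sits above `leaf`.
  void MarkMonotoneSubtree(int leaf) { leaf_is_in_monotone_subtree_[leaf] = true; }
  bool leaf_is_in_monotone_subtree(int leaf) const {
    return leaf_is_in_monotone_subtree_[leaf];
  }

  // Serialization writes only the persistent model fields; the monotone
  // bookkeeping is intentionally omitted, so the file format does not change.
  std::string ToString() const {
    std::ostringstream out;
    out << "leaf_value=";
    for (std::size_t i = 0; i < leaf_value_.size(); ++i) {
      out << (i ? " " : "") << leaf_value_[i];
    }
    return out.str();
  }

 private:
  std::vector<double> leaf_value_;                 // saved to the model file
  std::vector<bool> leaf_is_in_monotone_subtree_;  // kept in memory only
};
```

A loaded model then simply starts with the bookkeeping at its default values, which is harmless as long as it is only consulted while training with constraints.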

@guolinke
Collaborator

And I think we can have the fast method in this PR, and another PR for the slow method after this one is merged.

@CharlesAuguste
Contributor Author

CharlesAuguste commented Sep 5, 2019

Sorry for the delay, I should be able to work on this again in the next few days. I can indeed split this PR. However, there is some code, and there are some structures, that we created for the slow method but that are now used by every method. For example, we use a vector of doubles instead of a single double to determine constraints when splitting, because for the slow method, constraints can depend on where the split happens. For readability, every method uses this vector of constraints; for the fast method, for example, the vector has only one element. Can I leave these structures in the PR for the fast method? Or do I have to rework the code to make it consistent with the fast method only?
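Here is a minimal sketch of what such a threshold-dependent constraint could look like, assuming the bound for a leaf is stored as a list of (starting bin, bound) segments. The slow method may need several segments because the bound depends on where the split happens, while the fast method stores a single segment that applies to every threshold. `ThresholdConstraints` and `UpperBoundAt` are hypothetical names, not the structures used in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical piecewise-constant upper bound on a leaf's output, indexed by
// the bin at which a candidate split is evaluated. Segment i applies to bins
// in [thresholds[i], thresholds[i + 1]); the last segment is open-ended.
struct ThresholdConstraints {
  std::vector<int> thresholds{0};  // the first segment always starts at bin 0
  std::vector<double> max_values{std::numeric_limits<double>::infinity()};

  // Bound that applies when splitting at `bin` (segments sorted by threshold).
  double UpperBoundAt(int bin) const {
    assert(!thresholds.empty() && thresholds.size() == max_values.size());
    std::size_t i = 0;
    while (i + 1 < thresholds.size() && thresholds[i + 1] <= bin) {
      ++i;
    }
    return max_values[i];
  }
};
```

For the fast method, the single default segment means every threshold sees the same bound, so the split-finding code can treat both methods uniformly at the cost of a short lookup.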

@guolinke
Collaborator

guolinke commented Sep 6, 2019

@CharlesAuguste
You can do it whichever way is most convenient for you.
Also, refer to #2341:
I plan to separate monotone constraints, forced splits and cost-efficient splits from the tree_learner.
I think your PR could help with the monotone constraints.

@guolinke
Collaborator

@CharlesAuguste
you can follow the style of PR #2407, which refactors cost-effective gradient boosting.

@CharlesAuguste
Contributor Author

@guolinke I was able to do some refactoring. Can you let me know if this is going in the right direction? I am having trouble writing clean and understandable code, but I am ready to follow your advice to improve the code of the PR. Moreover, I removed some functions related only to the Slow method, to make this PR simpler, as you requested.

There is still a piece of code that I don't know how to refactor: the functions "GoUpToFindLeavesToUpdate" and "GoDownToFindLeavesToUpdate". They are used exclusively for monotone constraints, so I think you want me to move them to another file. However, when the code runs with monotone constraints, the stack of function calls goes "Split -> GoUpToFindLeavesToUpdate -> GoDownToFindLeavesToUpdate -> UpdateBestSplitsFromHistograms -> ComputeBestSplitForFeature". The functions Split and ComputeBestSplitForFeature (at the start and end of the stack) clearly belong to serial_tree_learner.cpp and should stay in that file. My functions are called only in the presence of monotone constraints, and I do not know the best way to move them somewhere else, as they use parameters and methods from the SerialTreeLearner class. Can you please let me know how you would like me to do that? Please do let me know if something is unclear, and what else I can do to refactor the code and make it better. Thanks!

#endif // TIMETAG

double EPS = 1e-12;
Collaborator

@guolinke guolinke Sep 17, 2019

there is a kEpsilon you can use directly.

@guolinke
Collaborator

Thanks so much! It is much clearer now. Please ping me if you run into any problems.
I quickly went through the code of GoUpToFindLeavesToUpdate and GoDownToFindLeavesToUpdate. They seem to be a kind of static method, and they are related to MC. Maybe move them to the MC class as static methods?

@CharlesAuguste
Contributor Author

I do think it is a good idea to make them static methods of MC, but as I mentioned, they also need access to some non-static methods of SerialTreeLearner, where they currently live. So should I add a SerialTreeLearner as an argument of GoUpToFindLeavesToUpdate and GoDownToFindLeavesToUpdate so that they can call its methods, or is that bad practice?

@guolinke
Collaborator

Do you mean UpdateBestSplitsFromHistograms? As far as I can see, this method is only called in these two functions.
Is it possible to move UpdateBestSplitsFromHistograms out of these functions?

@guolinke
Collaborator

BTW, we don't want to add any overhead when the user doesn't enable MC.
So, as we do in #2407, when the feature is disabled, a nullptr is used to avoid unnecessary work.

@CharlesAuguste
Contributor Author

I meant that GoUpToFindLeavesToUpdate calls GoDownToFindLeavesToUpdate, which calls UpdateBestSplitsFromHistograms, which calls ComputeBestSplitForFeature. I created all 4 methods. The first 3 are only about monotone constraints and do not belong in this file. The 4th one (ComputeBestSplitForFeature) is basically just code that was duplicated in the version of LightGBM master I pulled, so I encapsulated the duplicated code in that method. It does belong in serial_tree_learner.cpp, and it is used even when there are no monotone constraints.

Currently the first 3 methods can easily call the 4th one because they all belong to the same class. However, if I move the first 3 somewhere else and make them static, then they still need to call the 4th one, and I am not sure what the best way to do that is. I think I can pass SerialTreeLearner as an argument of these methods while making them static, but I am not sure if that is a good practice.

@guolinke
Collaborator

guolinke commented Sep 17, 2019

@CharlesAuguste
I don't think it is good practice either. It is better to avoid passing a class and editing its contents through its non-const methods.

Could you move ComputeBestSplitForFeature to the MC class, passing in some arguments from the tree learner?

Update:
It seems this cannot work. It is actually a method that belongs to the tree learner.

@guolinke
Collaborator

@CharlesAuguste it seems you can make ComputeBestSplitForFeature a const function, or even a static one. In that case, I think it is safe to call it.
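Here is a sketch of this suggestion, assuming the per-feature split search can be made `const` (it only reads learner state and writes its result into an output argument). The constraint-propagation routines can then live as static methods of the monotone-constraints class and receive the learner by const reference, so they cannot modify it. `ToyTreeLearner`, `SplitCandidate` and `MonotoneConstraintsUtils` are hypothetical, heavily simplified stand-ins for `SerialTreeLearner`, `SplitInfo` and the MC class. The "gain capped by the bound" rule is a toy.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified result of a split search.
struct SplitCandidate {
  int feature = -1;
  double gain = 0.0;
};

// Hypothetical stand-in for the tree learner.
class ToyTreeLearner {
 public:
  explicit ToyTreeLearner(std::vector<double> feature_gains)
      : feature_gains_(std::move(feature_gains)) {}

  int num_features() const { return static_cast<int>(feature_gains_.size()); }

  // const: only reads learner state; the result goes through `best`.
  // Toy rule: the leaf's upper bound caps the gain a split can reach.
  void ComputeBestSplitForFeature(int feature, double max_output,
                                  SplitCandidate* best) const {
    assert(best != nullptr && feature >= 0 && feature < num_features());
    const double gain = std::min(feature_gains_[feature], max_output);
    if (gain > best->gain) {
      best->feature = feature;
      best->gain = gain;
    }
  }

 private:
  std::vector<double> feature_gains_;
};

// Hypothetical MC class: constraint propagation lives here as static methods
// and receives the learner by const reference, so it cannot mutate it.
class MonotoneConstraintsUtils {
 public:
  // Re-runs the split search for a leaf whose upper bound was just updated.
  static SplitCandidate UpdateBestSplit(const ToyTreeLearner& learner,
                                        double new_max_output) {
    SplitCandidate best;
    for (int f = 0; f < learner.num_features(); ++f) {
      learner.ComputeBestSplitForFeature(f, new_max_output, &best);
    }
    return best;
  }
};
```

Because the learner is only reachable through a const reference, the compiler enforces the "no editing through non-const methods" rule discussed above.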

@CharlesAuguste
Contributor Author

@guolinke I was able to work a bit more on the PR. Hopefully you will find the code clearer now. I am not sure exactly where you want me to use the nullptr to disable things when monotone constraints are not used. Can you point out where you want me to make the changes in the code? Let me know what you think and how I can improve this PR. Thanks

// if there is a monotone split above, we need to make sure the new
// values don't clash with existing constraints in the subtree,
// and if they do, the existing splits need to be updated
if (tree->leaf_is_in_monotone_subtree(*right_leaf) && !config_->monotone_constraints.empty()) {
Collaborator

Maybe check config_->monotone_constraints.empty() first.
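The point of this suggestion is the short-circuit evaluation of `&&` in C++. When the cheap emptiness test comes first, the per-leaf lookup is skipped entirely for users without constraints. Below is a minimal, hypothetical illustration. The functions and the `g_lookups` counter are made up here only to make the skipped call observable.

```cpp
#include <cassert>
#include <vector>

// Counts how often the per-leaf lookup runs (illustration only).
int g_lookups = 0;

bool LeafIsInMonotoneSubtree(const std::vector<bool>& flags, int leaf) {
  ++g_lookups;
  return flags[leaf];
}

// Original order: the lookup runs for every leaf, even without constraints.
bool NeedsUpdateOriginal(const std::vector<int>& constraints,
                         const std::vector<bool>& flags, int leaf) {
  return LeafIsInMonotoneSubtree(flags, leaf) && !constraints.empty();
}

// Suggested order: an empty constraint list short-circuits before the lookup.
bool NeedsUpdateReordered(const std::vector<int>& constraints,
                          const std::vector<bool>& flags, int leaf) {
  return !constraints.empty() && LeafIsInMonotoneSubtree(flags, leaf);
}
```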

bests[tid] = new_split;
}
}

Collaborator

@guolinke guolinke Dec 3, 2019

I think we should avoid extra calls in serial_tree_learner and feature_histogram as much as possible when MC is not set.

Contributor Author

I am sorry, but I don't understand which functions in particular you are referring to. I think that right now, when there are no constraints, the code works in a very similar way to the way it used to. But I am also happy to make the changes you request if you can be more specific.

@CharlesAuguste
Contributor Author

@guolinke I will work on that over the next few days! Thanks

@CharlesAuguste
Contributor Author

CharlesAuguste commented Jan 6, 2020

@guolinke @shiyu1994 I did the changes I could on @guolinke 's request, but I need more guidance on the ones I commented. Please let me know how to improve the PR. Thanks

@StrikerRUS
Collaborator

Just out of curiosity: can this PR be split into a sequence of smaller PRs? It's very hard to review PRs with such a large diff. If this PR can be split, doing so will increase the speed and quality of reviews and, in turn, speed up merging the proposed changes. As a good example, I can refer to dmlc/xgboost#5104.

@CharlesAuguste
Contributor Author

CharlesAuguste commented Jan 15, 2020

You are right, I do think it would make sense to split this PR into smaller PRs. This will require a significant effort, but we do think this PR is a valuable addition to the library (and hope you agree with us on that!), and I understand that such a big PR is very hard for you to review, so I am ready to spend more time on it. Here is how I could proceed:

  • 1st PR with just refactoring of the C++ code. In this PR, I would change the structure around monotone constraints to make it as close as possible to the structures in this PR. The behavior of the library would not change at all; only some refactoring would be done (for example adding the files monotone_constraint.cpp and monotone_constraints.hpp);
  • 2nd PR with just the algorithm for the Fast method added. In this PR I would implement the simplest algorithm, and try not to over-complicate it with things belonging to the Slow method. I will try to make as few changes as possible to feature_histogram.hpp in this pull request.
  • 3rd PR with the monotone penalty. This PR should be pretty short, just a few simple functions, and a new parameter for the library users.
  • 4th PR with the Slow method implemented on top of the existing code.

Do you agree with the plan above? Do you see any way to improve it? Let me know what you think. I will start working on that in the next few days. Thanks

guolinke added a commit that referenced this pull request Feb 10, 2020
* Move monotone constraints to the monotone_constraints files.

* Add checks for debug mode.

* Refactored FindBestSplitsFromHistograms.

* Add headers.

* fix

* Update data_parallel_tree_learner.cpp

* simplify ComputeBestSplitForFeature

* Fix min / max issue.

* Remove duplicated check.

Co-authored-by: Guolin Ke <[email protected]>
StrikerRUS pushed a commit that referenced this pull request Mar 23, 2020
#2770)

* Add util functions.

* Added monotone_constraints_method as a parameter.

* Add the intermediate constraining method.

* Updated tests.

* Minor fixes.

* Typo.

* Linting.

* Ran the parameter generator for the doc.

* Removed usage of the FeatureMonotone function.

* more fixes

* Fix.

* Remove duplicated code.

* Add debug checks.

* Typo.

* Bug fix.

* Disable the use of intermediate monotone constraints and feature sampling at the same time.

* Added an alias for monotone constraining method.

* Use the right variable to get the number of threads.

* Fix DEBUG checks.

* Add back check to determine if histogram is splittable.

* Added forgotten override keywords.

* Perform monotone constraint update only when necessary.

* Small refactor of FastLeafConstraints.

* Post rebase commit.

* Small refactor.

* Typo.

* Added comment and slightly improved logic of monotone constraints.

* Forgot a const.

* Vectors that are to be modified need to be pointers.

* Rename FastLeafConstraints to IntermediateLeafConstraints to match documentation.

* Remove overload of GoUpToFindLeavesToUpdate.

* Stop memory leaking.

* Fix cpplint issues.

* Fix checks.

* Fix more cpplint issues.

* Refactor config monotone constraints method.

* Typos.

* Remove useless empty lines.

* Add new line to separate includes.

* Replace unsigned ind by size_t.

* Reduce number of trials in tests to decrease CI time.

* Specify monotone constraints better in tests.

* Removed outer loop in test of monotone constraints.

* Added categorical features to the monotone constraints tests.

* Add blank line.

* Regenerate parameters automatically.

* Speed up ShouldKeepGoingLeftRight.

Co-authored-by: Charles Auguste <[email protected]>
Co-authored-by: guolinke <[email protected]>
jameslamb pushed a commit to jameslamb/LightGBM that referenced this pull request Mar 24, 2020
…, microsoft#2717)  (microsoft#2770)

@StrikerRUS StrikerRUS mentioned this pull request May 11, 2020
@guolinke guolinke mentioned this pull request Aug 10, 2020
10 tasks
@jameslamb
Collaborator

@CharlesAuguste thank you again for all your hard work on this! Now that we have #3264, can this pull request be closed?

@StrikerRUS
Collaborator

StrikerRUS commented Sep 5, 2020

@jameslamb

Now that we have #3264, can this pull request be closed?

According to the following @CharlesAuguste's words,

2 things in the original PR that are missing in here:

  • The code was more optimized for time in the original PR, and the method is a bit slower here. Though the optimizations made the code significantly more complicated, and I think the time saved was not much. I can implement the optimization in another PR later, as I think this PR is complicated enough as is.
  • In the original PR, once a tree is build, there was a final leaves refitting, that I have not implemented here yet. I can either implement it here or in another PR later. This can slightly improve performance.

there may be one or two more PRs related to the monotone constraints.

@jameslamb
Collaborator

@StrikerRUS I understand there may be more PRs in the future, but that doesn't mean this one needs to stay open, right? If we're never going to merge this specific PR, it should just be closed.

@StrikerRUS
Collaborator

@jameslamb
Yeah, maybe you are right! I was thinking of this PR as a "TODO issue", because all the changes from PR1, PR2, ... were originally implemented here.

@CharlesAuguste
Contributor Author

@StrikerRUS @jameslamb I confirm this PR will not be merged. So I agree it can be closed. Thanks

@jameslamb
Collaborator

@StrikerRUS @jameslamb I confirm this PR will not be merged. So I agree it can be closed. Thanks

thanks very much!

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023