-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Less restrictive monotone constraints #2305
Less restrictive monotone constraints #2305
Conversation
Has anyone started to / is anyone planning to look into this PR? It took us some time to come up with and code our methods in LightGBM, and we do think this can be a major improvement (to those using monotone constraints at least) for the library. So we really are serious about this PR. This might be a bit heavier than the usual PR, so please let me know if you want me to clarify anything, or if you have ideas on how to improve the code. Thanks! |
@StrikerRUS @guolinke Wondering what are your thoughts about this? We think this is a novel method of handling monotone constraints, more efficient than in any of the existing GBM libraries including lightgbm/xgboost/catboost. We chose LightGBM because that’s what we currently use most frequently. The presented implementation is backed by months of extensive research, testing and benchmarking, both on public and private data. If we decide to go forward with this, here are the steps:
We’d be glad to discuss any of the above or provide any help with familiarising reviewers with this PR. |
Sorry, i am on vacation. Will check this when have time. |
@CharlesAuguste @aldanor |
// add parent node information | ||
std::vector<int> node_parent_; | ||
// Keeps track of the monotone splits above the leaf | ||
std::vector<bool> leaf_is_in_monotone_subtree_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need to store these states into tree model? Or other structures only for MC?
Since some users don't use MC, this will change the model format and cause the compatibility problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another suggestion: it seems there are lots of changes in FeatureHistogram. This will hurt the readability of current codes. Could we decouple MC with other parts? Like an independent class, which could be called in FeatureHistogram.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will hurt the readability of current codes. Could we decouple MC with other parts?
We were very careful to implement this in a way that limited the impact on readability and, in my opinion, I think that we did a good job in this regard, given the complexity of what's being implemented. In fact, in my opinion, the current implementation of monotonic constraints isn't particularly readable. If anything, we've made it more readable, although, obviously, more complex.
If you have any specific examples where you feel readability has been impacted let us know and we can discuss it on an example-by-example basis.
Could we decouple MC with other parts?
We could, but I'm not convinced this would make things substantially more readable, and it would be some effort on our part to make such changes. It could even hurt readability due to increased indirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, actually I mean the maintainability. All implementation of MC is in one place, not distributed in the project. And i think good maintainability is also more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will not be able to work on this during this week and next week, sorry about that. But I will make the necessary changes to improve the code after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guolinke about the new states in the Tree class which would cause compatibility issues. First, the states are absolutely necessary to the functioning of monotone constraints as we implemented them, there is no way we can do without. And I think that these are sensible states that should belong to the tree class (I could even see these reused for other purposes).
An alternative I can think about would be to create an alternative class which would look like tree, but with these states only, and this class would be used whenever the Tree class is currently used. But that would create some redundancies, and would probably not be good for readability.
Since these fields currently won't be used if monotone constraints are not used, is it not possible to solve the compatibility problem instead? I am not sure when these issues would arise, but maybe there is a way to specify somewhere that if format don't match, then these fields should remain void, or be filled with dummy values. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CharlesAuguste I see. could we have these states in tree class, but not save and load them into model file? If this is possible, it is okay as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just double-check the code, it seems the new states are not in model file. So I think it is okay.
And I think we can have the fast method in this PR. And have another PR for the slow method, after this PR is merged. |
Sorry for the delay, I should be able to work on this again in the next few days. I can indeed split this PR. However, there are some code and structures that we created because of the slow method, but that are now used by every method. For example, we have a vector of doubles for determining constraints when splitting instead of just a double, because, for the slow method, constraints can depend on where the split happens. For readability, every method uses this vector of constraints, with the vector being only one element for the fast method for example. Can I leave these structures in the PR of the fast method? Or do I have to rework the code to make it consistent for the fast method only? |
@CharlesAuguste |
@CharlesAuguste |
@guolinke I was able to do some refactoring. Can you let me know if this is going in the right direction? I am having trouble writing clean and understandable code, but I am ready to follow your advice to improve the code of the PR. Moreover, I removed some functions related only to the Slow method, to make this PR simpler, as you requested. There still is a piece of code that I don't know how to refactor; the functions "GoUpToFindLeavesToUpdate" and "GoDownToFindLeavesToUpdate". The functions are used exclusively for monotone constraints, so I think you want me to move them to another file. However, when the code runs with monotone constraints, the stack of function calls goes like "Split -> GoUpToFindLeavesToUpdate -> GoDownToFindLeavesToUpdate -> UpdateBestSplitsFromHistograms -> ComputeBestSplitForFeature". The functions Split and ComputeBestSplitForFeature (at the start and end of the stack) absolutely belong to serial_tree_learner.cpp, and should stay in the file. My functions are called only in the presence of monotone constraints, and I do not know what the best way to move them somewhere else is, as they use parameter and methods from the SerialTreeLearner class. Can you please let me know how you would like me to do that? Please do let me know if something is unclear, and what else I can do to refactor the code and make it better. Thanks! |
#endif // TIMETAG | ||
|
||
double EPS = 1e-12; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a kEpsilon
you can use directly.
Thanks so much! It is much clearer now. Please ping me if you met any problems. |
I indeed think it is a good idea to consider them as static methods of MC, but as I mentioned, they also need access to some non-static methods of SerialTreeLearner where they currently are. So should I add SerialTreeLearner as an argument of GoUpToFindLeavesToUpdate and GoDownToFindLeavesToUpdate for them to be able to call the SerialTreeLearner methods, or is that not good? |
Do you mean the |
BTW, we don't want to involve additional overhead when the user doesn't enable MC. |
I meant GoUpToFindLeavesToUpdate calls GoDownToFindLeavesToUpdate which calls UpdateBestSplitsFromHistograms which calls ComputeBestSplitForFeature. I created all 4 methods. The first 3 are only about monotone constraints and do not belong to this file. The 4th one (ComputeBestSplitForFeature) is just basically some code that was duplicated in the version of master LightGBM I pulled, so I encapsulated the duplicated code into that method, so it does belong in the serial_tree_learner.cpp file, and is used even when there are no monotone constraints. Currently the first 3 methods can easily call the 4th one because they all belong to the same class. However, if I move the first 3 somewhere else and make them static, then they still need to call the 4th one, and I am not sure what the best way to do that is. I think I can pass SerialTreeLearner as an argument of these methods while making them static, but I am not sure if that is a good practice. |
@CharlesAuguste Could you move the updated: |
@CharlesAuguste it seems you can change the |
@guolinke I was able to work a bit more on the PR. Hopefully you will find the code clearer now. I am not sure where exactly you want me to use the nullptr to disable things when monotone constraints are not used. Can you point where you want me to make the changes in the code? Let me know what you think and how I can improve this PR. Thanks |
…le a slower but better constraining method.
// if there is a monotone split above, we need to make sure the new | ||
// values don't clash with existing constraints in the subtree, | ||
// and if they do, the existing splits need to be updated | ||
if (tree->leaf_is_in_monotone_subtree(*right_leaf) && !config_->monotone_constraints.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be check config_->monotone_constraints.empty()
first
bests[tid] = new_split; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid the extra calls in serial_tree_learner and feature_histogram as possible, when mc is not set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, but I don't understand what functions in particular you are referring to. I think the code, right now, when there are no constraints, is working in a very similar way to they way it used to work. But I am also happy to make the changes you request if you can be more specific about this.
@guolinke I will work on that over the next few days! Thanks |
@guolinke @shiyu1994 I did the changes I could on @guolinke 's request, but I need more guidance on the ones I commented. Please let me know how to improve the PR. Thanks |
Just out of curiosity: can this PR be split into sequence of smaller PRs? It's very hard to review PRs with a such large diff size. If this PR is splitable, splitting it will increase the speed and quality of reviews and in consequence speed up merging the proposed changes. As a good example I can refer to dmlc/xgboost#5104. |
You are right, I indeed think it would make sense to split this PR into smaller PRs. This will be require a significant effort but we do think this PR is a valuable addition to the library (and hope you agree with us on that!), and I understand that such a big PR is very hard to review for you so I am ready to spend more time on it. Here is how I could proceed:
Do you agree with the plan above? Do you see any way to improve it? Let me know what you think. I will start working on that in the next few days. Thanks |
* Move monotone constraints to the monotone_constraints files. * Add checks for debug mode. * Refactored FindBestSplitsFromHistograms. * Add headers. * fix * Update data_parallel_tree_learner.cpp * simplify ComputeBestSplitForFeature * Fix min / max issue. * Remove duplicated check. Co-authored-by: Guolin Ke <[email protected]>
#2770) * Add util functions. * Added monotone_constraints_method as a parameter. * Add the intermediate constraining method. * Updated tests. * Minor fixes. * Typo. * Linting. * Ran the parameter generator for the doc. * Removed usage of the FeatureMonotone function. * more fixes * Fix. * Remove duplicated code. * Add debug checks. * Typo. * Bug fix. * Disable the use of intermediate monotone constraints and feature sampling at the same time. * Added an alias for monotone constraining method. * Use the right variable to get the number of threads. * Fix DEBUG checks. * Add back check to determine if histogram is splittable. * Added forgotten override keywords. * Perform monotone constraint update only when necessary. * Small refactor of FastLeafConstraints. * Post rebase commit. * Small refactor. * Typo. * Added comment and slightly improved logic of monotone constraints. * Forgot a const. * Vectors that are to be modified need to be pointers. * Rename FastLeafConstraints to IntermediateLeafConstraints to match documentation. * Remove overload of GoUpToFindLeavesToUpdate. * Stop memory leaking. * Fix cpplint issues. * Fix checks. * Fix more cpplint issues. * Refactor config monotone constraints method. * Typos. * Remove useless empty lines. * Add new line to separate includes. * Replace unsigned ind by size_t. * Reduce number of trials in tests to decrease CI time. * Specify monotone constraints better in tests. * Removed outer loop in test of monotone constraints. * Added categorical features to the monotone constraints tests. * Add blank line. * Regenerate parameters automatically. * Speed up ShouldKeepGoingLeftRight. Co-authored-by: Charles Auguste <[email protected]> Co-authored-by: guolinke <[email protected]>
…, microsoft#2717) (microsoft#2770) * Add util functions. * Added monotone_constraints_method as a parameter. * Add the intermediate constraining method. * Updated tests. * Minor fixes. * Typo. * Linting. * Ran the parameter generator for the doc. * Removed usage of the FeatureMonotone function. * more fixes * Fix. * Remove duplicated code. * Add debug checks. * Typo. * Bug fix. * Disable the use of intermediate monotone constraints and feature sampling at the same time. * Added an alias for monotone constraining method. * Use the right variable to get the number of threads. * Fix DEBUG checks. * Add back check to determine if histogram is splittable. * Added forgotten override keywords. * Perform monotone constraint update only when necessary. * Small refactor of FastLeafConstraints. * Post rebase commit. * Small refactor. * Typo. * Added comment and slightly improved logic of monotone constraints. * Forgot a const. * Vectors that are to be modified need to be pointers. * Rename FastLeafConstraints to IntermediateLeafConstraints to match documentation. * Remove overload of GoUpToFindLeavesToUpdate. * Stop memory leaking. * Fix cpplint issues. * Fix checks. * Fix more cpplint issues. * Refactor config monotone constraints method. * Typos. * Remove useless empty lines. * Add new line to separate includes. * Replace unsigned ind by size_t. * Reduce number of trials in tests to decrease CI time. * Specify monotone constraints better in tests. * Removed outer loop in test of monotone constraints. * Added categorical features to the monotone constraints tests. * Add blank line. * Regenerate parameters automatically. * Speed up ShouldKeepGoingLeftRight. Co-authored-by: Charles Auguste <[email protected]> Co-authored-by: guolinke <[email protected]>
@CharlesAuguste thank you again for all your hard work on this! Now that we have #3264, can this pull request be closed? |
According to the following @CharlesAuguste's words,
there may be one or two more PRs related to the monotone constraints. |
@StrikerRUS I understand there may be more PRs in the future, but that doesn't mean this one needs to stay open, right? If we're never going to merge this specific PR, it should just be closed. |
@jameslamb |
@StrikerRUS @jameslamb I confirm this PR will not be merged. So I agree it can be closed. Thanks |
thanks very much! |
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
+@aldanor @redditur
Currently, LightGBM is too strict in the way it enforces monotonic constraints. This PR introduces two methods; one just as computationally efficient as the current implementation, and another which is up to twice as slow, which enforces monotonic constraints in a looser way thereby improving performance of the resulting model.
Moreover, we introduce a novel way of penalizing monotonic splits which aims to reduce the loss in performance brought about by greedily splitting at the top of a decision tree with disregard to the constraints that this split may impose lower down the tree.
In the attached report we show that these methods can produce models with significantly improved performance and, therefore, recommend replacing the current approach LightGBM takes to enforcing monotonic constraints. It contains some details about the methods we implemented, and why we think they should replace the current one, including some results we generated on a public dataset to compare the methods. Please let us know if you find something unclear in the report or if you would like more information about anything.
Broken files
About the implementation, the monotone constraints should be fully working in the tree learner serial_tree_learner.cpp and serial_tree_learner.h. However, we barely modified the rest of the tree learners. Therefore, the following files will compile, but are broken:
And the following files won’t compile and broken too:
If you agree to go forward with the pull request, we think that given the work we have already done, it shouldn’t be too hard to adapt everything to the other tree learners. However, as we are not familiar with them, and would represent a lot of work for us alone so we hoped you would help for the implementation.
We think there are 3 options:
Interactions between parameters
Finally, here is the summary of how the new parameters interact with the existing program:
PR-monotone-constraints-report.pdf