-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Less restrictive monotone constraints #2305
Changes from all commits
1c47fac
d56ff9b
b65ace1
cb9917f
8fea762
d412ca6
37010ce
3ff354d
d8563da
10c5cbb
d69e4ff
e4ec6a0
3ea72af
692af0d
7aaec4a
21b8e32
c8cea6f
1b0713d
742c8e0
e50b141
c49640e
5e20b4b
61ad8d6
f020bfb
1e30c19
d2de571
e1125a9
dbe74f2
f1717b6
b0202e1
54b5d47
033de6f
7272a35
dbc3f07
9593c3c
6040cbd
3f10afe
ed61bab
f6eb56a
eb2c412
8d6ef50
ceae2bf
b632791
d291b99
c0f1cc0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -60,7 +60,7 @@ class Tree { | |
int Split(int leaf, int feature, int real_feature, uint32_t threshold_bin, | ||
double threshold_double, double left_value, double right_value, | ||
int left_cnt, int right_cnt, double left_weight, double right_weight, | ||
float gain, MissingType missing_type, bool default_left); | ||
float gain, MissingType missing_type, bool default_left, bool feature_is_monotone); | ||
|
||
/*! | ||
* \brief Performing a split on tree leaves, with categorical feature | ||
|
@@ -80,9 +80,14 @@ class Tree { | |
* \param gain Split gain | ||
* \return The index of new leaf. | ||
*/ | ||
int SplitCategorical(int leaf, int feature, int real_feature, const uint32_t* threshold_bin, int num_threshold_bin, | ||
const uint32_t* threshold, int num_threshold, double left_value, double right_value, | ||
int left_cnt, int right_cnt, double left_weight, double right_weight, float gain, MissingType missing_type); | ||
|
||
int SplitCategorical(int leaf, int feature, int real_feature, | ||
const uint32_t *threshold_bin, int num_threshold_bin, | ||
const uint32_t *threshold, int num_threshold, | ||
double left_value, double right_value, int left_cnt, | ||
int right_cnt, double left_weight, double right_weight, | ||
float gain, MissingType missing_type, | ||
bool feature_is_monotone); | ||
|
||
/*! \brief Get the output of one leaf */ | ||
inline double LeafOutput(int leaf) const { return leaf_value_[leaf]; } | ||
|
@@ -124,6 +129,24 @@ class Tree { | |
inline int PredictLeafIndex(const double* feature_values) const; | ||
inline int PredictLeafIndexByMap(const std::unordered_map<int, double>& feature_values) const; | ||
|
||
// Get node parent | ||
inline int node_parent(int node_idx) const; | ||
// Get leaf parent | ||
inline int leaf_parent(int node_idx) const; | ||
|
||
// Get children | ||
inline int left_child(int node_idx) const; | ||
inline int right_child(int node_idx) const; | ||
|
||
// Get if the feature is in a monotone subtree | ||
inline bool leaf_is_in_monotone_subtree(int leaf_idx) const; | ||
|
||
inline double internal_value(int node_idx) const; | ||
|
||
inline uint32_t threshold_in_bin(int node_idx) const; | ||
|
||
// Get the feature corresponding to the split | ||
inline int split_feature_inner(int node_idx) const; | ||
|
||
inline void PredictContrib(const double* feature_values, int num_features, double* output); | ||
|
||
|
@@ -302,8 +325,10 @@ class Tree { | |
} | ||
} | ||
|
||
inline void Split(int leaf, int feature, int real_feature, double left_value, double right_value, int left_cnt, int right_cnt, | ||
double left_weight, double right_weight, float gain); | ||
inline void Split(int leaf, int feature, int real_feature, double left_value, | ||
double right_value, int left_cnt, int right_cnt, double left_weight, | ||
double right_weight,float gain, bool feature_is_monotone); | ||
|
||
/*! | ||
* \brief Find leaf index of which record belongs by features | ||
* \param feature_values Feature value of this record | ||
|
@@ -402,12 +427,22 @@ class Tree { | |
std::vector<int> leaf_depth_; | ||
double shrinkage_; | ||
int max_depth_; | ||
// add parent node information | ||
std::vector<int> node_parent_; | ||
// Keeps track of the monotone splits above the leaf | ||
std::vector<bool> leaf_is_in_monotone_subtree_; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we really need to store these states into tree model? Or other structures only for MC? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. another suggestion: it seems there are lots of changes in FeatureHistogram. This will hurt the readability of current codes. Could we decouple MC with other parts? Like an independent class, which could be called in FeatureHistogram. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We were very careful to implement this in a way that limited the impact on readability and, in my opinion, I think that we did a good job in this regard, given the complexity of what's being implemented. In fact, in my opinion, the current implementation of monotonic constraints isn't particularly readable. If anything, we've made it more readable, although, obviously, more complex. If you have any specific examples where you feel readability has been impacted let us know and we can discuss it on an example-by-example basis.
We could, but I'm not convinced this would make things substantially more readable, and it would be some effort on our part to make such changes. It could even hurt readability due to increased indirection. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, actually I mean the maintainability. All implementation of MC is in one place, not distributed in the project. And i think good maintainability is also more readable. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will not be able to work on this during this week and next week, sorry about that. But I will make the necessary changes to improve the code after that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @guolinke about the new states in the Tree class which would cause compatibility issues. First, the states are absolutely necessary to the functioning of monotone constraints as we implemented them, there is no way we can do without. And I think that these are sensible states that should belong to the tree class (I could even see these reused for other purposes). An alternative I can think about would be to create an alternative class which would look like tree, but with these states only, and this class would be used whenever the Tree class is currently used. But that would create some redundancies, and would probably not be good for readability. Since these fields currently won't be used if monotone constraints are not used, is it not possible to solve the compatibility problem instead? I am not sure when these issues would arise, but maybe there is a way to specify somewhere that if format don't match, then these fields should remain void, or be filled with dummy values. What do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @CharlesAuguste I see. could we have these states in tree class, but not save and load them into model file? If this is possible, it is okay as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just double-check the code, it seems the new states are not in model file. So I think it is okay. |
||
}; | ||
|
||
inline void Tree::Split(int leaf, int feature, int real_feature, | ||
double left_value, double right_value, int left_cnt, int right_cnt, | ||
double left_weight, double right_weight, float gain) { | ||
double left_weight, double right_weight, float gain, bool feature_is_monotone) { | ||
int new_node_idx = num_leaves_ - 1; | ||
|
||
// Update if there is a monotone split above the leaf | ||
if (feature_is_monotone || leaf_is_in_monotone_subtree_[leaf]) { | ||
leaf_is_in_monotone_subtree_[leaf] = true; | ||
leaf_is_in_monotone_subtree_[num_leaves_] = true; | ||
} | ||
// update parent info | ||
int parent = leaf_parent_[leaf]; | ||
if (parent >= 0) { | ||
|
@@ -421,6 +456,7 @@ inline void Tree::Split(int leaf, int feature, int real_feature, | |
// add new node | ||
split_feature_inner_[new_node_idx] = feature; | ||
split_feature_[new_node_idx] = real_feature; | ||
node_parent_[new_node_idx] = parent; | ||
|
||
split_gain_[new_node_idx] = gain; | ||
// add two new leaves | ||
|
@@ -529,6 +565,41 @@ inline int Tree::GetLeafByMap(const std::unordered_map<int, double>& feature_val | |
return ~node; | ||
} | ||
|
||
inline int Tree::node_parent(int node_idx) const{ | ||
return node_parent_[node_idx]; | ||
} | ||
|
||
inline int Tree::left_child(int node_idx) const{ | ||
return left_child_[node_idx]; | ||
} | ||
|
||
inline int Tree::right_child(int node_idx) const{ | ||
return right_child_[node_idx]; | ||
} | ||
|
||
inline int Tree::split_feature_inner(int node_idx) const{ | ||
return split_feature_inner_[node_idx]; | ||
} | ||
|
||
inline int Tree::leaf_parent(int node_idx) const{ | ||
return leaf_parent_[node_idx]; | ||
} | ||
|
||
inline uint32_t Tree::threshold_in_bin(int node_idx) const{ | ||
#ifdef DEBUG | ||
CHECK(node_idx >= 0); | ||
#endif | ||
return threshold_in_bin_[node_idx]; | ||
} | ||
|
||
inline bool Tree::leaf_is_in_monotone_subtree(int leaf_idx) const { | ||
return leaf_is_in_monotone_subtree_[leaf_idx]; | ||
} | ||
|
||
inline double Tree::internal_value(int node_idx) const { | ||
return internal_value_[node_idx]; | ||
} | ||
|
||
|
||
} // namespace LightGBM | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,7 @@ | |
* This file is auto generated by LightGBM\helpers\parameter_generator.py from LightGBM\include\LightGBM\config.h file. | ||
*/ | ||
#include<LightGBM/config.h> | ||
#include <LightGBM/utils/log.h> | ||
namespace LightGBM { | ||
std::unordered_map<std::string, std::string> Config::alias_table({ | ||
{"config_file", "config"}, | ||
|
@@ -80,6 +81,8 @@ std::unordered_map<std::string, std::string> Config::alias_table({ | |
{"lambda", "lambda_l2"}, | ||
{"min_split_gain", "min_gain_to_split"}, | ||
{"rate_drop", "drop_rate"}, | ||
{"monotone_splits_penalty", "monotone_penalty"}, | ||
{"monotone_constraints_precise_mode", "monotone_precise_mode"}, | ||
{"topk", "top_k"}, | ||
{"mc", "monotone_constraints"}, | ||
{"monotone_constraint", "monotone_constraints"}, | ||
|
@@ -199,6 +202,8 @@ std::unordered_set<std::string> Config::parameter_set({ | |
"lambda_l2", | ||
"min_gain_to_split", | ||
"drop_rate", | ||
"monotone_penalty", | ||
"monotone_precise_mode", | ||
"max_drop", | ||
"skip_drop", | ||
"xgboost_dart_mode", | ||
|
@@ -399,8 +404,21 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::str | |
|
||
if (GetString(params, "monotone_constraints", &tmp_str)) { | ||
monotone_constraints = Common::StringToArray<int8_t>(tmp_str, ','); | ||
Log::Warning("The constraining method was just changed, which could significantly affect results of the algorithm"); | ||
} | ||
|
||
GetDouble(params, "monotone_penalty", &monotone_penalty); | ||
bool constraints_exist = false; | ||
for (auto it = monotone_constraints.begin(); it != monotone_constraints.end(); | ||
it++) { | ||
if (*it != 0) { | ||
constraints_exist = true; | ||
} | ||
} | ||
CHECK(monotone_penalty == 0 || constraints_exist); | ||
CHECK(max_depth <= 0 || monotone_penalty < max_depth); | ||
CHECK(monotone_penalty >= 0.0); | ||
|
||
if (GetString(params, "feature_contri", &tmp_str)) { | ||
feature_contri = Common::StringToArray<double>(tmp_str, ','); | ||
} | ||
|
@@ -476,6 +494,10 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::str | |
|
||
GetBool(params, "use_missing", &use_missing); | ||
|
||
GetBool(params, "monotone_precise_mode", &monotone_precise_mode); | ||
CHECK(!monotone_precise_mode || !use_missing); | ||
CHECK(!monotone_precise_mode || constraints_exist); | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this file is auto generated, you cannot edit it by hand. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure I understand, I locally tested if these checks worked and they did. What do you mean by the file being auto generated? I am not very familiar with how the library is built I am sorry. |
||
GetBool(params, "zero_as_missing", &zero_as_missing); | ||
|
||
GetBool(params, "two_round", &two_round); | ||
|
@@ -607,6 +629,8 @@ std::string Config::SaveMembersToString() const { | |
str_buf << "[lambda_l2: " << lambda_l2 << "]\n"; | ||
str_buf << "[min_gain_to_split: " << min_gain_to_split << "]\n"; | ||
str_buf << "[drop_rate: " << drop_rate << "]\n"; | ||
str_buf << "[monotone_penalty: " << monotone_penalty << "]\n"; | ||
str_buf << "[monotone_precise_mode: " << monotone_precise_mode << "]\n"; | ||
str_buf << "[max_drop: " << max_drop << "]\n"; | ||
str_buf << "[skip_drop: " << skip_drop << "]\n"; | ||
str_buf << "[xgboost_dart_mode: " << xgboost_dart_mode << "]\n"; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is
feature_is_monotone
needed?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now it is not needed as we don't support categorical features. But that makes it more consistent with the standard split, and in the future maybe we will make categorical variables supported for monotonic constraints. Should I remove it?