Improving monotone constraints ("Fast" method; linked to microsoft#2305, microsoft#2717) (microsoft#2770)

* Add util functions.

* Added monotone_constraints_method as a parameter.

* Add the intermediate constraining method.

* Updated tests.

* Minor fixes.

* Typo.

* Linting.

* Ran the parameter generator for the doc.

* Removed usage of the FeatureMonotone function.

* more fixes

* Fix.

* Remove duplicated code.

* Add debug checks.

* Typo.

* Bug fix.

* Disable the use of intermediate monotone constraints and feature sampling at the same time.

* Added an alias for monotone constraining method.

* Use the right variable to get the number of threads.

* Fix DEBUG checks.

* Add back check to determine if histogram is splittable.

* Added forgotten override keywords.

* Perform monotone constraint update only when necessary.

* Small refactor of FastLeafConstraints.

* Post rebase commit.

* Small refactor.

* Typo.

* Added comment and slightly improved logic of monotone constraints.

* Forgot a const.

* Vectors that are to be modified need to be pointers.

* Rename FastLeafConstraints to IntermediateLeafConstraints to match documentation.

* Remove overload of GoUpToFindLeavesToUpdate.

* Stop memory leaking.

* Fix cpplint issues.

* Fix checks.

* Fix more cpplint issues.

* Refactor config monotone constraints method.

* Typos.

* Remove useless empty lines.

* Add new line to separate includes.

* Replace unsigned ind by size_t.

* Reduce number of trials in tests to decrease CI time.

* Specify monotone constraints better in tests.

* Removed outer loop in test of monotone constraints.

* Added categorical features to the monotone constraints tests.

* Add blank line.

* Regenerate parameters automatically.

* Speed up ShouldKeepGoingLeftRight.

Co-authored-by: Charles Auguste <[email protected]>
Co-authored-by: guolinke <[email protected]>
3 people authored and jameslamb committed Mar 24, 2020
1 parent 45075d1 commit a88cf30
Showing 11 changed files with 588 additions and 46 deletions.
10 changes: 10 additions & 0 deletions docs/Parameters.rst
@@ -460,6 +460,16 @@ Learning Control Parameters

   -  you need to specify all features in order. For example, ``mc=-1,0,1`` means decreasing for 1st feature, non-constraint for 2nd feature and increasing for the 3rd feature

-  ``monotone_constraints_method`` :raw-html:`<a id="monotone_constraints_method" title="Permalink to this parameter" href="#monotone_constraints_method">&#x1F517;&#xFE0E;</a>`, default = ``basic``, type = string, aliases: ``monotone_constraining_method``, ``mc_method``

   -  used only if ``monotone_constraints`` is set

   -  monotone constraints method

      -  ``basic``, the most basic monotone constraints method. It does not slow the library at all, but over-constrains the predictions

      -  ``intermediate``, a `more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library very slightly. However, this method is much less constraining than the basic method and should significantly improve the results

-  ``feature_contri`` :raw-html:`<a id="feature_contri" title="Permalink to this parameter" href="#feature_contri">&#x1F517;&#xFE0E;</a>`, default = ``None``, type = multi-double, aliases: ``feature_contrib``, ``fc``, ``fp``, ``feature_penalty``

   -  used to control feature's split gain, will use ``gain[i] = max(0, feature_contri[i]) * gain[i]`` to replace the split gain of i-th feature
7 changes: 7 additions & 0 deletions include/LightGBM/config.h
@@ -440,6 +440,13 @@ struct Config {
// desc = you need to specify all features in order. For example, ``mc=-1,0,1`` means decreasing for 1st feature, non-constraint for 2nd feature and increasing for the 3rd feature
std::vector<int8_t> monotone_constraints;

// alias = monotone_constraining_method, mc_method
// desc = used only if ``monotone_constraints`` is set
// desc = monotone constraints method
// descl2 = ``basic``, the most basic monotone constraints method. It does not slow the library at all, but over-constrains the predictions
// descl2 = ``intermediate``, a `more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library very slightly. However, this method is much less constraining than the basic method and should significantly improve the results
std::string monotone_constraints_method = "basic";

// type = multi-double
// alias = feature_contrib, fc, fp, feature_penalty
// default = None
24 changes: 22 additions & 2 deletions include/LightGBM/tree.h
@@ -147,6 +147,28 @@ class Tree {

  inline double split_gain(int split_idx) const { return split_gain_[split_idx]; }

  inline double internal_value(int node_idx) const {
    return internal_value_[node_idx];
  }

  inline bool IsNumericalSplit(int node_idx) const {
    return !GetDecisionType(decision_type_[node_idx], kCategoricalMask);
  }

  inline int left_child(int node_idx) const { return left_child_[node_idx]; }

  inline int right_child(int node_idx) const { return right_child_[node_idx]; }

  inline int split_feature_inner(int node_idx) const {
    return split_feature_inner_[node_idx];
  }

  inline int leaf_parent(int leaf_idx) const { return leaf_parent_[leaf_idx]; }

  inline uint32_t threshold_in_bin(int node_idx) const {
    return threshold_in_bin_[node_idx];
  }

  /*! \brief Get the number of data points that fall at or below this node*/
  inline int data_count(int node) const { return node >= 0 ? internal_count_[node] : leaf_count_[~node]; }
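
The accessors above expose the tree's child and parent arrays, and `data_count` shows the index convention: a non-negative value names an internal node, while a negative value encodes leaf `~value`. The following is a minimal sketch of that convention and of walking from a leaf up to the root. `ToyTree`, `node_parent`, `PathToRoot`, `IsLeaf`, `LeafIndex` and `MakeExampleTree` are hypothetical names for illustration only; they are not part of LightGBM, and the real `Tree` class may track parents differently.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the tree arrays. This is NOT LightGBM's Tree class.
struct ToyTree {
  std::vector<int> left_child;   // per internal node; negative value v means leaf ~v
  std::vector<int> right_child;  // same encoding as left_child
  std::vector<int> leaf_parent;  // per leaf: index of its parent internal node
  std::vector<int> node_parent;  // hypothetical: per internal node, -1 for the root
};

// A child index refers to a leaf when it is negative.
inline bool IsLeaf(int child) { return child < 0; }

// Decode a negative child index into a leaf index.
inline int LeafIndex(int child) { return ~child; }

// Collect the internal nodes on the path from a leaf up to the root.
std::vector<int> PathToRoot(const ToyTree& tree, int leaf) {
  assert(leaf >= 0 && leaf < static_cast<int>(tree.leaf_parent.size()));
  std::vector<int> path;
  for (int node = tree.leaf_parent[leaf]; node >= 0; node = tree.node_parent[node]) {
    path.push_back(node);
  }
  return path;
}

// node 0: left -> leaf 0, right -> node 1
// node 1: left -> leaf 1, right -> leaf 2
ToyTree MakeExampleTree() {
  ToyTree t;
  t.left_child = {~0, ~1};
  t.right_child = {1, ~2};
  t.leaf_parent = {0, 1, 1};
  t.node_parent = {-1, 0};
  return t;
}
```

In this toy tree, `PathToRoot(MakeExampleTree(), 2)` visits node 1 and then node 0.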

@@ -436,7 +458,6 @@ inline void Tree::Split(int leaf, int feature, int real_feature,
// add new node
split_feature_inner_[new_node_idx] = feature;
split_feature_[new_node_idx] = real_feature;

split_gain_[new_node_idx] = gain;
// add two new leaves
left_child_[new_node_idx] = ~leaf;
@@ -544,7 +565,6 @@ inline int Tree::GetLeafByMap(const std::unordered_map<int, double>& feature_val
return ~node;
}


} // namespace LightGBM

#endif // LightGBM_TREE_H_
11 changes: 11 additions & 0 deletions src/io/config.cpp
@@ -317,6 +317,17 @@ void Config::CheckParamConflict() {
    force_col_wise = true;
    force_row_wise = false;
  }
  if (is_parallel && monotone_constraints_method == std::string("intermediate")) {
    // In distributed mode, local node doesn't have histograms on all features, cannot perform "intermediate" monotone constraints.
    Log::Warning("Cannot use \"intermediate\" monotone constraints in parallel learning, auto set to \"basic\" method.");
    monotone_constraints_method = "basic";
  }
  if (feature_fraction_bynode != 1.0 && monotone_constraints_method == std::string("intermediate")) {
    // "intermediate" monotone constraints need to recompute splits. If the features are sampled when computing the
    // split initially, then the sampling needs to be recorded or done once again, which is currently not supported
    Log::Warning("Cannot use \"intermediate\" monotone constraints with feature fraction different from 1, auto set monotone constraints to \"basic\" method.");
    monotone_constraints_method = "basic";
  }
}

std::string Config::ToString() const {
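
The two checks added to `Config::CheckParamConflict()` amount to one rule: the "intermediate" method falls back to "basic" when learning is parallel or when `feature_fraction_bynode` differs from 1. A minimal standalone sketch of that rule follows. `ToyConfig` and `ResolveMonotoneMethod` are hypothetical names used only for illustration; the sketch omits the warnings that the real code logs.

```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced stand-in for the fields involved. Not the real Config struct.
struct ToyConfig {
  bool is_parallel = false;
  double feature_fraction_bynode = 1.0;
  std::string monotone_constraints_method = "basic";
};

// Applies the fallback rule in place.
// Returns true if the method was downgraded from "intermediate" to "basic".
bool ResolveMonotoneMethod(ToyConfig* config) {
  assert(config != nullptr);
  if (config->monotone_constraints_method != "intermediate") {
    return false;  // nothing to downgrade
  }
  if (config->is_parallel || config->feature_fraction_bynode != 1.0) {
    config->monotone_constraints_method = "basic";
    return true;
  }
  return false;
}
```

Under these assumptions, a configuration with `feature_fraction_bynode = 0.8` and the "intermediate" method ends up with "basic", while a serial configuration with the default fraction keeps "intermediate".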
6 changes: 6 additions & 0 deletions src/io/config_auto.cpp
@@ -85,6 +85,8 @@ const std::unordered_map<std::string, std::string>& Config::alias_table() {
{"topk", "top_k"},
{"mc", "monotone_constraints"},
{"monotone_constraint", "monotone_constraints"},
{"monotone_constraining_method", "monotone_constraints_method"},
{"mc_method", "monotone_constraints_method"},
{"feature_contrib", "feature_contri"},
{"fc", "feature_contri"},
{"fp", "feature_contri"},
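
The alias table maps each alternative spelling to one canonical parameter name, so `mc_method` and `monotone_constraining_method` both resolve to `monotone_constraints_method`. A minimal sketch of such a lookup is shown here. `ToyAliasTable` and `CanonicalName` are hypothetical names, and the table holds only the entries visible in this hunk; how LightGBM itself applies the table is not shown in this commit.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Reduced copy of the alias entries from this hunk, for illustration only.
const std::unordered_map<std::string, std::string>& ToyAliasTable() {
  static const std::unordered_map<std::string, std::string> table = {
    {"mc", "monotone_constraints"},
    {"monotone_constraint", "monotone_constraints"},
    {"monotone_constraining_method", "monotone_constraints_method"},
    {"mc_method", "monotone_constraints_method"},
  };
  return table;
}

// Returns the canonical name for an alias, or the key itself if it is not an alias.
std::string CanonicalName(const std::string& key) {
  assert(!key.empty());
  const auto& table = ToyAliasTable();
  const auto it = table.find(key);
  return it == table.end() ? key : it->second;
}
```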
@@ -215,6 +217,7 @@ const std::unordered_set<std::string>& Config::parameter_set() {
"max_cat_to_onehot",
"top_k",
"monotone_constraints",
"monotone_constraints_method",
"feature_contri",
"forcedsplits_filename",
"refit_decay_rate",
@@ -414,6 +417,8 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::str
monotone_constraints = Common::StringToArray<int8_t>(tmp_str, ',');
}

GetString(params, "monotone_constraints_method", &monotone_constraints_method);

if (GetString(params, "feature_contri", &tmp_str)) {
feature_contri = Common::StringToArray<double>(tmp_str, ',');
}
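
In this hunk, `monotone_constraints` is read as a comma-separated string and converted to a vector of `int8_t`, while `monotone_constraints_method` is read as a plain string. The sketch below shows one way to do the first conversion for an input such as `-1,0,1`. `ParseMonotoneConstraints` is a hypothetical helper written for this example; it is not `Common::StringToArray`, whose implementation is not part of this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated list such as "-1,0,1" into per-feature constraints.
// -1 means decreasing, 0 means no constraint, 1 means increasing.
std::vector<int8_t> ParseMonotoneConstraints(const std::string& text) {
  std::vector<int8_t> constraints;
  std::stringstream stream(text);
  std::string token;
  while (std::getline(stream, token, ',')) {
    const int value = std::stoi(token);  // throws on non-numeric tokens
    assert(value >= -1 && value <= 1);   // assumption: only -1, 0 and 1 are valid
    constraints.push_back(static_cast<int8_t>(value));
  }
  return constraints;
}
```

With this helper, `ParseMonotoneConstraints("-1,0,1")` yields three entries, one per feature in order, and an empty string yields an empty vector.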
@@ -633,6 +638,7 @@ std::string Config::SaveMembersToString() const {
str_buf << "[max_cat_to_onehot: " << max_cat_to_onehot << "]\n";
str_buf << "[top_k: " << top_k << "]\n";
str_buf << "[monotone_constraints: " << Common::Join(Common::ArrayCast<int8_t, int>(monotone_constraints), ",") << "]\n";
str_buf << "[monotone_constraints_method: " << monotone_constraints_method << "]\n";
str_buf << "[feature_contri: " << Common::Join(feature_contri, ",") << "]\n";
str_buf << "[forcedsplits_filename: " << forcedsplits_filename << "]\n";
str_buf << "[refit_decay_rate: " << refit_decay_rate << "]\n";
3 changes: 2 additions & 1 deletion src/io/tree.cpp
@@ -50,7 +50,8 @@ Tree::~Tree() {

 int Tree::Split(int leaf, int feature, int real_feature, uint32_t threshold_bin,
                 double threshold_double, double left_value, double right_value,
-                int left_cnt, int right_cnt, double left_weight, double right_weight, float gain, MissingType missing_type, bool default_left) {
+                int left_cnt, int right_cnt, double left_weight, double right_weight, float gain,
+                MissingType missing_type, bool default_left) {
   Split(leaf, feature, real_feature, left_value, right_value, left_cnt, right_cnt, left_weight, right_weight, gain);
   int new_node_idx = num_leaves_ - 1;
   decision_type_[new_node_idx] = 0;
13 changes: 12 additions & 1 deletion src/treelearner/leaf_splits.hpp
@@ -31,7 +31,6 @@ class LeafSplits {
  }

  /*!
  * \brief Init split on current leaf on partial data.
  * \param leaf Index of current leaf
  * \param data_partition current data partition
@@ -45,6 +44,18 @@ class LeafSplits {
    sum_hessians_ = sum_hessians;
  }

  /*!
  * \brief Init split on current leaf on partial data.
  * \param leaf Index of current leaf
  * \param sum_gradients
  * \param sum_hessians
  */
  void Init(int leaf, double sum_gradients, double sum_hessians) {
    leaf_index_ = leaf;
    sum_gradients_ = sum_gradients;
    sum_hessians_ = sum_hessians;
  }

  /*!
  * \brief Init splits on current leaf, it will traverse all data to sum up the results
  * \param gradients