A scikit-learn compatible Python 3 package for Sufficient Dimension Reduction. This class implements a set of methods to perform Sufficient Dimension Reduction.
Sufficient Dimension Reduction (SDR) is a general framework that aims to capture all the relevant information in high-dimensional data. It rests on the notion that a small number of combinations of the predictors provides all the relevant information on the response, so that the remaining predictor directions can be ignored.
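Formally, SDR looks for a basis matrix B with as few columns as possible such that the response Y is statistically independent of the predictors X given the projections BᵀX; the column space spanned by such a B of minimal dimension is called the central subspace.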
The different SDR methods implemented in this class are:
- `dcov-sdr`: Sufficient Dimension Reduction via Distance Covariance
- `mdd-sdr`: Sufficient Dimension Reduction via Martingale Difference Divergence
- `sir`: Sliced Inverse Regression
- `save`: Sliced Average Variance Estimation
- `dr`: Directional Regression
- `phd`: Principal Hessian Directions
- `iht`: Iterative Hessian Transformation
User-defined functions can also be maximised by the method explained in [1]. For more details on how to use the implemented SDR methods and user-defined functions, have a look at the `sudire` example notebook.
The `sudire` class also allows for estimation of the central subspace by optimizing an objective function. The optimization is performed using the Interior Point Optimizer (IPOPT), which is part of the COIN-OR project.
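As a quick illustration, the snippet below fits the estimator on simulated data. This is a minimal sketch: it assumes the estimator is importable as `from direpack import sudire` and that `fit` accepts the response as a keyword argument; it otherwise only uses parameters, methods and attributes documented below.

```python
import numpy as np

from direpack import sudire  # assumed import path

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# The response depends on X only through two linear combinations,
# so the central subspace is at most two-dimensional.
y = X[:, 0] + 2 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

sd = sudire('dcov-sdr', n_components=2)
sd.fit(X, y=y)  # assumes y is accepted as a keyword argument

print(sd.x_loadings_)     # estimated basis of the central subspace
scores = sd.transform(X)  # X projected onto the estimated subspace
```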
Remarks:
- All the methods contained in this package have been designed for continuous data. Categorical or textual data first needs to be one-hot encoded or embedded.
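For instance, a categorical column can be one-hot encoded with scikit-learn before being passed to the estimator; a minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

color = np.array([['red'], ['green'], ['blue'], ['green']])
enc = OneHotEncoder(sparse_output=False)  # use sparse=False on scikit-learn < 1.2
X_color = enc.fit_transform(color)        # one binary column per category
```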
The code is aligned with scikit-learn, such that modules such as `GridSearchCV` can be applied to it seamlessly.
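For example, the subspace dimension could be tuned by cross-validation. A hedged sketch, again assuming the `from direpack import sudire` import path and that the estimator's `predict` output (based on the internal OLS fit) is compatible with the chosen scoring:

```python
from sklearn.model_selection import GridSearchCV

from direpack import sudire  # assumed import path

# Tune the subspace dimension by cross-validation; assumes fit accepts
# y as its second argument, as GridSearchCV will pass it positionally.
search = GridSearchCV(
    sudire('dcov-sdr', n_components=2),
    param_grid={'n_components': [1, 2, 3]},
    scoring='neg_mean_squared_error',
    cv=5,
)
search.fit(X, y)  # X, y as in the example above
print(search.best_params_)
```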
The `sudire` folder contains:
- The estimator (`sudire.py`)
- Plotting functions for fitted sudire objects (`sudire_plot.py`)
- Ancillary functions for sufficient dimension reduction (`_sudire_utils.py`)
Dependencies:
- From `sklearn.base`: `BaseEstimator`, `TransformerMixin`, `RegressorMixin`
- From `sklearn.utils`: `_BaseComposition`
- `copy`
- From `scipy.stats`: `trim_mean`
- From `scipy.linalg`: `inv`, `sqrtm`
- `cython`
- From `ipopt`: `minimize_ipopt`
- `numpy`
- From `statsmodels.regression.linear_model`: `OLS`
- `statsmodels.robust`
Parameters:
- `sufdirmeth`, function or string: one of the implemented SDR methods listed above, or a user-defined function.
- `n_components`, int: dimension of the central subspace.
- `trimming`, float: trimming percentage for the projection index, to be entered as pct/100.
- `optimizer_options`: dict with options to pass on to the IPOPT optimizer (a hedged example of passing these options follows this parameter list):
  - `tol`, int: relative convergence tolerance.
  - `max_iter`, int: maximal number of iterations.
  - `constr_viol_tol`: desired threshold for the constraint violation.
- `optimizer_constraints`: dict or list of dicts; further constraints to be passed on to the optimizer function.
- `center`, str: how to center the data. Accepted options are those from `direpack`'s `VersatileScaler`.
- `center_data`, bool.
- `scale_data`, bool. Note: if set to `False`, convergence to the correct optimum is not guaranteed and a warning will be thrown.
- `whiten_data`, bool.
- `compression`, bool. If `True`, an internal SVD compression step is used for flat data tables (p > n), which speeds up the calculations.
- `copy`, bool: whether to make a deep copy of the input data.
- `verbose`, bool: set to `True` to print the iteration number.
- `return_scaling_object`, bool.

Note: parameters concerning the data can also be passed to the `fit` method.
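As an illustration of the optimizer settings above, the sketch below constructs an estimator with explicit IPOPT options. The option names follow the list above; the import path is an assumption.

```python
from direpack import sudire  # assumed import path

sd = sudire(
    'dcov-sdr',
    n_components=2,
    trimming=0.1,                 # 10% trimming, entered as pct/100
    optimizer_options={
        'max_iter': 1000,         # cap on the number of IPOPT iterations
        'tol': 1e-6,              # relative convergence tolerance
        'constr_viol_tol': 1e-6,  # allowed constraint violation
    },
)
```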
Attributes always provided:
- `x_loadings_`: estimated basis of the central subspace
- `x_scores_`: the projected X data
- `x_loc_`: location estimate for X
- `x_sca_`: scale estimate for X
- `ols_obj`: fitted OLS object
- `y_loc_`: y location estimate
- `y_sca_`: y scale estimate
Attributes created only when the corresponding input flags are `True`:
- `whitening_`: whitened data matrix (usually denoted K)
- `scaling_object_`: scaling object from `VersatileScaler`
Methods:
- `fit(X, *args, **kwargs)`: fit model
- `predict(X)`: make predictions based on fit
- `transform(X)`: project X onto latent space
- `getattr()`: get list of attributes
- `setattr(**kwargs)`: set individual attribute of sudire object
The `fit` function takes several optional input arguments for user-defined objective functions.
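As a rough sketch of what a user-defined objective could look like: the exact signature `fit` expects is not spelled out here, so the example below assumes the function receives the projected data and the response and returns a scalar to be maximised; consult the example notebook for the actual calling convention.

```python
import numpy as np

from direpack import sudire  # assumed import path


def my_objective(projected_x, y):
    """Hypothetical objective: absolute correlation between the first
    projected direction and the response. The signature is an assumption;
    see the example notebook for the exact calling convention."""
    return abs(np.corrcoef(projected_x[:, 0], y)[0, 1])


sd = sudire(my_objective, n_components=1)  # callable passed as sufdirmeth
sd.fit(X, y=y)                             # X, y as in the example above
```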
References:
- Sufficient Dimension Reduction via Distance Covariance, Wenhui Sheng and Xiangrong Yin, Journal of Computational and Graphical Statistics (2016), 25(1), 91-104.
- A Martingale-Difference-Divergence-Based Estimation of Central Mean Subspace, Yu Zhang, Jicai Liu, Yuesong Wu and Xiangzhong Fang, Statistics and Its Interface (2019), 12(3), 489-501.
- Sliced Inverse Regression for Dimension Reduction, K.-C. Li, Journal of the American Statistical Association (1991), 86, 316-327.
- Sliced Inverse Regression for Dimension Reduction: Comment, R. D. Cook and Sanford Weisberg, Journal of the American Statistical Association (1991), 86, 328-332.
- On Directional Regression for Dimension Reduction, B. Li and S. Wang, Journal of the American Statistical Association (2007), 102, 997-1008.
- On Principal Hessian Directions for Data Visualization and Dimension Reduction: Another Application of Stein's Lemma, K.-C. Li, Journal of the American Statistical Association (1992), 87, 1025-1039.
- Dimension Reduction for Conditional Mean in Regression, R. D. Cook and B. Li, The Annals of Statistics (2002), 30(2), 455-474.
- Robust Sufficient Dimension Reduction Via Ball Covariance, Jia Zhang and Xin Chen, Computational Statistics and Data Analysis (2019), 140, 144-154.