SVD imputation #16
Codecov Report
@@            Coverage Diff            @@
##           master      #16      +/-   ##
==========================================
+ Coverage   96.58%   96.69%   +0.1%
==========================================
  Files          11       12       +1
  Lines         234      272      +38
==========================================
+ Hits          226      263      +37
- Misses          8        9       +1
Continue to review full report at Codecov.
Bump! Any update for this PR?
* inspired by SVD imputation (invenia#16)
May I extend your svd branch to adapt it to the current API? This branch gave me insight into how to test multivariate imputation and how to write such methods to fit the API. Let's leave the TODO list to be completed in the future (this PR has been open for a long time) and just merge the PR if the tests pass.
I've been reluctant to merge this because we need to do some refactoring of the API. In general, we have several competing interests for how the imputation API should work.
I think you're probably right that we should just update this PR to get it working well enough to tag a release. It'll just mean that the refactoring will require more work.
What you mentioned would take too much work. This PR has been open for a year, and if we wait for that list to be implemented it will take forever. Some methods won't be applicable to all types, so let's just restrict the supported types to Arrays and extend them where possible.
That seems like a bit of an exaggeration, but sure.
That's largely what we've been trying to do, but folks have been confused in the past about which permutations are supported, so simply leaving it as a method error doesn't seem ideal. This PR was largely an experiment on my part, and I got mixed results in terms of performance. Is there an application where you want to use this SVD method, or is it just that you'd like more methods to be available in this package?
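On the point about unsupported types: instead of letting an unsupported input surface as a bare method error, a method can restrict itself to arrays and fail with a message that names what *is* supported. The sketch below shows that pattern in Python, with `functools.singledispatch` standing in for Julia's multiple dispatch. The function name `as_supported_matrix` and the "2-D array with NaN marking missing values" convention are hypothetical illustrations, not Impute.jl's API.

```python
from functools import singledispatch

import numpy as np


@singledispatch
def as_supported_matrix(data):
    # Hypothetical fallback for every unregistered type: raise a descriptive
    # error instead of an opaque "no matching method" failure.
    raise TypeError(
        f"{type(data).__name__} is not supported; "
        "pass a 2-D numpy.ndarray with NaN marking missing values"
    )


@as_supported_matrix.register
def _(data: np.ndarray):
    # Arrays are the only supported input type in this sketch.
    if data.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {data.ndim}-D")
    # Return a float copy so NaN can represent missing values.
    return data.astype(float)
```

Passing a 2-D integer array returns a float copy, while passing a plain Python list raises a `TypeError` that names `list` and states the supported input, which is the kind of explicit failure the comment above argues for.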
My answer is both. I want to impute some time series while preprocessing my research data. I use Julia as the main language for my project. I could use R or Python just for preprocessing, but I prefer an integrated pipeline from preprocessing to postprocessing. That's why I want to use this package, since it seems to be the only usable package for imputation. I found it and liked it because it is so simple to use. However, it only offers simple methods (univariate imputation only). I'd like this package to grow into a bigger project like mice in R, so I thought it would be great to implement several imputation methods such as SVD, kNN, and bPCA in this repo.
…l Loosen tests a little.
Alright, most of the TODO items are resolved. I'm fine with merging this as is, but I'll likely need to revisit it during some upcoming refactoring.
An implementation of SVD imputation that uses an EM-based algorithm.
Steps:
* `svd` is computed for the initialized dataset and a low-rank approximation is generated.
Currently, the rank of the approximation increases gradually, but I'm open to other suggestions (references). This PR also includes a couple of smoke tests to self-document what types of data this method works well on: for example, datasets with a large number of correlated variables where a small subset of the eigenvalues explains most of the variance.
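The steps above can be sketched as follows. This is a minimal Python/NumPy illustration of the general EM-style idea, not the code in this PR. It assumes NaN marks missing entries, that every column has at least one observed value, and that missing entries are initialized with column means (the PR does not say how the dataset is initialized). The rank is raised from 1 to a hypothetical `max_rank`, iterating at each rank until the imputed values stop changing; `tol` and `max_iter` are likewise illustrative parameters.

```python
import numpy as np


def svd_impute(data, max_rank, tol=1e-6, max_iter=100):
    """Sketch of EM-style SVD imputation; NaN marks missing entries."""
    X = np.array(data, dtype=float)  # work on a float copy
    missing = np.isnan(X)
    if not missing.any():
        return X

    # Assumed initialization: fill each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    X[missing] = col_means[np.nonzero(missing)[1]]

    # Gradually increase the rank of the approximation.
    for rank in range(1, max_rank + 1):
        for _ in range(max_iter):
            # Rank-`rank` approximation of the currently filled matrix.
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

            # Replace only the missing entries; observed values stay fixed.
            new_vals = approx[missing]
            old_vals = X[missing]
            delta = np.linalg.norm(new_vals - old_vals) / max(
                np.linalg.norm(old_vals), 1e-12
            )
            X[missing] = new_vals

            # Stop at this rank once the imputed values barely change.
            if delta < tol:
                break
    return X
```

A smoke test in the spirit of the PR: build an exactly rank-2 matrix with 10 strongly correlated columns, hide roughly 10% of its entries, and compare the error on the hidden entries against plain column-mean imputation. Because two singular values explain all of the variance, the low-rank estimate should land far closer to the true values, while observed entries come back unchanged.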
TODO:
Closes #7