kNN Imputation #54
Codecov Report
@@            Coverage Diff             @@
##           master     #54     +/-   ##
==========================================
- Coverage   96.69%   95.97%   -0.72%
==========================================
  Files          12       13      +1
  Lines         272      298     +26
==========================================
+ Hits          263      286     +23
- Misses          9       12      +3
In my opinion, the broken tests should be fixed in another PR.
Thanks for contributing! I've made a few comments that should be addressed before I can merge this.
Since you seem to be comparing Impute.jl against R anyway, would you mind using RCall (or PyCall) to compare your code against an existing implementation (e.g., https://www.rdocumentation.org/packages/bnstruct/versions/1.0.6/topics/knn.impute or https://github.com/iskandr/fancyimpute)?
I will add |
Okay, I've merged the SVD implementation, if you want to rebase. I'm guessing most of the conflicts will get automerged.
Okay, I merged with master. I will rebase it. Thank you so much for merging the other branches.
3389ac4 to 1173008
I rebased it into a single commit. Once CI passes, you can merge it.
I rewrote the code; however, I found some problems.
c67e2fd to fe14cb7
Can you provide a reproducible example for this?
That's correct, the random variable tests from |
I figured out why: mean imputation on the transposed array is wrong. First, it gives the wrong result. Second, when imputing a dataset with a small number of columns, like the iris dataset, and all columns are missing, the mean value will also be missing. I fixed most of the minor things. As you pointed out, |
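To make the two problems above concrete, here is a minimal sketch in plain Python (not the Impute.jl code from this PR). The helper names `column_means`, `mean_impute`, and `transpose` and the toy data are hypothetical, and the sketch assumes one reading of the comment: rows are observations, columns are variables, and `None` stands in for a missing value.

```python
def column_means(rows):
    """Mean of each column over observed (non-None) entries; None if nothing is observed."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return means


def mean_impute(rows):
    """Replace each None with its column mean (stays None if the whole column is missing)."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def transpose(rows):
    return [list(col) for col in zip(*rows)]


# Hypothetical toy data: 3 observations, 3 variables, last variable entirely missing.
data = [
    [1.0, 10.0, None],
    [None, 20.0, None],
    [3.0, 60.0, None],
]

# Imputing along the variables (columns) uses the mean of each variable.
correct = mean_impute(data)

# Imputing the transposed array instead averages across different variables
# within a single observation, which gives a different (wrong) fill value.
wrong = transpose(mean_impute(transpose(data)))

print(correct[1][0], wrong[1][0])
print(correct[0][2])  # the all-missing column cannot be filled by its own mean
```

In this sketch the missing entry in the second row is filled from the first variable's own mean in `correct`, but from the other variables of the same observation in `wrong`, and the all-missing column stays missing because it has no observed values to average.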
Alright, just a couple more style / cleanup changes and then I think this is good as a first pass.
Travis was down. Would you retrigger the build? If it looks okay to you, please let me know and I will rebase it.
The test fails due to #61.
Yeah, we should also drop appveyor anyways. |
LGTM, thanks for bearing with the review process. If there are any remaining issues they can be fixed in a future release. Might be good in the future to do a comparison table of the different methods on different datasets w/ missings.
Inspired by SVD imputation (#16), I implemented kNN imputation, which closes #4.
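For readers unfamiliar with the technique, the general idea of kNN imputation can be sketched as follows. This is a generic illustration in plain Python, not the implementation from this PR: the function name `knn_impute`, the distance rescaling, and the toy data are all assumptions made for the example.

```python
import math


def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest donor rows."""

    def distance(a, b):
        # Euclidean distance over coordinates observed in both rows, rescaled
        # so that rows sharing fewer coordinates are not unfairly favoured.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * len(a) / len(shared))

    result = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows where column j is observed and a distance exists.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the value missing if no donor is available
                result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result


# Hypothetical toy data: the second row is missing its last value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.8],
    [9.0, 9.0, 9.0],
]
filled = knn_impute(data, k=2)
print(filled[1])
```

Here the two nearest donors for the second row are the first and third rows, so the missing value is filled with roughly the average of 3.0 and 2.8, while the distant fourth row is ignored. Real implementations differ in how they weight neighbours and handle rows with no usable donors.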