-
Notifications
You must be signed in to change notification settings - Fork 540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kernel PCA: C++ Algorithm Implementation #5987
base: branch-24.10
Are you sure you want to change the base?
Conversation
// All the positive eigenvalues that are too small (with a value smaller than the maximum | ||
// eigenvalue multiplied by for double precision 1e-12 (2e-7 for float)) are set to zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When removing zero eigen values, we treat values below a certain threshold as zero. See reference from sklearn: https://github.com/scikit-learn/scikit-learn/blob/b72e81af473c079ae95314efbca86557a836defa/sklearn/utils/validation.py#L1901-L1903
/ok to test |
/ok to test |
public: | ||
MLCommon::Matrix::KernelParams kernel; | ||
size_t n_training_samples = 0; | ||
bool copy = true; // TODO unused |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left these TODO's in here for future compatability and since paramsPCATemplate
does the same for copy
.
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. I've given it a look over and I think it's shaping up well. Most of the things are minor, but important to maintenance and quality. Looking forward to having this feature.
Small question- what data scales are you targeting? The unfortunate thing about the naive implementation of kernel algorithms is that they require n^2 space, whereas there are some ways we can make them scale by computing more on the fly (and even sparsifying the kernel gramm by using nearest neighbors methods). Our Kernel APIs also allow for caching and tiling so if we can use an iterative solver, we can scale to much larger datasets without having to resort to multiple GPUs.
CUML_KERNEL void subtractMeanKernel( | ||
T* mat, const T* row_means, const T* col_means, T overall_mean, int n_rows, int n_cols) | ||
{ | ||
const int row = blockIdx.x * blockDim.x + threadIdx.x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have primitives to do this in raft and we prefer using them wherever possible to centralize impls.
const value_t* sqrt_vals, | ||
size_t n_training_samples, | ||
size_t n_components) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please utilize primitives in RAFT to do this instead of writing raw kernels.
|
||
#include <raft/core/handle.hpp> | ||
#include <raft/distance/kernels.cuh> | ||
#include <raft/linalg/detail/cublas_wrappers.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know there are still a few places in cuML that are using these wrappers but these are in a detail namespace for a reason. We should only be using public API functions from RAFT (and exposing addiitonal ones if needed).
// Step 2: Compute overall mean | ||
value_t overall_mean; | ||
thrust::device_ptr<value_t> d_kernel_mat_ptr(kernel_mat.data()); | ||
value_t sum = thrust::reduce(thrust_policy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use corresponding primitives in RAFT to do this. In general we centralize computations like this in RAFT so that we can optimize and update them when needed (since we have control over RAFT APIs / implementations but not Thrust).
rmm::device_uvector<value_t> fitted_kernel_mat(prms.n_training_samples * prms.n_training_samples, | ||
stream); | ||
auto thrust_policy = rmm::exec_policy(stream); | ||
thrust::fill(thrust_policy, kernel_mat.begin(), kernel_mat.end(), 0.0f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use raft::matrix::fill
for this.
// Step 3: Compute overall mean | ||
value_t overall_mean; | ||
thrust::device_ptr<value_t> d_kernel_mat_ptr(fitted_kernel_mat.data()); | ||
value_t sum = thrust::reduce(thrust_policy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use raft for reductions.
|
||
#include <cuml/decomposition/params.hpp> | ||
|
||
#include <raft/linalg/detail/cublas_wrappers.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I knkow I mentioned earlier not to use these wrappers, and it actually looks like the wrappers aren't being used in your code anywhere. Can you please go through these imports (across your PR) and make sure you are only importanting what is being used?
int algo = 1; | ||
std::vector<float> data_h = {1.0, 2.0, 5.0, 4.0, 2.0, 1.0}; | ||
|
||
raft::distance::kernels::KernelParams lin_kern = {raft::distance::kernels::LINEAR, 0, 0, 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of hardcoding the values here, I think we can use our existing PCA as a source of ground truth and validate our results match when we compute the kernel gramm + PCA. That would at least remove a level of hardcoding. This is really hard to maintain. Ideally we'll eventually update our existing PCA tests to compare against a naive PCA solver (like simple manual power iteration) so that we can remove the hardcoding there too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean doing something like this?
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.metrics.pairwise import rbf_kernel
X = np.array([
[1.0, 4.0],
[2.0, 2.0],
[5.0, 1.0],
[0.0, 3.0],
[3.0, 5.0],
[6.0, 2.0],
[7.0, 8.0],
[8.0, 1.0],
[9.0, 6.0],
[4.0, 7.0]
])
gamma = 0.5
K = rbf_kernel(X, gamma=gamma)
N = K.shape[0]
one_n = np.ones((N, N)) / N
K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
pca = PCA(n_components=2)
X_pca = pca.fit_transform(K_centered)
kpca = KernelPCA(kernel='rbf', gamma=gamma, n_components=2)
X_kpca = kpca.fit_transform(X)
# Results
print("PCA Transformed Data:", X_pca, "KPCA Transformed Data:", X_kpca, sep="\n",end="\n\n")
print("PCA explained_variance_:", pca.explained_variance_, "KPCA eigenvalues_:", kpca.eigenvalues_, sep="\n", end="\n\n")
print("PCA components_:", pca.components_, "KPCA eigenvectors_:", kpca.eigenvectors_, sep="\n", end="\n\n")
Looking at the results I see that transformed data and eigenvalues aren't the same between pca/kpca. I haven't had the chance to dig deeper, but I'm wondering if there could be differences in the two algorithms that makes this approach not feasible.
As an alternative, I could rewrite the tests, so we can remove about a lot of the duplicate code for better readability.
Maybe another way to make the test cleaner would be to read the input and expected output from a file. The file could be the output from a Python script using sklearn.KernelPCA.
Thank you for the review. I have started incorporating most of the requested changes, but still working one replacing one of the raw kernels and tests. I just hit a busy period, so I probably won't be able to make the updates until beginning of October. Regarding your question on targeted data scales: I observed the algorithm being memory bound in benchmarking. It would be great to be able to support larger matrices. Could we aim for this for V2 of the algorithm? I might need some guidance on the changes needed for kernel caching + tiling. |
Description
Worked with @garrisonhess to add C++ code for KernelPCA. This implementation of Kernel PCA supports three functions with float/double matrix input:
Feature request: #1317
Tests were performed on an EC2
g4dn.xlarge
instance with CUDA 12.2.Click here to see environment details
Notes for Reviewers
Cython wrapper PR contains more varied unit and manual tests: #5988
C++ unit test reference
Sklearn script to generate C++ test reference data
Output from script and C++ expected values are below. Note that C++ matrices are in column order while Python w. numpy uses row order.
Sklearn linear kernel:
C++ test reference:
Sklearn poly kernel:
C++ test reference:
Sklearn rbf kernel:
C++ test reference:
Definition of Done Criteria Checklist
C++ Checklist
Design
(optional) Public API contains an Scikit-learn-like stateful wrapper around the stateless APITesting
Prims: GTests with different inputsDocumentation
Unit test results
C++ Test Results