allow test/query data to be used with Transform() API call #38

Open
Stevod opened this issue May 23, 2022 · 1 comment


Stevod commented May 23, 2022

Currently, it is not obvious how to apply a fitted PaCMAP model to a test dataset. A transform() method is available, but neither its usage nor its syntax is documented, so some documentation on it would be appreciated.

@hyhuang00 hyhuang00 self-assigned this May 23, 2022
@hyhuang00 hyhuang00 added the documentation Improvements or additions to documentation label May 26, 2022
hyhuang00 (Collaborator) commented May 28, 2022

The documentation website is still under construction, but the docstrings are already available in the source code. Here is the docstring of the transform() method for reference:

Projects a high-dimensional dataset into the existing embedding space and returns the embedding.

    Parameters
    ----------
    X: numpy.ndarray
        The new high-dimensional dataset that is being projected. 
        An embedding will get created based on parameters of the PaCMAP instance.

    basis: numpy.ndarray
        The original dataset that has already been used during the `fit` or `fit_transform` process.
        If `save_tree == False`, then the basis is required to reconstruct the ANNOY tree instance.
        If `save_tree == True`, then it's unnecessary to provide the original dataset again.

    init: str, optional
        One of ['pca', 'random']. Initialization of the embedding, default='pca'.
        If 'pca', then the low dimensional embedding is initialized to the PCA mapped dataset. 
        The PCA instance will be the same one that was applied to the original dataset during the `fit` or `fit_transform` process. 
        If 'random', then the low dimensional embedding is initialized with a Gaussian distribution.

    save_pairs: bool, optional
        Whether to save the pairs that are sampled from the dataset. Useful for reproducing results.
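
As a rough illustration of the docstring above, the sketch below fits a model on training data and then projects held-out data into the same embedding space. The helper `fit_then_project` and the `demo` function are hypothetical names made up for this example. The constructor arguments `n_components` and `save_tree`, and the `init="pca"` argument to `fit_transform`, are assumptions that may differ between PaCMAP versions. Because `save_tree` is assumed to be `False` here, the original training data is passed as `basis`, as the docstring requires.

```python
def fit_then_project(model, X_train, X_test):
    """Fit `model` on X_train, then project X_test into the same space.

    `model` is expected to expose `fit_transform` and `transform`
    in the way the docstring above describes (hypothetical helper).
    """
    # Learn the embedding from the training data.
    train_embedding = model.fit_transform(X_train, init="pca")
    # Pass the original data as `basis`; the docstring says this is
    # required when the model was created with save_tree == False.
    test_embedding = model.transform(X_test, basis=X_train)
    return train_embedding, test_embedding


def demo():
    """Illustrative run on random data; requires numpy and pacmap."""
    import numpy as np
    import pacmap  # assumption: the pacmap package is installed

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 20)).astype(np.float32)
    X_test = rng.normal(size=(50, 20)).astype(np.float32)

    # Assumed constructor arguments; check your installed version.
    model = pacmap.PaCMAP(n_components=2, save_tree=False)
    train_emb, test_emb = fit_then_project(model, X_train, X_test)
    print(train_emb.shape, test_emb.shape)
```

Calling `demo()` runs the example end to end. If the model is created with `save_tree=True` instead, the docstring indicates that `basis` no longer needs to be supplied to `transform`.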
