ValueError: k >= n when k = 2 and n = 3 (exact=False) #80

Closed
sergeyf opened this issue Jun 5, 2023 · 1 comment

sergeyf commented Jun 5, 2023

Hello,

Thanks for the great package! Here is an example of a failure: there are enough samples (n = 3 for k = 2 clusters), but the model raises ValueError: k >= n. The same call works fine with exact=True.

import numpy as np
import genieclust

X = np.zeros((3, 768))
k = 2
g = genieclust.Genie(n_clusters=k, gini_threshold=0.01, exact=False)
labels = g.fit_predict(X)

Error trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[328], line 7
      5 k = 2
      6 g = genieclust.Genie(n_clusters=k, gini_threshold=0.01, exact=False)
----> 7 labels = g.fit_predict(X)

File .../lib/python3.8/site-packages/genieclust/genie.py:548, in GenieBase.fit_predict(self, X, y)
    520 def fit_predict(self, X, y=None):
    521     """
    522     Perform cluster analysis of a dataset and return the predicted labels.
    523 
   (...)
    546 
    547     """
--> 548     self.fit(X)
    549     return self.labels_

File .../lib/python3.8/site-packages/genieclust/genie.py:1051, in Genie.fit(self, X, y)
    972 """
    973 Perform cluster analysis of a dataset.
    974 
   (...)
   1047 
   1048 """
   1049 cur_state = self._check_params()  # re-check, they might have changed
-> 1051 cur_state = self._get_mst(X, cur_state)
   1053 if cur_state["verbose"]:
   1054     print("[genieclust] Determining clusters with Genie++.", file=sys.stderr)

File .../lib/python3.8/site-packages/genieclust/genie.py:511, in GenieBase._get_mst(self, X, cur_state)
    509     cur_state = self._get_mst_exact(X, cur_state)
    510 else:
--> 511     cur_state = self._get_mst_approx(X, cur_state)
    513 # this might be an "intrinsic" dimensionality:
    514 self.n_features_  = cur_state["n_features"]

File .../lib/python3.8/site-packages/genieclust/genie.py:484, in GenieBase._get_mst_approx(self, X, cur_state)
    480     d_core = internal.get_d_core(nn_dist, nn_ind, cur_state["M"])
    483 if mst_dist is None or mst_ind is None:
--> 484     mst_dist, mst_ind = internal.mst_from_nn(
    485         nn_dist,
    486         nn_ind,
    487         d_core,
    488         stop_disconnected=False,
    489         verbose=cur_state["verbose"])
    490     # We can have a forest here...
    492 self.n_samples_   = n_samples

File .../lib/python3.8/site-packages/genieclust/internal.pyx:294, in genieclust.internal.__pyx_fuse_0mst_from_nn()

File .../lib/python3.8/site-packages/genieclust/internal.pyx:381, in genieclust.internal.mst_from_nn()

ValueError: k >= n

gagolews (Owner) commented Jun 7, 2023

Thanks for the report, the fix is on the way.
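
Until a release with the fix is available, one possible stopgap is to retry in exact mode when the approximate mode raises this particular error, since the report says the same call succeeds with exact=True. The sketch below is not the maintainer's fix; the helper name fit_predict_with_fallback and the make_model factory are hypothetical and only illustrate the idea.

```python
def fit_predict_with_fallback(make_model, X):
    """Try approximate mode first; retry in exact mode on "k >= n".

    make_model is a hypothetical factory: make_model(exact) must return an
    object with a fit_predict(X) method, e.g. a genieclust.Genie instance.
    """
    try:
        return make_model(False).fit_predict(X)
    except ValueError as err:
        # Only swallow the error from the report; re-raise anything else.
        if "k >= n" not in str(err):
            raise
        # Per the report, the same call works with exact=True.
        return make_model(True).fit_predict(X)


# Intended use with the reproduction from the report (not run here):
#   import numpy as np
#   import genieclust
#   X = np.zeros((3, 768))
#   labels = fit_predict_with_fallback(
#       lambda exact: genieclust.Genie(
#           n_clusters=2, gini_threshold=0.01, exact=exact),
#       X,
#   )
```

Matching on the message text is brittle, so this is only meant as a temporary guard for very small inputs; once the fixed version is installed the fallback branch should no longer be reached for this case.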
