Commit

close #84
gagolews committed Sep 22, 2023
1 parent 38b0610 commit 1c08302
Showing 53 changed files with 419 additions and 417 deletions.
2 changes: 2 additions & 0 deletions .devel/pytest/test_compare_partitions.py
@@ -46,6 +46,8 @@ def test_compare_partitions():
assert -1e-9<pair_sets_index(x, y)<1.0+1e-9
assert -1e-9<pair_sets_index(x, y, True)<1.0+1e-9

assert normalized_clustering_accuracy(x, y) == normalized_clustering_accuracy(confusion_matrix(x, y))

y = x.copy()
y[:5] = 1
compare_with_sklearn(x, y)
4 changes: 2 additions & 2 deletions .devel/pytest/test_disjoint_sets.py
@@ -13,7 +13,7 @@ def test_DisjointSets():
d = DisjointSets(n)
assert all([i==d.find(i) for i in range(n)])

for k in range(int(np.random.randint(0, n-2, 1))):
for k in range(int(np.random.randint(0, n-2))):
i = np.random.randint(0, n)
j = np.random.randint(0, n)
if d.find(i) == d.find(j): continue
@@ -31,7 +31,7 @@ def test_GiniDisjointSets():
d = GiniDisjointSets(n)
assert all([i==d.find(i) for i in range(n)])

for k in range(int(np.random.randint(0, n-2, 1))):
for k in range(int(np.random.randint(0, n-2))):
i = np.random.randint(0, n)
j = np.random.randint(0, n)
if d.find(i) == d.find(j): continue
11 changes: 8 additions & 3 deletions .devel/sphinx/news.md
@@ -9,12 +9,17 @@
`adjusted_asymmetric_accuracy` -> `normalized_clustering_accuracy`,
`normalized_accuracy` -> `normalized_pivoted_accuracy`.

* [BACKWARD INCOMPATIBILITY] [Python] `compare_partitions2` has been removed,
as `compare_partitions` and other partition similarity scores
now support both pairs of label vectors `(x, y)` and confusion matrices
`(x=C, y=None)`.

* [Python and R] New parameter to `pair_sets_index`: `clipped`.

* [R] In `normalizing_permutation` and external cluster validity measures,
the input matrix can now be of the type `double`.
* In `normalizing_permutation` and external cluster validity measures,
the input matrices can now be of the type `double`.

* [BUGFIX] [Python] #80: fixed adjustment for `nmslib_n_neighbors`
* [BUGFIX] [Python] #80: Fixed adjustment for `nmslib_n_neighbors`
in small samples.

* [BUGFIX] [Python] #82: `cluster_validity` submodule not imported.
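Reviewer note: the news entry above says that partition similarity scores now accept either a pair of label vectors `(x, y)` or a confusion matrix `(x=C, y=None)`. The following is only a minimal pure-Python sketch of that calling convention, not the code in genieclust. The names `confusion_matrix_sketch`, `as_confusion_matrix` and `rand_score_sketch` are hypothetical and were chosen for illustration; the Rand score is used because it can be computed from the confusion matrix alone.

```python
from collections import Counter


def confusion_matrix_sketch(x, y):
    """Build a K x L contingency table (list of lists) from two label vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rows = sorted(set(x))
    cols = sorted(set(y))
    counts = Counter(zip(x, y))
    return [[counts[(r, c)] for c in cols] for r in rows]


def as_confusion_matrix(x, y=None):
    """Hypothetical helper: accept (x, y) label vectors or (x=C, y=None)."""
    if y is None:
        # x is assumed to already be a confusion matrix
        return [list(row) for row in x]
    return confusion_matrix_sketch(x, y)


def rand_score_sketch(x, y=None):
    """Rand score computed from the confusion matrix only (needs n >= 2)."""
    C = as_confusion_matrix(x, y)
    n = sum(map(sum, C))

    def pairs(m):
        return m * (m - 1) / 2

    same_both = sum(pairs(c) for row in C for c in row)   # together in x and y
    same_x = sum(pairs(sum(row)) for row in C)            # together in x
    same_y = sum(pairs(sum(col)) for col in zip(*C))      # together in y
    total = pairs(n)
    # agreements = pairs together in both + pairs apart in both
    return (total + 2 * same_both - same_x - same_y) / total


x = [0, 0, 1, 1, 2, 2]
y = [0, 0, 1, 1, 1, 1]
# Both calling conventions give the same value.
assert rand_score_sketch(x, y) == rand_score_sketch(confusion_matrix_sketch(x, y))
```

Funnelling both input forms through one conversion step is one way to keep every score function free of duplicated logic, which is presumably why a separate `compare_partitions2` is no longer needed.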
4 changes: 2 additions & 2 deletions .devel/sphinx/rapi/compare_partitions.md
@@ -40,7 +40,7 @@ normalizing_permutation(x, y = NULL)
|--------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `x` | an integer vector of length n (or an object coercible to) representing a K-partition of an n-set (e.g., a reference partition), or a confusion matrix with K rows and L columns (see [`table(x, y)`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/table.html)) |
| `y` | an integer vector of length n (or an object coercible to) representing an L-partition of the same set (e.g., the output of a clustering algorithm we wish to compare with `x`), or NULL (if x is an K\*L confusion matrix) |
| `simplified` | whether to assume E=1 in the definition of the pair sets index index, i.e., use Eq. (20) instead of (18); see (Rezaei, Franti, 2016). |
| `simplified` | whether to assume E=1 in the definition of the pair sets index index, i.e., use Eq. (20) in (Rezaei, Franti, 2016) instead of Eq. (18) |
| `clipped` | whether the result should be clipped to the unit interval, i.e., \[0, 1\] |

## Details
@@ -53,7 +53,7 @@ Each index except `mi_score()` (which computes the mutual information score) out

`normalized_pivoted_accuracy()` is defined as $(Accuracy(C_\sigma)-1/max(K,L))/(1-1/max(K,L))$, where $C_\sigma$ is a version of the confusion matrix for given `x` and `y` with columns permuted based on the solution to the maximal linear sum assignment problem. The $Accuracy(C_\sigma)$ part is sometimes referred to as set-matching classification rate or pivoted accuracy.

`pair_sets_index()` gives the pair sets index (PSI) (Rezaei, Franti, 2016). Pairing is based on the solution to the linear sum assignment problem of a transformed version of the confusion matrix. Its simplified version assumes E=1 in the definition of the index, i.e., uses Eq. (20) instead of (18).
`pair_sets_index()` gives the pair sets index (PSI) (Rezaei, Franti, 2016). Pairing is based on the solution to the linear sum assignment problem of a transformed version of the confusion matrix. For non-square matrices, missing rows/columns are assumed to be filled with 0s. The simplified PSI assumes E=1 in the definition of the index, i.e., uses Eq. (20) in the said paper instead of Eq. (18).

`rand_score()` gives the Rand score (the \"probability\" of agreement between the two partitions) and `adjusted_rand_score()` is its version corrected for chance, see (Hubert, Arabie, 1985), its expected value is 0.0 given two independent partitions. Due to the adjustment, the resulting index might also be negative for some inputs.
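Reviewer note: the `normalized_pivoted_accuracy()` formula quoted in this hunk, $(Accuracy(C_\sigma)-1/max(K,L))/(1-1/max(K,L))$, can be checked by hand on small inputs. The sketch below is an illustration under stated assumptions, not the package's implementation: it finds the best column permutation by brute force instead of a linear sum assignment solver, and it pads non-square matrices with zeros (an assumption borrowed from the wording used for `pair_sets_index()`). The function name `normalized_pivoted_accuracy_sketch` is hypothetical.

```python
from itertools import permutations


def normalized_pivoted_accuracy_sketch(C):
    """Brute-force version of (Accuracy(C_sigma) - 1/m) / (1 - 1/m), m = max(K, L).

    Only suitable for small matrices: it tries all m! column permutations.
    Requires m >= 2 and a non-empty confusion matrix.
    """
    K, L = len(C), len(C[0])
    m = max(K, L)
    # Assumption: pad to an m x m matrix with zeros.
    P = [list(row) + [0] * (m - L) for row in C]
    P += [[0] * m for _ in range(m - K)]
    n = sum(map(sum, P))
    # Maximal sum of matched entries over all one-to-one row/column pairings.
    best = max(
        sum(P[i][sigma[i]] for i in range(m))
        for sigma in permutations(range(m))
    )
    accuracy = best / n  # the "pivoted accuracy" part
    return (accuracy - 1 / m) / (1 - 1 / m)


# Three reference clusters of size 2; the prediction merges the last two.
C = [[2, 0],
     [0, 2],
     [0, 2]]
print(normalized_pivoted_accuracy_sketch(C))
```

For this matrix, the best pairing matches 4 of the 6 points, so the pivoted accuracy is 2/3 and the normalised value is (2/3 - 1/3)/(1 - 1/3) = 0.5.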

2 changes: 1 addition & 1 deletion .devel/sphinx/weave/timings_g2mg.py
@@ -133,7 +133,7 @@ def register_result(
labels_true):
#########################################################
partsims = [
genieclust.compare_partitions.compare_partitions2(labels_pred, l)
genieclust.compare_partitions.compare_partitions(labels_pred, l)
for l in labels_true
]
partsims = {
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: genieclust
Type: Package
Title: Fast and Robust Hierarchical Clustering with Noise Points Detection
Version: 1.1.4.9003
Date: 2023-09-19
Version: 1.1.4.9004
Date: 2023-09-22
Authors@R: c(
person("Marek", "Gagolewski",
role = c("aut", "cre", "cph"),
11 changes: 8 additions & 3 deletions NEWS
@@ -9,12 +9,17 @@
`adjusted_asymmetric_accuracy` -> `normalized_clustering_accuracy`,
`normalized_accuracy` -> `normalized_pivoted_accuracy`.

* [BACKWARD INCOMPATIBILITY] [Python] `compare_partitions2` has been removed,
as `compare_partitions` and other partition similarity scores
now support both pairs of label vectors `(x, y)` and confusion matrices
`(x=C, y=None)`.

* [Python and R] New parameter to `pair_sets_index`: `clipped`.

* [R] In `normalizing_permutation` and external cluster validity measures,
the input matrix can now be of the type `double`.
* In `normalizing_permutation` and external cluster validity measures,
the input matrices can now be of the type `double`.

* [BUGFIX] [Python] #80: fixed adjustment for `nmslib_n_neighbors`
* [BUGFIX] [Python] #80: Fixed adjustment for `nmslib_n_neighbors`
in small samples.

* [BUGFIX] [Python] #82: `cluster_validity` submodule not imported.
7 changes: 4 additions & 3 deletions R/RcppExports.R
@@ -51,8 +51,9 @@
#' (Rezaei, Franti, 2016).
#' Pairing is based on the solution to the linear sum assignment problem
#' of a transformed version of the confusion matrix.
#' Its simplified version assumes E=1 in the definition of the index,
#' i.e., uses Eq. (20) instead of (18).
#' For non-square matrices, missing rows/columns are assumed to be filled with 0s.
#' The simplified PSI assumes E=1 in the definition of the index,
#' i.e., uses Eq. (20) in the said paper instead of Eq. (18).
#'
#' \code{rand_score()} gives the Rand score (the "probability" of agreement
#' between the two partitions) and
@@ -129,7 +130,7 @@
#' or NULL (if x is an K*L confusion matrix)
#'
#' @param simplified whether to assume E=1 in the definition of the pair sets index index,
#' i.e., use Eq. (20) instead of (18); see (Rezaei, Franti, 2016).
#' i.e., use Eq. (20) in (Rezaei, Franti, 2016) instead of Eq. (18)
#'
#' @param clipped whether the result should be clipped to the unit interval, i.e., [0, 1]
#'
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
VERSION: '1.1.4.9003',
VERSION: '1.1.4.9004',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
7 changes: 3 additions & 4 deletions docs/genieclust.html
@@ -165,7 +165,7 @@
<a class="sidebar-brand" href="index.html">genieclust</a>
</span>
<div class="sidebar-brand">
1.1.4.9003
1.1.4.9004
</div>
<form class="sidebar-search-container" method="get" action="search.html" role="search">
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
@@ -328,7 +328,6 @@ <h1>Python Package <cite>genieclust</cite> Reference<a class="headerlink" href="
<li class="toctree-l2"><a class="reference internal" href="genieclust_compare_partitions.html#genieclust.compare_partitions.adjusted_mi_score"><code class="docutils literal notranslate"><span class="pre">adjusted_mi_score()</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="genieclust_compare_partitions.html#genieclust.compare_partitions.adjusted_rand_score"><code class="docutils literal notranslate"><span class="pre">adjusted_rand_score()</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="genieclust_compare_partitions.html#genieclust.compare_partitions.compare_partitions"><code class="docutils literal notranslate"><span class="pre">compare_partitions()</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="genieclust_compare_partitions.html#genieclust.compare_partitions.compare_partitions2"><code class="docutils literal notranslate"><span class="pre">compare_partitions2()</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="genieclust_compare_partitions.html#genieclust.compare_partitions.confusion_matrix"><code class="docutils literal notranslate"><span class="pre">confusion_matrix()</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="genieclust_compare_partitions.html#genieclust.compare_partitions.fm_score"><code class="docutils literal notranslate"><span class="pre">fm_score()</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="genieclust_compare_partitions.html#genieclust.compare_partitions.mi_score"><code class="docutils literal notranslate"><span class="pre">mi_score()</span></code></a></li>
@@ -403,7 +402,7 @@ <h1>Python Package <cite>genieclust</cite> Reference<a class="headerlink" href="
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-09-19T17:30:17+1000.
Last updated on 2023-09-22T11:29:44+1000.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
@@ -418,7 +417,7 @@ <h1>Python Package <cite>genieclust</cite> Reference<a class="headerlink" href="

</aside>
</div>
</div><script src="_static/documentation_options.js?v=b3339ebc"></script>
</div><script src="_static/documentation_options.js?v=78ba0e71"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/scripts/furo.js?v=32e29ea5"></script>
6 changes: 3 additions & 3 deletions docs/genieclust_cluster_validity.html
@@ -165,7 +165,7 @@
<a class="sidebar-brand" href="index.html">genieclust</a>
</span>
<div class="sidebar-brand">
1.1.4.9003
1.1.4.9004
</div>
<form class="sidebar-search-container" method="get" action="search.html" role="search">
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
@@ -968,7 +968,7 @@
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-09-19T17:30:17+1000.
Last updated on 2023-09-22T11:29:44+1000.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
@@ -1010,7 +1010,7 @@

</aside>
</div>
</div><script src="_static/documentation_options.js?v=b3339ebc"></script>
</div><script src="_static/documentation_options.js?v=78ba0e71"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/scripts/furo.js?v=32e29ea5"></script>