gagolews committed Oct 26, 2023
1 parent 26e55e2 commit de99caf
Showing 42 changed files with 69 additions and 65 deletions.
2 changes: 2 additions & 0 deletions .devel/pypi_howto.md
@@ -30,4 +30,6 @@ https://pypi.org/help/#apitoken

```
twine upload dist/*
username: __token__
password: pypi-----generate-new-token-via-pypi
```
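
Note on the hunk above: the two added lines record what to type at twine's interactive prompt. A minimal sketch of a non-interactive equivalent follows, assuming twine's documented `TWINE_USERNAME` and `TWINE_PASSWORD` environment variables and its `--non-interactive` flag. The helper name `twine_upload_command` and the `pypi-` prefix check are illustrative assumptions, not part of this repository.

```python
import glob
import os


def twine_upload_command(token, dist_glob="dist/*"):
    """Build (but do not run) a non-interactive ``twine upload`` call.

    Hypothetical helper: returns the argument list and an environment
    mapping suitable for ``subprocess.run(cmd, env=env, check=True)``.
    """
    # Assumption: PyPI API tokens carry a "pypi-" prefix, as the
    # placeholder in the how-to suggests.
    if not token.startswith("pypi-"):
        raise ValueError("expected a PyPI API token starting with 'pypi-'")

    env = dict(os.environ)
    env["TWINE_USERNAME"] = "__token__"  # literal user name for token auth
    env["TWINE_PASSWORD"] = token        # the API token itself

    # Expand the pattern here, since no shell is involved.
    files = sorted(glob.glob(dist_glob))
    cmd = ["twine", "upload", "--non-interactive", *files]
    return cmd, env
```

The function only assembles the call, so nothing is uploaded unless the caller passes the result to `subprocess.run`. If `dist/` is empty, the file list is empty as well and twine would have nothing to upload.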
10 changes: 6 additions & 4 deletions .devel/sphinx/bibliography.bib
@@ -34,7 +34,7 @@ @article{clustering-benchmarks
volume = {20},
pages = {101270},
doi = {10.1016/j.softx.2022.101270},
url = {https://clustering-benchmarks.gagolewski.com},
url = {https://clustering-benchmarks.gagolewski.com/},
}

@book{datawranglingpy,
@@ -66,7 +66,8 @@ @article{genieclust
year = {2021},
doi = {10.1016/j.softx.2021.100722},
volume = {15},
pages = {100722}
pages = {100722},
url = {https://genieclust.gagolewski.com/}
}

@article{genieins,
@@ -76,7 +77,8 @@ @article{genieins
year = {2016},
volume = {363},
doi = {10.1016/j.ins.2016.05.003},
pages = {8--23}
pages = {8--23},
url = {https://arxiv.org/pdf/2209.05757}
}

@article{genieowa,
@@ -96,7 +98,7 @@ @article{cvi
year = {2021},
pages = {620--636},
volume = {581},
doi = {10.1016/j.ins.2021.10.004},
url = {https://arxiv.org/pdf/2208.01261}
}

@phdthesis{cenaphd,
2 changes: 1 addition & 1 deletion .devel/sphinx/rapi/gclust.md
@@ -146,5 +146,5 @@ adjusted_rand_score(y_test, y_pred)
pair_sets_index(y_test, y_pred)
## [1] 0.9049708
# Fast for low-dimensional Euclidean spaces:
h <- gclust(emst_mlpack(X))
# h <- gclust(emst_mlpack(X))
```
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: genieclust
Type: Package
Title: Fast and Robust Hierarchical Clustering with Noise Points Detection
Version: 1.1.5
Date: 2023-10-18
Version: 1.1.5-2
Date: 2023-10-26
Authors@R: c(
person("Marek", "Gagolewski",
role = c("aut", "cre", "cph"),
2 changes: 1 addition & 1 deletion R/gclust.R
@@ -170,7 +170,7 @@
#' pair_sets_index(y_test, y_pred)
#'
#' # Fast for low-dimensional Euclidean spaces:
#' h <- gclust(emst_mlpack(X))
#' # h <- gclust(emst_mlpack(X))
#'
#' @rdname gclust
#' @export
Binary files not shown (4 files).
2 changes: 1 addition & 1 deletion docs/genieclust.html
@@ -402,7 +402,7 @@ <h1>Python Package <cite>genieclust</cite> Reference<a class="headerlink" href="
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/genieclust_cluster_validity.html
@@ -968,7 +968,7 @@
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/genieclust_compare_partitions.html
@@ -1038,7 +1038,7 @@
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/genieclust_genie.html
@@ -710,7 +710,7 @@ <h1>genieclust.Genie<a class="headerlink" href="#genieclust-genie" title="Link t
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/genieclust_gic.html
@@ -492,7 +492,7 @@ <h1>genieclust.GIc<a class="headerlink" href="#genieclust-gic" title="Link to th
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/genieclust_inequality.html
@@ -537,7 +537,7 @@
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/genieclust_internal.html
@@ -621,7 +621,7 @@
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/genieclust_plots.html
@@ -440,7 +440,7 @@
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/genieclust_tools.html
@@ -462,7 +462,7 @@
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/genindex.html
@@ -597,7 +597,7 @@ <h2>W</h2>
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
18 changes: 9 additions & 9 deletions docs/index.html
@@ -274,8 +274,8 @@ <h1><em>genieclust</em>: Fast and Robust Hierarchical Clustering with Noise Poin
<div><p><strong>Genie finds meaningful clusters quickly – even on large data sets.</strong></p>
</div></blockquote>
<img alt="Genie" class="img-right-align-always" src="_images/genie_toy_example.png" style="width: 128px;" />
<p>The <em>genieclust</em> package <span id="id1">[<a class="reference internal" href="z_bibliography.html#id10" title="Gagolewski, M. (2021). genieclust: Fast and robust hierarchical clustering. SoftwareX, 15:100722. DOI: 10.1016/j.softx.2021.100722.">11</a>]</span> for Python and R implements
a robust and outlier resistant clustering algorithm called <em>Genie</em> <span id="id2">[<a class="reference internal" href="z_bibliography.html#id11" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23. DOI: 10.1016/j.ins.2016.05.003.">16</a>]</span>.</p>
<p>The <em>genieclust</em> package <span id="id1">[<a class="reference internal" href="z_bibliography.html#id10" title="Gagolewski, M. (2021). genieclust: Fast and robust hierarchical clustering. SoftwareX, 15:100722. URL: https://genieclust.gagolewski.com/, DOI: 10.1016/j.softx.2021.100722.">11</a>]</span> for Python and R implements
a robust and outlier resistant clustering algorithm called <em>Genie</em> <span id="id2">[<a class="reference internal" href="z_bibliography.html#id11" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23. URL: https://arxiv.org/pdf/2209.05757, DOI: 10.1016/j.ins.2016.05.003.">16</a>]</span>.</p>
<p>The idea behind <em>Genie</em> is beautifully simple. First, make each individual
point the sole member of its own cluster. Then, keep merging pairs
of the closest clusters, one after another. However, to <strong>prevent
@@ -354,7 +354,7 @@ <h2>Package Features<a class="headerlink" href="#package-features" title="Link t
<ul class="simple">
<li><p><em>Genie++</em> – a reimplementation of the original Genie algorithm
from the R package <a class="reference external" href="https://cran.r-project.org/web/packages/genie"><em>genie</em></a>
<span id="id9">[<a class="reference internal" href="z_bibliography.html#id11" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23. DOI: 10.1016/j.ins.2016.05.003.">16</a>]</span>: much faster than the original one;
<span id="id9">[<a class="reference internal" href="z_bibliography.html#id11" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23. URL: https://arxiv.org/pdf/2209.05757, DOI: 10.1016/j.ins.2016.05.003.">16</a>]</span>: much faster than the original one;
supports approximate disconnected MSTs;</p></li>
<li><p><em>Genie+HDBSCAN*</em> – a robustified (Geniefied) retake on the <em>HDBSCAN*</em>
<span id="id10">[<a class="reference internal" href="z_bibliography.html#id15" title="Campello, R.J.G.B., Moulavi, D., and Sander, J. (2013). Density-based clustering based on hierarchical density estimates. Lecture Notes in Computer Science, 7819:160–172. DOI: 10.1007/978-3-642-37456-2_14.">2</a>]</span> method that detects noise points in data and
@@ -368,13 +368,13 @@ <h2>Package Features<a class="headerlink" href="#package-features" title="Link t
<ul class="simple">
<li><p>inequality measures: the normalised Gini, Bonferroni,
and De Vergottini indices;</p></li>
<li><p>external cluster validity measures (see <span id="id14">[<a class="reference internal" href="z_bibliography.html#id7" title="Gagolewski, M. (2022). A framework for benchmarking clustering algorithms. SoftwareX, 20:101270. URL: https://clustering-benchmarks.gagolewski.com, DOI: 10.1016/j.softx.2022.101270.">12</a>, <a class="reference internal" href="z_bibliography.html#id5" title="Gagolewski, M. (2023). Normalised clustering accuracy: An asymmetric external cluster validity measure. under review (preprint). URL: https://arxiv.org/pdf/2209.02935.pdf, DOI: 10.48550/arXiv.2209.02935.">15</a>]</span>
<li><p>external cluster validity measures (see <span id="id14">[<a class="reference internal" href="z_bibliography.html#id7" title="Gagolewski, M. (2022). A framework for benchmarking clustering algorithms. SoftwareX, 20:101270. URL: https://clustering-benchmarks.gagolewski.com/, DOI: 10.1016/j.softx.2022.101270.">12</a>, <a class="reference internal" href="z_bibliography.html#id5" title="Gagolewski, M. (2023). Normalised clustering accuracy: An asymmetric external cluster validity measure. under review (preprint). URL: https://arxiv.org/pdf/2209.02935.pdf, DOI: 10.48550/arXiv.2209.02935.">15</a>]</span>
for discussion):
normalised clustering accuracy (NCA) and partition similarity scores such as
normalised pivoted accuracy (NPA), pair sets index (PSI) <span id="id15">[<a class="reference internal" href="z_bibliography.html#id27" title="Rezaei, M. and Fränti, P. (2016). Set matching measures for external cluster validity. IEEE Transactions on Knowledge and Data Engineering, 28(8):2173–2186. DOI: 10.1109/TKDE.2016.2551240.">32</a>]</span>,
adjusted/unadjusted Rand, adjusted/unadjusted Fowlkes–Mallows (FM),
adjusted/normalised/unadjusted mutual information (MI) indices;</p></li>
<li><p>internal cluster validity measures (see <span id="id16">[<a class="reference internal" href="z_bibliography.html#id13" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2021). Are cluster validity measures (in)valid? Information Sciences, 581:620–636. DOI: 10.1016/j.ins.2021.10.004.">17</a>]</span> for discussion):
<li><p>internal cluster validity measures (see <span id="id16">[<a class="reference internal" href="z_bibliography.html#id13" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2021). Are cluster validity measures (in)valid? Information Sciences, 581:620–636. URL: https://arxiv.org/pdf/2208.01261.">17</a>]</span> for discussion):
the Caliński–Harabasz, Silhouette, Ball–Hall, Davies–Bouldin,
generalised Dunn indices, etc.;</p></li>
<li><p><em>(Python only)</em> union-find (disjoint sets) data structures (with
@@ -394,12 +394,12 @@ <h2>Contributing<a class="headerlink" href="#contributing" title="Link to this h
<p>Contributors:
<a class="reference external" href="http://bartoszuk.rexamine.com">Maciej Bartoszuk</a> and
<a class="reference external" href="https://cena.rexamine.com">Anna Cena</a>
(<em>genieclust</em>’s predecessor <a class="reference external" href="https://cran.r-project.org/web/packages/genie"><em>genie</em></a> <span id="id17">[<a class="reference internal" href="z_bibliography.html#id11" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23. DOI: 10.1016/j.ins.2016.05.003.">16</a>]</span>
and some internal cluster validity measures <a class="reference external" href="https://github.com/gagolews/optim_cvi"><em>CVI</em></a> <span id="id18">[<a class="reference internal" href="z_bibliography.html#id13" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2021). Are cluster validity measures (in)valid? Information Sciences, 581:620–636. DOI: 10.1016/j.ins.2021.10.004.">17</a>]</span>);
(<em>genieclust</em>’s predecessor <a class="reference external" href="https://cran.r-project.org/web/packages/genie"><em>genie</em></a> <span id="id17">[<a class="reference internal" href="z_bibliography.html#id11" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23. URL: https://arxiv.org/pdf/2209.05757, DOI: 10.1016/j.ins.2016.05.003.">16</a>]</span>
and some internal cluster validity measures <a class="reference external" href="https://github.com/gagolews/optim_cvi"><em>CVI</em></a> <span id="id18">[<a class="reference internal" href="z_bibliography.html#id13" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2021). Are cluster validity measures (in)valid? Information Sciences, 581:620–636. URL: https://arxiv.org/pdf/2208.01261.">17</a>]</span>);
<a class="reference external" href="https://github.com/pmla/">Peter M. Larsen</a>
(an <a class="reference external" href="https://github.com/scipy/scipy/blob/main/scipy/optimize/rectangular_lsap/rectangular_lsap.cpp">implementation</a>
of the shortest augmenting path algorithm for the rectangular assignment problem
which we use for computing some external cluster validity measures <span id="id19">[<a class="reference internal" href="z_bibliography.html#id11" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23. DOI: 10.1016/j.ins.2016.05.003.">16</a>, <a class="reference internal" href="z_bibliography.html#id27" title="Rezaei, M. and Fränti, P. (2016). Set matching measures for external cluster validity. IEEE Transactions on Knowledge and Data Engineering, 28(8):2173–2186. DOI: 10.1109/TKDE.2016.2551240.">32</a>]</span>).</p>
which we use for computing some external cluster validity measures <span id="id19">[<a class="reference internal" href="z_bibliography.html#id11" title="Gagolewski, M., Bartoszuk, M., and Cena, A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23. URL: https://arxiv.org/pdf/2209.05757, DOI: 10.1016/j.ins.2016.05.003.">16</a>, <a class="reference internal" href="z_bibliography.html#id27" title="Rezaei, M. and Fränti, P. (2016). Set matching measures for external cluster validity. IEEE Transactions on Knowledge and Data Engineering, 28(8):2173–2186. DOI: 10.1109/TKDE.2016.2551240.">32</a>]</span>).</p>
<div class="toctree-wrapper compound">
</div>
<div class="toctree-wrapper compound">
@@ -501,7 +501,7 @@ <h2>Contributing<a class="headerlink" href="#contributing" title="Link to this h
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/news.html
@@ -469,7 +469,7 @@ <h2>0.1a2 (2018-05-23)<a class="headerlink" href="#a2-2018-05-23" title="Link to
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>
2 changes: 1 addition & 1 deletion docs/py-modindex.html
@@ -331,7 +331,7 @@ <h1>Python Module Index</h1>
Some rights reserved. Licensed under <a href='https://creativecommons.org/licenses/by-nc-nd/4.0/'>CC BY-NC-ND 4.0</a>.
Built with <a href="https://sphinx-doc.org/">Sphinx</a>
and a customised <a href="https://github.com/pradyunsg/furo">Furo</a> theme.
Last updated on 2023-10-18T18:07:50+1100.
Last updated on 2023-10-26T14:00:36+1100.
This site will never display any ads: it is a non-profit project.
It does not collect any data.
</div>