diff --git a/DESCRIPTION b/DESCRIPTION index 089e68d..65216d9 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: dbscan -Version: 1.1-4.1 -Date: 2019-xx-xx +Version: 1.1-5 +Date: 2019-10-22 Title: Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms Authors@R: c(person("Michael", "Hahsler", role = c("aut", "cre", "cph"), @@ -14,7 +14,8 @@ Description: A fast reimplementation of several density-based algorithms of the clustering structure) clustering algorithms HDBSCAN (hierarchical DBSCAN) and the LOF (local outlier factor) algorithm. The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN - and fixed-radius NN search is also provided. + and fixed-radius NN search is also provided. + See Hahsler M, Piekenbrock M and Doran D (2019) <doi:10.18637/jss.v091.i01>. Imports: Rcpp (>= 1.0.0), graphics, diff --git a/NEWS.md b/NEWS.md index 970851a..d3bfaa7 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,4 +1,4 @@ -# dbscan 1.1-4.1 (2019-xx-xx) +# dbscan 1.1-5 (2019-10-22) ## New Features * kNN and frNN gained parameter query to query neighbors for points not in the data. 
diff --git a/inst/CITATION b/inst/CITATION new file mode 100644 index 0000000..6cf6331 --- /dev/null +++ b/inst/CITATION @@ -0,0 +1,19 @@ +bibentry(bibtype = "Article", + title = "{dbscan}: Fast Density-Based Clustering with {R}", + author = c(person(given = "Michael", + family = "Hahsler", + email = "mhahsler@lyle.smu.edu"), + person(given = "Matthew", + family = "Piekenbrock"), + person(given = "Derek", + family = "Doran", + email = "derek.doran@wright.edu")), + journal = "Journal of Statistical Software", + year = "2019", + volume = "91", + number = "1", + pages = "1--30", + doi = "10.18637/jss.v091.i01", + header = "To cite dbscan in publications use:" +) + diff --git a/man/dbscan.Rd b/man/dbscan.Rd index 4ab0777..a777acf 100644 --- a/man/dbscan.Rd +++ b/man/dbscan.Rd @@ -42,7 +42,7 @@ dbscan(x, eps, minPts = 5, weights = NULL, borderPoints = TRUE, ...) \emph{Note:} use \code{dbscan::dbscan} to call this implementation when you also use package \pkg{fpc}. -This implementation of DBSCAN implements the original algorithm as described by +This implementation of DBSCAN (Hahsler et al, 2019) implements the original algorithm as described by Ester et al (1996). DBSCAN estimates the density around each data point by counting the number of points in a user-specified eps-neighborhood and applies a used-specified minPts thresholds to identify core, border and noise points. In a second step, core points are joined into a cluster if they are density-reachable (i.e., there is a chain of core points where one falls inside the eps-neighborhood of the next). Finally, border points are assigned to clusters. The algorithm only needs parameters \code{eps} and \code{minPts}. @@ -72,7 +72,10 @@ cluster will be reported as members of the noise cluster 0. \item{cluster }{A integer vector with cluster assignments. Zero indicates noise points.} } \references{ -Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). 
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. \emph{Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).} +Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. + \emph{Journal of Statistical Software}, 91(1), 1-30. \doi{10.18637/jss.v091.i01} + +Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. \emph{Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96)}, 226-231. Campello, R. J. G. B.; Moulavi, D.; Sander, J. (2013). Density-Based Clustering Based on Hierarchical Density Estimates. \emph{Proceedings of the 17th diff --git a/man/hdbscan.Rd b/man/hdbscan.Rd index 88373f7..f3a7546 100644 --- a/man/hdbscan.Rd +++ b/man/hdbscan.Rd @@ -35,10 +35,10 @@ hdbscan(x, minPts, xdist = NULL, } \details{ -Computes the hierarchical cluster tree representing density estimates along with the stability-based flat cluster extraction +This fast implementation of HDBSCAN (Hahsler et al, 2019) computes the hierarchical cluster tree representing density estimates along with the stability-based flat cluster extraction proposed by Campello et al. (2013). HDBSCAN essentially computes the hierarchy of all DBSCAN* clusterings, and then uses a stability-based extraction method to find optimal cuts in the hierarchy, thus producing a flat solution. -Additional, related algorithms including the "Global-Local Outlier Score from Hierarchies" (GLOSH) (see section 6 of Campello et al. 2015) outlier scores and ability to cluster based on instance-level constraints (see section 5.3 of Campello et al. 2015) are supported. The algorithms only need the parameter \code{minPts}. 
+Additional, related algorithms including the "Global-Local Outlier Score from Hierarchies" (GLOSH) (see section 6 of Campello et al., 2015) outlier scores and ability to cluster based on instance-level constraints (see section 5.3 of Campello et al. 2015) are supported. The algorithms only need the parameter \code{minPts}. Note that \code{minPts} not only acts as a minimum cluster size to detect, but also as a "smoothing" factor of the density estimates implicitly computed from HDBSCAN. } @@ -54,11 +54,12 @@ Note that \code{minPts} not only acts as a minimum cluster size to detect, but a %% ... } \references{ -Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. \emph{Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).} +Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. + \emph{Journal of Statistical Software}, 91(1), 1-30. \doi{10.18637/jss.v091.i01} -Campello, R. J. G. B.; Moulavi, D.; Sander, J. (2013). Density-Based Clustering Based on Hierarchical Density Estimates. \emph{Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery in Databases, PAKDD 2013,} Lecture Notes in Computer Science 7819, p. 160. +Campello RJGB, Moulavi D, Sander J (2013). Density-Based Clustering Based on Hierarchical Density Estimates. \emph{Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery in Databases, PAKDD 2013,} Lecture Notes in Computer Science 7819, p. 160. -Campello, Ricardo JGB, et al. "Hierarchical density estimates for data clustering, visualization, and outlier detection." ACM Transactions on Knowledge Discovery from Data (TKDD) 10.1 (2015): 5. +Campello RJGB, Moulavi D, Zimek A, Sander J (2015). Hierarchical density estimates for data clustering, visualization, and outlier detection. 
\emph{ACM Transactions on Knowledge Discovery from Data (TKDD),} 10(5):1-51. } \seealso{ diff --git a/man/optics.Rd b/man/optics.Rd index 2beffc8..34e4da7 100644 --- a/man/optics.Rd +++ b/man/optics.Rd @@ -45,7 +45,7 @@ extractXi(object, xi, minimum = FALSE, correctPredecessors = TRUE) details on how to control the search strategy.} } \details{ -This implementation of OPTICS implements the original algorithm as described by +This implementation of OPTICS (Hahsler et al, 2019) implements the original algorithm as described by Ankerst et al (1999). OPTICS is an ordering algorithm using similar concepts to DBSCAN. However, for OPTICS \code{eps} is only an upper limit for the neighborhood size used to reduce @@ -101,9 +101,12 @@ See \code{\link{frNN}} for more information on the parameters related to nearest \item{clusters_xi }{ data.frame containing the start and end of each cluster found in the OPTICS ordering. } } \references{ -Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure. ACM SIGMOD international conference on Management of data. ACM Press. pp. 49--60. +Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. + \emph{Journal of Statistical Software}, 91(1), 1-30. \doi{10.18637/jss.v091.i01} -Erich Schubert, Michael Gertz (2018). Improving the Cluster Structure Extracted from OPTICS Plots. Lernen, Wissen, Daten, Analysen (LWDA 2018). pp. 318--329. +Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure. ACM SIGMOD international conference on Management of data. ACM Press. pp. 49-60. + +Erich Schubert, Michael Gertz (2018). Improving the Cluster Structure Extracted from OPTICS Plots. Lernen, Wissen, Daten, Analysen (LWDA 2018). pp. 318-329. 
} \author{ diff --git a/vignettes/dbscan.Rnw b/vignettes/dbscan.Rnw index 8c6623b..5a11797 100644 --- a/vignettes/dbscan.Rnw +++ b/vignettes/dbscan.Rnw @@ -127,6 +127,13 @@ This article presents an overview of the \proglang{R} package~\pkg{dbscan} focusing on DBSCAN and OPTICS, outlining its operation and experimentally compares its performance with implementations in other open-source implementations. We first review the concept of density-based clustering and present the DBSCAN and OPTICS algorithms in Section~\ref{sec:dbc}. This section concludes with a short review of existing software packages that implement these algorithms. Details about \pkg{dbscan}, with examples of its use, are presented in Section~\ref{sec:dbscan}. A performance evaluation is presented in Section~\ref{sec:eval}. Concluding remarks are offered in Section~\ref{sec:conc}. +A version of this article describing the package \pkg{dbscan} was published as \cite{hahsler2019dbscan} and should be cited. + +<>= +options(useFancyQuotes = FALSE) +citation("dbscan") +@ + \section{Density-based clustering}\label{sec:dbc} Density-based clustering is now a well-studied field. Conceptually, the idea behind density-based clustering is simple: given a set of data points, define a structure that accurately reflects the underlying density~\citep{sander2011density}. An important distinction between density-based clustering and alternative approaches to cluster analysis, such as the use of \emph{(Gaussian) mixture models}~\citep[see][]{jain1999review}, is that the latter represents a \emph{parametric} approach in which the observed data are assumed to have been produced by mixture of either Gaussian or other parametric families of distributions. While certainly useful in many applications, parametric approaches naturally assume clusters will exhibit some type convex (generally hyper-spherical or hyper-elliptical) shape. 
Other approaches, such as $k$-means clustering (where the $k$ parameter signifies the user-specified number of clusters to find), share this common theme of `minimum variance', where the underlying assumption is made that ideal clusters are found by minimizing some measure of intra-cluster variance (often referred to as cluster cohesion) and maximizing the inter-cluster variance (cluster separation)~\citep{arbelaitz2013extensive}. Conversely, the label density-based clustering is used for methods which do not assume parametric distributions, are capable of finding arbitrarily-shaped clusters, handle varying amounts of noise, and require no prior knowledge regarding how to set the number of clusters $k$. This methodology is best expressed in the DBSCAN algorithm, which we discuss next. diff --git a/vignettes/dbscan.bib b/vignettes/dbscan.bib index 99a9254..27abb2f 100644 --- a/vignettes/dbscan.bib +++ b/vignettes/dbscan.bib @@ -1,904 +1,916 @@ -@inproceedings{ester1996density, - title={A density-based algorithm for discovering clusters in large spatial databases with noise.}, - author={Ester, Martin and Kriegel, Hans-Peter and Sander, J{\"o}rg and Xu, Xiaowei and others}, - booktitle={Kdd}, - volume={96}, - number={34}, - pages={226--231}, - year={1996} -} - -@Manual{dbscan-R, - title = {dbscan: Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms}, - author = {Michael Hahsler and Matthew Piekenbrock}, - note = {R package version 0.9-8.2}, - year={2016} -} -%% Original OPTICS paper -%% ----------------------------------------------------------------------------- -@inproceedings{ankerst1999optics, - title={OPTICS: ordering points to identify the clustering structure}, - author={Ankerst, Mihael and Breunig, Markus M and Kriegel, Hans-Peter and Sander, J{\"o}rg}, - booktitle={ACM Sigmod Record}, - volume={28}, - number={2}, - pages={49--60}, - year={1999}, - organization={ACM} -} - -% OPTICS cluster extraction improvements -% 
----------------------------------------------------------------------------- -@inproceedings{DBLP:conf/lwa/SchubertG18, - author = {Erich Schubert and - Michael Gertz}, - title = {Improving the Cluster Structure Extracted from {OPTICS} Plots}, - booktitle = {Lernen, Wissen, Daten, Analysen (LWDA 2018)}, - series = {{CEUR} Workshop Proceedings}, - volume = {2191}, - pages = {318--329}, - publisher = {CEUR-WS.org}, - year = {2018} -} - -% Original LOF paper -% ----------------------------------------------------------------------------- -@inproceedings{breunig2000lof, - title={LOF: identifying density-based local outliers}, - author={Breunig, Markus M and Kriegel, Hans-Peter and Ng, Raymond T and Sander, J{\"o}rg}, - booktitle={ACM sigmod record}, - volume={29}, - number={2}, - pages={93--104}, - year={2000}, - organization={ACM} -} - -% 2003 Reachability <--> Dendrograms Conversions Paper -% ----------------------------------------------------------------------------- -@inproceedings{sander2003automatic, - title={Automatic extraction of clusters from hierarchical clustering representations}, - author={Sander, J{\"o}rg and Qin, Xuejie and Lu, Zhiyong and Niu, Nan and Kovarsky, Alex}, - booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining}, - pages={75--87}, - year={2003}, - organization={Springer} -} - -% Original BIRCH paper -% ----------------------------------------------------------------------------- -@inproceedings{zhang96, - title={BIRCH: an efficient data clustering method for very large databases}, - author={Zhang, Tian and Ramakrishnan, Raghu and Livny, Miron}, - booktitle={ACM Sigmod Record}, - volume={25}, - number={2}, - pages={103--114}, - year={1996}, - organization={ACM} -} - -% GDBSCAN Paper (Generalized DBSCAN, by Sanders) -% ----------------------------------------------------------------------------- -@article{sander1998density, - title={Density-based clustering in spatial databases: The algorithm gdbscan and its 
applications}, - author={Sander, J{\"o}rg and Ester, Martin and Kriegel, Hans-Peter and Xu, Xiaowei}, - journal={Data mining and knowledge discovery}, - volume={2}, - number={2}, - pages={169--194}, - year={1998}, - publisher={Springer} -} - -% HDBSCAN* Newest Paper -% ----------------------------------------------------------------------------- -@article{campello2015hierarchical, - title={Hierarchical density estimates for data clustering, visualization, and outlier detection}, - author={Campello, Ricardo JGB and Moulavi, Davoud and Zimek, Arthur and Sander, Joerg}, - journal={ACM Transactions on Knowledge Discovery from Data (TKDD)}, - volume={10}, - number={1}, - pages={5}, - year={2015}, - publisher={ACM} -} - -% First HDBSCAN* introduction paper, later revised in 2015. The newer one is better. -% ----------------------------------------------------------------------------- -@inproceedings{campello2013density, - title={Density-based clustering based on hierarchical density estimates}, - author={Campello, Ricardo JGB and Moulavi, Davoud and Sander, J{\"o}rg}, - booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining}, - pages={160--172}, - year={2013}, - organization={Springer} -} - -% The new-ish 'Standard Methodology' paper of that 'tackles the methodological drawbacks' % of internal clustering validation -% ----------------------------------------------------------------------------- -@article{gurrutxaga2011towards, - title={Towards a standard methodology to evaluate internal cluster validity indices}, - author={Gurrutxaga, Ibai and Muguerza, Javier and Arbelaitz, Olatz and P{\'e}rez, Jes{\'u}s M and Mart{\'\i}n, Jos{\'e} I}, - journal={Pattern Recognition Letters}, - volume={32}, - number={3}, - pages={505--515}, - year={2011}, - publisher={Elsevier} -} - -% Original ABACUS - Workaround implementation of mixture modeling for finding -% arbitrary shapes -% ----------------------------------------------------------------------------- 
-@article{gegick2011abacus, - title={ABACUS: mining arbitrary shaped clusters from large datasets based on backbone identification}, - author={Gegick, M}, - year={2011}, - publisher={SIAM} -} - - -% Original Silhouette Index Paper -% ----------------------------------------------------------------------------- -@article{rousseeuw1987silhouettes, - title={Silhouettes: a graphical aid to the interpretation and validation of cluster analysis}, - author={Rousseeuw, Peter J}, - journal={Journal of computational and applied mathematics}, - volume={20}, - pages={53--65}, - year={1987}, - publisher={Elsevier} -} - -% Extensive Comparative Study of IVMS -% ----------------------------------------------------------------------------- -@article{arbelaitz2013extensive, - title={An extensive comparative study of cluster validity indices}, - author={Arbelaitz, Olatz and Gurrutxaga, Ibai and Muguerza, Javier and P{\'e}rez, Jes{\'u}S M and Perona, I{\~n}Igo}, - journal={Pattern Recognition}, - volume={46}, - number={1}, - pages={243--256}, - year={2013}, - publisher={Elsevier} -} - -% Graph Theory measures for Internal Cluster Validation -% ----------------------------------------------------------------------------- -@article{pal1997cluster, - title={Cluster validation using graph theoretic concepts}, - author={Pal, Nikhil R and Biswas, J}, - journal={Pattern Recognition}, - volume={30}, - number={6}, - pages={847--857}, - year={1997}, - publisher={Elsevier} -} - -% Rankings of research papers by citation count; used for showing DBSCAN -% popularity -% ----------------------------------------------------------------------------- -@misc{acade96:online, -author = {{Microsoft Academic Search}}, -title = {Top publications in data mining}, -howpublished = {\url{http://academic.research.microsoft.com/RankList?entitytype=1&topDomainID=2&subDomainID=7&last=0&start=1&end=100}}, -month = {}, -year = {2016}, -note = {(Accessed on 08/29/2016)} -} - - -@misc{PyCluste54:online, -author = 
{Novikov, Andrei}, -title = {PyClustering: PyClustering library}, -howpublished = {\url{http://pythonhosted.org/pyclustering/}}, -year = {2016}, -note = {v.0.6.6} -} - - -% Hartigans convex density estimation model -% ----------------------------------------------------------------------------- -@article{hartigan1987estimation, - title={Estimation of a convex density contour in two dimensions}, - author={Hartigan, JA}, - journal={Journal of the American Statistical Association}, - volume={82}, - number={397}, - pages={267--270}, - year={1987}, - publisher={Taylor \& Francis} -} - -% Bentleys Original KDTree Paper -% ----------------------------------------------------------------------------- -@article{bentley1975multidimensional, - title={Multidimensional binary search trees used for associative searching}, - author={Bentley, Jon Louis}, - journal={Communications of the ACM}, - volume={18}, - number={9}, - pages={509--517}, - year={1975}, - publisher={ACM} -} - -% Original CLARANS paper -% ----------------------------------------------------------------------------- -@article{ng2002clarans, - title={CLARANS: A method for clustering objects for spatial data mining}, - author={Ng, Raymond T. 
and Han, Jiawei}, - journal={IEEE transactions on knowledge and data engineering}, - volume={14}, - number={5}, - pages={1003--1016}, - year={2002}, - publisher={IEEE} -} - -% Original DENCLUE paper -% ----------------------------------------------------------------------------- -@inproceedings{hinneburg1998efficient, - title={An efficient approach to clustering in large multimedia databases with noise}, - author={Hinneburg, Alexander and Keim, Daniel A}, - booktitle={KDD}, - volume={98}, - pages={58--65}, - year={1998} -} - -% Original Chameleon Paper -% ----------------------------------------------------------------------------- -@article{karypis1999chameleon, - title={Chameleon: Hierarchical clustering using dynamic modeling}, - author={Karypis, George and Han, Eui-Hong and Kumar, Vipin}, - journal={Computer}, - volume={32}, - number={8}, - pages={68--75}, - year={1999}, - publisher={IEEE} -} - -% Original CURE algorithm -% ----------------------------------------------------------------------------- -@inproceedings{guha1998cure, - title={CURE: an efficient clustering algorithm for large databases}, - author={Guha, Sudipto and Rastogi, Rajeev and Shim, Kyuseok}, - booktitle={ACM SIGMOD Record}, - volume={27}, - number={2}, - pages={73--84}, - year={1998}, - organization={ACM} -} - -% R statistical computing language citation -% ----------------------------------------------------------------------------- -@article{team2013r, - title={R: A language and environment for statistical computing}, - author={Team, R Core and others}, - year={2013}, - publisher={Vienna, Austria} -} - -% WEKA -% ----------------------------------------------------------------------------- -@article{hall2009weka, - title={The WEKA data mining software: an update}, - author={Hall, Mark and Frank, Eibe and Holmes, Geoffrey and Pfahringer, Bernhard and Reutemann, Peter and Witten, Ian H}, - journal={ACM SIGKDD explorations newsletter}, - volume={11}, - number={1}, - pages={10--18}, - 
year={2009}, - publisher={ACM} -} - -% SPMF Java Machine Learning Library -% ----------------------------------------------------------------------------- -@article{fournier2014spmf, - title={SPMF: a Java open-source pattern mining library.}, - author={Fournier-Viger, Philippe and Gomariz, Antonio and Gueniche, Ted and Soltani, Azadeh and Wu, Cheng-Wei and Tseng, Vincent S and others}, - journal={Journal of Machine Learning Research}, - volume={15}, - number={1}, - pages={3389--3393}, - year={2014} -} - -% Python Scikit Learn -% ----------------------------------------------------------------------------- -@article{pedregosa2011scikit, - title={Scikit-learn: Machine learning in Python}, - author={Pedregosa, Fabian and Varoquaux, Ga{\"e}l and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and others}, - journal={Journal of Machine Learning Research}, - volume={12}, - number={Oct}, - pages={2825--2830}, - year={2011} -} - -% MATLAB TOMCAT Toolkit -% ----------------------------------------------------------------------------- -@article{daszykowski2007tomcat, - title={TOMCAT: A MATLAB toolbox for multivariate calibration techniques}, - author={Daszykowski, Micha{\l} and Serneels, Sven and Kaczmarek, Krzysztof and Van Espen, Piet and Croux, Christophe and Walczak, Beata}, - journal={Chemometrics and intelligent laboratory systems}, - volume={85}, - number={2}, - pages={269--277}, - year={2007}, - publisher={Elsevier} -} - -% OPTICS code for TOMCAT -% ----------------------------------------------------------------------------- -@article{daszykowski2002looking, - title={Looking for natural patterns in analytical data. 2. 
Tracing local density with OPTICS}, - author={Daszykowski, Michael and Walczak, Beata and Massart, Desire L}, - journal={Journal of chemical information and computer sciences}, - volume={42}, - number={3}, - pages={500--507}, - year={2002}, - publisher={ACS Publications} -} - -% Java ML library -% ----------------------------------------------------------------------------- -@comment{ Abeel, T.; de Peer, Y. V. & Saeys, Y. Java-ML: A Machine Learning - Library, Journal of Machine Learning Research, 2009, 10, 931-934 } -@book{abeel2009journal, -author = "Abeel, T. ; de Peer and Y. V. and Saeys, Y. Java-ML: A Machine Learning Library", -title = "Journal of Machine Learning Research", -publisher = "10", -pages = "931--934", -year = 2009 -} - - -% ELKI -% ----------------------------------------------------------------------------- -@article{DBLP:journals/pvldb/SchubertKEZSZ15, - author = {Erich Schubert and - Alexander Koos and - Tobias Emrich and - Andreas Z{\"{u}}fle and - Klaus Arthur Schmid and - Arthur Zimek}, - title = {A Framework for Clustering Uncertain Data}, - journal = {{PVLDB}}, - volume = {8}, - number = {12}, - pages = {1976--1979}, - year = {2015}, - url = {http://www.vldb.org/pvldb/vol8/p1976-schubert.pdf}, - timestamp = {Mon, 30 May 2016 12:01:10 +0200}, - biburl = {http://dblp.uni-trier.de/rec/bib/journals/pvldb/SchubertKEZSZ15}, - bibsource = {dblp computer science bibliography, http://dblp.org} -} - -% BIRCH CRAN records -% ----------------------------------------------------------------------------- -@misc{CRANPack84:online, author={CRAN}, title = {CRAN - Package birch}, howpublished = {\url{https://cran.r-project.org/web/packages/birch/index.html}}, month = {}, year = {2016}, note = {(Accessed on 09/16/2016)} } - -% Spectral Clustering -% ---------------------------------------------------------------------------- -@inproceedings{dhillon2004kernel, - title={Kernel k-means: spectral clustering and normalized cuts}, - author={Dhillon, Inderjit S 
and Guan, Yuqiang and Kulis, Brian}, - booktitle={Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining}, - pages={551--556}, - year={2004}, - organization={ACM} -} - - -% Disjoint-set data structure (2 citations) -% ----------------------------------------------------------------------------- -@misc{cormen2001introduction, - title={Introduction to algorithms second edition}, - author={Cormen, Thomas H and Leiserson, Charles E and Rivest, Ronald L and Stein, Clifford}, - year={2001}, - publisher={The MIT Press} -} -@inproceedings{patwary2010experiments, - title={Experiments on union-find algorithms for the disjoint-set data structure}, - author={Patwary, Md Mostofa Ali and Blair, Jean and Manne, Fredrik}, - booktitle={International Symposium on Experimental Algorithms}, - pages={411--423}, - year={2010}, - organization={Springer} -} - -% SUBCLU high-dimensional density based clustering -% ----------------------------------------------------------------------------- -@inproceedings{kailing2004density, - title={Density-connected subspace clustering for high-dimensional data}, - author={Kailing, Karin and Kriegel, Hans-Peter and Kr{\"o}ger, Peer}, - booktitle={Proc. 
SDM}, - volume={4}, - year={2004}, - organization={SIAM} -} - -% DBSCAN KDD Test of Time award -% ----------------------------------------------------------------------------- -@misc{SIGKDDNe30:online, -author = {SIGKDD}, -title = {SIGKDD News : 2014 SIGKDD Test of Time Award}, -howpublished = {\url{http://www.kdd.org/News/view/2014-sigkdd-test-of-time-award}}, -month = {}, -year = {2014}, -note = {(Accessed on 10/10/2016)} -} - -% Raftery and Fraley's model-based clustering paper -% ----------------------------------------------------------------------------- -@article{fraley2002model, - title={Model-based clustering, discriminant analysis, and density estimation}, - author={Fraley, Chris and Raftery, Adrian E}, - journal={Journal of the American statistical Association}, - volume={97}, - number={458}, - pages={611--631}, - year={2002}, - publisher={Taylor \& Francis} -} - -% FPC: Flexible Procedures for Clustering -% ----------------------------------------------------------------------------- -@Manual{fpc, -title = {fpc: Flexible Procedures for Clustering}, -author = {Christian Hennig}, -year = {2015}, -note = {R package version 2.1-10}, -url = {https://CRAN.R-project.org/package=fpc}, -} - -% From the ELKI Benchmarking page -% ----------------------------------------------------------------------------- -@article{kriegel2016black, - title={The (black) art of runtime evaluation: Are we comparing algorithms or implementations?}, - author={Kriegel, Hans-Peter and Schubert, Erich and Zimek, Arthur}, - journal={Knowledge and Information Systems}, - pages={1--38}, - year={2016}, - publisher={Springer} -} - -% ANN Library -% ----------------------------------------------------------------------------- -@manual{mount1998ann, - title={ANN: library for approximate nearest neighbour searching}, - author={Mount, David M and Arya, Sunil}, - year={2010}, - url = {http://www.cs.umd.edu/~mount/ANN/}, -} - -% Rcpp -% 
----------------------------------------------------------------------------- -@article{eddelbuettel2011rcpp, - title={Rcpp: Seamless R and C++ integration}, - author={Eddelbuettel, Dirk and Fran{\c{c}}ois, Romain and Allaire, J and Chambers, John and Bates, Douglas and Ushey, Kevin}, - journal={Journal of Statistical Software}, - volume={40}, - number={8}, - pages={1--18}, - year={2011} -} - -% ST-DBCAN: SpatioTemporal DBSCAN -% ----------------------------------------------------------------------------- -@article{birant2007st, - title={ST-DBSCAN: An algorithm for clustering spatial--temporal data}, - author={Birant, Derya and Kut, Alp}, - journal={Data \& Knowledge Engineering}, - volume={60}, - number={1}, - pages={208--221}, - year={2007}, - publisher={Elsevier} -} - -% DBSCAN History (small relative to actual number of extensions) -% ----------------------------------------------------------------------------- -@inproceedings{rehman2014dbscan, - title={DBSCAN: Past, present and future}, - author={Rehman, Saif Ur and Asghar, Sohail and Fong, Simon and Sarasvady, S}, - booktitle={Applications of Digital Information and Web Technologies (ICADIWT), 2014 Fifth International Conference on the}, - pages={232--238}, - year={2014}, - organization={IEEE} -} - - -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% Miscellaneous % -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - - - -@article{Gupta2010, -abstract = {A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster into one or more groups, and the rest need to be ignored. 
For such situations, we present Automated Hierarchical Density Shaving (Auto-HDS), a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. Auto-HDS can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criteria. Our framework also provides a simple yet powerful 2D visualization of the hierarchy of clusters that is useful for further interactive exploration. We present results on Gasch and Lee microarray data sets to show the effectiveness of our methods. Additional results on other biological data are included in the supplemental material.}, -author = {Gupta, Gunjan and Liu, Alexander and Ghosh, Joydeep}, -doi = {10.1109/TCBB.2008.32}, -file = {:Users/mpiekenbrock/ResearchLibrary/Automated Hierarchical Density Shaving- A Robust Automated Clustering and Visualization Framework for Large Biological Data Sets.pdf:pdf}, -isbn = {1557-9964}, -issn = {15455963}, -journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics}, -keywords = {Bioinformatics,Clustering,Data and knowledge visualization,Mining methods and algorithms}, -number = {2}, -pages = {223--237}, -pmid = {20431143}, -title = {{Automated hierarchical density shaving: A robust automated clustering and visualization framework for large biological data sets}}, -volume = {7}, -year = {2010} -} -@article{Ssets, - author = {P. Fr\"anti and O. 
Virmajoki}, - title = {Iterative shrinking method for clustering problems}, - journal = {Pattern Recognition}, - year = {2006}, - volume = {39}, - number = {5}, - pages = {761--765} -} - -% Path and Spiral based -@article{chang2008robust, - title={Robust path-based spectral clustering}, - author={Chang, Hong and Yeung, Dit-Yan}, - journal={Pattern Recognition}, - volume={41}, - number={1}, - pages={191--203}, - year={2008}, - publisher={Elsevier} -} - -% Compound dataset -@article{zahn1971graph, - title={Graph-theoretical methods for detecting and describing gestalt clusters}, - author={Zahn, Charles T}, - journal={IEEE Transactions on computers}, - volume={100}, - number={1}, - pages={68--86}, - year={1971}, - publisher={IEEE} -} - -% Aggregation dataset -@article{gionis2007clustering, - title={Clustering aggregation}, - author={Gionis, Aristides and Mannila, Heikki and Tsaparas, Panayiotis}, - journal={ACM Transactions on Knowledge Discovery from Data (TKDD)}, - volume={1}, - number={1}, - pages={4}, - year={2007}, - publisher={ACM} -} - -% R15 dataset -@article{veenman2002maximum, - title={A maximum variance cluster algorithm}, - author={Veenman, Cor J. and Reinders, Marcel J. T. and Backer, Eric}, - journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, - volume={24}, - number={9}, - pages={1273--1280}, - year={2002}, - publisher={IEEE} -} - -@inproceedings{reilly2010detection, - title={Detection and tracking of large number of targets in wide area surveillance}, - author={Reilly, Vladimir and Idrees, Haroon and Shah, Mubarak}, - booktitle={European Conference on Computer Vision}, - pages={186--199}, - year={2010}, - organization={Springer} -} - -@inproceedings{jain2005law, - title={Law, Data clustering: a user’s dilemma}, - author={Jain, Anil K and Martin, HC}, - booktitle={Proceedings of the First international conference on Pattern Recognition and Machine Intelligence}, - year={2005} -} - -@article{jain1999review, - author = {Jain, A. K. 
and Murty, M. N. and Flynn, P. J.}, - title = {Data Clustering: A Review}, - journal = {ACM Computuing Surveys}, - issue_date = {Sept. 1999}, - volume = {31}, - number = {3}, - month = sep, - year = {1999}, - issn = {0360-0300}, - pages = {264--323}, - numpages = {60}, - url = {http://doi.acm.org/10.1145/331499.331504}, - doi = {10.1145/331499.331504}, - acmid = {331504}, - publisher = {ACM}, - address = {New York, NY, USA}, -} - -% Flame data set -@article{fu2007flame, - title={FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data}, - author={Fu, Limin and Medico, Enzo}, - journal={BMC Bioinformatics}, - volume={8}, - number={1}, - pages={1}, - year={2007}, - publisher={BioMed Central} -} - -% Birch dataset -@article{Birchsets, - author = {T. Zhang and R. Ramakrishnan and M. Livny}, - title = {BIRCH: A new data clustering algorithm and its applications}, - journal = {Data Mining and Knowledge Discovery}, - year = {1997}, - volume = {1}, - number = {2}, - pages = {141--182} -} - -@inproceedings{kisilevich2010p, - title={P-DBSCAN: a density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos}, - author={Kisilevich, Slava and Mansmann, Florian and Keim, Daniel}, - booktitle={Proceedings of the 1st international conference and exhibition on computing for geospatial research \& application}, - pages={38}, - year={2010}, - organization={ACM} -} - -@inproceedings{celebi2005mining, - title={Mining biomedical images with density-based clustering}, - author={Celebi, M Emre and Aslandogan, Y Alp and Bergstresser, Paul R}, - booktitle={International Conference on Information Technology: Coding and Computing (ITCC'05)-Volume II}, - volume={1}, - pages={163--168}, - year={2005}, - organization={IEEE} -} - -@inproceedings{ertoz2003finding, - title={Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data.}, - author={Ert{\"o}z, Levent and Steinbach, Michael 
and Kumar, Vipin}, - booktitle={SDM}, - pages={47--58}, - year={2003}, - organization={SIAM} -} - -@article{Chen2014, -author = {Chen, W and Ji, M H and Wang, J M}, -doi = {10.3991/ijoe.v10i6.3881}, -file = {:Users/mpiekenbrock/ResearchLibrary/TDBSCAN.pdf:pdf}, -issn = {18612121}, -journal = {International Journal of Online Engineering}, -keywords = {Density-based clustering,Personal travel trajectory,T-DBSCAN,Trip segmentation}, -number = {6}, -pages = {19--24}, -title = {{T-DBSCAN: A spatiotemporal density clustering for GPS trajectory segmentation}}, -volume = {10}, -year = {2014} -} - - -@incollection{sander2011density, - title={Density-based clustering}, - author={Sander, Joerg}, - booktitle={Encyclopedia of Machine Learning}, - pages={270--273}, - year={2011}, - publisher={Springer} -} - - -% 88 citations -@article{verma2012comparative, - title={A comparative study of various clustering algorithms in data mining}, - author={Verma, Manish and Srivastava, Mauly and Chack, Neha and Diswar, Atul Kumar and Gupta, Nidhi}, - journal={International Journal of Engineering Research and Applications (IJERA)}, - volume={2}, - number={3}, - pages={1379--1384}, - year={2012} -} - -@inproceedings{roy2005approach, - title={An approach to find embedded clusters using density based techniques}, - author={Roy, Swarup and Bhattacharyya, DK}, - booktitle={International Conference on Distributed Computing and Internet Technology}, - pages={523--535}, - year={2005}, - organization={Springer} -} - -@inproceedings{chowdhury2010efficient, - title={An efficient method for subjectively choosing parameter ‘k’automatically in VDBSCAN (Varied Density Based Spatial Clustering of Applications with Noise) algorithm}, - author={Chowdhury, AK M Rasheduzzaman and Mollah, Md Elias and Rahman, Md Asikur}, - booktitle={Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference on}, - volume={1}, - pages={38--41}, - year={2010}, - organization={IEEE} -} - 
-@inproceedings{ghanbarpour2014exdbscan, - title={EXDBSCAN: An extension of DBSCAN to detect clusters in multi-density datasets}, - author={Ghanbarpour, Asieh and Minaei, Behrooz}, - booktitle={Intelligent Systems (ICIS), 2014 Iranian Conference on}, - pages={1--5}, - year={2014}, - organization={IEEE} -} - -@inproceedings{vijayalakshmi2010improved, - title={Improved varied density based spatial clustering algorithm with noise}, - author={Vijayalakshmi, S and Punithavalli, M}, - booktitle={Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on}, - pages={1--4}, - year={2010}, - organization={IEEE} -} - -@article{Wang2013, -author = {Wang, Wei}, -file = {:Users/mpiekenbrock/Downloads/905067f5314e6073d4779c11572bd8c5.pdf:pdf}, -isbn = {978-0-9891305-0-9}, -keywords = {clustering algorithm,clustering techniques,data mining,derivative,global optimum k,similarity,similarity and minimizes intergroup,there are four basic,vdbscan}, -pages = {225--228}, -title = {{Improved VDBSCAN With Global Optimum K}}, -year = {2013} -} - -@article{parvez2012data, - title={Data set property based ‘K’in VDBSCAN Clustering Algorithm}, - author={Parvez, Abu Wahid Md Masud}, - journal={World of Computer Science and Information Technology Journal (WCSIT)}, - volume={2}, - number={3}, - pages={115--119}, - year={2012} -} - -@inproceedings{liu2007vdbscan, - title={VDBSCAN: varied density based spatial clustering of applications with noise}, - author={Liu, Peng and Zhou, Dong and Wu, Naijun}, - booktitle={2007 International conference on service systems and service management}, - pages={1--4}, - year={2007}, - organization={IEEE} -} - -@article{pei2009decode, - title={DECODE: a new method for discovering clusters of different densities in spatial data}, - author={Pei, Tao and Jasra, Ajay and Hand, David J and Zhu, A-Xing and Zhou, Chenghu}, - journal={Data Mining and Knowledge Discovery}, - volume={18}, - number={3}, - pages={337--369}, - year={2009}, - 
publisher={Springer} -} - -@article{duan2007local, - title={A local-density based spatial clustering algorithm with noise}, - author={Duan, Lian and Xu, Lida and Guo, Feng and Lee, Jun and Yan, Baopin}, - journal={Information Systems}, - volume={32}, - number={7}, - pages={978--986}, - year={2007}, - publisher={Elsevier} -} - -@inproceedings{li2007traffic, - title={Traffic density-based discovery of hot routes in road networks}, - author={Li, Xiaolei and Han, Jiawei and Lee, Jae-Gil and Gonzalez, Hector}, - booktitle={International Symposium on Spatial and Temporal Databases}, - pages={441--459}, - year={2007}, - organization={Springer} -} - -@article{tran2006knn, - title={KNN-kernel density-based clustering for high-dimensional multivariate data}, - author={Tran, Thanh N and Wehrens, Ron and Buydens, Lutgarde MC}, - journal={Computational Statistics \& Data Analysis}, - volume={51}, - number={2}, - pages={513--525}, - year={2006}, - publisher={Elsevier} -} - -@inproceedings{jiang2003dhc, - title={DHC: a density-based hierarchical clustering method for time series gene expression data}, - author={Jiang, Daxin and Pei, Jian and Zhang, Aidong}, - booktitle={Bioinformatics and Bioengineering, 2003. Proceedings. 
Third IEEE Symposium on}, - pages={393--400}, - year={2003}, - organization={IEEE} -} - -@inproceedings{kriegel2005density, - title={Density-based clustering of uncertain data}, - author={Kriegel, Hans-Peter and Pfeifle, Martin}, - booktitle={Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining}, - pages={672--677}, - year={2005}, - organization={ACM} -} - -@book{agrawal1998automatic, - title={Automatic subspace clustering of high dimensional data for data mining applications}, - author={Agrawal, Rakesh and Gehrke, Johannes and Gunopulos, Dimitrios and Raghavan, Prabhakar}, - volume={27}, - number={2}, - year={1998}, - publisher={ACM} -} - -@inproceedings{cao2006density, - title={Density-Based Clustering over an Evolving Data Stream with Noise.}, - author={Cao, Feng and Ester, Martin and Qian, Weining and Zhou, Aoying}, - booktitle={SDM}, - volume={6}, - pages={328--339}, - year={2006}, - organization={SIAM} -} - -@inproceedings{chen2007density, - title={Density-based clustering for real-time stream data}, - author={Chen, Yixin and Tu, Li}, - booktitle={Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining}, - pages={133--142}, - year={2007}, - organization={ACM} -} - - -@article{kriegel:2011, - title={Density-based clustering}, - author={Kriegel, Hans-Peter and Kr{\"o}ger, Peer and Sander, J{\"o}rg and Zimek Arthur}, - journal={Wires Data and Knowledge Discovery}, - volume={1}, - number={}, - pages={231--240}, - year={2011}, - publisher={John Wiley \& Sons} -} - -@book{Aggarwal:2013, - author = {Aggarwal, Charu C. 
and Reddy, Chandan K.}, - title = {Data Clustering: Algorithms and Applications}, - year = {2013}, - isbn = {1466558210, 9781466558212}, - edition = {1st}, - publisher = {Chapman \& Hall/CRC}, -} - -@book{Kaufman:1990, - title = "Finding groups in data : an introduction to cluster analysis", - author = "Kaufman, Leonard and Rousseeuw, Peter J.", - series = "Wiley series in probability and mathematical statistics", - publisher = "Wiley", - address = "New York", - isbn = "0-471-87876-6", - year = 1990 -} +@Article{hahsler2019dbscan, + title = {{dbscan}: Fast Density-Based Clustering with {R}}, + author = {Michael Hahsler and Matthew Piekenbrock and Derek Doran}, + journal = {Journal of Statistical Software}, + year = {2019}, + volume = {91}, + number = {1}, + pages = {1--30}, + doi = {10.18637/jss.v091.i01}, + } + + +@inproceedings{ester1996density, + title={A density-based algorithm for discovering clusters in large spatial databases with noise.}, + author={Ester, Martin and Kriegel, Hans-Peter and Sander, J{\"o}rg and Xu, Xiaowei and others}, + booktitle={Kdd}, + volume={96}, + number={34}, + pages={226--231}, + year={1996} +} + +@Manual{dbscan-R, + title = {dbscan: Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms}, + author = {Michael Hahsler and Matthew Piekenbrock}, + note = {R package version 0.9-8.2}, + year={2016} +} +%% Original OPTICS paper +%% ----------------------------------------------------------------------------- +@inproceedings{ankerst1999optics, + title={OPTICS: ordering points to identify the clustering structure}, + author={Ankerst, Mihael and Breunig, Markus M and Kriegel, Hans-Peter and Sander, J{\"o}rg}, + booktitle={ACM Sigmod Record}, + volume={28}, + number={2}, + pages={49--60}, + year={1999}, + organization={ACM} +} + +% OPTICS cluster extraction improvements +% ----------------------------------------------------------------------------- +@inproceedings{DBLP:conf/lwa/SchubertG18, + author = {Erich 
Schubert and + Michael Gertz}, + title = {Improving the Cluster Structure Extracted from {OPTICS} Plots}, + booktitle = {Lernen, Wissen, Daten, Analysen (LWDA 2018)}, + series = {{CEUR} Workshop Proceedings}, + volume = {2191}, + pages = {318--329}, + publisher = {CEUR-WS.org}, + year = {2018} +} + +% Original LOF paper +% ----------------------------------------------------------------------------- +@inproceedings{breunig2000lof, + title={LOF: identifying density-based local outliers}, + author={Breunig, Markus M and Kriegel, Hans-Peter and Ng, Raymond T and Sander, J{\"o}rg}, + booktitle={ACM sigmod record}, + volume={29}, + number={2}, + pages={93--104}, + year={2000}, + organization={ACM} +} + +% 2003 Reachability <--> Dendrograms Conversions Paper +% ----------------------------------------------------------------------------- +@inproceedings{sander2003automatic, + title={Automatic extraction of clusters from hierarchical clustering representations}, + author={Sander, J{\"o}rg and Qin, Xuejie and Lu, Zhiyong and Niu, Nan and Kovarsky, Alex}, + booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining}, + pages={75--87}, + year={2003}, + organization={Springer} +} + +% Original BIRCH paper +% ----------------------------------------------------------------------------- +@inproceedings{zhang96, + title={BIRCH: an efficient data clustering method for very large databases}, + author={Zhang, Tian and Ramakrishnan, Raghu and Livny, Miron}, + booktitle={ACM Sigmod Record}, + volume={25}, + number={2}, + pages={103--114}, + year={1996}, + organization={ACM} +} + +% GDBSCAN Paper (Generalized DBSCAN, by Sanders) +% ----------------------------------------------------------------------------- +@article{sander1998density, + title={Density-based clustering in spatial databases: The algorithm gdbscan and its applications}, + author={Sander, J{\"o}rg and Ester, Martin and Kriegel, Hans-Peter and Xu, Xiaowei}, + journal={Data mining and knowledge discovery}, 
+ volume={2}, + number={2}, + pages={169--194}, + year={1998}, + publisher={Springer} +} + +% HDBSCAN* Newest Paper +% ----------------------------------------------------------------------------- +@article{campello2015hierarchical, + title={Hierarchical density estimates for data clustering, visualization, and outlier detection}, + author={Campello, Ricardo JGB and Moulavi, Davoud and Zimek, Arthur and Sander, Joerg}, + journal={ACM Transactions on Knowledge Discovery from Data (TKDD)}, + volume={10}, + number={1}, + pages={5}, + year={2015}, + publisher={ACM} +} + +% First HDBSCAN* introduction paper, later revised in 2015. The newer one is better. +% ----------------------------------------------------------------------------- +@inproceedings{campello2013density, + title={Density-based clustering based on hierarchical density estimates}, + author={Campello, Ricardo JGB and Moulavi, Davoud and Sander, J{\"o}rg}, + booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining}, + pages={160--172}, + year={2013}, + organization={Springer} +} + +% The new-ish 'Standard Methodology' paper of that 'tackles the methodological drawbacks' % of internal clustering validation +% ----------------------------------------------------------------------------- +@article{gurrutxaga2011towards, + title={Towards a standard methodology to evaluate internal cluster validity indices}, + author={Gurrutxaga, Ibai and Muguerza, Javier and Arbelaitz, Olatz and P{\'e}rez, Jes{\'u}s M and Mart{\'\i}n, Jos{\'e} I}, + journal={Pattern Recognition Letters}, + volume={32}, + number={3}, + pages={505--515}, + year={2011}, + publisher={Elsevier} +} + +% Original ABACUS - Workaround implementation of mixture modeling for finding +% arbitrary shapes +% ----------------------------------------------------------------------------- +@article{gegick2011abacus, + title={ABACUS: mining arbitrary shaped clusters from large datasets based on backbone identification}, + author={Gegick, M}, + 
year={2011}, + publisher={SIAM} +} + + +% Original Silhouette Index Paper +% ----------------------------------------------------------------------------- +@article{rousseeuw1987silhouettes, + title={Silhouettes: a graphical aid to the interpretation and validation of cluster analysis}, + author={Rousseeuw, Peter J}, + journal={Journal of computational and applied mathematics}, + volume={20}, + pages={53--65}, + year={1987}, + publisher={Elsevier} +} + +% Extensive Comparative Study of IVMS +% ----------------------------------------------------------------------------- +@article{arbelaitz2013extensive, + title={An extensive comparative study of cluster validity indices}, + author={Arbelaitz, Olatz and Gurrutxaga, Ibai and Muguerza, Javier and P{\'e}rez, Jes{\'u}S M and Perona, I{\~n}Igo}, + journal={Pattern Recognition}, + volume={46}, + number={1}, + pages={243--256}, + year={2013}, + publisher={Elsevier} +} + +% Graph Theory measures for Internal Cluster Validation +% ----------------------------------------------------------------------------- +@article{pal1997cluster, + title={Cluster validation using graph theoretic concepts}, + author={Pal, Nikhil R and Biswas, J}, + journal={Pattern Recognition}, + volume={30}, + number={6}, + pages={847--857}, + year={1997}, + publisher={Elsevier} +} + +% Rankings of research papers by citation count; used for showing DBSCAN +% popularity +% ----------------------------------------------------------------------------- +@misc{acade96:online, +author = {{Microsoft Academic Search}}, +title = {Top publications in data mining}, +howpublished = {\url{http://academic.research.microsoft.com/RankList?entitytype=1&topDomainID=2&subDomainID=7&last=0&start=1&end=100}}, +month = {}, +year = {2016}, +note = {(Accessed on 08/29/2016)} +} + + +@misc{PyCluste54:online, +author = {Novikov, Andrei}, +title = {PyClustering: PyClustering library}, +howpublished = {\url{http://pythonhosted.org/pyclustering/}}, +year = {2016}, +note = {v.0.6.6} 
+} + + +% Hartigans convex density estimation model +% ----------------------------------------------------------------------------- +@article{hartigan1987estimation, + title={Estimation of a convex density contour in two dimensions}, + author={Hartigan, JA}, + journal={Journal of the American Statistical Association}, + volume={82}, + number={397}, + pages={267--270}, + year={1987}, + publisher={Taylor \& Francis} +} + +% Bentleys Original KDTree Paper +% ----------------------------------------------------------------------------- +@article{bentley1975multidimensional, + title={Multidimensional binary search trees used for associative searching}, + author={Bentley, Jon Louis}, + journal={Communications of the ACM}, + volume={18}, + number={9}, + pages={509--517}, + year={1975}, + publisher={ACM} +} + +% Original CLARANS paper +% ----------------------------------------------------------------------------- +@article{ng2002clarans, + title={CLARANS: A method for clustering objects for spatial data mining}, + author={Ng, Raymond T. 
and Han, Jiawei}, + journal={IEEE transactions on knowledge and data engineering}, + volume={14}, + number={5}, + pages={1003--1016}, + year={2002}, + publisher={IEEE} +} + +% Original DENCLUE paper +% ----------------------------------------------------------------------------- +@inproceedings{hinneburg1998efficient, + title={An efficient approach to clustering in large multimedia databases with noise}, + author={Hinneburg, Alexander and Keim, Daniel A}, + booktitle={KDD}, + volume={98}, + pages={58--65}, + year={1998} +} + +% Original Chameleon Paper +% ----------------------------------------------------------------------------- +@article{karypis1999chameleon, + title={Chameleon: Hierarchical clustering using dynamic modeling}, + author={Karypis, George and Han, Eui-Hong and Kumar, Vipin}, + journal={Computer}, + volume={32}, + number={8}, + pages={68--75}, + year={1999}, + publisher={IEEE} +} + +% Original CURE algorithm +% ----------------------------------------------------------------------------- +@inproceedings{guha1998cure, + title={CURE: an efficient clustering algorithm for large databases}, + author={Guha, Sudipto and Rastogi, Rajeev and Shim, Kyuseok}, + booktitle={ACM SIGMOD Record}, + volume={27}, + number={2}, + pages={73--84}, + year={1998}, + organization={ACM} +} + +% R statistical computing language citation +% ----------------------------------------------------------------------------- +@article{team2013r, + title={R: A language and environment for statistical computing}, + author={Team, R Core and others}, + year={2013}, + publisher={Vienna, Austria} +} + +% WEKA +% ----------------------------------------------------------------------------- +@article{hall2009weka, + title={The WEKA data mining software: an update}, + author={Hall, Mark and Frank, Eibe and Holmes, Geoffrey and Pfahringer, Bernhard and Reutemann, Peter and Witten, Ian H}, + journal={ACM SIGKDD explorations newsletter}, + volume={11}, + number={1}, + pages={10--18}, + 
year={2009}, + publisher={ACM} +} + +% SPMF Java Machine Learning Library +% ----------------------------------------------------------------------------- +@article{fournier2014spmf, + title={SPMF: a Java open-source pattern mining library.}, + author={Fournier-Viger, Philippe and Gomariz, Antonio and Gueniche, Ted and Soltani, Azadeh and Wu, Cheng-Wei and Tseng, Vincent S and others}, + journal={Journal of Machine Learning Research}, + volume={15}, + number={1}, + pages={3389--3393}, + year={2014} +} + +% Python Scikit Learn +% ----------------------------------------------------------------------------- +@article{pedregosa2011scikit, + title={Scikit-learn: Machine learning in Python}, + author={Pedregosa, Fabian and Varoquaux, Ga{\"e}l and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and others}, + journal={Journal of Machine Learning Research}, + volume={12}, + number={Oct}, + pages={2825--2830}, + year={2011} +} + +% MATLAB TOMCAT Toolkit +% ----------------------------------------------------------------------------- +@article{daszykowski2007tomcat, + title={TOMCAT: A MATLAB toolbox for multivariate calibration techniques}, + author={Daszykowski, Micha{\l} and Serneels, Sven and Kaczmarek, Krzysztof and Van Espen, Piet and Croux, Christophe and Walczak, Beata}, + journal={Chemometrics and intelligent laboratory systems}, + volume={85}, + number={2}, + pages={269--277}, + year={2007}, + publisher={Elsevier} +} + +% OPTICS code for TOMCAT +% ----------------------------------------------------------------------------- +@article{daszykowski2002looking, + title={Looking for natural patterns in analytical data. 2. 
Tracing local density with OPTICS}, + author={Daszykowski, Michael and Walczak, Beata and Massart, Desire L}, + journal={Journal of chemical information and computer sciences}, + volume={42}, + number={3}, + pages={500--507}, + year={2002}, + publisher={ACS Publications} +} + +% Java ML library +% ----------------------------------------------------------------------------- +@comment{ Abeel, T.; de Peer, Y. V. & Saeys, Y. Java-ML: A Machine Learning + Library, Journal of Machine Learning Research, 2009, 10, 931-934 } +@book{abeel2009journal, +author = "Abeel, T. ; de Peer and Y. V. and Saeys, Y. Java-ML: A Machine Learning Library", +title = "Journal of Machine Learning Research", +publisher = "10", +pages = "931--934", +year = 2009 +} + + +% ELKI +% ----------------------------------------------------------------------------- +@article{DBLP:journals/pvldb/SchubertKEZSZ15, + author = {Erich Schubert and + Alexander Koos and + Tobias Emrich and + Andreas Z{\"{u}}fle and + Klaus Arthur Schmid and + Arthur Zimek}, + title = {A Framework for Clustering Uncertain Data}, + journal = {{PVLDB}}, + volume = {8}, + number = {12}, + pages = {1976--1979}, + year = {2015}, + url = {http://www.vldb.org/pvldb/vol8/p1976-schubert.pdf}, + timestamp = {Mon, 30 May 2016 12:01:10 +0200}, + biburl = {http://dblp.uni-trier.de/rec/bib/journals/pvldb/SchubertKEZSZ15}, + bibsource = {dblp computer science bibliography, http://dblp.org} +} + +% BIRCH CRAN records +% ----------------------------------------------------------------------------- +@misc{CRANPack84:online, author={CRAN}, title = {CRAN - Package birch}, howpublished = {\url{https://cran.r-project.org/web/packages/birch/index.html}}, month = {}, year = {2016}, note = {(Accessed on 09/16/2016)} } + +% Spectral Clustering +% ---------------------------------------------------------------------------- +@inproceedings{dhillon2004kernel, + title={Kernel k-means: spectral clustering and normalized cuts}, + author={Dhillon, Inderjit S 
and Guan, Yuqiang and Kulis, Brian}, + booktitle={Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining}, + pages={551--556}, + year={2004}, + organization={ACM} +} + + +% Disjoint-set data structure (2 citations) +% ----------------------------------------------------------------------------- +@misc{cormen2001introduction, + title={Introduction to algorithms second edition}, + author={Cormen, Thomas H and Leiserson, Charles E and Rivest, Ronald L and Stein, Clifford}, + year={2001}, + publisher={The MIT Press} +} +@inproceedings{patwary2010experiments, + title={Experiments on union-find algorithms for the disjoint-set data structure}, + author={Patwary, Md Mostofa Ali and Blair, Jean and Manne, Fredrik}, + booktitle={International Symposium on Experimental Algorithms}, + pages={411--423}, + year={2010}, + organization={Springer} +} + +% SUBCLU high-dimensional density based clustering +% ----------------------------------------------------------------------------- +@inproceedings{kailing2004density, + title={Density-connected subspace clustering for high-dimensional data}, + author={Kailing, Karin and Kriegel, Hans-Peter and Kr{\"o}ger, Peer}, + booktitle={Proc. 
SDM}, + volume={4}, + year={2004}, + organization={SIAM} +} + +% DBSCAN KDD Test of Time award +% ----------------------------------------------------------------------------- +@misc{SIGKDDNe30:online, +author = {SIGKDD}, +title = {SIGKDD News : 2014 SIGKDD Test of Time Award}, +howpublished = {\url{http://www.kdd.org/News/view/2014-sigkdd-test-of-time-award}}, +month = {}, +year = {2014}, +note = {(Accessed on 10/10/2016)} +} + +% Raftery and Fraley's model-based clustering paper +% ----------------------------------------------------------------------------- +@article{fraley2002model, + title={Model-based clustering, discriminant analysis, and density estimation}, + author={Fraley, Chris and Raftery, Adrian E}, + journal={Journal of the American statistical Association}, + volume={97}, + number={458}, + pages={611--631}, + year={2002}, + publisher={Taylor \& Francis} +} + +% FPC: Flexible Procedures for Clustering +% ----------------------------------------------------------------------------- +@Manual{fpc, +title = {fpc: Flexible Procedures for Clustering}, +author = {Christian Hennig}, +year = {2015}, +note = {R package version 2.1-10}, +url = {https://CRAN.R-project.org/package=fpc}, +} + +% From the ELKI Benchmarking page +% ----------------------------------------------------------------------------- +@article{kriegel2016black, + title={The (black) art of runtime evaluation: Are we comparing algorithms or implementations?}, + author={Kriegel, Hans-Peter and Schubert, Erich and Zimek, Arthur}, + journal={Knowledge and Information Systems}, + pages={1--38}, + year={2016}, + publisher={Springer} +} + +% ANN Library +% ----------------------------------------------------------------------------- +@manual{mount1998ann, + title={ANN: library for approximate nearest neighbour searching}, + author={Mount, David M and Arya, Sunil}, + year={2010}, + url = {http://www.cs.umd.edu/~mount/ANN/}, +} + +% Rcpp +% 
----------------------------------------------------------------------------- +@article{eddelbuettel2011rcpp, + title={Rcpp: Seamless R and C++ integration}, + author={Eddelbuettel, Dirk and Fran{\c{c}}ois, Romain and Allaire, J and Chambers, John and Bates, Douglas and Ushey, Kevin}, + journal={Journal of Statistical Software}, + volume={40}, + number={8}, + pages={1--18}, + year={2011} +} + +% ST-DBCAN: SpatioTemporal DBSCAN +% ----------------------------------------------------------------------------- +@article{birant2007st, + title={ST-DBSCAN: An algorithm for clustering spatial--temporal data}, + author={Birant, Derya and Kut, Alp}, + journal={Data \& Knowledge Engineering}, + volume={60}, + number={1}, + pages={208--221}, + year={2007}, + publisher={Elsevier} +} + +% DBSCAN History (small relative to actual number of extensions) +% ----------------------------------------------------------------------------- +@inproceedings{rehman2014dbscan, + title={DBSCAN: Past, present and future}, + author={Rehman, Saif Ur and Asghar, Sohail and Fong, Simon and Sarasvady, S}, + booktitle={Applications of Digital Information and Web Technologies (ICADIWT), 2014 Fifth International Conference on the}, + pages={232--238}, + year={2014}, + organization={IEEE} +} + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% Miscellaneous % +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + + +@article{Gupta2010, +abstract = {A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster into one or more groups, and the rest need to be ignored. 
For such situations, we present Automated Hierarchical Density Shaving (Auto-HDS), a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. Auto-HDS can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criteria. Our framework also provides a simple yet powerful 2D visualization of the hierarchy of clusters that is useful for further interactive exploration. We present results on Gasch and Lee microarray data sets to show the effectiveness of our methods. Additional results on other biological data are included in the supplemental material.}, +author = {Gupta, Gunjan and Liu, Alexander and Ghosh, Joydeep}, +doi = {10.1109/TCBB.2008.32}, +file = {:Users/mpiekenbrock/ResearchLibrary/Automated Hierarchical Density Shaving- A Robust Automated Clustering and Visualization Framework for Large Biological Data Sets.pdf:pdf}, +isbn = {1557-9964}, +issn = {15455963}, +journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics}, +keywords = {Bioinformatics,Clustering,Data and knowledge visualization,Mining methods and algorithms}, +number = {2}, +pages = {223--237}, +pmid = {20431143}, +title = {{Automated hierarchical density shaving: A robust automated clustering and visualization framework for large biological data sets}}, +volume = {7}, +year = {2010} +} +@article{Ssets, + author = {P. Fr\"anti and O. 
Virmajoki}, + title = {Iterative shrinking method for clustering problems}, + journal = {Pattern Recognition}, + year = {2006}, + volume = {39}, + number = {5}, + pages = {761--765} +} + +% Path and Spiral based +@article{chang2008robust, + title={Robust path-based spectral clustering}, + author={Chang, Hong and Yeung, Dit-Yan}, + journal={Pattern Recognition}, + volume={41}, + number={1}, + pages={191--203}, + year={2008}, + publisher={Elsevier} +} + +% Compound dataset +@article{zahn1971graph, + title={Graph-theoretical methods for detecting and describing gestalt clusters}, + author={Zahn, Charles T}, + journal={IEEE Transactions on Computers}, + volume={100}, + number={1}, + pages={68--86}, + year={1971}, + publisher={IEEE} +} + +% Aggregation dataset +@article{gionis2007clustering, + title={Clustering aggregation}, + author={Gionis, Aristides and Mannila, Heikki and Tsaparas, Panayiotis}, + journal={ACM Transactions on Knowledge Discovery from Data (TKDD)}, + volume={1}, + number={1}, + pages={4}, + year={2007}, + publisher={ACM} +} + +% R15 dataset +@article{veenman2002maximum, + title={A maximum variance cluster algorithm}, + author={Veenman, Cor J. and Reinders, Marcel J. T. and Backer, Eric}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + volume={24}, + number={9}, + pages={1273--1280}, + year={2002}, + publisher={IEEE} +} + +@inproceedings{reilly2010detection, + title={Detection and tracking of large number of targets in wide area surveillance}, + author={Reilly, Vladimir and Idrees, Haroon and Shah, Mubarak}, + booktitle={European Conference on Computer Vision}, + pages={186--199}, + year={2010}, + organization={Springer} +} + +@inproceedings{jain2005law, + title={Data clustering: a user’s dilemma}, + author={Jain, Anil K and Law, Martin HC}, + booktitle={Proceedings of the First international conference on Pattern Recognition and Machine Intelligence}, + year={2005} +} + +@article{jain1999review, + author = {Jain, A. K. 
and Murty, M. N. and Flynn, P. J.}, + title = {Data Clustering: A Review}, + journal = {ACM Computing Surveys}, + issue_date = {Sept. 1999}, + volume = {31}, + number = {3}, + month = sep, + year = {1999}, + issn = {0360-0300}, + pages = {264--323}, + numpages = {60}, + url = {http://doi.acm.org/10.1145/331499.331504}, + doi = {10.1145/331499.331504}, + acmid = {331504}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + +% Flame data set +@article{fu2007flame, + title={FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data}, + author={Fu, Limin and Medico, Enzo}, + journal={BMC Bioinformatics}, + volume={8}, + number={1}, + pages={1}, + year={2007}, + publisher={BioMed Central} +} + +% Birch dataset +@article{Birchsets, + author = {T. Zhang and R. Ramakrishnan and M. Livny}, + title = {BIRCH: A new data clustering algorithm and its applications}, + journal = {Data Mining and Knowledge Discovery}, + year = {1997}, + volume = {1}, + number = {2}, + pages = {141--182} +} + +@inproceedings{kisilevich2010p, + title={P-DBSCAN: a density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos}, + author={Kisilevich, Slava and Mansmann, Florian and Keim, Daniel}, + booktitle={Proceedings of the 1st international conference and exhibition on computing for geospatial research \& application}, + pages={38}, + year={2010}, + organization={ACM} +} + +@inproceedings{celebi2005mining, + title={Mining biomedical images with density-based clustering}, + author={Celebi, M Emre and Aslandogan, Y Alp and Bergstresser, Paul R}, + booktitle={International Conference on Information Technology: Coding and Computing (ITCC'05)-Volume II}, + volume={1}, + pages={163--168}, + year={2005}, + organization={IEEE} +} + +@inproceedings{ertoz2003finding, + title={Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data.}, + author={Ert{\"o}z, Levent and Steinbach, Michael 
and Kumar, Vipin}, + booktitle={SDM}, + pages={47--58}, + year={2003}, + organization={SIAM} +} + +@article{Chen2014, +author = {Chen, W and Ji, M H and Wang, J M}, +doi = {10.3991/ijoe.v10i6.3881}, +file = {:Users/mpiekenbrock/ResearchLibrary/TDBSCAN.pdf:pdf}, +issn = {18612121}, +journal = {International Journal of Online Engineering}, +keywords = {Density-based clustering,Personal travel trajectory,T-DBSCAN,Trip segmentation}, +number = {6}, +pages = {19--24}, +title = {{T-DBSCAN: A spatiotemporal density clustering for GPS trajectory segmentation}}, +volume = {10}, +year = {2014} +} + + +@incollection{sander2011density, + title={Density-based clustering}, + author={Sander, Joerg}, + booktitle={Encyclopedia of Machine Learning}, + pages={270--273}, + year={2011}, + publisher={Springer} +} + + +% 88 citations +@article{verma2012comparative, + title={A comparative study of various clustering algorithms in data mining}, + author={Verma, Manish and Srivastava, Mauly and Chack, Neha and Diswar, Atul Kumar and Gupta, Nidhi}, + journal={International Journal of Engineering Research and Applications (IJERA)}, + volume={2}, + number={3}, + pages={1379--1384}, + year={2012} +} + +@inproceedings{roy2005approach, + title={An approach to find embedded clusters using density based techniques}, + author={Roy, Swarup and Bhattacharyya, DK}, + booktitle={International Conference on Distributed Computing and Internet Technology}, + pages={523--535}, + year={2005}, + organization={Springer} +} + +@inproceedings{chowdhury2010efficient, + title={An efficient method for subjectively choosing parameter ‘k’automatically in VDBSCAN (Varied Density Based Spatial Clustering of Applications with Noise) algorithm}, + author={Chowdhury, AK M Rasheduzzaman and Mollah, Md Elias and Rahman, Md Asikur}, + booktitle={Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference on}, + volume={1}, + pages={38--41}, + year={2010}, + organization={IEEE} +} + 
+@inproceedings{ghanbarpour2014exdbscan, + title={EXDBSCAN: An extension of DBSCAN to detect clusters in multi-density datasets}, + author={Ghanbarpour, Asieh and Minaei, Behrooz}, + booktitle={Intelligent Systems (ICIS), 2014 Iranian Conference on}, + pages={1--5}, + year={2014}, + organization={IEEE} +} + +@inproceedings{vijayalakshmi2010improved, + title={Improved varied density based spatial clustering algorithm with noise}, + author={Vijayalakshmi, S and Punithavalli, M}, + booktitle={Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on}, + pages={1--4}, + year={2010}, + organization={IEEE} +} + +@article{Wang2013, +author = {Wang, Wei}, +file = {:Users/mpiekenbrock/Downloads/905067f5314e6073d4779c11572bd8c5.pdf:pdf}, +isbn = {978-0-9891305-0-9}, +keywords = {clustering algorithm,clustering techniques,data mining,derivative,global optimum k,similarity,similarity and minimizes intergroup,there are four basic,vdbscan}, +pages = {225--228}, +title = {{Improved VDBSCAN With Global Optimum K}}, +year = {2013} +} + +@article{parvez2012data, + title={Data set property based ‘K’in VDBSCAN Clustering Algorithm}, + author={Parvez, Abu Wahid Md Masud}, + journal={World of Computer Science and Information Technology Journal (WCSIT)}, + volume={2}, + number={3}, + pages={115--119}, + year={2012} +} + +@inproceedings{liu2007vdbscan, + title={VDBSCAN: varied density based spatial clustering of applications with noise}, + author={Liu, Peng and Zhou, Dong and Wu, Naijun}, + booktitle={2007 International conference on service systems and service management}, + pages={1--4}, + year={2007}, + organization={IEEE} +} + +@article{pei2009decode, + title={DECODE: a new method for discovering clusters of different densities in spatial data}, + author={Pei, Tao and Jasra, Ajay and Hand, David J and Zhu, A-Xing and Zhou, Chenghu}, + journal={Data Mining and Knowledge Discovery}, + volume={18}, + number={3}, + pages={337--369}, + year={2009}, + 
publisher={Springer} +} + +@article{duan2007local, + title={A local-density based spatial clustering algorithm with noise}, + author={Duan, Lian and Xu, Lida and Guo, Feng and Lee, Jun and Yan, Baopin}, + journal={Information Systems}, + volume={32}, + number={7}, + pages={978--986}, + year={2007}, + publisher={Elsevier} +} + +@inproceedings{li2007traffic, + title={Traffic density-based discovery of hot routes in road networks}, + author={Li, Xiaolei and Han, Jiawei and Lee, Jae-Gil and Gonzalez, Hector}, + booktitle={International Symposium on Spatial and Temporal Databases}, + pages={441--459}, + year={2007}, + organization={Springer} +} + +@article{tran2006knn, + title={KNN-kernel density-based clustering for high-dimensional multivariate data}, + author={Tran, Thanh N and Wehrens, Ron and Buydens, Lutgarde MC}, + journal={Computational Statistics \& Data Analysis}, + volume={51}, + number={2}, + pages={513--525}, + year={2006}, + publisher={Elsevier} +} + +@inproceedings{jiang2003dhc, + title={DHC: a density-based hierarchical clustering method for time series gene expression data}, + author={Jiang, Daxin and Pei, Jian and Zhang, Aidong}, + booktitle={Bioinformatics and Bioengineering, 2003. Proceedings. 
Third IEEE Symposium on}, + pages={393--400}, + year={2003}, + organization={IEEE} +} + +@inproceedings{kriegel2005density, + title={Density-based clustering of uncertain data}, + author={Kriegel, Hans-Peter and Pfeifle, Martin}, + booktitle={Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining}, + pages={672--677}, + year={2005}, + organization={ACM} +} + +@book{agrawal1998automatic, + title={Automatic subspace clustering of high dimensional data for data mining applications}, + author={Agrawal, Rakesh and Gehrke, Johannes and Gunopulos, Dimitrios and Raghavan, Prabhakar}, + volume={27}, + number={2}, + year={1998}, + publisher={ACM} +} + +@inproceedings{cao2006density, + title={Density-Based Clustering over an Evolving Data Stream with Noise.}, + author={Cao, Feng and Ester, Martin and Qian, Weining and Zhou, Aoying}, + booktitle={SDM}, + volume={6}, + pages={328--339}, + year={2006}, + organization={SIAM} +} + +@inproceedings{chen2007density, + title={Density-based clustering for real-time stream data}, + author={Chen, Yixin and Tu, Li}, + booktitle={Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining}, + pages={133--142}, + year={2007}, + organization={ACM} +} + + +@article{kriegel:2011, + title={Density-based clustering}, + author={Kriegel, Hans-Peter and Kr{\"o}ger, Peer and Sander, J{\"o}rg and Zimek, Arthur}, + journal={Wires Data and Knowledge Discovery}, + volume={1}, + number={}, + pages={231--240}, + year={2011}, + publisher={John Wiley \& Sons} +} + +@book{Aggarwal:2013, + author = {Aggarwal, Charu C. 
and Reddy, Chandan K.}, + title = {Data Clustering: Algorithms and Applications}, + year = {2013}, + isbn = {1466558210, 9781466558212}, + edition = {1st}, + publisher = {Chapman \& Hall/CRC}, +} + +@book{Kaufman:1990, + title = "Finding groups in data : an introduction to cluster analysis", + author = "Kaufman, Leonard and Rousseeuw, Peter J.", + series = "Wiley series in probability and mathematical statistics", + publisher = "Wiley", + address = "New York", + isbn = "0-471-87876-6", + year = 1990 +} diff --git a/vignettes/hdbscan.Rmd b/vignettes/hdbscan.Rmd index 4a68db7..e55d1e6 100644 --- a/vignettes/hdbscan.Rmd +++ b/vignettes/hdbscan.Rmd @@ -8,7 +8,7 @@ vignette: > header-includes: \usepackage{animation} output: html_document --- -The dbscan package includes a fast implementation of Hierarchical DBSCAN (HDBSCAN) and its related algorithm(s) for the +The dbscan package [6] includes a fast implementation of Hierarchical DBSCAN (HDBSCAN) and its related algorithm(s) for the R platform. This vignette introduces how to interface with these features. To understand how HDBSCAN works, we refer to an excellent Python Notebook resource that goes over the basic concepts of the algorithm (see [ the SciKit-learn docs](http://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html)). For the sake of simplicity, consider the same sample dataset from the notebook: ```{r} @@ -162,5 +162,6 @@ One of the primary computational bottleneck with using HDBSCAN is the computatio 3. Campello, Ricardo JGB, Davoud Moulavi, and Joerg Sander. "Density-based clustering based on hierarchical density estimates." In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 160-172. Springer Berlin Heidelberg, 2013. 4. Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Jörg Sander. "Hierarchical density estimates for data clustering, visualization, and outlier detection." ACM Transactions on Knowledge Discovery from Data (TKDD) 10, no. 1 (2015): 5. 5. 
Karypis, George, Eui-Hong Han, and Vipin Kumar. "Chameleon: Hierarchical clustering using dynamic modeling." Computer 32, no. 8 (1999): 68-75. +6. Hahsler M, Piekenbrock M, Doran D (2019). "dbscan: Fast Density-Based Clustering with R." Journal of Statistical Software, 91(1), 1-30. doi: [10.18637/jss.v091.i01](https://doi.org/10.18637/jss.v091.i01)