prep for CRAN submission
jlmelville committed Sep 19, 2023
1 parent ba05fc7 commit 389be79
Showing 3 changed files with 63 additions and 47 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: RcppHNSW
Title: 'Rcpp' Bindings for 'hnswlib', a Library for Approximate Nearest Neighbors
Version: 0.4.1.9000
Version: 0.5.0
Authors@R: c(person("James", "Melville", email = "[email protected]",
role = c("aut", "cre")),
person("Aaron", "Lun", role = "ctb"),
47 changes: 25 additions & 22 deletions NEWS.md
@@ -1,32 +1,35 @@
# RcppHNSW 0.4.9000
# RcppHNSW 0.5.0

## New features

* For high-dimensional data, there can be a noticeable CPU overhead in copying
data out of the non-contiguous memory regions when row-wise data is used. If
you wish to provide data where each *column* of the input matrix contains an
item to be indexed/search then see the following additions to the API:
* For the class-based API: `addItemsCol`, `getAllNNsCol` and
`getAllNNsListCol` are the column-based equivalents of `addItems`,
`getAllNNs` and `getAllNNsList`, respectively. Note that the returned
nearest neighbor data from `getAllNNsCol` and `getAllNNsListCol` are *also*
stored by column, i.e. the matrices have dimensions `k x n` where `k` is the
number of neighbors, and `n` the number of items in the data being searched.
* For the function-based API, a new parameter `byrow` has been added to
`hnsw_knn`, `hnsw_build` and `hnsw_search`. By default this is set to `TRUE`
and indicates that the items in the input matrix are found in each row. To
pass column-stored items, set `byrow = FALSE`. Any matrices returned by
`hnsw_search` and `hnsw_knn` will now follow the convention provided by the
value of `byrow`: i.e. if `byrow = FALSE`, the matrices contain nearest
neighbor information in each column.
* new method: `getItems`, which returns a matrix of the data vectors in the
index with the specified integer identifiers. From a feature request made by
[d4tum](https://github.com/d4tum) (<https://github.com/jlmelville/rcpphnsw/issues/18>).
* Updated hnswlib to [version 0.7.0](https://github.com/nmslib/hnswlib/releases/tag/v0.7.0).
Note that I made some very minor changes to the code to silence some compiler warnings. These
changes have been submitted up-stream to the hnswlib project.
* For high-dimensional data, there can be a noticeable CPU overhead in copying data out of the
non-contiguous memory regions when row-wise data is used. If you wish to provide data where each
*column* of the input matrix contains an item to be indexed/search then see the following additions
to the API:
* For the class-based API: `addItemsCol`, `getAllNNsCol` and `getAllNNsListCol` are the
column-based equivalents of `addItems`, `getAllNNs` and `getAllNNsList`, respectively. Note that
the returned nearest neighbor data from `getAllNNsCol` and `getAllNNsListCol` are *also* stored
by column, i.e. the matrices have dimensions `k x n` where `k` is the number of neighbors, and
`n` the number of items in the data being searched.
* For the function-based API, a new parameter `byrow` has been added to `hnsw_knn`, `hnsw_build`
and `hnsw_search`. By default this is set to `TRUE` and indicates that the items in the input
matrix are found in each row. To pass column-stored items, set `byrow = FALSE`. Any matrices
returned by `hnsw_search` and `hnsw_knn` will now follow the convention provided by the value of
`byrow`: i.e. if `byrow = FALSE`, the matrices contain nearest neighbor information in each
column.
* new method: `getItems`, which returns a matrix of the data vectors in the index with the
specified integer identifiers. From a feature request made by [d4tum](https://github.com/d4tum)
(<https://github.com/jlmelville/rcpphnsw/issues/18>).
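
A minimal sketch of the `byrow` option described above, not taken from the package documentation. It assumes that `hnsw_knn` returns a list with `idx` and `dist` matrices; the data, sizes and variable names are made up for illustration.

```r
library(RcppHNSW)

# Illustrative data: 100 items with 8 features each, one item per row.
set.seed(42)
n <- 100
k <- 5
X <- matrix(rnorm(n * 8), nrow = n, ncol = 8)

# Default (byrow = TRUE): each row of the input is an item.
res_row <- hnsw_knn(X, k = k)

# Column-stored items: each column of the input is an item.
# t(X) is used here only to build a column-wise copy of the same data.
res_col <- hnsw_knn(t(X), k = k, byrow = FALSE)

# Per the note above, the result layout follows byrow:
# neighbors per row for res_row, neighbors per column for res_col.
dim(res_row$idx)
dim(res_col$idx)
```

Because R stores matrices in column-major order, data that is already held one item per column can be passed directly with `byrow = FALSE`, without a transpose.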

## Bug fixes and minor improvements

* The `progress` parameter in the functional interface no longer does anything. When
* The `progress` parameter in the functional interface no longer does anything. When
`verbose = TRUE`, a progress bar is no longer shown.
* Due to a breaking change in roxygen2 7.0.0, there was a missing package alias in the
documentation.

# RcppHNSW 0.4.1

61 changes: 37 additions & 24 deletions cran-comments.md
@@ -1,52 +1,65 @@
## Release Summary

This is a patch release to fix a valgrind error that was introduced with
the previous submission.
This is a patch release to fix various CRAN check errors.

## Test environments

* ubuntu 20.04 (on github actions), R 4.1.3, R 4.2.1, devel
* local ubuntu 22.04 R 4.2.1
* Windows Server 2022 (on github actions), R 4.1.3
* local Windows 11 build, R 4.2.1
* mac OS X Big Sur (on github actions) R 4.2.1
* ubuntu 22.04 (on github actions), R 4.2.3, R 4.3.1, devel
* local ubuntu 23.04 R 4.2.2
* Debian Linux, R-devel, GCC ASAN/UBSAN (via rhub)
* Debian Linux, R-release, GCC (via rhub)
* Ubuntu Linux 20.04.1 LTS, R-release, GCC (via rhub)
* Fedora Linux, R-devel, clang, gfortran (via rhub)
* Windows Server 2022 (on github actions), R 4.2.3, R 4.3.1
* Windows Server 2022, R-devel, 64 bit (via rhub)
* local Windows 11 build, R 4.3.1
* win-builder (devel)
* mac OS X Monterey (on github actions) R 4.3.1

## R CMD check results

There were no ERRORs or WARNINGs.

There was one NOTE:

checking installed package size ... NOTE
installed size is 5.4Mb
sub-directories of 1Mb or more:
libs 5.1Mb
N checking installed package size ...
installed size is 6.6Mb
sub-directories of 1Mb or more:
libs 6.3Mb

This is expected due to the use of C++ templates in hnswlib.

There was a message about possibly mis-spelled words in DESCRIPTION:

HNSW (2:28)

This is spelled correctly.

## CRAN checks

There are no ERRORs or WARNINGs.

There is a NOTE for all flavors about LazyData. This release fixes that NOTE.
There is a NOTE:

Check: C++ specification
Result: NOTE
Specified C++11: please drop specification unless essential

This submission fixes this.

There is a NOTE:

Check: Rd metadata
Result: NOTE
Invalid package aliases in Rd file ‘RcppHnsw-package.Rd’:
‘RcppHnsw-package’

There are four flavors with NOTEs about installed package size
(r-release-macos-arm64, r-release-macos-x86_64, r-oldrel-macos-arm64,
r-oldrel-macos-x86_64). This is expected and won't be fixed.
This submissions fixes this.

There is a valgrind issue. This releases fixes that issue.
There are four flavors with NOTEs about installed package size (r-release-macos-arm64,
r-release-macos-x86_64, r-oldrel-macos-arm64, r-oldrel-macos-x86_64). This is expected and won't be
fixed.

## Downstream dependencies

We checked 3 reverse dependencies (1 from CRAN + 2 from Bioconductor), comparing
R CMD check results across CRAN and dev versions of this package.
We checked 2 reverse dependencies (0 from CRAN + 2 from Bioconductor), comparing R CMD check
results across CRAN and dev versions of this package.

* We saw 0 new problems
* We failed to check 0 packages
* We saw 0 new problems
* We failed to check 0 packages
