Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rebi - an R-Client for Europe PMC #29

Closed
9 of 10 tasks
njahn82 opened this issue Mar 11, 2016 · 18 comments
Closed
9 of 10 tasks

rebi - an R-Client for Europe PMC #29

njahn82 opened this issue Mar 11, 2016 · 18 comments

Comments

@njahn82
Copy link
Member

njahn82 commented Mar 11, 2016

    1. What does this package do? (explain in 50 words or less)

rebi gives access to Europe PubMed Central, an indexing service for life-
science publications that is provided by the European Bioinformatics Institute
(EBI). This client can be used to search metadata and full-texts, retrieve
reference sections, citations, text-mined terms, and links to other EBI
databases or external sources like the Wikipedia.

    1. Paste the full DESCRIPTION file inside a code block below.
Package: rebi
Version: 0.0.999
Maintainer: 'Najko Jahn' <[email protected]>
Author: Najko Jahn
License: GPL-3
Title: rebi -- R Interface for Europe PMC RESTful Web Service
URL: http://github.com/njahn82/rebi/
BugReports: http://github.com/njahn82/rebi/issues
Description: R Interface to Europe PMC RESTful Web Service. Europe PMC covers
    life science literature and it gives access to open access full texts.
    Coverage is not only restricted to Europe, but articles and abstracts are
    indexed from all over the world. As a partner in the PMC International
    (PMCi), Europe PMC ingests all PubMed content and extends its index with
    other sources, including Agricola, a bibliographic database of citations to
    the agricultural literature, or Biological Patents. In addition to
    bibliographic data, rebi gives automated access to citations and references
    that were indexed by Europe PMC. Links between life-science literature and
    other EBI databases, including ENA, PDB or ChEMBL are also accessible.
    External links provide a mechanism to link out to external providers such as
    Wikipedia or research data repositories.
LazyLoad: yes
LazyData: yes
VignetteBuilder: knitr
Depends:
  R (>= 3.00)
Imports:
    httr,
    jsonlite,
    plyr,
    xml2
RoxygenNote: 5.0.1
Suggests:
    testthat,
    knitr,
    rmarkdown

    1. URL for the package (the development repository, not a stylized html page)

https://github.com/njahn82/rebi

    1. What data source(s) does it work with (if applicable)?

It only works with Europe PMC's Articles RESTful API.
https://europepmc.org/RestfulWebService

    1. Who is the target audience?

Life-science researchers and students, scholars that study the life sciences,
librarians.

    1. Are there other R packages that accomplish the same thing? If so, what is different about yours?

To my knowledge, this is the only package that implements Europe PMC's
Articles RESTful API. Since there is a considerable overlap with PubMed/PubMed
Central, the rentrez package could be used as well to fetch bibliographic
information. fulltext package gives access to supplementary material
deposited in Europe PMC. oai package could be used to retrieve metadata from
Europe PMC via its OAI-PMH interface.

    1. Check the box next to each policy below, confirming that you agree. These are mandatory.
    • This package does not violate the Terms of Service of any service it interacts with.
    • The repository has continuous integration with Travis CI and/or another service
    • The package contains a vignette
    • The package contains a reasonably complete README with devtools install instructions
    • The package contains unit tests
    • The package only exports functions to the NAMESPACE that are intended for end users
    1. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
    • Are there any package dependencies not on CRAN?
    • Do you intend for this package to go on CRAN?
    • Does the package have a CRAN accepted license?
    • Did devtools::check() produce any errors or warnings? If so paste them below.
    1. Please add explanations below for any exceptions to the above:
    1. If this is a resubmission following rejection, please explain the change in circumstances.

This package has been available through rOpenSci since summer 2013. Because I
made major upgrades in the last days that reflect the rOpenSci guidance on how
to write packages, I thought it might be helpful to re-submit this package.

See also the discussion in the forum:
https://discuss.ropensci.org/t/major-package-update-of-rebi-and-question-regarding-re-submission/333/

@noamross
Copy link
Contributor

Reviewers: @toph-allen

@sckott
Copy link
Contributor

sckott commented Mar 27, 2016

@toph-allen - hey there, it's been 16 days, please get your review in by Apr 01, thanks 😺 (ropensci-bot)

@toph-allen
Copy link

Here's my review. N.B. According to the submission above, the package has been available through rOpenSci since 2013, but has recently made major upgrades that reflect rOpenSci guidance on package authorship, so it probably is already sufficient. Let me know if you have any questions, or any areas where you'd like me to clarify more.

Installation

  • No issues installing on OS X.

General comments on code

  • All sample code ran correctly.
  • The code generally seems concise and sensibly divided into different functions. As far as I can tell, the package addresses the problem space pretty comprehensively, and the code is clear to understand and pretty self-explanatory. Nice work. 👍

Comments on specific functions

  • epmc_ftxt() includes a data_src = "pmc" argument, but the function is hard-coded to always use PMC as a source--this argument is only used to throw an error if anything other than PMC is used. Perhaps better to remove this argument and clarify in documentation, as using args(epmc_ftxt) might sugget to a user that they can use another source.

Improvements to code style?

  • There are a few local package variables in use: supported_dbs and supported_semantic_types are used in one function each, and are defined in that function's file; supported_data_src is used in many functions, but defined in epmc_details.r. I'm not sure if there's an accepted best practice for this, but it seems it'd make more sense to either put these inside functions or put them in their own file, say supported.r.
  • While code itself is self-explanatory enough that I don't think this is a major issue, sometimes comments are a little sparse or terse (i.e. just a word or two).

Comments on documentation?

  • The top-level package help file is rebi-package.r. Seems like it would be better to have this as rebi.r and change @name rebi-package to @name rebi, so that ?rebi brings up that document.
  • There are one or two places where documentation could use a touch of clarifying.
    • E.g. epmc_citations.r could be clearer as to whether it fetches works cited by a publication vs. works citing a publication.
    • epmc_refs.r has the same problem, especially because it refers to "citations" in its documentation. This could be simply a reflection of the language in the underlying API's documentation, but I think it could use a little clarification.

Do tests pass locally?

  • Yes

Documentation and examples run without issue

  • Yes

ROpenSci Packaging Guidelines

  • Package naming
    • It's a fine name, though I don't know its provenance.
  • Function/variable naming
    • Snake case, self-explanatory. All functions are prefixed with "epmc_", which seems unnecessary? It seems, however, that this at least in part to avoid naming conflicts with defunct functions. The new naming scheme, while a bit redundant, seems more consistent than the old one.
  • README
    • README.md is clear and very comprehensive, explaining what the package can do and how to do it.
  • Code of conduct
    • Yep, included
  • Documentation
    • Uses roxygen2
  • Package dependencies
  • Testing
    • Uses testthat
  • Continuous integration
    • Yep, TravisCI
  • Console messages
    • No instances of cat() or print() in the package
  • Recommended software scaffolding
    • Uses all recommended packages where relevant

(I spent about 1.5–2 hours on the review).

@noamross
Copy link
Contributor

Thanks for the review @toph-allen!

@njahn82 Let us know when you've incorporated / responded to @toph-allen's comments and we can merge the updated version of rebi into the rOpenSci repo.

@njahn82
Copy link
Member Author

njahn82 commented Mar 29, 2016

Thank you for the review. I will incorporate the suggested improvements in the next days.

@njahn82
Copy link
Member Author

njahn82 commented May 3, 2016

Hi @toph-allen, sorry for coming back to you after such a long time. Thank you again for your helpful comments and suggestions to improve my package. I have a new version ready that include your remarks. I'll go over them in the following.

Please let me know if you have any further comments.

epmc_ftxt() includes a data_src = "pmc" argument, but the function is hard-coded to always use PMC as a source--this argument is only used to throw an error if anything other than PMC is used. Perhaps better to remove this argument and clarify in documentation, as using args(epmc_ftxt) might sugget to a user that they can use another source.

As suggested, I removed the data_src = "pmc" argument from the function epmc_ftxt()

ropensci/europepmc#2

There are a few local package variables in use: supported_dbs and supported_semantic_types are used in one function each, and are defined in that function's file; supported_data_src is used in many functions, but defined in epmc_details.r. I'm not sure if there's an accepted best practice for this, but it seems it'd make more sense to either put these inside functions or put them in their own file, say supported.r.

Agreed. I moved functions that are commonly used to utils.r. Thanks to your suggestion, I also started to dry out my code by

  • defining the uri and the REST API path in utils.r only,
  • use of roxygen's @inheritParams
  • Thanks to @stubben , the default limit is now 25. Batch_size is by default 1000 and is dropped as function parameter. The function make_queries in utils.r implements paging and is used by epmc_search(), make_path provides a more generic way to generate paths for epmc_ref(), epmc_citations(), epmc_db(), epmc_tm()

While code itself is self-explanatory enough that I don't think this is a major issue, sometimes comments are a little sparse or terse (i.e. just a word or two).

Agreed. Added more comprehensive explanations in some places.

The top-level package help file is rebi-package.r. Seems like it would be better to have this as rebi.r and change @name rebi-package to @name rebi, so that ?rebi brings up that document.

I moved rebi-package.r to rebi.r

There are one or two places where documentation could use a touch of clarifying. E.g. epmc_citations.r could be clearer as to whether it fetches works cited by a publication vs. works citing a publication.

As suggested, I clarified both functions. In the documentation of epmc_refs() it says now:

#'  Get references for a given publication
#'
#'  This function retrieves all the works listed in the bibliography of a given
#'  article.

epmc_citations()

#' Get citations for a given publication
#'
#' The function finds works that cite a given publication.

Package naming: It's a fine name, though I don't know its provenance.

It comes from the acronym European Bioinformatics Institute (EBI), which hosts Europe PMC. Do you think europepmc as package name would be more appropriate since the EBI offers many other services that are not covered by this package?

Function/variable naming All functions are prefixed with "epmc_", which seems unnecessary?

I followed the example of other rOpenSci packages like rcrossref or aRxiv. Is there any best practise available if and when to prefix functions?

Again, many thanks for your helpful comments and suggestions.

@noamross
Copy link
Contributor

Quick editor's comment about function naming: We recommend prefixing (e.g, epmc_) to prevent conflicts and ease autocompletion when function names are likely to conflict with other packages. In this case, this is certainly the case, as search(), ref(), etc., will conflict with other packages and even the base R namespace.

@noamross
Copy link
Contributor

@toph-allen - Please let us know if you have responses to the changes, thanks!

@toph-allen
Copy link

toph-allen commented May 25, 2016

I'm so sorry I took even longer to respond to your changes—I had been budgeting more time to go through it than it actually took.

Reading through your responses to my suggestions, and looking through the accompanying changes to the code, I don't think I have any further issues.

Regarding the naming of your packages:

Do you think europepmc as package name would be more appropriate since the EBI offers many other services that are not covered by this package?

I think that renaming the package europepmc or epmc would make its name more obviously connected with its main functions, and with the function prefixes. Those would be concrete benefits over the current name. It might be a bit of a hassle (or it might not be more than a "find and replace" and some touch-up)—and, you might just be fond of the current name. I'll leave that decision up to you.

All seems in order to me!

@njahn82
Copy link
Member Author

njahn82 commented May 25, 2016

Great, I would prefer to rename the package to europepmc as I have not initially submitted it to CRAN yet, but plan to do this within the next days.

Many thanks for reviewing this package!

@noamross
Copy link
Contributor

Thanks, @toph-allen for reviewing and @njahn82 for keeping up this package!

@njahn82 I am getting one local error for a test that is skipped on Travis:

1. Failure (at test_epmc_citations.r#22): epmc_citations returns ---------------
epmc_citations("13814508") does not match 'No citing documents found'. Actual value: "Error in rebi_GET(path = path) : Not Found (HTTP 404).\n"

Once that's fixed, and you've made all the internal changes for changing the name, I think we're good to go. You should be able to apply your changes to the package in the ropensci repo via a pull request. I'm not sure about whether you have the ability to change that repo's name. @sckott: heads up, we may need to see that a name change populates through our systems properly.

@njahn82
Copy link
Member Author

njahn82 commented May 25, 2016

Thanks for letting me know. I realized that the Europe PMC API has started to give a 404 when no citations were found, which is not a nice behavior.

Compare
http://www.ebi.ac.uk/europepmc/webservices/rest/MED/13814508/citations
with
http://www.ebi.ac.uk/europepmc/webservices/rest/MED/9843981/citations

I will fix the error messages and corresponding tests.

@sckott
Copy link
Contributor

sckott commented May 26, 2016

we may need to see that a name change populates through our systems properly.

okay, not sure how to handle this when there's a fork and original version - I'll look into this

@njahn82
Copy link
Member Author

njahn82 commented May 30, 2016

I prepared the package for name change.

@sckott To get rid of the fork, I would prefer to transfer repository ownership to ropensci first. Then we could do the repository name change.

What do you think?

@sckott
Copy link
Contributor

sckott commented May 30, 2016

Okay, sounds good!

@sckott
Copy link
Contributor

sckott commented May 31, 2016

name changed, https://github.com/ropensci/europepmc

@njahn82
Copy link
Member Author

njahn82 commented May 31, 2016

Great, all references to the former package name should now be changed to europepmc.

Many thanks for all your comments and suggested improvements, I really appreciate your help.

@ropensci-review-bot
Copy link
Collaborator

Checks for riem (v0.3.0.9000)

git hash: a4501ec8

  • ✔️ Package is already on CRAN.
  • ✖️ does not have a 'CITATION' file.
  • ✔️ has a 'codemeta.json' file.
  • ✖️ does not have a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 100%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 2)


1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 4 files) and
  • 1 authors
  • 2 vignettes
  • no internal data file
  • 7 imported packages
  • 3 exported functions (median 15 lines of code)
  • 9 non-exported functions in R (median 9 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 4 28.3
files_vignettes 2 85.7
files_tests 12 92.5
loc_R 107 12.0
loc_vignettes 120 31.3
loc_tests 1984 94.1
num_vignettes 2 89.2
n_fns_r 12 16.1
n_fns_r_exported 3 12.9
n_fns_r_not_exported 9 20.1
n_fns_per_file_r 2 25.6
num_params_per_fn 1 1.6 TRUE
loc_per_fn_r 12 35.4
loc_per_fn_r_exp 15 35.6
loc_per_fn_r_not_exp 9 27.1
rel_whitespace_R 26 22.6
rel_whitespace_vignettes 66 55.5
rel_whitespace_tests 1 28.1
doclines_per_fn_exp 21 15.7
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 7 27.1

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


2. goodpractice and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

R-CMD-check

GitHub Workflow Results

name conclusion sha date
R-CMD-check success a4501e 2022-02-08

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following note:

  1. checking package dependencies ... NOTE
    Package suggested but not available for checking: ‘forecast’

Test coverage with covr

Package coverage: 100

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found the following 17 potential issues:

message number of times
Lines should not be more than 80 characters. 17


Package Versions

package version
pkgstats 0.0.3.88
pkgcheck 0.0.2.237


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants