-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
patentsview #112
Comments
Editor checks:
Editor commentsThanks for your submission, @crew102! 😁 I'm currently looking for reviewers. In the meantime here is the output of It is good practice to
✖ write unit tests for all functions, and all package code in general. 55% of code
lines are covered by test cases.
R/check-query.R:8:NA
R/check-query.R:9:NA
R/check-query.R:61:NA
R/flatten-pv-data.R:19:NA
R/flatten-pv-data.R:20:NA
... and 139 more lines
✖ avoid long code lines, it is bad for readability. Also, many people prefer editor
windows that are about 80 characters wide. Try make your lines shorter than 80
characters
R\check-query.R:12:1
R\check-query.R:20:1
R\check-query.R:38:1
R\search-pv.R:57:1
R\search-pv.R:71:1
... and 5 more lines
✖ avoid sapply(), it is not type safe. It might return a vector, or a list,
depending on the input data. Consider using vapply() instead.
R\flatten-pv-data.R:27:20
R\flatten-pv-data.R:32:18
R\print.R:14:8
R\query-dsl.R:64:3
R\query-dsl.R:67:3
... and 3 more lines
Using
Reviewers: @expectopatronum @poldham |
Thanks for running those commands for me, @maelle. I have:
The newest version of master is available here: crew102/patentsview@e6fdbc8 |
You're welcome @crew102! And nice work! 😺 |
Thanks @expectopatronum @poldham for accepting to review this package! |
@crew102 @expectopatronum @maelle Just a quick note that I am working on the review at the moment, so @crew102 don't worry that we have forgotten! |
Me too! So far I have one remark: I think that in the method
might be confusing. I'm not sure if for an R novice it is clear that one element is equivalent to a vector of length one. Maybe the error message can changed to something like "endpoint must be a single element and be one of:" I hope comments like this are ok here :) |
Agreed that this is confusing. At several points I felt conflicted between precision and clarity (I actually dropped the phrase "vector of length one" in several places, thinking that it may be confusing...I guess I didn't do that here though). I will update this so it is less confusing. |
@poldham @expectopatronum maybe useless since you've both been working on this but here is a friendly reminder that your reviews are due on Wednesday. 😸 |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 6 Review CommentsI'm very impressed by the code in the package, I think it is very well written and I enjoyed reviewing it! I only have a few remarks/comments.
Possibilities for improvement
|
Thanks a lot for your review, @expectopatronum. I will get working on integrating your feedback soon. |
Many thanks @expectopatronum ! 😸 |
Hi @expectopatronum - In addition to the notes/changes shown below, I have made the following three updates to
The markdown variant of README is actually generated from the rmarkdown variant, it just happens in a sort of atypical way. You can see the README.md target in the Makefile for details (https://github.com/crew102/patentsview/blob/master/Makefile#L8-L11).
I have added a unit test for
I have changed the wording so it is not so confusing
Good idea...I have added
There are some fields that you are not allowed to include in your query but can include in your
Looks like this is not in the documentation anywhere. I will add it. |
@poldham friendly reminder that your review was due last week. Do you think you'll soon have time to post it? Thanks. 😸 @expectopatronum @crew102 nice discussion&work! |
@expectopatronum, I have made the distinction between queryable vs retrievable fields clearer in the README (crew102/patentsview@070a241). |
@crew102 btw regarding the README an issue with it is that users installing the package from CRAN do not get to see it, so it might be good to have all important content in a vignette (as well / instead). |
Dear @crew102, Apologies for the delay... I was unexpectedly asked to do an overseas presentation and that took up a lot of time. Before I get into the reviewing part, let me say that it is fantastic that this package has been prepared for R as it could be an incredible resource. Most of my comments will be on thinking about the user experience for those who are R users and who may or may not know much about patent data. The comments are all intended to be constructive. In a discussion last week with colleagues at WIPO we already plan to start using the package in the analytics trainings around the world this year. But, as you will see below, I have some comments on the user experience. Code CommentsMy main code comment is on the requirement for users to write using One possiblility would be to try to mask the qry_funs$gt as simply If it is not possible to find a way forward on that at the moment could I suggest that the list funs are clearly documented in CMD CheckI receive a warning:
The tests ran OK VignettesI think the package merits some work on the vignettes to make it easier for the first time user. In both vignettes we see a description of moving from JSON to the DSL as the preferred approach. As the users will be coming in from R I think it makes sense to start with the DSL and maybe explain later on that there are other ways of doing this. I think (as an R user and sometime coder) that being confronted with
is a bit off putting in term of engaging with the package and is actually, using the favoured DSL, not necessary. There are of course a ton of things one could do with the package but could I suggest that the vignettes and examples focus on some of the main types of searches that patent researchers will normally do. Those would be:
In addition the locations endpoint is really really exciting (and I suspect folks will jump all over it with Leaflet and so on), but I found it difficult to get going with that. In terms of the vignettes it is of course up to you. On the patent side of things my own efforts to be straightforward are illustrated for lensr and opsr. Those were mainly written as part of the WIPO analytics training course for complete beginners. That may give you some ideas. In addition, rOpenSci packages such as rplos have good walkthroughs. @maelle monkeylearn walkthrough was really helpful for a newbiew as an other example. I think that makes all the difference in helping people to engage with the package. Getting to Grips with the FieldsI have seen the good addition of In the documentation for
Inventors and Assignees.Can I suggest that an explanation is needed somewhere that these are cleaned up data fields. I am not sure if there is a published account of the clean up algorithm yet but I saw the video. Not vital right now but important to know that the data has been cleaned and that the raw is also available for purists. Those are my main comments at the moment. I intend to take another look as I know there have already been some changes but wanted to be sure you received these given the unexpected delay. |
Hi @maelle , the README is actually the "patentsview" vignette (https://github.com/crew102/patentsview/blob/master/vignettes/patentsview.Rmd). The makefile renders the vignette to README.md and also to html (placing the necessary files in inst/doc). I know it's a little wonky how it happens, but the vignette still ends up in the right place (as far as I can tell). I just realized that I had |
Hi @poldham , thanks a lot for taking the time to review the package. I will review your comments and post a reply soon. |
@poldham thx for your review - can you give an estimate of number of hrs spent reviewing? |
Thanks again @poldham for reviewing the package. My responses to your comments are shown below. qry_funsIt sounds like you have two issues with the DSL - first, the fact that the functions are stored in a list and thus can be verbose to use, and second, the lack of documentation on the functions themselves.
R CMD Check
This should be fixed now. Vignettes
I agree that the DSL may be more natural for R users. However, I think the first example of
I have added a new example of searching for patents belonging to a given assignee (crew102/patentsview@8cf0c1dd). I think the REAMDE now has all of the example types that you mentioned, but let me know if you think a few more would be helpful. Getting to Grips with the Fields
Similar to the case of the query functions, I was hoping to just refer the interested reader to the online field tables (e.g., in the README and
I have linked to
The location group does not exist in the patents endpoint, so that's probably why you're not seeing it online (assuming you are just looking at the patents table). It exists for locations, inventors, and assignees endpoints (e.g., http://www.patentsview.org/api/location.html#field_list)
I typically like to use plurals for parameters which the user can specify multiple choices for. For example,
I have added "common name" and "description" to Inventors and Assignees.
Yes, it is very important that people realize that they are not getting raw data by default. I have added additional explanation about this in the README/patentsview vignette (crew102/patentsview@938d3167). |
OK, no problem. I have to do some final checks as well. I'll also be updating cran-comments.md. I'll take care of those items once the review process is officially over. |
@crew102 @maelle. Thanks @crew102. This is all fine with me except that I still think that the opening sentence of the README as raised under point 1 in Vignettes above still needs a sentence or so to better orient the reader that a) they are dealing with a JSON API and b) that there is an easy to use DSL. On the disambiguation point I am assuming that you mean http://www.patentsview.org/workshop/ which is fine with me. I think the reference to the disambiguation method process is important to give users confidence in using the disambiguated data as inventor disambiguation has been an intractable and non-trivial problem for many many years. Looking forward to seeing this join the ropensci collection and to spreading the word and using it. Many congratulations! |
Hi @poldham , I added a link to the disambiguation workshop (crew102/patentsview@9b137e28). In my mind the documentation surrounding the DSL is complete...As per your request, the README references the DSL in the first section. I believe that any commentary beyond that would be calling too much attention to the DSL early on, and introduce too many new concepts at one point (which I tried to avoid by breaking it up into various sections). Users who want more info regarding the DSL are referred to the vignette, which should address your issue that the JSON will scare off novice users. I have added a small note that lets users know that the query language uses JSON (crew102/patentsview@bc0ef6a7). |
@crew102 approved! 😸 I've added you to the ropensci organization, could you transfer the repository? After that you'll have to change all URLs and I'll activate continuous integration via rOpenSci account (if not today, then tomorrow). Besides, please add the rOpenSci footer |
Hi @maelle , great news! Thanks again to you and the two reviewers, @poldham and @expectopatronum. I made small edits to the documentation which I have yet to push. I will push those, add the footer, trasfer the repo, change the urls, and then you can activate CI with your account. After that is finished, can I submit to CRAN? |
Yes! |
|
Also, does ropensci have an AppVeyor account? If so, can we run builds on AppVeyor as well as Travis? |
Oops sorry yes this is a current pandoc bug so you can wait until the pandoc bug is fixed to add itn (will tell you) Yes I'll activate your repo on appveyor as well😊 |
Can you confirm that I have permission to create repos on ropensci? Github thinks I don't have permission. |
@crew102 you mean that when you tried transferring the repo, it didn't work? |
Have you received my invitation to the patentsview team in the rOpenSci organization @crew102 ? |
Hi @maelle , I have transferred the repo and it's asking me "Please select any teams you wish to have access to ropensci/patentsview." I chose the "patentsview" group. I hope this is correct? |
No problem - I thought the process was well done. I'm changing the urls now and will open a pr to integrate those into ropensci/patentsview. I'm also changing the badge urls for travis and appveyor. |
Cool! The repo was activated on Travis and I think on Appveyor for which the badges will be No need for a PR, you can commit and push directly well unless you prefer doing it with a branch. |
We can discuss further in the thread but I'll close this now. Thanks @crew102 for your work, and @expectopatronum & @poldham for your reviews! |
Sounds good..I did a PR but I'll probably just push directly to ropensci/patentsview master in the future. Thanks again for your help. |
I'd like to tweet about the package, are you on Twitter? |
I'm not on twitter but a tweet would be nice! |
Hi @maelle , is it ok if I submit patentsview to CRAN without the ropensci footer in the readme? I'm getting this warning when I build on winbuilder (which I assume will be a showstopper for CRAN admins):
|
@crew102 yes with that Pandoc bug the best is to remove it before CRAN submission and then put it back afterwards. Hopefully that bug will be resolved soon. Would you be interested in writing a blog post about |
Sure, I could do a post. I'd like to submit to CRAN first. I'll let you know when that process is over. Are there any guidelines/suggestions for writing the blog posts? |
Awesome! I'll let @stefaniebutland take over reg. the blog post (she's rOpenSci community manager), she'll be able to tell you more regarding timing & content of the blog post. In any case I look forward to reading it! Good luck with the CRAN submission, don't hesitate to ask any question if needed. |
Thanks a lot! |
Summary
This package provides a wrapper around the patentsview api, which serves USPTO patent data that has been disambiguated. There is one main function to search and download data from the API (
search_pv
) and several other helper functions.URL for the package (the development repository, not a stylized html page):
master branch on github
Who is the target audience?
Researchers who want to interact with patent data. There are several specific use cases of patent data that have been listed in this thread of the rOpenSci discussion board.
Are there other R packages that accomplish the same thing? If so, what is different about yours?
I know of two R packages that one can use to download patent data: lensr and a development version of opsr.
lensr
is the most similar, as it can be used to download USPTO patent data. I do not know of any other packages that wrap patentsview, which is the only service that provides USPTO data that has been disambiguated.Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
with a high-level description in the package root or ininst/
.Detail
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings:R CMD check
passes with one warning regarding this being my first package submission.I believe the package follows the guidelines, but there are a few items I would like to highlight. These may or may not be issues:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
The text was updated successfully, but these errors were encountered: