Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_dupes() grouped data error fix #345

Merged
merged 6 commits into from
Mar 16, 2020

Conversation

jzadra
Copy link
Contributor

@jzadra jzadra commented Mar 12, 2020

Description

If data is grouped:

  1. Saves the grouping structure and then removes it before checking for duplicates.
  2. One warning per session is issued notifying user that get_dupes() isn't checking within groups but rather over the whole data set, however will still return an object with the same grouping structure.
  3. Groups are reapplied before returning dupes.

Added tests that check that

  1. The grouping structure of the original data is the same as the result.
  2. The output for grouped data is the same as the output for ungrouped data other than the grouping structure.

Related Issue

Addresses #329.

Example

x <- mtcars %>% group_by(cyl, gear)
get_dupes(x)

@codecov
Copy link

codecov bot commented Mar 12, 2020

Codecov Report

Merging #345 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #345   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          26     26           
  Lines         996   1002    +6     
=====================================
+ Hits          996   1002    +6
Impacted Files Coverage Δ
R/get_dupes.R 100% <100%> (ø) ⬆️

@jzadra
Copy link
Contributor Author

jzadra commented Mar 12, 2020

I'm not clear on exactly what codecov is if anyone can provide some feedback on what I've done wrong!

@sfirke
Copy link
Owner

sfirke commented Mar 14, 2020

codecov checks which lines of code are run when all the tests are run. Your PR has two "misses", that is there are two lines that are not run in any test: https://codecov.io/gh/sfirke/janitor/pull/345/diff?src=pr&el=tree#diff-Ui9nZXRfZHVwZXMuUg Looks like it's the grouped_df warning, that could be checked with a new test that looks at expect_message.

@sfirke
Copy link
Owner

sfirke commented Mar 14, 2020

Code itself looks great though! 👍 After reviewing it, now I wonder if it's impossible to test that chunk that codecov says is missing coverage, because the message only prints if interactive() and in the test environment interactive() returns false.

I don't see a way to fake an interactive environment to run that warning. The only idea I have is to just not have tests for those two lines and acknowledge that by ending those two lines with # nocov to tell codecov to ignore them.

@jzadra
Copy link
Contributor Author

jzadra commented Mar 14, 2020

Looks like you were correct about the interactive issue. I'll add #nocov to those lines.

…ta because codecov can't fake an interactive session.
@sfirke
Copy link
Owner

sfirke commented Mar 14, 2020

Looks like it didn't run, I'm guessing because interactive() would be FALSE? I think you can just add `# nocov to those two lines in the function itself.

@sfirke
Copy link
Owner

sfirke commented Mar 16, 2020

Excellent! Thanks @jzadra

@sfirke sfirke merged commit c2e2277 into sfirke:master Mar 16, 2020
sfirke added a commit that referenced this pull request Mar 16, 2020
@jzadra jzadra deleted the get_dupes_grouped_data_fix branch March 16, 2020 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants