
Releases: sfirke/janitor

janitor 2.2.1

22 Dec 17:58

This is a trivial bugfix release whose only purpose is to fix a test that was failing on CRAN because of the way timezones are handled in Debian. In making that fix (PR #584), we made a small, technically breaking improvement to a function that works with SAS dates. More than 99.9% of janitor users should be unaffected by this release.

Breaking changes

  • sas_numeric_to_date() now warns for timezones other than "UTC" due to the way that SAS loads timezones, and the default timezone for sas_numeric_to_date() is now "UTC" instead of "" (#583, @billdenney)
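For context on the SAS change above: SAS stores date values as days since 1 January 1960 and datetime values as seconds since midnight on that day. The base-R sketch below illustrates that arithmetic using the new "UTC" default. It is not janitor's implementation, and the variable names are only illustrative.

```r
# SAS date values count days since 1960-01-01.
sas_date <- 366
r_date <- as.Date(sas_date, origin = "1960-01-01")

# SAS datetime values count seconds since 1960-01-01 00:00:00.
sas_datetime <- 86400 + 3600  # one day plus one hour, in seconds
r_datetime <- as.POSIXct(sas_datetime, origin = "1960-01-01", tz = "UTC")

print(r_date)      # 1961-01-01 (1960 was a leap year)
print(r_datetime)  # 1960-01-02 01:00:00 UTC
```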

janitor 2.2.0

03 Feb 16:19

Breaking changes

These are all minor breaking changes resulting from enhancements and are not expected to affect the vast majority of users.

  • A new ... argument was added to row_to_names(), preceding the remove_row argument, as part of the new find_header() functionality. Code that previously passed remove_row as an unnamed (positional) argument will now error. Code that relied on the unsupported behavior of passing anything other than TRUE or FALSE to remove_row may give unexpected results.

  • Microsoft Excel incorrectly has a leap day on 29 February 1900 (see https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year). excel_numeric_to_date() did not account for this error, and now it does. Dates returned from excel_numeric_to_date() that precede 1 March 1900 will now be one day later compared to previous versions (i.e. what was 1 Feb 1900 is now 2 Feb 1900), and dates that Excel presents as 29 Feb 1900 will become as.POSIXct(NA). (#423, thanks @billdenney for fixing)

  • The time zone is now always set for excel_numeric_to_date() and convert_date(). The default timezone is now Sys.timezone(); previously it was an empty string (""). (#422, thanks @billdenney for fixing)

  • get_dupes() results are now sorted first by descending order of dupe_count, then alphabetically by sorting variables. (#493)

  • There are several minor breaking changes resulting from enhancements to adorn_ns():

    • The addition of the new argument format_func means that previous calls relying on ,,, as shorthand to get to the ... column selection argument may now require an extra comma.
    • adorn_ns() now defaults to displaying numbers with more than 3 digits using big.mark = ",", as part of the default value of the new format_func argument. E.g., 1234 is now 1,234.
    • adorn_ns() no longer prints leading whitespace when position = "front"; this is not a visible change in the printed result, and it would rarely affect any code.
  • When the first column of the data.frame input to adorn_totals() is a factor and a totals row is added to the bottom, that column now remains a factor, with "Total" or other user-specified totals name added to its factor levels (#494).
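The leap-year correction described above can be sketched in base R. The helper excel_serial_to_date_sketch() is hypothetical (not janitor's code): it assumes whole-number serials of at least 1 from Excel's default 1900 date system, ignores times of day, and returns base Date values, with NA for Excel's nonexistent 29 February 1900.

```r
# Hypothetical helper (not part of janitor): convert whole-number Excel serials
# from the 1900 date system to Date, correcting for the phantom 29 Feb 1900.
excel_serial_to_date_sketch <- function(serial) {
  # From serial 61 (1 March 1900) onward, origin 1899-12-30 absorbs the
  # phantom leap day.
  out <- as.Date(serial, origin = "1899-12-30")
  # Serials 1-59 (1 Jan to 28 Feb 1900) come before the phantom day, so they
  # need an origin one day later.
  early <- !is.na(serial) & serial < 60
  out[early] <- as.Date(serial[early], origin = "1899-12-31")
  # Serial 60 is Excel's nonexistent 29 Feb 1900.
  out[!is.na(serial) & serial == 60] <- NA
  out
}

excel_serial_to_date_sketch(c(1, 33, 59, 60, 61, 25569))
```

Serial 33 maps to 2 February 1900 here, matching the one-day shift described above; using the 1899-12-30 origin for every serial would have given 1 February 1900.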

New features

  • row_to_names() now has a new helper function, find_header(), to help find the row that contains the names. It can be used by passing row_number="find_header". See the documentation of row_to_names() and find_header() for more examples. (fix #429)

  • remove_empty() has a new argument, cutoff, which allows rows or columns to be removed if at least the cutoff fraction of the data are missing. (fix #446, thanks to @jzadra for suggesting the feature and @billdenney for fixing)

  • A new function sas_numeric_to_date() has been added to convert SAS dates, times, and datetimes to R objects (fix #475, thanks to @billdenney for suggesting and implementing)

  • A new function single_value() has been added to ensure that only a single value or missing values are present in a vector (fix #428)

  • A new function get_one_to_one() has been added to find columns that map 1:1 to each other, even if the values within the columns differ (fix #291, @billdenney)

  • adorn_ns() contains a new format_func argument so that the user can format the Ns to their liking, e.g., changing the big.mark character. (#444)

  • clean_names() can now be called on a database connection in a dbplyr code pipeline (#467)
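The cutoff rule for remove_empty() can be illustrated with a small base-R sketch. The helper drop_mostly_empty_cols() is hypothetical and handles columns only; it drops a column when at least the cutoff fraction of its values are NA, so the default cutoff = 1 drops only completely empty columns.

```r
# Hypothetical helper (not janitor's implementation): drop columns in which
# at least `cutoff` (a fraction in (0, 1]) of the values are missing.
drop_mostly_empty_cols <- function(df, cutoff = 1) {
  frac_missing <- colMeans(is.na(df))  # fraction of NA values per column
  df[, frac_missing < cutoff, drop = FALSE]
}

df <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, NA, NA, 4),  # 75% missing
  c = c(NA, NA, NA, NA)  # 100% missing
)
names(drop_mostly_empty_cols(df))                 # "a" "b"
names(drop_mostly_empty_cols(df, cutoff = 0.75))  # "a"
```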

Minor features

  • make_clean_names() (and therefore clean_names()) issues a warning if the names contain the mu or micro symbol and it is not, or may not be, handled by a value of the replace argument. (#448, thanks @IndrajeetPatil for reporting and @billdenney for fixing) The rationale is that standard transliteration would convert "[mu]g" to "mg", when it would more typically be converted to "ug" for use as a unit. A new, unexported constant (janitor:::mu_to_u) was added to help with mu to "u" replacements.

  • excel_numeric_to_date() now warns when times are converted to NA due to hours that do not exist because of daylight saving time (fix #420, thanks @Geomorph2 for reporting and @billdenney for fixing). It also warns when inputs are not positive, since Excel only supports values down to 1 (#423).

  • If a tabyl() or similar data.frame is sorted (e.g., with dplyr::arrange()) and then passed to adorn_totals() and/or adorn_percentages() followed by adorn_ns(), the Ns are now sorted correctly to match the tabyl they are adorning. (fix #407)

  • clean_names() now supports all object types that have either names or dimnames (#481, @DanChaltiel).

  • adorn_pct_formatting() uses the locale-dependent value of decimal.mark as a decimal separator; e.g., in locales where getOption("OutDec") is ",", it will print percentages in the format "12,34%". This character can also be set manually with options(OutDec = ","). (#451)

  • adorn_totals(where = "row") now preserves factor class and levels of the first column of the input data.frame (#494).

  • make_clean_names() now allows for duplicate names to be returned by specifying TRUE to the new allow_dupes argument (#495, @JasonAizkalns).

  • Some warning messages now have classes so that they can be specifically suppressed with suppressWarnings(..., class="the_class_to_suppress"). To find the class of a warning, you typically must look at the code where the warning is raised. (#452, thanks to @mgacc0 for suggesting and @billdenney for fixing)
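Classed warnings of this kind can be created and selectively suppressed in base R. The sketch below uses a made-up class name, demo_coercion_warning, rather than any class janitor actually emits, and it needs R 4.0.0 or later for the classes argument of suppressWarnings().

```r
# A function that signals a warning carrying a custom class
# ("demo_coercion_warning" is a made-up name for illustration).
noisy_identity <- function(x) {
  warning(warningCondition("input was coerced", class = "demo_coercion_warning"))
  x
}

# Only warnings inheriting from the named class are muffled.
quiet <- suppressWarnings(noisy_identity(1), classes = "demo_coercion_warning")

# A non-matching class lets the warning through, so tryCatch() sees it.
leaked <- tryCatch(
  suppressWarnings(noisy_identity(1), classes = "some_other_class"),
  warning = function(w) class(w)[1]
)
```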

Bug fixes

  • adorn_percentages() was refactored for compatibility with dplyr package versions >= 1.1.0 (#490)

  • When a numeric variable is supplied as the 2nd variable (column) or 3rd variable (list) of a tabyl, the resulting columns or list are now sorted in numeric order, not alphabetic. (#438, thanks @daaronr for reporting and @mattroumaya for fixing)

  • tabyl() now succeeds when the second variable is named "n" (#445).

  • adorn_ns() can act on a single-column data.frame input with custom Ns supplied if the variable to adorn is specified with ... (#456).

  • adorn_totals() on a one_way tabyl preserves the tabyl_type attribute so that a subsequent call to adorn_pct_formatting() works correctly on one-way tabyls (#523).
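The difference between alphabetic and numeric ordering behind #438 is easy to see in base R. This snippet only illustrates the two orderings; it is not the code janitor uses to order tabyl columns or list elements.

```r
values <- c(10, 2, 1)

# Treating the numbers as text sorts them character by character ("alphabetic").
# method = "radix" uses C-locale ordering, so the result does not depend on locale.
alphabetic <- sort(as.character(values), method = "radix")

# Sorting the numbers before converting them to labels gives numeric order.
numeric_order <- as.character(sort(values))

# factor() on a numeric vector also builds its levels in numeric order.
numeric_levels <- levels(factor(values))
```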

janitor 2.1.0

07 Jan 14:57

Miscellaneous bug fixes and minor improvements that accumulated since version 2.0.1 in April. Most notably:

  • A bug was fixed in round_half_up() where some specific numbers would not round up due to floating point precision problems (#396).
  • Improvements were made to adorn_totals()
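For context on the round_half_up() fix above: base R's round() follows the IEC 60559 convention of rounding a trailing 5 to the even digit, so round(2.5) is 2, whereas round_half_up() rounds halves up. The base-R sketch below shows one common approach (not necessarily janitor's exact source): scale, add 0.5 plus a small tolerance, truncate, and restore the sign. The tolerance, sqrt(.Machine$double.eps), is meant to absorb the kind of floating-point representation error behind #396; in this sketch, negative halves round away from zero.

```r
# Sketch of half-up rounding (halves move away from zero).
round_half_up_sketch <- function(x, digits = 0) {
  sign_x <- sign(x)
  scaled <- abs(x) * 10^digits
  # The tiny tolerance pushes values that land just below a .5 boundary,
  # because of binary representation error, over that boundary.
  scaled <- trunc(scaled + 0.5 + sqrt(.Machine$double.eps))
  sign_x * scaled / 10^digits
}

round(c(0.5, 1.5, 2.5))                 # 0 2 2 (base R: half to even)
round_half_up_sketch(c(0.5, 1.5, 2.5))  # 1 2 3 (half up)
```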

janitor 2.0.1

12 Apr 14:33

A patch to janitor v2.0.0 that fixes a bug where make_clean_names() and thus clean_names() failed on some machines depending on the user's installation of the stringi package (#365).

janitor 2.0.0

09 Apr 01:25

janitor 2.0.0 has many bug fixes and improvements; see the NEWS.md file for changes.

This is version 2.0.0 primarily because there are minor breaking changes, including to the flagship function clean_names(). There is also a lot of change and new content, so the 2.0.0 version number reflects the magnitude of change as well as the level of breakage.

janitor v1.2.0

21 Apr 04:21

CRAN required resubmission to address a trivial issue in a test caused by changes to base::sample(). While I was at it, I incorporated new functionality & bug fixes accumulated over the last 9 months since 1.1.1 went to CRAN.

Notable additions include:

  • compare_df_cols()
  • make_clean_names()
  • round_to_fraction()
  • chisq.test() and fisher.test() now run on a 2-way tabyl

Plus other minor additions, tweaks, and bug fixes. See NEWS.md for more.

Bill Denney made this release possible and wrote most of the good new code. Thanks Bill!

janitor v1.1.1

31 Jul 02:42

Patches a bug introduced in version 1.1.0 where excel_numeric_to_date() would fail if given an input vector containing an NA value.

janitor v1.1.0

29 Jul 12:07

This release was requested by CRAN to address some minor package dependency issues. It also contains an update to excel_numeric_to_date() to support capturing fractional days as time and a new function row_to_names().
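In Excel's serial date format, the integer part counts days and the fractional part is the fraction of a 24-hour day, so 0.5 is noon. A minimal base-R sketch of that conversion follows. It assumes serials on or after 1 March 1900 (serial 61), so the 1899-12-30 origin applies, and it interprets times in UTC; it is not janitor's implementation.

```r
# Excel serial with a fractional part: the .5 means half a day, i.e. 12:00.
serial <- 43000.5

# Convert days to seconds and count from Excel's effective origin
# (valid for serials from 61 onward).
excel_datetime <- as.POSIXct(serial * 86400, origin = "1899-12-30", tz = "UTC")

format(excel_datetime, "%H:%M")  # "12:00"
```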

janitor v1.0.0

22 Mar 13:07

This version 1.0 release of janitor marks the package as stable. There have been many changes since the last substantial release, v0.3.0 in May 2017; see the NEWS page for a summary. The official janitor documentation site is http://sfirke.github.io/janitor/.

v0.3.0

06 May 06:30

janitor 0.3.0 (Release date: 2017-05-06)

Release summary

The primary purpose of this release is to maintain accuracy given the changes in version 0.6.0 of dplyr, the package upon which janitor is built. This update also contains a number of minor improvements.

Critical: if you update the package dplyr to version 0.6.0, you must update janitor to version 0.3.0 to ensure accurate results from janitor's tabyl() function. This is due to a change in the behavior of dplyr's _join functions (discussed in #111).

janitor 0.3.0 is compatible with this new version of dplyr as well as old versions of dplyr back to 0.5.0. That is, updating janitor to 0.3.0 does not necessitate an update to dplyr 0.6.0.

Breaking changes

  • The functions add_totals_row and add_totals_col were combined into a single function, adorn_totals(). (#57). The add_totals_ functions are now deprecated and should not be used.
  • The first argument of adorn_crosstab() is now "dat" instead of "crosstab" (indicating that the function can be called on any data.frame, not just a result of crosstab())

Features

Major

  • Exported the %>% pipe from magrittr (#107).

Deprecated the following functions:

  • use_first_valid_of() - use dplyr::coalesce() instead
  • convert_to_NA() - use dplyr::na_if() instead
  • add_totals_row() and add_totals_col() - replaced by the single function adorn_totals()

Minor

  • adorn_totals() and ns_to_percents() can now be called on data.frames that have non-numeric columns beyond the first one (those columns will be ignored) (#57)
  • adorn_totals("col") retains factor class in 1st column if 1st column in the input data.frame was a factor

Bug fixes

  • clean_names() now handles leading spaces (#85)
  • adorn_crosstab() and ns_to_percents() work on a 2-column data.frame (#89)
  • adorn_totals() now works on a grouped tibble (#97)
  • Long variable names with spaces no longer break tabyl() and crosstab() (#87)
  • An NA_ column in the result of a crosstab() will appear at the last column position (#109)