Releases: sfirke/janitor
janitor 2.2.1
This is a trivial bugfix release whose only purpose is fixing a test that was failing on CRAN due to the way timezones are handled in Debian. In making that fix (PR #584), we made a small - technically breaking - improvement to a function that works with SAS dates. >99.9% of janitor users should be unaffected by this release.
Breaking changes
sas_numeric_to_date() now warns for timezones other than "UTC" due to the way that SAS loads timezones, and the default timezone for sas_numeric_to_date() is now "UTC" instead of "" (#583, @billdenney)
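For readers unfamiliar with SAS numerics, here is a minimal base-R sketch of the arithmetic involved. It is not janitor's implementation; the helper names `sas_date_sketch()` and `sas_datetime_sketch()` are hypothetical. It assumes the standard SAS convention that dates count days, and datetimes count seconds, from 1 January 1960.

```r
# Hypothetical helpers for illustration only; not part of janitor.
# SAS counts dates in days and datetimes in seconds from 1960-01-01.
sas_origin <- "1960-01-01"

sas_date_sketch <- function(days) {
  as.Date(days, origin = sas_origin)
}

sas_datetime_sketch <- function(seconds, tz = "UTC") {
  # The numeric value is an offset from the origin; tz controls how the
  # resulting instant is labelled and printed.
  as.POSIXct(seconds, origin = sas_origin, tz = tz)
}

sas_date_sketch(c(0, 366))
format(sas_datetime_sketch(86400), "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Defaulting `tz` to "UTC" in the sketch mirrors the new default described above, so the printed clock time matches the stored number of seconds.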
janitor 2.2.0
Breaking changes
These are all minor breaking changes resulting from enhancements and are not expected to affect the vast majority of users.
- A new ... argument was added to row_to_names(), preceding the remove_row argument, as part of the new find_header() functionality. If code previously used remove_row as an unnamed argument, it will now error. If code previously used the unsupported behavior of passing anything other than TRUE or FALSE to remove_row, unexpected results may occur.
- Microsoft Excel incorrectly has a leap day on 29 February 1900 (see https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year). excel_numeric_to_date() did not account for this error, and now it does. Dates returned from excel_numeric_to_date() that precede 1 March 1900 will now be one day later compared to previous versions (i.e. what was 1 Feb 1900 is now 2 Feb 1900), and dates that Excel presents as 29 Feb 1900 will become as.POSIXct(NA). (#423, thanks @billdenney for fixing)
- A minor breaking change is that the time zone is now always set for excel_numeric_to_date() and convert_date(). The default timezone is Sys.timezone(); previously it was an empty string (""). (#422, thanks @billdenney for fixing)
- get_dupes() results are now sorted first by descending order of dupe_count, then alphabetically by sorting variables. (#493)
- There are several minor breaking changes resulting from enhancements to adorn_ns():
  - The addition of the new argument format_func means that previous calls relying on ,,, as shorthand to get to the ... column selection argument may now require an extra comma.
  - adorn_ns() now defaults to displaying numbers of >3 digits with big.mark = ",", as part of the default value of the new format_func argument. E.g., 1234 is now 1,234.
  - adorn_ns() no longer prints leading whitespace when position = "front"; this is not a visible change in the printed result and it would be rare that this affects any code.
- When the first column of the data.frame input to adorn_totals() is a factor and a totals row is added to the bottom, that column now remains a factor, with "Total" or other user-specified totals name added to its factor levels (#494).
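The Excel leap-day change above is easier to see with numbers. The following base-R sketch is an illustration only, not janitor's code; `excel_serial_to_date_sketch()` is a hypothetical name. It assumes Excel's 1900 date system, returns Date values rather than date-times, and ignores fractional days and time zones.

```r
# Hypothetical helper for illustration only; not part of janitor.
# In Excel's 1900 date system, serial 1 is 1900-01-01 and serial 60 is the
# non-existent 1900-02-29, so serials before and after 60 need different origins.
excel_serial_to_date_sketch <- function(serial) {
  out <- rep(as.Date(NA), length(serial))
  before_gap <- !is.na(serial) & serial >= 1 & serial < 60
  after_gap <- !is.na(serial) & serial >= 61
  out[before_gap] <- as.Date(floor(serial[before_gap]), origin = "1899-12-31")
  out[after_gap] <- as.Date(floor(serial[after_gap]), origin = "1899-12-30")
  out  # serial 60 (and anything below 1) stays NA
}

excel_serial_to_date_sketch(c(1, 33, 59, 60, 61, 44927))
```

Serial 33 maps to 2 Feb 1900 here, whereas a single fixed origin of 1899-12-30 would give 1 Feb 1900, which is the one-day shift the note describes.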
New features
- row_to_names() now has a new helper function, find_header(), to help find the row that contains the names. It can be used by passing row_number = "find_header". See the documentation of row_to_names() and find_header() for more examples. (fix #429)
- remove_empty() has a new argument, cutoff, which allows rows or columns to be removed if at least the cutoff fraction of the data are missing. (fix #446, thanks to @jzadra for suggesting the feature and @billdenney for fixing)
- A new function sas_numeric_to_date() has been added to convert SAS dates, times, and datetimes to R objects (fix #475, thanks to @billdenney for suggesting and implementing)
- A new function single_value() has been added to ensure that only a single value or missing values are present in a vector (fix #428)
- A new function get_one_to_one() has been added to find columns that map 1:1 to each other, even if the values within the columns differ (fix #291, @billdenney)
- adorn_ns() contains a new format_func argument so that the user can format the Ns to their liking, e.g., changing the big.mark character. (#444)
- clean_names() can now be called on a database connection in a dbplyr code pipeline (#467)
Minor features
- make_clean_names() (and therefore clean_names()) issues a warning if the mu or micro symbol is in the names and it is not or may not be handled by a replace argument value. (#448, thanks @IndrajeetPatil for reporting and @billdenney for fixing) The rationale is that standard transliteration would convert "[mu]g" to "mg" when it would more typically be converted to "ug" for use as a unit. A new, unexported constant (janitor:::mu_to_u) was added to help with mu to "u" replacements.
- excel_numeric_to_date() now warns when times are converted to NA due to hours that do not exist because of daylight saving time (fix #420, thanks @Geomorph2 for reporting and @billdenney for fixing). It also warns when inputs are not positive, since Excel only supports values down to 1 (#423).
- If a tabyl() or similar data.frame is sorted (e.g., with dplyr::arrange()), then has adorn_totals() and/or adorn_percentages() called on it, followed by adorn_ns(), the Ns will be sorted correctly to match the tabyl they're being adorned on. (fix #407)
- clean_names() now supports all object types that have either names or dimnames (#481, @DanChaltiel).
- adorn_pct_formatting() uses the locale-dependent value of decimal.mark as a decimal separator, e.g., in locales where getOption("OutDec") is "," it will print percentages in the format "12,34%". This character can also be set manually with options(OutDec = ","). (#451)
- adorn_totals(where = "row") now preserves factor class and levels of the first column of the input data.frame (#494).
- make_clean_names() now allows for duplicate names to be returned by specifying TRUE to the new allow_dupes argument (#495, @JasonAizkalns).
- Some warning messages now have classes so that they can be specifically suppressed with suppressWarnings(..., class="the_class_to_suppress"). To find the class of a warning you typically must look at the code where the warning is raised. (#452, thanks to @mgacc0 for suggesting and @billdenney for fixing)
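As a rough illustration of how classed warnings behave, here is a base-R sketch that does not use janitor at all. The function `noisy_convert()` and the class name "demo_out_of_range" are made up for this example, and it assumes an R version in which suppressWarnings() accepts the classes argument. Catching the warning and inspecting class() is one possible alternative to reading the source.

```r
# Hypothetical function and class name for illustration only.
noisy_convert <- function(x) {
  warning(warningCondition("value out of range", class = "demo_out_of_range"))
  x
}

# Suppress only warnings carrying the chosen class; others still surface.
result <- suppressWarnings(noisy_convert(5), classes = "demo_out_of_range")

# One way to discover a warning's class without reading the source:
classes_seen <- tryCatch(noisy_convert(5), warning = function(w) class(w))
classes_seen
```

A warning without that class is not muffled by the same call, so unrelated problems remain visible.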
Bug fixes
- adorn_percentages() was refactored for compatibility with dplyr package versions >= 1.1.0 (#490)
- When a numeric variable is supplied as the 2nd variable (column) or 3rd variable (list) of a tabyl, the resulting columns or list are now sorted in numeric order, not alphabetic. (#438, thanks @daaronr for reporting and @mattroumaya for fixing)
- tabyl() now succeeds when the second variable is named "n" (#445).
- adorn_ns() can act on a single-column data.frame input with custom Ns supplied if the variable to adorn is specified with ... (#456).
- adorn_totals() on a one-way tabyl preserves the tabyl_type attribute so that a subsequent call to adorn_pct_formatting() works correctly on one-way tabyls (#523).
janitor 2.1.0
Miscellaneous bug fixes and minor improvements that accumulated since version 2.1.0 in April. Most notably:
- A bug was fixed in round_half_up() where some specific numbers would not round up due to floating point precision problems (#396)
- Improvements were made to adorn_totals()
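To show why floating point precision matters for rounding halves upward, here is a base-R sketch of one common approach. It is offered as an illustration under stated assumptions, not as the package's code; `round_half_up_sketch()` is a hypothetical name and the tolerance value is an assumption.

```r
# Hypothetical helper for illustration only.
round_half_up_sketch <- function(x, digits = 0) {
  scale <- 10^digits
  # A small tolerance (an assumption here) nudges values that sit just below
  # a half because of binary representation, e.g. 0.285 * 100.
  tol <- sqrt(.Machine$double.eps)
  sign(x) * trunc(abs(x) * scale + 0.5 + tol) / scale
}

round(2.5)                     # base R rounds halves to even
round_half_up_sketch(2.5)
round_half_up_sketch(0.285, 2)
```

Without the tolerance, a value such as 0.285 can land just under the halfway point after scaling and be rounded down.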
janitor 2.0.1
A patch to janitor v2.0.0 that fixes a bug where make_clean_names() and thus clean_names() failed on some machines depending on the user's installation of the stringi package (#365).
janitor 2.0.0
janitor 2.0.0 has many bug fixes and improvements; see the NEWS.md file for changes.
This is version 2.0.0 primarily because there are minor breaking changes, including to the flagship function clean_names(). There is also a lot of change & new content, so 2.0.0 makes sense for the magnitude of change as well as the level of breakage.
janitor v1.2.0
CRAN required resubmission to address a trivial issue in a test caused by changes to base::sample(). While I was at it, I incorporated new functionality & bug fixes accumulated over the last 9 months since 1.1.1 went to CRAN.
Notable additions include:
- compare_df_cols()
- make_clean_names()
- round_to_fraction()
- chisq.test() and fisher.test() now run on a 2-way tabyl
Plus other minor additions, tweaks, and bug fixes. See NEWS.md for more.
Bill Denney made this release possible and wrote most of the good new code. Thanks Bill!
janitor v1.1.1
Patches a bug introduced in version 1.1.0 where excel_numeric_to_date() would fail if given an input vector containing an NA value.
janitor v1.1.0
This release was requested by CRAN to address some minor package dependency issues. It also contains an update to excel_numeric_to_date() to support capturing fractional days as time and a new function row_to_names().
janitor v1.0.0
This version 1.0 release of janitor marks the package as stable. Lots of changes since the last substantial release, v.0.3.0 in May 2017. See the NEWS page for a summary of changes. The official janitor documentation site is http://sfirke.github.io/janitor/.
v0.3.0
janitor 0.3.0 (Release date: 2017-05-06)
Release summary
The primary purpose of this release is to maintain accuracy given the changes to the dplyr package, upon which janitor is built, in dplyr version 0.6.0. This update also contains a number of minor improvements.
Critical: if you update the package dplyr to version 0.6.0, you must update janitor to version 0.3.0 to ensure accurate results from janitor's tabyl() function. This is due to a change in the behavior of dplyr's _join functions (discussed in #111).
janitor 0.3.0 is compatible with this new version of dplyr as well as old versions of dplyr back to 0.5.0. That is, updating janitor to 0.3.0 does not necessitate an update to dplyr 0.6.0.
Breaking changes
- The functions add_totals_row and add_totals_col were combined into a single function, adorn_totals() (#57). The add_totals_ functions are now deprecated and should not be used.
- The first argument of adorn_crosstab() is now "dat" instead of "crosstab" (indicating that the function can be called on any data.frame, not just a result of crosstab())
Features
Major
- Exported the %>% pipe from magrittr (#107).
Deprecated the following functions:
- use_first_valid_of() - use dplyr::coalesce() instead
- convert_to_NA() - use dplyr::na_if() instead
- add_totals_row() and add_totals_col() - replaced by the single function adorn_totals()
Minor
- adorn_totals() and ns_to_percents() can now be called on data.frames that have non-numeric columns beyond the first one (those columns will be ignored) (#57)
- adorn_totals("col") retains factor class in 1st column if 1st column in the input data.frame was a factor
Bug fixes
- clean_names() now handles leading spaces (#85)
- adorn_crosstab() and ns_to_percents() work on a 2-column data.frame (#89)
- adorn_totals() now works on a grouped tibble (#97)
- Long variable names with spaces no longer break tabyl() and crosstab() (#87)
- An NA_ column in the result of a crosstab() will appear at the last column position (#109)