Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump rust-polars to 0.43.1 #1230

Merged
merged 34 commits into from
Sep 21, 2024
Merged

Bump rust-polars to 0.43.1 #1230

merged 34 commits into from
Sep 21, 2024

Conversation

etiennebacher
Copy link
Collaborator

@etiennebacher etiennebacher commented Sep 11, 2024

Not going to address that very soon, just to ensure we do the bump on this branch

Changelog for 0.43.0: https://github.com/pola-rs/polars/releases/tag/rs-0.43.0
Changelog for 0.43.1: https://github.com/pola-rs/polars/releases/tag/rs-0.43.1

@eitsupi
Copy link
Collaborator

eitsupi commented Sep 12, 2024

There seem to be quite a few critical bugs in this version, so perhaps the next release will be made soon.
You may want to wait for that before working on it.

@etiennebacher etiennebacher changed the title Bump rust-polars to 0.43.0 Bump rust-polars to 0.43.1 Sep 12, 2024
@eitsupi
Copy link
Collaborator

eitsupi commented Sep 13, 2024

This update adds a broadcast rule so that Series of length 1 are no longer automatically broadcast. (#835 (comment))
Probably requires significant internal modifications.

@etiennebacher
Copy link
Collaborator Author

etiennebacher commented Sep 14, 2024

@eitsupi there are still a few errors in Rust in arrow_interop/to_rust.rs and series.rs. They are all due to failures in Series::try_from(). Could you take a look? I think those are due to https://github.com/pola-rs/polars/pull/18564/files.

Also, it would be great if you could double check my changes in Rust as I'm almost sure some things are not idiomatic. After that I can take care of the R side.

Probably requires significant internal modifications.

I guess so, let's see how it is when the remaining Rust errors are fixed.

@eitsupi
Copy link
Collaborator

eitsupi commented Sep 14, 2024

They are all due to failures in Series::try_from(). Could you take a look?

I fixed it so that it would compile successfully. All the fixes seem to be due to the introduction of PlSmallStr.

Also, it would be great if you could double check my changes in Rust as I'm almost sure some things are not idiomatic.

I think you did an excellent job. There were no areas that I felt particularly uncomfortable with.

Personally, I would prefer to use the %||% operator on the R side rather than the NULL exception handling on the Rust side, but since the original code is that way, I don't think it is a big problem (no such point in the next branch)

@eitsupi
Copy link
Collaborator

eitsupi commented Sep 14, 2024

Hmm, it seems that polars-parquet fails to compile for the arm64 targets (macOS/Linux) when the nightly feature is enabled.
I'm assuming it's an upstream issue, but I'm not sure what the cause is here, since the Python wheel (for arm64 macOS) build should have succeeded.

@eitsupi
Copy link
Collaborator

eitsupi commented Sep 14, 2024

I guess so, let's see how it is when the remaining Rust errors are fixed.

Does this seem to be the only one giving a broadcast error in the test log?

df = pl$DataFrame(
start = as.Date(c("2022-01-01", "2022-01-02", NA)),
end = as.Date("2022-01-03")
)

In fact, this is not a big deal, since Python's DataFrame constructor also fails unless all the data lengths match; the next branch even documents that it fails.

#' @param ... <[`dynamic-dots`][rlang::dyn-dots]>
#' Name-value pairs of objects to be converted to polars [Series]
#' by the [as_polars_series()] function.
#' Each [Series] will be used as a column of the [DataFrame].
#' All values must be the same length.

@etiennebacher
Copy link
Collaborator Author

Thanks for the help!

I'll explore the test failures to see what comes from this broadcasting change. I don't have a linux or macos machine to test those compilation failures now but let's see if it comes up in the issue tracker upstream.

@etiennebacher
Copy link
Collaborator Author

etiennebacher commented Sep 14, 2024

Pleasantly surprised by the few number of fixes to do, there are just the two failures you mentioned about broadcasting. Those do not always appear, for example this works:

pl$DataFrame(
    start = c("2022-01-01", "2022-01-02"),
    end = c("2022-01-03")
)

while this fails:

pl$DataFrame(
    start = c("2022-01-01", "2022-01-02"),
    end = as.Date(c("2022-01-03"))
)

I'll try to take a look in the next few days, unless you want to take care of that first.

@eitsupi
Copy link
Collaborator

eitsupi commented Sep 15, 2024

I'll try to take a look in the next few days

Perhaps it might be here, and in the match arm that calls to_series_then_lit(), branch off into length 1 and other cases, and call .first() in the case of length 1?

pub fn lit(robj: Robj) -> RResult<RPolarsExpr> {
let rtype = robj.rtype();
let rlen = robj.len();
fn to_series_then_lit(robj: Robj) -> RResult<pl::Expr> {
RPolarsSeries::any_robj_to_pl_series_result(robj)
.map_err(polars_to_rpolars_err)
.map(dsl::lit)
}
match (rtype, rlen) {
(Rtype::Null, _) => Ok(dsl::lit(pl::NULL)),
(Rtype::Raw, _) => Ok(dsl::lit(robj_to_binary_vec(robj)?)), // Raw in R is seen as a vector of bytes, in polars it is a Literal, not wrapped in a Series.
(_, rlen) if rlen != 1 => to_series_then_lit(robj),
(Rtype::List, _) => to_series_then_lit(robj),
(_, _) if robj_inherits(&robj, ["POSIXct", "PTime", "Date"]) => {
to_series_then_lit(robj)
}
(Rtype::Integers, 1) => {
let opt_val = robj.as_integer();
if let Some(val) = opt_val {
Ok(dsl::lit(val))
} else if robj.is_na() {
Ok(dsl::lit(pl::NULL).cast(pl::DataType::Int32))
} else {
unreachable!("internal error: could unexpectedly not handle this R value");
}
}
(Rtype::Doubles, 1) if robj.inherits("integer64") => {
let opt_val = robj.as_real();
if let Some(val) = opt_val {
let x = val.to_bits() as i64;
if x == crate::utils::BIT64_NA_ENCODING {
Ok(dsl::lit(pl::NULL).cast(pl::DataType::Int64))
} else {
Ok(dsl::lit(x))
}
} else {
unreachable!("internal error: could unexpectedly not handle this R value");
}
}
(Rtype::Doubles, 1) => {
let opt_val = robj.as_real();
if let Some(val) = opt_val {
Ok(dsl::lit(val))
} else if robj.is_na() {
Ok(dsl::lit(pl::NULL).cast(pl::DataType::Float64))
} else {
unreachable!("internal error: could unexpectedly not handle this R value");
}
}
(Rtype::Strings, 1) => {
if robj.is_na() {
Ok(dsl::lit(pl::NULL).cast(pl::DataType::String))
} else {
Ok(dsl::lit(robj.as_str().unwrap()))
}
}
(Rtype::Logicals, 1) => {
let opt_val = robj.as_bool();
if let Some(val) = opt_val {
Ok(dsl::lit(val))
} else if robj.is_na() {
Ok(dsl::lit(pl::NULL).cast(pl::DataType::Boolean))
} else {
unreachable!("internal error: could unexpectedly not handle this R value");
}
}
(Rtype::ExternalPtr, 1) => match () {
_ if robj.inherits("RPolarsSeries") => {
let s: RPolarsSeries =
unsafe { &mut *robj.external_ptr_addr::<RPolarsSeries>() }.clone();
Ok(pl::lit(s.0))
}
_ if robj.inherits("RPolarsExpr") => {
let expr: RPolarsExpr =
unsafe { &mut *robj.external_ptr_addr::<RPolarsExpr>() }.clone();
Ok(expr.0)
}
_ if robj_inherits(&robj, ["RPolarsThen", "RPolarsChainedThen"]) => unpack_r_eval(
R!("polars:::result({{robj}}$otherwise(polars::pl$lit(NULL)))"),
)
.and_then(r_expr_to_rust_expr)
.map(|expr| expr.0),
_ if robj_inherits(&robj, ["RPolarsWhen", "RPolarsChainedWhen"]) => rerr()
.plain("Cannot use a When or ChainedWhen-statement as Expr without a $then()"),
_ => rerr()
.bad_robj(&robj)
.plain("pl$lit() this ExternalPtr class is not currently supported"),
},
(_, _) => rerr().bad_robj(&robj).plain("unsupported R type "),
}
.map(RPolarsExpr)
.when("constructing polars literal from Robj")
}

This is a target to be completely rewritten in the next branch, so it will be a minimal modification and not worth much effort.

@etiennebacher
Copy link
Collaborator Author

Compiling failure on macOS is due to rust-lang/rust#129887, which should have been fixed in more recent nightly versions. I tried bumping the toolchain to 2024-09-19 to see if it works

@etiennebacher etiennebacher marked this pull request as ready for review September 21, 2024 13:06
@etiennebacher
Copy link
Collaborator Author

@eitsupi I didn't add much since last time but can you check the fix for literal values?

Comment on lines +90 to +95
(_, rlen) if robj_inherits(&robj, ["POSIXct", "PTime", "Date"]) => {
if rlen == 1 {
Ok(to_series_then_lit(robj)?.first())
} else {
to_series_then_lit(robj)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion this is not enough, because the case of length 1 should be handled for all R objects, not only for POSIXct.
So the length determination must be done after the converted Series in the to_series_then_lit function, not here.

Copy link
Collaborator

@eitsupi eitsupi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge them as they are now, since literal values are not likely to be used that often and I think a rewrite is needed to fundamentally solve the problem.

@eitsupi eitsupi merged commit cb3dd57 into main Sep 21, 2024
33 of 35 checks passed
@eitsupi eitsupi deleted the rust-polars-0.43.0 branch September 21, 2024 22:30
@etiennebacher etiennebacher mentioned this pull request Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants