Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Possibility to strip datafusion error name #10186

Merged
merged 6 commits into from
Apr 27, 2024
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
170 changes: 112 additions & 58 deletions datafusion/common/src/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -281,64 +281,7 @@ impl From<GenericError> for DataFusionError {

impl Display for DataFusionError {
fn fmt(&self, f: &mut Formatter) -> std::fmt::Result {
match *self {
DataFusionError::ArrowError(ref desc, ref backtrace) => {
let backtrace = backtrace.clone().unwrap_or("".to_owned());
write!(f, "Arrow error: {desc}{backtrace}")
}
#[cfg(feature = "parquet")]
DataFusionError::ParquetError(ref desc) => {
write!(f, "Parquet error: {desc}")
}
#[cfg(feature = "avro")]
DataFusionError::AvroError(ref desc) => {
write!(f, "Avro error: {desc}")
}
DataFusionError::IoError(ref desc) => {
write!(f, "IO error: {desc}")
}
DataFusionError::SQL(ref desc, ref backtrace) => {
let backtrace: String = backtrace.clone().unwrap_or("".to_owned());
write!(f, "SQL error: {desc:?}{backtrace}")
}
DataFusionError::Configuration(ref desc) => {
write!(f, "Invalid or Unsupported Configuration: {desc}")
}
DataFusionError::NotImplemented(ref desc) => {
write!(f, "This feature is not implemented: {desc}")
}
DataFusionError::Internal(ref desc) => {
write!(f, "Internal error: {desc}.\nThis was likely caused by a bug in DataFusion's \
code and we would welcome that you file an bug report in our issue tracker")
}
DataFusionError::Plan(ref desc) => {
write!(f, "Error during planning: {desc}")
}
DataFusionError::SchemaError(ref desc, ref backtrace) => {
let backtrace: &str =
&backtrace.as_ref().clone().unwrap_or("".to_owned());
write!(f, "Schema error: {desc}{backtrace}")
}
DataFusionError::Execution(ref desc) => {
write!(f, "Execution error: {desc}")
}
DataFusionError::ResourcesExhausted(ref desc) => {
write!(f, "Resources exhausted: {desc}")
}
DataFusionError::External(ref desc) => {
write!(f, "External error: {desc}")
}
#[cfg(feature = "object_store")]
DataFusionError::ObjectStore(ref desc) => {
write!(f, "Object Store error: {desc}")
}
DataFusionError::Context(ref desc, ref err) => {
write!(f, "{}\ncaused by\n{}", desc, *err)
}
DataFusionError::Substrait(ref desc) => {
write!(f, "Substrait error: {desc}")
}
}
self.format_err(f)
}
}

Expand Down Expand Up @@ -419,6 +362,9 @@ impl DataFusionError {
Self::Context(description.into(), Box::new(self))
}

/// Strips backtrace out of the error message
/// If backtrace enabled then error has a format "message" [`Self::BACK_TRACE_SEP`] "backtrace"
/// The method strips the backtrace and outputs "message"
pub fn strip_backtrace(&self) -> String {
self.to_string()
.split(Self::BACK_TRACE_SEP)
Expand Down Expand Up @@ -450,6 +396,99 @@ impl DataFusionError {
#[cfg(not(feature = "backtrace"))]
"".to_owned()
}

fn get_error_name(&self) -> &'static str {
match self {
DataFusionError::ArrowError(_, _) => "Arrow error: ",
#[cfg(feature = "parquet")]
DataFusionError::ParquetError(_) => "Parquet error: ",
#[cfg(feature = "avro")]
DataFusionError::AvroError(_) => "Avro error: ",
#[cfg(feature = "object_store")]
DataFusionError::ObjectStore(_) => "Object Store error: ",
DataFusionError::IoError(_) => "IO error: ",
DataFusionError::SQL(_, _) => "SQL error: ",
DataFusionError::NotImplemented(_) => "This feature is not implemented: ",
DataFusionError::Internal(_) => "Internal error: ",
DataFusionError::Plan(_) => "Error during planning: ",
DataFusionError::Configuration(_) => "Invalid or Unsupported Configuration: ",
DataFusionError::SchemaError(_, _) => "Schema error: ",
DataFusionError::Execution(_) => "Execution error: ",
DataFusionError::ResourcesExhausted(_) => "Resources exhausted: ",
DataFusionError::External(_) => "External error: ",
DataFusionError::Context(_, _) => "",
DataFusionError::Substrait(_) => "Substrait error: ",
}
}

fn format_err(&self, f: &mut Formatter) -> std::fmt::Result {
let error_name = self.get_error_name();
match *self {
DataFusionError::ArrowError(ref desc, ref backtrace) => {
let backtrace = backtrace.clone().unwrap_or("".to_owned());
write!(f, "{error_name}{desc}{backtrace}")
}
#[cfg(feature = "parquet")]
DataFusionError::ParquetError(ref desc) => {
write!(f, "{error_name}{desc}")
}
#[cfg(feature = "avro")]
DataFusionError::AvroError(ref desc) => {
write!(f, "{error_name}{desc}")
}
DataFusionError::IoError(ref desc) => {
write!(f, "{error_name}{desc}")
}
DataFusionError::SQL(ref desc, ref backtrace) => {
let backtrace: String = backtrace.clone().unwrap_or("".to_owned());
write!(f, "{error_name}{desc:?}{backtrace}")
}
DataFusionError::Configuration(ref desc) => {
write!(f, "{error_name}{desc}")
}
DataFusionError::NotImplemented(ref desc) => {
write!(f, "{error_name}{desc}")
}
DataFusionError::Internal(ref desc) => {
write!(f, "{error_name}{desc}.\nThis was likely caused by a bug in DataFusion's \
code and we would welcome that you file an bug report in our issue tracker")
}
DataFusionError::Plan(ref desc) => {
write!(f, "{error_name}{desc}")
}
DataFusionError::SchemaError(ref desc, ref backtrace) => {
let backtrace: &str =
&backtrace.as_ref().clone().unwrap_or("".to_owned());
write!(f, "{error_name}{desc}{backtrace}")
}
DataFusionError::Execution(ref desc) => {
write!(f, "{error_name}{desc}")
}
DataFusionError::ResourcesExhausted(ref desc) => {
write!(f, "{error_name}{desc}")
}
DataFusionError::External(ref desc) => {
write!(f, "{error_name}{desc}")
}
#[cfg(feature = "object_store")]
DataFusionError::ObjectStore(ref desc) => {
write!(f, "{error_name}{desc}")
}
DataFusionError::Context(ref desc, ref err) => {
write!(f, "{error_name}{desc}\ncaused by\n{}", *err)
}
DataFusionError::Substrait(ref desc) => {
write!(f, "{error_name}{desc}")
}
}
}

/// Strips error description out of the error message
/// Error has a format `{error_name}{`error`}
/// The method strips the `error_name` from level and outputs 'error``
pub fn strip_error_name(self) -> String {
self.to_string().replace(self.get_error_name(), "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better if we could avoid including the error message in the first place rather than removing it later. There is the risk of accidentally removing text that was not added by impl Display for DataFusionError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this in the first place.

  • We can fix the formatter so it will skip the error name for some format options, that doesn't work with to_string() which has default formatter.

  • we can tweak impl Display with cargo feature and return name as blank if the feature enabled and that looks little bit overcompicated.

  • we can refer to env variable and depending on the value define the error name but this might be expensive.

  • we can probably introduce a DF config param?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure exactly what the usecase is, but if seems like if the only need is:

Sometimes it is required to fetch the original message without Datafusion error name.

Rather than trying to fix the message after it has been created, how about we add a function that just provides the underlying message? Something like

impl DataFusionError {
  /// retrieve the underlying error message for this error, without the
  /// type prefix. 
  ///
  /// For example, if err.to_string() returns `Error during planning: My Query Error` 
  /// this function would return only `My Query Error`
  fn message(&self) -> &str { .. }
}

And then the impl could be

impl Display for DataFusionError {
    fn fmt(&self, f: &mut Formatter) -> std::fmt::Result {
        match *self {
            DataFusionError::ArrowError(ref desc, ref backtrace) => {
                let backtrace = backtrace.clone().unwrap_or("".to_owned());
                write!(f, "Arrow error: {}{backtrace}", self.message())
            }
            #[cfg(feature = "parquet")]
            DataFusionError::ParquetError(ref desc) => {
                write!(f, "Parquet error: {}", message)
            }
...
}

Maybe fn message(&self) would have to return Cow<str> or something to avoid copying in the external cases....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the simplest solution.

Our use case (Comet) is that we are using thiserror and we are delegating to DataFusionError for the error string and it currently uses the Display trait.

@comphead I don't know enough about thiserror yet to know if/how we could call the new method proposed here. Do you know?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it works even without thiserror magic, we can call .message() method in From<> error conversion

}
}

/// Unwrap an `Option` if possible. Otherwise return an `DataFusionError::Internal`.
Expand Down Expand Up @@ -778,6 +817,21 @@ mod test {
);
}

#[test]
#[cfg(not(feature = "backtrace"))]
fn test_strip_error_name() {
let res: Result<(), DataFusionError> = plan_err!("Err");
let res = res.unwrap_err();
assert_eq!(res.strip_error_name(), "Err");

// Test only top level stripped
let res: Result<(), DataFusionError> = plan_err!("Err");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these are two scenarios that you are testing, should they go in two different test cases ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @edmondop I'd say the second scenario is variation of first one. Not sure if every variation requires its own test tbh

let res2: Result<(), DataFusionError> =
exec_err!("{}", res.unwrap_err().strip_backtrace());
let res2 = res2.unwrap_err();
assert_eq!(res2.strip_error_name(), "Error during planning: Err");
}

/// Model what happens when implementing SendableRecordBatchStream:
/// DataFusion code needs to return an ArrowError
fn return_arrow_error() -> arrow::error::Result<()> {
Expand Down