Tell user which variable caused error when quantifying #42

TomNicholas · 2020-11-01T23:25:45Z

While debugging the error caused by #40 , I found that if pint throws an error whilst quantifying one variable in a dataset, then ds.pint.quantify throws that error without telling your which variable caused it. This is not very helpful for the the user who is trying to convert their xarray data to be pint-backed, it would be much better if the exception were caught and re-raised, something like this

for variable in ds:
    try:
        _decide_units(unit, registry, attr)
    except Exception as err:
        print(f"Error whilst quantifying {variable}")
        raise(err)

I tried to implement this, but got stuck when trying to unpack the dictionary comprehension here

pint-xarray/pint_xarray/accessors.py

Lines 501 to 505 in 62c15c5

    
           units = { 
        
               name: _decide_units(unit, registry, attr) 
        
               for name, (unit, attr) in zip_mappings(units, unit_attrs).items() 
        
               if unit is not None or attr is not None 
        
           }

I'm probably just being dumb, but how exactly is this unpacked loop + condition different from the original dictionary comprehension?

units = {}
for name, (unit, attr) in zip_mappings(units, unit_attrs).items():
    if unit is not None or attr is not None:
        units[name] = _decide_units(unit, registry, attr)

The latter causes test failures but the former doesn't...

The text was updated successfully, but these errors were encountered:

keewis · 2020-11-01T23:39:13Z

error reporting is a good point. However, I'm not a big fan of these unpacked comprehensions (which technically are not equivalent), so I'd like to find a way to keep the comprehension while still getting an error that is informative enough. Unless we want to add a name parameter to _decide_units that might be pretty difficult, though.

As for the failure, units is the name of the passed units, and if you add a new units variable you can't reference the old one.

TomNicholas · 2020-11-01T23:45:31Z

As for the failure, units is the name of the passed units, and if you add a new units variable you can't reference the old one.

Thank you! I knew it was going to be something silly.

However, I'm not a big fan of these unpacked comprehensions (which technically are not equivalent)

Why? This comprehension goes over 4 lines, and (in my opinion) is not particularly easy to understand. What's the advantage of not unpacking it?

Unless we want to add a name parameter to _decide_units that might be pretty difficult, though.

It seems like it would be bad practice to pass a variable into a function for the sole purpose of keeping track of the context from which that function was called.

keewis · 2020-11-02T01:23:38Z

I usually find comprehensions easier to read: there's only a single place to look for a description of the new values or a list of the applied filters. Of course, comprehensions can be written in a way that makes it harder to read. Would it help if we assign a name to the result of the zip_mappings call?

The technical difference between both options is that while the comprehension will collect all items and only then create the new container, in the loop the container will have to grow on each assignment because it doesn't know the total number of elements. Python will work around that by allocating more than is necessary, which means that there's less of an overhead.

However, we don't expect users to have a lot of variables (I didn't measure, but I would guess this is not much of an issue with less than a hundred elements) which makes this discussion a matter of taste.

It seems like it would be bad practice to pass a variable into a function for the sole purpose of keeping track of the context from which that function was called

exactly, which is why I am trying to find a better way. I'm not sure there is one, though.

TomNicholas · 2020-11-02T12:06:58Z

I usually find comprehensions easier to read: there's only a single place to look for a description of the new values or a list of the applied filters.

Yeah fair enough.

Even just changing some variable names would help, e.g.

possible_new_units = zip_mappings(units, unit_attrs)
new_units = { 
    name: _decide_units(unit, registry, attr) 
    for name, (unit, attr) in possible_new_units.items() 
    if (unit and attr) is not None 
}

the comprehension will collect all items and only then create the new container, in the loop the container will have to grow

Yes because of the generator underlying the comprehension, right. But I agree this doesn't really matter - the largest datasets I've ever seen still only have ~70 variables.

I am trying to find a better way. I'm not sure there is one, though.

I was trying to find out whether catching then immediately re-raising with context added is "acceptable" or not. I think it's fine to do this

possible_new_units = zip_mappings(units, unit_attrs)
new_units = {}
for name, (unit, attr) in possible_new_units.items():
    if unit is not None or attr is not None:
        try:
            new_units[name] = _decide_units(unit, registry, attr)
        except Exception as e:
            raise ValueError(f"Failed to assign units to variable {name}") from e

which then means instead of just this non-variable-specific error

UndefinedUnitError                        Traceback (most recent call last)
/marconi_work/FUA34_SOLBOUT4/tnichola/pycode/pint-xarray/pint_xarray/accessors.py in quantify(self, units, unit_registry, **unit_kwargs)
    505                 try:
--> 506                     new_units[name] = _decide_units(unit, registry, attr)
    507                 except Exception as e:

/marconi_work/FUA34_SOLBOUT4/tnichola/pycode/pint-xarray/pint_xarray/accessors.py in _decide_units(units, registry, unit_attribute)
    134         # TODO option to read and decode units according to CF conventions (see MetPy)?
--> 135         units = registry.parse_units(unit_attribute)
    136     elif isinstance(units, Unit):

/marconi_work/FUA34_SOLBOUT4/tnichola/anaconda3/envs/working/lib/python3.7/site-packages/pint/registry.py in parse_units(self, input_string, as_delta, case_sensitive)
   1100             input_string = p(input_string)
-> 1101         units = self._parse_units(input_string, as_delta, case_sensitive)
   1102         return self.Unit(units)

/marconi_work/FUA34_SOLBOUT4/tnichola/anaconda3/envs/working/lib/python3.7/site-packages/pint/registry.py in _parse_units(self, input_string, as_delta, case_sensitive)
   1314 
-> 1315         return super()._parse_units(input_string, as_delta, case_sensitive)
   1316 

/marconi_work/FUA34_SOLBOUT4/tnichola/anaconda3/envs/working/lib/python3.7/site-packages/pint/registry.py in _parse_units(self, input_string, as_delta, case_sensitive)
   1128         for name in units:
-> 1129             cname = self.get_name(name, case_sensitive=case_sensitive)
   1130             value = units[name]

/marconi_work/FUA34_SOLBOUT4/tnichola/anaconda3/envs/working/lib/python3.7/site-packages/pint/registry.py in get_name(self, name_or_alias, case_sensitive)
    645         if not candidates:
--> 646             raise UndefinedUnitError(name_or_alias)
    647         elif len(candidates) == 1:

UndefinedUnitError: 'm2' is not defined in the unit registry

you get this bit appended to the end

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-4-82feefd6c8c5> in <module>
----> 1 ds = ds.pint.quantify()

/marconi_work/FUA34_SOLBOUT4/tnichola/pycode/pint-xarray/pint_xarray/accessors.py in quantify(self, units, unit_registry, **unit_kwargs)
    507                 except Exception as e:
    508                     raise ValueError(
--> 509                         f"Failed to assign units to variable {name}") from e
    510 
    511         # TODO: remove once indexes support units

ValueError: Failed to assign units to variable g_11

This could be even clearer:
ValueError: Failed to assign units to variable g_11 using its attribute 'm2'

However, this messes up a couple of tests because then a less specific error gets raised as the final one instead of say pint.Errors.UndefinedUnitError. It would be better to catch all pint-specific errors, but I noticed that sometimes pint's parse_units will throw a generic TypeError instead of a pint-specific error.

keewis · 2020-11-02T13:00:28Z

That sounds good. Maybe it would be easier to raise the same type of exception as the one we catch?

Something like this:

try:
    new_units[name] = _decide_units(unit, registry, attr)
except Exception as e:
    if not isinstance(e, (TypeError, pint.errors.UndefinedUnitError)):
        raise

    raise type(e)(f"Failed to assign units to variable {name}: {e}") from e

We probably wouldn't even have to care about filtering out other errors (not that I would expect any). Not sure about trying to find out if the unit came from the attribute or if it was passed to quantify as an argument, though.

TomNicholas · 2020-11-02T13:56:53Z

Raising the same type of error seems like the best solution to me - then tests would only break if they were expecting a specific error message rather than a specific error type. I'm kind of worried this is like horrible practice or something but I can't really see a better way to do it. There's also probably no point trying to filter anything in that case.

Not sure about trying to find out if the unit came from the attribute or if it was passed to quantify as an argument, though.

I suppose really the user should know which of these two cases it was based on their input.

I'll submit a PR then.

keewis · 2021-04-24T00:21:48Z

fixed by #43 and #92

TomNicholas mentioned this issue Nov 2, 2020

Give quantify errors context #43

Merged

2 tasks

keewis mentioned this issue Mar 4, 2021

release v0.2 #80

Closed

6 tasks

keewis closed this as completed Apr 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tell user which variable caused error when quantifying #42

Tell user which variable caused error when quantifying #42

TomNicholas commented Nov 1, 2020

keewis commented Nov 1, 2020

TomNicholas commented Nov 1, 2020

keewis commented Nov 2, 2020

TomNicholas commented Nov 2, 2020

keewis commented Nov 2, 2020 •

edited

Loading

TomNicholas commented Nov 2, 2020

keewis commented Apr 24, 2021

Tell user which variable caused error when quantifying #42

Tell user which variable caused error when quantifying #42

Comments

TomNicholas commented Nov 1, 2020

keewis commented Nov 1, 2020

TomNicholas commented Nov 1, 2020

keewis commented Nov 2, 2020

TomNicholas commented Nov 2, 2020

keewis commented Nov 2, 2020 • edited Loading

TomNicholas commented Nov 2, 2020

keewis commented Apr 24, 2021

keewis commented Nov 2, 2020 •

edited

Loading