add get_fill_value Variable method and fill_value='default' option #1375

Merged 20 commits on Oct 22, 2024
3 changes: 3 additions & 0 deletions Changelog
@@ -2,6 +2,9 @@
===============================
* add static type hints (PR #1302)
* Expose nc_rc_set, nc_rc_get (via rc_set, rc_get module functions). (PR #1348)
* Add Variable.get_fill_value and allow `fill_value='default'` to
set `_FillValue` using default fill values. (issue #1374, PR #1375).
* Fix NETCDF3 endian error (issue #1373, PR #1355).

version 1.7.1 (tag v1.7.1rel)
===============================
26 changes: 20 additions & 6 deletions docs/index.html
@@ -289,7 +289,7 @@ <h2 id="dimensions-in-a-netcdf-file">Dimensions in a netCDF file</h2>
 &lt;class 'netCDF4._netCDF4.Dimension'&gt;: name = 'lon', size = 144
 </code></pre>
 <p><code><a title="netCDF4.Dimension" href="#netCDF4.Dimension">Dimension</a></code> names can be changed using the
-<code>Dataset.renameDimension</code> method of a <code><a title="netCDF4.Dataset" href="#netCDF4.Dataset">Dataset</a></code> or
+<code><a title="netCDF4.Dataset.renameDimension" href="#netCDF4.Dataset.renameDimension">Dataset.renameDimension()</a></code> method of a <code><a title="netCDF4.Dataset" href="#netCDF4.Dataset">Dataset</a></code> or
 <code><a title="netCDF4.Group" href="#netCDF4.Group">Group</a></code> instance.</p>
 <h2 id="variables-in-a-netcdf-file">Variables in a netCDF file</h2>
 <p>netCDF variables behave much like python multidimensional array objects
@@ -2676,12 +2676,16 @@ <h3>Instance variables</h3>
 Ignored if <code>significant_digts</code> not specified. If 'BitRound' is used, then
 <code>significant_digits</code> is interpreted as binary (not decimal) digits.</p>
 <p><strong><code>fill_value</code></strong>:
-If specified, the default netCDF <code>_FillValue</code> (the
+If specified, the default netCDF fill value (the
 value that the variable gets filled with before any data is written to it)
-is replaced with this value.
-If fill_value is set to <code>False</code>, then
-the variable is not pre-filled. The default netCDF fill values can be found
-in the dictionary <code>netCDF4.default_fillvals</code>.</p>
+is replaced with this value, and the <code>_FillValue</code> attribute is set.
+If fill_value is set to <code>False</code>, then the variable is not pre-filled.
+The default netCDF fill values can be found in the dictionary <code>netCDF4.default_fillvals</code>.
+If not set, the default fill value will be used but no <code>_FillValue</code> attribute will be created
+(this is the default behavior of the netcdf-c library). If you want to use the
+default fill value, but have the <code>_FillValue</code> attribute set, use
+<code>fill_value='default'</code> (note - this only works for primitive data types). <code><a title="netCDF4.Variable.get_fill_value" href="#netCDF4.Variable.get_fill_value">Variable.get_fill_value()</a></code>
+can be used to retrieve the fill value, even if the <code>_FillValue</code> attribute is not set.</p>
 <p><strong><code>chunk_cache</code></strong>: If specified, sets the chunk cache size for this variable.
 Persists as long as Dataset is open. Use <code>set_var_chunk_cache</code> to
 change it when Dataset is re-opened.</p>
@@ -2806,6 +2810,15 @@ <h3>Methods</h3>
<p>return a tuple of <code><a title="netCDF4.Dimension" href="#netCDF4.Dimension">Dimension</a></code> instances associated with this
<code><a title="netCDF4.Variable" href="#netCDF4.Variable">Variable</a></code>.</p></div>
</dd>
<dt id="netCDF4.Variable.get_fill_value"><code class="name flex">
<span>def <span class="ident">get_fill_value</span></span>(<span>self)</span>
</code></dt>
<dd>
<div class="desc"><p><strong><code>get_fill_value(self)</code></strong></p>
<p>return the fill value associated with this <code><a title="netCDF4.Variable" href="#netCDF4.Variable">Variable</a></code> (returns <code>None</code> if data is not
pre-filled). Works even if default fill value was used, and <code>_FillValue</code> attribute
does not exist.</p></div>
</dd>
<dt id="netCDF4.Variable.get_var_chunk_cache"><code class="name flex">
<span>def <span class="ident">get_var_chunk_cache</span></span>(<span>self)</span>
</code></dt>
@@ -3241,6 +3254,7 @@ <h4><code><a title="netCDF4.Variable" href="#netCDF4.Variable">Variable</a></cod
<li><code><a title="netCDF4.Variable.filters" href="#netCDF4.Variable.filters">filters</a></code></li>
<li><code><a title="netCDF4.Variable.getValue" href="#netCDF4.Variable.getValue">getValue</a></code></li>
<li><code><a title="netCDF4.Variable.get_dims" href="#netCDF4.Variable.get_dims">get_dims</a></code></li>
<li><code><a title="netCDF4.Variable.get_fill_value" href="#netCDF4.Variable.get_fill_value">get_fill_value</a></code></li>
<li><code><a title="netCDF4.Variable.get_var_chunk_cache" href="#netCDF4.Variable.get_var_chunk_cache">get_var_chunk_cache</a></code></li>
<li><code><a title="netCDF4.Variable.getncattr" href="#netCDF4.Variable.getncattr">getncattr</a></code></li>
<li><code><a title="netCDF4.Variable.group" href="#netCDF4.Variable.group">group</a></code></li>
1 change: 1 addition & 0 deletions src/netCDF4/__init__.pyi
@@ -465,6 +465,7 @@ class Variable(Generic[T_Datatype]):
def renameAttribute(self, oldname: str, newname: str) -> None: ...
def assignValue(self, val: Any) -> None: ...
def getValue(self) -> Any: ...
def get_fill_value(self) -> Any: ...
def set_auto_chartostring(self, chartostring: bool) -> None: ...
def use_nc_get_vars(self, use_nc_get_vars: bool) -> None: ...
def set_auto_maskandscale(self, maskandscale: bool) -> None: ...
54 changes: 50 additions & 4 deletions src/netCDF4/_netCDF4.pyx
@@ -4035,11 +4035,16 @@ behavior is similar to Fortran or Matlab, but different than numpy.
 Ignored if `significant_digts` not specified. If 'BitRound' is used, then
 `significant_digits` is interpreted as binary (not decimal) digits.

-**`fill_value`**: If specified, the default netCDF `_FillValue` (the
+**`fill_value`**: If specified, the default netCDF fill value (the
 value that the variable gets filled with before any data is written to it)
-is replaced with this value. If fill_value is set to `False`, then
-the variable is not pre-filled. The default netCDF fill values can be found
-in the dictionary `netCDF4.default_fillvals`.
+is replaced with this value, and the `_FillValue` attribute is set.
+If fill_value is set to `False`, then the variable is not pre-filled.
+The default netCDF fill values can be found in the dictionary `netCDF4.default_fillvals`.
+If not set, the default fill value will be used but no `_FillValue` attribute will be created
+(this is the default behavior of the netcdf-c library). If you want to use the
+default fill value, but have the `_FillValue` attribute set, use
+`fill_value='default'` (note - this only works for primitive data types). `Variable.get_fill_value`
+can be used to retrieve the fill value, even if the `_FillValue` attribute is not set.

 **`chunk_cache`**: If specified, sets the chunk cache size for this variable.
 Persists as long as Dataset is open. Use `set_var_chunk_cache` to
@@ -4403,6 +4408,17 @@ behavior is similar to Fortran or Matlab, but different than numpy.
if ierr != NC_NOERR:
if grp.data_model != 'NETCDF4': grp._enddef()
_ensure_nc_success(ierr, extra_msg=error_info)
elif fill_value == 'default':
if self._isprimitive:
fillval = numpy.array(default_fillvals[self.dtype.str[1:]])
if not fillval.dtype.isnative: fillval.byteswap(True)
_set_att(self._grp, self._varid, '_FillValue',\
fillval, xtype=xtype)
else:
msg = """
WARNING: there is no default fill value for this data type, so fill_value='default'
does not do anything."""
warnings.warn(msg)
else:
if self._isprimitive or self._isenum or \
(self._isvlen and self.dtype == str):
@@ -4638,6 +4654,36 @@ behavior is similar to Fortran or Matlab, but different than numpy.
return the group that this `Variable` is a member of."""
return self._grp

    def get_fill_value(self):
        """
**`get_fill_value(self)`**

return the fill value associated with this `Variable` (returns `None` if data is not
pre-filled). Works even if default fill value was used, and `_FillValue` attribute
does not exist."""
        cdef int ierr, no_fill
        with nogil:
            ierr = nc_inq_var_fill(self._grpid,self._varid,&no_fill,NULL)
        _ensure_nc_success(ierr)
        if no_fill == 1: # no filling for this variable
            return None
        else:
            try:
                fillval = self._FillValue
                return fillval
Comment on lines +4668 to +4673
Contributor

I'm not sure when no_fill would be one -- but if there is a _FillValue attribute, maybe it should be returned anyway? e.g. look for that first?

The other question is what to do if the _FillValue attribute doesn't match what nc_inq_var_fill returns?

That would be a malformed file, but maybe helpful to warn the user somehow?

Collaborator Author

@jswhit Oct 22, 2024

if no_fill=1, there is no pre-filling of data in the variable (so _FillValue is not used). Not sure what happens if pre-filling is turned off and _FillValue is set - but in this case, I think the user would expect to get information on what is actually happening when you create a variable and don't write data to it.

Contributor

@ChrisBarker-NOAA Oct 22, 2024

Yeah -- that's the challenge -- the fill value has a well defined meaning and purpose, but it's commonly used (abused?) to mean a missing or invalid value. So someone could, in theory, write a file without the fill value set, and then use the attribute to mean missing data. so ????

But I suppose the pathological cases are not our problem :-) -- the point of this new method is to get the actual, under the hood, fill_value.

In which case, looking for the _FillValue attribute is unnecessary -- unless we want to check that it matches, which might be a good idea!

            except AttributeError:
                # _FillValue attribute not set, see if we can retrieve _FillValue.
                # for primitive data types.
                if self._isprimitive:
                    #return numpy.array(default_fillvals[self.dtype.str[1:]],self.dtype)
                    fillval = numpy.empty((),self.dtype)
                    ierr=nc_inq_var_fill(self._grpid,self._varid,&no_fill,PyArray_DATA(fillval))
                    _ensure_nc_success(ierr)
                    return fillval
                else:
                    # no default filling for non-primitive data types.
Contributor

Maybe this is where no_fill would be 1 -- so I'd think we would want to return the _FillValue attribute if it exists.

                    return None

def ncattrs(self):
"""
**`ncattrs(self)`**
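
Editorial sketch (not part of the PR): the consistency check raised in the review thread on `get_fill_value` could look roughly like the code below. `check_fill_consistency` and `check_variable` are hypothetical helper names. The only netCDF4 calls assumed are `Variable.ncattrs`, `Variable.getncattr`, and the `Variable.get_fill_value` method this PR adds. Because the merged `get_fill_value` returns the `_FillValue` attribute whenever one exists and pre-filling is on, the wrapper can only flag the case where the attribute is present while pre-filling is off. The two-argument helper also covers a fill value obtained some other way.

```python
import warnings

import numpy as np


def check_fill_consistency(attr_value, effective_value):
    """Compare a _FillValue attribute with the value actually used for filling.

    attr_value: the _FillValue attribute, or None if the attribute is absent.
    effective_value: the value the library pre-fills with, or None if
        pre-filling is turned off.
    Returns True when the two are consistent; warns and returns False otherwise.
    """
    if attr_value is None:
        # No attribute, so there is nothing that could disagree.
        return True
    if effective_value is None:
        warnings.warn("_FillValue attribute is set, but pre-filling is turned off")
        return False
    attr = np.asarray(attr_value)
    effective = np.asarray(effective_value)
    if attr.dtype.kind == "f" and effective.dtype.kind == "f":
        # Treat NaN fill values as equal to each other.
        same = np.array_equal(attr, effective, equal_nan=True)
    else:
        same = np.array_equal(attr, effective)
    if not same:
        warnings.warn(
            f"_FillValue attribute {attr!r} differs from fill value {effective!r}"
        )
    return bool(same)


def check_variable(var):
    """Hypothetical wrapper: run the check on a netCDF4.Variable-like object."""
    attr = var.getncattr("_FillValue") if "_FillValue" in var.ncattrs() else None
    return check_fill_consistency(attr, var.get_fill_value())
```

A caller would pass an open `Variable` to `check_variable`, or call `check_fill_consistency` directly with two values it already holds.
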
43 changes: 43 additions & 0 deletions test/test_get_fill_value.py
@@ -0,0 +1,43 @@
import unittest, os, tempfile
import netCDF4
from numpy.testing import assert_array_equal
import numpy as np

fill_val = np.array(9.9e31)

# test Variable.get_fill_value

class TestGetFillValue(unittest.TestCase):
    def setUp(self):
        self.testfile = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name
        f = netCDF4.Dataset(self.testfile, 'w')
        dim = f.createDimension('x',10)
        for dt in netCDF4.default_fillvals.keys():
            if not dt.startswith('c'):
                v = f.createVariable(dt+'_var',dt,dim)
        v = f.createVariable('float_var',np.float64,dim,fill_value=fill_val)
        # test fill_value='default' option (issue #1374)
        v2 = f.createVariable('float_var2',np.float64,dim,fill_value='default')
        f.close()

    def tearDown(self):
        os.remove(self.testfile)

    def runTest(self):
        f = netCDF4.Dataset(self.testfile, "r")
        # no _FillValue set, test that default fill value returned
        for dt in netCDF4.default_fillvals.keys():
            if not dt.startswith('c'):
                fillval = np.array(netCDF4.default_fillvals[dt])
                if dt == 'S1': fillval = fillval.astype(dt)
                v = f[dt+'_var']
                assert_array_equal(fillval, v.get_fill_value())
        # _FillValue attribute is set.
        v = f['float_var']
        assert_array_equal(fill_val, v.get_fill_value())
        v = f['float_var2']
        assert_array_equal(np.array(netCDF4.default_fillvals['f8']), v._FillValue)
        f.close()

if __name__ == '__main__':
    unittest.main()