Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

size of range returned by get and get_values can be smaller than the input size #1289

Closed
6 tasks done
alifeee opened this issue Sep 4, 2023 · 5 comments · Fixed by #1305
Closed
6 tasks done

size of range returned by get and get_values can be smaller than the input size #1289

alifeee opened this issue Sep 4, 2023 · 5 comments · Fixed by #1305
Assignees
Milestone

Comments

@alifeee
Copy link
Collaborator

alifeee commented Sep 4, 2023

Worksheet.get and Worksheet.get_values return a smaller area than asked for if some of it is empty. For example, see this call of `worksheet.get("E7:I11")

Screenshot of spreadsheet

Replication

import gspread
gc = gspread.service_account(filename="google_credentials.json")
spreadsheet = gc.open_by_key("<spreadsheet_key>")
worksheet = spreadsheet.sheet1

values = worksheet.get("E7:I11")
print(values)

expected output (size 5 x 5)

[
 ['', '', '', '', ''],
 ['', 'data', 'data', 'data', ''],
 ['', '', 'data', '', ''],
 ['', '', 'data', 'data', ''],
 ['', '', '', '', '']
]

actual output (size 4 x 4)

[
 ['', '', '', ''],
 ['', 'data', 'data', 'data'],
 ['', '', 'data', ''],
 ['', '', 'data', 'data']
]

Related issues

#910, #947, #1284

Attempted fixes

#985, #1281

steps to fix

  • add a boolean kwarg to Worksheet.get and Worksheet.get_values, something like maintain_size
  • if maintain_size is true
    • if the size is knowable (see below)
      • get the size (number of rows/cols) asked for by the user from range_name using a1_to_rowcol
      • use utils.fill_gaps to fill the empty values and return this
  • add test, i.e., WorksheetTest.test_get_values_and_maintain_size

Problems

There are many ways to define a range with Google Sheets. See the documentation for a1 notation. Here is a summary. In green are the ranges we can know the size of. In red the ones we can't.

Screenshot of Google Sheets documentation

What we could do is resize the data if the size is knowable (cases 1 and 5 above) and leave it as-is if not (cases 2, 3, 4, 6, 7, 8 above).

Questions

Should this be the default behaviour?

@alifeee alifeee changed the title size of range returned by get and get_values can be smaller than the input size ( size of range returned by get and get_values can be smaller than the input size Sep 4, 2023
@alifeee alifeee self-assigned this Sep 4, 2023
@alifeee
Copy link
Collaborator Author

alifeee commented Sep 4, 2023

Thinking about it, if the user does not know the size of the region they request (cases 2,3,4,7,8 above) then they cannot complain when we return the "wrong" size region. This is only not the case for named ranges, which have a defined size when you create them.

@botcs
Copy link

botcs commented Sep 4, 2023

Can we not query the sheet's size?
Here's a related SO question

@Zylvian
Copy link

Zylvian commented Sep 5, 2023

Thinking about it, if the user does not know the size of the region they request (cases 2,3,4,7,8 above) then they cannot complain when we return the "wrong" size region. This is only not the case for named ranges, which have a defined size when you create them.

Re: my comment in #1284 as well as your statement above: should the default behavior not be to maintain the shape, given that the region is specified? As in:
case 1, 5 -> maintain shape
case 2,3,4,6,7,8 -> return unspecified region

First instinct would be to parse the A1 notation for 2x characters and 2x integers (post !) which would define a specified range, but given that it's my initial idea, there is probably something wrong with it.

@alifeee
Copy link
Collaborator Author

alifeee commented Sep 5, 2023

I agree with what is said here.

Can we not query the sheet's size?

We can, and we do. We store this as properties in the worksheet object.

@property
def row_count(self):
"""Number of rows."""
return self._properties["gridProperties"]["rowCount"]
@property
def col_count(self):
"""Number of columns."""
return self._properties["gridProperties"]["columnCount"]

should the default behavior not be to maintain the shape, given that the region is specified?

I too believe this should be the default.

See this table for the ways (I believe) a user can use worksheet.get(). For most of them, we know the expected size, and could pad the result to be this way (as in my above instructions).

Case Example Known size Can we know row/col count?
1 Sheet1!A1:B2 col + row
2 Sheet1!A:A col
3 Sheet1!1:2 row
4 Sheet1!A5:A col
5 A1:B2 col + row
6 Sheet1
7 'My Custom Sheet'!A:A col
8 'My Custom Sheet' col
9 named range

✅ - columns and rows are explicit in the notation
☑ - columns and rows are not explicit, but can be obtained via col_count and/or row_count props
❌ - size of region cannot be obtained

However, it is a known issue that the col_count and row_count props are not kept up to date very well (#881, #1211, #1201), so using these for this feature (the blue ☑ above) may result in nasty bugs.

For now, I suggest we implement padding with a1_to_rowcol and utils.fill_gaps for only cases 1 and 5 above. I think this will be enough.

For this, we simply need a regex match which can determine between cases 1 and 5, and the rest, and then to apply the logic stated in the instructions above. We already use regex in a similar way in the module:

gspread/gspread/utils.py

Lines 25 to 26 in 817ebb8

CELL_ADDR_RE = re.compile(r"([A-Za-z]+)([1-9]\d*)")
A1_ADDR_ROW_COL_RE = re.compile(r"([A-Za-z]+)?([1-9]\d*)?$")

I suggest something like ([A-Za-z]+)(\d+):([A-Za-z]+)(\d+) would work (see on regex101).

@alifeee
Copy link
Collaborator Author

alifeee commented Sep 10, 2023

Technically, the (unexpected) behaviour is in the documentation for get_values

image

However, it is still unexpected. I will continue my work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants