Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_values does not return all data in some scenarios with empty data #910

Closed
kyle-tc opened this issue Aug 17, 2021 · 11 comments
Closed

get_values does not return all data in some scenarios with empty data #910

kyle-tc opened this issue Aug 17, 2021 · 11 comments
Assignees
Labels

Comments

@kyle-tc
Copy link

kyle-tc commented Aug 17, 2021

Hello! Thanks for your time. I wanted to report a potential bug in the get_values function. When you have a spreadsheet that has say 6 columns with headers, where data should be and you make a query for a particular row, there is a certain scenario where empty data fields are not returned. This can be confusing for downstream users of the gspread API who want to say, convert the row into a dictionary (ala get_all_records).

This scenario occurs when data is laid out like this:

header1, header2, header3, header4, header5, header6
x1, , , x4, ,

In that data above header5 and header6 are empty for the first row. Suppose you made this call:

ws.get_values("A1:1")

You only get back:

[['x1',,,'x4']]

As opposed to what I would expect:

[['x1',,,'x4',,]]

Minimal reproducer:

Screen Shot 2021-08-17 at 11 17 03 AM

(Pdb++) ws.get_values("A7:7")
[['', 'kl3020', '', '07/26/2021']]

Expected result:

(Pdb++) ws.get_values("A7:7")
[['', 'kl3020', '', '07/26/2021', '', '']]
@kyle-tc
Copy link
Author

kyle-tc commented Aug 17, 2021

Furthermore get_all_records works exactly as expected in this scenario.

@lavigne958 lavigne958 added good first issue Bug Need investigation This issue needs to be tested or investigated labels Aug 18, 2021
@lavigne958
Copy link
Collaborator

Hi @kyle-tc ,

thank you for reporting this issue.

I ran some tests an noticed the resulting length of the list returned can vary depending on the longest row with non empty cells that is returned. for example if your call to get_values includes your headers then the headers is the longest list of non empty values (6 values in total) so any row below/above will be of length 6. If you request some value in the middle of your spreadsheet and it only contains 4 non empty values then the list will be of length 4.

This is due to SpreadSheet API that returns a list of values for each row that is returned and trims empty values at the begining and the end.

Gspread made a workaround that by finding the longest row in the all the results then making sure all the list of values returned are of the same length. But you still need to retrieve one full line to obtain all your empty cells in all rows.

Hope this is not too confusing 😆 if so please comment here and I'll clarify any thing that is unclear 🙂

@lavigne958 lavigne958 removed good first issue Need investigation This issue needs to be tested or investigated labels Aug 20, 2021
@wgwz
Copy link

wgwz commented Aug 23, 2021

hey @lavigne958 👋 (replying from my other account). That makes sense, and I see why it would work that way. It makes me wish that header rows were a first class concept in google spreadsheets. But alas, I guess these types of headaches are to be expected if you're using spreadsheets as a database. I think in my case, I'll standardize on pulling in the first row of data and following a similar procedure to what you mentioned about padding rows that are shorter than the header row with empty values. i'm tempted to say that it would be nice to bake this feature into gspread but i'm sure that would be a backwards compatibility issue as it forces users to use the first row as headers, or there are people who have built other ways around it. perhaps it could be added as a opt-in functionality... but still i can see why it's tricky.

but the whole reason i actually went down this path was because i was hoping that the "find text" function was an API call to google that offloaded the search onto googles servers. then i noticed that we are just making a call to "get all records" anyways, so it was a bit pointless. i don't mean to change topic from the issue that i reported. but maybe it will be helpful for someone else searching google in the future. do you know if there is an API that does that with spreadsheets? i'm pretty sure the answer is no but admittedly i didn't look to hard once i realized what the search functions in gspread were doing.

@lavigne958
Copy link
Collaborator

Hi @wgwz , unfortunately no, there is not function that offload the search on the server side.

Closing this issue.

@kyle-tc
Copy link
Author

kyle-tc commented Oct 25, 2021

Google's got enough compute power for that right? ;-) Thanks for the consideration on my half-baked idea though @lavigne958

@ccppoo
Copy link
Contributor

ccppoo commented Oct 27, 2021

This is sample worksheet with name 'test'

A B C D E F
1 ID NAME math science python rock
2 9938 aaaa _ _ 90 _
3 9738 bbbb 30 _ 55 45
4 4821 cccc _ 40 _ _
5 3908 dddd _ _ 100 _
6 7182 eeee _ 90 _ 80

* in this example, _(underbar) means empty cell

* ROWs : 1 ... 6 // COLs : A ... F

... # auth stuffs

ws = spreadSheet.worksheet('test')

case 1. when requesting values A1:E3 (expectable)

result = ws.get_values('A1:E3')
# result
[
    ['ID', 'NAME', 'math', 'science', 'python'],
    [9938, 'aaaa', '', '', 90 ],
    [9738, 'bbbb', 30, '', 55]
]

case 2. when requesting values C1:E3 (expectable)

result = ws.get_values('C1:E3')
# result
[
    ['math', 'science', 'python'],
    ['', '', 90 ],
    [30, '', 55]
]

case 3. when requesting values A1:F2 (expectable)

result = ws.get_values('A1:F2')
# result
[
    ['ID', 'NAME', 'math', 'science', 'python', 'rock'],
    [9938, 'aaaa', '', '', 90, '' ],
]

edit added : use case for pandas.DataFrame

result = ws.get_values('A1:F2')
df2 = pands.DataFrame(result)

print(df2)
#    ID    NAME      math    science   python   rock
# 0  9938  aaaa                        90

case 4. when requesting values A2:F2 (quite not expectable)

result = ws.get_values('A2:F2')
# result
[
    [9938, 'aaaa', '', '', 90],
]

what some people would have expected :

[
    [9938, 'aaaa', '', '', 90, '']
]

comparing with case 2 and case 4,

we could know google search from left to right

but don't take into account when cell is empty

just like they are using str.rstrip() python string method

tl;dr google spread request return results after using rstrip()-like method

any thing to fix?

@lavigne958
Copy link
Collaborator

hi @ccppoo

thank you for this example, this is very clear.

We could fix it, sure, we could take the width of the requested range and return a list with extra empty cells at the ends if needed.
this should not take too much extra computation.

I see only 1 concern, what if someone uses a named range ? may be the response from google will alway have the right size in that case, ToBeConfirmed.

@ccppoo
Copy link
Contributor

ccppoo commented Oct 27, 2021

@lavigne958 named range shows the same result, may it's just no more than a reference like a variable,

@ccppoo
Copy link
Contributor

ccppoo commented Oct 28, 2021

@lavigne958 Can you re-open this issue for later PR?
this problem, not a bug, looks like it could be handled with gspread side

I solved this problem like this :

Link to def getMember

def getMember(self, name: str, replace_empty=None) -> Member:
        result: Cell = self.memberSheet.find(name, in_column=USER_NAME_COL)

        if not result or not result.value == name:
            return MEMBER_UNKNOWN

        # y, x
        start = rowcol_to_a1(result.row, result.col - 1)
        end = rowcol_to_a1(result.row, result.col - 1 + MEMBER_INFO_width - 1)

        # edit : in this case, it's 1-D list, see that I used [0] indexing below
        # expected column length
        data = [replace_empty  for _ in range(MEMBER_INFO_width + 1)]

        reply = self.memberSheet.get_values(
            f'{start}:{end}',
            value_render_option='UNFORMATTED_VALUE'
        )[0]

        emptyFilled = [val if val else dafault for val, dafault in zip_longest(
            reply, data, fillvalue=replace_empty)]
        return emptyFilled

as you could see, nothing to do with google api, nor core functions

@lavigne958 lavigne958 reopened this Oct 28, 2021
@lavigne958
Copy link
Collaborator

You're right, issue re-open.
If you have a PR ready feel free to open it I'll review it.

Thanks.

@alifeee
Copy link
Collaborator

alifeee commented Sep 4, 2023

superceded by #1289

@alifeee alifeee closed this as completed Sep 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants