-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add DataFrame.iter_rows #317
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is amazing, well @Priyansh121096 for figuring it all out! π
I just have some minor comments really - do you have time / interest to address them? if not no worries, I can take it from here, lmk your preference
narwhals/dataframe.py
Outdated
|
||
We define a library agnostic function: | ||
|
||
>>> def func(df_any, named): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, but this is a "boolean trap" - can we make named
keyword-only for the example please?
i.e.
func(df_any, *, named):
then, when you call it - func(df_pd, named=False)
, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
tests/test_common.py
Outdated
), | ||
], | ||
) | ||
@pytest.mark.filterwarnings("ignore:Determining|Resolving.*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this in this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
tests/test_common.py
Outdated
], | ||
) | ||
@pytest.mark.filterwarnings("ignore:Determining|Resolving.*") | ||
def test_rows( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you make a new file tests/frame/rows_test.py
for this one please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is great! Thanks for the PR! I left a comment to add type hinting π
P.s. it would require some additional imports
narwhals/dataframe.py
Outdated
def rows( | ||
self, *, named: bool = False | ||
) -> list[tuple[Any, ...]] | list[dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be a great addition to add type hinting for the two cases:
def rows( | |
self, *, named: bool = False | |
) -> list[tuple[Any, ...]] | list[dict[str, Any]]: | |
@overload | |
def rows( | |
self, *, named: Literal[True] | |
) -> list[dict[str, Any]]: | |
@overload | |
def rows( | |
self, *, named: Literal[False] | |
) -> list[tuple[Any, ...]]: | |
@overload | |
def rows( | |
self, *, named: bool | |
) -> list[tuple[Any, ...]] | list[dict[str, Any]]: | |
def rows( | |
self, *, named: bool = False | |
) -> list[tuple[Any, ...]] | list[dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we want to put these overloads in an if TYPE_CHECKING:
block. What do you think @FBruzzesi ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing it!
What do you think @FBruzzesi ?
This is personal taste grey zone! I will let @MarcoGorelli make the final call π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we want to put these overloads in an if TYPE_CHECKING: block
I didn't know you could do that π³ What you've done looks good to me anyway, it's probably nice to have the overloads near the definitions? no strong opinion
I think what you've done here looks great π
Thanks for your comments. I'll address them soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amazing stuff! thanks a tonne @Priyansh121096
Just got a question about this functionality - instead of DataFrame.rows
, couldn't we just add DataFrame.iter_rows
?
Because then, if someone wants a list, then can do
rows = list(df.iter_rows())
However, if someone is just going to loop over the rows, then
for row in df.iter_rows():
# do something with `row`
would be more efficient than
for row in `df.rows()`:
# do something with `row`
narwhals/dataframe.py
Outdated
def rows( | ||
self, *, named: bool = False | ||
) -> list[tuple[Any, ...]] | list[dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we want to put these overloads in an if TYPE_CHECKING: block
I didn't know you could do that π³ What you've done looks good to me anyway, it's probably nice to have the overloads near the definitions? no strong opinion
I think what you've done here looks great π
@MarcoGorelli I was going to propose something similar (you beat me to it π). I was thinking we could expose both One issue with this one is I'm not sure if pandas has a convenient equivalent to this when
Please let me know if you're aware of a pandas API which returns an iterator for iterating over rows as dictionaries. |
@Priyansh121096 I tried to play with the I guess that 4. can become |
How about using https://docs.python.org/3/library/collections.html#collections.somenamedtuple._asdict (which , despite the underscore, is public): something like (simplified)
|
I feel like this defeats the purpose of using iter_rows over rows though.
Amazing! This should work. |
@MarcoGorelli I'll raise another PR for iter_rows soon. Can we merge this one? |
thanks - tbh I'm not really sure about DataFrame.rows, I might even suggest deprecating it altogether in Polars itself could we just repurpose this one for DataFrame.iter_rows please? sorry for not having thought about this straight away |
@MarcoGorelli pushed a change for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @Priyansh121096 !
@pre-commit.ci autofix |
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below.
I've matched the API with https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.rows.html