feat: add DataFrame.iter_rows #317

Priyansh121096 · 2024-06-18T14:06:56Z

What type of PR is this? (check all applicable)

Related issues

Related issue #
Closes [Enh]: Add support for DataFrame.rows #285

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below.

I've matched the API with https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.rows.html

MarcoGorelli

this is amazing, well done @Priyansh121096 for figuring it all out! 🙌

I just have some minor comments really - do you have time / interest to address them? if not no worries, I can take it from here, lmk your preference

MarcoGorelli · 2024-06-18T14:14:03Z

narwhals/dataframe.py

+
+            We define a library agnostic function:
+
+            >>> def func(df_any, named):


nitpick, but this is a "boolean trap" - can we make named keyword-only for the example please?

i.e.

def func(df_any, *, named):

then, when you call it: func(df_pd, named=False), etc.
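A minimal sketch of what the bare `*` buys: once `named` is keyword-only, a bare positional `True`/`False` is rejected, so every call site spells out what the boolean means. The body of `func` and the dict-of-lists `data` are hypothetical stand-ins for the doctest's dataframe logic, not the narwhals implementation.

```python
def func(df_any, *, named):
    """Hypothetical stand-in: df_any is a dict mapping column name -> list of values."""
    columns = list(df_any)
    rows = zip(*df_any.values())  # one tuple per row
    if named:
        return [dict(zip(columns, row)) for row in rows]
    return list(rows)


data = {"a": [1, 2], "b": ["x", "y"]}

as_tuples = func(data, named=False)
as_dicts = func(data, named=True)

# Passing the flag positionally (the "boolean trap") now raises TypeError.
try:
    func(data, True)
except TypeError as exc:
    error_message = str(exc)
```

The error message mentions that too many positional arguments were given, which is exactly the misuse the keyword-only marker is meant to catch.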

MarcoGorelli · 2024-06-18T14:14:55Z

tests/test_common.py

+        ),
+    ],
+)
+@pytest.mark.filterwarnings("ignore:Determining|Resolving.*")


do we need this in this test?

MarcoGorelli · 2024-06-18T14:15:24Z

tests/test_common.py

+    ],
+)
+@pytest.mark.filterwarnings("ignore:Determining|Resolving.*")
+def test_rows(


could you make a new file tests/frame/rows_test.py for this one please?

FBruzzesi

That is great! Thanks for the PR! I left a comment to add type hinting 😁
P.S. it would require some additional imports

FBruzzesi · 2024-06-18T14:20:55Z

narwhals/dataframe.py

+    def rows(
+        self, *, named: bool = False
+    ) -> list[tuple[Any, ...]] | list[dict[str, Any]]:


I think it would be great to add type hints for the two cases:

Suggested change

-    def rows(
-        self, *, named: bool = False
-    ) -> list[tuple[Any, ...]] | list[dict[str, Any]]:
+    @overload
+    def rows(
+        self, *, named: Literal[True]
+    ) -> list[dict[str, Any]]: ...
+    @overload
+    def rows(
+        self, *, named: Literal[False]
+    ) -> list[tuple[Any, ...]]: ...
+    @overload
+    def rows(
+        self, *, named: bool
+    ) -> list[tuple[Any, ...]] | list[dict[str, Any]]: ...
+    def rows(
+        self, *, named: bool = False
+    ) -> list[tuple[Any, ...]] | list[dict[str, Any]]:
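A self-contained sketch of how overloads like these let a type checker pick the return type from the literal value of `named`. `ToyFrame` is a hypothetical class storing columns as a dict of lists; only the overload pattern mirrors the suggestion. One detail differs on purpose: as the suggestion reads here, none of the overloads accepts a call with no arguments, so `df.rows()` would be flagged by a type checker. Giving the `Literal[False]` overload a `= ...` default (an addition in this sketch) covers that call.

```python
from typing import Any, Dict, List, Literal, Tuple, Union, overload


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: columns stored as a dict of lists."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self._data = data

    @overload
    def rows(self, *, named: Literal[True]) -> List[Dict[str, Any]]: ...

    @overload
    def rows(self, *, named: Literal[False] = ...) -> List[Tuple[Any, ...]]: ...

    @overload
    def rows(
        self, *, named: bool
    ) -> Union[List[Tuple[Any, ...]], List[Dict[str, Any]]]: ...

    def rows(
        self, *, named: bool = False
    ) -> Union[List[Tuple[Any, ...]], List[Dict[str, Any]]]:
        # Only this final definition exists at runtime; the overloads above
        # are read by type checkers to narrow the return type per call site.
        tuples = list(zip(*self._data.values()))
        if named:
            columns = list(self._data)
            return [dict(zip(columns, row)) for row in tuples]
        return tuples


frame = ToyFrame({"a": [1, 2], "b": ["x", "y"]})
as_tuples = frame.rows()           # checker infers List[Tuple[Any, ...]]
as_dicts = frame.rows(named=True)  # checker infers List[Dict[str, Any]]
```

The third overload (`named: bool`) handles callers that pass a runtime boolean whose value the checker cannot know, which is why it returns the union.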

I was wondering if we want to put these overloads in an if TYPE_CHECKING: block. What do you think @FBruzzesi?

Thanks for addressing it!

What do you think @FBruzzesi?

This is a personal-taste grey zone! I will let @MarcoGorelli make the final call 😁

I was wondering if we want to put these overloads in an if TYPE_CHECKING: block

I didn't know you could do that 😳 What you've done looks good to me anyway, it's probably nice to have the overloads near the definitions? no strong opinion

I think what you've done here looks great 🙌

Priyansh121096 · 2024-06-18T14:57:59Z

Thanks for your comments. I'll address them soon.

MarcoGorelli

amazing stuff! thanks a tonne @Priyansh121096

Just got a question about this functionality - instead of DataFrame.rows, couldn't we just add DataFrame.iter_rows?

Because then, if someone wants a list, they can do

rows = list(df.iter_rows())

However, if someone is just going to loop over the rows, then

for row in df.iter_rows():
    ...  # do something with `row`

would be more efficient (no intermediate list of all rows is built) than

for row in df.rows():
    ...  # do something with `row`
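To make the efficiency argument concrete, here is a small sketch counting how many rows each approach builds when the loop stops early. `iter_rows`, `rows` and `first_row_where` are hypothetical helpers over a dict of lists, not the narwhals or Polars methods; `rows` is defined as `list(iter_rows(...))`, matching the suggestion above.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Data = Dict[str, List[Any]]


def iter_rows(data: Data, stats: Dict[str, int]) -> Iterator[Tuple[Any, ...]]:
    """Lazy: yields one row at a time (hypothetical stand-in for iter_rows)."""
    for row in zip(*data.values()):
        stats["built"] += 1  # count rows actually produced
        yield row


def rows(data: Data, stats: Dict[str, int]) -> List[Tuple[Any, ...]]:
    """Eager: materializes every row up front, i.e. list(iter_rows(...))."""
    return list(iter_rows(data, stats))


def first_row_where(
    source: Iterable[Tuple[Any, ...]], predicate: Callable[[Tuple[Any, ...]], bool]
) -> Optional[Tuple[Any, ...]]:
    # Stop looping as soon as a matching row is found.
    for row in source:
        if predicate(row):
            return row
    return None


data = {"a": list(range(1_000)), "b": [str(i) for i in range(1_000)]}

lazy_stats = {"built": 0}
found_lazy = first_row_where(iter_rows(data, lazy_stats), lambda row: row[0] == 2)

eager_stats = {"built": 0}
found_eager = first_row_where(rows(data, eager_stats), lambda row: row[0] == 2)

print(lazy_stats["built"], eager_stats["built"])  # 3 1000
```

Both loops find the same row, but the lazy version only builds the three rows it inspects, while the eager version builds all 1,000 before the loop starts.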

MarcoGorelli · 2024-06-18T17:50:18Z

narwhals/dataframe.py

+    def rows(
+        self, *, named: bool = False
+    ) -> list[tuple[Any, ...]] | list[dict[str, Any]]:



Priyansh121096 · 2024-06-19T10:55:12Z

Just got a question about this functionality - instead of DataFrame.rows, couldn't we just add DataFrame.iter_rows?

@MarcoGorelli I was going to propose something similar (you beat me to it 😄). I was thinking we could expose both rows and iter_rows since the polars API does as well.

One issue with this one is that I'm not sure pandas has a convenient equivalent when named=True. The rest of the cases are fine:

1. polars + named=False = df.iter_rows(named=False)
2. polars + named=True = df.iter_rows(named=True)
3. pandas + named=False = df.itertuples(index=False, name=None)
4. pandas + named=True = ?

Please let me know if you're aware of a pandas API which returns an iterator for iterating over rows as dictionaries.

FBruzzesi · 2024-06-19T11:21:42Z

@Priyansh121096 I tried to play with the into argument of the .to_dict() method, sadly with no success.

I guess that 4. can become iter(df.to_dict("records"))?!

MarcoGorelli · 2024-06-19T12:32:14Z

How about using https://docs.python.org/3/library/collections.html#collections.somenamedtuple._asdict (which, despite the underscore, is public):

something like (simplified)

def iter_rows(df):
    yield from (row._asdict() for row in df.itertuples(index=False))
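The _asdict approach can be tried without pandas by building the namedtuples by hand. `fake_itertuples` and `iter_named_rows` are hypothetical names standing in for `df.itertuples(index=False)` and the helper above; "Pandas" is the default `name` that itertuples gives its namedtuples. pandas' itertuples documentation notes that column names which are invalid Python identifiers, repeated, or start with an underscore are renamed to positional names, so the dict keys may not match the columns in those cases. The last lines reproduce that kind of renaming with the standard library's `rename=True` option.

```python
from collections import namedtuple
from typing import Any, Dict, Iterable, Iterator

# Hand-built stand-in for what df.itertuples(index=False) yields: one namedtuple per row.
Row = namedtuple("Pandas", ["a", "b"])
fake_itertuples = [Row(1, "x"), Row(2, "y")]


def iter_named_rows(tuples: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    # Each namedtuple becomes a dict only when the caller asks for the next row.
    for row in tuples:
        yield row._asdict()


named_rows = list(iter_named_rows(fake_itertuples))
# [{'a': 1, 'b': 'x'}, {'a': 2, 'b': 'y'}]

# Caveat: with rename=True, field names that are not valid identifiers are
# replaced by positional names, so the dict keys no longer match the columns.
Renamed = namedtuple("Pandas", ["a", "b c"], rename=True)
renamed_fields = Renamed._fields  # ('a', '_1')
```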

Priyansh121096 · 2024-06-19T12:41:06Z

I guess that 4. can become iter(df.to_dict("records"))?!

I feel like this defeats the purpose of using iter_rows over rows though.

How about using https://docs.python.org/3/library/collections.html#collections.somenamedtuple._asdict (which, despite the underscore, is public):

Amazing! This should work.

Priyansh121096 · 2024-06-20T16:54:58Z

@MarcoGorelli I'll raise another PR for iter_rows soon. Can we merge this one?

MarcoGorelli · 2024-06-20T22:39:50Z

thanks - tbh I'm not really sure about DataFrame.rows, I might even suggest deprecating it altogether in Polars itself

could we just repurpose this one for DataFrame.iter_rows please? sorry for not having thought about this straight away

Priyansh121096 · 2024-06-21T15:43:18Z

could we just repurpose this one for DataFrame.iter_rows please?

@MarcoGorelli pushed a change for this.

MarcoGorelli

thanks @Priyansh121096!

narwhals/dataframe.py

MarcoGorelli · 2024-06-21T15:56:42Z

@pre-commit.ci autofix

feat: add DataFrame.rows

4aa8106

github-actions bot added the "enhancement" (New feature or request) label Jun 18, 2024

Priyansh121096 mentioned this pull request Jun 18, 2024

[Enh]: Add support for DataFrame.rows #285

Closed

MarcoGorelli reviewed Jun 18, 2024


Dont need the filterwarnings

7a786fc

FBruzzesi reviewed Jun 18, 2024


Priyansh121096 added 4 commits June 18, 2024 16:03

Avoid boolean trap in doctest

71ed592

Add typing.overloads for DataFrame.rows

d9ec130

Move out tests into tests/frame/rows_test.py and add more tests

1668807

Ignore FutureWarning from modin

5eabf0a

MarcoGorelli reviewed Jun 18, 2024


Replace rows with iter_rows

4baf74c

MarcoGorelli approved these changes Jun 21, 2024


narwhals/dataframe.py (outdated, resolved)

Update narwhals/dataframe.py

21419a3

Update dataframe.py

ee7421f

MarcoGorelli merged commit 4018aec into narwhals-dev:main Jun 21, 2024
15 checks passed

Priyansh121096 deleted the feat-rows branch June 21, 2024 16:24

Priyansh121096 changed the title from "feat: add DataFrame.rows" to "feat: add DataFrame.iter_rows" Jun 21, 2024

Priyansh121096 mentioned this pull request Jun 29, 2024

feat: add DataFrame.rows #351

Merged

