Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37730][PYTHON] Replace use of MPLPlot._add_legend_handle with MPLPlot._append_legend_handles_labels #35000

Closed
wants to merge 5 commits into from

Conversation

mslapek
Copy link
Contributor

@mslapek mslapek commented Dec 23, 2021

What changes were proposed in this pull request?

Replace use of MPLPlot._add_legend_handle (removed in pandas) with MPLPlot._append_legend_handles_labels in histogram and KDE plots.
Based on: pandas-dev/pandas@029907c

Why are the changes needed?

Fix of SPARK-37730. plot.hist and plot.kde don't throw AttributeError for pandas=1.3.5.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested with existing plot test on CI (for older pandas only). (it seems that CI doesn't run matplotlib tests, see #35000 (comment))

I've run tests on a local computer, see #35000 (comment) :

$ python python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py

QUESTION: Maybe add plot testing for pandas 1.3.5 on CI? (I've noticed that CI uses pandas=1.3.4, maybe update it to 1.3.5?)

@mslapek mslapek changed the title [WIP][SPARK-37730][PYTHON] [WIP][SPARK-37730][PYTHON] Replace use of MPLPlot._add_legend_handle with MPLPlot._add_legend_handle Dec 23, 2021
@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

Thanks for taking a look at this, @mslapek. Mind enabling GitHub Actions in your forked repository? (see also https://github.com/apache/spark/pull/35000/checks?check_run_id=4617900123). Apache Spark leverages the GitHub Actions resources from author's forked repository.

@HyukjinKwon
Copy link
Member

cc @itholic would you mind taking a look please?

@SparkQA
Copy link

SparkQA commented Dec 23, 2021

Test build #146523 has finished for PR 35000 at commit 2a33641.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class PandasOnSparkHistPlot(PandasHistPlot, HistogramPlotBase, _PandasOnSparkKdeHistCommon):
  • class PandasOnSparkKdePlot(PandasKdePlot, KdePlotBase, _PandasOnSparkKdeHistCommon):

@mslapek
Copy link
Contributor Author

mslapek commented Dec 23, 2021

Thanks for taking a look at this, @mslapek. Mind enabling GitHub Actions in your forked repository? (see also https://github.com/apache/spark/pull/35000/checks?check_run_id=4617900123). Apache Spark leverages the GitHub Actions resources from author's forked repository.

I've enabled workflow only for "Build and test". 🤦‍♂️ Now all workflows should be enabled.

@SparkQA
Copy link

SparkQA commented Dec 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50999/

@SparkQA
Copy link

SparkQA commented Dec 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50999/

@mslapek mslapek changed the title [WIP][SPARK-37730][PYTHON] Replace use of MPLPlot._add_legend_handle with MPLPlot._add_legend_handle [SPARK-37730][PYTHON] Replace use of MPLPlot._add_legend_handle with MPLPlot._append_legend_handles_labels Dec 23, 2021
@mslapek
Copy link
Contributor Author

mslapek commented Dec 24, 2021

One hour ago I got result from GitHub action "Report test results": Error: no artifacts found.

@HyukjinKwon Maybe all actions must be manually requested to rerun? 😕

https://github.com/mslapek/spark/actions/runs/1618397788

@HyukjinKwon
Copy link
Member

You can rebase to the latest master branch, or push an empty commit e.g.) git commit --allow-empty

@SparkQA
Copy link

SparkQA commented Dec 24, 2021

Test build #146564 has finished for PR 35000 at commit 0f223f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class PandasOnSparkHistPlot(PandasHistPlot, HistogramPlotBase, _PandasOnSparkKdeHistCommon):
  • class PandasOnSparkKdePlot(PandasKdePlot, KdePlotBase, _PandasOnSparkKdeHistCommon):

@SparkQA
Copy link

SparkQA commented Dec 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51039/

@SparkQA
Copy link

SparkQA commented Dec 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51039/

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@itholic
Copy link
Contributor

itholic commented Dec 27, 2021

❓ QUESTION: Maybe add plot testing for pandas 1.3.5 on CI? (I've noticed that CI uses pandas=1.3.4, maybe update it to 1.3.5?)

Yeah, I think we can update the pandas version to 1.3.5 in CI, and add related tests.

And could you briefly show what you fixed with Before & After example if possible ??

@HyukjinKwon
Copy link
Member

we already use latest pandas always. I suspect it's not uploaded to pypi or github actions cannot reach yet if it uses 1.3.5. matplotlib test case is disabled IIRC in CI so that's probably why the tests pass: https://app.codecov.io/gh/apache/spark/blob/master/python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py

@itholic
Copy link
Contributor

itholic commented Dec 27, 2021

+1 for #35000 (comment) and #35000 (comment).

Otherwise, LGTM 👍

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM2 except two comments

@mslapek
Copy link
Contributor Author

mslapek commented Dec 27, 2021

And could you briefly show what you fixed with Before & After example if possible ??

A brief example is given in the issue: https://issues.apache.org/jira/browse/SPARK-37730

I've run locally matplotlib tests:

$ python python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py

After fix (my latest commit a1ca73d):

✔️ New pandas

$ python -c "import pandas as pd; print(pd.__version__)"
1.3.5
$ python python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py
...
Ran 14 tests in 35.059s

OK

✔️ Old pandas

$ python -c "import pandas as pd; print(pd.__version__)"
1.2.5
$ python python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py
...
Ran 14 tests in 36.258s

OK

Main branch before fix (commit c0cedb1):

❌ New pandas

$ python -c "import pandas as pd; print(pd.__version__)"
1.3.5
$ python python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py
...
ERROR: test_single_value_hist (pyspark.pandas.tests.plot.test_series_plot_matplotlib.SeriesPlotMatplotlibTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/develop/repo/spark/python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py", line 391, in test_single_value_hist
    ax2 = psdf["single"].plot.hist()
...
  File "/home/develop/repo/spark/python/pyspark/pandas/plot/matplotlib.py", line 386, in _make_plot
    self._add_legend_handle(artists[0], label, index=i)
AttributeError: 'PandasOnSparkHistPlot' object has no attribute '_add_legend_handle'

...
======================================================================
FAIL: test_pie_plot (pyspark.pandas.tests.plot.test_series_plot_matplotlib.SeriesPlotMatplotlibTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/develop/repo/spark/python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py", line 137, in test_pie_plot
    self.assertEqual(bin1, bin2)
AssertionError: b'iVB[159 chars]dpAABSIUlEQVR4nO3dd3gc1d098DPbtLvqvdlqttwLbhhs[28020 chars]gg==' != b'iVB[159 chars]dpAABRr0lEQVR4nO3dd3xedf3//+e5VvZo2szOdO/SAYWC[27868 chars]gg=='

----------------------------------------------------------------------
Ran 14 tests in 33.338s

FAILED (failures=2, errors=4)

✔️ Old pandas

$ python -c "import pandas as pd; print(pd.__version__)"
1.2.5
$ python python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py
...
Ran 14 tests in 36.266s

OK

@HyukjinKwon
Copy link
Member

Thanks for testing it thoroughly. LGTM

@mslapek
Copy link
Contributor Author

mslapek commented Dec 27, 2021

It seems that CI wants me to reformat file, which I haven't changed: python/pyspark/shuffle.py

I've merged main branch.

@zero323
Copy link
Member

zero323 commented Dec 28, 2021

@mslapek

It seems that CI wants me to reformat file, which I haven't changed: python/pyspark/shuffle.py

I've merged main branch.

We've recently updated (ac13494) Black to 21.12b0 and there is change of behavior compared to older releases.

@HyukjinKwon
Copy link
Member

I manually ran the related tests locally, and verified that it passed.

Merged to master.

@HyukjinKwon
Copy link
Member

@itholic would you mind investigating to enable matplotlib related tests?

@mslapek
Copy link
Contributor Author

mslapek commented Dec 28, 2021

@HyukjinKwon @itholic Thanks for review and help! 😃

@dongjoon-hyun
Copy link
Member

Thank you, @mslapek, @HyukjinKwon , @itholic .
I tested this manually on branch-3.2 and backported for Apache Spark 3.2.2 RC1.

Starting test(python3): pyspark.pandas.tests.plot.test_series_plot_matplotlib
Finished test(python3): pyspark.pandas.tests.plot.test_series_plot_matplotlib (75s)

dongjoon-hyun pushed a commit that referenced this pull request Jul 10, 2022
…MPLPlot._append_legend_handles_labels

### What changes were proposed in this pull request?

Replace use of MPLPlot._add_legend_handle (removed in pandas) with MPLPlot._append_legend_handles_labels in histogram and KDE plots.
Based on:  pandas-dev/pandas@029907c

### Why are the changes needed?

Fix of SPARK-37730. plot.hist and plot.kde don't throw AttributeError for pandas=1.3.5.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

~~Tested with existing plot test on CI (for older pandas only).~~ (it seems that CI doesn't run matplotlib tests, see #35000 (comment))

I've run tests on a local computer, see #35000 (comment) :
```
$ python python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py
```

:question: **QUESTION:** Maybe add plot testing for pandas 1.3.5 on CI? (I've noticed that CI uses `pandas=1.3.4`, maybe update it to `1.3.5`?)

Closes #35000 from mslapek/fixpythonplot.

Authored-by: Michał Słapek <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 371e307)
Signed-off-by: Dongjoon Hyun <[email protected]>
@HyukjinKwon
Copy link
Member

Thanks!!!!!!

sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…MPLPlot._append_legend_handles_labels

### What changes were proposed in this pull request?

Replace use of MPLPlot._add_legend_handle (removed in pandas) with MPLPlot._append_legend_handles_labels in histogram and KDE plots.
Based on:  pandas-dev/pandas@029907c

### Why are the changes needed?

Fix of SPARK-37730. plot.hist and plot.kde don't throw AttributeError for pandas=1.3.5.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

~~Tested with existing plot test on CI (for older pandas only).~~ (it seems that CI doesn't run matplotlib tests, see apache#35000 (comment))

I've run tests on a local computer, see apache#35000 (comment) :
```
$ python python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py
```

:question: **QUESTION:** Maybe add plot testing for pandas 1.3.5 on CI? (I've noticed that CI uses `pandas=1.3.4`, maybe update it to `1.3.5`?)

Closes apache#35000 from mslapek/fixpythonplot.

Authored-by: Michał Słapek <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 371e307)
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants