Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add docstring's param descriptions to task's and workflow's input and output variables #557

Merged
merged 15 commits into from
Jul 23, 2021

Conversation

mayitbeegh
Copy link
Contributor

@mayitbeegh mayitbeegh commented Jul 16, 2021

Signed-off-by: Sean Lin [email protected]

TL;DR

This PR fills out the description field in an input Variable with parameter descriptions parsed from the docstring of a Python task.

reStructuredText

        """
        function description
        :param arg1: the first value
        :param arg2: the first value
        :param arg3: the first value
        :returns: arg1/arg2 +arg3
        """

numpydoc

"""
Return (x1 == x2) element-wise.

Unlike `numpy.equal`, this comparison is performed by first
stripping whitespace characters from the end of the string.  This
behavior is provided for backward-compatibility with numarray.

Parameters
----------
x1, x2 : array_like of str or unicode
    Input arrays of the same shape.

Returns
-------
out : ndarray
    Output array of bools.

See Also
--------
not_equal, greater_equal, less_equal, greater, less
"""

googledocstring

'''One line summary.

Extended description.

Args:
    arg1(int): Description of `arg1`
    arg2(str): Description of `arg2`
Returns:
    str: Description of return value.
'''

TODOs:
Support for different styles of docstring (Sphinx reStructuredText, numpy, and google)
Figure out the best way to add descriptions to a namedtuple output when :returns: contains shared description. Maybe encourage the use of :returns foo: bar?
Move the parser out of common/util

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

Style guide

https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html#the-sphinx-docstring-format
https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard
https://google.github.io/styleguide/pyguide.html

Library

As tempted as it sounds to write a new parser, there are a few helpful libraries out there:

docutils

Often used as a command line tool, the library can be used for parsing reStructuredText programatically.
Pros:

  • Works with reStructuredText

Cons:

  • Doesn't work well with numpydoc, I think
  • Hard to use

Sphinx (with Napoleon extension)

A documentation generator. Can be used for parsing docstring programatically.
Pros:

  • Works with Sphinx reStructuredText, numpy, and google
  • Separates comma separated numpydoc parameter

Cons:

  • Somewhat closely coupled with Sphinx stuff
  • 14.7 MB in size

docstring_parser (recommend)

Small docstring library that showed up in Google. Seems legit.
https://github.com/rr-/docstring_parser
Pros:

  • 213 KB in size
  • Easy to use
  • Works with Sphinx reStructuredText, numpy, and google. Plus auto style detection
  • Is active

Cons:

  • Has issue with directives
  • Doesn't do anything about comma separated numpydoc parameters (could be pro)
  • Doesn't support multiple return values in numpy style docstring (fixed in this PR)

function.__doc__ vs inspect.getdoc()

function.__doc__ provides raw docstring, which is good for ascii arts or other creative stuff. WYSIWYG. The raw docstring may come with extra indentations that screws up parser. inspect.getdoc() cleans up the extra indentations. Looks like the clean up is part of PEP 257, so we're going with that.

Output descriptions

Usually the return value doesn't have a name associated with it, so we have to work with that. In the future, maybe we can encourage task authors to separate the return values for namedtuples similar to the input parameters.
Example multiple return values:
numpydoc

Returns
-------
err_code : int
    Non-zero value indicates error code, or zero on success.
err_msg : str or None
    Human readable error message, or None on success.

Seems like numpydoc supports multiple return values, but Sphinx reStructuredText and google docstring do not. Similar to params, we can encourage the use of :returns o1: description. It's valid reStructuredText and it's not going to break Spinx, but this style is not recognized.

Sticking with replicating the output description for now.

Tracking Issue

flyteorg/flyte#1235

Follow-up issue

NA

@welcome
Copy link

welcome bot commented Jul 16, 2021

Thank you for opening this pull request! 🙌
These tips will help get your PR across the finish line: - Most of the repos have a PR template; if not, fill it out to the best of your knowledge. - Sign off your commits (Reference: DCO Guide).

@mayitbeegh mayitbeegh changed the title wip: add docstring's param descriptions to input variables Add docstring's param descriptions to task's input and output variables Jul 21, 2021
Signed-off-by: Sean Lin <[email protected]>
@codecov
Copy link

codecov bot commented Jul 21, 2021

Codecov Report

Merging #557 (e646831) into master (51373f6) will decrease coverage by 0.00%.
The diff coverage is 91.86%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #557      +/-   ##
==========================================
- Coverage   85.47%   85.46%   -0.01%     
==========================================
  Files         372      376       +4     
  Lines       28995    29158     +163     
  Branches     2318     2338      +20     
==========================================
+ Hits        24783    24921     +138     
- Misses       3569     3606      +37     
+ Partials      643      631      -12     
Impacted Files Coverage Δ
flytekit/clients/friendly.py 67.20% <ø> (+0.79%) ⬆️
tests/flytekit/unit/core/test_serialization.py 93.02% <87.50%> (-0.18%) ⬇️
tests/flytekit/unit/core/test_interface.py 85.13% <88.00%> (+0.58%) ⬆️
flytekit/core/docstring.py 89.47% <89.47%> (ø)
tests/flytekit/unit/core/test_docstring.py 91.11% <91.11%> (ø)
flytekit/core/base_task.py 88.00% <100.00%> (+0.05%) ⬆️
flytekit/core/interface.py 80.54% <100.00%> (+1.11%) ⬆️
flytekit/core/python_auto_container.py 81.37% <100.00%> (+0.18%) ⬆️
flytekit/core/python_function_task.py 88.11% <100.00%> (+0.11%) ⬆️
flytekit/core/workflow.py 87.70% <100.00%> (+0.03%) ⬆️
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51373f6...e646831. Read the comment docs.

@mayitbeegh mayitbeegh marked this pull request as ready for review July 21, 2021 17:35
@mayitbeegh mayitbeegh marked this pull request as draft July 21, 2021 18:04
@mayitbeegh mayitbeegh changed the title Add docstring's param descriptions to task's input and output variables wip: Add docstring's param descriptions to task's input and output variables Jul 21, 2021
* add support for named output descriptions in docstring

Signed-off-by: Sean Lin <[email protected]>
Signed-off-by: Sean Lin <[email protected]>
Signed-off-by: Sean Lin <[email protected]>
@mayitbeegh mayitbeegh marked this pull request as ready for review July 21, 2021 23:34
@mayitbeegh mayitbeegh changed the title wip: Add docstring's param descriptions to task's input and output variables Add docstring's param descriptions to task's input and output variables Jul 21, 2021
Signed-off-by: wild-endeavor <[email protected]>
@mayitbeegh mayitbeegh changed the title Add docstring's param descriptions to task's input and output variables Add docstring's param descriptions to task's and workflow's input and output variables Jul 23, 2021
@mayitbeegh mayitbeegh merged commit 8c6764c into master Jul 23, 2021
@mayitbeegh mayitbeegh deleted the feat/docstring-to-variable-desc branch July 23, 2021 06:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants