Improve docs for Training Operator 1.8 #1998

Closed
3 of 4 tasks
Tracked by #1994
andreyvelich opened this issue Jan 25, 2024 · 8 comments

Comments

@andreyvelich
Member

andreyvelich commented Jan 25, 2024

On the recent AutoML and Training WG call, we discussed how we can improve the documentation for Training Operator and the onboarding for new contributors.

We identified several action items that we can work on before the release:

Please let me know if we should add something else @kubeflow/release-managers @kubeflow/wg-training-leads @tenzen-y @shashank-iitbhu.

@tenzen-y
Member

Thank you for raising this great issue!
Describing all features in the doc would be great.
For example, we don't have any doc for TFJob with enableDynamicWorker.

So, as a first iteration, we should identify which features have no documentation.

@andreyvelich
Member Author

cc @andreeamun

@StefanoFioravanzo
Member

@andreyvelich @tenzen-y As discussed, I looked into the training operator docs and I want to propose an initial refactoring to better align with best practices in how technical docs are organized.

A little premise to my proposal: in general, you want tech docs to be organized in macro sections that roughly address

  • "Overview/Installation/GettingStarted"
  • "HowTOs/UserGuides"
  • "Reference" (Anything from autogen API docs, to arch diagrams, implementation details, etc.)
  • "Explanation" (anything that explains in free form why the project made certain decisions, or that discusses the ecosystem, integrations, etc.)

In our case we may also want to consider a "Developer" section, particularly useful for OSS projects.

Now, I can see clear ways to improve the current doc structure to better align with that model. Here are some suggestions:

  1. Split "Overview" into
    • "Overview" - trimmed down to only contain an intro to the project, how it fits within the ecosystem, who should care and why
    • "Getting Started" - one or two simple examples to experiment with the training operator. No explanation required, something that just works end to end
    • "Installation" - particularly important for those who want to install without Kubeflow Platform
    • Move the Architecture part to a new section "Reference"
  2. Move "Job Scheduling" under a new section called "User Guides", with the name "Advanced Scheduling". The main page provides an overview, and then we have two child pages called "Volcano" and "Scheduler Plugins"
  3. Revisit each framework page with the following process:
    1. Create a “<framework_name> Training” page under “User Guides” -> all the “how do I do something” content goes here
    2. Create a “<framework_name>” under “Reference” -> all the CRD reference + implementation details go here.

This doesn't have to happen all in one PR, which is why I split it into sequential steps. Let me know what you think. We can start iterating on some of these points in draft PRs, and I am happy to get this started.

@andreyvelich
Member Author

Thank you so much for this @StefanoFioravanzo, I really like your ideas.
A few questions:

  • Should we order the Installation page before Getting Started, like in the Model Registry docs?
  • Do we want to separate guides between Users, Administrators, and Developers, like in the KServe docs or Jupyter docs, or can we do it in the next iteration?
    • For example, initially we can move all guides to the User Guides.

> all the CRD reference + implementation details go here.

We don't have a CRD reference right now, so how should we split these sections?

@kubeflow/wg-training-leads what are your thoughts?

@StefanoFioravanzo
Member

@andreyvelich

> Should we order Installation before Getting Started page?

Yes, let's keep Installation before Getting Started. It makes sense for folks who need to go through the installation before getting hands-on.

> Do we want to separate guides between Users, Administrators, and Developers

I am in favour of having additional grouping based on the persona. But, as a first step, I recommend limiting the amount of change. So, as you suggest, let's move all how-tos/guides to a generic "user guides" section. Once we go through this initial restructuring exercise, we can further refine.

> We don't have CRD reference right now, how should we split these sections?

I think we do. I think I saw some generic CRD reference for some of the frameworks. If we don't have enough details, we can still add a "TBD" under a framework's reference/API guide.

@andreyvelich
Member Author

@StefanoFioravanzo I think we have only this one: https://github.com/kubeflow/training-operator/blob/master/docs/api/kubeflow.org_v1_generated.asciidoc, but I am not sure if we keep this doc updated.
Is that right, @kubeflow/wg-training-leads?

@StefanoFioravanzo
Member

@andreyvelich since we merged kubeflow/website#3719, can we revisit the first comment of this issue? What do we want to address for training operator 1.8 (Kubeflow 1.9)?

@andreyvelich
Member Author

I think we completed all items as part of Kubeflow 1.9.
Let me close this issue.
