Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: naming conventions in README #101

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

msarahan
Copy link

This is a rough proposal for documenting some naming conventions, as discussed in https://github.com/rapidsai/cudf/pull/15483/files/470ee3d01c8cfc404b998073e6ec7571817e87fb#r1655137710

I consider the suggestions here very open to debate, so please speak up if you disagree with them.

@msarahan msarahan changed the title Document naming conventions in README docs: naming conventions in README Jun 28, 2024
extras:
table: table_name
key: key_name
```

Currently the supported extras by file type are:
- pyproject.toml
- table: The table in pyproject.toml where the dependencies should be written. Acceptable values are "build-system", "project", and "project.optional-dependencies".
- key: The key corresponding to the dependency list in `table`. This may only be provided for the "project.optional-dependencies" table since the key name is fixed for "build-system" ("requires") and "project" ("dependencies"). Note that this implicitly prohibits including optional dependencies via an inline table under the "project" table.
- table: The table in pyproject.toml where the dependencies should be written. Acceptable values are "build-system", "project", "project.optional-dependencies", and "tool.rapids-build-backend"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically table is not limited to these four values - it can be the name of any table, though these are definitely the most common. Let's think about how we can word this one differently.

- table: The table in pyproject.toml where the dependencies should be written. Acceptable values are "build-system", "project", and "project.optional-dependencies".
- key: The key corresponding to the dependency list in `table`. This may only be provided for the "project.optional-dependencies" table since the key name is fixed for "build-system" ("requires") and "project" ("dependencies"). Note that this implicitly prohibits including optional dependencies via an inline table under the "project" table.
- table: The table in pyproject.toml where the dependencies should be written. Acceptable values are "build-system", "project", "project.optional-dependencies", and "tool.rapids-build-backend"
- key: The key corresponding to the dependency list in `table`. This may only be provided for the "project.optional-dependencies" or "tool.rapids-build-backend" table since the key name is fixed for "build-system" ("requires") and "project" ("dependencies"). Note that this implicitly prohibits including optional dependencies via an inline table under the "project" table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more accurate to say that key cannot be provided for build-system and project, but can be provided for all other values.

Comment on lines +226 to +259
## Naming conventions

With non-trivial collections of files and dependencies, names can look very related and become hard to distinguish. Some general guidelines that will help keep things more understandable:

### Patterns for [RAPIDS build backend](https://github.com/rapidsai/rapids-build-backend)

RBB builds generally include at least two sections:

```
files:
# This dependency brings in RBB
py_build_<something>:
output: pyproject
extras:
table: build-system # This is the thing to match with the py_build_* name
includes:
- <rapids_build_skbuild>
# This dependency expresses the build-time dependencies for your recipe (what might otherwise go in [build-system] when not using RBB)
py_rapids_build_<something>:
output: pyproject
extras:
table: tool.rapids-build-backend # This is the thing to match with the py_rapids_build_* name
key: requires
includes:
- <normal_project_build_time_deps>
```

### Patterns for dependencies

To help distinguish dependency specifications from file specifications, it is recommended to prefix dependency entries
with `dep_<type>_<name>`, where type is one of `build`, `run`, or `test`. This is functionally superfluous, as the `extras.table`
value will ultimately determine where the dependency entries will go, but following the convention will make your dependencies.yaml
file easier to follow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'll let @vyasr, @jameslamb, and @bdice verify.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've seen a dependencies.yaml across RAPIDS that has any entries beginning with a literal dep_... did you mean for that be depends_on_? I have seen depends_on_{something}.

I'd probably just remove that, in favor of gently recommending grouping dependency entries with related purposes using similar prefixes, like build_{project}, run_{project}, test_{project}. cudf and its multiple sub-projects (cudf, cudf-kafka, cudf-polars, etc.) is a great example of where this aids readability:

https://github.com/rapidsai/cudf/blob/78f4a8a3f639677358bce83a699f92c90476ae75/dependencies.yaml#L23-L30


It's also not always the case that a dependency entry is only reserved for one of build, run, or test. For example, it's common to see a matrix of cuda-version dependencies used to constrain tests, docs builds, and the conda environments checked into source control that contributors are encouraged to use to run examples.

Like this: https://github.com/rapidsai/cudf/blob/78f4a8a3f639677358bce83a699f92c90476ae75/dependencies.yaml#L377

My perspective... the most important thing being documented in this paragraph is not the typical patterns for the names...it's the fact that the names are not functionally meaningful (e.g. you could add _sparkles to the end of all of them and not break anything), other than that they're unique identifiers to be used as references elsewhere in the same dependencies.yaml. I'd restructure this paragraph to lead with that, and then maybe include a nudge about standardizing the names if you'd like.

Copy link
Member

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, I left some comments for your consideration.

@@ -99,6 +99,8 @@ files:

When `output: none` is used, the `conda_dir`, `requirements_dir` and `matrix` keys can be omitted. The use case for `output: none` is described in the [_Additional CLI Notes_](#additional-cli-notes) section below.

If more than one `files` key specifies the same output and *_dir, their contents are merged. This is how different sections are written to a given output file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If more than one `files` key specifies the same output and *_dir, their contents are merged. This is how different sections are written to a given output file.
If more than one `files` key specifies the same output and `*_dir`, their contents are merged. This is how different sections are written to a given output file.

I think this was meant to be read as a literal wildcard. If so, it should be inline code-formatted so another one showing up later in the line doesn't accidentally change this to italics.

Comment on lines +226 to +259
## Naming conventions

With non-trivial collections of files and dependencies, names can look very related and become hard to distinguish. Some general guidelines that will help keep things more understandable:

### Patterns for [RAPIDS build backend](https://github.com/rapidsai/rapids-build-backend)

RBB builds generally include at least two sections:

```
files:
# This dependency brings in RBB
py_build_<something>:
output: pyproject
extras:
table: build-system # This is the thing to match with the py_build_* name
includes:
- <rapids_build_skbuild>
# This dependency expresses the build-time dependencies for your recipe (what might otherwise go in [build-system] when not using RBB)
py_rapids_build_<something>:
output: pyproject
extras:
table: tool.rapids-build-backend # This is the thing to match with the py_rapids_build_* name
key: requires
includes:
- <normal_project_build_time_deps>
```

### Patterns for dependencies

To help distinguish dependency specifications from file specifications, it is recommended to prefix dependency entries
with `dep_<type>_<name>`, where type is one of `build`, `run`, or `test`. This is functionally superfluous, as the `extras.table`
value will ultimately determine where the dependency entries will go, but following the convention will make your dependencies.yaml
file easier to follow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've seen a dependencies.yaml across RAPIDS that has any entries beginning with a literal dep_... did you mean for that be depends_on_? I have seen depends_on_{something}.

I'd probably just remove that, in favor of gently recommending grouping dependency entries with related purposes using similar prefixes, like build_{project}, run_{project}, test_{project}. cudf and its multiple sub-projects (cudf, cudf-kafka, cudf-polars, etc.) is a great example of where this aids readability:

https://github.com/rapidsai/cudf/blob/78f4a8a3f639677358bce83a699f92c90476ae75/dependencies.yaml#L23-L30


It's also not always the case that a dependency entry is only reserved for one of build, run, or test. For example, it's common to see a matrix of cuda-version dependencies used to constrain tests, docs builds, and the conda environments checked into source control that contributors are encouraged to use to run examples.

Like this: https://github.com/rapidsai/cudf/blob/78f4a8a3f639677358bce83a699f92c90476ae75/dependencies.yaml#L377

My perspective... the most important thing being documented in this paragraph is not the typical patterns for the names...it's the fact that the names are not functionally meaningful (e.g. you could add _sparkles to the end of all of them and not break anything), other than that they're unique identifiers to be used as references elsewhere in the same dependencies.yaml. I'd restructure this paragraph to lead with that, and then maybe include a nudge about standardizing the names if you'd like.

py_build_<something>:
output: pyproject
extras:
table: build-system # This is the thing to match with the py_build_* name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
table: build-system # This is the thing to match with the py_build_* name
table: build-system

I didn't understand the language in this comment and the one below that "this is the thing to match". Could you rephrase that, or consider dropping these inline comments?

The comments above the sections already pretty clearly describe the purpose.


```
files:
# This dependency brings in RBB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# This dependency brings in RBB
# This dependency brings in RBB + the build backend it wraps (e.g. scikit-build-core, setuptools)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants