Improve arrow-rs examples #1613

Closed
3 of 6 tasks
alamb opened this issue Apr 24, 2022 · 3 comments
Labels: arrow (Changes to the arrow crate), documentation (Improvements or additions to documentation), help wanted

Comments

@alamb
Contributor

alamb commented Apr 24, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
To help people get up to speed with a new library quickly, it is helpful to start with examples that compile. The arrow-rs docs have many small working examples, but it would be nice to also include some full-featured examples.

clap-rs has a nice example of examples: https://github.com/clap-rs/clap/blob/v3.1.12/examples/README.md

This is especially useful as we start making them easier to find, for example via apache/arrow-cookbook#185.

Describe the solution you'd like
Here are some possible improvements

@alamb alamb added documentation Improvements or additions to documentation arrow Changes to the arrow crate help wanted labels Apr 24, 2022
@datapythonista
Contributor

datapythonista commented Jun 5, 2022

I'm planning to work on this. What I'd personally do is have many small examples of increasing complexity, so that besides serving as examples and recipes, they can be used as a tutorial to learn Arrow topics step by step.

If people are happy with this, I'll start working on PRs for the following:

  • Creating arrays for primitive types
    • With the array constructor (e.g. Int32Array::from(vec![...]))
    • With a builder (using `append_value` and `append_null`)
    • With collect()
  • Creating arrays with null values. I'm unsure about this one: if the above are simple enough, we can probably cover nulls in those examples. But it's worth listing here for consideration for now
  • Creating arrays of more complex types (e.g. Dictionary, Struct...)
  • Casting data to different types
  • Creating Schema
  • Creating RecordBatch
  • Reading from different formats
    • Parquet
    • CSV
    • JSON
  • Writing to different formats (same)
  • Data manipulation and kernels. I'll expand on this when the rest are done; for now, just a couple of simple examples to have something (comparisons, sorting, simple aggregations like min, max or sum).
  • Sharing data, IPC and Flight/gRPC. I'm not very familiar with this yet, but I guess it makes sense to have some examples too.

Not sure how feasible it is, but it would be amazing if we could render those examples (which will have documentation explaining what's going on) directly in the Arrow cookbook. I think it's a bit tricky, but doable, and better than maintaining two different cookbooks/sets of examples, or only having them in one place.

Feedback on any of this very welcome.

@alamb
Contributor Author

alamb commented Jun 7, 2022

Thank you @datapythonista! That sounds like a great (and also quite ambitious) plan.

I remember that @elferherrera may have worked on something similar, so perhaps he has some comments.

Most array types have examples in the Rust docs (thanks to @novemberkilo), for example https://docs.rs/arrow/15.0.0/arrow/array/struct.DictionaryArray.html and https://docs.rs/arrow/15.0.0/arrow/array/type.Int64Array.html. Perhaps we could create a 'cookbook' that has the key examples and then links to the docs for more details. There may be some way to keep the content in the cookbook and then include it in the Rust docs.

In general, it might be worth thinking about how these examples will be maintained (specifically, how to make sure they keep working). Rustdoc examples get automatically checked as part of CI, and there is a way to run examples in markdown as well, perhaps via https://crates.io/crates/doc-comment or something similar (example: https://github.com/LaikaStudios/shotgrid-rs/pull/12/files).
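One way to keep markdown examples compiling without an extra dependency, offered here as an alternative to `doc-comment` rather than what that crate does: since Rust 1.54, a markdown file can be pulled into rustdoc with `#[doc = include_str!(...)]`, and gating it behind `cfg(doctest)` means its Rust code blocks are compiled and run only by `cargo test --doc`. A minimal sketch, assuming a hypothetical `examples/README.md` at the crate root; the struct name is also hypothetical.

```rust
// In src/lib.rs: treat examples/README.md as doc comments so that its Rust
// code blocks become doctests. The item only exists when running doctests,
// so it never shows up in the published docs or the public API.
#[cfg(doctest)]
#[doc = include_str!("../examples/README.md")]
pub struct ReadmeDoctests;
```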

> Creating arrays with null values. I'm unsure about this one, if the above are simple enough, probably we can have this in the above examples. But worth having this here for consideration for now

I agree that it may be enough to start with arrays with

@alamb
Contributor Author

alamb commented Dec 8, 2023

I don't think this is tracking anything actionable now, so closing this ticket

@alamb alamb closed this as completed Dec 8, 2023