[DISCUSSION] Proposal move object_store to its own github repo?
#6183
Comments
Hi, @alamb, thanks a lot for raising this discussion.
Most of my contributions, both in coding and reviewing, were focused on object_store. I feel that object_store is distinct from other arrow-related crates. I believe it would be beneficial to move
Since
I'm willing to help do this. But I'm guessing we need:
|
I agree -- thank you. I happen to be both a PMC member of Arrow / committer so I have the relevant permissions. As long as there are no concerns I'll send a note to the arrow list in a few days
I recently learned about "collaborators" in asf.yml https://github.com/apache/infrastructure-asfyaml?tab=readme-ov-file#triage where we could give someone rights to manage issues, so in theory someone else could also help with this |
(thank you!) |
I'm happy to help with this. One question: do we need to migrate already closed issues/discussions? |
I think given it lives under the arrow PMC and not a separate arrow-rs PMC, I think it would be
Perhaps we could articulate what these overheads are? There might be ways to alleviate them that require less effort/churn
I don't feel very strongly either way, I think it really depends on what the long-term goals for the project are. My vision was for it to parallel the arrow filesystem abstractions present in arrow-cpp / pyarrow. This would naturally entail it residing long-term as part of the arrow project, and being largely developed within that context. With this as the goal, splitting it into a separate repository seems like a lot of work and disruption for relatively minor return, if not actively detrimental to that goal. If there is instead an initiative to split it out into its own top-level apache project, splitting it out into a new repository would be a natural next step. |
In my mind it is:
We could certainly make this better by making separate release documentation and I could add some filters on tags to my view.
The "what is the vision of the project" is a great discussion. I agree if the vision is that object_store will mostly be used by arrow-rs and related technologies (like DataFusion) splitting into a new repo would likely be more detrimental. An alternate vision for object_store is a Rust ecosystem wide library for interacting with object stores, where arrow-rs / DataFusion were one of many downstream projects. The reverse dependencies would suggest DataFusion and related projects are the largest users at the moment: https://crates.io/crates/object_store/reverse_dependencies |
I'm not super involved w/ release cutting and admin things of
However I also understand that an extra repo may come w/ extra overhead. I think my answer is mostly in line with @alamb. |
Another annoyance of mine is that the commit history is intermixed. Specifically, if you look at https://github.com/apache/arrow-rs/commits/master/ it is not clear which commits modify |
I think if there is a group of people willing to undertake the work of splitting it out, and to foster, maintain and build a community around said project, I think that is a very exciting path forward. My concern would be that it gets split out without a very clear community around it, and this then hampers its ongoing development and maintenance. I'm especially wary given my reduced capacity going forwards, which would leave the repository with very few active maintainers.
One could make the argument that is what the changelogs are for, but I take your point |
+1 |
|
I am willing to help with this
I agree this is a risk. However, I think it may actually be that moving to a new repo makes it easier to attract new contributors and maintainers (I am thinking about @Xuanwo for example who has been happy to help) -- I think the number of other things in the arrow-rs repo could make the barrier to contribution of |
I can also help with maintenance |
Yes 🙌, I'm willing to help build this project |
Also happy to help |
Awesome. It seems as if we have reached consensus here that moving to a new repo will be a good idea. What I would like to do is to complete the current release. It seems to me the only potentially unresolved issue is the new name for the repo. Here are the options so far
I'll also send a note to the dev list asking about this too from the broader community. |
Is the |
Yes, we need to use the
It is a reasonable (though larger) discussion whether there is a better organizational home for this crate |
Here is another reason to move to a new repo. The Full Changelog link in the changelog https://github.com/apache/arrow-rs/blob/master/object_store/CHANGELOG.md (object_store_0.10.1...object_store_0.10.2) shows commits from both object_store and arrow, which is confusing |
I like the idea of |
Hi, it's a pity that GitHub doesn't support multi-level project layouts, so we have to use prefixes to indicate PMC ownership. However, it is still possible for us to be elevated to a top-level project once we are mature enough. We can use |
Another annoyance while creating the arrow/parquet release candidate is that I had to filter the object_store related closed issues such as #6122 by tagging them with |
I have a fair understanding of I see that Object Store Python already exists, so we can't use the name, but do we want a Python wrapper for this, just like we have for Arrow? |
I don't know of any use case for such an API, but maybe @roeap would know better as the owner of Object Store Python |
I have asked him on his repo. Let's see what he says. |
Probably best to discuss on a separate issue, but there's quite a lot of interest in exposing object-store to Python. Both as a user-facing Python API and as a way to construct |
Just my 2c, but this would be great! (moving to its own repo) |
Which part is this question about
This GitHub repository contains an implementation of arrow and parquet and object_store, which are related but are in separate crates and reasonably could be in separate repos. arrow and parquet are still released in lockstep (have to be released at the same time, use the same Arrow voting thread)
However, object_store is released on a different schedule, with a different voting thread, a different process, and has a non-trivially different set of maintainers and a substantial number of other users
While we have tags to separate issues and PRs of object_store I still find it confusing that this repo has content related to object_store
I believe the reason object_store is in the arrow-rs repo in the first place was convenience for the maintainers after it was first donated: #2030
Now that we have settled the API down and its development and release cycle has become decoupled from the other crates in this repo, I think the overhead of keeping it in the same repo is greater than the value we get from keeping it there
Describe your question
object_store crate and associated Dev process (tickets, etc) to its own repository (apache/arrow-rs-object-store)
Additional context
cc @tustvold and @crepererum