Add Roadmap to Documentation #1104
docs/source/specification/roadmap.md
## Vision
DataFusion's goal is to become _the de facto query engine_ of choice
_the de facto query engine_
typo?
It means the query engine of choice in practice.
I will reword this to use less idiomatic English.
Got it. My poor English needs improving (/ω\)
It's okay :-) @xudong963, it was not English to begin with; it is a Latin phrase.
@alamb I think _de facto_ is a common phrase. One way to make it read more smoothly is to put it in italics (as many books do with Latin expressions).
I'd probably hammer it home by including more examples inline, perhaps even in a footnote (not sure GitHub Markdown supports that).
👍 makes sense. In general, I think minimizing 'flowery prose' and idiomatic English is the best plan for what have become worldwide communities.
I will contribute some content for the Ballista vision.
There are fewer than three months left in 2021. I plan to work on something for DataFusion for the rest of the year.
I think it's good to merge as is; more detailed sections can come in subsequent pull requests.
@andygrove said he'll contribute some content for the Ballista vision 👀
Thanks @jimexist and @xudong963. I plan to leave this open over the weekend to give anyone with more time a chance to contribute, and then merge it early next week.
Co-authored-by: Loïc Sharma
Co-authored-by: QP Hou
Thanks @alamb for putting this up.
If I recall correctly, @jorgecarleitao also has plans to decouple DataFusion into smaller reusable crates. But if @jorgecarleitao and @andygrove are too busy to add their items right now, we could merge and add those items later as follow-up PRs.
@Dandandan, do you want to add your tokomak optimizer to the list?
Also cc @yjshen in case we missed any items needed for your native Spark executor work.
Co-authored-by: Daniël Heres
Thanks, @houqp. I think what I need most is covered by the …. On the Ballista side, I think broadcast join would be a great addition. We could also add a sort-based shuffle writer, which is more memory-friendly and writes a single map output file per task, avoiding many small files when the number of output partitions is large.
Would this be the right place to rate the maturity of each component? I feel there is a pretty big gap between the DataFusion part of the codebase and Ballista (hard-coded parameters, huge functions that need to be decomposed, unsupported SQL...). This information would be very important for newcomers, but I am not sure how to formulate it 😄
This would be a great thing to describe, but the roadmap is probably not the right place for it (maybe the roadmap could have an entry like "* Mature Ballista (see … for details)"). How about a document somewhere in the user guide https://arrow.apache.org/datafusion/#toc-guide?
Co-authored-by: Carlos
Co-authored-by: rdettai
Perhaps we can add further changes as follow-on PRs. Thanks everyone for all the help!
🎉
Thank you @alamb for making this happen. I can help deploy the doc change later today. Reminder for @jorgecarleitao and @andygrove to add your entries before the upcoming DataFusion 6.0.0 release ;)
Which issue does this PR close?
Closes #1102 suggested by @xudong963
**All Suggestions Welcome**
Rationale for this change
See also:
What changes are included in this PR?
Add roadmap to docs published on https://arrow.apache.org/datafusion/
Are there any user-facing changes?
Docs