Add enable_url_table as an argument to SessionStateBuilder #12394
Comments
@goldmedal says:
take
@alamb datafusion/datafusion/core/src/execution/context/mod.rs Lines 392 to 403 in 6590ea3

However, the `RwLock` is commonly built when creating the `SessionContext`: datafusion/datafusion/core/src/execution/context/mod.rs Lines 352 to 357 in 6590ea3

I think your proposed usage doesn't work 🤔 because

```rust
let session_state_ref: Arc<RwLock<SessionState>> = SessionStateBuilder::new()
    .with_default_features()
    .with_config(cfg)
    .enable_url_table_and_build();
let ctx = SessionContext::from(session_state_ref);
```

WDYT?
I see. That is a good question 🤔 We could potentially change `build()` to return an `Arc`, but I don't know what implications that has. Maybe we could change the `DynamicTableProvider` to keep a reference to some part of the `SessionState` (rather than an `Arc` to the whole thing 🤔), but that might be messy.
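To make the first idea concrete, here is a minimal std-only sketch of a `build()` variant that returns `Arc<RwLock<_>>` directly. `State`, `UrlFactory`, `build_shared`, and `demo` are hypothetical stand-ins, not DataFusion types. It assumes the URL factory only needs a weak back-reference: `Arc::new_cyclic` hands the factory a `Weak` pointer while the state is still being constructed, so nothing has to be wrapped after the fact and the state does not keep itself alive through a strong cycle.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock, Weak};

/// Hypothetical stand-in for `SessionState` (not the DataFusion type).
pub struct State {
    /// Registered formats keyed by lowercase extension (names only, for brevity).
    pub formats: HashMap<String, String>,
    /// The URL table factory lives inside the state that it also needs to read.
    pub url_factory: Arc<UrlFactory>,
}

/// Hypothetical stand-in for a URL table factory.
pub struct UrlFactory {
    // Weak back-reference: State -> UrlFactory -> State is not a strong cycle,
    // so dropping the last `Arc<RwLock<State>>` still frees the state.
    state: Weak<RwLock<State>>,
}

impl UrlFactory {
    /// Reads the current format registry through the back-reference.
    /// Returns `None` if the state was dropped or the extension is unknown.
    pub fn format_for(&self, ext: &str) -> Option<String> {
        let state = self.state.upgrade()?;
        let guard = state.read().unwrap();
        guard.formats.get(&ext.to_lowercase()).cloned()
    }
}

/// Hypothetical `build()` variant returning the shared, locked state directly.
pub fn build_shared(formats: HashMap<String, String>) -> Arc<RwLock<State>> {
    Arc::new_cyclic(|weak: &Weak<RwLock<State>>| {
        RwLock::new(State {
            formats,
            url_factory: Arc::new(UrlFactory { state: weak.clone() }),
        })
    })
}

/// Usage: register a format after building; the factory sees it via the shared lock.
pub fn demo() -> (Option<String>, Option<String>, Option<String>) {
    let mut formats = HashMap::new();
    formats.insert("csv".to_string(), "csv-factory".to_string());
    let state = build_shared(formats);

    // Clone the factory out; the temporary read guard is dropped at the end of
    // this statement, so `format_for` below never re-locks while a guard is held.
    let factory = Arc::clone(&state.read().unwrap().url_factory);

    state
        .write()
        .unwrap()
        .formats
        .insert("parquet".to_string(), "parquet-factory".to_string());

    let csv = factory.format_for("CSV");
    let parquet = factory.format_for("parquet");

    // Last strong reference: the state is freed despite the back-pointer.
    drop(state);
    let after_drop = factory.format_for("csv");
    (csv, parquet, after_drop)
}
```

One caveat with this shape: callers should clone the factory out of the lock (as `demo` does) rather than call it while holding a read guard, because `std::sync::RwLock` does not guarantee that re-acquiring a read lock on the same thread succeeds; it may block or panic.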
I guess it would be a huge breaking change for datafusion/datafusion/core/src/execution/context/mod.rs Lines 351 to 358 in 7bd7747
The
I think the main challenge is URL resolution. The datafusion/datafusion/core/src/datasource/dynamic_file.rs Lines 71 to 73 in 7bd7747
It’s not easy to extract only the required parts for this.
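Part of what URL resolution needs from the state is the extension-to-format lookup quoted later in this thread (`get_file_format_factory`, which lowercases the extension). The std-only sketch below isolates just that step, assuming the format is chosen purely by file extension; `extension_of` and `FormatRegistry` are hypothetical names, and plain format names stand in for real `FileFormatFactory` values.

```rust
use std::collections::HashMap;

/// Hypothetical helper: lowercase extension of a table URL such as
/// "s3://bucket/data/events.PARQUET" or "data/file.csv?version=2".
pub fn extension_of(url: &str) -> Option<String> {
    // Drop any query string or fragment before looking at the path.
    let path = url.split(|c: char| c == '?' || c == '#').next()?;
    // Last path segment, e.g. "events.PARQUET".
    let file_name = path.rsplit('/').next()?;
    // Text after the last '.'; names without a dot have no extension.
    let (stem, ext) = file_name.rsplit_once('.')?;
    if stem.is_empty() || ext.is_empty() {
        return None; // ".hidden" or "name." are not treated as having an extension
    }
    Some(ext.to_lowercase())
}

/// Hypothetical registry mirroring the case-insensitive lookup in
/// `get_file_format_factory`: keys are stored lowercased.
pub struct FormatRegistry {
    formats: HashMap<String, &'static str>,
}

impl FormatRegistry {
    pub fn new(entries: &[(&str, &'static str)]) -> Self {
        let formats = entries
            .iter()
            .map(|(ext, name)| (ext.to_lowercase(), *name))
            .collect();
        Self { formats }
    }

    /// Resolve a table URL to a registered format; `None` means it is not
    /// something this registry knows how to read.
    pub fn resolve(&self, url: &str) -> Option<&'static str> {
        let ext = extension_of(url)?;
        self.formats.get(&ext).copied()
    }
}
```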
Yeah, I don't have any great ideas at the moment. I'll keep thinking.
I filed #12550 to track this and other ideas for making the APIs easier to use.
I wonder if we could change `ListingTable::infer` and anything else that uses That might then let us avoid needing an entire `SessionContext` 🤔
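One way to read this suggestion is to have inference depend on a narrow trait that exposes only what it reads, so either the full state or a lightweight session can be passed in. A minimal sketch, assuming inference needs only a format lookup and a sampling limit; `InferenceSession`, `FullState`, `LightSession`, and `plan_inference` are hypothetical and do not mirror DataFusion's actual `Session` trait.

```rust
use std::collections::HashMap;

/// Hypothetical narrow interface: only what schema inference reads.
pub trait InferenceSession {
    /// Format registered for a lowercase extension, if any.
    fn format_for(&self, ext: &str) -> Option<&str>;
    /// Maximum number of files to sample when inferring (hypothetical knob).
    fn max_files_to_sample(&self) -> usize;
}

/// Stand-in for the full state.
pub struct FullState {
    pub formats: HashMap<String, String>,
    pub sample_files: usize,
    // ...plus catalogs, planners, UDF registries, etc. that inference never reads
}

/// Lightweight session carrying only what inference needs.
pub struct LightSession {
    pub formats: HashMap<String, String>,
}

impl InferenceSession for FullState {
    fn format_for(&self, ext: &str) -> Option<&str> {
        self.formats.get(ext).map(String::as_str)
    }
    fn max_files_to_sample(&self) -> usize {
        self.sample_files
    }
}

impl InferenceSession for LightSession {
    fn format_for(&self, ext: &str) -> Option<&str> {
        self.formats.get(ext).map(String::as_str)
    }
    fn max_files_to_sample(&self) -> usize {
        1 // hypothetical fixed default for the lightweight session
    }
}

/// Hypothetical `infer`-style step that depends only on the narrow trait:
/// picks the format and the files it would sample.
pub fn plan_inference(
    session: &dyn InferenceSession,
    ext: &str,
    files: &[&str],
) -> Option<(String, Vec<String>)> {
    let format = session.format_for(&ext.to_lowercase())?.to_string();
    let sampled = files
        .iter()
        .take(session.max_files_to_sample())
        .map(|f| f.to_string())
        .collect();
    Some((format, sampled))
}
```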
After some research, I found the main usages of `SessionState` by If we want to use

```rust
/// Retrieves a [FileFormatFactory] based on file extension which has been registered
/// via SessionContext::register_file_format. Extensions are not case sensitive.
pub fn get_file_format_factory(
    &self,
    ext: &str,
) -> Option<Arc<dyn FileFormatFactory>> {
    self.file_formats.get(&ext.to_lowercase()).cloned()
}
```

The

```rust
/// Infer the common schema of the provided objects. The objects will usually
/// be analysed up to a given number of records or files (as specified in the
/// format config) then give the estimated common schema. This might fail if
/// the files have schemas that cannot be merged.
async fn infer_schema(
    &self,
    state: &SessionState,
    store: &Arc<dyn ObjectStore>,
    objects: &[ObjectMeta],
) -> Result<SchemaRef>;
```

Runtime_env

It's used by

```rust
let list = match self.is_collection() {
    true => match ctx.runtime_env().cache_manager.get_list_files_cache() {
        None => store.list(Some(&self.prefix)),
        Some(cache) => {
            if let Some(res) = cache.get(&self.prefix) {
                debug!("Hit list all files cache");
                futures::stream::iter(res.as_ref().clone().into_iter().map(Ok))
                    .boxed()
            } else {
                let list_res = store.list(Some(&self.prefix));
                let vec = list_res.try_collect::<Vec<ObjectMeta>>().await?;
                cache.put(&self.prefix, Arc::new(vec.clone()));
                futures::stream::iter(vec.into_iter().map(Ok)).boxed()
            }
        }
    },
    false => futures::stream::once(store.head(&self.prefix)).boxed(),
};
```

Some Conclusions

If we can use
The building of a dynamic catalog would be:

```rust
let runtime = Arc::new(RuntimeEnv::default());
// DynamicSession is an implementation of `Session`.
let factory = Arc::new(DynamicListTableFactory::new(DynamicSession::new(
    file_formats,
    config_options,
    runtime,
)));
let catalog_list = Arc::new(DynamicFileCatalog::new(
    Arc::clone(state_ref.catalog_list()),
    Arc::clone(&factory) as Arc<dyn UrlTableFactory>,
));
```

However, `file_formats` and the configs would then be static: we couldn't register additional formats or change configs at runtime. (If we want to change them at runtime, we would need to make them some kind of shared reference 🤔) Then, we can remove
This seems reasonable to me.

I agree this doesn't sound good 🤔 I vaguely remember that something similar is needed for Anyhow, thank you for this thorough review. I d
So it could look like
🤔
(we can do this as a follow-on as well)
Originally posted by @alamb in #11035 (comment)