[FEATURE REQUEST] Importers that consume tf or pytorch Dataset, and can produce identical Datasets after #118

elistevens · 2019-08-25T16:28:46Z

Is your feature request related to a problem? Please describe.
Existing DL projects are already going to have a data pipeline. Often, these are going to result in tf or pytorch Datasets. Having to replace all of the existing mechanisms by which those datasets are created and managed with hangar-specific code is a barrier to adoption.

Describe the solution you'd like
I think that import routines should be able to consume a third-party Dataset instance, inspect it, and store the relevant data in hangar. The intent would be that hangar could then produce a functionally identical Dataset for use in training, but without having to go back to the raw data.

This would change hangar adoption best practice from "replace your data pipeline" to "just insert this mostly-transparent step in the middle." Once the project is fully committed to using hangar, then the architecture can be revisited, if needed.

It would be nice if it could also vacuum up a dir tree of .tfrecord files, but that's a little less well-defined.
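
To make the requested round trip concrete, here is a minimal sketch of the idea, not an implementation against hangar's API. It assumes the third-party dataset is map-style (it exposes `__len__` and `__getitem__`, as PyTorch map-style datasets do) and uses a plain dict as a stand-in for the hangar-side storage. `ToyDataset`, `import_dataset`, and `StoredDataset` are hypothetical names invented for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple


class ToyDataset:
    """Stand-in for an existing map-style dataset from a project's pipeline."""

    def __init__(self, n: int) -> None:
        self._n = n

    def __len__(self) -> int:
        return self._n

    def __getitem__(self, idx: int) -> Tuple[List[float], int]:
        if not 0 <= idx < self._n:
            raise IndexError(idx)
        # Pretend this step decodes raw data, which is the expensive part.
        return [float(idx), float(idx) * 2.0], idx % 2


def import_dataset(
    dataset: Any,
    store: Dict[str, Dict[int, Any]],
    names: Sequence[str],
) -> None:
    """Copy each field of each sample into one column per field name.

    `store` is a plain dict standing in for versioned storage (assumption).
    """
    for name in names:
        store.setdefault(name, {})
    for idx in range(len(dataset)):
        sample = dataset[idx]
        if len(sample) != len(names):
            raise ValueError("sample has a different number of fields than names")
        for name, field in zip(names, sample):
            store[name][idx] = field


class StoredDataset:
    """Serves samples back from the store without touching the raw data."""

    def __init__(self, store: Dict[str, Dict[int, Any]], names: Sequence[str]) -> None:
        self._columns = [store[name] for name in names]

    def __len__(self) -> int:
        return len(self._columns[0]) if self._columns else 0

    def __getitem__(self, idx: int) -> Tuple[Any, ...]:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        return tuple(column[idx] for column in self._columns)


original = ToyDataset(4)
store: Dict[str, Dict[int, Any]] = {}
import_dataset(original, store, ("features", "label"))
replayed = StoredDataset(store, ("features", "label"))
```

Because `StoredDataset` follows the same indexing protocol as the original, training code that only indexes and measures the dataset would not need to change. Iterable-style datasets and .tfrecord directories would need a different inspection step and are not covered by this sketch.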

Describe alternatives you've considered
N/A

Additional context
N/A

rlizzo · 2019-09-04T17:38:09Z

This is a phenomenal idea @elistevens! Will put this on the roadmap!

rlizzo · 2019-09-05T19:08:50Z

@hhsecond, would you have time to put together the first draft of this for the next release? (after 0.3)

hhsecond · 2019-09-06T01:45:32Z

Sounds good

rlizzo · 2019-10-09T13:54:59Z

Hey @hhsecond, just wanted to ping you on this since it seems like an appropriate feature for 0.4. If you don't have the time, I may have some next week. Let me know if you think you'll be able to take this.

hhsecond · 2019-10-09T15:22:06Z

@rlizzo I am finishing up the plugin module cleanup (sorry for the delay). I am tied up until Wednesday with PyCon India and GPU work on CircleCI. If you think this can wait until then, I am happy to take this up right after Wednesday.

gauthampughazhendhi · 2019-10-15T07:18:10Z

@hhsecond, I am currently working on this issue.

gauthampughazhendhi · 2019-10-15T11:25:42Z

@hhsecond, I am working on the load method's implementation of the plugin. Will keep you updated.

rlizzo · 2019-10-21T13:03:24Z

Hey @GauthamPughaz, just wanted to check in and see how this is going. Do you need any assistance? Any ETA on when we might be able to see a first draft? It doesn't need to be pretty, but it may save time if we can check out an overview of the flow and suggest any necessary changes before you get too far in development.

Thanks for volunteering to contribute this! It's a great feature which will be much appreciated!

gauthampughazhendhi · 2019-10-23T08:24:36Z

@rlizzo, I am halfway through it. I may need some assistance in understanding the internal workings of arraysets to develop the feature better. I will consult @hhsecond on this. However, I am entirely tied up this week and the next. I will definitely be available after that, and I would love to work on this feature.

rlizzo · 2019-10-23T17:46:09Z

Ok. Thanks for the update! I think that you should definitely talk with either myself or @hhsecond before getting too far then.

You shouldn't actually have to care about how arraysets work internally to develop this feature. In general, we don't allow any access to the internal workings of the arraysets outside of the public API, even for internal hangar operations. This is because the actual data reader/writer backend methods are heavily protected by weakref proxies and context managers to ensure that all operations occur safely. Going outside of these protections could open up some nasty bugs/behavior if not done properly.
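
For readers unfamiliar with the pattern, the following is a minimal sketch of how a weakref proxy combined with a context manager can keep callers from holding on to a backend after the safe window closes. It is an illustration under stated assumptions, not hangar's actual code: `_Backend` and `Store` are hypothetical names, and the prompt invalidation of the proxy relies on CPython's reference counting.

```python
import weakref
from contextlib import contextmanager
from typing import Any, Dict, Iterator


class _Backend:
    """Hypothetical low-level reader/writer that callers must not keep."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = dict(data)

    def write(self, key: str, value: Any) -> None:
        self._data[key] = value

    def read(self, key: str) -> Any:
        return self._data[key]


class Store:
    """Public API: the backend is only reachable inside `writer()`."""

    def __init__(self) -> None:
        self._committed: Dict[str, Any] = {}

    @contextmanager
    def writer(self) -> Iterator[Any]:
        backend = _Backend(self._committed)
        try:
            # Hand out a proxy, never a strong reference to the backend.
            yield weakref.proxy(backend)
            # Reached only if the block finished without raising.
            self._committed = dict(backend._data)
        finally:
            # Dropping the only strong reference invalidates the proxy
            # (immediately under CPython reference counting).
            del backend

    def get(self, key: str) -> Any:
        return self._committed[key]


store = Store()
with store.writer() as w:
    w.write("a", 1)
    leaked = w  # a caller trying to keep the handle past the block

try:
    leaked.read("a")
    proxy_still_alive = True
except ReferenceError:
    proxy_still_alive = False
```

Code that bypasses the public API and grabs the backend object directly would not get this protection, which is the kind of risk described above.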

Why don't we try to set up a call with the three of us sometime in the next two weeks so we can discuss further? We can coordinate times through @hhsecond, if that's OK.

Thanks for the hard work!
Rick

gauthampughazhendhi · 2019-10-23T19:13:43Z

@rlizzo, thanks for the clarification. I think we can have a call on Tuesday or Wednesday next week to wrap this up.

hhsecond · 2019-10-30T04:21:07Z

@GauthamPughaz Let's do the call today? We would like to push this to the upcoming release (0.4) and hence the hurry. Sorry

gauthampughazhendhi · 2019-10-30T15:55:53Z

@hhsecond, no problem. We can connect through a slack call tomorrow after 8:30 pm IST.

elistevens added the enhancement New feature or request label Aug 25, 2019

hhsecond self-assigned this Sep 6, 2019
