[RMP] Create a separate Merlin package for the dataloaders #394

Closed
13 of 14 tasks
karlhigley opened this issue Jun 16, 2022 · 4 comments

@karlhigley
Contributor

karlhigley commented Jun 16, 2022

Problem:

A number of customers only want to use our dataloaders. They're a thin wedge we can use to drive Merlin adoption among teams.

TorchRec, the PyTorch recommendation framework, would like to make the Merlin dataloader its default without depending on all of Merlin Models. The TorchRec team would like to publish blog posts about the framework, which creates an opportunity to co-promote one part of the Merlin ecosystem.

The Spark team wants to use our dataloaders to accelerate their TensorFlow workflows and has coordinated with the Horovod team to make it an optional dataloader included natively with Horovod.

Goal:

  • Publish a Merlin package specifically for the dataloaders, separate from Merlin Models (MM) and NVTabular (NVT), that both MM and TorchRec can depend on.
  • Integrate the dataloader into as many upstream packages as possible.

Scope:

Publish Merlin dataloaders under a new 'dataloader' package

Constraints:

  • TorchRec is using the old (deprecated) version of the dataloaders from NVT
  • TorchRec isn't set up to use the new version of the dataloaders in Merlin Models
  • Merlin Models still needs to maintain access to the dataloaders too
  • Horovod container size was a concern; the Horovod team is interested in a pip install of RAPIDS (coming in 22.10)

Blockers

Create a new repo from the Merlin repo template
Note: Julio is blocked on this; Ben has to create the new 'dataloader' repo.

Starting Point:

v22.08

v22.09

karlhigley added this to the Merlin 22.07 milestone Jun 16, 2022
karlhigley changed the title from "[RMP] Create a separate Merlin package for the dataloaders" to "[UNP] Create a separate Merlin package for the dataloaders" Jun 16, 2022
karlhigley added the "unplanned" label (Work that wasn't on the roadmap that we've ended up doing anyway) Jun 16, 2022
viswa-nvidia changed the title from "[UNP] Create a separate Merlin package for the dataloaders" to "[RMP] Create a separate Merlin package for the dataloaders" Jun 21, 2022
karlhigley changed the milestone from Merlin 22.07 to Merlin 22.08 Jul 1, 2022
@EvenOldridge
Member

I've created the repo under the name 'dataloader', primarily for SEO and to keep its focus narrow, but we can rename it later.

@viswa-nvidia

@benfred @jperez999, who is doing which part of this work?

Create a new repo from the Merlin repo template
Dataloaders-Set up CI builds for the new repo
Dataloaders-Migrate dataloader code from NVT or Merlin Models (?)
Move the orchestration for each python GPU context on to the target GPU
#375
Dataloaders-Publish new PyPi/Conda packages
Dataloaders-Update Merlin Models to use the new package
NVIDIA-Merlin/dataloader#12
Dataloaders-Comparison to petastorm, tensorflow, default pytorch dataloader
Dataloaders-Readme
Dataloaders-Docs update
Dataloaders-CI infrastructure

@viswa-nvidia

Moving this to prioritization to check whether we will continue with the examples in 22.09.

EvenOldridge removed the "unplanned" label (Work that wasn't on the roadmap that we've ended up doing anyway) Jul 29, 2022
@EvenOldridge
Member

@benfred to create issues from the bullet points.

Projects: None yet
Development: No branches or pull requests
6 participants