Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2Q2024] Update CXG conversion script [previously - Change cxg conversion script to include uns data] #6659

Closed
kaloster opened this issue Feb 13, 2024 · 28 comments
Assignees
Labels
backend Backend work P0 Priority 0 - Critical, fix ASAP!

Comments

@kaloster
Copy link
Contributor

kaloster commented Feb 13, 2024

Context:

  1. 5.1 schema change log that's relevant to Explorer is here
  2. 5.1 Schema is here
  3. Joyce and Alec last updated the CXG conversion script, so they should be very helpful for this task

DoD:

  1. Update h5ad -> cxg conversion script to pull Explorer needed spatial data from the final 5.1 schema (see change log) - uns, obms, etc.
  2. Update CXG data format specs to capture where we store the new spatial data here
  3. All tests that may break as a result should be fixed and pass
  4. Update .ts schema and types accordingly (?)
@tihuan
Copy link
Contributor

tihuan commented Feb 14, 2024

@Bento007 @metakuni headsup that we might need to collaborate on updating the conversion script? Thank you!

@tihuan
Copy link
Contributor

tihuan commented Feb 14, 2024

Also Jason from the Lattice Team said that they expect to get 5had updated to conform with schema 4.2, but we'll still need to convert them to cxg (manually?)

@metakuni
Copy link
Contributor

@Bento007 @metakuni headsup that we might need to collaborate on updating the conversion script? Thank you!

Thanks for the advanced ping, @tihuan. By 'collaborate', did you just mean some consultation and maybe PR review? The spatial work for Data Platform isn't scheduled to start until Mar 4, earliest.

Also note:

  • The schema version for spatial work has been renamed from 4.2 to 5.1 (but timelines haven't changed)
  • The 5.1 schema is still in draft, so may change.

@tihuan
Copy link
Contributor

tihuan commented Feb 14, 2024

Thanks so much for all the context, @metakuni !

I think mostly I just want to make sure that we don't do double work 😆

@kaloster has done some investigation on how to update the cxg converstion script to work with spatial data already, and we assumed that Data Platform probably will need to work on something similar when you guys tackle the spatial work stuff in March

So maybe we should just talk about how we should update the conversion script when Data Platform is ready to work on it!

@brianraymor
Copy link

@tihuan - Just FYI - Applications is responsible for the CXG validation and conversion per a decision that Arathi made prior to 4.0.

@tihuan
Copy link
Contributor

tihuan commented Feb 14, 2024

Hey @brianraymor ! Ooh thanks for the headsup 💡 I see, so I should see who on the Applications team know more about validation and conversion, and go from there 😄

In that case, @metakuni sorry for the false ping! 🙏

@metakuni
Copy link
Contributor

Hey @brianraymor ! Ooh thanks for the headsup 💡 I see, so I should see who on the Applications team know more about validation and conversion, and go from there 😄

In that case, @metakuni sorry for the false ping! 🙏

No worries, @tihuan !

As for whom on the Apps team... I believe that person was @joyceyan. 😏 For the schema 4.0 migration, she worked on the Apps portion.

@metakuni
Copy link
Contributor

One more thing worth noting: We'll have to coordinate when to merge and deploy your CXG conversion changes since that's part of the 5.1 schema, and we need to release all of those changes simultaneously. Hopefully, that won't complicate your development or testing schedules.

@tihuan
Copy link
Contributor

tihuan commented Feb 15, 2024

Awesomee thanks for saving me time figuring out who I should reach out 😆

Okay! Yeah coordination sounds good! I heard that Jason from the Lattice Team said that all the h5ad files should be updated to conform with 5.1 by Match 29, but I suppose we'd still need to deploy the new 5.1 schema conversion script to convert h5ad to cxg on Data Portal, and trigger the data processing pipeline to make sure everything is up to date?

@brianraymor
Copy link

I heard that Jason from the Lattice Team said that all the h5ad files should be updated to conform with 5.1 by Match 29

Yes - Jason was updating me on their readiness for re-curation for platform planning purposes. The existing visium datasets will fail validation if they're not updated.

but I suppose we'd still need to deploy the new 5.1 schema conversion script to convert h5ad to cxg on Data Portal, and trigger the data processing pipeline to make sure everything is up to date?

Absolutely. When we migrate the data corpus to a new schema version, we need to re-create all the artifacts like seurat downloads and CXG(s) because the underlying data has changed (new/modified/deprecated metadata fields, replaced/obsolete ontology terms, new dataset citations, etc.)

@atarashansky
Copy link
Contributor

atarashansky commented Feb 15, 2024

@tihuan I'm intimately familiar with the CXG converter and also orchestrated a CXG remastering effort for the explorer performance improvements. Feel free to ping me as a consultant for this work if needed.

@tihuan
Copy link
Contributor

tihuan commented Feb 15, 2024

Oh you're a savior, @atarashansky 🤩🙌 Thanks so much! We'll definitely need your help with this lol

Can you help us with a quick rundown on the steps?

So far the pieces I know are:

  1. Lattice Team will update all the h5ad files with the 5.1 schema

  2. Apps Team needs to update the h5ad to cxg conversion script

And the goal is to update the data portal datasets with the latest schema, so we can enable Explorer spatial mode

I'm assuming the steps would be something like:

  1. Apps Team update cxg conversion script, get it deployed to prod

  2. Curators upload all the datasets that are now 5.1 schema

  3. Manually trigger the processing pipeline to run the conversion script

  4. Once the pipeline is done, all the datasets will now have cxg format available

  5. Profit 💰

Something like that? I don't know if 5.1 schema is backward compatible, if not, it feels like more coordination will be needed lol

CC: @kaloster @seve something we need to keep an eye on for rolling out the changes!

@tihuan tihuan assigned tihuan and unassigned kaloster Feb 28, 2024
@tihuan
Copy link
Contributor

tihuan commented Feb 28, 2024

Assigning this to me for now to drive the discussion with the schema team on potentially having them include uns data in the conversion script. Will re-assign back to Ronen, depending on the discusison outcome

@brianraymor
Copy link

RE

Curators upload all the datasets that are now 5.1 schema

Manually trigger the processing pipeline to run the conversion script

The migration of of the corpus from 5.0 -> 5.1 is automated. Curators do not download, revise, and upload datasets. There is no manual trigger for CXG and Seurat conversions. We can review the process on the DP-Apps call.

If you're curious, please see:

https://docs.google.com/document/d/1QNWicEb_C7W5B-L6Lb9jbnX897nUnoamEdaebznB_jw/edit#heading=h.jcf27gbiblyg
https://app.zenhub.com/workspaces/single-cell-5e2a191dad828d52cc78b028/issues/gh/chanzuckerberg/single-cell/441
https://app.zenhub.com/workspaces/single-cell-5e2a191dad828d52cc78b028/issues/gh/chanzuckerberg/single-cell/365

@tihuan
Copy link
Contributor

tihuan commented Feb 28, 2024

Reached out to Dan about the collab possibility

@tihuan
Copy link
Contributor

tihuan commented Feb 28, 2024

Chatting with BrianR on Slack for help now!

@tihuan
Copy link
Contributor

tihuan commented Feb 29, 2024

Sync'd with BrianR on this!

He confirmed that the APS team is indeed responsible for updating and maintaining any APS needed fields in the schema. And once Dan is done with Schema 5.0 migration, he'll schedule time for a DPS/APS sync, so we can coordinate on Schema 5.1 rollout

@tihuan tihuan assigned kaloster and unassigned tihuan Mar 1, 2024
@tihuan
Copy link
Contributor

tihuan commented Mar 1, 2024

Putting this back to Disorder until DPT is ready for collab!

@tihuan tihuan added P0 Priority 0 - Critical, fix ASAP! backend Backend work labels Mar 19, 2024
@tihuan tihuan changed the title Change cxg conversion script to include uns data [2Q2024] Change cxg conversion script to include uns data Mar 19, 2024
@tihuan tihuan changed the title [2Q2024] Change cxg conversion script to include uns data [2Q2024] Update CXG conversion script [previously - Change cxg conversion script to include uns data] Mar 26, 2024
@tihuan
Copy link
Contributor

tihuan commented Mar 26, 2024

I've updated the PR title and description given the meeting yesterday. Thanks so much again for the thorough presentation and info, @brianraymor !!

@tihuan
Copy link
Contributor

tihuan commented Mar 26, 2024

=== OLD TICKET DESCRIPTION ===
Right now, uns metadata that is needed for spatial embedding does not exist in cxg. Add the necessary code fro capturing this data and including it in cxg formatted file.

DoD:

  • uns data is included and accessible in the cxg formatted dataset
  • all tests that may break as a result should be fixed and pass

@tihuan
Copy link
Contributor

tihuan commented Mar 26, 2024

CC: @kaloster please help review the ticket and see if anything else is needed. Thanks so much!

@brianraymor
Copy link

Is there any additional processing/filtering required for these new obs fields?

  • array_row
  • array_col

@tihuan
Copy link
Contributor

tihuan commented Mar 26, 2024

Oh great question! I might need @kaloster to help answer this question 😆

@tihuan
Copy link
Contributor

tihuan commented Mar 27, 2024

Hey team! Please add your planning poker estimate with Zenhub @kaloster @seve

@tihuan
Copy link
Contributor

tihuan commented Mar 29, 2024

@kaloster @seve please help point this ticket thank you!

@tihuan
Copy link
Contributor

tihuan commented Apr 1, 2024

@seve gentle ping thank you!

@tihuan
Copy link
Contributor

tihuan commented Apr 2, 2024

@kaloster just missing your vote now 😊 Thanks!

@tihuan
Copy link
Contributor

tihuan commented Apr 24, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Backend work P0 Priority 0 - Critical, fix ASAP!
Projects
None yet
Development

No branches or pull requests

5 participants