Take pictures using LERF embeddings; visualize grounding results in 3D mesh; parallelization #17

Merged: 18 commits, Jun 1, 2023

jedyang97 (Contributor) commented Jun 1, 2023

Try the newest demo here!

Specifically, this PR makes the following improvements:

  • Use LERF embeddings + DBSCAN clustering to determine camera poses
  • Take a picture for each object instance
  • Use LLaVA-13B to caption each picture
  • GPT-4 reads all captions, then either reasons internally or asks the user for clarification to ground the object
  • Display grounding results to the user: object instances are highlighted in a 3D mesh using bounding spheres
  • Significantly speed up the pipeline by parallelizing rendering and LLaVA inference
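
The clustering and bounding-sphere steps in the list above can be sketched as follows. This is a minimal illustration, not the code from this PR: it assumes the input is a list of 3D points that already scored highly against the query (the LERF relevancy step is not shown), it uses a small pure-Python DBSCAN in place of whatever implementation the pipeline actually calls, and the names `dbscan`, `bounding_spheres`, `eps`, and `min_samples` are hypothetical. The sphere is centered on the cluster centroid, so it encloses the cluster but is not the minimal enclosing sphere.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point, -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep growing
    return labels


def bounding_spheres(points, labels):
    """Map each cluster label to (centroid, radius) enclosing its points."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    spheres = {}
    for lab, members in clusters.items():
        center = tuple(sum(c) / len(members) for c in zip(*members))
        radius = max(math.dist(center, m) for m in members)
        spheres[lab] = (center, radius)
    return spheres


# Toy data (made up): two tight groups of points and one outlier.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0),
    (20.0, 20.0, 20.0),
]
labels = dbscan(points, eps=0.5, min_samples=3)
spheres = bounding_spheres(points, labels)
for lab, (center, radius) in spheres.items():
    print(lab, center, radius)
```

Each resulting sphere corresponds to one object instance. Under the same assumptions, its center could serve as the look-at target when choosing a camera pose, and its center and radius could be drawn directly on the mesh as the highlight.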

"How many doors are there in this room?"
[image]

"find all the chairs"
[image]

jedyang97 requested a review from XuweiyiChen June 1, 2023 07:45
jedyang97 self-assigned this Jun 1, 2023
jedyang97 requested a review from JasonQSY June 1, 2023 07:49
jedyang97 merged commit f419c7b into main Jun 1, 2023
3 participants