Always return the same photos #15
Hi @kwea123, thanks for your feedback! This is a small optimization we made just for the demo page: we pre-rendered 6 images for each scene (standing in the middle of the room and rotating 6 times) to speed up the image rendering step, so that demo users don't wait too long while the agent is reasoning (rendering pictures in real time using nerfstudio+LERF takes about 30 seconds). We are aware this is a big pain point for this pipeline, so next on our TODO list is a "smarter" rendering step, specifically,
Hopefully, this will speed up the process enough that we can use real-time rendered images instead of the 6 pre-rendered photos for the demo, and the photos will be taken from better camera poses because they are conditioned on the text query rather than on a fixed point in the room. We are also open to better ideas and implementations for this rendering step. Would love to hear what you think!
NeRF rendering is slow, and GPT-4 matching and searching are also slow, so I think real-time feedback is nearly impossible here.
@mu-cai I believe it should be doable with the pipeline proposed above. Based on our experience, it takes 3-5 seconds to render a 512 × 512 picture in NeRF. So if we calculate the relevancy scores and only take pictures around the relevant areas (only 1-2 pictures per text query), the experience should be near-real-time. We can also add streaming to the rendering process (i.e., display each picture as soon as it is done instead of waiting for all images to finish), which can further reduce the latency perceived by the user.
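The two ideas in the comment above (render only the 1-2 most relevant views, and stream each picture as it finishes) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: `top_k_poses`, `stream_renders`, `fake_render`, and the `Pose` type are hypothetical names, the relevancy scores are assumed to be computed elsewhere, and `fake_render` stands in for the real (slow) NeRF render call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical camera position (x, y, z); a real pose would also carry orientation.
Pose = Tuple[float, float, float]


def top_k_poses(poses: Sequence[Pose], scores: Sequence[float], k: int = 2) -> List[Pose]:
    """Return the k poses with the highest relevancy scores, best first."""
    ranked = sorted(zip(scores, poses), key=lambda pair: pair[0], reverse=True)
    return [pose for _, pose in ranked[:k]]


def stream_renders(
    poses: Sequence[Pose],
    render_fn: Callable[[Pose], str],
    max_workers: int = 2,
) -> Iterator[Tuple[Pose, str]]:
    """Yield (pose, image) pairs as each render finishes, not in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(render_fn, pose): pose for pose in poses}
        for future in as_completed(futures):
            yield futures[future], future.result()


def fake_render(pose: Pose) -> str:
    """Stand-in for a slow NeRF render; returns a label instead of pixels."""
    return f"image@{pose}"
```

A caller would pick the poses with `top_k_poses(candidate_poses, scores, k=2)` and then loop over `stream_renders(selected, fake_render)`, displaying each image as it arrives. Whether rendering in parallel threads actually helps depends on how the renderer uses the GPU, so the thread pool here is just one possible way to deliver results incrementally.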
Hi @kwea123 @mu-cai, we have implemented the above features in #17. You can try out these new features with our latest demo to see them in action! Some quick screenshots:
No matter what I ask, it always returns the same set of photos. Not always the same photo, but always one from the same set of maybe 5~6 images.
Like this one for "office". I intentionally gave a prompt that couldn't be answered with this image, "show me the ceiling", but it still shows this same photo.