Seeking suggestion regarding combining VC features with other visual features #13

Open
siyamsajeebkhan opened this issue Feb 12, 2021 · 3 comments

Comments

@siyamsajeebkhan

Hi,
Thanks for your fantastic work! I am trying to apply it to videos by combining the VC features with I3D features, and I am facing a few challenges. First of all, for each video frame I get VC features of size N×1024, where N is the number of bounding boxes detected in the frame, which doesn't match the size of the I3D features. So I have been adding the features of the N bounding boxes elementwise to get a single feature representation of shape 1024.

Do you think it's a good idea? Will the features be preserved if I do addition like this? If not, do you have a better idea on how to do it so that I can combine with the I3D features?

Thanks!
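
For readers following along, the pooling described above can be sketched as below. This is only an illustration, not code from the VC R-CNN repository: the function names `pool_box_features` and `fuse_with_i3d` are hypothetical, the 1024-dimensional I3D vector is an assumed size, the zero-vector fallback for frames without detections is an assumption, and concatenation is just one plausible way to combine the two features. Averaging is shown next to summation because the sum grows with the number of boxes N, while the mean does not.

```python
import numpy as np


def pool_box_features(box_feats: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Collapse an (N, D) array of per-box features into a single (D,) vector."""
    if box_feats.ndim != 2:
        raise ValueError("expected an array of shape (N, D)")
    if box_feats.shape[0] == 0:
        # No detections in this frame: fall back to zeros (an assumption).
        return np.zeros(box_feats.shape[1], dtype=box_feats.dtype)
    if mode == "sum":
        return box_feats.sum(axis=0)   # magnitude scales with N
    if mode == "mean":
        return box_feats.mean(axis=0)  # magnitude independent of N
    raise ValueError(f"unknown mode: {mode}")


def fuse_with_i3d(vc_frame_feat: np.ndarray, i3d_feat: np.ndarray) -> np.ndarray:
    """Concatenate a pooled VC feature with an I3D feature (one possible fusion)."""
    return np.concatenate([vc_frame_feat, i3d_feat], axis=0)


# Random stand-ins for real features: 5 boxes x 1024 dims, plus one I3D vector.
rng = np.random.default_rng(0)
vc_feats = rng.standard_normal((5, 1024)).astype(np.float32)
i3d_feat = rng.standard_normal(1024).astype(np.float32)

fused = fuse_with_i3d(pool_box_features(vc_feats, mode="mean"), i3d_feat)
print(fused.shape)
```

If the I3D features are computed per clip rather than per frame, the pooled frame vectors would also need to be aggregated over the frames of each clip before fusing; that step is left out here.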

@Wangt-CN
Owner

Hi, I think using the feature mean is not a bad idea. However, a more important point is whether the video frames contain clear objects. VC R-CNN is built on the Faster R-CNN framework, which extracts object features based on the detected or given bounding boxes. If a video frame is not very clear or has few objects, the features extracted by Faster R-CNN may be trivial. (Maybe you can add a bounding box the size of the whole image to make sure the whole-frame feature is extracted.)

Another thing is that the distribution of the frame images may be quite different from that of the training samples for the pretrained VC R-CNN (e.g., MSCOCO).

Is the I3D model you are using a pretrained model, meaning you only use it to extract features? Or do you need to train the I3D model as well?

@siyamsajeebkhan
Author

Hi, thanks a lot for your reply.

> Hi, I think using the feature mean is not a bad idea. However, a more important point is whether the video frames contain clear objects. VC R-CNN is built on the Faster R-CNN framework, which extracts object features based on the detected or given bounding boxes. If a video frame is not very clear or has few objects, the features extracted by Faster R-CNN may be trivial. (Maybe you can add a bounding box the size of the whole image to make sure the whole-frame feature is extracted.)

First, I am using YOLOv5 to extract the bounding boxes from the video frames and then feeding the bounding box coordinates to VC R-CNN. Do you think using YOLOv5 is a bad idea? I opted for YOLO because it is very fast, which matters especially for videos, since they contain a huge number of frames.
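
As a hedged illustration of this hand-off: YOLOv5's saved text labels store boxes as normalized `(cx, cy, w, h)`, whereas Faster R-CNN style pipelines usually take pixel `(x1, y1, x2, y2)` corners. Whether VC R-CNN expects exactly that layout is an assumption here and should be checked against the repository's README. The helper name `yolo_to_xyxy` is hypothetical.

```python
def yolo_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2).

    Coordinates are clamped to the image so boxes never extend past the frame.
    """
    cx, cy, w, h = box

    def clamp(value, upper):
        return max(0.0, min(float(upper), value))

    x1 = clamp((cx - w / 2) * img_w, img_w)
    y1 = clamp((cy - h / 2) * img_h, img_h)
    x2 = clamp((cx + w / 2) * img_w, img_w)
    y2 = clamp((cy + h / 2) * img_h, img_h)
    return (x1, y1, x2, y2)


# A box centered in a 640x480 frame that covers half of each dimension.
print(yolo_to_xyxy((0.5, 0.5, 0.5, 0.5), 640, 480))
```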

Also, by adding a bounding box equal to the whole image size, do you mean adding it to all frames no matter how many objects were detected, or only to frames where no objects or very few objects were detected?

> Another thing is that the distribution of the frame images may be quite different from that of the training samples for the pretrained VC R-CNN (e.g., MSCOCO).

Do you suggest retraining the whole VC R-CNN architecture in this case for my custom dataset?

> Is the I3D model you are using a pretrained model, meaning you only use it to extract features? Or do you need to train the I3D model as well?

I am not training the I3D model. I am using a pretrained one just for feature extraction from videos.

@Wangt-CN
Owner

Hi,

  1. YOLOv5 should be OK for extracting the bounding boxes.
  2. For the whole-image bounding box, I think both of the options you mentioned are OK.
  3. Yes, if you have data (annotations) to fine-tune VC R-CNN on your own custom dataset, that is the best choice. If you don't have annotations, you can just use the pretrained VC R-CNN model.
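
The two whole-image box options from point 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the VC R-CNN repository: the helper `add_full_frame_box` and its `min_boxes` parameter are hypothetical, and boxes are assumed to be pixel `(x1, y1, x2, y2)` tuples.

```python
def add_full_frame_box(boxes, img_w, img_h, min_boxes=None):
    """Return the boxes plus a box covering the whole frame.

    min_boxes=None: always add the full-frame box (first option).
    min_boxes=k:    add it only when fewer than k boxes were detected
                    (second option).
    """
    full_frame = (0.0, 0.0, float(img_w), float(img_h))
    out = [tuple(float(v) for v in box) for box in boxes]
    if min_boxes is None or len(out) < min_boxes:
        if full_frame not in out:  # avoid adding a duplicate
            out.append(full_frame)
    return out


detections = [(10, 20, 200, 300)]
print(add_full_frame_box(detections, 640, 480))               # always add
print(add_full_frame_box(detections, 640, 480, min_boxes=1))  # enough boxes, unchanged
```

With the first option every frame contributes at least one feature vector, so a frame with no detections still yields a usable representation.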

If you have any other questions, feel free to ask me. Thanks.
