Webcamdinov2
Webcamdinov2 uses the DINOv2 architecture to run video inference on a webcam feed at close to real-time speed. The implementation uses the MPEG-4 codec to display the feature-extracted video sequences alongside the original sequences.
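The public DINOv2 ViT backbones split each image into 14x14-pixel patches, so a webcam frame is usually cropped or resized to a multiple of 14 before inference. The minimal sketch below assumes a ViT-*/14 backbone and a horizontal side-by-side layout for the original and feature views. It computes the cropped size, the resulting patch grid, and the frame size an MPEG-4 writer would need. The function names are hypothetical and are not taken from the repository.

```python
# Sketch: fit a webcam frame to the DINOv2 patch grid and size a side-by-side
# output frame. Assumptions: patch size 14 (as in the dinov2_vit*14 models) and
# original + feature view placed horizontally next to each other.
PATCH_SIZE = 14


def patch_aligned_size(width: int, height: int, patch: int = PATCH_SIZE) -> tuple[int, int]:
    """Largest (width, height) not exceeding the input that is divisible by `patch`."""
    return (width // patch) * patch, (height // patch) * patch


def patch_grid(width: int, height: int, patch: int = PATCH_SIZE) -> tuple[int, int]:
    """Number of patch tokens along (x, y) after cropping to a patch-aligned size."""
    w, h = patch_aligned_size(width, height, patch)
    return w // patch, h // patch


def side_by_side_size(width: int, height: int) -> tuple[int, int]:
    """(width, height) of an output frame holding two views next to each other."""
    return 2 * width, height


if __name__ == "__main__":
    w, h = patch_aligned_size(640, 480)
    gx, gy = patch_grid(640, 480)
    print(f"crop to {w}x{h}, {gx}x{gy} = {gx * gy} patch tokens")
    print("output frame:", side_by_side_size(w, h))
```

For a 640x480 webcam this gives a 630x476 crop and a 45x34 grid of patch tokens. With OpenCV, the side-by-side size would be passed as the `frameSize` argument of `cv2.VideoWriter`, with `cv2.VideoWriter_fourcc(*"mp4v")` selecting an MPEG-4 codec; whether the repository writes its output exactly this way is an assumption.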
Repository: Webcamdinov2 on GitHub
- Flexible Backbone: Uses DINOv2 as the default backbone, but can be switched to other backbones as needed.
- Optional Dependencies: The system can run without the xFormers library.
- Environment Compatibility: Designed for use within a DINOv2 Conda environment.
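One way to make the backbone swappable is a small registry keyed by a user-facing name. The sketch below is hypothetical: the registry, its keys, and the "small" default are illustrative assumptions, not the repository's code. The hub names and embedding sizes are those of the published `facebookresearch/dinov2` torch.hub models. `torch` is imported only inside `load_backbone`, so the registry can be inspected without PyTorch installed.

```python
# Sketch: a hypothetical registry for swapping DINOv2 backbone variants.
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneSpec:
    hub_repo: str    # torch.hub repository
    hub_name: str    # entry point name in that repository
    embed_dim: int   # feature dimension per patch token
    patch_size: int  # patch side length in pixels


BACKBONES = {
    "small": BackboneSpec("facebookresearch/dinov2", "dinov2_vits14", 384, 14),
    "base": BackboneSpec("facebookresearch/dinov2", "dinov2_vitb14", 768, 14),
    "large": BackboneSpec("facebookresearch/dinov2", "dinov2_vitl14", 1024, 14),
    "giant": BackboneSpec("facebookresearch/dinov2", "dinov2_vitg14", 1536, 14),
}


def get_backbone_spec(key: str = "small") -> BackboneSpec:
    """Look up a backbone by name; "small" as the default is an assumption."""
    try:
        return BACKBONES[key]
    except KeyError:
        raise ValueError(f"unknown backbone {key!r}; choose from {sorted(BACKBONES)}") from None


def load_backbone(key: str = "small"):
    """Load the chosen model through torch.hub (needs torch and network access)."""
    import torch  # lazy import: only needed when a model is actually loaded

    spec = get_backbone_spec(key)
    model = torch.hub.load(spec.hub_repo, spec.hub_name)
    return model.eval()  # inference mode: disables dropout etc.
```

Adding a non-DINOv2 backbone would then mean adding one registry entry, provided the downstream code reads `embed_dim` and `patch_size` from the spec rather than hard-coding them.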
- Clone the repository:
git clone https://github.com/1ssb/webcamdino
- Set up the Conda environment:
- Create a DINOv2 environment image in Conda; a separate environment is needed because of compatibility issues with OpenCV.
- Install the requirements from requirements2.txt with pip:
pip install -r requirements2.txt
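After installation, a quick check can confirm that the activated environment exposes the expected packages. This is a minimal sketch: the module lists below are assumptions and are not read from requirements2.txt. xFormers is treated as optional, matching the note that the system can run without it.

```python
# Sketch: verify that the active Conda environment can import the packages the
# pipeline is assumed to need. REQUIRED/OPTIONAL are assumptions, not taken
# from requirements2.txt.
import importlib.util

REQUIRED = ("torch", "torchvision", "cv2", "numpy")
OPTIONAL = ("xformers",)


def check_modules(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report() -> bool:
    """Print a short summary; return True if every required module is present."""
    missing = [name for name, ok in check_modules(REQUIRED).items() if not ok]
    for name, ok in check_modules(OPTIONAL).items():
        print(f"optional {name}: {'found' if ok else 'not installed (optional)'}")
    if missing:
        print("missing required modules:", ", ".join(missing))
    else:
        print("all required modules found")
    return not missing


if __name__ == "__main__":
    report()
```

`find_spec` only locates a module without importing it, so the check stays fast even for large packages such as `torch`.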
The inference pipeline integrates components adapted from Meta's Facebook Research. The current implementation does not run in real time because inference is computationally intensive. Work on reducing latency is ongoing, and computational acceleration may bring further improvements.
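To quantify how far a run is from real time, per-frame latency can be tracked with a moving average. The sketch below is a hypothetical helper, not part of the repository: `process` stands in for whatever per-frame work is being timed, and the smoothing factor `alpha` is an arbitrary choice. At 30 FPS, real time means staying under roughly 33 ms per frame.

```python
# Sketch: track per-frame latency with an exponential moving average (EMA).
# `process` is a hypothetical stand-in for the per-frame inference step.
import time


class LatencyMeter:
    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha   # weight given to the newest measurement
        self.avg_s = None    # smoothed seconds per frame

    def update(self, seconds: float) -> float:
        """Fold one measurement into the EMA and return the new average."""
        if self.avg_s is None:
            self.avg_s = seconds
        else:
            self.avg_s = self.alpha * seconds + (1 - self.alpha) * self.avg_s
        return self.avg_s

    @property
    def fps(self) -> float:
        """Frames per second implied by the smoothed latency (0.0 before any data)."""
        return 0.0 if not self.avg_s else 1.0 / self.avg_s


def timed(process, frame, meter: LatencyMeter):
    """Run `process(frame)`, record its wall-clock duration, and return its result."""
    start = time.perf_counter()
    result = process(frame)
    meter.update(time.perf_counter() - start)
    return result


if __name__ == "__main__":
    meter = LatencyMeter()
    for _ in range(5):
        timed(lambda f: time.sleep(0.05), None, meter)  # stand-in for ~50 ms of work
    print(f"~{meter.avg_s * 1000:.0f} ms/frame, ~{meter.fps:.1f} FPS")
```

An EMA smooths out frame-to-frame jitter while still reacting to sustained slowdowns, which makes it easier to compare the effect of acceleration changes than a single timing.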
- Please report any systemic issues or suggestions to Subhransu Bhattacharjee.
- For detailed discussions and known issues, refer to the official DINOv2 issues page.
If you find this project useful, please leave a star or cite it as:
@misc{bhattacharjee2023webcamdinov2,
author = {Bhattacharjee, Subhransu S.},
title = {{Webcamdinov2: Video Inferencing with Webcam using DINOv2}},
year = {2023},
howpublished = {\url{https://github.com/1ssb/webcamdino}},
note = {Accessed: [Insert date here]}
}