Design of a transformer-based architecture for object detection conditioned by metadata:
- DEtection TRansformer (DETR)
- You Only Look at One Sequence (YOLOS)
To install the project, simply clone the repository and get the necessary dependencies. Then, create a new project on Weights & Biases. Log in and paste your API key when prompted.
# clone repo
git clone https://github.com/MarcoParola/conditioning-transformer.git
cd conditioning-transformer
mkdir models data
# Create virtual environment and install dependencies
python -m venv env
. env/bin/activate
python -m pip install -r requirements.txt
# Weights&Biases login
wandb login
To start a training run, set the `model` parameter to one of the following values: `detr`, `early-sum-detr`, `early-concat-detr`, `yolos`, `early-sum-yolos`, or `early-concat-yolos`.
python train.py model=detr
The command can also be run with the `cropBackground` option set to `true` or `false`, which produces training images like the following.
Whole image | Cropped image |
---|---|
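To compare all variants, the training command can be generated for every `model` value and both `cropBackground` settings. The following is a minimal sketch, not part of the repository: the helper name `build_train_commands` is hypothetical, and it assumes `cropBackground` is passed on the command line in the same `key=value` form as `model`. It only prints the commands and does not launch any training.

```python
import itertools
import shlex

# Model names as listed in this README.
MODELS = [
    "detr", "early-sum-detr", "early-concat-detr",
    "yolos", "early-sum-yolos", "early-concat-yolos",
]


def build_train_commands(models=MODELS, crop_options=(True, False)):
    """Return one argument list per (model, cropBackground) pair."""
    commands = []
    for model, crop in itertools.product(models, crop_options):
        # Assumption: booleans are written lowercase (`true` / `false`).
        commands.append([
            "python", "train.py",
            f"model={model}",
            f"cropBackground={str(crop).lower()}",
        ])
    return commands


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in build_train_commands():
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; each printed line can be pasted into the shell or passed to a job scheduler.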
To run inference on the test set and compute metrics, specify the path to the model weights with the `weight` parameter (I usually download the weights from wandb and copy them into the `checkpoint` folder).
python test.py model=detr weight=checkpoint/best.pt
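When several weight files have been downloaded, picking the right path by hand is easy to get wrong. The following is a minimal sketch, assuming the weights are `.pt` files stored in the `checkpoint` folder as in the command above; `latest_checkpoint` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path


def latest_checkpoint(folder="checkpoint", pattern="*.pt"):
    """Return the most recently modified weight file in `folder`, or None."""
    candidates = list(Path(folder).glob(pattern))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    weight = latest_checkpoint()
    if weight is None:
        print("No weight file found in checkpoint/")
    else:
        # Same command form as above, with the discovered path filled in.
        print(f"python test.py model=detr weight={weight.as_posix()}")
```

The helper selects by modification time only, so it will not necessarily return the best-performing weights; the `model` value must still match the architecture the weights were trained with.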
Training hyperparameters can be edited in the config file or overridden from the shell:
Params | Value |
---|---|
batchSize | 16 |
lr | 1e-6 |
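A shell override might look like the command built below. This is a sketch under an assumption: that `batchSize` and `lr` accept the same `key=value` form used for `model`, which this README does not state explicitly. The helper `train_command` is hypothetical, and the values shown are illustrative, not recommendations.

```python
import shlex


def train_command(model, **overrides):
    """Build a train.py invocation with optional key=value overrides."""
    args = ["python", "train.py", f"model={model}"]
    for key, value in overrides.items():
        # Assumption: each hyperparameter is overridden as key=value.
        args.append(f"{key}={value}")
    return args


if __name__ == "__main__":
    # Illustrative values; the defaults in the table are batchSize=16, lr=1e-6.
    print(shlex.join(train_command("detr", batchSize=8, lr="1e-5")))
```

Passing `lr` as a string keeps the notation exactly as typed, since Python would otherwise render the float `1e-5` as `1e-05`.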
Special thanks to @clive819 for making a DETR implementation public, and to @hustvl for the original YOLOS implementation.