
YoloNAS training time #2056

Open
shekarneo opened this issue Oct 4, 2024 · 7 comments

Comments

@shekarneo

💡 Your Question

I am training the YoloNAS keypoint detection model on a custom dataset of 1000 images, and training is taking 45 minutes per epoch. Does the model train on the original image size, or is it resized to something like 640x640? Is this behavior normal?

Versions

No response

@BloodAxe
Contributor

BloodAxe commented Oct 4, 2024

It depends on many factors: model size, the number of dataloader workers you have set, and your GPU and CPU.
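To see how much the dataloader workers alone matter, here is a minimal probe (not from the thread; sizes are tiny synthetic stand-ins for real images) that times one epoch of iteration at different `num_workers` settings. With real image decoding and augmentations the gap is typically far larger:

```python
# Rough probe of how num_workers affects data-loading time.
# The dataset here is a synthetic stand-in; real 640x640 images with
# heavy augmentations make the difference much more pronounced.
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_epoch(num_workers, n_samples=64, batch_size=16):
    """Iterate one epoch over a synthetic dataset; return elapsed seconds."""
    data = TensorDataset(torch.randn(n_samples, 3, 64, 64))
    loader = DataLoader(data, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    for (batch,) in loader:
        pass  # stand-in for the forward/backward pass
    return time.perf_counter() - start


if __name__ == "__main__":
    for workers in (0, 2, 4):
        print(f"num_workers={workers}: {time_epoch(workers):.3f}s")
```

If per-epoch time drops sharply as workers increase, the bottleneck is data loading rather than the model itself.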

@shekarneo
Author

Okay, after training 20 epochs my AP was 62% and AR was 100%, yet still no keypoints are detected.

@shekarneo
Author

shekarneo commented Oct 7, 2024

I annotated the dataset using the CVAT tool and exported it to COCO format.
Below are the training results:

SUMMARY OF EPOCH 24
├── Train
│   ├── Yolonasposeloss/loss_cls = 2.464
│   │   ├── Epoch N-1      = 2.6261 (↘ -0.1621)
│   │   └── Best until now = 2.6261 (↘ -0.1621)
│   ├── Yolonasposeloss/loss_iou = 0.0
│   │   ├── Epoch N-1      = 0.0    (= 0.0)
│   │   └── Best until now = 0.0    (= 0.0)
│   ├── Yolonasposeloss/loss_dfl = 0.0
│   │   ├── Epoch N-1      = 0.0    (= 0.0)
│   │   └── Best until now = 0.0    (= 0.0)
│   ├── Yolonasposeloss/loss_pose_cls = 0.0
│   │   ├── Epoch N-1      = 0.0    (= 0.0)
│   │   └── Best until now = 0.0    (= 0.0)
│   ├── Yolonasposeloss/loss_pose_reg = 0.0
│   │   ├── Epoch N-1      = 0.0    (= 0.0)
│   │   └── Best until now = 0.0    (= 0.0)
│   └── Yolonasposeloss/loss = 2.464
│       ├── Epoch N-1      = 2.6261 (↘ -0.1621)
│       └── Best until now = 2.6261 (↘ -0.1621)
└── Validation
    ├── Yolonasposeloss/loss_cls = nan
    │   ├── Epoch N-1      = nan    (= nan)
    │   └── Best until now = nan    (= nan)
    ├── Yolonasposeloss/loss_iou = 0.0
    │   ├── Epoch N-1      = 0.0    (= 0.0)
    │   └── Best until now = 0.0    (= 0.0)
    ├── Yolonasposeloss/loss_dfl = 0.0
    │   ├── Epoch N-1      = 0.0    (= 0.0)
    │   └── Best until now = 0.0    (= 0.0)
    ├── Yolonasposeloss/loss_pose_cls = 0.0
    │   ├── Epoch N-1      = 0.0    (= 0.0)
    │   └── Best until now = 0.0    (= 0.0)
    ├── Yolonasposeloss/loss_pose_reg = 0.0
    │   ├── Epoch N-1      = 0.0    (= 0.0)
    │   └── Best until now = 0.0    (= 0.0)
    ├── Yolonasposeloss/loss = nan
    │   ├── Epoch N-1      = nan    (= nan)
    │   └── Best until now = nan    (= nan)
    ├── Ap = 0.5355
    │   ├── Epoch N-1      = 0.5786 (↘ -0.0431)
    │   └── Best until now = 0.8088 (↘ -0.2732)
    └── Ar = 1.0
        ├── Epoch N-1      = 1.0    (= 0.0)
        └── Best until now = 1.0    (= 0.0)
        

And my YAML file is:

num_joints: 12

oks_sigmas: [0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.072, 0.072, 0.025, 0.025, 0.025]

edge_links:
  - [3,9]
  - [10,2]
  - [0,7]
  - [9,6]
  - [4,0]
  - [10,7]
  - [1,2]
  - [11,5]
  - [4,9]
  - [7,8]
  - [1,11]
  - [11,4]
  - [6,10]
  - [8,3]

edge_colors:
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]
  - [214, 39, 40]


keypoint_colors:
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 50, 83]
  - [250, 250, 55]
  - [250, 250, 55]
  - [250, 250, 55]
  - [250, 250, 55]
  - [250, 250, 55]

@BloodAxe
Contributor

BloodAxe commented Oct 7, 2024

What worries me in the reported losses is the zero values for the pose/bbox regression losses.
This may indicate there are 0 matches between the ground-truth boxes/poses and the boxes/poses predicted by the model.
Can you attach an example image and the annotation JSON that you exported? I would double-check that there are no export issues in the first place.

You are probably aware of it, but this notebook shows fine-tuning of YoloNAS-Pose on animal poses and works well: https://github.com/Deci-AI/super-gradients/blob/master/notebooks/YoloNAS_Pose_Fine_Tuning_Animals_Pose_Dataset.ipynb
So my best guess for the root cause of your problem is the data.

@shekarneo
Author

shekarneo commented Oct 7, 2024

Hi, I am not able to share the images here, but I can share the annotation files. I am also attaching the Python script used for training.

yolo_nas_pose_fine_tuning_custom_dataset.py.txt
Training.json.txt
validation.json.txt

@shekarneo
Author

And in my case, I don't need joint connections between the keypoints.
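If that is the case, the skeleton entries can likely be left empty. This is an unverified assumption: as far as I can tell, `edge_links` and `edge_colors` in super-gradients only control how the skeleton is drawn in visualizations, not the loss, so a config sketch like this might work (worth verifying against the notebook linked above):

```yaml
# Assumption: edge_links/edge_colors affect visualization only, not training.
num_joints: 12
oks_sigmas: [0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.072, 0.072, 0.025, 0.025, 0.025]
edge_links: []    # no skeleton connections drawn
edge_colors: []   # must stay in sync with edge_links
# keypoint_colors stay unchanged (12 entries, one per joint)
```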

@Sriparna2024

Hi, I am also facing a similar issue. Please advise on how to export a .json file from the CVAT tool that is compatible with YoloNAS-Pose. Is the .yaml file stated above formatted correctly?
