about self.orig_loss #12

FingerRec · 2019-05-01T08:44:20Z

i found baseline_exp/async_tf_i3d_charades.py cannot run directly.
so i modify line 81 in models/criteria/async_tf_criterion.py as follow
idtime = [] for i in range(len(meta)): idtime.append((meta[i]['id'], meta[i]['time']))

i was confused about line 105 in models/criteria/async_tf_criterion.py
loss += self.loss(torch.nn.Sigmoid()(a), target) * self.orig_loss
what's the self.orig_loss mean?

The text was updated successfully, but these errors were encountered:

gsig · 2019-05-07T02:50:48Z

Hi!

This baseline definitely needed some updating, I just added fixes in commit ded24bd and it's running now on 4 gpus.

self.orig_loss was just a legacy parameter that had been set to 1, so it could safely be removed.. it was historically to adjust for the difference between the original softmax loss and the new sigmoid loss.

This baseline includes my experiments with simplifying asynchronous temporal fields, and extending to multi-label sigmoid loss, and i3d base architecture etc. I hope it helps! Let me know if you have any questions.

FingerRec · 2019-05-07T12:19:52Z

Thanks for your reply!

This code works very well now, just two small problem, as i use the pertained model, at the begin, the Prec@5 is often bigger than 100, like bellow

Train Epoch: [0][60/2660(2660)] Time 1.629 (2.227) Data 0.032 (0.119) Loss 0.0362 (0.0438) Prec@1 2.051 (47.684) Prec@5 168.718 (135.191)

Another question is

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).

may be need to lower the memory_size or video_size?

gsig · 2019-05-07T19:11:30Z

That's just due to how I extended Prec@1 and Prec@5 to work with multi-label ground truth. It's easy to add your own metrics under metrics/ and then just include them under --metrics in the config. My extension just counts all the are correct, either in top 1 or top 5. I just use if for analyzing training and over/underfitting, but then I use mAP for all proper evaluations.

This error is due to memory usage of the dataloading threads. The way multithreading works in pytorch/python is that it requires duplicating some of the data across the threads etc, and furthermore the images are queued into memory while they are waiting to be used, and the number of queued images is proportional to the number of workers (2x?). The easiest fix is to reduce the number of --workers. You can also try optimizing the dataloader by using torch.Tensors where possible (they aren't duplicated like lists of strings/numpy arrays/etc I believe).

If this error is happening at the start of the val_video phase you can try changing the number of workers in the val_video phase ( datasets/get.py ) either by just manually setting a number there or creating a new args parameter for it. This is because each dataloader is loading in much larger batch (whole video) in the val_video phase, and thus requires much more memory to store the queue of images.

Hope that helps!

FingerRec · 2019-05-09T01:23:06Z

fixed, thanks a lot

gsig mentioned this issue May 7, 2019

Updating api for asynctf #14

Merged

FingerRec closed this as completed May 9, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

about self.orig_loss #12

about self.orig_loss #12

FingerRec commented May 1, 2019

gsig commented May 7, 2019 •

edited

Loading

FingerRec commented May 7, 2019 •

edited

Loading

gsig commented May 7, 2019

FingerRec commented May 9, 2019

about self.orig_loss #12

about self.orig_loss #12

Comments

FingerRec commented May 1, 2019

gsig commented May 7, 2019 • edited Loading

FingerRec commented May 7, 2019 • edited Loading

gsig commented May 7, 2019

FingerRec commented May 9, 2019

gsig commented May 7, 2019 •

edited

Loading

FingerRec commented May 7, 2019 •

edited

Loading