The purpose of buckets when training #16
First, we should make it clear that the bucket trick is not necessary; it is only used for more efficient batching, i.e., we want the sequences in each batch to have similar lengths, otherwise we waste computation (note that the number of training steps for each batch is determined by the longest sequence in that batch). Next, we should guarantee that each sequence is picked with equal chance during training. Once we use buckets, we actually split sequence picking into two steps: selecting a bucket, and then selecting a sequence from that bucket. Thus, the probability of a sequence being selected is the product of the probability of its bucket being selected and the probability of it being selected within that bucket. That is why the probability of each bucket being selected is made proportional to its size.
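To make the uniformity argument concrete, here is a minimal sketch of size-proportional bucket sampling (the function and variable names are hypothetical, not the repo's actual code): choosing a bucket with probability n_i / N and then a sequence uniformly within it gives every sequence the same overall probability 1 / N.

```python
import numpy as np

# Minimal sketch of size-proportional bucket sampling (hypothetical names,
# not the repo's code). `buckets` is a list of lists; each inner list holds
# sequences of similar length.
def sample_batch(buckets, batch_size, rng=None):
    rng = rng or np.random.default_rng()
    sizes = np.array([len(b) for b in buckets])
    # Select a bucket with probability proportional to its size:
    # P(bucket i) = n_i / N.
    i = rng.choice(len(buckets), p=sizes / sizes.sum())
    # Select sequences uniformly within that bucket: P(seq | bucket i) = 1 / n_i,
    # so overall P(seq) = (n_i / N) * (1 / n_i) = 1 / N for every sequence.
    idx = rng.integers(len(buckets[i]), size=batch_size)
    return [buckets[i][j] for j in idx]

# Example: three length buckets of different sizes.
buckets = [["a"] * 10, ["b"] * 30, ["c"] * 60]
print(sample_batch(buckets, batch_size=4))
```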
Thanks for your reply! It's a clever design. Another question: I am confused about the number of iterations (t2vec/preprocessing/preprocess.jl, line 60 in f518c3e).
With the default settings there are 1,000,000 trajectories (×20 noisy/distorted copies each) for training, i.e., 20,000,000 training samples. But iteration_num × batch size = 67000 × 128, which is fewer than the number of training samples (line 251 in f518c3e).
So I am curious: how is the iteration number determined? Is fewer than one epoch of training enough? Looking forward to your reply, and thanks again.
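For reference, the arithmetic behind this question (values taken from the thread above):

```python
# Quick check of the training-set coverage implied by the default settings.
num_samples = 1_000_000 * 20        # 1M trajectories x 20 noisy/distorted copies
samples_seen = 67_000 * 128         # iterations x batch size = 8,576,000
print(samples_seen / num_samples)   # ~0.43, i.e., less than half an epoch
```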
It is okay to use fewer than one epoch when the dataset contains redundant samples. You can check convergence on the validation dataset.
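A minimal sketch of what "check convergence on the validation dataset" could look like, assuming a patience-based stopping rule (the stubs and names below are placeholders, not the repo's trainer):

```python
import random

# Sketch of validation-based early stopping (assumed structure). The two
# stubs stand in for the real training and evaluation code.
def train_one_batch():
    pass  # placeholder for one optimization step

def validation_loss():
    return random.random()  # placeholder for evaluating the held-out set

best, bad, patience = float("inf"), 0, 5
for it in range(67_000):
    train_one_batch()
    if it % 1_000 == 0:
        loss = validation_loss()
        if loss < best - 1e-4:   # still improving: keep training
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:  # converged: safe to stop before one full epoch
                break
```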
OK, I got your point. Last question: in training we use …
Hi,
Thanks for your insightful paper and code!
I am wondering why the training data is generated from different buckets according to the probability of each bucket, and how the size of each bucket is determined.
t2vec/data_utils.py
Line 175 in 0d46731
Thanks a lot!
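As a toy illustration of length buckets (the boundaries below are invented; the actual ones are defined near the data_utils.py line cited above):

```python
# Hypothetical length-based bucket boundaries; the real values may differ.
BOUNDARIES = [(20, 30), (30, 50), (50, 80), (80, 120)]

def bucket_id(seq_len):
    """Return the index of the bucket covering seq_len, or None if out of range."""
    for i, (lo, hi) in enumerate(BOUNDARIES):
        if lo <= seq_len < hi:
            return i
    return None  # too short or too long: the sequence is dropped

print(bucket_id(65))  # -> 2
```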