The purpose of buckets when training #16
First, we should make it clear that the bucket trick is not necessary; it is only used for more efficient batching, i.e., we want the sequences in each batch to have similar lengths, otherwise we waste computation (note that the number of training steps for each batch is determined by the longest sequence in that batch). Next, we should guarantee that each sequence is picked with equal chance during training. Once we use buckets, we actually split sequence picking into two steps: selecting a bucket, and then selecting a sequence from that bucket. Thus, the probability of a sequence being selected is the product of the probability of its bucket being selected and the probability of it being selected within that bucket. That is why the probability of each bucket being selected is made proportional to its size.
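To make the uniformity argument concrete, here is a minimal sketch of size-proportional bucket sampling (the function and variable names are hypothetical, not the repo's actual code): choosing a bucket with probability n_i / N and then a sequence uniformly within it gives every sequence the same overall probability 1 / N.

```python
import numpy as np

# Minimal sketch of size-proportional bucket sampling (hypothetical names,
# not the repo's code). `buckets` is a list of lists; each inner list holds
# sequences of similar length.
def sample_batch(buckets, batch_size, rng=None):
    rng = rng or np.random.default_rng()
    sizes = np.array([len(b) for b in buckets])
    # Select a bucket with probability proportional to its size:
    # P(bucket i) = n_i / N.
    i = rng.choice(len(buckets), p=sizes / sizes.sum())
    # Select sequences uniformly within that bucket: P(seq | bucket i) = 1 / n_i,
    # so overall P(seq) = (n_i / N) * (1 / n_i) = 1 / N for every sequence.
    idx = rng.integers(len(buckets[i]), size=batch_size)
    return [buckets[i][j] for j in idx]

# Example: three length buckets of different sizes.
buckets = [["a"] * 10, ["b"] * 30, ["c"] * 60]
print(sample_batch(buckets, batch_size=4))
```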
Thanks for your reply! It's a clever design. Another question: I am confused about the number of iterations (t2vec/preprocessing/preprocess.jl, line 60 in f518c3e).
With the default settings there are 1,000,000 trajectories (×20 noisy/distorted copies each) for training, i.e., 20,000,000 training samples. But iteration_num × batch size = 67000 × 128, which is fewer than the number of training samples (line 251 in f518c3e).
So I am curious: how is the iteration number determined? Is fewer than one epoch of training enough? Looking forward to your reply, and thanks again.
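For reference, the arithmetic behind this question (values taken from the thread above):

```python
# Quick check of the training-set coverage implied by the default settings.
num_samples = 1_000_000 * 20        # 1M trajectories x 20 noisy/distorted copies
samples_seen = 67_000 * 128         # iterations x batch size = 8,576,000
print(samples_seen / num_samples)   # ~0.43, i.e., less than half an epoch
```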
It is okay to use fewer than one epoch when the dataset contains redundant samples. You can check convergence on the validation dataset.
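A minimal sketch of what "check convergence on the validation dataset" could look like, assuming a patience-based stopping rule (the stubs and names below are placeholders, not the repo's trainer):

```python
import random

# Sketch of validation-based early stopping (assumed structure). The two
# stubs stand in for the real training and evaluation code.
def train_one_batch():
    pass  # placeholder for one optimization step

def validation_loss():
    return random.random()  # placeholder for evaluating the held-out set

best, bad, patience = float("inf"), 0, 5
for it in range(67_000):
    train_one_batch()
    if it % 1_000 == 0:
        loss = validation_loss()
        if loss < best - 1e-4:   # still improving: keep training
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:  # converged: safe to stop before one full epoch
                break
```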
OK, I got your point. Last question: in training we use …
Hi,
Thanks for your insightful paper and code!
I am wondering why the training data is generated from different buckets according to the probability of each bucket, and how the size of each bucket is determined.
t2vec/data_utils.py
Line 175 in 0d46731
Thanks a lot!
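As a toy illustration of length buckets (the boundaries below are invented; the actual ones are defined near the data_utils.py line cited above):

```python
# Hypothetical length-based bucket boundaries; the real values may differ.
BOUNDARIES = [(20, 30), (30, 50), (50, 80), (80, 120)]

def bucket_id(seq_len):
    """Return the index of the bucket covering seq_len, or None if out of range."""
    for i, (lo, hi) in enumerate(BOUNDARIES):
        if lo <= seq_len < hi:
            return i
    return None  # too short or too long: the sequence is dropped

print(bucket_id(65))  # -> 2
```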