Describe the bug
When persistent workers are enabled, the epoch set via set_epoch() on the IterableDataset instance held by the training process is ignored by the workers, because each worker process holds its own disconnected copy of the dataset.
PyTorch samplers for map-style (non-iterable) datasets have a mechanism to sync this; datasets.IterableDataset does not.
In my own use of IterableDatasets I usually track the epoch count in a multiprocessing.Value, which crosses process boundaries (see the sketch below).
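A minimal sketch of that workaround, assuming a torch IterableDataset; the class and attribute names are illustrative, not part of either library, and worker sharding is omitted for brevity:

import multiprocessing

import torch
from torch.utils.data import IterableDataset

class SharedEpochDataset(IterableDataset):
    def __init__(self, data):
        self.data = list(data)
        # 'i' = C int; the Value is inherited by DataLoader worker processes
        # when they are created, so both sides see the same shared memory.
        self.shared_epoch = multiprocessing.Value('i', 0)

    def set_epoch(self, epoch):
        with self.shared_epoch.get_lock():
            self.shared_epoch.value = epoch

    def __iter__(self):
        # Read the epoch at iteration time, inside the worker, rather than
        # caching it at construction time in the parent process.
        epoch = self.shared_epoch.value
        g = torch.Generator().manual_seed(epoch)
        for i in torch.randperm(len(self.data), generator=g).tolist():
            yield self.data[i]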
Steps to reproduce the bug
Use a streaming dataset (IterableDataset) with the recommended pattern below and persistent_workers=True in the torch DataLoader.
for epoch in range(epochs):
    shuffled_dataset.set_epoch(epoch)  # re-seed shuffling for the new epoch
    for example in shuffled_dataset:
        ...
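A fuller repro sketch; the dataset name and shuffle arguments are arbitrary examples, any streaming dataset shows the same behavior:

import torch
from torch.utils.data import DataLoader
from datasets import load_dataset

dataset = load_dataset("c4", "en", split="train", streaming=True)
shuffled_dataset = dataset.shuffle(seed=42, buffer_size=10_000)

loader = DataLoader(shuffled_dataset, num_workers=2, persistent_workers=True)

for epoch in range(3):
    shuffled_dataset.set_epoch(epoch)  # updates only the copy in this process
    for example in loader:
        # With persistent_workers=True each worker keeps the epoch value it
        # was created with, so the shuffle order never changes across epochs.
        break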
Expected behavior
When the canonical bit of code above is used with num_workers > 0 and persistent_workers=True, the epoch set via set_epoch() is propagated to the IterableDataset instances in the worker processes.
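For contrast, the map-style mechanism mentioned above works even with persistent workers because the sampler runs in the main process and workers only receive the indices it produces. A minimal sketch (num_replicas/rank pinned so it runs without a distributed setup):

import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(8))  # any map-style dataset
sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True)
loader = DataLoader(dataset, sampler=sampler, num_workers=2, persistent_workers=True)

for epoch in range(3):
    sampler.set_epoch(epoch)  # effective: sampling happens in the main process
    for batch in loader:
        ...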
Environment info
N/A