A question about parameter server training #1

Open

SearchVera opened this issue Nov 4, 2022 · 1 comment


SearchVera commented Nov 4, 2022

Hi, your code really helps! I have one question:
In the coordinator's train_dataset_fn, you use shard to split the data across workers, and the input parameter input_context.input_pipeline_id indicates the worker's index. So I would expect every worker to call train_dataset_fn to get its own part of the data, but your code shows that only the coordinator uses train_dataset_fn.
Can you explain how this parameter (input_context.input_pipeline_id) works?
Thanks!
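
(For reference, the sharding pattern being asked about looks roughly like the sketch below; the toy range dataset stands in for the real input, and this is not the repository's actual code:)

```python
import tensorflow as tf

def train_dataset_fn(input_context):
    # Toy stand-in for the real training data.
    dataset = tf.data.Dataset.range(1000)
    # Each input pipeline (one per worker) keeps only its own shard:
    # input_pipeline_id is this worker's index among
    # num_input_pipelines total pipelines.
    dataset = dataset.shard(
        num_shards=input_context.num_input_pipelines,
        index=input_context.input_pipeline_id,
    )
    batch_size = input_context.get_per_replica_batch_size(64)
    return dataset.batch(batch_size).repeat()
```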

18520339 (Owner) commented Mar 4, 2023

Hi, sorry for the late reply.

Users of ParameterServerStrategy with the Model.fit API need to use a DatasetCreator as the input: an instance of this class wraps a callable (taking an input_context argument) that returns a tf.data.Dataset, and it is that instance that gets passed to fit. According to TensorFlow's documentation:

If you instead create your dataset with tf.keras.utils.experimental.DatasetCreator, the code in dataset_fn will be invoked on the input device, which is usually the CPU, on each of the worker machines.

So Model.fit usage with DatasetCreator is intended to work across all tf.distribute.Strategy implementations, as long as Strategy.scope is used at model creation: tf.distribute calls the input function on the CPU device of each worker. In other words, the coordinator only defines train_dataset_fn and hands it to fit; it is each worker that actually invokes the function, with input_context (including input_pipeline_id) filled in for that worker. A sketch of this wiring is below.
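
Here is a minimal sketch of the whole setup, assuming a TF_CONFIG-based cluster is already configured (the resolver, model, and dataset are placeholders, not the repository's code):

```python
import tensorflow as tf

# Assumes the TF_CONFIG environment variable describes the cluster
# (coordinator, workers, and parameter servers).
cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)

def train_dataset_fn(input_context):
    # Invoked on each worker's CPU, not on the coordinator.
    dataset = tf.data.Dataset.range(1000).map(
        lambda x: (tf.cast(tf.reshape(x, [1]), tf.float32),
                   tf.cast(tf.reshape(x % 2, [1]), tf.float32)))
    dataset = dataset.shard(input_context.num_input_pipelines,
                            input_context.input_pipeline_id)
    return dataset.batch(8).repeat()

with strategy.scope():  # Variables are placed on parameter servers.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")

# fit receives the DatasetCreator, and each worker calls
# train_dataset_fn locally to build its own input pipeline.
model.fit(tf.keras.utils.experimental.DatasetCreator(train_dataset_fn),
          epochs=2, steps_per_epoch=20)
```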
