Hey there!
Just wanted to clarify something regarding the code for ps and worker. I've recently started working with this kind of distributed training, so pardon my silly queries.
From what I've learned so far, the ps serves parameters to the workers, while the latter fetch them. Aside from the difference in TF_CONFIG, I've noticed no code for fetching/serving parameters dedicated specifically to the ps or to the workers; both share the same code.
I wanted to know: how do they coordinate with one another?
Hi, as far as I know, the ParameterServerStrategy employs an underlying communication protocol (like gRPC) to coordinate the variable updates and synchronization. So, when using that strategy, the coordination between ps and workers is handled behind the scenes by TF's runtime, and you don't need to write explicit code to fetch or serve parameters.
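In case a concrete sketch helps: below is roughly what this looks like in TF 2.x. It's a minimal sketch, not the exact code from this repo; the model-building step is a placeholder, and the module paths (e.g. `tf.distribute.experimental.ParameterServerStrategy`) vary a bit across TF versions. The key point is that every process runs the same script, and only the `task_type` read from TF_CONFIG decides whether a process blocks as a server or drives training as the coordinator.

```python
import tensorflow as tf

# Every process in the cluster runs this same script. The TF_CONFIG
# environment variable tells each process which role (chief, worker,
# or ps) it plays; the resolver reads it.
cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()

if cluster_resolver.task_type in ("worker", "ps"):
    # Workers and parameter servers simply start a gRPC server and
    # block; the coordinator dispatches work to them remotely.
    server = tf.distribute.Server(
        cluster_resolver.cluster_spec(),
        job_name=cluster_resolver.task_type,
        task_index=cluster_resolver.task_id,
        protocol="grpc",
    )
    server.join()
else:
    # The chief/coordinator builds the strategy. Variables created
    # under its scope are placed on the ps tasks, and workers read
    # and update them automatically during each dispatched step.
    strategy = tf.distribute.experimental.ParameterServerStrategy(
        cluster_resolver
    )
    coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(
        strategy
    )
    with strategy.scope():
        model = ...  # build the model and training step here as usual
```

So the ps and worker branches never contain fetch/serve logic of their own: they just call `server.join()`, and the chief schedules steps onto the workers via the `ClusterCoordinator`. Variable reads and writes against the ps happen over gRPC inside each step, which is why both roles can share identical code.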