Question about synthesizing Allreduce #36

Open
JASUEXIII opened this issue Aug 16, 2024 · 0 comments


Hi. Thanks for the previous prompt response. I'm currently trying to synthesize Allreduce for a custom topology (for example, a ring with 4 or 8 nodes). Some strange problems occur when doing so, and I wonder if you can help.

My code:

import os

from msccl.collectives import allgather, allreduce, reduce_scatter, reduce, alltoall
from msccl.serialization import save_msccl_object

# Ring builds the custom ring topology (its definition is not shown here).
topology = Ring(num_Nodes=4)
collective_allgather = allgather(topology.num_nodes())
collective_reduce_scatter = reduce_scatter(topology.num_nodes())

# Save the topology and both collectives as JSON so the CLI can load them.
save_msccl_object(topology, 'SG2260_topo_Ring4.json')
save_msccl_object(collective_allgather, 'coll_allgather.json')
save_msccl_object(collective_reduce_scatter, 'coll_reducescatter.json')

# Synthesize Allgather and ReduceScatter separately, then compose them into Allreduce.
assert 0 == os.system('msccl solve pareto-optimal custom custom --topology-file SG2260_topo_Ring4.json --collective-file coll_allgather.json')
assert 0 == os.system('msccl solve pareto-optimal custom custom --topology-file SG2260_topo_Ring4.json --collective-file coll_reducescatter.json')
assert 0 == os.system('msccl compose allreduce ReduceScatter.n4-MYTP-steps2.rounds3.chunks2.msccl.json Allgather.n4-MYTP-steps2.rounds3.chunks2.msccl.json -o allreduce_ring4.json')

I also saved the collectives to JSON files to make debugging easier. The generated allreduce JSON has strange input and output maps, as follows:

  "input_map": { "0": [0, 1], "1": [0, 1], "2": [0, 1], "3": [0, 1] },
  "output_map": { "0": [0, 1], "1": [0, 1], "2": [0, 1], "3": [0, 1] },
  "steps": [
    {
      "msccl_type": "step",
      "rounds": 1,
      "sends": [
        [0, 2, 1],
        [1, 2, 3],
        [2, 3, 0],
        [3, 3, 2],
        [4, 0, 3],
        [5, 0, 1],
        [6, 1, 0],
        [7, 1, 2]
      ]
    },
    {
      "msccl_type": "step",
      "rounds": 2,
      "sends": [
        [0, 1, 0],
        [0, 3, 0],
        [1, 1, 0],
        [1, 3, 0],
        [2, 0, 1],
        [2, 2, 1],
        [3, 0, 1],
        [3, 2, 1],
        [4, 1, 2],
        [4, 3, 2],
        [5, 1, 2],
        [5, 3, 2],
        [6, 0, 3],
        [6, 2, 3],
        [7, 0, 3],
        [7, 2, 3]
      ]
    },
    {
      "msccl_type": "step",
      "rounds": 2,
      "sends": [
        [0, 0, 1],
        [0, 0, 3],
        [1, 0, 1],
        [1, 0, 3],
        [2, 1, 0],
        [2, 1, 2],
        [3, 1, 0],
        [3, 1, 2],
        [4, 2, 1],
        [4, 2, 3],
        [5, 2, 1],
        [5, 2, 3],
        [6, 3, 0],
        [6, 3, 2],
        [7, 3, 0],
        [7, 3, 2]
      ]
    },
    {
      "msccl_type": "step",
      "rounds": 1,
      "sends": [
        [0, 1, 2],
        [1, 3, 2],
        [2, 0, 3],
        [3, 2, 3],
        [4, 3, 0],
        [5, 1, 0],
        [6, 0, 1],
        [7, 2, 1]
      ]
    }
  ],
  "collective": {
    "msccl_type": "collective",
    "name": "Allreduce(n=4)",
    "nodes": 4,
    "chunks": [
      { "msccl_type": "chunk", "pre": [0], "post": [0, 1, 2, 3], "addr": 0 },
      { "msccl_type": "chunk", "pre": [1], "post": [0, 1, 2, 3], "addr": 0 },
      { "msccl_type": "chunk", "pre": [2], "post": [0, 1, 2, 3], "addr": 0 },
      { "msccl_type": "chunk", "pre": [3], "post": [0, 1, 2, 3], "addr": 0 }

I want to know how to interpret this output. The chunk IDs do not seem to match each other: the sends refer to chunks 0 through 7, while the input and output maps list only chunks 0 and 1 for every rank. The input/output maps also do not look like a proper solution for allreduce.
I'd really appreciate any help, and I'm happy to provide other trial logs.
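To make the mismatch concrete, here is a minimal sketch of how I am reading the file. It assumes each entry in "sends" is a `[chunk, src, dst]` triple, which is my interpretation and not something I have confirmed. The helper name `summarize_algorithm` is hypothetical, and the dictionary is an abbreviated copy of the JSON above containing only the first step.

```python
def summarize_algorithm(algo):
    """Compare chunk ids used in the sends with those listed in the maps."""
    send_chunks = set()
    links_per_step = []
    for step in algo["steps"]:
        links = set()
        # Assumption: each send is [chunk, src, dst].
        for chunk, src, dst in step["sends"]:
            send_chunks.add(chunk)
            links.add((src, dst))
        links_per_step.append(sorted(links))

    # Collect every chunk id mentioned in the input and output maps.
    map_chunks = set()
    for key in ("input_map", "output_map"):
        for chunks in algo[key].values():
            map_chunks.update(chunks)

    return {
        "send_chunks": sorted(send_chunks),
        "map_chunks": sorted(map_chunks),
        "only_in_sends": sorted(send_chunks - map_chunks),
        "links_per_step": links_per_step,
    }


# Abbreviated copy of the logged allreduce JSON (first step only).
example = {
    "input_map": {"0": [0, 1], "1": [0, 1], "2": [0, 1], "3": [0, 1]},
    "output_map": {"0": [0, 1], "1": [0, 1], "2": [0, 1], "3": [0, 1]},
    "steps": [
        {
            "rounds": 1,
            "sends": [
                [0, 2, 1], [1, 2, 3], [2, 3, 0], [3, 3, 2],
                [4, 0, 3], [5, 0, 1], [6, 1, 0], [7, 1, 2],
            ],
        }
    ],
}

summary = summarize_algorithm(example)
print(summary["map_chunks"])     # [0, 1]
print(summary["only_in_sends"])  # [2, 3, 4, 5, 6, 7]
```

Under that reading, the sends use eight chunk ids while the maps only ever mention two. The same function can be run on the full file by loading `allreduce_ring4.json` with Python's `json` module.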
