The idea is to have multiple instances of each expert, i.e. a pool of M experts that the router dispatches requests to. The number of router instances would then scale with the number of user requests, while the number of expert instances scales at a slower pace (more efficient expert utilization). (But maybe you already implement something similar?)
You could then gather statistics on which experts are used the most and train new models accordingly.
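To make the suggestion concrete, here is a minimal sketch of the pooling idea in Python. All names (`ExpertPool`, `Router`, the hash-based routing policy) are hypothetical illustrations, not part of any existing codebase; a real system would replace the placeholder policy with a learned gating function:

```python
from collections import Counter

class ExpertPool:
    """A pool of replica instances for one expert; tracks per-replica usage."""
    def __init__(self, expert_id, num_instances):
        self.expert_id = expert_id
        self.instances = [f"{expert_id}-replica-{i}" for i in range(num_instances)]
        self.usage = Counter()

    def dispatch(self):
        # Send the request to the least-loaded replica in the pool.
        instance = min(self.instances, key=lambda i: self.usage[i])
        self.usage[instance] += 1
        return instance

class Router:
    """Routes a whole user request to one expert pool (request-level routing)."""
    def __init__(self, pools):
        self.pools = pools
        self.expert_counts = Counter()  # stats: which experts are used the most

    def route(self, request):
        # Placeholder routing policy: hash the request onto an expert.
        # A real router would use a learned gating function instead.
        expert_id = sorted(self.pools)[hash(request) % len(self.pools)]
        self.expert_counts[expert_id] += 1
        return self.pools[expert_id].dispatch()

pools = {f"expert-{e}": ExpertPool(f"expert-{e}", num_instances=2)
         for e in range(4)}
router = Router(pools)
for req in ["what is 2+2?", "translate hello", "write a poem"]:
    print(req, "->", router.route(req))
print("usage stats:", dict(router.expert_counts))
```

The point of the sketch is the scaling split: `Router` objects are cheap and can be replicated per request volume, while each `ExpertPool` only grows when its `expert_counts` share justifies adding replicas. The same counters give the usage statistics mentioned above.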
I also have a question: why is token-level routing used rather than routing the full user request? Is latency the concern there?