Great project! A few unrelated questions you might be able to answer.
I've been working on a related approach and I wonder how you think it might compare. That is, instead of providing a connectivity pattern for small blocks and optimising it at a lower level, I figured I should get similar benefits from using larger blocks (say 128; a batch × 128 × 128 matmul is still a respectable amount of work for a GPU) and connecting those with some configurable pattern at the computation-graph level.
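To make the comparison concrete, here is a minimal NumPy sketch of what I mean by the graph-level version: hidden units split into blocks of 128, one ordinary dense matmul per edge in a user-supplied block connectivity pattern, block outputs summed. All names and the example pattern are mine, purely illustrative.

```python
import numpy as np

block = 128
n_blocks = 8
hidden = block * n_blocks

# connectivity[i, j] == 1 means input block j feeds output block i
connectivity = np.eye(n_blocks, dtype=np.int64)                               # block-diagonal baseline
connectivity[np.arange(n_blocks), (np.arange(n_blocks) + 1) % n_blocks] = 1   # one lateral edge per block

# one dense (block x block) weight matrix per active edge in the pattern
weights = {(i, j): np.random.randn(block, block) * 0.01
           for i in range(n_blocks) for j in range(n_blocks) if connectivity[i, j]}

def block_layer(x):
    """x: (batch, hidden) -> (batch, hidden), computing only the connected block pairs."""
    xb = x.reshape(x.shape[0], n_blocks, block)
    out = np.zeros_like(xb)
    for (i, j), w in weights.items():
        out[:, i] += xb[:, j] @ w
    return out.reshape(x.shape[0], hidden)

y = block_layer(np.random.randn(64, hidden))
```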
I am pretty sure your approach blows this naive approach out of the water for small blocks, but the graph-level version would add some flexibility benefits, like different-sized blocks. Then again, if this works efficiently down to 8x8 blocks, that latter argument is mostly moot as well.
Personally I was mostly inspired by grouped convolutions, and the desire to have similar control over the ratio of neural bandwidth to number of connections for dense networks too. Extending the work on grouped convolutions, I have been thinking about such architectures as having mostly straightforward diagonal, block-to-block layers, periodically interspersed with 'lateral', or off-diagonal, connections.
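Roughly, the layer schedule I have in mind looks like this: binary block-connectivity layouts, block-diagonal for most layers, with extra off-diagonal entries every few layers. How the layouts are consumed (e.g. by a block-sparse matmul) is left open, and the parameter names are mine.

```python
import numpy as np

n_blocks, n_layers, lateral_every = 8, 12, 4

def diagonal_layout():
    # purely block-to-block: each block only talks to itself
    return np.eye(n_blocks, dtype=np.int64)

def lateral_layout(n_neighbours=2):
    # block-diagonal plus a few off-diagonal "lateral" edges per block
    layout = np.eye(n_blocks, dtype=np.int64)
    idx = np.arange(n_blocks)
    for k in range(1, n_neighbours + 1):
        layout[idx, (idx + k) % n_blocks] = 1
        layout[idx, (idx - k) % n_blocks] = 1
    return layout

layouts = [lateral_layout() if (l + 1) % lateral_every == 0 else diagonal_layout()
           for l in range(n_layers)]
```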
But I suppose with your approach there is really no performance benefit in doing so, and this architectural question is subsumed by tuning the proportion of vertical-to-lateral (diagonal-to-off-diagonal) connections. That said, it could still be that restricting the lateral flow of information and having connected blocks 'do their own thing' for multiple layers is a beneficial strategy. Analogous to the work on multigrid/multiscale convnets, it seems likely that having fully connected blocks at a range of bandwidth scales, with restricted off-diagonal communication between them, would effectively separate the signal into a high-level contextual descriptor on the low-bandwidth paths while delegating the detail to the higher-bandwidth blocks. That'd be pretty cool from an unsupervised learning perspective.
Anyway, there is no shortage of potential applications I think, and kudos to OpenAI for releasing this early instead of milking it like an academic researcher would. The application I am working on at the moment is proprietary, but I think work such as this will be pivotal in getting people to look at anything other than convnets again. With fractal/ResNet-like architectures giving unlimited depth, and work like this giving essentially unlimited width, I think a lot of possibilities are opening up.
Personally, I have been experimenting with hypercube and tree-like topologies as connectivity patterns (both of which have a longest path length of log(n_blocks)). Have you compared these types of structured connectivity to the more random connectivities you explore? Is there a benefit to the randomness, or is it just a good vehicle for demonstrating the generality of the approach?
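For reference, the two structured topologies I mean, expressed as n_blocks × n_blocks binary block-connectivity layouts (helper names are mine, just a sketch):

```python
import numpy as np

def hypercube_layout(n_blocks):
    """Block i connects to every block differing in exactly one bit (plus itself)."""
    assert n_blocks & (n_blocks - 1) == 0, "hypercube needs a power-of-two block count"
    layout = np.eye(n_blocks, dtype=np.int64)
    n_bits = n_blocks.bit_length() - 1
    for i in range(n_blocks):
        for bit in range(n_bits):
            layout[i, i ^ (1 << bit)] = 1
    return layout

def binary_tree_layout(n_blocks):
    """Block i connects to its parent and children in an implicit binary tree."""
    layout = np.eye(n_blocks, dtype=np.int64)
    for i in range(1, n_blocks):
        parent = (i - 1) // 2
        layout[i, parent] = 1
        layout[parent, i] = 1
    return layout
```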
Concluding my ramblings: I hope to find the time to try this out soon! Which leads me to another question: are you aware of a battle-tested Keras wrapper? And if not, do you foresee any gotchas in writing one, or should it be straightforward? If I get around to it I'll make sure to open a PR for it.
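In case it helps the discussion, the kind of wrapper I had in mind would be something like the sketch below, assuming the BlocksparseMatMul interface from the README (layout and block_size in the constructor, bsmm.w_shape for the weight shape, y = bsmm(x, w)). Not battle-tested; one gotcha I'd expect is that the layout, and hence the feature dimension, is fixed at construction time, so the layer cannot infer it from input_shape.

```python
import tensorflow as tf
from blocksparse.matmul import BlocksparseMatMul

class BlockSparseDense(tf.keras.layers.Layer):
    """Hypothetical Keras layer wrapping a fixed block-sparse matmul (untested sketch)."""

    def __init__(self, layout, block_size=32, **kwargs):
        super(BlockSparseDense, self).__init__(**kwargs)
        # layout: binary (n_blocks, n_blocks) block connectivity, fixed at construction
        self.bsmm = BlocksparseMatMul(layout, block_size=block_size)

    def build(self, input_shape):
        # the weight shape is dictated by the sparsity layout, not by input_shape
        self.w = self.add_weight(name="w",
                                 shape=self.bsmm.w_shape,
                                 initializer="glorot_uniform",
                                 trainable=True)
        super(BlockSparseDense, self).build(input_shape)

    def call(self, inputs):
        return self.bsmm(inputs, self.w)
```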