Extend CheckpointFunction to track all tensor input/output #1148
What does this PR do?
The current activation checkpointing implementation requires each input/output argument to be a Tensor so that it can be properly tracked by autograd; however, pyspeech nn layers often take aux_input as a dict and return a state that is a list.
This diff enables serialization of a Python container: given an input that could be any Python "container" (tuple, list, dict), perform a depth-first search (DFS) to extract the PyTorch tensors from the container and serialize them into a tuple of tensors. Each tensor's original location is replaced with an index into the serialized tuple, so the original input can be easily reconstructed. A minimal sketch of this idea is below.
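A minimal sketch of the flatten/unflatten step, assuming hypothetical helper names `split_non_tensors` and `unflatten`; the actual PR may structure this differently:

```python
import torch

def split_non_tensors(obj, tensors):
    """DFS over a Python container; pull out tensors, leave index placeholders."""
    if torch.is_tensor(obj):
        tensors.append(obj)
        return ("tensor", len(tensors) - 1)   # placeholder: index into flat list
    if isinstance(obj, (list, tuple)):
        return (type(obj).__name__, [split_non_tensors(x, tensors) for x in obj])
    if isinstance(obj, dict):
        return ("dict", {k: split_non_tensors(v, tensors) for k, v in obj.items()})
    return ("other", obj)                     # non-tensor leaf, kept as-is

def unflatten(spec, tensors):
    """Rebuild the original container from the spec and the flat tensor sequence."""
    kind, payload = spec
    if kind == "tensor":
        return tensors[payload]
    if kind == "list":
        return [unflatten(s, tensors) for s in payload]
    if kind == "tuple":
        return tuple(unflatten(s, tensors) for s in payload)
    if kind == "dict":
        return {k: unflatten(v, tensors) for k, v in payload.items()}
    return payload

# Example: a dict aux_input is flattened to a tuple of tensors plus a spec,
# then reconstructed losslessly.
aux_input = {"mask": torch.ones(2, 3), "scale": 0.5}
tensors = []
spec = split_non_tensors(aux_input, tensors)
flat = tuple(tensors)            # only these need to be tracked by autograd
rebuilt = unflatten(spec, flat)  # identical structure to aux_input
```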
Before checkpointed_forward, this serialization happens and the resulting tuple of tensors is used as the input to forward (and is thus tracked); inside checkpointed_forward, the original input is reconstructed by deserialization and passed to the original forward; the output of the original forward is serialized in the same manner and returned (so that the output is also tracked). After checkpointed_forward, the serialized output is deserialized back into the desired format, as the sketch below illustrates.
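A hedged sketch of how those steps could wrap a checkpointed forward. It reuses the illustrative `split_non_tensors`/`unflatten` helpers above and stands in `torch.utils.checkpoint.checkpoint` for the PR's `CheckpointFunction`; this is not the PR's actual implementation:

```python
from torch.utils.checkpoint import checkpoint

def checkpointed_forward(module, *args, **kwargs):
    # 1. Flatten (args, kwargs) so only a tuple of tensors enters checkpointing.
    tensors = []
    spec = split_non_tensors((args, kwargs), tensors)

    def run(*flat_tensors):
        # 2. Rebuild the original arguments and call the real forward.
        real_args, real_kwargs = unflatten(spec, list(flat_tensors))
        out = module(*real_args, **real_kwargs)
        # 3. Flatten the output the same way so its tensors are also tracked.
        out_tensors = []
        run.out_spec = split_non_tensors(out, out_tensors)
        return tuple(out_tensors)

    flat_out = checkpoint(run, *tensors)
    # 4. Deserialize the output back into its original container format.
    return unflatten(run.out_spec, flat_out)
```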
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.