diff --git a/scala-package/memory-management.md b/scala-package/memory-management.md
new file mode 100644
index 000000000000..351a0d4ae761
--- /dev/null
+++ b/scala-package/memory-management.md
@@ -0,0 +1,69 @@
+## JVM Memory Management
+The Scala binding of Apache MXNet uses native memory (the C++ heap, either in RAM or in GPU memory) for most MXNet Scala objects, such as NDArray, Symbol, Executor, KVStore, and Data Iterators. The Scala classes associated with them act as wrappers:
+operations on these objects are directed to the MXNet C++ backend via JNI for performance, so the bytes are also stored in the native heap for fast access.
+
+The JVM Garbage Collector only manages objects allocated in the JVM heap and is not aware of the memory footprint of these objects in native memory, so allocation and de-allocation of native memory have to be managed by MXNet Scala.
+Allocating native memory is straightforward: it is done during construction of the object by calling the associated C++ API through JNI. However, since JVM languages do not have destructors, de-allocation of these objects is problematic and has to be done explicitly.
+To make this easier, MXNet Scala provides a few modes of operation.
+
+### [ResourceScope.using](https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/org/apache/mxnet/ResourceScope.scala#L106) (Highly recommended)
+`ResourceScope.using` provides the familiar Java try-with-resources primitive in Scala and extends it to automatically manage the memory of all MXNet objects created in the code block (`body`) associated with it, by tracking the allocations in a stack.
+If an MXNet object or an Iterable containing MXNet objects is returned from the code block, it is automatically excluded from de-allocation in the current scope and moved to
+an outer scope if ResourceScopes are stacked.
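+
+To make the stack-based tracking described above more concrete, here is a minimal, self-contained sketch of the idea. It is not the actual `ResourceScope` implementation: `FakeResource` and `MiniScope` are hypothetical names invented for this illustration, and `dispose()` only flips a flag where the real code would call the native free API.
+
+```scala
+import scala.collection.mutable
+
+// Hypothetical stand-in for an MXNet object that owns native memory.
+final class FakeResource(val name: String) {
+  var disposed = false
+  // Register with the innermost open scope, if there is one.
+  MiniScope.current.foreach(_.track(this))
+  // Stands in for the call that frees native memory.
+  def dispose(): Unit = { disposed = true }
+}
+
+// Hypothetical, simplified model of a resource scope.
+final class MiniScope {
+  private val tracked = mutable.LinkedHashSet.empty[FakeResource]
+  def track(r: FakeResource): Unit = { tracked += r }
+  def untrack(r: FakeResource): Unit = { tracked -= r }
+  def closeAll(): Unit = { tracked.foreach(_.dispose()); tracked.clear() }
+}
+
+object MiniScope {
+  // Stack of open scopes; the head is the innermost one.
+  private var scopes: List[MiniScope] = Nil
+
+  def current: Option[MiniScope] = scopes.headOption
+
+  def using[A](body: => A): A = {
+    val scope = new MiniScope
+    scopes = scope :: scopes
+    try {
+      val ret = body
+      // Resources returned from the body escape this scope.
+      val escaping: Iterable[FakeResource] = ret match {
+        case r: FakeResource => List(r)
+        case it: Iterable[_] => it.collect { case r: FakeResource => r }
+        case _               => Nil
+      }
+      // Hand escaping resources to the enclosing scope, if any.
+      val outer = scopes.tail.headOption
+      escaping.foreach { r => scope.untrack(r); outer.foreach(_.track(r)) }
+      ret
+    } finally {
+      // Pop this scope and release everything it still tracks.
+      scopes = scopes.tail
+      scope.closeAll()
+    }
+  }
+}
+```
+
+In this sketch, a resource that is created inside `MiniScope.using { ... }` and not returned is disposed when the block exits, while a returned resource is handed to the enclosing scope, or left untracked when there is none. Only a single resource or an Iterable of resources is recognized as a return value here; other container types are ignored.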
+
+**Usage**
+```
+import org.apache.mxnet.{NDArray, ResourceScope, Shape}
+
+ResourceScope.using() {
+    // Bind the inner scope's result so that r3 and r4 are visible in the outer scope.
+    val (r3, r4) = ResourceScope.using() {
+        val r1 = NDArray.ones(Shape(2, 2))
+        val r2 = NDArray.ones(Shape(3, 4))
+        val r3 = NDArray.ones(Shape(5, 6))
+        val r4 = NDArray.ones(Shape(7, 8))
+        (r3, r4)
+    }
+    r4
+}
+```
+In the example above, two ResourceScopes are stacked together. Four NDArrays (`r1`, `r2`, `r3`, `r4`) are created in the inner scope, and the inner scope returns
+`(r3, r4)`. The ResourceScope code recognizes that it should not de-allocate these objects and automatically moves `r3` and `r4` to the outer scope. The outer scope
+returns `r4` from its code block, so `ResourceScope.using` removes it from its list of objects to be de-allocated. All other objects are automatically released (their native memory is freed) by calling the associated C++ backend's free API.
+
+**Note:**
+You should consider stacking ResourceScopes when you have layers of functionality in your application code that create a lot of MXNet objects such as NDArrays.
+Otherwise you hold onto all the memory allocated over the entire training loop and will most likely run out of memory, especially on GPUs, which have limited memory on the order of 8 to 16 GB.
+For example, if you are writing training code in MXNet Scala, it is recommended not to use one uber ResourceScope block that runs the entire training code.
+Instead, stack multiple scopes: one in which you run the forward and backward passes on each batch,
+a second scope for each epoch, and an outer scope that runs the entire training script, as in the example below.
+```
+ResourceScope.using() {
+    val m = Module(...)
+    m.bind()
+    val k = KVStore(...)
+    ResourceScope.using() {
+        val itr = MXIterator(..)
+        val num_epochs: Int = 100
+        // ...
+        for (i <- 0 until num_epochs) {
+            ResourceScope.using() {
+                // Fetch each batch inside the loop instead of reusing the first one.
+                while (itr.hasNext) {
+                    val dataBatch = itr.next()
+                    m.forward(dataBatch)
+                    m.backward()
+                    m.update()
+                }
+            }
+        }
+    }
+}
+
+```
+
+### Using Phantom References (Mildly recommended)
+
+### Using dispose Pattern (Least recommended)
+
+When the Garbage Collector runs, it identifies unreachable objects in the JVM heap, finalizes them, and automatically reclaims their memory. We take advantage of this:
+when the GC runs, it finds that the objects are not reachable and finalizes them. Even if we were to have this destructor, we would still have to wait for the GC to run
+to find that the objects are not reachable, the Iterables that contain
+
+
+