Approach for performing Deep Learning Inference inside Trusted Enclave #5

Ran MNIST, but running deep learning models (MobileNet, for example) fails because of the limited enclave memory. What approaches are you pursuing to enable heavier computations? Outsourcing the compute to a GPU as described in Slalom, or keeping it on the CPU and releasing neural network layers in chunks to the TEE as described in ML Capsule?

We are currently experimenting with chunking layers, sending them to the enclave in batches, and processing them there. Let me know your thoughts.
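To make the chunking idea concrete, here is a minimal sketch of layer-by-layer execution in the spirit of ML Capsule; `enclave_apply` is a hypothetical stand-in for an ECALL into the enclave, not an API from this repo:

```python
def chunked_inference(x, layer_weights, enclave_apply):
    """Run a network one layer at a time so that only a single
    layer's weights live in enclave memory at any moment."""
    for weights in layer_weights:
        # Hypothetical ECALL: copy `weights` into the enclave,
        # apply the layer to `x` there, free the weights, and
        # return the activations for the next chunk.
        x = enclave_apply(x, weights)
    return x
```

The trade-off is one enclave transition per layer, so transition overhead rather than enclave memory becomes the limiting factor.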
Hey! Thanks for the issue. We have played around with Slalom in the past, though outside of this repo; it seems promising, but I am not sure it will mitigate the memory issue you are describing. ML Capsule sounds quite interesting; my immediate concern would be slowness from going in and out of the enclave environment so often. My understanding is that SGX machines should support page swaps natively, but again, quite slowly. Were you seeing a crash? @justin1121 has run some larger models in the past; is there a stack trace or anything that you are seeing? Thanks! ~ Ben
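For context, Slalom's core trick is to blind each linear layer's input and outsource the matrix multiply to an untrusted GPU. A minimal sketch of that idea, ignoring Slalom's quantization to a finite field and its Freivalds integrity check; all names here are illustrative:

```python
import numpy as np

def slalom_linear(x, W, r, Wr, untrusted_matmul):
    """Outsource x @ W without revealing x.

    r is a random blinding vector precomputed inside the enclave,
    and Wr = r @ W is precomputed alongside it.
    """
    blinded = x + r                   # the untrusted device only sees x + r
    y = untrusted_matmul(blinded, W)  # (x + r) @ W, computed on the GPU
    return y - Wr                     # unblind inside the enclave: x @ W

# Illustrative check, with numpy standing in for the GPU:
x = np.random.randn(4)
W = np.random.randn(4, 3)
r = np.random.randn(4)   # precomputed inside the enclave
Wr = r @ W               # also precomputed inside the enclave
assert np.allclose(slalom_linear(x, W, r, Wr, np.matmul), x @ W)
```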
Hey @bendecoste, here is the trace I am seeing:

    GetModelLoad rpc failed.
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
    Caused by op 'ModelLoadEnclave', defined at:
    InternalError (see above for traceback): Issue with grpc connection
Hey @justin1121! Which larger models did you test? (I'm working with a mobilenet0.5.)
Hey! I don't recall 100%. It seems like it's a problem with the model size in megabytes; gRPC has an option where you can raise this limit or make it unlimited. I'm away until Monday but can take a quick look at this then. Feel free to submit a PR if you find a fix sooner.
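For anyone who wants to try this sooner, a minimal sketch of lifting that limit on the client side with the Python grpc package (the target address is hypothetical; -1 means unlimited):

```python
import grpc

# Hypothetical address of the model server; adjust for your setup.
TARGET = "localhost:50051"

# gRPC caps received messages at 4 MB by default, which a large
# model proto can easily exceed; -1 removes the cap entirely.
channel = grpc.insecure_channel(
    TARGET,
    options=[
        ("grpc.max_send_message_length", -1),
        ("grpc.max_receive_message_length", -1),
    ],
)
```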
@Vishwajit123 in #11 I was able to fix the problem you were having with mobilenet0.5. I ran mobilenet_v2_1.0_224, which I believe is bigger than mobilenet0.5. @saxenauts I'm curious whether you've had any luck with what you were talking about above. Running mobilenet_v2_1.0_224 seems to have pretty reasonable performance (I haven't done a proper benchmark, though).
@justin1121: I was able to run mobilenet_v2_1.0_224 and it runs in about 1.3 seconds per image in SGX. Does that sound reasonable to you? I understand that it should run considerably slower than on a plain CPU, but I just want to make sure I don't have a problem that unnecessarily slows down the processing. MobileNet inference time seems to be about 120x slower than on CPU. I also tried other models, like InceptionV1 (http://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip), and it runs in about 2.2 seconds per image in SGX (around 150x slower than CPU). Let me know if that sounds similar to what you got.
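For comparing numbers consistently, a minimal timing sketch, assuming a generic single-image `run_inference` callable (the enclave client in this repo would stand in for it):

```python
import time

def mean_latency(run_inference, image, warmup=3, iters=20):
    """Average per-image latency of an inference callable."""
    for _ in range(warmup):
        run_inference(image)  # discard warm-up runs (paging, caches)
    start = time.perf_counter()
    for _ in range(iters):
        run_inference(image)
    return (time.perf_counter() - start) / iters
```

As a sanity check on the arithmetic: at 1.3 s per image in SGX and a 120x gap, the implied plain-CPU baseline is roughly 11 ms per image.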
Hey @tgamal, that seems reasonable to me. I know there are ways to tweak the performance in the enclave that we haven't explored yet.
Thanks @justin1121, let me know if you have something specific to explore in terms of performance. I am happy to try out some ideas.