Notebooks lose context after a couple of seconds #132
This is definitely not as expected; the session should live longer than 30 seconds. Can you also check the Output window for
Just ran some quick tests and it works as expected in my environment. Which connection manager are you using?
That might be the case; however, even when I start a single cluster and I'm the only one using it, I still have the same issue. It's worth noting that this never happens if I use the notebook inside the Databricks UI.
Something here is fishy. When I run some code, initially things seem to work as expected:
After some seconds, this happens:
After this, a ridiculously long response appears, finishing with the one below.
After this, all context is lost.
I am not sure how to check this, or what the possible answers are. As I mentioned, it's hard to believe this is related to the cluster, because it works without any issue in the Databricks UI. Is there anything you think I might look at?
So you can ignore the. The problem is that it somehow runs the
If it were the first issue, you would see additional log messages (which don't appear), so it must be the second case. It might be that the
I have only a couple of local Python kernels available, but there are plenty of Databricks clusters (over 30) that appear as options, even though I only have proper access to 3 or 4 of them.
I have the exact same problem when working in VSCode, and I get the error after around 30 seconds.
Notice the "null". Full error:
Seems like the root cause is some huge result set/output that you are trying to display. Can you give me a rough estimate of how big it is (num cols * num rows)? I may be able to reproduce the issue on my side. It could be that this leads to the disposal of the Notebook controller and the removal of the Execution context.
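For scale, the "num cols * num rows" estimate asked for above can be sketched as a back-of-the-envelope calculation. The helper name and the average cell width are assumptions for illustration, not extension code:

```python
def estimate_output_chars(num_rows: int, num_cols: int, avg_cell_width: int = 12) -> int:
    """Approximate number of characters in a rendered tabular output."""
    return num_rows * num_cols * avg_cell_width

# A .head()-style preview (5 rows x 5 cols) is tiny:
small = estimate_output_chars(5, 5)            # 300 characters
# A full 1M-row, 20-col dump would be on the order of hundreds of MB of text:
large = estimate_output_chars(1_000_000, 20)   # 240_000_000 characters
```

Anything in the multi-megabyte range would be a plausible candidate for choking the output channel; a 5x5 preview clearly is not.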
In this case I was running .head(), so 5 rows and maybe 5 columns, but it doesn't matter here, because the cell could easily have run and the next cell gets the error; the same .head() worked in the previous cell. I can even get the error when defining a variable, where no output is expected.
When looking at the standard error in the driver logs, the error is this:
So after a couple of days of trying, it sometimes works like a charm and sometimes it restarts the context. One thing I did was make sure that the offline and online notebooks are consistent, but recently the error returned. Looking at the event log, it seems that the context gets restarted every time the cluster tries to autoscale up. I noticed this after I added some init_scripts, which you can see running every time it autoscales, which means the cluster was restarted. When using the same notebook on Azure Databricks I do not face this problem.
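The event-log check above can be sketched as a filter over the entries returned by the Databricks clusters events API. The exact set of event-type strings to treat as restart hints is an assumption here, not something the extension does:

```python
# Event types that plausibly coincide with a restarted driver; treat this
# set as an assumption based on the clusters events API, not a spec.
RESTART_HINTS = {"RESTARTING", "RESIZING", "INIT_SCRIPTS_STARTED"}

def restart_related_events(events: list) -> list:
    """Return event-log entries whose type suggests the driver restarted."""
    return [e for e in events if e.get("type") in RESTART_HINTS]

sample = [
    {"type": "RUNNING"},
    {"type": "RESIZING"},
    {"type": "INIT_SCRIPTS_STARTED"},
]
suspects = restart_related_events(sample)
```

If every lost-context incident lines up with a RESIZING or init-script event, that would corroborate the autoscaling theory.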
This can very well be the reason! Good catch!
Okay, I don't have the technical ability to check if there is a solution for this when looking at the different scripts where autoscaling is mentioned (such as DatabricksKernel.ts). But it could be that the extension needs to constantly fetch the cluster information (new target_workers?), or that the connection is being killed from the extension side while it's waiting for a response.
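The "constantly fetch the cluster information" idea could look roughly like polling the documented GET /api/2.0/clusters/get endpoint and watching for the RESIZING state. A minimal sketch, assuming placeholder host/token values; the helper names are not part of Databricks Power Tools:

```python
import json
import urllib.parse
import urllib.request

def cluster_is_resizing(cluster_info: dict) -> bool:
    """RESIZING is one of the documented cluster states in the clusters API."""
    return cluster_info.get("state") == "RESIZING"

def fetch_cluster_info(host: str, token: str, cluster_id: str) -> dict:
    """Call GET /api/2.0/clusters/get for one cluster (sketch, not extension code)."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    req = urllib.request.Request(
        f"{host}/api/2.0/clusters/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

The extension could re-create its execution context whenever a poll reports RESIZING, instead of only discovering the dead context on the next cell run.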
Hi, I am also having an issue with a cluster that disconnects. This isn't an issue with any of my other clusters. As far as I can tell, the only difference between this cluster and the other clusters is Autoscaling. It seems to be worse the longer I leave the notebook sitting there without another execution. This makes me think something like (excuse the simplistic terminology) pinging the cluster would help solve this. Keen to hear your thoughts or feedback on this. It is really quite frustrating and would love a solution (definitely not a criticism - Databricks Power Tools is AMAZING!!).
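The "pinging the cluster" idea could be sketched as a periodic check of the execution context via the 1.2 command-execution API (GET /api/1.2/contexts/status). The status strings, parameter names, and all helper names here are assumptions for illustration, not extension code:

```python
import json
import threading
import urllib.parse
import urllib.request

def context_alive(status_payload: dict) -> bool:
    """Assumed: a contexts/status response reports 'Running' while usable."""
    return status_payload.get("status") == "Running"

def check_context(host: str, token: str, cluster_id: str, context_id: str) -> bool:
    """Query GET /api/1.2/contexts/status for one execution context (sketch)."""
    query = urllib.parse.urlencode(
        {"clusterId": cluster_id, "contextId": context_id}
    )
    req = urllib.request.Request(
        f"{host}/api/1.2/contexts/status?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return context_alive(json.load(resp))

def start_keepalive(host, token, cluster_id, context_id, interval=20.0):
    """Re-check the context every `interval` seconds on a daemon timer."""
    def loop():
        if not check_context(host, token, cluster_id, context_id):
            print("execution context lost; it would need to be recreated")
        timer = threading.Timer(interval, loop)
        timer.daemon = True
        timer.start()
    loop()
```

A keep-alive like this would at least surface the lost context promptly; whether it can prevent the loss depends on why Databricks tears the context down.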
I will have another look - but no promises that I can fix this, or that it can be fixed at all using the current approach/APIs.
Yes - I totally understand and appreciate that. You can only work with what Databricks exposes! Thanks, appreciate it!
Something is weird about the notebook sessions using a cluster kernel. After a couple of seconds without running any cell, all stored objects and even imported modules are lost.
Is this expected behavior? If not, how would you be able to work around it?
In the screenshot below I ran both cells within 30 seconds and still got this error.
If I run the first two cells immediately and, 30 seconds later, try to run the next one, I again have an issue.