
GraphQL SSE crashes on large message stanza payload #4208

Closed
stsisnevich-hj opened this issue Jan 12, 2024 · 2 comments

@stsisnevich-hj

MongooseIM version: 6.2.0-44-g07a168ed1
Installed from: Helm Chart using mongooseim/mongooseim:latest image
Erlang/OTP version: Erlang/OTP 26 [erts-14.1.1] [source] [64-bit] [smp:8:1] [ds:8:1:10] [async-threads:1] [jit:ns]

Context

I am currently running MongooseIM in a Kubernetes cluster (AWS EKS) using the official Helm chart from ESL/MongooseHelm to provide a chat feature for my application. The client connects to the chat over WebSocket. To make the chat interactive, a system bot consumes and responds to user messages over a c2s XML stream. Recently, I started migrating to the GraphQL API to unify the communication approach on both sides and to simplify the client implementation by separating message consumption from chat actions.

Issue

Since migrating to GraphQL, I have encountered a problem with the SSE stream in MongooseIM. It occurs when a large message stanza payload (~2000 characters) is broadcast over the GraphQL SSE stream. The SSE stream crashes, which closes the EventSource, but the user's session becomes stuck in a live state (it cannot be kicked) for an indefinite period of time. Subsequent reconnection attempts do not resolve the issue, and SSE does not work (no messages are broadcast) for the affected user until the server is restarted. Meanwhile, WebSocket and c2s connections do not crash on such messages and keep working while the SSE stream is broken.
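
For reference, the client-side subscription is roughly the sketch below (TypeScript, browser EventSource). The endpoint host, port and path are placeholders rather than exact values, and authentication is omitted; only the subscription selection set matches the fields visible in the logs:

```typescript
// Rough sketch of the client-side GraphQL SSE subscription (not the exact
// production code). Host, port and path are placeholders for my deployment;
// authentication handling is omitted.
const query = `subscription {
  stanza {
    subscribeForMessages { timestamp stanza_id stanza sender }
  }
}`;

const url =
  "http://mongooseim.example.com:8088/api/graphql/sse?query=" +
  encodeURIComponent(query);

const source = new EventSource(url);

source.onmessage = (event: MessageEvent) => {
  // Each SSE data frame carries one GraphQL execution result, e.g.
  // {"data":{"stanza":{"subscribeForMessages":{...}}}}
  const result = JSON.parse(event.data);
  console.log(result.data?.stanza?.subscribeForMessages);
};

source.onerror = () => {
  // Once a ~2000-character stanza is pushed, the server-side handler crashes,
  // the stream is closed, and this fires; reconnecting does not help.
  console.error("SSE stream closed");
};
```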

Additional Information:

  • I intentionally use the latest MongooseIM version from the master branch (Docker image latest tag), because I was struggling with SSE connection timeouts, which were addressed by merging the following PR: Increase idle_timeout for SSE connections #4196.
  • I rely on mod_muc with the mnesia backend and mod_mam with the rdbms backend.
  • Usually the GraphQL SSE stream fails on message bodies of 1600+ characters, and always on 2000+ (see the reproduction sketch below).
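
For completeness, here is a rough reproduction sketch. It assumes sending the message through the GraphQL HTTP API via a stanza.sendMessage mutation; the mutation shape, endpoint URL and room JID below are placeholders and assumptions from my setup rather than exact values, and any send path (plain XMPP works too) that delivers a groupchat message with a ~2000-character body to a user subscribed over SSE triggers the same crash:

```typescript
// Hypothetical reproduction sketch: deliver a groupchat message whose <body>
// is ~2000 characters long to a MUC room that the affected user is watching
// over the GraphQL SSE stream. The mutation shape, endpoint and room JID are
// assumptions; adjust them to your own setup. Auth headers are omitted.
async function reproduce(): Promise<void> {
  const body = "test 2000 " + "t".repeat(2000);

  const mutation = `
    mutation SendBigMessage($to: JID!, $body: String!) {
      stanza {
        sendMessage(to: $to, body: $body) { id }
      }
    }
  `;

  const response = await fetch("http://mongooseim.example.com:8088/api/graphql", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: mutation,
      variables: { to: "[email protected]", body },
    }),
  });

  // The HTTP request itself succeeds; the crash happens in the SSE handler
  // of the subscribed user.
  console.log(response.status);
}

reproduce();
```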

Attachments:

mongooseim.json.1

{"when":"2024-01-11T12:09:41.025376+00:00","what":"mongoose_graphql_sse_handler_failed","text":{"id":"28","data":["{\"data\":{\"stanza\":{\"subscribeForMessages\":{\"timestamp\":\"2024-01-11T12:09:41.025198Z\",\"stanza_id\":\"a99f7e0ee96f429bbeabb6e5b06fdcc9\",\"stanza\":\"<message id='a99f7e0ee96f429bbeabb6e5b06fdcc9' to='[email protected]' from='alice@xmpp-dev/1704-974935-740798-752907a8916cb029' type='groupchat'><body>test 1600 tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt</body></message>\",\"sender\":","\"[email protected]/Alice\"}}}}"]},"reason":"badarg","meta":{"pid":"<0.1362.0>","mfa":"{mongoose_graphql_sse_handler,handle_error,3}","line":"94","file":"/home/circleci/project/src/graphql/mongoose_graphql_sse_handler.erl"},"level":"error"}
{"when":"2024-01-11T12:09:41.715586+00:00","report":[["{initial_call,{cowboy_stream_h,request_process,['Argument__1','Argument__2','Argument__3']}}","{pid,<0.1362.0>}","{registered_name,[]}","{error_info,{error,{badrecord,ok},[{lasse_handler,terminate,3,...},{cowboy_loop,call,...},{cowboy_stream_h,...},{...}|...]}}","{ancestors,[<0.1361.0>,<0.1110.0>,<0.1107.0>,<0.1106.0>,ranch_sup,<0.250.0>]}","{message_queue_len,0}","{messages,[]}","{links,[<0.1361.0>]}","{dictionary,[{rand_seed,{#{type => exsss,next => #Fun<rand.0.65977474>,bits => 58,...},[29127266007832741|184983864797656587]}}]}","{trap_exit,true}","{status,running}","{heap_size,4185}","{stack_size,28}","{reductions,64816}"],""],"meta":{"pid":"<0.1362.0>","mfa":"{proc_lib,crash_report,4}","logger_formatter":{"title":"CRASH REPORT"},"line":"584","file":"proc_lib.erl","error_logger":{"type":"crash_report","tag":"error_report"},"domain":["otp","sasl"]},"level":"error","label":"{proc_lib,crash}"}
{"when":"2024-01-11T12:09:41.716895+00:00","unstructured_log":"Ranch listener 'ejabberd_cowboy_0.0.0.0_8088', connection process <0.1361.0>, stream 1 had its request process <0.1362.0> exit with reason {badrecord,ok} and stacktrace [{lasse_handler,terminate,3,[{file,\"/home/circleci/project/_build/default/lib/lasse/src/lasse_handler.erl\"},{line,106}]},{cowboy_loop,call,5,[{file,\"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_loop.erl\"},{line,81}]},{cowboy_stream_h,execute,3,[{file,\"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl\"},{line,306}]},{cowboy_stream_h,request_process,3,[{file,\"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl\"},{line,295}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,241}]}]\n","meta":{"pid":"<0.1361.0>","error_logger":{"tag":"error"}},"level":"error"}

error.log

2024-01-11 12:09:41.715 [error] <0.1362.0>@lasse_handler:terminate:106 CRASH REPORT Process <0.1362.0> with 0 neighbours crashed with reason: bad record ok in lasse_handler:terminate/3 line 106
2024-01-11 12:09:41.717 [error] <0.1362.0>@lasse_handler:terminate:106 Cowboy stream 1 with ranch listener 'ejabberd_cowboy_0.0.0.0_8088' and connection process <0.1361.0> had its request process exit with reason: bad record ok in lasse_handler:terminate/3 line 106

crash.log

2024-01-11 12:09:41 =CRASH REPORT====
  crasher:
    initial call: cowboy_stream_h:request_process/3
    pid: <0.1362.0>
    registered_name: []
    exception error: {{badrecord,ok},[{lasse_handler,terminate,3,[{file,"/home/circleci/project/_build/default/lib/lasse/src/lasse_handler.erl"},{line,106}]},{cowboy_loop,call,5,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_loop.erl"},{line,81}]},{cowboy_stream_h,execute,3,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl"},{line,306}]},{cowboy_stream_h,request_process,3,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl"},{line,295}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}
    ancestors: [<0.1361.0>,<0.1110.0>,<0.1107.0>,<0.1106.0>,ranch_sup,<0.250.0>]
    message_queue_len: 0
    messages: []
    links: [<0.1361.0>]
    dictionary: [{rand_seed,{#{type => exsss,next => #Fun<rand.0.65977474>,bits => 58,uniform => #Fun<rand.1.65977474>,uniform_n => #Fun<rand.2.65977474>,jump => #Fun<rand.3.65977474>},[29127266007832741|184983864797656587]}}]
    trap_exit: true
    status: running
    heap_size: 4185
    stack_size: 28
    reductions: 64816
  neighbours:
2024-01-11 12:09:41 =ERROR REPORT====
Ranch listener 'ejabberd_cowboy_0.0.0.0_8088', connection process <0.1361.0>, stream 1 had its request process <0.1362.0> exit with reason {badrecord,ok} and stacktrace [{lasse_handler,terminate,3,[{file,"/home/circleci/project/_build/default/lib/lasse/src/lasse_handler.erl"},{line,106}]},{cowboy_loop,call,5,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_loop.erl"},{line,81}]},{cowboy_stream_h,execute,3,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl"},{line,306}]},{cowboy_stream_h,request_process,3,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl"},{line,295}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]

Questions

Could something be wrong at the configuration level in my case?
Are there any suggestions for mitigating this issue?

Thank you for your attention to this matter. Let me know if further clarification or details are needed.

@gustawlippa (Contributor)

Thank you for reporting the issue.
It seems that this is a problem with the Cowboy library, but we'll investigate, as we've already encountered some problems with SSE configuration in the past.

@JanuszJakubiec (Contributor)

Hi, the error was investigated and fixed in PR #4364.
