MongooseIM version: 6.2.0-44-g07a168ed1
Installed from: Helm chart, using the mongooseim/mongooseim:latest image
Erlang/OTP version: Erlang/OTP 26 [erts-14.1.1] [source] [64-bit] [smp:8:1] [ds:8:1:10] [async-threads:1] [jit:ns]
Context
I am currently running MongooseIM in a Kubernetes cluster (AWS EKS), installed with the official Helm chart from ESL/MongooseHelm, to provide the chat feature of my application. The client binds to the chat over WebSocket. To make the chat interactive, a system bot interacts with the chat over a c2s XML stream, consuming and responding to user messages. Recently I started a migration to the GraphQL API in order to unify the communication approach on both sides and to simplify the client implementation by separating message consumption from chat actions.
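For concreteness, here is a minimal sketch (TypeScript) of the kind of client-side binding described above: an EventSource subscribed to the GraphQL SSE stream. The subscription shape and field names (stanza.subscribeForMessages with timestamp, stanza_id, stanza and sender) are taken from the attached mongooseim.json.1 log; the host, path and query-parameter transport are assumptions about this deployment, not details confirmed anywhere in this report.

```typescript
// Minimal sketch of the client-side SSE binding.
// Assumptions: host/path of the SSE endpoint and passing the query as a URL
// parameter; only the port (8088) and the field names come from the logs.
const query = `
subscription {
  stanza {
    subscribeForMessages {
      timestamp
      stanza_id
      stanza
      sender
    }
  }
}`;

// EventSource only supports GET requests, so the query is passed as a URL parameter.
const url = `https://xmpp-dev.example.com:8088/api/graphql/sse?query=${encodeURIComponent(query)}`;
const source = new EventSource(url);

source.onmessage = (event: MessageEvent) => {
  // Each SSE event carries one GraphQL execution result as JSON,
  // e.g. {"data":{"stanza":{"subscribeForMessages":{...}}}}.
  const result = JSON.parse(event.data);
  console.log(result.data?.stanza?.subscribeForMessages);
};

source.onerror = () => {
  // Fires when the server-side stream crashes, as described in the Issue section below.
  console.error("SSE stream closed");
};
```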
Issue
Since migrating to GraphQL I have encountered a problem with the SSE stream in MongooseIM. It occurs when a large message stanza payload (~2000 characters) is broadcast over the GraphQL SSE stream: the SSE handler crashes, which closes the EventSource, and the user's session gets stuck in a live state (it cannot be kicked) indefinitely. Subsequent reconnection attempts do not resolve the issue, and SSE stops working for the affected user (no messages are broadcast) until the server is restarted. Meanwhile, WebSocket and c2s connections do not crash on such messages and keep working while the SSE stream is broken.
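To illustrate the reconnection behaviour mentioned above, here is a minimal sketch of a backoff-based reconnect wrapper around the EventSource; even with retries like this in place, the affected user receives no further events until the server is restarted. openStream is a hypothetical helper standing in for the subscription setup shown earlier, not part of any API.

```typescript
// Sketch of the observed reconnection behaviour: the EventSource can be
// reopened with exponential backoff, but no further events arrive for the
// affected user until the server is restarted.
function reconnectWithBackoff(openStream: () => EventSource, maxDelayMs = 30_000): void {
  let delayMs = 1_000;
  const connect = () => {
    const source = openStream();
    source.onopen = () => {
      delayMs = 1_000; // reset the backoff once the stream is established again
    };
    source.onerror = () => {
      source.close();
      setTimeout(connect, delayMs);
      delayMs = Math.min(delayMs * 2, maxDelayMs);
    };
  };
  connect();
}
```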
Additional Information:
I intentionally use the latest MongooseIM version from the master branch (the latest Docker image tag), because I was struggling with SSE connection timeouts, which were addressed by the following PR: Increase idle_timeout for SSE connections #4196.
I rely on mod_muc with the mnesia backend and mod_mam with the rdbms backend.
The GraphQL SSE stream usually fails on message bodies of 1600+ characters, and always on 2000+ (see the probing sketch below).
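A minimal sketch of how the threshold can be probed, assuming an existing send path: sendGroupchatMessage is a hypothetical placeholder for whatever the deployment already uses (WebSocket/c2s or a GraphQL mutation), not a MongooseIM API.

```typescript
// Rough probe for the failure threshold: send group-chat messages with growing
// body sizes and watch whether the subscription above keeps delivering them.
async function probeBodySizeThreshold(
  sendGroupchatMessage: (body: string) => Promise<void>
): Promise<void> {
  for (let length = 1200; length <= 2400; length += 200) {
    const body = "t".repeat(length); // mirrors the "test 1600 ttt..." payload in the attached log
    await sendGroupchatMessage(body);
    console.log(`sent a group-chat message with a ${length}-character body`);
  }
}
```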
Attachments:
mongooseim.json.1
{"when":"2024-01-11T12:09:41.025376+00:00","what":"mongoose_graphql_sse_handler_failed","text":{"id":"28","data":["{\"data\":{\"stanza\":{\"subscribeForMessages\":{\"timestamp\":\"2024-01-11T12:09:41.025198Z\",\"stanza_id\":\"a99f7e0ee96f429bbeabb6e5b06fdcc9\",\"stanza\":\"<message id='a99f7e0ee96f429bbeabb6e5b06fdcc9' to='[email protected]' from='alice@xmpp-dev/1704-974935-740798-752907a8916cb029' type='groupchat'><body>test 1600 tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt</body></message>\",\"sender\":","\"[email protected]/Alice\"}}}}"]},"reason":"badarg","meta":{"pid":"<0.1362.0>","mfa":"{mongoose_graphql_sse_handler,handle_error,3}","line":"94","file":"/home/circleci/project/src/graphql/mongoose_graphql_sse_handler.erl"},"level":"error"}
{"when":"2024-01-11T12:09:41.715586+00:00","report":[["{initial_call,{cowboy_stream_h,request_process,['Argument__1','Argument__2','Argument__3']}}","{pid,<0.1362.0>}","{registered_name,[]}","{error_info,{error,{badrecord,ok},[{lasse_handler,terminate,3,...},{cowboy_loop,call,...},{cowboy_stream_h,...},{...}|...]}}","{ancestors,[<0.1361.0>,<0.1110.0>,<0.1107.0>,<0.1106.0>,ranch_sup,<0.250.0>]}","{message_queue_len,0}","{messages,[]}","{links,[<0.1361.0>]}","{dictionary,[{rand_seed,{#{type => exsss,next => #Fun<rand.0.65977474>,bits => 58,...},[29127266007832741|184983864797656587]}}]}","{trap_exit,true}","{status,running}","{heap_size,4185}","{stack_size,28}","{reductions,64816}"],""],"meta":{"pid":"<0.1362.0>","mfa":"{proc_lib,crash_report,4}","logger_formatter":{"title":"CRASH REPORT"},"line":"584","file":"proc_lib.erl","error_logger":{"type":"crash_report","tag":"error_report"},"domain":["otp","sasl"]},"level":"error","label":"{proc_lib,crash}"}
{"when":"2024-01-11T12:09:41.716895+00:00","unstructured_log":"Ranch listener 'ejabberd_cowboy_0.0.0.0_8088', connection process <0.1361.0>, stream 1 had its request process <0.1362.0> exit with reason {badrecord,ok} and stacktrace [{lasse_handler,terminate,3,[{file,\"/home/circleci/project/_build/default/lib/lasse/src/lasse_handler.erl\"},{line,106}]},{cowboy_loop,call,5,[{file,\"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_loop.erl\"},{line,81}]},{cowboy_stream_h,execute,3,[{file,\"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl\"},{line,306}]},{cowboy_stream_h,request_process,3,[{file,\"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl\"},{line,295}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,241}]}]\n","meta":{"pid":"<0.1361.0>","error_logger":{"tag":"error"}},"level":"error"}
error.log
2024-01-11 12:09:41.715 [error] <0.1362.0>@lasse_handler:terminate:106 CRASH REPORT Process <0.1362.0> with 0 neighbours crashed with reason: bad record ok in lasse_handler:terminate/3 line 106
2024-01-11 12:09:41.717 [error] <0.1362.0>@lasse_handler:terminate:106 Cowboy stream 1 with ranch listener 'ejabberd_cowboy_0.0.0.0_8088' and connection process <0.1361.0> had its request process exit with reason: bad record ok in lasse_handler:terminate/3 line 106
crash.log
Questions
Might something be wrong on the configuration level in my case?
Are there any suggestions to mitigate this issue?
Thank you for your attention to this matter. Let me know if further clarification or details are needed.
Thank you for reporting the issue.
It seems that this is a problem with the Cowboy library, but we'll investigate, as we've already encountered some problems with the SSE configuration in the past.