RabbitMQ can reach an unusable state. Derex should help with that #132

Open
silviot opened this issue Jun 3, 2020 · 0 comments
Labels
enhancement New feature or request

RabbitMQ can get stuck in a loop while booting because of a corrupt file in its Mnesia store.

Derex should provide a command that fixes this by removing the zero-sized file, since the solution is very hard to find by searching if you have no previous RabbitMQ experience.

The command

docker run -ti --rm -v derex_rabbitmq:/var/lib/rabbitmq alpine find /var/lib/rabbitmq/mnesia/ -name '*.dets' -type f -size 0 -delete

solves the problem shown below. Ideally, users who run into this problem would not have to run it manually.
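
One way Derex could automate the cleanup is sketched below in Python. It mirrors the find invocation above by walking a directory and deleting every zero-sized *.dets file. The function name remove_empty_dets_files and its dry_run parameter are hypothetical, not existing Derex API. The sketch also assumes that it runs somewhere the derex_rabbitmq volume is mounted (for example inside a helper container) and that RabbitMQ is not running at the time.

```python
from pathlib import Path
from typing import List


def remove_empty_dets_files(mnesia_dir: Path, dry_run: bool = False) -> List[Path]:
    """Delete zero-sized ``*.dets`` files below ``mnesia_dir``.

    Hypothetical helper, not part of Derex. Returns the matching paths
    so the caller can report what was (or would be) removed.
    """
    matched = []
    # rglob mirrors `find -name '*.dets'`; the checks below mirror
    # `-type f -size 0` (regular files only, symlinks excluded).
    for path in sorted(mnesia_dir.rglob("*.dets")):
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_size == 0:
            if not dry_run:
                path.unlink()
            matched.append(path)
    return matched
```

Calling it with dry_run=True first lists the files that would be deleted without touching them, which may be a safer default for a command that removes data from a volume.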

When the problem occurs, RabbitMQ prints something like this:

              RabbitMQ 3.6.16. Copyright (C) 2007-2018 Pivotal Software, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: tty
  ######  ##        tty
  ##########
              Starting broker...

=INFO REPORT==== 3-Jun-2020::10:52:41 ===
Starting RabbitMQ 3.6.16 on Erlang 20.3.4
Copyright (C) 2007-2018 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

=INFO REPORT==== 3-Jun-2020::10:52:41 ===
node           : rabbit@8dc392bfeba9
home dir       : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash    : rmTZsNFw1Trt4kHGx1nSjA==
log            : tty
sasl log       : tty
database dir   : /var/lib/rabbitmq/mnesia/rabbit@8dc392bfeba9

=INFO REPORT==== 3-Jun-2020::10:52:42 ===
Memory high watermark set to 6370 MiB (6679809228 bytes) of 15925 MiB (16699523072 bytes) total

=INFO REPORT==== 3-Jun-2020::10:52:42 ===
Enabling free disk space monitoring

=INFO REPORT==== 3-Jun-2020::10:52:42 ===
Disk free limit set to 50MB

=INFO REPORT==== 3-Jun-2020::10:52:42 ===
Limiting to approx 1048476 file handles (943626 sockets)

=INFO REPORT==== 3-Jun-2020::10:52:42 ===
FHC read buffering:  OFF
FHC write buffering: ON

=INFO REPORT==== 3-Jun-2020::10:52:42 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left

=INFO REPORT==== 3-Jun-2020::10:52:42 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left

=INFO REPORT==== 3-Jun-2020::10:52:42 ===
Priority queues enabled, real BQ is rabbit_variable_queue

=INFO REPORT==== 3-Jun-2020::10:52:42 ===
Starting rabbit_node_monitor

=CRASH REPORT==== 3-Jun-2020::10:52:42 ===
  crasher:
    initial call: rabbit_recovery_terms:init/1
    pid: <0.247.0>
    registered_name: []
    exception error: no match of right hand side value {error,
                                                        {not_a_dets_file,
                                                         "/var/lib/rabbitmq/mnesia/rabbit@8dc392bfeba9/recovery.dets"}}
      in function  rabbit_recovery_terms:open_table/0 (src/rabbit_recovery_terms.erl, line 126)
      in call from rabbit_recovery_terms:init/1 (src/rabbit_recovery_terms.erl, line 107)
      in call from gen_server:init_it/2 (gen_server.erl, line 365)
      in call from gen_server:init_it/6 (gen_server.erl, line 333)
    ancestors: [rabbit_sup,<0.175.0>]
    message_queue_len: 0
    messages: []
    links: [<0.176.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 27
    reductions: 383
  neighbours:

=CRASH REPORT==== 3-Jun-2020::10:52:42 ===
  crasher:
    initial call: application_master:init/4
    pid: <0.174.0>
    registered_name: []
    exception exit: {bad_return,
                     {{rabbit,start,[normal,[]]},
                      {'EXIT',
                       {{badmatch,
                         {error,
                          {{{badmatch,
                             {error,
                              {not_a_dets_file,
                               "/var/lib/rabbitmq/mnesia/rabbit@8dc392bfeba9/recovery.dets"}}},
                            [{rabbit_recovery_terms,open_table,0,
                              [{file,"src/rabbit_recovery_terms.erl"},
                               {line,126}]},
                             {rabbit_recovery_terms,init,1,
                              [{file,"src/rabbit_recovery_terms.erl"},
                               {line,107}]},
                             {gen_server,init_it,2,
                              [{file,"gen_server.erl"},{line,365}]},
                             {gen_server,init_it,6,
                              [{file,"gen_server.erl"},{line,333}]},
                             {proc_lib,init_p_do_apply,3,
                              [{file,"proc_lib.erl"},{line,247}]}]},
                           {child,undefined,rabbit_recovery_terms,
                            {rabbit_recovery_terms,start_link,[]},
                            transient,30000,worker,
                            [rabbit_recovery_terms]}}}},
                        [{rabbit_queue_index,start,1,
                          [{file,"src/rabbit_queue_index.erl"},{line,491}]},
                         {rabbit_variable_queue,start,1,
                          [{file,"src/rabbit_variable_queue.erl"},{line,466}]},
                         {rabbit_priority_queue,start,1,
                          [{file,"src/rabbit_priority_queue.erl"},{line,92}]},
                         {rabbit_amqqueue,recover,0,
                          [{file,"src/rabbit_amqqueue.erl"},{line,240}]},
                         {rabbit,recover,0,
                          [{file,"src/rabbit.erl"},{line,813}]},
                         {rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,
                          [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
                         {rabbit_boot_steps,run_step,2,
                          [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
                         {rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,
                          [{file,"src/rabbit_boot_steps.erl"},{line,26}]}]}}}}
      in function  application_master:init/4 (application_master.erl, line 134)
    ancestors: [<0.173.0>]
    message_queue_len: 1
    messages: [{'EXIT',<0.175.0>,normal}]
    links: [<0.173.0>,<0.33.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 2586
    stack_size: 27
    reductions: 273
  neighbours:

=INFO REPORT==== 3-Jun-2020::10:52:42 ===
    application: rabbit
    exited: {bad_return,
             {{rabbit,start,[normal,[]]},
              {'EXIT',
               {{badmatch,
                 {error,
                  {{{badmatch,
                     {error,
                      {not_a_dets_file,
                       "/var/lib/rabbitmq/mnesia/rabbit@8dc392bfeba9/recovery.dets"}}},
                    [{rabbit_recovery_terms,open_table,0,
                      [{file,"src/rabbit_recovery_terms.erl"},{line,126}]},
                     {rabbit_recovery_terms,init,1,
                      [{file,"src/rabbit_recovery_terms.erl"},{line,107}]},
                     {gen_server,init_it,2,
                      [{file,"gen_server.erl"},{line,365}]},
                     {gen_server,init_it,6,
                      [{file,"gen_server.erl"},{line,333}]},
                     {proc_lib,init_p_do_apply,3,
                      [{file,"proc_lib.erl"},{line,247}]}]},
                   {child,undefined,rabbit_recovery_terms,
                    {rabbit_recovery_terms,start_link,[]},
                    transient,30000,worker,
                    [rabbit_recovery_terms]}}}},
                [{rabbit_queue_index,start,1,
                  [{file,"src/rabbit_queue_index.erl"},{line,491}]},
                 {rabbit_variable_queue,start,1,
                  [{file,"src/rabbit_variable_queue.erl"},{line,466}]},
                 {rabbit_priority_queue,start,1,
                  [{file,"src/rabbit_priority_queue.erl"},{line,92}]},
                 {rabbit_amqqueue,recover,0,
                  [{file,"src/rabbit_amqqueue.erl"},{line,240}]},
                 {rabbit,recover,0,[{file,"src/rabbit.erl"},{line,813}]},
                 {rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,
                  [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
                 {rabbit_boot_steps,run_step,2,
                  [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
                 {rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,
                  [{file,"src/rabbit_boot_steps.erl"},{line,26}]}]}}}}
    type: transient
2020-06-03 10:52:43 Error in process ~p with exit value:~n~p~n
<0.5.0>
{badarg,[{ets,lookup,[ac_tab,{env,rabbit,error_logger}],[]},{application_controller,get_env,2,[{file,"application_controller.erl"},{line,332}]},{rabbit,log_location,1,[{file,"src/rabbit.erl"},{line,893}]},{rabbit,boot_error,2,[{file,"src/rabbit.erl"},{line,786}]},{rabbit,start_it,1,[{file,"src/rabbit.erl"},{line,430}]},{init,start_em,1,[]},{init,do_boot,3,[]}]}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{badmatch,{error,{{{badmatch,{error,{not_a_dets_file,\"/var/lib/rabbitmq/mnesia/rabbit@8dc392bfeba9/recovery.dets\"}}},[{rabbit_recovery_terms,open_table,0,[{file,\"src/rabbit_recovery_terms.erl\"},{line,126}]},{rabbit_recovery_terms,init,1,[{file,\"src/rabbit_recovery_terms.erl\"},{line,107}]},{gen_server,init_it,2,[{file,\"gen_server.erl\"},{line,365}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,333}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,247}]}]},{child,undefined,rabbit_recovery_terms,{rabbit_recovery_terms,start_link,[]},transient,30000,worker,[rabbit_recovery_terms]}}}},[{rabbit_queue_index,start,1,[{file,\"src/rabbit_queue_index.erl\"},{line,491}]},{rabbit_variable_queue,start,1,[{file,\"src/rabbit_variable_queue.erl\"},{line,466}]},{rabbit_priority_queue,start,1,[{file,\"src/rabbit_priority_queue.erl\"},{line,92}]},{rabbit_amqqueue,recover,0,[{file,\"src/rabbit_amqqueue.erl\"},{line,240}]},{rabbit,recover,0,[{file,\"src/rabbit.erl\"},{line,813}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]}]}}}}}"}
		Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{badmatch,{error,{{{badmatch,{error,{not_a_dets_file,"/var/lib/rabbit

Crash dump is being written to: erl_crash.dump...
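
To decide automatically whether the cleanup is needed, Derex could look for the not_a_dets_file error in the RabbitMQ output. The following Python sketch extracts the affected file paths from log text like the one above. The function name find_corrupt_dets_paths is hypothetical, and how Derex would obtain the log text is left open.

```python
import re
from typing import List

# Matches both the pretty-printed form (path on the following line) and the
# single-line form where the quotes are backslash-escaped. A closing quote
# is required, so paths cut off by log truncation are ignored.
_NOT_A_DETS_FILE = re.compile(r'not_a_dets_file,\s*\\?"([^"\\\n]+)\\?"')


def find_corrupt_dets_paths(log_text: str) -> List[str]:
    """Return the distinct paths reported as not_a_dets_file, in log order."""
    matches = _NOT_A_DETS_FILE.findall(log_text)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(matches))
```

An empty result means the log does not show this particular failure, so no file should be removed on its account.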
chiruzzimarco added the enhancement label Jun 23, 2021