Spun up some new load balancer Docker hosts last night and attempted to migrate the keepalived service to those hosts, but the VIP would never come up.
This is a snippet of the logs:
Wed Sep 19 13:56:29 2018: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(8,9)]
Wed Sep 19 13:56:29 2018: Script `chk_haproxy` now returning 2
Wed Sep 19 13:56:29 2018: VRRP_Script(chk_haproxy) failed (exited with status 2)
Wed Sep 19 13:56:29 2018: (lb-vips) Entering FAULT STATE
Wed Sep 19 13:56:29 2018: Kernel/system configuration issue causing multicast packets to be received but IP_MULTICAST_ALL unset
Displaying resulting /etc/keepalived/keepalived.conf contents...
Wed Sep 19 13:56:31 2018: Starting Keepalived v2.0.4 (06/24,2018), git commit v3.8.0_rc8-47-g5ec10636b6
Wed Sep 19 13:56:31 2018: WARNING - keepalived was build for newer Linux 4.4.6, running on Linux 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018
Wed Sep 19 13:56:31 2018: Opening file '/etc/keepalived/keepalived.conf'.
global_defs {
    #Hostname will be used by default
    #router_id your_name
    vrrp_version 2
    vrrp_garp_master_delay 1
    vrrp_garp_master_refresh 60
    #Uncomment the next line if you'd like to use unique multicast groups
    #vrrp_mcast_group4 224.0.0.12
    script_user root
}

vrrp_script chk_haproxy {
    script "iptables -t nat -nL CATTLE_PREROUTING | grep ':80'"
    timeout 1
    interval 1   # check every 1 second
    fall 2       # require 2 failures for KO
    rise 2       # require 2 successes for OK
}

vrrp_instance lb-vips {
    state BACKUP
    interface eth0
    virtual_router_id 12
    priority 100
    advert_int 1
    nopreempt   #Prevent fail-back
    track_script {
        chk_haproxy
    }
    authentication {
        auth_type PASS
        auth_pass blahblah
    }
    virtual_ipaddress {
        10.XX.XX.12/24 dev eth0
    }
}
Starting Keepalived in the background...
Wed Sep 19 13:56:31 2018: daemon is already running
/usr/bin/keepalived.sh: line 101: wait: pid 19 is not a child of this shell
I saw that the new hosts were using an image created 5 weeks ago. I went to the previous host, which still had the image created 13 months ago, tagged that older image, and pushed it to our Docker image server. I configured the service to use that tagged image and the VIP came up on the new hosts, so something in the newer image must be responsible, since it's the only thing that changed.
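For reference, the rollback was roughly along these lines (a sketch only; the registry address, image name, and tag are placeholders, not the ones actually used):

# On the previous host: find the 13-month-old image and give it an explicit tag
docker images                                                    # locate the old keepalived image ID
docker tag <old-image-id> registry.example.local/keepalived:known-good
# Push the re-tagged image to the internal registry
docker push registry.example.local/keepalived:known-good
# Then point the load balancer service at registry.example.local/keepalived:known-good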
Also, the check port script should probably be changed from grep ':${CHECK_PORT}' to grep 'dpt:${CHECK_PORT} ' (note the trailing space), because ':80' also matches ':8000', so the check could report a false positive when something else (e.g. Traefik) is running on port 8000 on that host.
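A quick way to see the difference (a sketch assuming CHECK_PORT=80; the CATTLE_PREROUTING rule shown in the comment is illustrative, not taken from an actual host):

CHECK_PORT=80
# Suppose the NAT chain only contains a Traefik rule for port 8000, e.g.:
#   DNAT  tcp  --  0.0.0.0/0  0.0.0.0/0  tcp dpt:8000 to:10.42.0.5:8000
# The old pattern still matches, because ':80' is a prefix of ':8000':
iptables -t nat -nL CATTLE_PREROUTING | grep ":${CHECK_PORT}"       # false positive, exits 0
# The anchored pattern only matches the exact destination port:
iptables -t nat -nL CATTLE_PREROUTING | grep "dpt:${CHECK_PORT} "   # no match, check fails as it should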
Thanks for your patience. Any chance you can try replacing lines 100-103 in the keepalived.sh file with what follows, rebuilding the container and seeing if that works better:
while true; do
    # Check whether Keepalived is STILL running by recording its PID (if it's not running, $pid will be empty):
    pid=$(pidof keepalived)
    # If it is not, end our PID 1 process (this script) by breaking out of this while loop.
    # This ensures Docker 'sees' the failure and handles it as necessary.
    if [ -z "$pid" ]; then
        echo "Keepalived is no longer running, exiting so Docker can restart the container..."
        break
    fi
    # If it is, give the CPU a rest
    sleep 0.5
done
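For context, here is a minimal sketch of how a wrapper entrypoint might use that loop. This is a hypothetical, simplified script, not the actual keepalived.sh; it assumes the config has already been rendered to /etc/keepalived/keepalived.conf:

#!/bin/bash
# Hypothetical entrypoint sketch -- not the real keepalived.sh.

echo "Starting Keepalived in the background..."
# keepalived daemonizes by default, so the running process is not a direct child
# of this shell -- which is why `wait` fails ("pid ... is not a child of this shell")
# and why we poll pidof instead.
keepalived -f /etc/keepalived/keepalived.conf

while true; do
    pid=$(pidof keepalived)
    if [ -z "$pid" ]; then
        echo "Keepalived is no longer running, exiting so Docker can restart the container..."
        break
    fi
    sleep 0.5
done
exit 1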
I can do so myself and test accordingly but it might be a couple of days.
Hey Cory, thanks again for your patience. I've made the necessary changes. Please rebuild, test as appropriate, and let me know if you have any further issues. I've tested it and it works for me.