5. Stop MHA

masterha_stop --conf=/usr/local/mha/ha1/ha1.cnf

III. Configure the Manager

IV. Configure relay log purging (on every Node)

(1) Add the following to the cnf configuration file on every Node

relay_log_purge=0

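During a failover, MHA relies on relay log information to recover the slaves, so automatic relay log purging must be set to OFF and the relay logs purged manually instead.

By default, relay logs on a slave are deleted automatically once the SQL thread has finished executing them. In an MHA environment, however, these relay logs may be needed to recover other slaves, so automatic deletion must be disabled. Periodic purging has to take replication delay into account: on an ext3 file system, deleting a large file takes a noticeable amount of time and can cause serious replication delay. To avoid this, a hard link is temporarily created for each relay log, because on Linux deleting a large file through a hard link is much faster.

Tip: the same hard-link technique is commonly used when dropping a large table in MySQL.

MHA Node ships with the purge_relay_logs command-line tool. It creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds so that the SQL thread switches to a new relay log, and then executes SET GLOBAL relay_log_purge=0.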
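The hard-link idea can be tried out on ordinary files. The following is a minimal sketch, assuming a scratch directory created with mktemp; the file names are hypothetical stand-ins for a relay log, and the script does not reproduce what purge_relay_logs does internally.

```shell
#!/bin/bash
# Illustration only: a hard link keeps the data reachable after the
# original name is removed, so the first rm is just a cheap unlink.
set -eu

demo_dir=$(mktemp -d)                          # scratch directory (assumption)
relay="$demo_dir/mysql-relay-bin.000001"       # hypothetical relay log name
link="$demo_dir/mysql-relay-bin.000001.link"   # hypothetical hard link name

printf 'relay log payload\n' > "$relay"
ln "$relay" "$link"          # second name pointing at the same inode

rm "$relay"                  # removes one name only; the data is still there
surviving=$(cat "$link")     # content is intact through the hard link
echo "after rm: $surviving"

rm "$link"                   # last name removed: the blocks are freed here
rmdir "$demo_dir"
```

The expensive part of the deletion happens only when the last link is removed, which is why the tool can take the relay log out of MySQL's way first and free the space afterwards.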

The purge_relay_logs script takes the following parameters:

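--user                            MySQL user name
--password                        MySQL password
--port                            Port number
--workdir                         Directory in which the relay log hard links are created; the default is /var/tmp. Hard links cannot be created across partitions, so point this at a directory on the same partition as the relay logs. After the script succeeds, the hard-linked relay log files are deleted.
--disable_relay_log_purge         By default, if relay_log_purge=1 the script purges nothing and exits. With this option, the script sets relay_log_purge to 0 when it finds relay_log_purge=1, purges the relay logs, and leaves the parameter set to OFF at the end.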

(2) Create the following script on every slave Node

vim /usr/local/mha/purge_relay_log.sh 

#!/bin/bash
user=root
passwd=root  # make sure this user and password can log in via 127.0.0.1
host='127.0.0.1'
port=3306
work_dir='/mysql/data'
purge='/usr/local/mha/bin/purge_relay_logs'

$purge --user=$user --password=$passwd --host=$host --disable_relay_log_purge --port=$port --workdir=$work_dir >> /usr/local/mha/purge_relay_logs.log 2>&1

chmod u+x /usr/local/mha/purge_relay_log.sh 

Add the script to the OS scheduled tasks (cron).
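A crontab entry for the script could be built as in the sketch below. The schedule (daily at 04:00) is an assumption made for illustration, not a value taken from the original setup; the install command is left commented out so that running the sketch has no side effects.

```shell
#!/bin/bash
# Build a crontab entry for the purge script defined above.
script='/usr/local/mha/purge_relay_log.sh'
schedule='0 4 * * *'        # hypothetical schedule: every day at 04:00

cron_entry="$schedule $script >/dev/null 2>&1"
echo "$cron_entry"

# To append the entry to the current user's crontab, uncomment:
# ( crontab -l 2>/dev/null; echo "$cron_entry" ) | crontab -
```

Staggering the schedule across slaves is worth considering so that all of them do not purge relay logs at the same moment.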

1. Check the SSH configuration

masterha_check_ssh  --conf=/usr/local/mha/ha1/ha1.cnf

[root@monitor ha1]# masterha_check_ssh --conf=/usr/local/mha/ha1/ha1.cnf
Thu Aug 25 14:53:30 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Aug 25 14:53:30 2016 - [info] Reading application default configurations from /usr/local/mha/ha1/ha1.cnf..
Thu Aug 25 14:53:30 2016 - [info] Reading server configurations from /usr/local/mha/ha1/ha1.cnf..
Thu Aug 25 14:53:30 2016 - [info] Starting SSH connection tests..
Thu Aug 25 14:53:35 2016 - [debug] 
Thu Aug 25 14:53:31 2016 - [debug]  Connecting via SSH from root@192.168.137.20(192.168.137.20:22) to root@192.168.137.10(192.168.137.10:22)..
Thu Aug 25 14:53:33 2016 - [debug]   ok.
Thu Aug 25 14:53:33 2016 - [debug]  Connecting via SSH from root@192.168.137.20(192.168.137.20:22) to root@192.168.137.30(192.168.137.30:22)..
Thu Aug 25 14:53:34 2016 - [debug]   ok.
Thu Aug 25 14:53:35 2016 - [debug] 
Thu Aug 25 14:53:31 2016 - [debug]  Connecting via SSH from root@192.168.137.30(192.168.137.30:22) to root@192.168.137.10(192.168.137.10:22)..
Thu Aug 25 14:53:33 2016 - [debug]   ok.
Thu Aug 25 14:53:33 2016 - [debug]  Connecting via SSH from root@192.168.137.30(192.168.137.30:22) to root@192.168.137.20(192.168.137.20:22)..
Thu Aug 25 14:53:34 2016 - [debug]   ok.
Thu Aug 25 14:53:36 2016 - [debug] 
Thu Aug 25 14:53:30 2016 - [debug]  Connecting via SSH from root@192.168.137.10(192.168.137.10:22) to root@192.168.137.20(192.168.137.20:22)..
Thu Aug 25 14:53:34 2016 - [debug]   ok.
Thu Aug 25 14:53:34 2016 - [debug]  Connecting via SSH from root@192.168.137.10(192.168.137.10:22) to root@192.168.137.30(192.168.137.30:22)..
Thu Aug 25 14:53:35 2016 - [debug]   ok.
Thu Aug 25 14:53:36 2016 - [info] All SSH connection tests passed successfully.

The output shows that every Node can reach every other Node over SSH.
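When the check is run from a script, the success line can be tested for instead of reading the log by eye. A minimal sketch, assuming the success message is worded exactly as in the log above; the helper name ssh_check_passed is hypothetical.

```shell
#!/bin/bash
# Returns 0 when the captured masterha_check_ssh output contains the
# success line shown in the log above.
ssh_check_passed() {
    # $1: full text output of masterha_check_ssh
    printf '%s\n' "$1" | grep -q 'All SSH connection tests passed successfully\.'
}

# Sample input copied from the log above. On a real manager host one would use:
#   output=$(masterha_check_ssh --conf=/usr/local/mha/ha1/ha1.cnf 2>&1)
sample='Thu Aug 25 14:53:36 2016 - [info] All SSH connection tests passed successfully.'

if ssh_check_passed "$sample"; then
    result='ssh ok'
else
    result='ssh failed'
fi
echo "$result"
```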

4.send_report

#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.163.com';
my $mail_from='xxxx';
my $mail_user='xxxxx';
my $mail_pass='xxxxx';
my $mail_to=['xxxx','xxxx'];

GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /tmp/monitormail.log"
        or die "Can't open the debug file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => 'text/plain; charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed => '0',
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };
    $sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}

# Do whatever you want here
exit 0;
mutt must be installed first; its installation is not covered here.

1. Configure passwordless SSH login

(1) On the manager, set up passwordless login to all Node hosts

ssh-keygen -t rsa   # press Enter at every prompt; id_rsa.pub is generated under /root/.ssh/
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.137.10 
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.137.20
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.137.30

(2) On Node 10, set up passwordless login to Node 20 and Node 30

ssh-keygen -t rsa 
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.137.20
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.137.30

(3) On Node 20, set up passwordless login to Node 10 and Node 30

ssh-keygen -t rsa 
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.137.10
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.137.30

(4) On Node 30, set up passwordless login to Node 10 and Node 20

ssh-keygen -t rsa 
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.137.10
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.137.20
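The four steps above follow one pattern: each host copies its public key to every other node. The sketch below prints the commands for a given host as a dry run and executes nothing; the function name copy_id_commands is hypothetical, and the IP list is the one used throughout this walkthrough.

```shell
#!/bin/bash
# Print the ssh-copy-id commands a host has to run so that it can reach
# every other node without a password. Dry run: commands are only echoed.
hosts='192.168.137.10 192.168.137.20 192.168.137.30'

copy_id_commands() {
    # $1: IP of the local host; pass "" for the manager, which targets all nodes
    local self=$1 h
    for h in $hosts; do
        [ "$h" = "$self" ] && continue   # a node does not copy the key to itself
        echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$h"
    done
}

copy_id_commands 192.168.137.10
```

Run ssh-keygen -t rsa on the host first, as shown above, so that /root/.ssh/id_rsa.pub exists before the printed commands are executed.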

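2. Offline manual failover

Note: this requires that MHA is not running and that a dead master exists. If MHA Manager detects no dead server, it reports an error and aborts the failover.

Manual failover applies when automatic MHA switching is not enabled for the service: when the master fails, an operator invokes MHA by hand to perform the failover. The command is: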

masterha_master_switch --master_state=dead --conf=/usr/local/mha/ha1/ha1.cnf --dead_master_host=192.168.137.10 --dead_master_port=3306 --new_master_host=192.168.137.20 --new_master_port=3306 --ignore_fail_on_start  --ignore_last_failover

During the switchover you are prompted to type "yes" before it proceeds; the run below shows two such prompts.

[root@monitor ha1]# masterha_master_switch --master_state=dead --conf=/usr/local/mha/ha1/ha1.cnf --dead_master_host=192.168.137.10 --dead_master_port=3306 --new_master_host=192.168.137.20 --new_master_port=3306 --ignore_fail_on_start --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.137.10.
Fri Aug 26 17:44:10 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Aug 26 17:44:10 2016 - [info] Reading application default configurations from /usr/local/mha/ha1/ha1.cnf..
Fri Aug 26 17:44:10 2016 - [info] Reading server configurations from /usr/local/mha/ha1/ha1.cnf..
Fri Aug 26 17:44:10 2016 - [info] MHA::MasterFailover version 0.55.
Fri Aug 26 17:44:10 2016 - [info] Starting master failover.
Fri Aug 26 17:44:10 2016 - [info]
Fri Aug 26 17:44:10 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 26 17:44:10 2016 - [info]
Fri Aug 26 17:44:11 2016 - [info] Dead Servers:
Fri Aug 26 17:44:11 2016 - [info] 192.168.137.10(192.168.137.10:3306)
Fri Aug 26 17:44:11 2016 - [info] Checking master reachability via mysql(double check)..
Fri Aug 26 17:44:11 2016 - [info] ok.
Fri Aug 26 17:44:11 2016 - [info] Alive Servers:
Fri Aug 26 17:44:11 2016 - [info] 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 17:44:11 2016 - [info] 192.168.137.30(192.168.137.30:3306)
Fri Aug 26 17:44:11 2016 - [info] Alive Slaves:
Fri Aug 26 17:44:11 2016 - [info] 192.168.137.20(192.168.137.20:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 17:44:11 2016 - [info] Replicating from 192.168.137.10(192.168.137.10:3306)
Fri Aug 26 17:44:11 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 17:44:11 2016 - [info] 192.168.137.30(192.168.137.30:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 17:44:11 2016 - [info] Replicating from 192.168.137.10(192.168.137.10:3306)
Fri Aug 26 17:44:11 2016 - [info] Not candidate for the new Master (no_master is set)
Master 192.168.137.10 is dead. Proceed? (yes/NO): yes
Fri Aug 26 17:44:18 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 26 17:44:18 2016 - [info]
Fri Aug 26 17:44:18 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 26 17:44:18 2016 - [info]
Fri Aug 26 17:44:20 2016 - [info] HealthCheck: SSH to 192.168.137.10 is reachable.
Fri Aug 26 17:44:22 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 26 17:44:22 2016 - [info] Executing master IP deactivatation script:
Fri Aug 26 17:44:22 2016 - [info] /usr/local/mha/ha1/fail_script/master_ip_failover --orig_master_host=192.168.137.10 --orig_master_ip=192.168.137.10 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.137.50/24===
Disabling the VIP on old master: 192.168.137.10
Fri Aug 26 17:44:23 2016 - [info] done.
Fri Aug 26 17:44:23 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 26 17:44:23 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 26 17:44:23 2016 - [info]
Fri Aug 26 17:44:23 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 26 17:44:23 2016 - [info]
Fri Aug 26 17:44:23 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 26 17:44:23 2016 - [info]
Fri Aug 26 17:44:23 2016 - [info] The latest binary log file/position on all slaves is mysql-bin.000144:120
Fri Aug 26 17:44:23 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 26 17:44:23 2016 - [info] 192.168.137.20(192.168.137.20:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 17:44:23 2016 - [info] Replicating from 192.168.137.10(192.168.137.10:3306)
Fri Aug 26 17:44:23 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 17:44:23 2016 - [info] 192.168.137.30(192.168.137.30:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 17:44:23 2016 - [info] Replicating from 192.168.137.10(192.168.137.10:3306)
Fri Aug 26 17:44:23 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 26 17:44:23 2016 - [info] The oldest binary log file/position on all slaves is mysql-bin.000144:120
Fri Aug 26 17:44:23 2016 - [info] Oldest slaves:
Fri Aug 26 17:44:23 2016 - [info] 192.168.137.20(192.168.137.20:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 17:44:23 2016 - [info] Replicating from 192.168.137.10(192.168.137.10:3306)
Fri Aug 26 17:44:23 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 17:44:23 2016 - [info] 192.168.137.30(192.168.137.30:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 17:44:23 2016 - [info] Replicating from 192.168.137.10(192.168.137.10:3306)
Fri Aug 26 17:44:23 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 26 17:44:23 2016 - [info]
Fri Aug 26 17:44:23 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Aug 26 17:44:23 2016 - [info]
Fri Aug 26 17:44:24 2016 - [info] Fetching dead master's binary logs..
Fri Aug 26 17:44:24 2016 - [info] Executing command on the dead master 192.168.137.10(192.168.137.10:3306): save_binary_logs --command=save --start_file=mysql-bin.000144 --start_pos=120 --binlog_dir=/mysql/log --output_file=/tmp/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
Creating /tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin.000144 pos 120 to mysql-bin.000144 EOF into /tmp/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog ..
Dumping binlog format description event, from position 0 to 120.. ok.
Dumping effective binlog data from /mysql/log/mysql-bin.000144 position 120 to tail(143).. ok.
Concat succeeded.
saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog 100% 143 0.1KB/s 00:00
Fri Aug 26 17:44:27 2016 - [info] scp from root@192.168.137.10:/tmp/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog to local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog succeeded.
Fri Aug 26 17:44:29 2016 - [info] HealthCheck: SSH to 192.168.137.20 is reachable.
Fri Aug 26 17:44:31 2016 - [info] HealthCheck: SSH to 192.168.137.30 is reachable.
Fri Aug 26 17:44:31 2016 - [info]
Fri Aug 26 17:44:31 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 26 17:44:31 2016 - [info]
Fri Aug 26 17:44:31 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Aug 26 17:44:31 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Fri Aug 26 17:44:31 2016 - [info] 192.168.137.20 can be new master.
Fri Aug 26 17:44:31 2016 - [info] New master is 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 17:44:31 2016 - [info] Starting master failover..
Fri Aug 26 17:44:31 2016 - [info]
From:
192.168.137.10 (current master)
 +--192.168.137.20
 +--192.168.137.30

To:
192.168.137.20 (new master)
 +--192.168.137.30

Starting master switch from 192.168.137.10(192.168.137.10:3306) to 192.168.137.20(192.168.137.20:3306)? (yes/NO): yes
Fri Aug 26 17:44:40 2016 - [info] New master decided manually is 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 17:44:40 2016 - [info]
Fri Aug 26 17:44:40 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Aug 26 17:44:40 2016 - [info]
Fri Aug 26 17:44:40 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Aug 26 17:44:40 2016 - [info] Sending binlog..
saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog 100% 143 0.1KB/s 00:00
Fri Aug 26 17:44:42 2016 - [info] scp from local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog to root@192.168.137.20:/tmp/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog succeeded.
Fri Aug 26 17:44:42 2016 - [info]
Fri Aug 26 17:44:42 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Aug 26 17:44:42 2016 - [info]
Fri Aug 26 17:44:42 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Aug 26 17:44:42 2016 - [info] Starting recovery on 192.168.137.20(192.168.137.20:3306)..
Fri Aug 26 17:44:42 2016 - [info] Generating diffs succeeded.
Fri Aug 26 17:44:42 2016 - [info] Waiting until all relay logs are applied.
Fri Aug 26 17:44:42 2016 - [info] done.
Fri Aug 26 17:44:42 2016 - [info] Getting slave status..
Fri Aug 26 17:44:42 2016 - [info] This slave(192.168.137.20)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000144:120). No need to recover from Exec_Master_Log_Pos.
Fri Aug 26 17:44:42 2016 - [info] Connecting to the target slave host 192.168.137.20, running recover script..
Fri Aug 26 17:44:42 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.137.20 --slave_ip=192.168.137.20 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog --workdir=/tmp --target_version=5.6.15-log --timestamp=20160826174410 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --slave_pass=xxx
Fri Aug 26 17:44:43 2016 - [info]
MySQL client version is 5.6.15. Using --binary-mode.
Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog on 192.168.137.20:3306. This may take long time...
Applying log files succeeded.
Fri Aug 26 17:44:43 2016 - [info] All relay logs were successfully applied.
Fri Aug 26 17:44:43 2016 - [info] Getting new master's binlog name and position..
Fri Aug 26 17:44:43 2016 - [info] mysql-bin.000075:120
Fri Aug 26 17:44:43 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.137.20', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000075', MASTER_LOG_POS=120, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug 26 17:44:43 2016 - [info] Executing master IP activate script:
Fri Aug 26 17:44:43 2016 - [info] /usr/local/mha/ha1/fail_script/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.137.10 --orig_master_ip=192.168.137.10 --orig_master_port=3306 --new_master_host=192.168.137.20 --new_master_ip=192.168.137.20 --new_master_port=3306 --new_master_user='root' --new_master_password='root'
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.137.50/24===
Enabling the VIP - 192.168.137.50/24 on the new master - 192.168.137.20
Fri Aug 26 17:44:44 2016 - [info] OK.
Fri Aug 26 17:44:44 2016 - [info] ** Finished master recovery successfully.
Fri Aug 26 17:44:44 2016 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug 26 17:44:44 2016 - [info]
Fri Aug 26 17:44:44 2016 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug 26 17:44:44 2016 - [info]
Fri Aug 26 17:44:44 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Aug 26 17:44:44 2016 - [info]
Fri Aug 26 17:44:44 2016 - [info] -- Slave diff file generation on host 192.168.137.30(192.168.137.30:3306) started, pid: 5354. Check tmp log /usr/local/mha/ha1/192.168.137.30_3306_20160826174410.log if it takes time..
Fri Aug 26 17:44:45 2016 - [info]
Fri Aug 26 17:44:45 2016 - [info] Log messages from 192.168.137.30 ...
Fri Aug 26 17:44:45 2016 - [info]
Fri Aug 26 17:44:44 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Aug 26 17:44:45 2016 - [info] End of log messages from 192.168.137.30.
Fri Aug 26 17:44:45 2016 - [info] -- 192.168.137.30(192.168.137.30:3306) has the latest relay log events.
Fri Aug 26 17:44:45 2016 - [info] Generating relay diff files from the latest slave succeeded.
Fri Aug 26 17:44:45 2016 - [info]
Fri Aug 26 17:44:45 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Aug 26 17:44:45 2016 - [info]
Fri Aug 26 17:44:45 2016 - [info] -- Slave recovery on host 192.168.137.30(192.168.137.30:3306) started, pid: 5356. Check tmp log /usr/local/mha/ha1/192.168.137.30_3306_20160826174410.log if it takes time..
saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog 100% 143 0.1KB/s 00:00
Fri Aug 26 17:44:47 2016 - [info]
Fri Aug 26 17:44:47 2016 - [info] Log messages from 192.168.137.30 ...
Fri Aug 26 17:44:47 2016 - [info]
Fri Aug 26 17:44:45 2016 - [info] Sending binlog..
Fri Aug 26 17:44:45 2016 - [info] scp from local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog to root@192.168.137.30:/tmp/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog succeeded.
Fri Aug 26 17:44:45 2016 - [info] Starting recovery on 192.168.137.30(192.168.137.30:3306)..
Fri Aug 26 17:44:45 2016 - [info] Generating diffs succeeded.
Fri Aug 26 17:44:45 2016 - [info] Waiting until all relay logs are applied.
Fri Aug 26 17:44:45 2016 - [info] done.
Fri Aug 26 17:44:45 2016 - [info] Getting slave status..
Fri Aug 26 17:44:45 2016 - [info] This slave(192.168.137.30)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000144:120). No need to recover from Exec_Master_Log_Pos.
Fri Aug 26 17:44:45 2016 - [info] Connecting to the target slave host 192.168.137.30, running recover script..
Fri Aug 26 17:44:45 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.137.30 --slave_ip=192.168.137.30 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog --workdir=/tmp --target_version=5.6.15-log --timestamp=20160826174410 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --slave_pass=xxx
Fri Aug 26 17:44:45 2016 - [info]
MySQL client version is 5.6.15. Using --binary-mode.
Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.137.10_3306_20160826174410.binlog on 192.168.137.30:3306. This may take long time...
Applying log files succeeded.
Fri Aug 26 17:44:45 2016 - [info] All relay logs were successfully applied.
Fri Aug 26 17:44:45 2016 - [info] Resetting slave 192.168.137.30(192.168.137.30:3306) and starting replication from the new master 192.168.137.20(192.168.137.20:3306)..
Fri Aug 26 17:44:46 2016 - [info] Executed CHANGE MASTER.
Fri Aug 26 17:44:46 2016 - [info] Slave started.
Fri Aug 26 17:44:47 2016 - [info] End of log messages from 192.168.137.30.
Fri Aug 26 17:44:47 2016 - [info] -- Slave recovery on host 192.168.137.30(192.168.137.30:3306) succeeded.
Fri Aug 26 17:44:47 2016 - [info] All new slave servers recovered successfully.
Fri Aug 26 17:44:47 2016 - [info]
Fri Aug 26 17:44:47 2016 - [info] * Phase 5: New master cleanup phase..
Fri Aug 26 17:44:47 2016 - [info]
Fri Aug 26 17:44:47 2016 - [info] Resetting slave info on the new master..
Fri Aug 26 17:44:47 2016 - [info] 192.168.137.20: Resetting slave info succeeded.
Fri Aug 26 17:44:47 2016 - [info] Master failover to 192.168.137.20(192.168.137.20:3306) completed successfully.
Fri Aug 26 17:44:47 2016 - [info]

----- Failover Report -----

ha1: MySQL Master failover 192.168.137.10 to 192.168.137.20 succeeded

Master 192.168.137.10 is down!

Check MHA Manager logs at monitor for details.

Started manual(interactive) failover.
Invalidated master IP address on 192.168.137.10.
The latest slave 192.168.137.20(192.168.137.20:3306) has all relay logs for recovery.
Selected 192.168.137.20 as a new master.
192.168.137.20: OK: Applying all logs succeeded.
192.168.137.20: OK: Activated master IP address.
192.168.137.30: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.137.30: OK: Applying all logs succeeded. Slave started, replicating from 192.168.137.20.
192.168.137.20: Resetting slave info succeeded.
Master failover to 192.168.137.20(192.168.137.20:3306) completed successfully.

2. Check the entire replication environment

masterha_check_repl --conf=/usr/local/mha/ha1/ha1.cnf 

[root@monitor ha1]# masterha_check_repl --conf=/usr/local/mha/ha1/ha1.cnf 
Thu Aug 25 16:09:19 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Aug 25 16:09:19 2016 - [info] Reading application default configurations from /usr/local/mha/ha1/ha1.cnf..
Thu Aug 25 16:09:19 2016 - [info] Reading server configurations from /usr/local/mha/ha1/ha1.cnf..
Thu Aug 25 16:09:19 2016 - [info] MHA::MasterMonitor version 0.55.
Thu Aug 25 16:09:20 2016 - [info] Dead Servers:
Thu Aug 25 16:09:20 2016 - [info] Alive Servers:
Thu Aug 25 16:09:20 2016 - [info]   192.168.137.10(192.168.137.10:3306)
Thu Aug 25 16:09:20 2016 - [info]   192.168.137.20(192.168.137.20:3306)
Thu Aug 25 16:09:20 2016 - [info]   192.168.137.30(192.168.137.30:3306)
Thu Aug 25 16:09:20 2016 - [info] Alive Slaves:
Thu Aug 25 16:09:20 2016 - [info]   192.168.137.20(192.168.137.20:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Thu Aug 25 16:09:20 2016 - [info]     Replicating from 192.168.137.10(192.168.137.10:3306)
Thu Aug 25 16:09:20 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Aug 25 16:09:20 2016 - [info]   192.168.137.30(192.168.137.30:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Thu Aug 25 16:09:20 2016 - [info]     Replicating from 192.168.137.10(192.168.137.10:3306)
Thu Aug 25 16:09:20 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Aug 25 16:09:20 2016 - [info] Current Alive Master: 192.168.137.10(192.168.137.10:3306)
Thu Aug 25 16:09:20 2016 - [info] Checking slave configurations..
Thu Aug 25 16:09:20 2016 - [info]  read_only=1 is not set on slave 192.168.137.20(192.168.137.20:3306).
Thu Aug 25 16:09:20 2016 - [info] Checking replication filtering settings..
Thu Aug 25 16:09:20 2016 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu Aug 25 16:09:20 2016 - [info]  Replication filtering check ok.
Thu Aug 25 16:09:20 2016 - [info] Starting SSH connection tests..
Thu Aug 25 16:09:25 2016 - [info] All SSH connection tests passed successfully.
Thu Aug 25 16:09:25 2016 - [info] Checking MHA Node version..
Thu Aug 25 16:09:26 2016 - [info]  Version check ok.
Thu Aug 25 16:09:26 2016 - [info] Checking SSH publickey authentication settings on the current master..
Thu Aug 25 16:09:27 2016 - [info] HealthCheck: SSH to 192.168.137.10 is reachable.
Thu Aug 25 16:09:29 2016 - [info] Master MHA Node version is 0.54.
Thu Aug 25 16:09:29 2016 - [info] Checking recovery script configurations on the current master..
Thu Aug 25 16:09:29 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/log --output_file=/tmp/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000138 
Thu Aug 25 16:09:29 2016 - [info]   Connecting to root@192.168.137.10(192.168.137.10).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /mysql/log, up to mysql-bin.000138
Thu Aug 25 16:09:30 2016 - [info] Master setting check done.
Thu Aug 25 16:09:30 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Aug 25 16:09:30 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.137.20 --slave_ip=192.168.137.20 --slave_port=3306 --workdir=/tmp --target_version=5.6.15-log --manager_version=0.55 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
Thu Aug 25 16:09:30 2016 - [info]   Connecting to root@192.168.137.20(192.168.137.20:22).. 
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to mysql-relay-bin.000006
    Temporary relay log file is /mysql/data/mysql-relay-bin.000006
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Aug 25 16:09:31 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.137.30 --slave_ip=192.168.137.30 --slave_port=3306 --workdir=/tmp --target_version=5.6.15-log --manager_version=0.55 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
Thu Aug 25 16:09:31 2016 - [info]   Connecting to root@192.168.137.30(192.168.137.30:22).. 
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to mysql-relay-bin.000002
    Temporary relay log file is /mysql/data/mysql-relay-bin.000002
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Aug 25 16:09:32 2016 - [info] Slaves settings check done.
Thu Aug 25 16:09:32 2016 - [info] 
192.168.137.10 (current master)
 +--192.168.137.20
 +--192.168.137.30

Thu Aug 25 16:09:32 2016 - [info] Checking replication health on 192.168.137.20..
Thu Aug 25 16:09:32 2016 - [info]  ok.
Thu Aug 25 16:09:32 2016 - [info] Checking replication health on 192.168.137.30..
Thu Aug 25 16:09:32 2016 - [info]  ok.
Thu Aug 25 16:09:32 2016 - [info] Checking master_ip_failover_script status:
Thu Aug 25 16:09:32 2016 - [info]   /usr/local/mha/ha1/fail_script/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.137.10 --orig_master_ip=192.168.137.10 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.137.50/24===

Checking the Status of the script.. OK 
Thu Aug 25 16:09:32 2016 - [info]  OK.
Thu Aug 25 16:09:32 2016 - [warning] shutdown_script is not defined.
Thu Aug 25 16:09:32 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
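For unattended runs, the verdict and the warnings can be pulled out of this output. A minimal sketch, assuming the final line and the [warning] tag appear exactly as in the log above; the function name summarise_repl_check is hypothetical, and the sample text is a subset of that log.

```shell
#!/bin/bash
# Summarise masterha_check_repl output: final verdict plus warning count.
summarise_repl_check() {
    # $1: full text output of masterha_check_repl
    local text=$1 warnings
    warnings=$(printf '%s\n' "$text" | grep -c '\[warning\]')
    if printf '%s\n' "$text" | grep -q '^MySQL Replication Health is OK\.'; then
        echo "OK warnings=$warnings"
    else
        echo "NOT_OK warnings=$warnings"
    fi
}

# Sample lines copied from the log above. On a real manager host one would use:
#   output=$(masterha_check_repl --conf=/usr/local/mha/ha1/ha1.cnf 2>&1)
sample='Thu Aug 25 16:09:19 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Aug 25 16:09:32 2016 - [warning] shutdown_script is not defined.
Thu Aug 25 16:09:32 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.'

summary=$(summarise_repl_check "$sample")
echo "$summary"
```

Both warnings in this run are benign: no global configuration file and no shutdown_script are defined in this setup.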

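--ignore_fail_on_start: by default, MHA cannot start when a slave node is down. With --ignore_fail_on_start, MHA can start even if a node is down; the option ignores servers that have ignore_fail=1 set in the configuration file.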

II. Configure MHA

I. Install MHA

1.检查ssh配置

masterha_check_ssh  --conf=/usr/local/mha/ha1/ha1.cnf

[root@monitor ha1]# masterha_check_ssh --conf=/usr/local/mha/ha1/ha1.cnf
Thu Aug 25 14:53:30 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Aug 25 14:53:30 2016 - [info] Reading application default configurations from /usr/local/mha/ha1/ha1.cnf..
Thu Aug 25 14:53:30 2016 - [info] Reading server configurations from /usr/local/mha/ha1/ha1.cnf..
Thu Aug 25 14:53:30 2016 - [info] Starting SSH connection tests..
Thu Aug 25 14:53:35 2016 - [debug] 
Thu Aug 25 14:53:31 2016 - [debug]  Connecting via SSH from root@192.168.137.20(192.168.137.20:22) to root@192.168.137.10(192.168.137.10:22)..
Thu Aug 25 14:53:33 2016 - [debug]   ok.
Thu Aug 25 14:53:33 2016 - [debug]  Connecting via SSH from root@192.168.137.20(192.168.137.20:22) to root@192.168.137.30(192.168.137.30:22)..
Thu Aug 25 14:53:34 2016 - [debug]   ok.
Thu Aug 25 14:53:35 2016 - [debug] 
Thu Aug 25 14:53:31 2016 - [debug]  Connecting via SSH from root@192.168.137.30(192.168.137.30:22) to root@192.168.137.10(192.168.137.10:22)..
Thu Aug 25 14:53:33 2016 - [debug]   ok.
Thu Aug 25 14:53:33 2016 - [debug]  Connecting via SSH from root@192.168.137.30(192.168.137.30:22) to root@192.168.137.20(192.168.137.20:22)..
Thu Aug 25 14:53:34 2016 - [debug]   ok.
Thu Aug 25 14:53:36 2016 - [debug] 
Thu Aug 25 14:53:30 2016 - [debug]  Connecting via SSH from root@192.168.137.10(192.168.137.10:22) to root@192.168.137.20(192.168.137.20:22)..
Thu Aug 25 14:53:34 2016 - [debug]   ok.
Thu Aug 25 14:53:34 2016 - [debug]  Connecting via SSH from root@192.168.137.10(192.168.137.10:22) to root@192.168.137.30(192.168.137.30:22)..
Thu Aug 25 14:53:35 2016 - [debug]   ok.
Thu Aug 25 14:53:36 2016 - [info] All SSH connection tests passed successfully.

As the output shows, every Node can reach every other Node over SSH.
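When the check runs from a script rather than by eye, the final success line can be tested for directly. This is a small sketch under the assumption that the success message has exactly the wording shown in the log above; `ssh_check_passed` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: reads masterha_check_ssh output on stdin and succeeds
# only if the final success line shown in the log above is present.
ssh_check_passed() {
    grep -q 'All SSH connection tests passed successfully\.'
}

# Usage sketch: runs the real check only if the MHA tool is installed.
if command -v masterha_check_ssh >/dev/null 2>&1; then
    if masterha_check_ssh --conf=/usr/local/mha/ha1/ha1.cnf 2>&1 | ssh_check_passed; then
        echo "SSH OK"
    else
        echo "SSH check failed"
    fi
fi
```

Both output streams are merged with `2>&1` so the helper sees the message regardless of where the tool writes it.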

2.master_ip_failover

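The VIP can be managed either with keepalived or with a script. keepalived places high demands on the network and is otherwise prone to split-brain. I covered the keepalived setup in my earlier post on building a dual-master environment, so here I use the script approach.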


#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port, $new_master_user, $new_master_password
);

my $vip = '192.168.137.50/24';  ### VIP, with netmask
my $key = '1';                  ### alias index, distinguishes the VIP alias (eth0:1) from eth0 itself
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
    'new_master_user=s'     => \$new_master_user,
    'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

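# Bring the VIP up on the new master over SSH.
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# Take the VIP down on the old master. Skipped when no SSH user was passed,
# since the command cannot be sent without one.
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}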

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}


Note: the VIP must first be added manually on the master server.

/sbin/ifconfig eth0:1 192.168.137.50/24
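On systems where ifconfig is not installed, the same alias can usually be managed with the `ip` command from iproute2. The sketch below is my own alternative, not what the script in this article runs; `vip_cmd` is a hypothetical helper that only prints the commands, and the interface name eth0 and alias index 1 are taken from the script above.

```shell
#!/bin/bash
# Hypothetical helper: prints (does not run) the iproute2 commands that
# correspond to the ifconfig alias commands used in master_ip_failover.
vip_cmd() {
    local action="$1" vip="$2" dev="$3" key="$4"
    case "$action" in
        start) printf 'ip addr add %s dev %s label %s:%s\n' "$vip" "$dev" "$dev" "$key" ;;
        stop)  printf 'ip addr del %s dev %s\n' "$vip" "$dev" ;;
        *)     printf 'usage: vip_cmd start|stop VIP DEV KEY\n' >&2; return 1 ;;
    esac
}

vip_cmd start 192.168.137.50/24 eth0 1
vip_cmd stop  192.168.137.50/24 eth0 1
```

The `label` keeps the address visible as eth0:1, so tools that expect the alias name still find it.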


1. Automatic failover

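This setup uses asynchronous replication, and 137.20 is the current master. I ran concurrent inserts on 137.20 while the IO threads on 137.10 and 137.30 were stopped, kept the load test on 137.20 running for a while, then started the IO thread on 137.30 first and, a few minutes later, the IO thread on 137.10. This ensures that 137.30 has received more of the binlog than the candidate 137.10.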

master: 137.20 (pos=22497564)

candidate slave: 137.10 (pos=9857376)

slave with the latest relay log: 137.30 (pos=22461852)
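To make the gaps concrete, the three positions can simply be subtracted. This is a small sketch that assumes all three positions refer to the same binlog file (mysql-bin.000074 in this test); `binlog_gap` is a hypothetical helper name.

```shell
#!/bin/bash
# Bytes of binlog between two positions in the same binlog file.
binlog_gap() {
    local from_pos="$1" to_pos="$2"
    echo $(( to_pos - from_pos ))
}

master_pos=22497564     # dead master 137.20
latest_pos=22461852     # slave with the latest relay log, 137.30
candidate_pos=9857376   # candidate master 137.10

echo "master ahead of latest slave: $(binlog_gap "$latest_pos" "$master_pos") bytes"
echo "latest slave ahead of candidate: $(binlog_gap "$candidate_pos" "$latest_pos") bytes"
```

The first gap is what MHA has to fetch from the dead master's binlog, and the second is what the candidate has to take from the latest slave's relay log.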

Fri Aug 26 11:57:36 2016 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)
Fri Aug 26 11:57:36 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/log --output_file=/tmp/save_binary_logs_test --manager_version=0.55 --binlog_prefix=mysql-bin
Fri Aug 26 11:57:36 2016 - [info] Executing seconary network check script: /usr/local/mha/bin/masterha_secondary_check -s backup -s master --user=root --master_host=master --master_ip=192.168.137.10 --master_port=3306  --user=root  --master_host=192.168.137.20  --master_ip=192.168.137.20  --master_port=3306
Fri Aug 26 11:57:37 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.137.20' (111))
Fri Aug 26 11:57:37 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 26 11:57:38 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.137.20' (111))
Fri Aug 26 11:57:38 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 26 11:57:38 2016 - [info] HealthCheck: SSH to 192.168.137.20 is reachable.
Monitoring server backup is reachable, Master is not reachable from backup. OK.
Fri Aug 26 11:57:39 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.137.20' (111))
Fri Aug 26 11:57:39 2016 - [warning] Connection failed 3 time(s)..
Monitoring server master is reachable, Master is not reachable from master. OK.
Fri Aug 26 11:57:41 2016 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Aug 26 11:57:41 2016 - [warning] Master is not reachable from health checker!
Fri Aug 26 11:57:41 2016 - [warning] Master 192.168.137.20(192.168.137.20:3306) is not reachable!
Fri Aug 26 11:57:41 2016 - [warning] SSH is reachable.
Fri Aug 26 11:57:41 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /usr/local/mha/ha1/ha1.cnf again, and trying to connect to all servers to check server status..
Fri Aug 26 11:57:41 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Aug 26 11:57:41 2016 - [info] Reading application default configurations from /usr/local/mha/ha1/ha1.cnf..
Fri Aug 26 11:57:41 2016 - [info] Reading server configurations from /usr/local/mha/ha1/ha1.cnf..
Fri Aug 26 11:57:42 2016 - [info] Dead Servers:
Fri Aug 26 11:57:42 2016 - [info]   192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:42 2016 - [info] Alive Servers:
Fri Aug 26 11:57:42 2016 - [info]   192.168.137.10(192.168.137.10:3306)
Fri Aug 26 11:57:42 2016 - [info]   192.168.137.30(192.168.137.30:3306)
Fri Aug 26 11:57:42 2016 - [info] Alive Slaves:
Fri Aug 26 11:57:42 2016 - [info]   192.168.137.10(192.168.137.10:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:42 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:42 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 11:57:42 2016 - [info]   192.168.137.30(192.168.137.30:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:42 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:42 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug 26 11:57:42 2016 - [info] Checking slave configurations..
Fri Aug 26 11:57:42 2016 - [info]  read_only=1 is not set on slave 192.168.137.10(192.168.137.10:3306).
Fri Aug 26 11:57:42 2016 - [info] Checking replication filtering settings..
Fri Aug 26 11:57:42 2016 - [info]  Replication filtering check ok.
Fri Aug 26 11:57:42 2016 - [info] Master is down!
Fri Aug 26 11:57:42 2016 - [info] Terminating monitoring script.
Fri Aug 26 11:57:42 2016 - [info] Got exit code 20 (Master dead).
Fri Aug 26 11:57:42 2016 - [info] MHA::MasterFailover version 0.55.
Fri Aug 26 11:57:42 2016 - [info] Starting master failover.
Fri Aug 26 11:57:42 2016 - [info] 
Fri Aug 26 11:57:42 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 26 11:57:42 2016 - [info] 
Fri Aug 26 11:57:44 2016 - [info] Dead Servers:
Fri Aug 26 11:57:44 2016 - [info]   192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:44 2016 - [info] Checking master reachability via mysql(double check)..
Fri Aug 26 11:57:44 2016 - [info]  ok.
Fri Aug 26 11:57:44 2016 - [info] Alive Servers:
Fri Aug 26 11:57:44 2016 - [info]   192.168.137.10(192.168.137.10:3306)
Fri Aug 26 11:57:44 2016 - [info]   192.168.137.30(192.168.137.30:3306)
Fri Aug 26 11:57:44 2016 - [info] Alive Slaves:
Fri Aug 26 11:57:44 2016 - [info]   192.168.137.10(192.168.137.10:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:44 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:44 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 11:57:44 2016 - [info]   192.168.137.30(192.168.137.30:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:44 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:44 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug 26 11:57:44 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 26 11:57:44 2016 - [info] 
Fri Aug 26 11:57:44 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 26 11:57:44 2016 - [info] 
Fri Aug 26 11:57:44 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 26 11:57:44 2016 - [info] Executing master IP deactivatation script:
Fri Aug 26 11:57:44 2016 - [info]   /usr/local/mha/ha1/fail_script/master_ip_failover --orig_master_host=192.168.137.20 --orig_master_ip=192.168.137.20 --orig_master_port=3306 --command=stopssh --ssh_user=root  


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.137.50/24===

Disabling the VIP on old master: 192.168.137.20 
Fri Aug 26 11:57:45 2016 - [info]  done.
Fri Aug 26 11:57:45 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 26 11:57:45 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 26 11:57:45 2016 - [info] 
Fri Aug 26 11:57:45 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 26 11:57:45 2016 - [info] 
Fri Aug 26 11:57:45 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 26 11:57:45 2016 - [info] 
Fri Aug 26 11:57:45 2016 - [info] The latest binary log file/position on all slaves is mysql-bin.000074:22461852
Fri Aug 26 11:57:45 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 26 11:57:45 2016 - [info]   192.168.137.30(192.168.137.30:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:45 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:45 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug 26 11:57:45 2016 - [info] The oldest binary log file/position on all slaves is mysql-bin.000074:9857376
Fri Aug 26 11:57:45 2016 - [info] Oldest slaves:
Fri Aug 26 11:57:45 2016 - [info]   192.168.137.10(192.168.137.10:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:45 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:45 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 11:57:45 2016 - [info] 
Fri Aug 26 11:57:45 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Aug 26 11:57:45 2016 - [info] 
Fri Aug 26 11:57:46 2016 - [info] Fetching dead master's binary logs..
Fri Aug 26 11:57:46 2016 - [info] Executing command on the dead master 192.168.137.20(192.168.137.20:3306): save_binary_logs --command=save --start_file=mysql-bin.000074  --start_pos=22461852 --binlog_dir=/mysql/log --output_file=/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
  Creating /tmp if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000074 pos 22461852 to mysql-bin.000074 EOF into /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog ..
  Dumping binlog format description event, from position 0 to 120.. ok.
  Dumping effective binlog data from /mysql/log/mysql-bin.000074 position 22461852 to tail(22497564).. ok.
 Concat succeeded.
Fri Aug 26 11:57:49 2016 - [info] scp from root@192.168.137.20:/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog to local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog succeeded.
Fri Aug 26 11:57:52 2016 - [info] HealthCheck: SSH to 192.168.137.10 is reachable.
Fri Aug 26 11:57:55 2016 - [info] HealthCheck: SSH to 192.168.137.30 is reachable.
Fri Aug 26 11:57:55 2016 - [info] 
Fri Aug 26 11:57:55 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 26 11:57:55 2016 - [info] 
Fri Aug 26 11:57:55 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Aug 26 11:57:55 2016 - [info] Checking whether 192.168.137.30 has relay logs from the oldest position..
Fri Aug 26 11:57:55 2016 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000074 --latest_rmlp=22461852 --target_mlf=mysql-bin.000074 --target_rmlp=9857376 --server_id=30 --workdir=/tmp --timestamp=20160826115742 --manager_version=0.55 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  :
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to mysql-relay-bin.000003
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:mysql-relay-bin.000003, start_pos:9857539.
Target relay log FOUND!
Fri Aug 26 11:57:56 2016 - [info] OK. 192.168.137.30 has all relay logs.
Fri Aug 26 11:57:56 2016 - [info] Searching new master from slaves..
Fri Aug 26 11:57:56 2016 - [info]  Candidate masters from the configuration file:
Fri Aug 26 11:57:56 2016 - [info]   192.168.137.10(192.168.137.10:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:56 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:56 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 11:57:56 2016 - [info]  Non-candidate masters:
Fri Aug 26 11:57:56 2016 - [info]   192.168.137.30(192.168.137.30:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:56 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:56 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug 26 11:57:56 2016 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Fri Aug 26 11:57:56 2016 - [info]   Not found.
Fri Aug 26 11:57:56 2016 - [info]  Searching from all candidate_master slaves..
Fri Aug 26 11:57:56 2016 - [info] New master is 192.168.137.10(192.168.137.10:3306)
Fri Aug 26 11:57:56 2016 - [info] Starting master failover..
Fri Aug 26 11:57:56 2016 - [info] 
From:
192.168.137.20 (current master)
 +--192.168.137.10
 +--192.168.137.30

To:
192.168.137.10 (new master)
 +--192.168.137.30
Fri Aug 26 11:57:56 2016 - [info] 
Fri Aug 26 11:57:56 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Aug 26 11:57:56 2016 - [info] 
Fri Aug 26 11:57:56 2016 - [info] Server 192.168.137.10 received relay logs up to: mysql-bin.000074:9857376
Fri Aug 26 11:57:56 2016 - [info] Need to get diffs from the latest slave(192.168.137.30) up to: mysql-bin.000074:22461852 (using the latest slave's relay logs)
Fri Aug 26 11:57:56 2016 - [info] Connecting to the latest slave host 192.168.137.30, generating diff relay log files..
Fri Aug 26 11:57:56 2016 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.137.10 --latest_mlf=mysql-bin.000074 --latest_rmlp=22461852 --target_mlf=mysql-bin.000074 --target_rmlp=9857376 --server_id=30 --diff_file_readtolatest=/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog --workdir=/tmp --timestamp=20160826115742 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/ 
Fri Aug 26 11:58:02 2016 - [info] 
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to mysql-relay-bin.000003
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:mysql-relay-bin.000003, start_pos:9857539.
 Concat binary/relay logs from mysql-relay-bin.000003 pos 9857539 to mysql-relay-bin.000003 EOF into /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog ..
  Dumping binlog format description event, from position 0 to 283.. ok.
  Dumping effective binlog data from /mysql/data/mysql-relay-bin.000003 position 9857539 to tail(22462015).. ok.
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog .
 scp slave:/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog to root@192.168.137.10(22) succeeded.
Fri Aug 26 11:58:02 2016 - [info]  Generating diff files succeeded.
Fri Aug 26 11:58:02 2016 - [info] Sending binlog..
Fri Aug 26 11:58:04 2016 - [info] scp from local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog to root@192.168.137.10:/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog succeeded.
Fri Aug 26 11:58:04 2016 - [info] 
Fri Aug 26 11:58:04 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Aug 26 11:58:04 2016 - [info] 
Fri Aug 26 11:58:04 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Aug 26 11:58:04 2016 - [info] Starting recovery on 192.168.137.10(192.168.137.10:3306)..
Fri Aug 26 11:58:04 2016 - [info]  Generating diffs succeeded.
Fri Aug 26 11:58:04 2016 - [info] Waiting until all relay logs are applied.
Fri Aug 26 12:00:06 2016 - [info]  done.
Fri Aug 26 12:00:06 2016 - [info] Getting slave status..
Fri Aug 26 12:00:06 2016 - [info] This slave(192.168.137.10)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000074:9857376). No need to recover from Exec_Master_Log_Pos.
Fri Aug 26 12:00:06 2016 - [info] Connecting to the target slave host 192.168.137.10, running recover script..
Fri Aug 26 12:00:06 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.137.10 --slave_ip=192.168.137.10  --slave_port=3306 --apply_files=/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog,/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog --workdir=/tmp --target_version=5.6.15-log --timestamp=20160826115742 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --slave_pass=xxx
Fri Aug 26 12:04:22 2016 - [info] 
 Concat all apply files to /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog ..
 Copying the first binlog file /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog to /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog.. dumped up to pos 120. ok.
 /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog has effective binlog events from pos 120.
  Dumping effective binlog data from /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog position 120 to tail(35832).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog .
MySQL client version is 5.6.15. Using --binary-mode.
Applying differential binary/relay log files /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog,/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog on 192.168.137.10:3306. This may take long time...
Applying log files succeeded.
Fri Aug 26 12:04:22 2016 - [info]  All relay logs were successfully applied.
Fri Aug 26 12:04:22 2016 - [info] Getting new master's binlog name and position..
Fri Aug 26 12:04:22 2016 - [info]  mysql-bin.000143:22123166
Fri Aug 26 12:04:22 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.137.10', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000143', MASTER_LOG_POS=22123166, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug 26 12:04:22 2016 - [info] Executing master IP activate script:
Fri Aug 26 12:04:22 2016 - [info]   /usr/local/mha/ha1/fail_script/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.137.20 --orig_master_ip=192.168.137.20 --orig_master_port=3306 --new_master_host=192.168.137.10 --new_master_ip=192.168.137.10 --new_master_port=3306 --new_master_user='root' --new_master_password='root'  


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.137.50/24===

Enabling the VIP - 192.168.137.50/24 on the new master - 192.168.137.10 
Fri Aug 26 12:04:25 2016 - [info]  OK.
Fri Aug 26 12:04:25 2016 - [info] ** Finished master recovery successfully.
Fri Aug 26 12:04:25 2016 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug 26 12:04:25 2016 - [info] 
Fri Aug 26 12:04:25 2016 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug 26 12:04:25 2016 - [info] 
Fri Aug 26 12:04:25 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Aug 26 12:04:25 2016 - [info] 
Fri Aug 26 12:04:25 2016 - [info] -- Slave diff file generation on host 192.168.137.30(192.168.137.30:3306) started, pid: 5029. Check tmp log /usr/local/mha/ha1/192.168.137.30_3306_20160826115742.log if it takes time..
Fri Aug 26 12:04:26 2016 - [info] 
Fri Aug 26 12:04:26 2016 - [info] Log messages from 192.168.137.30 ...
Fri Aug 26 12:04:26 2016 - [info] 
Fri Aug 26 12:04:25 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Aug 26 12:04:26 2016 - [info] End of log messages from 192.168.137.30.
Fri Aug 26 12:04:26 2016 - [info] -- 192.168.137.30(192.168.137.30:3306) has the latest relay log events.
Fri Aug 26 12:04:26 2016 - [info] Generating relay diff files from the latest slave succeeded.
Fri Aug 26 12:04:26 2016 - [info] 
Fri Aug 26 12:04:26 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Aug 26 12:04:26 2016 - [info] 
Fri Aug 26 12:04:26 2016 - [info] -- Slave recovery on host 192.168.137.30(192.168.137.30:3306) started, pid: 5031. Check tmp log /usr/local/mha/ha1/192.168.137.30_3306_20160826115742.log if it takes time..
Fri Aug 26 12:04:32 2016 - [info] 
Fri Aug 26 12:04:32 2016 - [info] Log messages from 192.168.137.30 ...
Fri Aug 26 12:04:32 2016 - [info] 
Fri Aug 26 12:04:26 2016 - [info] Sending binlog..
Fri Aug 26 12:04:28 2016 - [info] scp from local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog to root@192.168.137.30:/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog succeeded.
Fri Aug 26 12:04:28 2016 - [info] Starting recovery on 192.168.137.30(192.168.137.30:3306)..
Fri Aug 26 12:04:28 2016 - [info]  Generating diffs succeeded.
Fri Aug 26 12:04:28 2016 - [info] Waiting until all relay logs are applied.
Fri Aug 26 12:04:28 2016 - [info]  done.
Fri Aug 26 12:04:28 2016 - [info] Getting slave status..
Fri Aug 26 12:04:28 2016 - [info] This slave(192.168.137.30)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000074:22461852). No need to recover from Exec_Master_Log_Pos.
Fri Aug 26 12:04:28 2016 - [info] Connecting to the target slave host 192.168.137.30, running recover script..
Fri Aug 26 12:04:28 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.137.30 --slave_ip=192.168.137.30  --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog --workdir=/tmp --target_version=5.6.15-log --timestamp=20160826115742 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --slave_pass=xxx
Fri Aug 26 12:04:30 2016 - [info] 
MySQL client version is 5.6.15. Using --binary-mode.
Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog on 192.168.137.30:3306. This may take long time...
Applying log files succeeded.
Fri Aug 26 12:04:30 2016 - [info]  All relay logs were successfully applied.
Fri Aug 26 12:04:30 2016 - [info]  Resetting slave 192.168.137.30(192.168.137.30:3306) and starting replication from the new master 192.168.137.10(192.168.137.10:3306)..
Fri Aug 26 12:04:31 2016 - [info]  Executed CHANGE MASTER.
Fri Aug 26 12:04:31 2016 - [info]  Slave started.
Fri Aug 26 12:04:32 2016 - [info] End of log messages from 192.168.137.30.
Fri Aug 26 12:04:32 2016 - [info] -- Slave recovery on host 192.168.137.30(192.168.137.30:3306) succeeded.
Fri Aug 26 12:04:32 2016 - [info] All new slave servers recovered successfully.
Fri Aug 26 12:04:32 2016 - [info] 
Fri Aug 26 12:04:32 2016 - [info] * Phase 5: New master cleanup phase..
Fri Aug 26 12:04:32 2016 - [info] 
Fri Aug 26 12:04:32 2016 - [info] Resetting slave info on the new master..
Fri Aug 26 12:04:32 2016 - [info]  192.168.137.10: Resetting slave info succeeded.
Fri Aug 26 12:04:32 2016 - [info] Master failover to 192.168.137.10(192.168.137.10:3306) completed successfully.
Fri Aug 26 12:04:32 2016 - [info] 

----- Failover Report -----

ha1: MySQL Master failover 192.168.137.20 to 192.168.137.10 succeeded

Master 192.168.137.20 is down!

Check MHA Manager logs at monitor:/usr/local/mha/ha1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.137.20.
The latest slave 192.168.137.30(192.168.137.30:3306) has all relay logs for recovery.
Selected 192.168.137.10 as a new master.
192.168.137.10: OK: Applying all logs succeeded.
192.168.137.10: OK: Activated master IP address.
192.168.137.30: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.137.30: OK: Applying all logs succeeded. Slave started, replicating from 192.168.137.10.
192.168.137.10: Resetting slave info succeeded.
Master failover to 192.168.137.10(192.168.137.10:3306) completed successfully.

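Note: some of the key processing steps are marked in red, and the phase headings are marked in bold blue to show how many steps there are in total.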
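In plain text the phases are easier to pick out with a filter. The following is a rough sketch, assuming the phase lines keep the `[info] * Phase N:` shape seen in the log above; `list_phases` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: prints the phase start lines found in an MHA manager
# log read from stdin, dropping the timestamp prefix. Lines announcing that a
# phase has completed are skipped so that each phase appears once.
list_phases() {
    grep -E '\[info\] \* Phase [0-9.]+: ' \
        | grep -v 'completed' \
        | sed -E 's/^.*\[info\] \* //'
}

# Demo on two lines copied from the log above.
list_phases <<'EOF'
Fri Aug 26 11:57:42 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 26 11:57:44 2016 - [info] ** Phase 1: Configuration Check Phase completed.
EOF
```

Against a real installation the input would be the manager log named in the failover report, for example `list_phases < /usr/local/mha/ha1/manager.log`.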

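The failover steps are as follows:

1. Failure detection: check whether the dead master is reachable through MySQL (Ping(SELECT)) and through SSH; the masterha_secondary_check script is called in between → dead master handling phase

2. Configuration check: the configuration of the whole cluster is checked (identifying the dead server, the candidate master, and the configuration of every server, and verifying that the configuration meets the requirements) → dead master handling phase

3. Handling the dead master: this includes removing the virtual IP and powering off the host (no shutdown script is configured here yet) → dead master handling phase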

/usr/local/mha/ha1/fail_script/master_ip_failover --orig_master_host=192.168.137.20 --orig_master_ip=192.168.137.20 --orig_master_port=3306 --command=stopssh --ssh_user=root 

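4. Find the slave with the latest relay log (and also the position of the slave with the oldest binlog), and determine whether each of them is a candidate slave → new master recovery phase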

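5. Save the binlog events by which the dead master (137.20) is ahead of the latest slave (137.30) under /tmp on the dead master (the remote_workdir set in the configuration file), then determine whether this differential binlog (saved_master_binlog_) contains anything, that is, whether a binlog difference actually exists between the dead master and the latest slave. If it does, the generated differential binlog is copied to the MHA manager's workdir (on 137.40) → new master recovery phase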

Fri Aug 26 11:57:46 2016 - [info] Executing command on the dead master 192.168.137.20(192.168.137.20:3306): save_binary_logs --command=save --start_file=mysql-bin.000074  --start_pos=22461852 --binlog_dir=/mysql/log --output_file=/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
  Creating /tmp if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000074 pos 22461852 to mysql-bin.000074 EOF into /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog ..
  Dumping binlog format description event, from position 0 to 120.. ok.
  Dumping effective binlog data from /mysql/log/mysql-bin.000074 position 22461852 to tail(22497564).. ok.
 Concat succeeded.
Fri Aug 26 11:57:49 2016 - [info] scp from root@192.168.137.20:/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog to local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog succeeded.

7. Determine the new master, and check whether the relay log of the latest slave (137.30) can be used to recover the other slaves → new master recovery phase

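8. Generate the differential relay log between the latest slave (137.30) and the new master (137.10). The file is generated under /tmp on the slave with the latest relay log, covers the difference between the two servers' Read_Master_Log_Pos values, and is named "relay_from_read_to_latest_" followed by the target slave's IP. It is then copied to /tmp on the target slave (the new master). At the same time, the "saved_master_binlog_" file saved earlier in the MHA workdir (if it exists) is copied to /tmp on the new master → new master recovery phase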

Fri Aug 26 11:57:56 2016 - [info] Connecting to the latest slave host 192.168.137.30, generating diff relay log files..
Fri Aug 26 11:57:56 2016 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.137.10 --latest_mlf=mysql-bin.000074 --latest_rmlp=22461852 --target_mlf=mysql-bin.000074 --target_rmlp=9857376 --server_id=30 --diff_file_readtolatest=/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog --workdir=/tmp --timestamp=20160826115742 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/ 
Fri Aug 26 11:58:02 2016 - [info] 
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to mysql-relay-bin.000003
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:mysql-relay-bin.000003, start_pos:9857539.
 Concat binary/relay logs from mysql-relay-bin.000003 pos 9857539 to mysql-relay-bin.000003 EOF into /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog ..
  Dumping binlog format description event, from position 0 to 283.. ok.
  Dumping effective binlog data from /mysql/data/mysql-relay-bin.000003 position 9857539 to tail(22462015).. ok.
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog .
 scp slave:/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog to root@192.168.137.10(22) succeeded.
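The file name in this log follows a visible pattern: prefix, target IP, target port, and the failover timestamp. The sketch below rebuilds the path from those parts; it is inferred only from the log lines above, and `diff_relay_log_path` is a hypothetical helper name, not an MHA function.

```shell
#!/bin/bash
# Rebuilds the diff relay log path seen in the log above from its parts.
# The pattern is inferred from this one log, not taken from the MHA source.
diff_relay_log_path() {
    local workdir="$1" target_ip="$2" target_port="$3" timestamp="$4"
    printf '%s/relay_from_read_to_latest_%s_%s_%s.binlog\n' \
        "$workdir" "$target_ip" "$target_port" "$timestamp"
}

diff_relay_log_path /tmp 192.168.137.10 3306 20160826115742
```

Knowing the pattern makes it easier to find or clean up leftover diff files under /tmp after a failover.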

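9. The new master applies the differential relay logs. MHA first checks whether the slave's Read_Master_Log_Pos and Exec_Master_Log_Pos are equal. Because replication here is asynchronous rather than semi-synchronous, the slave may have read events up to a position without having executed them all, and if the master fails in that window the two positions differ. If they are not equal, save_binary_logs is run on that slave to save the difference as a relay log named "relay_from_exec_to_read_" followed by the slave's own IP. The three differential logs "relay_from_read_to_latest_", "saved_master_binlog_", and "relay_from_exec_to_read_" are then applied, with their contents merged into a single new binlog file "total_binlog_for_" → new master recovery phase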

Fri Aug 26 12:00:06 2016 - [info] This slave(192.168.137.10)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000074:9857376). No need to recover from Exec_Master_Log_Pos.
Fri Aug 26 12:00:06 2016 - [info] Connecting to the target slave host 192.168.137.10, running recover script..
Fri Aug 26 12:00:06 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.137.10 --slave_ip=192.168.137.10  --slave_port=3306 --apply_files=/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog,/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog --workdir=/tmp --target_version=5.6.15-log --timestamp=20160826115742 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --slave_pass=xxx
Fri Aug 26 12:04:22 2016 - [info] 
 Concat all apply files to /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog ..
 Copying the first binlog file /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog to /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog.. dumped up to pos 120. ok.
 /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog has effective binlog events from pos 120.
  Dumping effective binlog data from /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog position 120 to tail(35832).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog .
MySQL client version is 5.6.15. Using --binary-mode.
Applying differential binary/relay log files /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog,/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog on 192.168.137.10:3306. This may take long time...
Applying log files succeeded.
Fri Aug 26 12:04:22 2016 - [info]  All relay logs were successfully applied.

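10. Generate the CHANGE MASTER statement that points the other slaves at the new master, then run master_ip_failover to complete the switch and bring up the VIP. → new master recovery phase

11. The other slaves repeat the steps performed on the new master (steps 8-9; for example, saved_master_binlog_ is copied from the MHA workdir to the latest slave (137.30) and the differential relay logs are applied there). → other slave recovery phase

12. The other slaves run CHANGE MASTER to the new master. → other slave recovery phase

13. Generate the failover report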

 

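Note: the "latest relay log" here refers to the master binlog position that the slave has already read (that is, the newest master binlog position saved in the slave's relay log files). In show slave status \G this is "Read_Master_Log_Pos", not "Exec_Master_Log_Pos". The slave with the latest relay log is therefore not necessarily the slave with the latest data (although that situation is uncommon); it only means that its copy of the master binlog is the most recent.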
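As a rough illustration of "latest by read position", the sketch below picks the slave with the highest (Master_Log_File, Read_Master_Log_Pos) pair. latest_slave and its three-column input format are assumptions made for this example; MHA performs this comparison internally.

```shell
#!/bin/bash
# Hypothetical helper: each stdin line is "<host> <Master_Log_File> <Read_Master_Log_Pos>".
# Sort by binlog file name first, then numerically by position, and print the
# host on the last (highest) line.
latest_slave() {
    sort -k2,2 -k3,3n | tail -n 1 | awk '{ print $1 }'
}

# The two slaves from the failover test in this article:
printf '%s\n' \
    '192.168.137.10 mysql-bin.000074 9857376' \
    '192.168.137.30 mysql-bin.000074 22461852' | latest_slave
```

The position column has to be compared numerically: a plain text sort would rank 9857376 above 22461852.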

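MHA repairs the new master first, whether or not it is the latest slave, and then repairs the other slaves. If the candidate slave is also the latest slave, it is the ideal candidate and can be repaired quickly.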

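Summary

Semi-synchronous replication between the master and the candidate master is needed for MHA to keep data loss to a minimum; without it MHA loses much of its advantage. Also, do not enable the event scheduler (jobs) on the master or the candidate master, otherwise a manual online failover will fail.

Note: some of the configuration shown in this article carries explanatory remarks; remove those remarks in a real deployment.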

 

 

 

 

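Remarks:

    Author: pursuer.chen

    Blog: http://www.cnblogs.com/chenmh

All posts on this site are original. Reposting is welcome, but you must credit the source and give a link in a prominent place at the top of the article.

Comments and discussion are welcome.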

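4. Delete the failover file (optional)

Because MHA was started with the --ignore_last_failover parameter, it can start even if the file generated by the last failover is left in place; otherwise the generated file "ha1.failover.complete" must be deleted first.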

rm -f /usr/local/mha/ha1/ha1.failover.complete
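A slightly more defensive version of the same cleanup is sketched below. clear_failover_marker is a hypothetical helper, and the "<app>.failover.complete" naming is taken from the file name shown above.

```shell
#!/bin/bash
# Hypothetical helper: remove <workdir>/<app>.failover.complete if it exists
# and report what was done.
clear_failover_marker() {
    local workdir=$1 app=$2
    local marker="$workdir/$app.failover.complete"
    if [ -f "$marker" ]; then
        rm -f -- "$marker" && echo "removed $marker"
    else
        echo "no marker at $marker"
    fi
}
```

With the paths used in this article it would be called as: clear_failover_marker /usr/local/mha/ha1 ha1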

5. Check and start MHA

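Principle

(1) Save the binary log events (binlog events) from the crashed master;

(2) Identify the slave that holds the most recent updates;

(3) Apply the differential relay log to the other slaves;

(4) Apply the binary log events (binlog events) saved from the master;

(5) Promote one slave to be the new master;

(6) Point the other slaves at the new master for replication;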

 

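MHA consists of two parts: the Manager toolkit and the Node toolkit.

The Manager toolkit mainly includes the following tools:

masterha_check_ssh              Check the MHA SSH configuration
masterha_check_repl             Check the MySQL replication status
masterha_manager                Start MHA
masterha_check_status           Check the current MHA running state
masterha_master_monitor         Detect whether the master is down
masterha_master_switch          Control failover (automatic or manual)
masterha_conf_host              Add or remove configured server entries

The Node toolkit (these tools are normally triggered by MHA Manager scripts and need no manual operation) mainly includes the following tools:

save_binary_logs                Save and copy the master's binary logs
apply_diff_relay_logs           Identify differential relay log events and apply them to the other slaves
filter_mysqlbinlog              Strip unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs                Purge relay logs (without blocking the SQL thread)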

 

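2. Set up the replication environment

Replication was set up beforehand; see my earlier articles for details. The replication user and password are both repl. Every Node must have this repl account, unless the Node will never serve as the master after a failover.

1. Create the Manager monitoring user on all Nodes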

grant all privileges on *.* to 'root'@'192.168.137.%' identified  by 'root';

 

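Introduction

MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It is an excellent piece of high-availability software for failover and master promotion in MySQL environments. During a MySQL failover, MHA can complete the database failover automatically within 0~30 seconds, and while doing so it preserves data consistency as far as possible, achieving high availability in the real sense.

It consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a separate machine to manage several master-slave clusters, or on one of the slave nodes. MHA Node runs on every MySQL server. MHA Manager periodically probes the master node of the cluster; when the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to the new master. The whole failover process is completely transparent to the application.

During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over ssh, MHA cannot save the binary logs; it performs the failover only, and the most recent data is lost. Using the semi-synchronous replication available since MySQL 5.5 greatly reduces the risk of data loss. MHA can be combined with semi-synchronous replication: if even one slave has received the latest binary log, MHA can apply it to all the other slaves and so keep the data on all nodes consistent.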

 

 

 

2. Handle the failed master

Repair the failed master and reconfigure it as a slave of the new master with CHANGE MASTER; the statement can be found in manager.log.

 grep "CHANGE MASTER TO MASTER" /usr/local/mha/ha1/manager.log | tail -1

Fri Aug 26 12:04:22 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.137.10', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000143', MASTER_LOG_POS=22123166, MASTER_USER='repl', MASTER_PASSWORD='xxx';

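Note: make sure the slave's SQL_THREAD and IO_THREAD are running normally. If semi-synchronous replication is configured, also make sure it is working; you can run "show status like '%rpl_%';" to check. See the semi-synchronous replication setup described earlier for details.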
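To get only the SQL statement out of that log line, the timestamp and message prefix can be stripped. The sketch below assumes the line has the shape shown above; extract_change_master is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the last CHANGE MASTER statement recorded in an
# MHA manager log, without the timestamp and message prefix.
extract_change_master() {
    grep 'CHANGE MASTER TO MASTER' "$1" \
        | tail -n 1 \
        | sed -e 's/^.*Statement should be: //'
}

# Example: extract_change_master /usr/local/mha/ha1/manager.log
```

In the log shown above the password appears as 'xxx', so the real replication password has to be filled in before the statement is run on the old master.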

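6. Failure handling steps

 After a master failover, the MHA service stops automatically.

3. Modify the ha1.cnf configuration file

In the "secondary_check_script" option, change master_host, master_ip and master_port to the new master. If both machines are configured identically, nothing else needs to change.

3. Check the MHA Manager status

masterha_check_status --conf=/usr/local/mha/ha1/ha1.cnf

 Because MHA has not been started yet, the status reported here is stopped.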

5. Run the checks

Check the SSH configuration
masterha_check_ssh --conf=/usr/local/mha/ha1/ha1.cnf
Check replication
masterha_check_repl --conf=/usr/local/mha/ha1/ha1.cnf 
Check the status
masterha_check_status --conf=/usr/local/mha/ha1/ha1.cnf

All of the checks must pass.
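A small wrapper can stop at the first failing check. This is a sketch, not part of MHA: run_mha_checks is a hypothetical helper, and masterha_check_status is left out because it reports stopped while the manager is not running.

```shell
#!/bin/bash
# Hypothetical helper: run the SSH and replication checks in order and stop at
# the first one that fails. The argument is the path to the MHA config file.
run_mha_checks() {
    local conf=$1
    local check
    for check in masterha_check_ssh masterha_check_repl; do
        if ! "$check" --conf="$conf"; then
            echo "FAILED: $check" >&2
            return 1
        fi
        echo "OK: $check"
    done
}

# Example: run_mha_checks /usr/local/mha/ha1/ha1.cnf
```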

3.master_ip_online_change

Perl script

#!/usr/bin/env perl  
use strict;
use warnings FATAL =>'all';

use Getopt::Long;

my $vip = '192.168.137.50/24';  # Virtual IP  
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $exit_code = 0;

my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,
);
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
);

exit &main();

sub main {

#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";  

if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.  
        # If you manage master ip address at global catalog database,  
        # invalidate orig_master_ip here.  
        my $exit_code = 1;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Disabling the VIP - $vip on old master: $orig_master_host\n";
            print "***************************************************************\n\n\n\n";
&stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
}
elsif ( $command eq "start" ) {

        # all arguments are passed.  
        # If you manage master ip address at global catalog database,  
        # activate new_master_ip here.  
        # You can also grant write access (create user, set read_only=0, etc) here.  
my $exit_code = 10;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Enabling the VIP - $vip on new master: $new_master_host \n";
            print "***************************************************************\n\n\n\n";
&start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
}

elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $orig_master_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
}
else {
&usage();
        exit 1;
}
}

# A simple system call that enable the VIP on the new master  
sub start_vip() {
`ssh $new_master_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master  
sub stop_vip() {
`ssh $orig_master_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

Shell script

#!/bin/bash  
#source /root/.bash_profile  

vip=`echo '192.168.137.50/24'`  # Virtual IP  
key=`echo '1'`  

command=`echo "$1" | awk -F = '{print $2}'`  
orig_master_host=`echo "$2" | awk -F = '{print $2}'`  
new_master_host=`echo "$7" | awk -F = '{print $2}'`    

stop_vip=`echo "ssh root@$orig_master_host /sbin/ifconfig  eth0:$key  down"`  
start_vip=`echo "ssh root@$new_master_host /sbin/ifconfig  eth0:$key  $vip"`  

if [ $command = 'stop' ]  
   then  
   echo -e "\n\n\n***************************************************************\n"  
   echo -e "Disabling the VIP - $vip on old master: $orig_master_host\n"  
   $stop_vip  
   if [ $? -eq 0 ]  
      then  
      echo "Disabled the VIP successfully"  
   else  
      echo "Disabled the VIP failed"  
   fi  
   echo -e "***************************************************************\n\n\n\n"  
fi  

if [ $command = 'start' -o $command = 'status' ]  
   then  
   echo -e "\n\n\n***************************************************************\n"  
   echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"  
   $start_vip  
   if [ $? -eq 0 ]  
      then  
      echo "Enabled the VIP successfully"  
   else  
      echo "Enabled the VIP failed"  
   fi  
   echo -e "***************************************************************\n\n\n\n"  
fi 
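The shell version above reads its options by position ($1, $2, $7). In the manager logs in this article the script is called with the options in different orders (for example --command=stopssh comes after the --orig_master_* options during failover), so matching on option names is more robust. A minimal sketch follows; parse_mha_args is a hypothetical helper, not part of MHA.

```shell
#!/bin/bash
# Hypothetical helper: pick out --name=value options by name instead of by
# position. Options it does not know (e.g. --ssh_user) are ignored.
parse_mha_args() {
    local arg
    for arg in "$@"; do
        case $arg in
            --command=*)          command=${arg#*=} ;;
            --orig_master_host=*) orig_master_host=${arg#*=} ;;
            --new_master_host=*)  new_master_host=${arg#*=} ;;
        esac
    done
}

# Inside the VIP script this would replace the three awk lines:
# parse_mha_args "$@"
```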

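2. Install the EPEL repository

To install with yum, the EPEL repository must be installed first.

EPEL repository

wget http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm

Install on all servers (the manager needs all of the packages below; node servers only need perl-DBD-MySQL and cpan)

yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes cpan

You can also install them through Perl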

#!/bin/bash
wget http://xrl.us/cpanm --no-check-certificate
mv cpanm /usr/bin
chmod 755 /usr/bin/cpanm
# One module name per line; cpanm takes module names directly
cat > /root/list << EOF
DBD::mysql
Config::Tiny
Log::Dispatch
Parallel::ForkManager
Time::HiRes
CPAN
Digest::SHA
EOF
# Install each module in the list
for package in `cat /root/list`
do
    cpanm $package
done

4.send_report


#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.163.com';
my $mail_from='xxxx';
my $mail_user='xxxxx';
my $mail_pass='xxxxx';
my $mail_to=['xxxx','xxxx'];
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /tmp/monitormail.log"
        or die "Can't open the debug      file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => 'text/plain; charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed => '0',
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };

    $sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}



# Do whatever you want here

exit 0;

mutt must be installed first; how to install it is not covered here.

 


1. Check the logs

Check the failover log to make sure the failover completed normally.

cat /usr/local/mha/ha1/manager.log
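The full log is long, so it can help to filter it down to the phase markers, the chosen master and any errors. A minimal sketch: mha_log_summary is a hypothetical helper and the pattern is based on the log lines shown in this article.

```shell
#!/bin/bash
# Hypothetical helper: print only the phase headers, the "New master is" line
# and any [error] lines from an MHA manager log.
mha_log_summary() {
    grep -E 'Phase [0-9.]+:|New master is|\[error\]' "$1"
}

# Example: mha_log_summary /usr/local/mha/ha1/manager.log
```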

2. Check the overall replication status

masterha_check_repl --conf=/usr/local/mha/ha1/ha1.cnf 

[root@monitor ha1]# masterha_check_repl --conf=/usr/local/mha/ha1/ha1.cnf
Thu Aug 25 16:09:19 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Aug 25 16:09:19 2016 - [info] Reading application default configurations from /usr/local/mha/ha1/ha1.cnf..
Thu Aug 25 16:09:19 2016 - [info] Reading server configurations from /usr/local/mha/ha1/ha1.cnf..
Thu Aug 25 16:09:19 2016 - [info] MHA::MasterMonitor version 0.55.
Thu Aug 25 16:09:20 2016 - [info] Dead Servers:
Thu Aug 25 16:09:20 2016 - [info] Alive Servers:
Thu Aug 25 16:09:20 2016 - [info] 192.168.137.10(192.168.137.10:3306)
Thu Aug 25 16:09:20 2016 - [info] 192.168.137.20(192.168.137.20:3306)
Thu Aug 25 16:09:20 2016 - [info] 192.168.137.30(192.168.137.30:3306)
Thu Aug 25 16:09:20 2016 - [info] Alive Slaves:
Thu Aug 25 16:09:20 2016 - [info] 192.168.137.20(192.168.137.20:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Thu Aug 25 16:09:20 2016 - [info] Replicating from 192.168.137.10(192.168.137.10:3306)
Thu Aug 25 16:09:20 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Aug 25 16:09:20 2016 - [info] 192.168.137.30(192.168.137.30:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Thu Aug 25 16:09:20 2016 - [info] Replicating from 192.168.137.10(192.168.137.10:3306)
Thu Aug 25 16:09:20 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Aug 25 16:09:20 2016 - [info] Current Alive Master: 192.168.137.10(192.168.137.10:3306)
Thu Aug 25 16:09:20 2016 - [info] Checking slave configurations..
Thu Aug 25 16:09:20 2016 - [info] read_only=1 is not set on slave 192.168.137.20(192.168.137.20:3306).
Thu Aug 25 16:09:20 2016 - [info] Checking replication filtering settings..
Thu Aug 25 16:09:20 2016 - [info] binlog_do_db= , binlog_ignore_db=
Thu Aug 25 16:09:20 2016 - [info] Replication filtering check ok.
Thu Aug 25 16:09:20 2016 - [info] Starting SSH connection tests..
Thu Aug 25 16:09:25 2016 - [info] All SSH connection tests passed successfully.
Thu Aug 25 16:09:25 2016 - [info] Checking MHA Node version..
Thu Aug 25 16:09:26 2016 - [info] Version check ok.
Thu Aug 25 16:09:26 2016 - [info] Checking SSH publickey authentication settings on the current master..
Thu Aug 25 16:09:27 2016 - [info] HealthCheck: SSH to 192.168.137.10 is reachable.
Thu Aug 25 16:09:29 2016 - [info] Master MHA Node version is 0.54.
Thu Aug 25 16:09:29 2016 - [info] Checking recovery script configurations on the current master..
Thu Aug 25 16:09:29 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/log --output_file=/tmp/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000138
Thu Aug 25 16:09:29 2016 - [info] Connecting to root@192.168.137.10(192.168.137.10)..
  Creating /tmp if not exists.. ok.
  Checking output directory is accessible or not.. ok.
  Binlog found at /mysql/log, up to mysql-bin.000138
Thu Aug 25 16:09:30 2016 - [info] Master setting check done.
Thu Aug 25 16:09:30 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Aug 25 16:09:30 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.137.20 --slave_ip=192.168.137.20 --slave_port=3306 --workdir=/tmp --target_version=5.6.15-log --manager_version=0.55 --relay_log_info=/mysql/data/relay-log.info --relay_dir=/mysql/data/ --slave_pass=xxx
Thu Aug 25 16:09:30 2016 - [info] Connecting to root@192.168.137.20(192.168.137.20:22)..
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to mysql-relay-bin.000006
    Temporary relay log file is /mysql/data/mysql-relay-bin.000006
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Aug 25 16:09:31 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.137.30 --slave_ip=192.168.137.30 --slave_port=3306 --workdir=/tmp --target_version=5.6.15-log --manager_version=0.55 --relay_log_info=/mysql/data/relay-log.info --relay_dir=/mysql/data/ --slave_pass=xxx
Thu Aug 25 16:09:31 2016 - [info] Connecting to root@192.168.137.30(192.168.137.30:22)..
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to mysql-relay-bin.000002
    Temporary relay log file is /mysql/data/mysql-relay-bin.000002
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Aug 25 16:09:32 2016 - [info] Slaves settings check done.
Thu Aug 25 16:09:32 2016 - [info]
192.168.137.10 (current master)
 +--192.168.137.20
 +--192.168.137.30

Thu Aug 25 16:09:32 2016 - [info] Checking replication health on 192.168.137.20..
Thu Aug 25 16:09:32 2016 - [info] ok.
Thu Aug 25 16:09:32 2016 - [info] Checking replication health on 192.168.137.30..
Thu Aug 25 16:09:32 2016 - [info] ok.
Thu Aug 25 16:09:32 2016 - [info] Checking master_ip_failover_script status:
Thu Aug 25 16:09:32 2016 - [info] /usr/local/mha/ha1/fail_script/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.137.10 --orig_master_ip=192.168.137.10 --orig_master_port=3306

IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.137.50/24===

Checking the Status of the script.. OK
Thu Aug 25 16:09:32 2016 - [info] OK.
Thu Aug 25 16:09:32 2016 - [warning] shutdown_script is not defined.
Thu Aug 25 16:09:32 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

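--ignore_fail_on_start: by default MHA cannot start when a slave node is down. With --ignore_fail_on_start, MHA can start even if a node is down; the parameter makes MHA ignore servers that have ignore_fail=1 set in the configuration file.

1. Automatic failover

My setup uses asynchronous replication, and 137.20 is the current master. Run concurrent inserts on 137.20 while stopping the IO threads on 137.10 and 137.30, and load-test 137.20 for a while. Then start the IO thread on 137.30 first and start the one on 137.10 a little later. This ensures that 137.30 has received more of the binlog than the candidate 137.10.

master: 137.20 (pos=22497564)

candidate slave: 137.10 (pos=9857376)

latest relay slave: 137.30 (pos=22461852)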
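The size of each gap MHA has to fill follows directly from these three positions, all of which are in the same binlog file, mysql-bin.000074, in this test. The variable names in this sketch are illustrative only.

```shell
#!/bin/bash
# Binlog positions from the test above (byte offsets in mysql-bin.000074).
master_pos=22497564      # end of the dead master's binlog
candidate_pos=9857376    # read position of the candidate slave 137.10
latest_pos=22461852      # read position of the latest slave 137.30

# Bytes the candidate must take from the latest slave's relay log.
relay_gap=$(( latest_pos - candidate_pos ))
# Bytes that exist only in the dead master's binlog.
master_gap=$(( master_pos - latest_pos ))

echo "from latest slave relay log: $relay_gap bytes"
echo "from dead master binlog:     $master_gap bytes"
```

These are byte offsets, not event counts. The first gap is what relay_from_read_to_latest_ has to cover and the second is what saved_master_binlog_ has to cover.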

Fri Aug 26 11:57:36 2016 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)
Fri Aug 26 11:57:36 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/log --output_file=/tmp/save_binary_logs_test --manager_version=0.55 --binlog_prefix=mysql-bin
Fri Aug 26 11:57:36 2016 - [info] Executing seconary network check script: /usr/local/mha/bin/masterha_secondary_check -s backup -s master --user=root --master_host=master --master_ip=192.168.137.10 --master_port=3306  --user=root  --master_host=192.168.137.20  --master_ip=192.168.137.20  --master_port=3306
Fri Aug 26 11:57:37 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.137.20' (111))
Fri Aug 26 11:57:37 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 26 11:57:38 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.137.20' (111))
Fri Aug 26 11:57:38 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 26 11:57:38 2016 - [info] HealthCheck: SSH to 192.168.137.20 is reachable.
Monitoring server backup is reachable, Master is not reachable from backup. OK.
Fri Aug 26 11:57:39 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.137.20' (111))
Fri Aug 26 11:57:39 2016 - [warning] Connection failed 3 time(s)..
Monitoring server master is reachable, Master is not reachable from master. OK.
Fri Aug 26 11:57:41 2016 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Aug 26 11:57:41 2016 - [warning] Master is not reachable from health checker!
Fri Aug 26 11:57:41 2016 - [warning] Master 192.168.137.20(192.168.137.20:3306) is not reachable!
Fri Aug 26 11:57:41 2016 - [warning] SSH is reachable.
Fri Aug 26 11:57:41 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /usr/local/mha/ha1/ha1.cnf again, and trying to connect to all servers to check server status..
Fri Aug 26 11:57:41 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Aug 26 11:57:41 2016 - [info] Reading application default configurations from /usr/local/mha/ha1/ha1.cnf..
Fri Aug 26 11:57:41 2016 - [info] Reading server configurations from /usr/local/mha/ha1/ha1.cnf..
Fri Aug 26 11:57:42 2016 - [info] Dead Servers:
Fri Aug 26 11:57:42 2016 - [info]   192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:42 2016 - [info] Alive Servers:
Fri Aug 26 11:57:42 2016 - [info]   192.168.137.10(192.168.137.10:3306)
Fri Aug 26 11:57:42 2016 - [info]   192.168.137.30(192.168.137.30:3306)
Fri Aug 26 11:57:42 2016 - [info] Alive Slaves:
Fri Aug 26 11:57:42 2016 - [info]   192.168.137.10(192.168.137.10:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:42 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:42 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 11:57:42 2016 - [info]   192.168.137.30(192.168.137.30:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:42 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:42 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug 26 11:57:42 2016 - [info] Checking slave configurations..
Fri Aug 26 11:57:42 2016 - [info]  read_only=1 is not set on slave 192.168.137.10(192.168.137.10:3306).
Fri Aug 26 11:57:42 2016 - [info] Checking replication filtering settings..
Fri Aug 26 11:57:42 2016 - [info]  Replication filtering check ok.
Fri Aug 26 11:57:42 2016 - [info] Master is down!
Fri Aug 26 11:57:42 2016 - [info] Terminating monitoring script.
Fri Aug 26 11:57:42 2016 - [info] Got exit code 20 (Master dead).
Fri Aug 26 11:57:42 2016 - [info] MHA::MasterFailover version 0.55.
Fri Aug 26 11:57:42 2016 - [info] Starting master failover.
Fri Aug 26 11:57:42 2016 - [info] 
Fri Aug 26 11:57:42 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 26 11:57:42 2016 - [info] 
Fri Aug 26 11:57:44 2016 - [info] Dead Servers:
Fri Aug 26 11:57:44 2016 - [info]   192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:44 2016 - [info] Checking master reachability via mysql(double check)..
Fri Aug 26 11:57:44 2016 - [info]  ok.
Fri Aug 26 11:57:44 2016 - [info] Alive Servers:
Fri Aug 26 11:57:44 2016 - [info]   192.168.137.10(192.168.137.10:3306)
Fri Aug 26 11:57:44 2016 - [info]   192.168.137.30(192.168.137.30:3306)
Fri Aug 26 11:57:44 2016 - [info] Alive Slaves:
Fri Aug 26 11:57:44 2016 - [info]   192.168.137.10(192.168.137.10:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:44 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:44 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 11:57:44 2016 - [info]   192.168.137.30(192.168.137.30:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:44 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:44 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug 26 11:57:44 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 26 11:57:44 2016 - [info] 
Fri Aug 26 11:57:44 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 26 11:57:44 2016 - [info] 
Fri Aug 26 11:57:44 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 26 11:57:44 2016 - [info] Executing master IP deactivatation script:
Fri Aug 26 11:57:44 2016 - [info]   /usr/local/mha/ha1/fail_script/master_ip_failover --orig_master_host=192.168.137.20 --orig_master_ip=192.168.137.20 --orig_master_port=3306 --command=stopssh --ssh_user=root  


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.137.50/24===

Disabling the VIP on old master: 192.168.137.20 
Fri Aug 26 11:57:45 2016 - [info]  done.
Fri Aug 26 11:57:45 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 26 11:57:45 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 26 11:57:45 2016 - [info] 
Fri Aug 26 11:57:45 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 26 11:57:45 2016 - [info] 
Fri Aug 26 11:57:45 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 26 11:57:45 2016 - [info] 
Fri Aug 26 11:57:45 2016 - [info] The latest binary log file/position on all slaves is mysql-bin.000074:22461852
Fri Aug 26 11:57:45 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 26 11:57:45 2016 - [info]   192.168.137.30(192.168.137.30:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:45 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:45 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug 26 11:57:45 2016 - [info] The oldest binary log file/position on all slaves is mysql-bin.000074:9857376
Fri Aug 26 11:57:45 2016 - [info] Oldest slaves:
Fri Aug 26 11:57:45 2016 - [info]   192.168.137.10(192.168.137.10:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:45 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:45 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 11:57:45 2016 - [info] 
Fri Aug 26 11:57:45 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Aug 26 11:57:45 2016 - [info] 
Fri Aug 26 11:57:46 2016 - [info] Fetching dead master's binary logs..
Fri Aug 26 11:57:46 2016 - [info] Executing command on the dead master 192.168.137.20(192.168.137.20:3306): save_binary_logs --command=save --start_file=mysql-bin.000074  --start_pos=22461852 --binlog_dir=/mysql/log --output_file=/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
  Creating /tmp if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000074 pos 22461852 to mysql-bin.000074 EOF into /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog ..
  Dumping binlog format description event, from position 0 to 120.. ok.
  Dumping effective binlog data from /mysql/log/mysql-bin.000074 position 22461852 to tail(22497564).. ok.
 Concat succeeded.
Fri Aug 26 11:57:49 2016 - [info] scp from root@192.168.137.20:/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog to local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog succeeded.
Fri Aug 26 11:57:52 2016 - [info] HealthCheck: SSH to 192.168.137.10 is reachable.
Fri Aug 26 11:57:55 2016 - [info] HealthCheck: SSH to 192.168.137.30 is reachable.
Fri Aug 26 11:57:55 2016 - [info] 
Fri Aug 26 11:57:55 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 26 11:57:55 2016 - [info] 
Fri Aug 26 11:57:55 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Aug 26 11:57:55 2016 - [info] Checking whether 192.168.137.30 has relay logs from the oldest position..
Fri Aug 26 11:57:55 2016 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000074 --latest_rmlp=22461852 --target_mlf=mysql-bin.000074 --target_rmlp=9857376 --server_id=30 --workdir=/tmp --timestamp=20160826115742 --manager_version=0.55 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  :
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to mysql-relay-bin.000003
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:mysql-relay-bin.000003, start_pos:9857539.
Target relay log FOUND!
Fri Aug 26 11:57:56 2016 - [info] OK. 192.168.137.30 has all relay logs.
Fri Aug 26 11:57:56 2016 - [info] Searching new master from slaves..
Fri Aug 26 11:57:56 2016 - [info]  Candidate masters from the configuration file:
Fri Aug 26 11:57:56 2016 - [info]   192.168.137.10(192.168.137.10:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:56 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:56 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug 26 11:57:56 2016 - [info]  Non-candidate masters:
Fri Aug 26 11:57:56 2016 - [info]   192.168.137.30(192.168.137.30:3306)  Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Fri Aug 26 11:57:56 2016 - [info]     Replicating from 192.168.137.20(192.168.137.20:3306)
Fri Aug 26 11:57:56 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug 26 11:57:56 2016 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Fri Aug 26 11:57:56 2016 - [info]   Not found.
Fri Aug 26 11:57:56 2016 - [info]  Searching from all candidate_master slaves..
Fri Aug 26 11:57:56 2016 - [info] New master is 192.168.137.10(192.168.137.10:3306)
Fri Aug 26 11:57:56 2016 - [info] Starting master failover..
Fri Aug 26 11:57:56 2016 - [info] 
From:
192.168.137.20 (current master)
 +--192.168.137.10
 +--192.168.137.30

To:
192.168.137.10 (new master)
 +--192.168.137.30
Fri Aug 26 11:57:56 2016 - [info] 
Fri Aug 26 11:57:56 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Aug 26 11:57:56 2016 - [info] 
Fri Aug 26 11:57:56 2016 - [info] Server 192.168.137.10 received relay logs up to: mysql-bin.000074:9857376
Fri Aug 26 11:57:56 2016 - [info] Need to get diffs from the latest slave(192.168.137.30) up to: mysql-bin.000074:22461852 (using the latest slave's relay logs)
Fri Aug 26 11:57:56 2016 - [info] Connecting to the latest slave host 192.168.137.30, generating diff relay log files..
Fri Aug 26 11:57:56 2016 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.137.10 --latest_mlf=mysql-bin.000074 --latest_rmlp=22461852 --target_mlf=mysql-bin.000074 --target_rmlp=9857376 --server_id=30 --diff_file_readtolatest=/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog --workdir=/tmp --timestamp=20160826115742 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/ 
Fri Aug 26 11:58:02 2016 - [info] 
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to mysql-relay-bin.000003
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:mysql-relay-bin.000003, start_pos:9857539.
 Concat binary/relay logs from mysql-relay-bin.000003 pos 9857539 to mysql-relay-bin.000003 EOF into /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog ..
  Dumping binlog format description event, from position 0 to 283.. ok.
  Dumping effective binlog data from /mysql/data/mysql-relay-bin.000003 position 9857539 to tail(22462015).. ok.
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog .
 scp slave:/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog to root@192.168.137.10(22) succeeded.
Fri Aug 26 11:58:02 2016 - [info]  Generating diff files succeeded.
Fri Aug 26 11:58:02 2016 - [info] Sending binlog..
Fri Aug 26 11:58:04 2016 - [info] scp from local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog to root@192.168.137.10:/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog succeeded.
Fri Aug 26 11:58:04 2016 - [info] 
Fri Aug 26 11:58:04 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Aug 26 11:58:04 2016 - [info] 
Fri Aug 26 11:58:04 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Aug 26 11:58:04 2016 - [info] Starting recovery on 192.168.137.10(192.168.137.10:3306)..
Fri Aug 26 11:58:04 2016 - [info]  Generating diffs succeeded.
Fri Aug 26 11:58:04 2016 - [info] Waiting until all relay logs are applied.
Fri Aug 26 12:00:06 2016 - [info]  done.
Fri Aug 26 12:00:06 2016 - [info] Getting slave status..
Fri Aug 26 12:00:06 2016 - [info] This slave(192.168.137.10)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000074:9857376). No need to recover from Exec_Master_Log_Pos.
Fri Aug 26 12:00:06 2016 - [info] Connecting to the target slave host 192.168.137.10, running recover script..
Fri Aug 26 12:00:06 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.137.10 --slave_ip=192.168.137.10  --slave_port=3306 --apply_files=/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog,/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog --workdir=/tmp --target_version=5.6.15-log --timestamp=20160826115742 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --slave_pass=xxx
Fri Aug 26 12:04:22 2016 - [info] 
 Concat all apply files to /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog ..
 Copying the first binlog file /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog to /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog.. dumped up to pos 120. ok.
 /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog has effective binlog events from pos 120.
  Dumping effective binlog data from /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog position 120 to tail(35832).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog .
MySQL client version is 5.6.15. Using --binary-mode.
Applying differential binary/relay log files /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog,/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog on 192.168.137.10:3306. This may take long time...
Applying log files succeeded.
Fri Aug 26 12:04:22 2016 - [info]  All relay logs were successfully applied.
Fri Aug 26 12:04:22 2016 - [info] Getting new master's binlog name and position..
Fri Aug 26 12:04:22 2016 - [info]  mysql-bin.000143:22123166
Fri Aug 26 12:04:22 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.137.10', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000143', MASTER_LOG_POS=22123166, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug 26 12:04:22 2016 - [info] Executing master IP activate script:
Fri Aug 26 12:04:22 2016 - [info]   /usr/local/mha/ha1/fail_script/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.137.20 --orig_master_ip=192.168.137.20 --orig_master_port=3306 --new_master_host=192.168.137.10 --new_master_ip=192.168.137.10 --new_master_port=3306 --new_master_user='root' --new_master_password='root'  


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.137.50/24===

Enabling the VIP - 192.168.137.50/24 on the new master - 192.168.137.10 
Fri Aug 26 12:04:25 2016 - [info]  OK.
Fri Aug 26 12:04:25 2016 - [info] ** Finished master recovery successfully.
Fri Aug 26 12:04:25 2016 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug 26 12:04:25 2016 - [info] 
Fri Aug 26 12:04:25 2016 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug 26 12:04:25 2016 - [info] 
Fri Aug 26 12:04:25 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Aug 26 12:04:25 2016 - [info] 
Fri Aug 26 12:04:25 2016 - [info] -- Slave diff file generation on host 192.168.137.30(192.168.137.30:3306) started, pid: 5029. Check tmp log /usr/local/mha/ha1/192.168.137.30_3306_20160826115742.log if it takes time..
Fri Aug 26 12:04:26 2016 - [info] 
Fri Aug 26 12:04:26 2016 - [info] Log messages from 192.168.137.30 ...
Fri Aug 26 12:04:26 2016 - [info] 
Fri Aug 26 12:04:25 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Aug 26 12:04:26 2016 - [info] End of log messages from 192.168.137.30.
Fri Aug 26 12:04:26 2016 - [info] -- 192.168.137.30(192.168.137.30:3306) has the latest relay log events.
Fri Aug 26 12:04:26 2016 - [info] Generating relay diff files from the latest slave succeeded.
Fri Aug 26 12:04:26 2016 - [info] 
Fri Aug 26 12:04:26 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Aug 26 12:04:26 2016 - [info] 
Fri Aug 26 12:04:26 2016 - [info] -- Slave recovery on host 192.168.137.30(192.168.137.30:3306) started, pid: 5031. Check tmp log /usr/local/mha/ha1/192.168.137.30_3306_20160826115742.log if it takes time..
Fri Aug 26 12:04:32 2016 - [info] 
Fri Aug 26 12:04:32 2016 - [info] Log messages from 192.168.137.30 ...
Fri Aug 26 12:04:32 2016 - [info] 
Fri Aug 26 12:04:26 2016 - [info] Sending binlog..
Fri Aug 26 12:04:28 2016 - [info] scp from local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog to root@192.168.137.30:/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog succeeded.
Fri Aug 26 12:04:28 2016 - [info] Starting recovery on 192.168.137.30(192.168.137.30:3306)..
Fri Aug 26 12:04:28 2016 - [info]  Generating diffs succeeded.
Fri Aug 26 12:04:28 2016 - [info] Waiting until all relay logs are applied.
Fri Aug 26 12:04:28 2016 - [info]  done.
Fri Aug 26 12:04:28 2016 - [info] Getting slave status..
Fri Aug 26 12:04:28 2016 - [info] This slave(192.168.137.30)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000074:22461852). No need to recover from Exec_Master_Log_Pos.
Fri Aug 26 12:04:28 2016 - [info] Connecting to the target slave host 192.168.137.30, running recover script..
Fri Aug 26 12:04:28 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.137.30 --slave_ip=192.168.137.30  --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog --workdir=/tmp --target_version=5.6.15-log --timestamp=20160826115742 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --slave_pass=xxx
Fri Aug 26 12:04:30 2016 - [info] 
MySQL client version is 5.6.15. Using --binary-mode.
Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog on 192.168.137.30:3306. This may take long time...
Applying log files succeeded.
Fri Aug 26 12:04:30 2016 - [info]  All relay logs were successfully applied.
Fri Aug 26 12:04:30 2016 - [info]  Resetting slave 192.168.137.30(192.168.137.30:3306) and starting replication from the new master 192.168.137.10(192.168.137.10:3306)..
Fri Aug 26 12:04:31 2016 - [info]  Executed CHANGE MASTER.
Fri Aug 26 12:04:31 2016 - [info]  Slave started.
Fri Aug 26 12:04:32 2016 - [info] End of log messages from 192.168.137.30.
Fri Aug 26 12:04:32 2016 - [info] -- Slave recovery on host 192.168.137.30(192.168.137.30:3306) succeeded.
Fri Aug 26 12:04:32 2016 - [info] All new slave servers recovered successfully.
Fri Aug 26 12:04:32 2016 - [info] 
Fri Aug 26 12:04:32 2016 - [info] * Phase 5: New master cleanup phase..
Fri Aug 26 12:04:32 2016 - [info] 
Fri Aug 26 12:04:32 2016 - [info] Resetting slave info on the new master..
Fri Aug 26 12:04:32 2016 - [info]  192.168.137.10: Resetting slave info succeeded.
Fri Aug 26 12:04:32 2016 - [info] Master failover to 192.168.137.10(192.168.137.10:3306) completed successfully.
Fri Aug 26 12:04:32 2016 - [info] 

----- Failover Report -----

ha1: MySQL Master failover 192.168.137.20 to 192.168.137.10 succeeded

Master 192.168.137.20 is down!

Check MHA Manager logs at monitor:/usr/local/mha/ha1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.137.20.
The latest slave 192.168.137.30(192.168.137.30:3306) has all relay logs for recovery.
Selected 192.168.137.10 as a new master.
192.168.137.10: OK: Applying all logs succeeded.
192.168.137.10: OK: Activated master IP address.
192.168.137.30: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.137.30: OK: Applying all logs succeeded. Slave started, replicating from 192.168.137.10.
192.168.137.10: Resetting slave info succeeded.
Master failover to 192.168.137.10(192.168.137.10:3306) completed successfully.

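Note: in the log above, some of the key processing steps were marked in red, and the heading of each phase was marked in bold blue to show how many steps there are in total.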
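When reading manager.log directly on the manager host, the phase headings can be filtered out to get the same overview. This is a minimal sketch that assumes the log format shown above; the function name list_phases is hypothetical and is not part of MHA.

```shell
#!/bin/bash
# list_phases LOGFILE
# Print the "Phase N: ..." headings found in an MHA manager log.
# (Hypothetical helper, not shipped with MHA.)
list_phases() {
    # Phase headings look like:
    #   <timestamp> - [info] * Phase 3.3: New Master Diff Log Generation Phase..
    grep -E '\[info\] \* Phase [0-9]' "$1" | sed -E 's/^.*\[info\] \* //'
}
```

For the setup in this article it would be called as: list_phases /usr/local/mha/ha1/manager.log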

The failover steps are as follows:

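1. Failure detection: check whether the dead master can still be reached over MySQL (Ping(SELECT)) and over SSH (the masterha_secondary_check script is called in between) → dead master handling phase

2. Configuration check: the configuration of the whole cluster is checked (identifying the dead server, the candidate masters, and the settings of every server, and verifying that the configuration meets the requirements) → dead master handling phase

3. Handling the crashed master: remove the virtual IP and power off the host (no shutdown script is configured here for now) → dead master handling phase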

/usr/local/mha/ha1/fail_script/master_ip_failover --orig_master_host=192.168.137.20 --orig_master_ip=192.168.137.20 --orig_master_port=3306 --command=stopssh --ssh_user=root 

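4. Find the slave that holds the latest relay log (and also the position of the slave with the oldest binlog), and check whether each of them is a candidate slave → new master recovery phase

5. Save the binlog events by which the dead master (137.20) is ahead of the latest slave (137.30) under /tmp on the dead master (the remote_workdir set in the configuration file). Then check whether this differential binlog (saved_master_binlog_) is valid, that is, whether the dead master and the latest slave actually differ. If they do, the generated differential binlog is copied to the MHA workdir on the manager (137.40) → new master recovery phase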

Fri Aug 26 11:57:46 2016 - [info] Executing command on the dead master 192.168.137.20(192.168.137.20:3306): save_binary_logs --command=save --start_file=mysql-bin.000074  --start_pos=22461852 --binlog_dir=/mysql/log --output_file=/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
  Creating /tmp if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000074 pos 22461852 to mysql-bin.000074 EOF into /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog ..
  Dumping binlog format description event, from position 0 to 120.. ok.
  Dumping effective binlog data from /mysql/log/mysql-bin.000074 position 22461852 to tail(22497564).. ok.
 Concat succeeded.
Fri Aug 26 11:57:49 2016 - [info] scp from root@192.168.137.20:/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog to local:/usr/local/mha/ha1/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog succeeded.

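7. Determine the new master, and check whether the relay log of the latest slave (137.30) can be used to recover the other slaves → new master recovery phase

8. Generate the differential relay log between the latest slave (137.30) and the new master (137.10). The diff is generated under /tmp on the slave with the latest relay log. It covers the gap between the two servers' Read_Master_Log_Pos values and is named relay_from_read_to_latest_ followed by the target slave's IP. It is then copied to /tmp on the target slave (the new master). At the same time, the saved_master_binlog_ file saved earlier in the MHA workdir (if it exists) is copied to /tmp on the new master → new master recovery phase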

Fri Aug 26 11:57:56 2016 - [info] Connecting to the latest slave host 192.168.137.30, generating diff relay log files..
Fri Aug 26 11:57:56 2016 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.137.10 --latest_mlf=mysql-bin.000074 --latest_rmlp=22461852 --target_mlf=mysql-bin.000074 --target_rmlp=9857376 --server_id=30 --diff_file_readtolatest=/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog --workdir=/tmp --timestamp=20160826115742 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/ 
Fri Aug 26 11:58:02 2016 - [info] 
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to mysql-relay-bin.000003
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:mysql-relay-bin.000003, start_pos:9857539.
 Concat binary/relay logs from mysql-relay-bin.000003 pos 9857539 to mysql-relay-bin.000003 EOF into /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog ..
  Dumping binlog format description event, from position 0 to 283.. ok.
  Dumping effective binlog data from /mysql/data/mysql-relay-bin.000003 position 9857539 to tail(22462015).. ok.
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog .
 scp slave:/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog to root@192.168.137.10(22) succeeded.

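9. The new master applies the differential logs. MHA first checks whether this slave's Read_Master_Log_Pos and Exec_Master_Log_Pos are equal. Replication here is asynchronous rather than semi-synchronous, so events the slave has read are not necessarily applied yet, and if the master fails in the meantime the two positions can differ. If they are not equal, the save_binary_logs command is run on this slave to save the relay log between the two positions, named relay_from_exec_to_read_ followed by the slave's own IP. MHA then applies the differential logs relay_from_read_to_latest_, saved_master_binlog_, and relay_from_exec_to_read_, merging the contents of these three files into one new binlog file, total_binlog_for_ → new master recovery phase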

Fri Aug 26 12:00:06 2016 - [info] This slave(192.168.137.10)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000074:9857376). No need to recover from Exec_Master_Log_Pos.
Fri Aug 26 12:00:06 2016 - [info] Connecting to the target slave host 192.168.137.10, running recover script..
Fri Aug 26 12:00:06 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.137.10 --slave_ip=192.168.137.10  --slave_port=3306 --apply_files=/tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog,/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog --workdir=/tmp --target_version=5.6.15-log --timestamp=20160826115742 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --slave_pass=xxx
Fri Aug 26 12:04:22 2016 - [info] 
 Concat all apply files to /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog ..
 Copying the first binlog file /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog to /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog.. dumped up to pos 120. ok.
 /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog has effective binlog events from pos 120.
  Dumping effective binlog data from /tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog position 120 to tail(35832).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /tmp/total_binlog_for_192.168.137.10_3306.20160826115742.binlog .
MySQL client version is 5.6.15. Using --binary-mode.
Applying differential binary/relay log files /tmp/relay_from_read_to_latest_192.168.137.10_3306_20160826115742.binlog,/tmp/saved_master_binlog_from_192.168.137.20_3306_20160826115742.binlog on 192.168.137.10:3306. This may take long time...
Applying log files succeeded.
Fri Aug 26 12:04:22 2016 - [info]  All relay logs were successfully applied.

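10. Generate the CHANGE MASTER statement that the other slaves will use to point at the new master, and run master_ip_failover to complete the switchover and bring up the VIP → new master recovery phase

11. The other slaves repeat the steps performed for the new master (steps 8-9; here, for example, saved_master_binlog_ is copied from the MHA workdir to the latest slave (137.30), which then applies the differential log) → other slave recovery phase

12. The other slaves run CHANGE MASTER to the new master → other slave recovery phase

13. Generate the failover report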

 

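Note: "relay log" here refers to the master binlog position that the slave has already read (that is, the latest master binlog position saved in the slave's relay log files). In SHOW SLAVE STATUS \G this is Read_Master_Log_Pos, not Exec_Master_Log_Pos. So the slave with the latest relay log does not necessarily have the most up-to-date data (although this is fairly rare); it only means that the master binlog it has saved is the latest.

MHA recovers the new master first, whether or not it is the latest slave (so it is naturally best when the candidate slave is also the latest slave, because it can then be recovered quickly), and then recovers the other slaves.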
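The gap between what a slave has read and what it has executed can be checked by hand. The sketch below parses the text of SHOW SLAVE STATUS\G and prints the difference between Read_Master_Log_Pos and Exec_Master_Log_Pos in bytes. The function name slave_lag_bytes is hypothetical; the four field names are standard SHOW SLAVE STATUS columns.

```shell
#!/bin/bash
# slave_lag_bytes
# Read "SHOW SLAVE STATUS\G" output on stdin and print how many bytes of the
# master binlog have been read (I/O thread) but not yet executed (SQL thread).
# Prints "different-file" when the two threads are on different binlog files,
# because the positions are then not directly comparable.
# (Hypothetical helper, not part of MHA or MySQL.)
slave_lag_bytes() {
    awk -F': *' '
        /^ *Master_Log_File:/       { read_file = $2 }
        /^ *Relay_Master_Log_File:/ { exec_file = $2 }
        /^ *Read_Master_Log_Pos:/   { read_pos  = $2 }
        /^ *Exec_Master_Log_Pos:/   { exec_pos  = $2 }
        END {
            if (read_file != exec_file)
                print "different-file"
            else
                printf "%d\n", read_pos - exec_pos
        }'
}
```

A typical call would be: mysql -e 'SHOW SLAVE STATUS\G' | slave_lag_bytes. A result of 0 corresponds to the "Exec_Master_Log_Pos equals to Read_Master_Log_Pos" message in the log above.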

5. Stop MHA

masterha_stop --conf=/usr/local/mha/ha1/ha1.cnf

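4. Delete the failover file (optional)

Because MHA is started with the --ignore_last_failover option, it can start even if the file generated by the failover is not deleted. Otherwise the file "ha1.failover.complete" generated by the failover must be deleted.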

rm -f /usr/local/mha/ha1/ha1.failover.complete

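3. Edit the ha1.cnf configuration file

Change master_host, master_ip, and master_port in the "secondary_check_script" option to point at the new master. If the two machines are configured identically, nothing else needs to change.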
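The edit can also be scripted. The sketch below rewrites only the three --master_* values on the secondary_check_script line and keeps a .bak copy of the file. It assumes the option sits on a single line in the form shown in the manager log; the function name set_secondary_check_master is hypothetical.

```shell
#!/bin/bash
# set_secondary_check_master CNF HOST IP PORT
# Point the --master_host/--master_ip/--master_port arguments of the
# secondary_check_script line in CNF at a new master. A backup is kept as CNF.bak.
# (Hypothetical helper; assumes the option is written on one line.)
set_secondary_check_master() {
    local cnf=$1 host=$2 ip=$3 port=$4
    sed -i.bak -E \
        -e "/^[[:space:]]*secondary_check_script/ s/--master_host=[^ ]+/--master_host=${host}/" \
        -e "/^[[:space:]]*secondary_check_script/ s/--master_ip=[^ ]+/--master_ip=${ip}/" \
        -e "/^[[:space:]]*secondary_check_script/ s/--master_port=[^ ]+/--master_port=${port}/" \
        "$cnf"
}
```

For example, with the host name given here only as an illustration: set_secondary_check_master /usr/local/mha/ha1/ha1.cnf backup 192.168.137.10 3306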

4. Start MHA

nohup masterha_manager --conf=/usr/local/mha/ha1/ha1.cnf --ignore_fail_on_start --ignore_last_failover < /dev/null > /usr/local/mha/ha1/start.log 2>&1 &

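--remove_dead_master_conf: after a master failover, the old master's IP is removed from the configuration file. This option is not used here for now, because using it was found to mess up the ha1.cnf configuration file.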

start.log: the log file that receives the manager's output through the redirection in the command above.

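--ignore_last_failover: after a master failover, the MHA manager service stops automatically and creates the file app1.failover.complete in the manager_workdir directory (ha1.failover.complete in this setup). Normally this file must be deleted before MHA can be started again. This option tells MHA to ignore the file left by the previous failover, so --ignore_last_failover is set here.
By default, if MHA detects consecutive master failures and the interval between two failures is less than 8 hours, it does not perform a failover. This restriction exists to avoid a ping-pong effect.

--ignore_fail_on_start: by default MHA cannot start when a slave node is down. With --ignore_fail_on_start, MHA can start even if a node is down; the option causes servers configured with ignore_fail=1 in the configuration file to be ignored.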
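Before starting the manager without --ignore_last_failover, it can be useful to know whether the failover marker file would still block a failover. The sketch below assumes that recency is judged from the file's modification time and uses the 8-hour default described above; the function name failover_too_recent is hypothetical.

```shell
#!/bin/bash
# failover_too_recent FILE [HOURS]
# Exit 0 if FILE exists and was modified less than HOURS hours ago (default 8),
# exit 1 otherwise.
# (Hypothetical helper; assumes recency is judged from the file's mtime.)
failover_too_recent() {
    local file=$1 hours=${2:-8}
    [ -e "$file" ] || return 1
    # find prints the path only when the file is newer than the cutoff.
    [ -n "$(find "$file" -mmin "-$((hours * 60))" 2>/dev/null)" ]
}
```

Used, for example, as: failover_too_recent /usr/local/mha/ha1/ha1.failover.complete && echo "last failover is less than 8 hours old"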

 

(1) Check again that the MHA status is normal:

[root@monitor ha1]# masterha_check_status --conf=/usr/local/mha/ha1/ha1.cnf
ha1 (pid:6371) is running(0:PING_OK), master:192.168.137.10
[root@monitor ha1]# 
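For use in scripts, the current master can be pulled out of that status line. This is a minimal sketch that assumes the output format shown above; the function name current_master is hypothetical.

```shell
#!/bin/bash
# current_master
# Read masterha_check_status output on stdin and print the master address
# from a "... is running(0:PING_OK), master:<address>" line.
# Prints nothing if no such line is present.
# (Hypothetical helper, not part of MHA.)
current_master() {
    sed -n -E 's/^.* is running\(0:PING_OK\), master:([^[:space:]]+).*$/\1/p'
}
```

A typical call would be: masterha_check_status --conf=/usr/local/mha/ha1/ha1.cnf | current_master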

(2) View the startup log

 cat manager.log 

Thu Aug 25 17:11:50 2016 - [info] 
192.168.137.10 (current master)
 +--192.168.137.20
 +--192.168.137.30

Thu Aug 25 17:11:50 2016 - [info] Checking master_ip_failover_script status:
Thu Aug 25 17:11:50 2016 - [info]   /usr/local/mha/ha1/fail_script/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.137.10 --orig_master_ip=192.168.137.10 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.137.50/24===

Checking the Status of the script.. OK 
Thu Aug 25 17:11:50 2016 - [info]  OK.
Thu Aug 25 17:11:50 2016 - [warning] shutdown_script is not defined.
Thu Aug 25 17:11:50 2016 - [info] Set master ping interval 1 seconds.
Thu Aug 25 17:11:50 2016 - [info] Set secondary check script: /usr/local/mha/bin/masterha_secondary_check -s backup -s master --user=root --master_host=master --master_ip=192.168.137.10 --master_port=3306
Thu Aug 25 17:11:50 2016 - [info] Starting ping health check on 192.168.137.10(192.168.137.10:3306)..
Thu Aug 25 17:11:50 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
[root@monitor ha1]# 

(3) Files produced

ha1.master_status.health: created while MHA is running normally

manager.log: the MHA monitoring log

start.log: the log generated when MHA starts
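A quick way to confirm that these three files are in place is to check the manager workdir. The sketch below only tests for existence; the function name check_mha_files is hypothetical, and the file names are the ones listed above for the ha1 application.

```shell
#!/bin/bash
# check_mha_files DIR
# Report, one line per file, whether the files listed above exist in DIR.
# (Hypothetical helper; file names are specific to the "ha1" application.)
check_mha_files() {
    local dir=$1 f
    for f in ha1.master_status.health manager.log start.log; do
        if [ -e "$dir/$f" ]; then
            echo "present $f"
        else
            echo "missing $f"
        fi
    done
}
```

For this setup: check_mha_files /usr/local/mha/ha1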

2. Install the EPEL repository

Installing with yum requires the EPEL repository.

EPEL repository:

wget http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm

Install on all servers (the manager needs all of the packages below; node hosts only need perl-DBD-MySQL and cpan):

yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes cpan

The modules can also be installed through Perl:

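#!/bin/bash
# Download cpanm and put it on the PATH.
wget http://xrl.us/cpanm --no-check-certificate
mv cpanm /usr/bin
chmod 755 /usr/bin/cpanm
# One module name per line. The "install" prefix is dropped, because the
# loop would otherwise pass the word "install" to cpanm as a module name.
cat > /root/list << EOF
DBD::mysql
Config::Tiny
Log::Dispatch
Parallel::ForkManager
Time::HiRes
CPAN
Digest::SHA
EOF
# Install each module in turn.
while read -r package
do
    cpanm "$package"
done < /root/list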
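After either installation method, it is worth confirming that Perl can actually load the modules. The sketch below tries to load each module with perl -M and prints the ones that fail; the function name missing_perl_modules is hypothetical.

```shell
#!/bin/bash
# missing_perl_modules MODULE...
# Print each named module that perl cannot load; print nothing if all load.
# (Hypothetical helper; a module is also reported if perl itself is missing.)
missing_perl_modules() {
    local m
    for m in "$@"; do
        perl -M"$m" -e 1 >/dev/null 2>&1 || echo "$m"
    done
}
```

For the module list used in this article: missing_perl_modules DBD::mysql Config::Tiny Log::Dispatch Parallel::ForkManager Time::HiRes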