mysql replication monitor

This is not a tutorial about how to set up mysql replication. You can find all the details about how to set up mysql replication in the official mysql documentation. This is just a script that can be used to monitor a MySQL replication setup. A MySQL replication setup consists of a master server and a slave server. On the slave server there are two threads that run continuously, one is the I/O thread that fetches changes that occurred on the master server and one is the SQL thread that tries to run the queries that were executed on the master server.

Any one of those two threads can fail. The I/O thread can fail because a connection timeout, or because the master server has blocked and went down, and the SQL thread can fail because some query that was executed on the master server could not be executed on the slave server. If any of those threads fail the replication will be interrupted and as a server administrator you would want to know when this happens so that you can fix it as soon as possible and resume the replication.

If you just want to see the status of the replication threads on the slave server you can use any mysql client to connect to the server and log in with a user that has "replication client" privileges (root user has them by default but you can add them to any user ) and execute this:

  1. SHOW slave status\G;

This will show you information about the slave server like, the slave threads state ( running or not ), last error and replication position.
But doing manually this each time you want to know if the replication status is not efficient, so here comes my script that does it automatically and notifies you only if thee is a problem. This script will run on the slave server every 30 minutes or whatever you want to set it in cron and will check the slave status. In case any if the two threads is not running the script will send an email message to one or two email addresses.
The script is attached to this post but it requires some configuration before you can run it or set it up in cron.
you will have to modify the first lines of the script to specify what user name and password to use when connecting to the MySQL. server. This user must has "replication client" privileges. It is recommended that you add a user that only has this privilege and use it in this script. You can do that by logging in the MySQL server as root and execute:

  1. GRANT replication client to replc@localhost \
  2. identified BY "your_password";

You can use the method presented in a previous post to generate a random password for the "replc" user.
One other configuration you need to set is the email address(es) where the script will send the notifications and you can also customize the subject and body of the messages.
The script is released under GPL and you can get it from here:
MySQL Replication Monitor-0.2 (7.45 KB)
Update: modified to fix a small bug reported in the comments and through the contact form. Thanks everyone for letting me know about it.

8 thoughts on “mysql replication monitor

  1. I, too, am grateful for this script, but I also found a change that is needed :-) Maybe it is a difference in OS, but I had to change this line:
    rf=$(mktemp)

    to this one:
    rf=$(mktemp /tmp/replcheck.XXXXXXXX)

  2. I added a check seconds behind. An alert will be sent if the slave is 30 seconds or more behind.

    #!/bin/sh
    # replication_monitor.sh – a mysql replication monitor
    # Copyright (C) 2006 Mihai Secasiu http://patchlog.com
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License version 2 as
    # published by the Free Software Foundation
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program; if not, write to the Free Software
    # Foundation, Inc., 59 Temple Place – Suite 330, Boston, MA 02111-1307, USA.
    #

    DB_USER=replc
    DB_PASS=””

    alert_to=””
    alert_cc=””
    alert_failed_subject=”REPLICATION FAILED”
    alert_failed_message=”one of the replication threads on the slave server failed or the server is down\n”;
    alert_slow_subject=”REPLICATION SLOW”
    alert_slow_message=”the slave is behind by “;

    rf=$(mktemp)

    echo “show slave status\G”|\
    mysql -u $DB_USER –password=$DB_PASS > $rf 2>&1

    repl_IO=$(cat $rf|grep “Slave_IO_Running”|cut -f2 -d’:’)
    repl_SQL=$(cat $rf|grep “Slave_SQL_Running”|cut -f2 -d’:’)
    repl_BEHIND=$(cat $rf|grep “Seconds_Behind_Master”|cut -f2 -d’:’)

    # alert down
    if [ “$repl_IO” != ” Yes” -o “$repl_SQL” != ” Yes” ] ; then
    if [ “$alert_cc” != “” ] ; then
    cc=” -c $alert_cc ”
    fi

    cat <<EOF | mail -s "$alert_failed_subject" $alert_to $cc
    $alert_failed_message

    return from slave status command:
    $(cat $rf)
    EOF
    rm $rf
    fi

    # alert slow
    if [ $repl_BEHIND -ge 30 ] ; then
    if [ "$alert_cc" != "" ] ; then
    cc=" -c $alert_cc "
    fi

    cat <<EOF | mail -s "$alert_slow_subject" $alert_to $cc
    $alert_slow_message $repl_BEHIND seconds
    EOF
    fi

      1. When splitting reads to the slave and writes to the master, the slaves could get behind during high traffic times, if there is a bad index or inefficient query. This means that content being displayed by reading from the slave isn’t up to date.

        I have added a lock file so that I don’t get flooded by SMS if the slave does go down.

        #!/bin/sh
        # replication_monitor.sh – a mysql replication monitor
        # Copyright (C) 2006 Mihai Secasiu http://patchlog.com
        #
        # This program is free software; you can redistribute it and/or modify
        # it under the terms of the GNU General Public License version 2 as
        # published by the Free Software Foundation
        #
        # This program is distributed in the hope that it will be useful,
        # but WITHOUT ANY WARRANTY; without even the implied warranty of
        # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
        # GNU General Public License for more details.
        #
        # You should have received a copy of the GNU General Public License
        # along with this program; if not, write to the Free Software
        # Foundation, Inc., 59 Temple Place – Suite 330, Boston, MA 02111-1307, USA.
        #

        DB_USER=replc
        DB_PASS=””

        alert_to=””
        alert_cc=””
        alert_failed_subject=”REPLICATION FAILED”
        alert_failed_message=”one of the replication threads on the slave server failed or the server is down\n”;
        alert_slow_subject=”REPLICATION SLOW”
        alert_slow_message=”the slave is behind by “;

        lockfile=/tmp/replication_monitor.lock
        rf=$(mktemp)

        echo “show slave status\G”|\
        mysql -u $DB_USER –password=$DB_PASS > $rf 2>&1

        repl_IO=$(cat $rf|grep “Slave_IO_Running”|cut -f2 -d’:’)
        repl_SQL=$(cat $rf|grep “Slave_SQL_Running”|cut -f2 -d’:’)
        repl_BEHIND=$(cat $rf|grep “Seconds_Behind_Master”|cut -f2 -d’:’)

        if [ ! -e $lockfile ] ; then

        # alert down
        if [ “$repl_IO” != ” Yes” -o “$repl_SQL” != ” Yes” ] ; then
        if [ “$alert_cc” != “” ] ; then
        cc=” -c $alert_cc ”
        fi

        cat <<EOF | mail -s "$alert_failed_subject" $alert_to $cc
        $alert_failed_message

        return from slave status command:
        $(cat $rf)
        EOF
        rm $rf
        fi

        # alert slow
        if [ $repl_BEHIND -ge 30 ] ; then
        if [ "$alert_cc" != "" ] ; then
        cc=" -c $alert_cc "
        fi

        cat <<EOF | mail -s "$alert_slow_subject" $alert_to $cc
        $alert_slow_message $repl_BEHIND seconds
        EOF
        fi

        touch $lockfile

        fi

        1. thanks for sharing the code!
          However, I think the line
          touch $lockfile
          is misplaced, the script runs exactly one time, with or without detecting an error.
          It should be shifted into both if-sections sending the alerts.

Leave a Reply