Zabbix

官方站点www.zabbix.com
在监控开源软件方面。Zabbix可谓是集大成者,大包大揽,包含了agent,server,proxy,graphic。而且都做的相对不错,给我们这样的懒人运维人员帮助简直不要太大。
而且简易的安装方式,也是他受众多运维喜爱的原因。
zabbix_server
*需要事先支持php环境,博主环境为nginx+mariadb+php
*系统环境CentOS Linux release 7.2.1511 (Core)
*使用最新版本的Zabbix3.0.3
*Zabbix下载地址 www.zabbix.com/download.php


[11:38:34 /usr/local/src]#wget http://iweb.dl.sourceforge.net/project/zabbix/ZABBIX%20Latest%20Stable/3.0.3/zabbix-3.0.3.tar.gz
[11:41:10 /usr/local/src]#tar xf zabbix-3.0.3.tar.gz
[11:41:47 /usr/local/src]#cd zabbix-3.0.3
[11:42:22 /usr/local/src/zabbix-3.0.3]#yum install -y net-snmp-devel net-snmp net-snmp-config      #使zabbix支持net-snmp
[11:44:42 /usr/local/src/zabbix-3.0.3]#./configure --prefix=/usr/local/zabbix \
--enable-server \
--enable-agent \
--with-net-snmp \
--with-libcurl \
--with-libxml2 \
--enable-proxy \
--with-mysql=/usr/local/mariadb/bin/mysql_config
[11:47:34 /usr/local/src/zabbix-3.0.3]#make && make install
[11:54:22 /usr/local/src/zabbix-3.0.3]#cp -a misc/init.d/fedora/core/zabbix_server  /etc/init.d
[11:54:27 /usr/local/src/zabbix-3.0.3]#vi /etc/init.d/zabbix_server     #修改BASEDIR
   # Zabbix-Directory
   BASEDIR=/usr/local/zabbix
[11:55:15 /usr/local/src/zabbix-3.0.3]#vi /etc/services                 #为系统添加解析服务
zabbix-agent    10050/tcp               # Zabbix Agent
zabbix-agent    10050/udp               # Zabbix Agent
zabbix-trapper  10051/tcp               # Zabbix Trapper
zabbix-trapper  10051/udp               # Zabbix Trapper
[11:57:45 /usr/local/src/zabbix-3.0.3]#vi /usr/local/zabbix/etc/zabbix_server.conf   #配置zabbix_server
DBHost=localhost               #有时配置为localhost,zabbix_server无法连接数据库,配置为127.0.0.1即可,保留疑问
DBName=zabbix
DBUser=zabbix
DBPassword=zabbix
DBSocket=/data/mariadb/mariadb.sock
ListenIP=192.168.0.100         #配置监听的端口
LogFile=/var/log/zabbix/zabbix_server.log #日志路径
PidFile=/var/log/zabbix/zabbix_server.pid
[12:00:31 /usr/local/src/zabbix-3.0.3]#useradd -r zabbix                             #添加系统用户
[12:00:37 /usr/local/src/zabbix-3.0.3]#mkdir -p /var/log/zabbix
[12:00:41 /usr/local/src/zabbix-3.0.3]#chown -R zabbix.zabbix /var/log/zabbix
[12:02:45 /usr/local/src/zabbix-3.0.3]#mysql
grant all privileges on zabbix.* to zabbix@'localhost' identified by 'zabbix';       #授权mariadb用户
flush privileges;
[12:02:45 /usr/local/src/zabbix-3.0.3]#mysql -u zabbix -pzabbix zabbix < database/mysql/schema.sql
[12:03:48 /usr/local/src/zabbix-3.0.3]#mysql -u zabbix -pzabbix zabbix < database/mysql/images.sql 
[12:04:02 /usr/local/src/zabbix-3.0.3]#mysql -u zabbix -pzabbix zabbix < database/mysql/data.sql
[12:04:11 /usr/local/src/zabbix-3.0.3]#service zabbix_server start                   #启动zabbix_server服务
[12:05:46 /usr/local/src/zabbix-3.0.3]#tail -f /var/log/zabbix/zabbix_server.log     #查看zabbix_server启动后,生产日志,排除故障
[12:05:59 /usr/local/src/zabbix-3.0.3]#ps aux|grep zabbix_server                     #检查进程,服务已经正常启动
zabbix    4685  0.0  0.1 143872  1688 ?        S    12:20   0:00 /usr/local/zabbix/sbin/zabbix_server: housekeeper [startup idle for 30 minutes]
zabbix    4686  0.0  0.2 143864  2184 ?        S    12:20   0:00 /usr/local/zabbix/sbin/zabbix_server: timer #1 [processed 0 triggers, 0 events in 0.000020 sec, 0 maintenances in 0.000000 sec, idle 30 sec]
zabbix    4689  0.0  0.1 143864  2000 ?        S    12:20   0:00 /usr/local/zabbix/sbin/zabbix_server: http poller #1 [got 0 values in 0.000630 sec, idle 5 sec]
zabbix    4690  0.0  0.4 248332  4244 ?        S    12:20   0:00 /usr/local/zabbix/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.000555 sec, idle 60 sec]
zabbix    4691  0.0  0.1 143872  1904 ?        S    12:20   0:00 /usr/local/zabbix/sbin/zabbix_server: history syncer #1 [synced 0 items in 0.000002 sec, idle 1 sec]
zabbix    4693  0.0  0.1 143872  1904 ?        S    12:20   0:00 /usr/local/zabbix/sbin/zabbix_server: history syncer #2 [synced 0 items in 0.000001 sec, idle 1 sec]
zabbix    4694  0.0  0.1 143872  1904 ?        S    12:20   0:00 /usr/local/zabbix/sbin/zabbix_server: history syncer #3 [synced 0 items in 0.000001 sec, idle 1 sec]
zabbix    4695  0.0  0.1 143872  1904 ?        S    12:20   0:00 /usr/local/zabbix/sbin/zabbix_server: history syncer #4 [synced 0 items in 0.000001 sec, idle 1 sec]
zabbix    4696  0.0  0.1 143868  1992 ?        S    12:20   0:00 /usr/local/zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.000187 sec, idle 3 sec]
zabbix    4697  0.0  0.1 143872  1904 ?        S    12:20   0:00 /usr/local/zabbix/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000002 sec, idle 5 sec]
[12:08:12 /usr/local/src/zabbix-3.0.3]#cp -a frontends/php /www/zabbix        #配置php前段
[12:10:03 /usr/local/src/zabbix-3.0.3]#chown -R www.www /www/zabbix           #授权访问
[12:13:43 /usr/local/src/zabbix-3.0.3]#semanage fcontext -a -t unconfined_t "/www/zabbix(/.*)?"  #在selinux开启的情况下你可能还需要这样的操作,由于nginx为编译安装所以句柄为unconfined_t,请使用ps auxZ查看WEB服务进程。
[12:18:21 /usr/local/src/zabbix-3.0.3]#vi /usr/local/php/etc/php.ini          #配置php支持zabbix的需求
max_execution_time = 300
max_input_time = 300
always_populate_raw_post_data = -1
[12:18:21 /usr/local/src/zabbix-3.0.3]#service php-fom restart                #重启php-fpm生效配置

Zabbix的WEB页面配置就不再详细讲述。(下一步,下一步。)
*温馨提醒,zabbix的默认用户为admin,密码为zabbix。
自此zabbix_server就已经安装成功。
zabbix_agent
接下来zabbix_agent的安装。基本上步骤和server相同,安装就不再多叙述

[13:31:21 /usr/local/src/zabbix-3.0.3]#cp -a  misc/init.d/fedora/core/zabbix_agentd /etc/init.d
[13:33:27 /usr/local/src/zabbix-3.0.3]#vi /etc/init.d/zabbix_server     #修改BASEDIR
   # Zabbix-Directory
   BASEDIR=/usr/local/zabbix
[13:33:27 /usr/local/src/zabbix-3.0.3]#vi /usr/local/zabbix/etc/zabbix_agentd.conf
LogFile=/var/log/zabbix/zabbix_server.log
PidFile=/var/log/zabbix/zabbix_server.pid
Server=192.168.0.1                          #指定zabbix_server的地址
Hostname=node1
[13:36:22 /usr/local/src/zabbix-3.0.3]#vi /etc/services                 #为系统添加解析服务
zabbix-agent    10050/tcp               # Zabbix Agent
zabbix-agent    10050/udp               # Zabbix Agent
zabbix-trapper  10051/tcp               # Zabbix Trapper
zabbix-trapper  10051/udp               # Zabbix Trapper
[13:38:27 /usr/local/src/zabbix-3.0.3]#service zabbix_agentd start
[14:39:44 /usr/local/src/zabbix-3.0.3]#tail -f /var/log/zabbix/zabbix_server.log     #查看zabbix_server启动后,生产日志,排除故障
[12:39:46 /usr/local/src/zabbix-3.0.3]#ps aux|grep zabbix_agentd                     #检查进程,服务已经正常启动
zabbix    6153  0.0  0.1  84132  1340 ?        S    13:29   0:00 /usr/local/zabbix/sbin/zabbix_agentd
zabbix    6154  0.0  0.1  84132  1392 ?        S    13:29   0:00 /usr/local/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
zabbix    6155  0.0  0.1  84132  1648 ?        S    13:29   0:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
zabbix    6156  0.0  0.1  84132  1652 ?        S    13:29   0:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
zabbix    6157  0.0  0.1  84132  1632 ?        S    13:29   0:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
zabbix    6158  0.0  0.1  84132  1336 ?        S    13:29   0:00 /usr/local/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
root      6646  0.0  0.0 112644   968 pts/1    S+   13:37   0:00 grep --color=auto zabbix_agentd

好了,现在agentd也已经完成了。
zabbix_agentd已经内置了很多的监控项,具体请见链接zabbix_agent内置键值
自定义键值
基本上主机的硬件状态都已经设置好了,而实际上我们往往需要自定义监控属性。
比如WEB服务器的http链接数目,tomcat或者nginx的负载,或者游戏进程的状态以及占用的cpu,内存。等等。
OK,我们来添加一个php-fpm占用内存的监控项.
首先我们需要知道如果获取这个值.可以通过top,和ps这俩个命令来取值.

top -n 1 -b|awk '$NF ~ "php-fpm" {SUM+=$6}END{printf SUM}'
ps -eo  rss,command|awk '$2 ~ "php-fpm" {SUM+=$1}END{printf SUM}'

配置zabbix_agentd.conf

[14:18:41 ~]#vi /usr/local/zabbix/etc/zabbix_agentd.conf
Include=/usr/local/zabbix/etc/zabbix_agentd.conf.d/*.conf         #包含多个配置文件
UnsafeUserParameters=1                                    #允许所有的特殊字符作为用户自定义键值的参数
[14:18:54 ~]#vi /usr/local/zabbix/etc/zabbix_agentd.conf.d/php-fpm.conf
#自定义键值php.mem
UserParameter=php.mem,ps -eo  rss,command|awk '$2 ~ "php-fpm" {SUM+=$1}END{printf SUM}'
[14:18:54 ~]#service zabbix_agentd restart                 #重启zabbix_agentd加载新的配置文件
[14:21:32 ~]#/usr/local/zabbix/bin/zabbix_get -s 192.168.0.10 -k "php.mem"
371168

zabbix的自定义键值还支持传参,这在实际运用中实在是太棒了。
我们来修改下php-fpm.conf

[14:23:07 ~]#vi /usr/local/zabbix/etc/zabbix_agentd.conf.d/php-fpm.conf
UserParameter=php.mem[*],ps -eo  $1,command|awk '$2 ~ "php-fpm" {SUM+=$1}END{printf SUM}'
[14:23:07 ~]#service zabbix_agentd restart
[14:23:34 ~]#zabbix_get -s "192.168.0.10" -k php.mem[pcpu]
awk: cmd. line:1:  ~ "php-fpm" {SUM+=pcpu}END{printf SUM}
awk: cmd. line:1:  ^ syntax error

错了?在php-fpm.conf中作为awk参数的$1被替换为我们传递进去的值”pcpu”,这显然不是我们希望的。
*不建议在配置文件中直接使用command,更安全有效的方法是执行一个脚本。在实际使用中如果需要$2这样的变量,应该配置为$$2。

UserParameter=php.mem[*],ps -eo  $1,command|awk '$$2 ~ "php-fpm" {SUM+=$$1}END{printf SUM}'

那么我们换个做法。

[14:34:57 ~]#vi /usr/local/zabbix/etc/zabbix_agentd.conf.d/php-fpm.conf
UserParameter=php.mem[*],/usr/local/zabbix/scripts/php-fpom.sh $1
[14:36:23 ~]#vi /usr/local/zabbix/scripts/php-fpm.sh
#!/bin/bash
ps -eo  $1,command|awk '$2 ~ "php-fpm" {SUM+=$1}END{printf SUM}'
[14:39:02 ~]#chmod a+x /usr/local/zabbix/scripts/php-fpm.sh
[14:39:02 ~]#/usr/local/zabbix/scripts/php-fpm.sh rss
383556
[14:39:02 ~]#/usr/local/zabbix/scripts/php-fpm.sh pcpu
3.4
[14:40:35 ~]#service zabbix_agentd restart
[14:40:54 ~]#/usr/local/zabbix/bin/zabbix_get -s 192.168.0.10 -k "php.mem[rss]"
383032
[14:40:54 ~]#/usr/local/zabbix/bin/zabbix_get -s 192.168.0.10 -k "php.mem[pcpu]"
3.4

有待更新zabbix_proxy

monitor

作为一名运维,我相信有两点是需要用职业生涯去培养的,一个当然是技能,另外一个则是责任。
监控始终是有责任心运维的表现。
运维工程师时时刻刻都需要服务器回馈他们的状态。
本文初次编辑为2016年05月21日。会陆续添加博主自己亲身搭建的所有有关监控的的内容。

Zabbix
Grafana

shell-scripts(backup)

这是我最近维护的线上数据库的备份脚本,断断续续code了不少。
就目前来看效果很不错,当然主要目的是抛(rang)砖(wo)引(zhuang)玉(bi)。

backup.sh
点击下载
backup.sh 会调用对应的配置文件mysql-cluster-list,public.conf,backuptime文件,调度多个备份脚本执行,并统计结果

#!/bin/bash
#Author		weskiller	2016-04-08	
#Email:[email protected]

#定义环境变量
export PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export LANG="en_US.UTF-8"

#定义路径
basedir=/root/scripts/backup
log=$basedir/backup.log
date=`date +%F`
starttime_conf=$basedir/backuptime
mysql_cluster_conf=$basedir/mysql-cluster-list

#superadmin邮件地址,当脚本文件缺失的时候,会发送邮件
Semail='[email protected]'

#下标变量
declare -i sub=0

#设置循环检查的间隔时间
check_sleep=60

#设置备份超时时间(每天下午6点,用户高峰前期)
deadtime=`date --date "18:00" +%s`

#获取mysql-cluster的列表
mysql_cluster_list=(192.168.1.1)

#或者通过远程的web页面获取列表
#cluster_curl=http://192.168.1.1/mysql_cluster.txt
#for list in `curl "$cluster_curl"`;do
#	mysql_cluster_list[$sub]=$list;
#done

#标记日志备份开始
echo "" >> $log
echo "$date backup.sh start" >> $log

#加载公有配置
if [ -f "$basedir/config/public.conf" ];then
	. $basedir/config/public.conf
else
	/bin/mail -r "backup.sh" -s "backup.sh scripts is interrupt" "$Semail" <<< "`date +%F\ %T` - backup scripts is not performing! the public.conf is miss."
	echo "`date +%T` - backup scripts is not performing! the public.conf is miss." >> $log
	exit 1
fi

#检查开始备份时间配置文件
if [ ! -f "$starttime_conf" ];then
	sendemail "backup.sh scripts is interrupt" "`date +%F\ %T` - backup scripts is not performing! can't find configuration with ${starttime_conf}."
	echo "`date +%T` - backup scripts is not performing! can't find configuration with ${starttime_conf}." >> $log
	exit 1
fi

#加载配置文件mysql_cluster_conf,如果存在的话
if [ -f "$mysql_cluster_conf" ];then
	mysql_cluster_list=(`cat "$mysql_cluster_conf"`)
fi

#检测后端mysql_cluster
declare -i change=0
for i in `seq 0 $((${#mysql_cluster_list[@]}-1))`;do
	nc -z ${mysql_cluster_list[$i]} 3306 >/dev/null 2>&1
	if [ $? -eq 0 ];then
		continue
	else
		echo "detected ${mysql_cluster_list[$i]} is down. will be remove from mysql cluster." >> $log
		unset mysql_cluster_list[$i]
		change=1
	fi 
done

#重新定义有效的mysql_cluster_list
if [ $change -eq 1 ];then
	mysql_cluster_list=(${mysql_cluster_list[@]})
fi

#极端情况下后端集群全部宕机
if [ ${#mysql_cluster_list[@]} -eq 0 ];then
	sendemail "backup.sh scripts is interrupt" "`date +%F\ %T` - backup scripts is not performing! without living mysql cluster."
	echo "`date +%T` - backup scripts is not performing! without living mysql cluster." >> $log
	exit 1
fi

#手动备份
if [ "x$1" == "x" ];then
	manual=0
elif [ "$1" == "manual" ];then
	manual=1
	echo "backup.sh is start with manual." >> $log
fi

#重定向所有标准错误信息
#exec 2>$basedir/backup.error

#备份循环
sub=0
for project in $basedir/config/*-*;do
	script=`basename $project`
	project_name=${script%-*}
#默认模式下,根据设定的时间备份
	if [ "$manual" -eq 0 ];then
#获取配置时间
		project_start_time="`awk 'BEGIN{FS="="}/'$script'/{print $2}' $starttime_conf`" 
		if [ $? -eq 0 ];then
			project_timestamp=`date --date "$project_start_time" +%s`
			if [ "$project_timestamp" -le "`date +%s`" ];then
				echo "`date +%F\ %T` - $script backuptime is error" >> $log
				continue
			fi
		else
			echo "`date +%F\ %T` - $script backuptime is not find " >> $log
			continue
		fi
		waittime="$((project_timestamp-`date +%s`))"
#手动备份,根据设定的时间延迟
	elif [ "$manual" -eq 1 ];then
		first=`awk 'BEGIN{FS="="}{print $2}' backuptime|grep -P "^[^ ]" |sort -n|head -1`
		lag=$((`date --date '1 minute' +%s`-`date --date "$first" +%s`))
		project_start_time="`awk 'BEGIN{FS="="}/'$script'/{print $2}' $starttime_conf`" 
                if [ $? -eq 0 ];then
                        project_timestamp=$((`date --date "$project_start_time" +%s`+$lag))
                else
                        echo "`date +%F\ %T` - $script backuptime is not find " >> $log
                        continue
                fi
                waittime="$((project_timestamp-`date +%s`))"
	fi
	host=${mysql_cluster_list[$((sub%${#mysql_cluster_list[@]}))]}
#使用screen,定时放入后台
	screen -S $project_name -d -m /bin/bash "$project" "$host" "$waittime"
#放入项目数组
	project_list[$sub]="$script"
	((sub++))
done

#监控今天的备份日志,当所有备份完成后发送邮件通知
line=`grep -Pn "$date backup.sh start" $log|sed -n '$p'|awk 'BEGIN{FS=":"}{print $1}'`
while sleep $check_sleep && [ ${#project_list[@]} -gt 0 ];do
	sub=0
	for i in ${project_list[@]};do
		if sed -n "$line,\$p" $log |grep -Pq "$i backup (success|failed)" ;then
			 unset project_list[$sub]
		fi  
		((sub++))
	done
	project_list=(${project_list[@]})
	if [ "`date +%s`" -ge "$deadtime" ];then
		sendemail "backup is overtime" "`date +%F\ %T` -- project:${project_list[*]} already overtime"
	fi
done

#判断是否有失败的项目
if sed -n "$line,\$p" $log |grep -Pq "backup failed";then
	backup_failed=1
else 
	backup_failed=0
fi

#标记日志备份结束
echo "$date backup.sh finish" >> $log

#统计失败项目邮件
if [ $backup_failed -eq 1 ];then
	sendemail "some project backup failed" "`date +%F\ %T` -- backup failed project:`sed -n $line,'$p' $log|grep -Po "\S*(?= backup failed)"|tr '\n' ' '`"
fi

#生成WEB网页
sed -n -e "$line,\$p" -e '1a<pre>' -e '$a</pre>' $log > /backup/backup.html

#日志整理
if [ `date --date '1 day' +%d` -eq 1 ];then
        .  $basedir/logrotate.sh
fi

#优雅的退出
exit 0

scripts
点击下载
scripts 是真正的备份执行脚本,加载public.conf获取公共配置信息,针对不同的线上业务和数据库可以自定义操作,或者根据需求修改,这里只是提供模板。

#!/bin/bash
#author weskiller 2016-04-08
#description:project backup scripts unit

#2015-05-16:新增隔天自动备份功能

#定义环境变量
export PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export LANG="en_US.UTF-8"
#获取脚本包含的信息
dir=/root/scripts/backup
scripts=`basename $0`
backup_host=192.168.1.${scripts#*-}
game_label=${scripts%-*}

#定义各种需要的时间戳
month=`date +%m`
last_month=`date --date "1 months ago" +%m`
day=`date +%d`
yesterday=`date --date "1 day ago" +%d`

#定义数据库信息
game_database="p${game_label}_logs1"
game_backup_database="p${game_label}_logs_backup1"
today="$(date --date @$(($(date --date "`date +%F`" +%s)-3600*8)) +%F\ %T)"
delete_gamelog_time="$(date --date @$((`date --date "$today" +%s`+7*3600)) +%F\ %T)"
three_months="`date --date '4 month ago' +%m`"

#加载公共配置
if [ -f "$dir/config/public.conf" ];then
	. $dir/config/public.conf
else
	echo "`date +%F\ %T`:not find public.conf" > $dir/config/$scripts.error 
	exit 1; 
fi

#获取backup.sh传递的参数
if [ "x$1" == "x" ];then
	echo "`date +%F\ %T`:not received parameter:mysql backup host " >> $dir/log/$scripts.log
	exit 1
else
	db_host="$1"
fi
if [ "x$2" == "x$2" ];then
	sleep $2
	if [ $? -ne 0 ];then
	echo "`date +%F\ %T`:received sleep time parameter,but sleep error " >> $dir/log/$scripts.log
	exit 1
	fi
else
	echo "`date +%F\ %T`:not received parameter:sleep time(s) " >> $dir/log/$scripts.log
	exit 1
fi

#开始时间戳
start_time="`date +%s`"

#创建文件夹,如果不存在的话
/usr/bin/ssh root@$backup_host "mkdir -p $backupdir/$game_label/months/$month"

#创建xtrabakup命令
if [ "$day" -eq 1 ];then
	innobackup_command="/usr/bin/innobackupex --no-timestamp $backupdir/$game_label/months/$month/$day"
	/usr/bin/ssh root@$backup_host "rm -f $backupdir/$game_label/days"
	/usr/bin/ssh root@$backup_host "ln -sf $backupdir/$game_label/months/$month $backupdir/$game_label/days"
#删除三个月之前的备份
	delete_three_backup="rm -rf $backupdir/$game_label/months/$three_months"
else
#获取days下的目录,确定日期
	uday=$(ssh root@$backup_host "basename \`ls -d $backupdir/$game_label/days/*|grep -Pv "tar.bz2$"\` 2>/dev/null")
	if [ ${uday:-0} -eq $yesterday ];then
	innobackup_command="/usr/bin/innobackupex --incremental --incremental-basedir $backupdir/$game_label/days/$yesterday --no-timestamp  $backupdir/$game_label/days/$day ; result=\$? ; if [ \$result -ne 0 -a ! -e $backupdir/$game_label/days/$day/xtrabackup_info ];then rm -rf $backupdir/$game_label/days/$day;fi ;exit \$result"
	else
	innobackup_command="/usr/bin/innobackupex --incremental --incremental-basedir $backupdir/$game_label/days/$uday --no-timestamp  $backupdir/$game_label/days/$day ; result=\$? ; if [ \$result -ne 0 -a ! -e $backupdir/$game_label/days/$day/xtrabackup_info ];then rm -rf $backupdir/$game_label/days/$day;fi ;exit \$result"
	jump=1	
	fi
fi

#创建mysqldump命令
mysqldump_command="mysqldump --single-transaction --skip-add-locks --skip-lock-tables  --no-create-info --replace --where \"create_time < '$today'\" $game_database logs  |mysql -u$db_user -h$db_host -p$db_pass $game_backup_database"

#创建清理日志命令
mysql_command="mysql -e \"delete from $game_database.logs where create_time < '$delete_gamelog_time';\""

#压缩前一天的备份文件
if  [ "$day" -eq 1 ];then
	tar_command="tar -jcf $backupdir/$game_label/months/$last_month/$yesterday.tar.bz2 -C $backupdir/$game_label/months/$last_month $yesterday --remove-files"
else
	if [ ${uday:-0} -eq $yesterday ];then
		tar_command="tar -jcf $backupdir/$game_label/days/$yesterday.tar.bz2 -C $backupdir/$game_label/days $yesterday --remove-files"
	else
		tar_command="tar -jcf $backupdir/$game_label/days/${uday}.tar.bz2 -C $backupdir/$game_label/days $uday --remove-files"
	fi
fi
#日志标记开始
echo "" >> $dir/log/$scripts.log
echo "`date +%F\ %T` $scripts start" >> $dir/log/$scripts.log
#在远程备份客服端上执行命令
for command in "$innobackup_command" "$mysqldump_command" "$mysql_command" "$tar_command" "$delete_three_backup" ;do
#先判断变量内容,确保ssh命令正常执行
if [ -z "$command " ] ||  echo "$command"|grep -Pq "^\s*$" ;then
	continue;
fi
ssh root@$backup_host "$command" 2>$dir/log/$scripts.error 1>/dev/null
if [ $? -eq 0 ];then
	echo "`date +%F\ %T` -- \"$command\" performing success." >> $dir/log/$scripts.log
else
	echo "`date +%F\ %T` -- \"$command\" performing failed." >> $dir/log/$scripts.log
#	sendemail "$scripts" "`cat $dir/log/$scripts.error`"
	log "failed" "$scripts"
	exit 1
fi
done
#删除错误的输出
rm -rf $dir/log/$scripts.error
#标记日志结束
echo "`date +%F` $scripts finish" >> $dir/log/$scripts.log
#输出backup.log日志
if  [ ${jump:-0} -eq 1 ];then
	log "success" "$scripts" "$((`date +%s`-start_time))" "$uday $day"
else
	log "success" "$scripts" "$((`date +%s`-start_time))"
fi
#优雅的退出
exit 0

public.conf
点击下载
public.conf 公共配置文件,包含了日志输出和维护邮件组,mysql验证

#this is public config that all scripts need to be loaded
#定位绝对路径
backupdir=/backup
#备份验证
db_user=
db_pass=

#负责人E-Mail地址
adminmail=([email protected] [email protected])
sendemail () {
	for mail in ${adminmail[@]};do
	/bin/mail -r "backup.sh" -s "$1" "$mail" <<< "$2"
	done
}

#输出日志函数
log () {
case $1 in
	success)
	if [ $3 -lt 60 ];then
		consume="${3}s"
	elif [ $3 -lt 3600 ];then
		consume="$(($3/60))m$(($3%60))s"
	else
		consume="$(($3/(60*60)))h$((($3%(60*60))/60))m$(($3%60))s"
	fi
	if [ "x$4" == "x" ];then
		echo "`date +%F\ %T` - $2 backup $1. used ${consume}." >> /root/scripts/backup/backup.log
	else
		echo "`date +%F\ %T` - $2 backup $1. used ${consume}. ${4% *} => ${4#* } " >> /root/scripts/backup/backup.log
	fi
	;;
	failed)
	echo "`date +%F\ %T` - $2 backup $1" >> /root/scripts/backup/backup.log
	;;
esac
}

backuptime
点击下载
backuptime 主要用于设定备份数据库的时间

#测试数据库
test-1=01:00
test-2=00:00

#线上数据库
produce-101=03:00
produce-102=03:30

shell-character

Bourne-Again SHell是绝大多数linux系统下默认的shell,只要涉及linux平台上的研发和维护,都或多或少的会接触Bash的CLI.
Bash shell确实十分强大,在linux的平台上,几乎无所不能,但对于新手来说,还是有许多需要注意的地方.
下面的知识适合刚接触bash,却还不熟练的同学.

bash shell是对字符非常敏感的脚本语言.
*以下所有命令均在CentOS Linux release 7.2.1511 (Core)测试,linux内核为3.10.0-327.10.1.el7.x86_64,bash 版本为 GNU bash, version 4.2.46(1)-release (x86_64-redhat-linux-gnu),部分内容可能在其他的linux发行版上有差异.

先来分析linux的分隔符.
shell的默认分隔符存放在变量IFS(Internal Field Seperator)中,下面我们来看看它长啥样(注意输出结果).

[13:56:37 ~]#set -x                #开启命令输出模式,即会输出你键入的命令(包括所有参数)
[13:57:17 ~]#echo -n $IFS
+ echo -n
[13:57:29 ~]#echo -n "$IFS"
+ echo -n ' 	
'
 	
[13:57:37 ~]#echo -n '$IFS'
+ echo -n '$IFS'
$IFS[13:57:42 ~]#

=_=,是不是很奇怪?IFS里面到底放的是些什么?那么我们再看看这个变量的一些信息(你可能需要提前掌握使用bash变量的一些知识点我查看).

[14:05:26 ~]#set +x                #关闭命令输出模式
[14:05:27 ~]echo ${#IFS}
3

看来有三个字符存储在IFS里面,那么挨个挨个的查看下.

[14:08:10 ~]#set -x                #开启命令输出模式
[14:08:10 ~]#echo -n ${IFS:0:1}
+ echo -n
[14:08:11 ~]#echo -n "${IFS:0:1}"
+ echo -n ' '
 [14:08:12 ~]#

到这里,大家应该明白了,第一个字符存储的是空格,linux默认吧空格作为分隔符.
可能还有同学会有疑问,加双引号,和不加双引号,为什么输出不同,这里简单说一下(后面会更详细的说明),在linux中具有非常特别的意义,通常被称呼为强制引用符.即被双引号包括的内容,就一定会被解释,包括空白,所以即使${IFS:0:1}是一个空格,echo 还是会输出这个空格(第六行)!
那么接下来就简单多了.

[14:25:52 ~]#echo -n ${IFS:1:1}
+ echo -n
[14:25:55 ~]#echo -n "${IFS:1:1}"
+ echo -n '	'
	[14:25:59 ~]#

第二个是制表符\t,作为linux默认的第二个分隔符.

[14:27:23 ~]#echo -n ${IFS:2:1}
+ echo -n
[14:34:38 ~]#echo -n "${IFS:2:1}"
+ echo -n '
'

[14:34:42 ~]#

第三个是换行符\n,作为linux默认的第三个分隔符.
讲解分隔符是为了让你进一步了解bash工作的机制.帮助你理解类似下面这样的语句.

#example.sh
array=(a b c)
for i in "${arrary[*]}";do
     echo $i
done
for j in ${array[*]};do
     echo $j
done

*其实在使用数组中,如果某一个成员的内容包含了分隔符,那么在使用${array[*]}往往不是你想要的结果.而${array[@]}却会方便的多.

下面来了解bash的几个特殊字符,包括`
双引号在shell中使用的很多,然而能够熟练使用的人却不多.
回顾下双引号在shell脚本中的使用
赋值变量,当变量包含特殊字符,比如空白字符或者需要使用引用变量的时候

var1="hello world!"        #请注意这条赋值语句
var2="welcome to blog.weskiller.com , $var1"

使用grep awk sed命令

echo $var2|grep "hello world"
echo $var2|sed "s/blog/www/"
echo $var2|awk "{print $3}"

=V=,好像出了什么问题~. 在给var1赋值的时候发生了意想不到的事情.

var1="hello world!"
-bash: !": event not found

这是什么鬼?
新手一定在这里懵逼了,其实在交互式的shell中!会被转义为history,就像下面演示的一样,其实在非交互的shell中,这个功能会被禁用.

[15:41:18 ~/scripts]#history -c
[15:41:21 ~/scripts]#ls
backup.sh  clear_log.sh  test.sh
[15:41:22 ~/scripts]#echo

[15:41:35 ~/scripts]#history 1
    3  history 1
[15:41:41 ~/scripts]#!-1
history 1
    3  history 1
[15:41:44 ~/scripts]#

没错,双引号强制转义了!,而实际上如果不用双引号反而达到了你想要的效果.

var1=hello\ world!              #这里使用反斜线来转义空格,这样就不会被认为是分隔符,所以在shell看来"hello world!"是一个整体
echo $var1 

单引号在使用中往往是要告诉shell,不要乱改我输出的内容,常称呼为弱引用符
就像双引号遇到的情况,用单引号就可以完美的解决

var1='hello world!'

反引号`可谓是shell脚本语言的精华,用过的人都…,我常称之为替换符.
`command` 中的命令会被执行而输出结果会返回,这个和$(command)如出一辙,唯一可能觉得遗憾的是` `不支持调套使用,而$()是可以多次嵌套使用的.
*到现在为止,博主尚未发现` `和$()的使用区别.
在使用反引号的时候注意的是不要被单引号包括,同时反引号外的双引号和反引号内的双引号互不干扰,就像下面这样

timestamp="$(date --date "@$((`date --date "1 days ago" +%s`-2*3600))" +%F\ %T)"                       #取得当前时间间隔为一天两小时的时间戳

讲了这么多,在实际使用中的效果如何?
来看看吧

#定义时间戳,多次复用date命令
today="$(date --date @$(($(date --date "`date +%F`" +%s)-3600*8)) +%F\ %T)"
delete_timestamp="$(date --date @$((`date --date "$today" +%s`+7*3600)) +%F\ %T)"
#巧妙使用单双引号,在sed,awk中插入变量
pattern="`date +%F\ %T`"
grep -P "$parttern" $log
sed -i 's/'"$pattern"'/'"`date +%F`"'/g' $log