前言

我的网站codingsoho运行了一段时间,数据库也日渐庞大,感觉数据库的备份变得刻不容缓了。周末整理了一下,实现了基本的备份功能。包括数据库及媒体文件的本地以及云备份,同时实时邮件通知备份进展。

工具备份MySQL文件 - Navicat

如果你不打算用代码来实现,Navicat可以帮忙你实现基本的数据库备份功能。刚好我也是用Navicat管理MySQL的数据库的,所以顺便学习了一下。

假定你已经创建了自己的连接。打开连接属性,在[高级]标签里设置"设置保存路径",这样你的备份文件将会存在此路经。

设置完成之后,打开[计划]功能,点击下方[新建批处理作业]

会出现下方的批处理窗口,在左边选择对应的数据库,它就会出现在右边的[可用任务]里,点击它,将将会被选中出现在下面的[已选择任务]列表里。

点击做上方的[保存按钮],保存该任务,在弹出的下方对话框填入文件名确定即可。

这时可以看到,刚刚创建的backup任务已出现在计划列表里。

选中该任务,点击上方的[设置任务计划],出现一个弹出对话框,选中[计划]标签栏,点击[新建]按钮开始新建一个计划,根据下面的选项,你可以选择按天或者其他的选择。

确定之后,它会再让你输入密码确认。

设置完成之后,它机会自动的去备份你的数据库了。看了一下它备份的psc格式,可以通过Navicat转为sql文件。

python脚本备份

上面只是简单介绍一下工具,下面开始我们的主题:如何用python来实现。

备份MySQL

MySQL备份的命令是mysqldump

常用的格式为

mysqldump -u[username] -p[password] [dbname] > [save-path]

所以,我们备份的核心功能其实就是调用这句命令

cmd_dump='mysqldump -u%s -p%s %s > %s' %(conf.sql_username,conf.sql_pwd,db_name,output_filename)
os.system(cmd_dump)

在上面的语句中,用户名和密码我们存在配置文件里,db_name是数据库名字,因为我不仅要备份一个数据库,所以这个会是一个动态的结果,最终我们要设置好备份位置output_filename

配置文件为conf.py,内容如下

sql_username = "root"
sql_pwd = "******"
sql_exe_path = "D:\Program Files (x86)\MySQL\MySQL Server 5.5\bin"
output_sql_path = "C:\\backup\sql"
output_media_path = "C:\\backup\media"
logs_path = "C:\\backup\log"

完整代码如下

    # set log path
    logname = datatime_util.filename_prefix + '.log' 
    logpath = os.path.join(conf.logs_path, logname) 
    # set filename
    filename = datatime_util.filename_prefix + '.sql'
    # loop each db
    for db_name in conf.dbs:
        msg = '-'*2 + datatime_util.logname_prefix + " " 
        # set file path
        output_filename = conf.output_sql_path + os.sep + db_name + "_" + filename
        # execute dump command
        cmd_dump='mysqldump -u%s -p%s %s > %s' %(conf.sql_username,conf.sql_pwd,db_name,output_filename) 
        if os.system(cmd_dump)==0:
            msg += 'database backup successfully: {}'.format(output_filename)
        else:
            msg += 'database backup failed! {}'.format(output_filename)
        write_logs(logpath,msg) 

dbs定义如下

dbs = (
"sales6",
"sales7",
    )

其中,调用了一个命令write_logs,用于写日志,就是常见的写文件操作,追加方式,代码如下

def write_logs(filename,contents):
    with open(filename,'a') as f:
        f.write(contents + '\n')

datatime_util提供根据时间设置一些文件名和log的方法,这样我们能够保证创建的文件或内容可溯性强,同时不会重复,实现如下

class DateTimeUtil(object):
    @property
    def foldername_day(self):
        return time.strftime('%Y-%m-%d')
    @property
    def filename_prefix(self):
        return time.strftime('%Y-%m-%d-%H%M%S')
    @property
    def logname_prefix(self):
        return time.strftime('%Y-%m-%d,%H:%M:%S')

备份媒体文件

备份媒体文件用到了window的命令xcopy,实现方法类似,代码如下

    for media in conf.medias:
        prefix = '+'*2 + datatime_util.logname_prefix + ' '  
        foldername = media[0] + "-" + datatime_util.filename_prefix #
        output_foldername = conf.output_media_path + os.sep + foldername 
        if not os.path.exists(output_foldername): 
            if not os.mkdir(output_foldername):
                msg = prefix + "succeed to create media folder {}".format(output_foldername)
                write_logs(logpath, msg)             
            else:  
                msg = prefix + "fail to create media folder {}".format(output_foldername)
                write_logs(logpath, msg)
                sys.exit()            
        cmd_copy='xcopy %s %s /s /e' %(media[1], output_foldername)
        if os.system(cmd_copy)==0:
            msg = prefix + 'media copy successfully: {}'.format(media[1])
        else:
            msg = prefix + 'media copy failed! {}'.format(media[1])
        write_logs(logpath,msg)

medias的定义如下,里面放的是应用的media文件

medias = (
("sales6","E:\\Computer\\virtualenv\\djangocmsblog\\djangocms35-csblog\\media"),
    )

它同样轮询媒体库,然后将目录下的文件拷到指定目录,并按合适规则命名

媒体文件压缩

媒体文件往往很大,如果直接拷贝对磁盘要求太高,所以我们接下来对它压缩一下。
高效的方法应该是读原文件直接压缩,我这儿本来是分两步介绍的,所以先拷贝,再压缩,最后再删除文件夹。实际使用时可以进行优化。

        f = zipfile.ZipFile( output_zipname, 'w', zipfile.ZIP_DEFLATED )
        zip_subdir = "media"
        for dirpath, dirnames, filenames in os.walk( output_foldername ):
            for filename in filenames:
                filepath = os.path.join(dirpath, filename)
                zip_path = os.path.join("media", filepath[len(conf.output_media_path)+1:])
                f.write(filepath, zip_path)
                cmd_remove='rm -rf %s' %(output_foldername) 
                os.system(cmd_remove)

        f.close()

关于压缩技术已在Django压缩包下载一文做过介绍,这儿就不再详述了。
提醒一下的是,对于命令f.write,第二个参数最好指定,否则它会根据源文件路径压缩,层级太多。

有的windows电脑上rm命令不支持,我们可以用下面命令代替

例如:

del /s /q *.svn

rmdir /s /q *.svn

/q disables Yes/No prompting
/s means delete the file(s) from all subdirectories.

定时调度

到这儿,我们的基本功能就实现了。但是我们希望它是周期性调度的,而不是只执行一次。

接下来我们用python的schedule功能来实现。

首先如果没安装先安装一下

pip install schedule

使用非常简单,它的demo代码如下

import schedule
import time

def job():
    print("I'm working...")
schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every(5).to(10).minutes.do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)
while True:
    schedule.run_pending()
    time.sleep(1)

假如说我想每分钟都运行一次,可以按照下面这样实现。

    schedule.every(1).minutes.do(backup)  

    while True:
        schedule.run_pending()
        time.sleep(1)

邮件通知

每一次备份都有可能出错,我们希望结果可以实时通知到管理员,不管备份是成功还是失败了。 这儿我采用邮件方式来实现。

我用的是sina邮箱,首先你要在sina客户端配置好,让它支持smtp.

接下来实现发送邮件的功能,我做了一个封装类

import smtplib
from email.mime.text import MIMEText
import conf

class MailUtil():
    def __init__(self, *args, **kwargs):
        self.SMTPserver = kwargs.get('address', None)
        self.sender = kwargs.get('sender', None)  
        self.password = kwargs.get('password', None)
        self.destination = args[0]
    def sendmail(self, subject, message):
        msg = MIMEText(message,_charset='utf-8')
        msg['Subject'] = subject
        msg['From'] = self.sender
        msg['To'] = self.destination
        mailserver = smtplib.SMTP(self.SMTPserver, 25)
        mailserver.login(self.sender, self.password)
        mailserver.sendmail(self.sender, [self.destination], msg.as_string())
        mailserver.quit()
        # print('send email success')

在conf.py里配置好邮箱发送需要的相关配置

sender_user = dict(address='smtp.sina.com',sender='codingsoho@sina.com',password='codingsoho')
receiver_user = ('codingsoho@163.com',)

有点奇怪,我往live.com发送总是不成功。后来网上搜了一下

注意:使用的sender和receivers 都需要在对应的邮箱设置中开启SMTP服务,否则发送不会成功。

调用发送则非常简单

mailsending = MailUtil(*conf.receiver_user, **conf.sender_user)
mailsending.sendmail("codingsoho backup notification", msg)

但是部署到阿里云上时,这儿又出错了。

Traceback (most recent call last):
  File "C:\virtualenv\codingsohobackup\mailutils.py", line 26, in <module>
    mailsending.sendmail("codingsoho backup notification", "send sucessfully")
  File "C:\virtualenv\codingsohobackup\mailutils.py", line 18, in sendmail
    mailserver = smtplib.SMTP(self.SMTPserver, 25, timeout=120)
  File "C:\Python27\lib\smtplib.py", line 256, in __init__
    (code, msg) = self.connect(host, port)
  File "C:\Python27\lib\smtplib.py", line 316, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "C:\Python27\lib\smtplib.py", line 291, in _get_socket
    return socket.create_connection((host, port), timeout)
  File "C:\Python27\lib\socket.py", line 575, in create_connection
    raise err
socket.error: [Errno 10060]

原因是新开通的ECS默认禁止连接外网的25端口,可以尝试连接SSL 465端口,如 smtplib.SMTP('smtp.sina.com', 465)

这两个协议如下

25端口(SMTP):25端口为SMTP(Simple Mail TransferProtocol,简单邮件传输协议)服务所开放的,是用于发送邮件。

465端口(SMTPS):465端口是为SMTPS(SMTP-over-SSL)协议服务开放的,这是SMTP协议基于SSL安全协议之上的一种变种协议,它继承了SSL安全协议的非对称加密的高度安全可靠性,可防止邮件泄露。SMTPS和SMTP协议一样,也是用来发送邮件的,只是更安全些,防止邮件被黑客截取泄露,还可实现邮件发送者抗抵赖功能。防止发送者发送之后删除已发邮件,拒不承认发送过这样一份邮件。

我试了一下,至少telnet smtp.sina.com 465能连上,但是还是报错。

Traceback (most recent call last):
  File "C:\virtualenv\codingsohobackup\mailutils.py", line 26, in <module>
    mailsending.sendmail("codingsoho backup notification", "send sucessfully")
  File "C:\virtualenv\codingsohobackup\mailutils.py", line 18, in sendmail
    mailserver = smtplib.SMTP(self.SMTPserver, 465)
  File "C:\Python27\lib\smtplib.py", line 256, in __init__
    (code, msg) = self.connect(host, port)
  File "C:\Python27\lib\smtplib.py", line 317, in connect
    (code, msg) = self.getreply()
  File "C:\Python27\lib\smtplib.py", line 368, in getreply
    raise SMTPServerDisconnected("Connection unexpectedly closed")
smtplib.SMTPServerDisconnected: Connection unexpectedly closed

查了一下资料,原来SMTPSMTPS使用不同的函数来创建。 另外,上面代码中只支持单个接收人,我同时修改了代码支持多个接收地址,修改后的代码如下:

class MailUtil():
    def __init__(self, *args, **kwargs):
        self.SMTPserver = kwargs.get('address', None)
        self.sender = kwargs.get('sender', None) 
        self.password = kwargs.get('password', None)
        self.destination = args[0]
        self.port = kwargs.get('port', 25)
    def sendmail(self, subject, message, port=25):
        msg = MIMEText(message,_charset='utf-8')
        msg['Subject'] = Header(subject, 'utf-8')
        msg['From'] = self.sender
        msg['To'] = ", ".join(self.destination) # multi recipient concatenated with ","
        try: 
            if 25 == self.port:
                mailserver = smtplib.SMTP(self.SMTPserver, self.port)
            elif 465 == self.port:
                mailserver = smtplib.SMTP_SSL(self.SMTPserver, self.port)
            else:
                mailserver = smtplib.SMTP(self.SMTPserver, 25)
            # for debug
            '''
            mailserver.set_debuglevel(1)
            mailserver.ehlo()
            '''
            try:
                mailserver.login(self.sender, self.password) 
            except smtplib.SMTPException as e:
                print("Authentication failed:", e)
                sys.exit(1)
            mailserver.sendmail(self.sender, self.destination, msg.as_string()) 
        except (socket.gaierror, socket.error, socket.herror, smtplib.SMTPException) as e: 
            print(" *** your message may not have been sent! ")
            print(e)
            sys.exit(2)
        else:
            print("Message successfully sent to %d recipient(s) " %len(self.destination )) 

        mailserver.quit()

配置文件同样修改

sender_user = dict(address='smtp.sina.com',sender='codingsoho@sina.com',password='codingsoho')
receiver_user = (['codingsoho@163.com','codingsoho@126.com'],)

完成之后,邮件就可以正常发送了。

上传到百度云

安装

pip install bypy

授权

执行bypy info,然后根据提示通过百度的网址访问获取授权码,激活授权。

bypy info

(env35) E:\Computer\virtualenv\codingsohobackup\src>bypy info
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<W> You are running Python on Windows, which doesn't support Unicode so well.
Files with non-ASCII names may not be handled correctly.
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<W> WARNING: System locale is not 'UTF-8'.
Files with non-ASCII names may not be handled correctly.
You should set your System Locale to 'UTF-8'.
Current locale is 'cp936'
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<W> Encoding for StdOut: cp936
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Please visit:
https://openapi.baidu.com/oauth/2.0/authorize?scope=basic+netdisk&redirect_uri=oob&response_type=code&client_id=q8WE4EpCsau1oS0MplgMKNBn
And authorize this app
Paste the Authorization Code here within 10 minutes.
Press [Enter] when you are done

根据上面提示的网址https://openapi.baidu.com/oauth/2.0/authorize?scope=basic+netdisk&redirect_uri=oob&response_type=code&client_id=q8WE4EpCsau1oS0MplgMKNBn 去获取授权码,然后输入并回车

56dd11c4921359297a44901686fadcce
Authorizing, please be patient, it may take upto 300 seconds...
Authorizing/refreshing with the OpenShift server ...
OpenShift server failed, authorizing/refreshing with the Heroku server ...
Successfully authorized
Quota: 2.007TB
Used: 172.078MB

授权成功。

测试

接下来测试百度云的同步功能,包括上传、下载及同步比较。由于百度PCS API权限限制,程序只能存取百度云端/apps/bypy目录下面的文件和目录。

执行

bypy list

/apps/bypy ($t $f $s $m $d):

上传命令:

bypy upload

比如创建本地文件1.txt,执行该命令,即可将文件上传

下载命令:

bypy downdir

完成之后可以通过下面命令比较

bypy compare

上面的例子中,比较结果如下

==== Same files ===
F - 1.txt
==== Different files ===
==== Local only ====
==== Remote only ====

Statistics:
--------------------------------
Same: 1
Different: 0
Local only: 0
Remote only: 0

程序实现媒体文件上传

我们会用bypy upload实现文件上传,查看源代码https://github.com/houtianze/bypy/blob/master/bypy/bypy.py,它的定义如下

def upload(self, localpath = '', remotepath = '', ondup = "overwrite"):
  ''' Usage: upload [localpath] [remotepath] [ondup] - \
upload a file or directory (recursively)
    localpath - local path, is the current directory '.' if not specified
    remotepath - remote path at Baidu Yun (after app root directory at Baidu Yun)
    ondup - what to do upon duplication ('overwrite' or 'newcopy'), default: 'overwrite'

这儿localpath可以是目录,也可以是指定的文件

接下来用subprocess来调用该命令执行上传

import subprocess
subprocess.call(['bypy', 'upload', str(result_file), "/upload/to/"])

这样,文件就会上传到百度云的/app/bypy/upload/to目录

Next

剩下的需要完成的

  • 备份媒体文件时,只备份差异部分,否则媒体库变多后对磁盘是很大的负担

参考文档

下一节准备

评论

codingsoho: 10 months ago
Hi, testing for the comments about blog backup