Python automatic file sync backup v1.2 [ops essential] 2020/12/31

We. · posted 2020-12-25 20:29

This post was last edited by We. on 2021-1-4 08:18

The v1 version is packaged here; download it if you are interested:

同步备份v1.rar (1.6 KB, downloads: 50)

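Disclaimer:
Thanks to @dawn2018 for the reminder: this approach is only suitable for sync backups inside a LAN. There is no encryption or authentication, and the traffic does not pass through a firewall.

Below is the v1.2 content:
Added multiprocess downloading.
Change the directories and it is ready to use.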
--------------------------------------------------------------------------------------------------------------------------

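Requirement: the platform packs the VM backup files onto server A, and they then need to be synced to server B (only the A-to-B direction matters).

Approach:
Server A acts as the server. It periodically walks its own directory tree and writes the directory and file listing into a check file.
Server B acts as the client. It downloads the check file, walks its own directory tree to see whether it matches the server, and downloads the files it does not have locally.
Transfer is over HTTP, using Python to start a simple HTTP service. If there is a firewall, the port has to be opened; if not, nothing else is needed.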
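
The comparison step in the approach above boils down to two set differences. A minimal sketch, assuming both listings are already loaded as lists of path strings (plan_sync is a hypothetical helper name, not something from the scripts below):

```python
def plan_sync(remote_paths, local_paths):
    """Return (to_download, to_delete) as sorted lists of paths."""
    remote = set(remote_paths)
    local = set(local_paths)
    to_download = sorted(remote - local)  # on the server, missing locally
    to_delete = sorted(local - remote)    # present locally, gone from the server
    return to_download, to_delete
```

Using sets keeps each membership check constant-time, which starts to matter once the listing holds many thousands of paths.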

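Production environment: Python 3.7.9, two CentOS 7.9 servers.

Start the HTTP service in the backup directory on the server side:
nohup is used to run the HTTP service in the background; otherwise the console is tied up and cannot be used for anything else.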
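
For reference, the same kind of simple file server can also be started from Python itself with the standard library. This is only a sketch under assumptions: start_file_server is a hypothetical helper, and port 8000 is taken from the URLs used in the client script further down.

```python
import functools
import http.server
import threading


def start_file_server(directory, port=8000, bind="0.0.0.0"):
    """Serve `directory` over HTTP in a background thread and return the server."""
    # The `directory` argument of SimpleHTTPRequestHandler needs Python 3.7+.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((bind, port), handler)
    # Daemon thread: the server stops when the main program exits.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Because the thread is a daemon, a standalone script would have to keep its main thread alive (or call server.serve_forever() directly) to keep serving. Like the rest of this setup, it has no authentication or encryption, so it belongs on a LAN only.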

Server:

import os

path = '/H3C_Backup'


def func(path):
    # Walk the backup tree and collect every directory and file path.
    dir_list = []
    file_list = []
    for root, dirs, files in os.walk(path, topdown=True):
        dir_list.append(root)
        for name in files:
            file_list.append(root + '/' + name)
    return [dir_list, file_list]


content = func(path)

# content.txt: one directory path per line
with open(path + '/' + 'content.txt', 'w', encoding='utf-8') as f:
    for i in content[0]:
        f.write(i + '\n')

# file.txt: one file path per line
with open(path + '/' + 'file.txt', 'w', encoding='utf-8') as f:
    for i in content[1]:
        f.write(i + '\n')
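
Note that file.txt records only paths, so a file that is rewritten or truncated under the same name would not be noticed by the client. One possible extension, which is not part of the scripts in this post, is to record each file's size as well. build_manifest and changed_files below are hypothetical names for illustration:

```python
import os


def build_manifest(root):
    """Map every file path under `root` to its size in bytes."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[full] = os.path.getsize(full)
    return manifest


def changed_files(remote, local):
    """Paths that are missing locally or whose size differs from the server's."""
    return sorted(
        path for path, size in remote.items()
        if path not in local or local[path] != size
    )
```

Size is cheap to collect even for terabytes of backups. A content hash would be stricter, but it means reading every byte on both sides.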

Client:

import os
import shutil
import multiprocessing
import requests


def init():
    # Download the two listing files generated on the server.
    os.makedirs('/download', exist_ok=True)
    url = ['http://172.172.172.1:8000/file.txt', 'http://172.172.172.1:8000/content.txt']
    download_file = requests.get(url[0], stream=True)
    # Abort on an HTTP error; an empty listing would make the client delete everything.
    download_file.raise_for_status()
    with open('/download/file.txt', 'wb') as f:
        for chunk in download_file.iter_content(chunk_size=4096):
            f.write(chunk)

    download_content = requests.get(url[1], stream=True)
    download_content.raise_for_status()
    with open('/download/content.txt', 'wb') as f:
        for chunk in download_content.iter_content(chunk_size=4096):
            f.write(chunk)


def function(path):
    # Walk the tree with os.walk() and collect all directories and files.
    file = []
    dir = []
    for (root, dirs, files) in os.walk(path, topdown=True):
        dir.append(root)
        for i in files:
            file.append(root + '/' + i)
    return [dir, file]


def check_dir(path):
    # Local directories
    dir_so = function(path)[0]

    # Server-side directories, with the trailing newlines stripped
    with open('/download/content.txt', 'r', encoding='utf-8') as dirs:
        dir_dst_info = [line.rstrip('\n') for line in dirs]

    # Compare: create directories missing locally, delete ones the server no longer has.
    # dir_dst_info[0] is the backup root itself, so it is skipped.
    for i in dir_dst_info[1:] + dir_so:
        if i not in dir_so:
            os.makedirs(i, exist_ok=True)
            print('Created ' + i)
        if i not in dir_dst_info:
            try:
                shutil.rmtree(i)
                print('Deleted ' + i)
            except OSError:
                # Already removed together with its parent directory
                pass


def download(url, path):
    download_file = requests.get(url, stream=True)
    download_file.raise_for_status()
    with open(path, 'wb') as f:
        for chunk in download_file.iter_content(chunk_size=10240):
            f.write(chunk)
    # Report once per file, not once per chunk
    print('Added ' + path)


def check_file(path):
    file_so = function(path)[1]
    pool = multiprocessing.Pool(processes=10)
    # Server-side files, with the trailing newlines stripped
    with open('/download/file.txt', 'r', encoding='utf-8') as files:
        files_dst_info = [line.rstrip('\n') for line in files]

    # Download what is missing, delete what is extra
    for i in file_so + files_dst_info:
        if i not in file_so:
            # The URL is the server address plus the path from file.txt, so the
            # HTTP server must expose the files under that same path.
            url = 'http://172.172.172.1:8000' + i
            pool.apply_async(download, (url, i,))

        if i not in files_dst_info:
            os.remove(i)
            print('Deleted ' + i)
    pool.close()
    pool.join()


if __name__ == '__main__':
    path = '/H3C_Backup'
    init()
    check_dir(path)
    check_file(path)
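
One thing to watch with the client above: if a download is interrupted, the half-written file is already sitting under its final name, and the next run treats it as present. A possible fix, sketched here with the standard library's urllib instead of requests, is to write to a temporary name and rename only when the transfer is complete. download_atomic and the ".part" suffix are assumptions for illustration:

```python
import os
import shutil
import urllib.request


def download_atomic(url, dest, timeout=30):
    """Download `url` to `dest`; the final name appears only when complete."""
    tmp = dest + ".part"  # hypothetical suffix for in-progress files
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as f:
            expected = resp.headers.get("Content-Length")
            shutil.copyfileobj(resp, f, 1024 * 1024)  # copy in 1 MiB chunks
        if expected is not None and os.path.getsize(tmp) != int(expected):
            raise IOError("incomplete download: " + url)
        os.replace(tmp, dest)  # atomic when tmp and dest are on the same filesystem
    except BaseException:
        # Never leave a partial file behind, whatever went wrong.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

A leftover ".part" file from a killed process is not in the server's file.txt, so the existing "delete what is extra" step would clean it up on the next run.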

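10 processes are off and running. There is 12 TB of data in total, so it will take a while.

With 12 processes running together, the CPU usage is a bit high.

The speed is not bad either: 80 GB after a short while.

Checked this morning: 10 TB transferred and still running. I will let it finish on its own.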

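To be improved:
1. The way the code is written
2. How the sync is triggered
3. Would a raw TCP socket be faster than HTTP?

Also, why is multithreading so weak that it ends up slower than a single thread? Multiprocessing feels like a waste of CPU resources. And how does Thunder (迅雷) implement its downloads?
Suggestions welcome.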
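
On the last question: a common technique in download managers is to split one file into byte ranges and fetch them over several connections with HTTP Range requests. Whether Thunder works exactly like this is not something I can confirm. A rough sketch, where split_ranges and fetch_range are hypothetical names:

```python
import urllib.request


def split_ranges(size, parts):
    """Split `size` bytes into up to `parts` inclusive (start, end) byte ranges."""
    if size <= 0 or parts <= 0:
        return []
    parts = min(parts, size)
    base, extra = divmod(size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def fetch_range(url, start, end, timeout=30):
    """Fetch bytes start..end (inclusive) with an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if resp.status != 206:  # the server ignored the Range header
            raise RuntimeError("server does not support range requests")
        return resp.read()  # holds the whole segment in memory
```

This needs a server that honors Range. As far as I know, the simple server from Python's http.server module does not, so it would have to be swapped for something like nginx before this could be tried here.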

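We. · posted 2020-12-25 22:10

moliy posted 2020-12-25 21:39
I had a similar requirement before, backing up a database and website directories. I wrote it in bash back then: after walking, packing and exporting on server A, I sent it straight over with scp ...

I went with HTTP because I eventually want to integrate this tool into a web management console. I expect more requirements to turn up later, and I will add more behavior to the tool then.

We. · posted 2021-5-7 20:28

麻木不忍 posted 2021-5-6 22:13
Hello, I work in ops. Surveillance camera IP testing, plus the disks in disk arrays and NVRs, which all fail after running for a while, and someone has to go and check them one by one ...

I have been dealing with this kind of thing lately too, so let's compare notes. Once it reaches scale it is a real headache, with all sorts of strange problems.

pkni1230 · posted 2020-12-25 20:44

I have the same requirement. 379257839, add me as a friend and let's work on it together.

rex · posted 2020-12-25 20:50

Following this, waiting for the finished product.

coolsnake · posted 2020-12-25 20:51

Expert work, learning from this.

We. · posted 2020-12-25 21:15

pkni1230 posted 2020-12-25 20:44
I have the same requirement. 379257839, add me as a friend and let's work on it together.

OK, I have added you. Let's discuss the approach.

We. · posted 2020-12-25 21:16

rex posted 2020-12-25 20:50
Following this, waiting for the finished product.

No time to debug it tomorrow, but I should be able to the day after. I will let you know when it is done. It goes to production testing next week.

xjshuaishuai · posted 2020-12-25 21:20

Nice, learned something. Thanks to the OP for sharing!

wysyz · posted 2020-12-25 21:20

Following.

moliy · posted 2020-12-25 21:39

I had a similar requirement before, backing up a database and website directories. I wrote it in bash back then: after walking, packing and exporting on server A, I sent it straight to server B with scp, and added it to the scheduled tasks.

qiqiniao · posted 2020-12-25 21:54

Very practical, will keep following.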
