Backup Brief

What is a backup?

A backup is a duplicate copy of the data in a file system or database. If an error or disaster occurs, the system's valid data can be restored promptly so that normal operation can resume.

What are the backup methods?

  • Full backup: a one-time copy of all files, folders, or data on the hard disk or in the database. (Advantages: the most complete, and data can be recovered faster. Disadvantages: takes up more disk space.)
  • Incremental backup: a backup of only the data that has changed since the last full or incremental backup. For example: a full backup on the first day; on the second day, a backup of the data added since the first day's full backup; on the third day, a backup of the data added since the second day's backup; and so on.
  • Differential backup: a backup of all files changed since the last full backup. For example: a full backup on the first day; on the second day, a backup of the new data from the second day; on the third day, a backup of all new data from the second and third days; on the fourth day, a backup of all new data from the second through fourth days; and so on.
  • Selective backup: backing up only a part of the system.
  • Cold backup: a backup taken while the system is shut down or in a maintenance state. The backed-up data is exactly the same as the data in the system during this period.
  • Hot backup: a backup taken while the system is running normally. Because the data in the system may be updated at any time, the backed-up data lags somewhat behind the system's live data.
  • Remote backup: backing up data to another geographic location to avoid data loss and service interruption caused by fire, natural disasters, theft, etc.
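
The difference between full, incremental, and differential backups can be sketched in a few lines of Python. This is only an illustration, not how any particular backup tool works: the FileInfo type, the select_files function, and the day numbers are all hypothetical, and the sketch assumes each backup runs at the end of its day.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    modified_day: int  # hypothetical: day on which the file was last changed


def select_files(files, method, last_full_day, last_backup_day):
    """Return the paths a backup run would copy under the given method."""
    if method == "full":
        # Everything, regardless of when it changed.
        return [f.path for f in files]
    if method == "incremental":
        # Only changes since the most recent backup of any kind.
        return [f.path for f in files if f.modified_day > last_backup_day]
    if method == "differential":
        # All changes since the most recent full backup.
        return [f.path for f in files if f.modified_day > last_full_day]
    raise ValueError(f"unknown method: {method}")


files = [FileInfo("a.txt", 1), FileInfo("b.txt", 2), FileInfo("c.txt", 3)]

# Day 3: the full backup ran on day 1, the latest backup of any kind on day 2.
incremental = select_files(files, "incremental", last_full_day=1, last_backup_day=2)
differential = select_files(files, "differential", last_full_day=1, last_backup_day=2)
print(incremental)   # ['c.txt']
print(differential)  # ['b.txt', 'c.txt']
```

On day 3 the incremental run copies only the day-3 change, while the differential run copies everything changed since the full backup. This is why restoring from incrementals needs every backup in the chain, whereas restoring from a differential needs only the full backup plus the latest differential.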

rsync in brief

Backing up the first partition of a server to its second partition is commonly known as a "local backup". Tools such as tar, dd, dump, and cp can all do this. But you shouldn't "put all of your eggs in the same basket": if the hardware fails and the server cannot start normally, the data still cannot be retrieved. To solve this problem with local backups, we introduce another kind of backup: "remote backup".

Some people will ask: can't I just use the tar or cp command on the first server and send the result to the second server via scp or sftp?

In a production environment, the amount of data is relatively large. First, tar or cp takes a long time and consumes system resources, and transmission via scp or sftp occupies a lot of network bandwidth, which is not acceptable in a real production environment. Second, these commands have to be entered manually by the administrator or combined with scheduled tasks in crontab. However, the right interval for crontab is hard to choose. If the interval is too short, for example once every minute, the second run may start before the first has finished. If the interval is too long, for example once every 5 hours, data may be lost because it was not backed up in time.
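
The overlapping-run problem with short crontab intervals is often handled with a lock file. The following is a minimal sketch of that idea, assuming a Unix-like system where Python's fcntl module is available. The lock path and the function names are hypothetical, and the backup step itself is only a placeholder.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file location, chosen only for this illustration.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "backup-demo.lock")


def try_acquire_lock(path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_backup_once(path=LOCK_PATH):
    """Run the placeholder backup only if no other run holds the lock."""
    lock = try_acquire_lock(path)
    if lock is None:
        return "skipped: previous run still in progress"
    try:
        # Placeholder for the real work, e.g. invoking tar, scp, or rsync.
        return "backup ran"
    finally:
        fcntl.flock(lock, fcntl.LOCK_UN)
        lock.close()
```

A script scheduled every minute could call run_backup_once() and simply exit when the previous run is still holding the lock. This avoids overlapping runs but does not address the other half of the problem: a long interval still leaves a window in which new data is not backed up.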

Therefore, a production environment needs a backup method that meets the following requirements:

  1. Backups transmitted over the network
  2. Real-time data file synchronization
  3. Less occupancy of system resources and higher efficiency

rsync emerged to meet these needs. It is a fast incremental backup tool released under the GNU General Public License. At the time of writing, the latest version is 3.2.3 (2020-08-06). You can visit the official website (https://rsync.samba.org/) for more information.

In terms of platform support, rsync runs on most Unix-like systems, whether GNU/Linux or BSD. Ports are also available for Windows, such as cwRsync.

rsync was originally maintained by the Australian programmer Andrew Tridgell (shown in Figure 1 below) and is now maintained by Wayne Davison (shown in Figure 2 below). You can go to the project's GitHub page to get the information you want.

Figure 1: Andrew Tridgell. Figure 2: Wayne Davison.

Attention!

rsync itself is only an incremental backup tool and cannot synchronize data in real time; for that, it must be combined with another program. In addition, synchronization is one-way. If you want two-way synchronization, you need another tool.

Basic Principles and Features

How does rsync achieve efficient one-way data synchronization backup?

The core of rsync is its checksum algorithm. If you are interested, see "How Rsync works" and "The rsync algorithm" for more information. The details are beyond the author's competence and are not covered further here.
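
To give a rough feel for the idea, here is a simplified sketch of a rolling weak checksum in the spirit of the one described in "The rsync algorithm". It is not rsync's actual source code: the function names and the 16-bit modulus are choices made for this illustration, and real rsync additionally pairs the weak checksum with a strong hash to confirm matches.

```python
MOD = 1 << 16  # keep both sums within 16 bits


def weak_checksum(block):
    """Compute the two sums (a, b) for a block of bytes from scratch."""
    a = sum(block) % MOD
    # Each byte is weighted by its distance from the end of the block.
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, outgoing, incoming, block_len):
    """Slide the window one byte to the right without rescanning the block."""
    a_new = (a - outgoing + incoming) % MOD
    b_new = (b - block_len * outgoing + a_new) % MOD
    return a_new, b_new


data = b"the quick brown fox jumps over the lazy dog"
n = 8  # window size, chosen arbitrarily for the demo

a, b = weak_checksum(data[0:n])
for k in range(1, len(data) - n + 1):
    # Drop the byte leaving the window, add the byte entering it.
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    assert (a, b) == weak_checksum(data[k:k + n])
```

The point of the rolling update is that the checksum at every byte offset costs a constant amount of work, which lets one side search a whole file for blocks the other side already has, so that only the differing data needs to be sent.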

The characteristics of rsync are:

  • An entire directory tree can be updated recursively;
  • File attributes such as hard links, soft links, owner, group, permissions, and modification time can be preserved, either all of them or only a selected subset;
  • Two protocols are supported for transmission: the ssh protocol and the rsync protocol.
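
As a rough illustration of the features above, the following sketch builds (but does not run) two rsync command lines, one for each transport. The host address, user name, paths, and the module name "share" are all hypothetical, as is the rsync_command helper. The -a (archive) and -H (hard links) options are standard rsync options.

```python
import shlex


def rsync_command(source, destination, keep_hard_links=False):
    """Build an rsync argument list; nothing is executed here."""
    # -a: archive mode, i.e. recurse and preserve symlinks, permissions,
    # modification times, group, and owner.
    args = ["rsync", "-a"]
    if keep_hard_links:
        # -H: also preserve hard links, which -a alone does not do.
        args.append("-H")
    return args + [source, destination]


# Over ssh: the destination has the form user@host:path.
over_ssh = rsync_command("/data/", "backup@192.0.2.10:/backup/data/")

# Over the rsync protocol: the destination is an rsync:// URL that names
# a module ("share" here) exported by an rsync daemon.
over_daemon = rsync_command(
    "/data/", "rsync://backup@192.0.2.10/share/data/", keep_hard_links=True
)

print(shlex.join(over_ssh))
print(shlex.join(over_daemon))
```

The printed commands could be pasted into a shell or passed to subprocess.run once real hosts and paths are substituted.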

Author: tianci li

Contributors: Steven Spencer