Distributed Network Backups

One of the major problems we all run into is data storage. This has become much less of a problem on the client PC end: these days, most client machines have massive amounts of unused storage space sitting on their hard drives. That isn’t typically the case for servers, however. So you have an inverse problem where users who have tons of available space on their drives are all backing up their data, and, errr, music files, and, um, video files to a server that has little room. It’s enough to make any admin tear their hair out. Distributed backups may remedy that situation, sort of. Below you will find a number of different solutions, some of which may be suitable for your situation. Just remember that some of these solutions use networks outside of your LAN environment, which means that security and redundancy become real issues. Take that into consideration.

iFolder – This is a released software package from Novell. “iFolder is a simple and secure storage solution that can increase your productivity by enabling you to back up, access and manage your personal files - from anywhere, at any time. Once you have installed iFolder, you simply save your files locally - as you have always done - and iFolder automatically updates the files on a network server and delivers them to the other machines you use.” This statement is a bit misleading, though: iFolder can actually work in two modes. It can sync to a central server, or it can work in a peer-to-peer fashion without one. Both modes are accessible from outside the network.

FolderShare – This is Microsoft’s folder sharing utility. “FolderShare™ is a service that allows you to securely keep files synchronized between your devices, share files with friends or colleagues, and remotely download your files from any web browser. FolderShare consists of two components – My FolderShare and the FolderShare Satellite.” At first I thought this was an online syncing program to a central repository. But on closer inspection, it seems to be an online syncing program between two or more devices. I like that better.

Unison – Unison is a file-synchronization tool for Unix and Windows. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other.

Unison shares a number of features with tools such as configuration management packages (CVS, PRCS, Subversion, BitKeeper, etc.), distributed filesystems (Coda, etc.), uni-directional mirroring utilities (rsync, etc.), and other synchronizers (Intellisync, Reconcile, etc). However, there are several points where it differs:

  1. Unison runs on both Windows and many flavors of Unix (Solaris, Linux, OS X, etc.) systems. Moreover, Unison works across platforms, allowing you to synchronize a Windows laptop with a Unix server, for example.
  2. Unlike simple mirroring or backup utilities, Unison can deal with updates to both replicas of a distributed directory structure. Updates that do not conflict are propagated automatically. Conflicting updates are detected and displayed.
  3. Unlike a distributed filesystem, Unison is a user-level program: there is no need to modify the kernel or to have superuser privileges on either host.
  4. Unison works between any pair of machines connected to the internet, communicating over either a direct socket link or tunneling over an encrypted ssh connection. It is careful with network bandwidth, and runs well over slow links such as PPP connections. Transfers of small updates to large files are optimized using a compression protocol similar to rsync.
  5. Unison is resilient to failure. It is careful to leave the replicas and its own private structures in a sensible state at all times, even in case of abnormal termination or communication failures.
  6. Unison has a clear and precise specification.
  7. Unison is free; full source code is available under the GNU General Public License.
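
To make point 2 a little more concrete, here is a minimal Python sketch of the general idea behind two-way reconciliation: compare each replica against the state recorded at the last sync, propagate changes made on one side only, and flag paths changed on both sides. This is not Unison's actual algorithm or code. The function name `reconcile` and the path-to-content-hash dictionaries are assumptions made for illustration.

```python
def reconcile(archive, replica_a, replica_b):
    """Classify paths by comparing both replicas with the last synced state.

    Each argument is a hypothetical dict mapping a path to a content hash.
    A path missing from a dict is treated as absent (for example, deleted).
    """
    to_b, to_a, conflicts = [], [], []
    for path in sorted(set(archive) | set(replica_a) | set(replica_b)):
        old = archive.get(path)
        a = replica_a.get(path)
        b = replica_b.get(path)
        if a == b:
            continue  # replicas already agree, nothing to do
        if a != old and b == old:
            to_b.append(path)       # changed only on A: copy A -> B
        elif b != old and a == old:
            to_a.append(path)       # changed only on B: copy B -> A
        else:
            conflicts.append(path)  # changed differently on both sides
    return to_b, to_a, conflicts
```

The lists `to_b` and `to_a` hold the non-conflicting updates that could be propagated automatically, while `conflicts` holds the paths a user would have to resolve by hand.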

DIBS – “Since disk drives are cheap, backup should be cheap too. Of course it does not help to mirror your data by adding more disks to your own computer because a virus, fire, flood, power surge, robbery, etc. could still wipe out your local data center. Instead, you should give your files to peers (and in return store their files) so that if a catastrophe strikes your area, you can recover data from surviving peers. The Distributed Internet Backup System (DIBS) is designed to implement this vision. Note that DIBS is a backup system, not a file sharing system like Napster, Gnutella, Kazaa, etc. In fact, DIBS encrypts all data transmissions so that the peers you exchange files with cannot access your data.”

Features

Automated Backup

After initial configuration, DIBS is designed to run in the background and automatically back up desired data. Specifically, any files, directories, or links placed in the DIBS auto backup directory (usually ~/.dibs/autoBackup) are periodically examined by DIBS and sent to peers for backup. If the data changes, DIBS automatically unstores old versions and backs up the changes.

Incremental Backup

DIBS performs incremental backup. Specifically, if DIBS is asked to back up a file (either automatically or by the user), and DIBS determines the file is already backed up and unchanged, DIBS does not back it up again. This allows you to efficiently back up large numbers of files without wasting bandwidth by repeatedly backing up unchanged data.
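
The description above does not say how DIBS decides that a file is unchanged. One common approach, sketched here in Python purely as an assumption, is to keep a manifest of content digests and skip any file whose digest matches the stored one. The names `file_digest`, `files_needing_backup`, and the `manifest` dictionary are hypothetical and are not part of DIBS.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def files_needing_backup(paths, manifest):
    """Return the paths that are new or changed since the last run.

    `manifest` maps a path string to the digest recorded at the last backup
    and is updated in place for every path that is returned.
    """
    changed = []
    for path in paths:
        digest = file_digest(path)
        if manifest.get(str(path)) != digest:
            changed.append(path)
            manifest[str(path)] = digest
    return changed
```

Calling `files_needing_backup` twice in a row with the same manifest returns an empty list the second time, which is the bandwidth saving the paragraph describes.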

Security

DIBS uses GNU Privacy Guard (GPG) to encrypt and digitally sign all transactions. Thus you can be confident that even though you are sending your files to others for backup, your data will remain private. Furthermore, by using digital signatures, DIBS prevents others from impersonating you to store files with your peers.

Robustness

DIBS uses Reed-Solomon codes (a type of erasure correcting code similar to those used in RAID systems) to gain the maximum robustness for a given amount of redundancy. See the DIBS FAQ for a description of the benefits of Reed-Solomon codes.
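
A full Reed-Solomon implementation is too long to show here, so the Python sketch below swaps in the simplest erasure code instead: a single XOR parity piece, as in RAID 5. It is not what DIBS does. It only illustrates the idea of spreading a file across peers so that it survives a lost piece, and it tolerates exactly one loss, where Reed-Solomon can be configured to tolerate several. The function names `split_with_parity` and `recover` are made up for this example.

```python
def split_with_parity(data: bytes, k: int):
    """Split data into k equal-size chunks (zero-padded) plus one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for chunk in chunks:
        parity = bytes(x ^ y for x, y in zip(parity, chunk))
    return chunks + [parity]


def recover(pieces, original_len):
    """Rebuild the original data when at most one piece is missing (None)."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost piece")
    pieces = list(pieces)  # do not modify the caller's list
    if missing:
        size = len(next(p for p in pieces if p is not None))
        rebuilt = bytes(size)
        # XOR of all surviving pieces equals the missing piece.
        for p in pieces:
            if p is not None:
                rebuilt = bytes(x ^ y for x, y in zip(rebuilt, p))
        pieces[missing[0]] = rebuilt
    # Drop the parity piece and the zero padding.
    return b"".join(pieces[:-1])[:original_len]
```

With `k = 4`, each of five peers would hold one piece, and the file can be rebuilt if any single peer disappears, at a storage overhead of 25 percent.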

Flexible Communication Modes

Since peers can have varying levels of connectivity to the network, DIBS offers different communication methods to support a variety of users.

  • active: In active mode, the sender directly connects to another peer over the network to transfer files and messages. This is the preferred mode for peers who are almost always directly connected to the network.
  • passive: In passive mode, the sender stores messages in a local queue and delivers them to the receiver whenever the receiver initiates a connection. This mode is required when the receiver is behind a firewall (and cannot be contacted directly) or is not always connected to the network.
  • mail: If both the sender and receiver are behind firewalls or only occasionally connected to the network, direct connections between sender and receiver are not possible. However, the DIBS protocol can be used over email. The main drawback of mail mode is that your mail provider may get upset if you send large amounts of email.
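
The choice between the three modes can be read as a small decision rule. The Python sketch below is only my reading of the descriptions above, not DIBS code or configuration. The function `choose_mode` and its two flags are hypothetical, where "reachable" means the peer is usually online and accepts incoming connections.

```python
def choose_mode(sender_reachable: bool, receiver_reachable: bool) -> str:
    """Pick a communication mode from the reachability of each peer."""
    if receiver_reachable:
        # The sender can connect straight to the receiver.
        return "active"
    if sender_reachable:
        # The sender queues messages; the receiver connects out to fetch them.
        return "passive"
    # Neither side can accept a connection, so fall back to email.
    return "mail"
```

For example, a desktop behind a firewall backing up to an always-on, reachable peer would use active mode for sending, while that peer would need passive mode to send anything back.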

P2PBackup – “A P2P system for backup. The concept is that you choose which data that will be back upped and share space on your own hard drive for others to put their backups on. The great thing about P2P is that you can spread out the backup on many computers, and if you are the paranoid type you could let many copies of the data exist, just in case some peer happens to be offline. The amount of data you can have backup on is of course proportional to the amount you let other use on your own computer for their backup, if you share 100GB you can have 5 x 20GB back upped data on the other computers in the system. The good thing is that you can decide how safe you want to be.”
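
One way to read the 100GB example is that the space you share is divided by the number of copies you want kept. The small Python sketch below works through that arithmetic, assuming a one-to-one trade between space shared and space used. That ratio and the function name `backup_allowance_gb` are my assumptions and are not taken from P2PBackup.

```python
def backup_allowance_gb(shared_gb: float, copies: int) -> float:
    """Amount of unique data you can back up when each byte is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return shared_gb / copies


# Sharing 100GB and asking for 5 copies leaves room for 20GB of your own data.
allowance = backup_allowance_gb(100, 5)
```

Dropping to a single copy would let you back up the full 100GB, at the cost of losing access whenever the one peer holding your data is offline.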