Tuesday, 5 November 2019

Mail Archiving - What It Is And Why You Need It

Backups are a familiar concept and have been around almost as long as there has been data stored on computers. Outside the computer world, the concept is even older. People have been making copies of important documents for centuries. More recently this has been accomplished through carbon paper and photocopying. But for almost as long as there has been writing, people have made multiple copies of important information to protect it from loss due to unforeseen circumstances, whether war, natural disaster or political power shifts. Computer backups serve a similar purpose: in the event of an unforeseen problem - a fire in the data center or a crashed hard disk - we simply cannot afford to lose important information, because recreating it, if it can be recreated at all, is time-consuming and expensive.

While everybody pretty much agrees backups are a good thing, it is important to be aware of their limitations. A backup is a snapshot of the data at the time the backup was taken. That is, whatever data was there at the time the backup was made is what you will find in it later, nothing more and nothing less. The purpose of a backup is also somewhat limited: to restore some or all data after an unforeseen event. In some cases, a crashed hard disk for example, you need to restore all data. In other cases, for example when a user inadvertently deletes something important, only that single item needs to be restored.

For practical reasons, there are limits on how often backups are made. Some systems cannot be backed up while they are running, so you want to limit the frequency of backups to avoid hurting usability. And some systems are only capable of "full" backups, i.e., you must copy all data, not just the data that is new to the system. This puts a strain on storage systems, since sooner or later you run out of disk space and have to start deleting old backups. It also tends to be wasteful, since a large percentage of the data in a backup will be the same as in many previous backups. If you doubt this statement, simply take a look at your Inbox and count the mail you received before today. If you are doing daily backups, each of those emails is in every single backup made since it arrived in your Inbox.

Some of this has been mitigated by some systems' ability to do "incremental" backups, where only new data is backed up. But again, the main purpose of backups is to be able to restore a system to a known state, even if that state is not identical to the state of the data at the time of the unforeseen event.
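To put rough numbers on the difference between full and incremental backups, here is a minimal sketch. It assumes a mailbox that only grows (nothing is ever deleted) and counts stored messages rather than bytes; the function names are made up for this illustration and do not come from any backup product.

```python
def full_backup_storage(daily_new_mails):
    """Messages stored in total when every daily backup copies the whole mailbox."""
    stored = 0
    mailbox_size = 0
    for new_mails in daily_new_mails:
        mailbox_size += new_mails   # mailbox grows, nothing is deleted (assumption)
        stored += mailbox_size      # a full backup copies everything again
    return stored


def incremental_backup_storage(daily_new_mails):
    """Messages stored in total when each backup copies only that day's new mail."""
    return sum(daily_new_mails)


# Hypothetical week: ten new mails arrive each day for five days.
arrivals = [10, 10, 10, 10, 10]
print(full_backup_storage(arrivals))         # 10 + 20 + 30 + 40 + 50 = 150
print(incremental_backup_storage(arrivals))  # 50
```

After five days the mailbox holds 50 messages, yet the full backups together hold 150 copies. The gap keeps widening the longer the mail stays in the Inbox.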

Mail archiving is an entirely different beast. A bit depends upon why you are archiving, which I'll get to in a minute, but an archive is not just a snapshot of the system; it is a cumulative view of the system. Every mail entering or leaving the system is put directly into the archive, and nothing is ever removed. So why, you are probably asking yourself, can't I just reassemble all of my old backups and get the same thing? The answer is that backups are made periodically, not constantly, because their purpose is to restore the system to a known state in the event of disaster. A mail could come into the system, be quickly deleted by the receiver, and never make it into a backup. An archiving solution, on the other hand, dutifully adds the mail to the archive because it serves a different purpose.
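The difference can be shown with a small simulation. This is only a sketch under simplifying assumptions: mails are represented by made-up IDs, a backup is a snapshot of the mailbox at one moment, and the archive records every mail the moment it arrives. The `replay` function and the event names are hypothetical and not taken from any real product.

```python
def replay(events):
    """Replay mailbox events; return (archive, union of all backup snapshots)."""
    mailbox = set()   # what the user currently has
    archive = set()   # cumulative: every mail ever received, never removed
    backups = []      # periodic snapshots of the mailbox

    for action, mail_id in events:
        if action == "receive":
            mailbox.add(mail_id)
            archive.add(mail_id)          # archived immediately on arrival
        elif action == "delete":
            mailbox.discard(mail_id)      # the archive is NOT touched
        elif action == "backup":
            backups.append(frozenset(mailbox))

    in_some_backup = set().union(*backups)
    return archive, in_some_backup


events = [
    ("receive", "mail-a"),
    ("backup", None),
    ("receive", "mail-b"),
    ("delete", "mail-b"),   # deleted before the next backup runs
    ("backup", None),
]
archive, in_some_backup = replay(events)
print(sorted(archive - in_some_backup))  # ['mail-b']
```

Even with every backup reassembled, `mail-b` is missing from them because it arrived and was deleted between two snapshots. The archive still has it.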

This leads us to why you would want to archive mail. There are three main reasons. The first is disk space management. Most messaging systems limit the amount of data individual users may store in the system. Once that limit is reached, they must remove some data. But it is not always desirable to simply delete the data. Users may have mail that they access infrequently but still need to keep for years. So rather than simply increasing users' space quotas (which, by the way, makes backups even larger) on expensive storage devices, an archiving solution allows them to save this mail on less expensive disks, normally compressed to save even more space, while still giving them access to the information when needed.

The second reason is laws and regulations with funny names like SoX and HIPAA. Some companies and organizations are required to store all correspondence, including e-mail, for a given period of time. They could store backups for the prescribed time period but, as discussed above, this does not necessarily fulfill the "all correspondence" requirement.

The last reason is as an aid in legal proceedings. In the same way courts may require a company to hand over paper documents to opposing counsel, electronic documents such as e-mail are also being requested in what is now called "e-discovery". And again, as with paper documents, the absence of a given e-mail can be used to prove that such a mail does not exist and never existed, provided the archiving system can be shown to contain everything sent or received by the system. And this is where an archive is superior to a backup. For example, a mail from an employee to the CEO warning about faults in a product could be deleted by both and never end up in a backup, so its absence from the backups proves nothing. With a complete archive, the absence of such a mail shows that it was never sent.

An additional problem with backups in an e-discovery context has been demonstrated in several high-profile lawsuits. Several companies have lost lawsuits simply because they could not perform e-discovery from backups within the court-appointed time frame. So even though they had backups going back years, even a decade, they could not load and search them fast enough to meet the deadline and lost the suits.

In summary, archiving solutions came into being to address the shortcomings of backups as the needs of companies and organizations changed. While there are some similarities between the two, they really provide solutions to two different problems. Backups serve the need to restore a system to a known state, whereas archives serve the need to store perhaps all data ever seen by the system for efficient retrieval later. Of course, it should be possible to provide one piece of software capable of serving both these needs. Still, the problems solved are somewhat contradictory, so you would end up with more complex software than you really need. At the same time, not everyone needs both solutions, so there are advantages to encapsulating the functionality in separate solutions. This encapsulation also gives each solution more freedom to evolve as needs evolve, which is usually a good thing.
