Safeguarding Critical Data
Online backup and recovery protects National Weather Service operations from disaster
As an agency of the National Oceanic and Atmospheric Administration (NOAA), the U.S. National Weather Service (NWS) provides weather, hydrologic and climate forecasts and warnings for the United States, its territories, adjacent waters and ocean areas. Sophisticated computer models and high-speed communications systems are used to generate data, outlooks, forecasts and warnings. Because the NWS is the sole official voice for issuing U.S. warnings during life-threatening weather situations, business continuity is of paramount importance to achieving its mission.
NWS data and products also form a national information database and infrastructure that is used by other governmental agencies, the private sector, the public and the global community. Sending and receiving close to 400,000 weather bulletins each day, the NWS maintains the largest meteorological telecommunications switching center in the world. This data originates from weather offices around the country.
The NWS Southern Region encompasses one-quarter of the contiguous U.S. land area and is home to the world’s most active weather. Nearly 1,000 NWS Southern Region field employees work in 32 regional forecast offices, four river forecast centers, seven Center Weather Service Units, the Spaceflight Meteorology Group, the FAA Academy, weather service offices and regional headquarters. The headquarters location provides vital around-the-clock operational support to its field offices and manages programs, scientific enhancements, staffing and an annual budget of more than $100 million. Staff members oversee a multitude of technological developments and implementations, meteorological and hydrologic programs, quality control of field services and operational prioritization.
Given the nature of its operations, the NWS knows better than most just how devastating a natural disaster can be to business operations and communications. With that in mind, the Southern Region’s technical team identified several shortcomings in its backup program that needed to be addressed in order to safeguard critical data. This data includes observational weather data managed in a custom Oracle-based application, such as data collected from weather radars and satellites, marine observations from data buoys and readings from surface observing systems that help the aviation industry. E-mail stores and financial information must also be protected, and because nearly one-third of users run entirely from laptops, it is important to provide for backup of their data as well.
The first shortcoming that the team identified was the amount of time required to recover lost data. The region’s system backups were made at its Fort Worth, TX, headquarters location, with tapes rotating offsite to an operational office just to the north of the facility. With data residing offline, recovery in the event of disaster or loss could not be completed faster than several hours or, in some cases, several days.
The team also wanted to reduce the amount of time required each day to deal with the system. In the past, one person spent a portion of every day working with the tape backups. It took at least an hour each day to change the autoloader, sequence the next system, check the tapes and validate the file system to make sure data could actually be restored from the tapes. Even then, they could not count on having a full, validated backup — if an overnight backup exceeded the capacity of the autoloader, someone had to change out the autoloader, finish the previous night’s backup and re-set it for that night’s backup when he or she arrived at the office the next morning.
There was also considerable overhead and maintenance, such as cleaning heads, associated with the tape-based backup system. Ten years ago, the system cost more than $175,000. Additionally, the NWS Southern Region was spending close to $1,000 per month for maintenance.
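To put those two figures side by side, the arithmetic can be sketched as follows. This is only a rough illustration: it assumes the maintenance rate held steady at about $1,000 per month for the full ten years, which the figures above do not confirm, and it treats the rounded purchase price as a lower bound.

```python
# Rough ten-year cost of the tape system, using the rounded figures quoted above.
PURCHASE_COST = 175_000      # "more than $175,000", so this is a lower bound
MONTHLY_MAINTENANCE = 1_000  # "close to $1,000 per month", approximate
YEARS = 10                   # assumption: maintenance paid at this rate for ten years


def tape_cost(purchase, monthly, years):
    """Purchase price plus cumulative maintenance over the period."""
    return purchase + monthly * 12 * years


total = tape_cost(PURCHASE_COST, MONTHLY_MAINTENANCE, YEARS)
print(f"Approximate ten-year tape cost: ${total:,}")
```

Under those assumptions, maintenance alone adds roughly two-thirds of the purchase price again, before counting the daily staff time described earlier.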
Finally, security of the physical tapes and the information residing on them was becoming increasingly important, both with respect to the security aspects of regulatory requirements and the implementation of data retention policies.
The National Weather Service’s IT team evaluated proposals from multiple vendors and service providers in order to identify the most comprehensive solution that would safeguard the agency’s critical data. A major obstacle during the evaluation of several solutions turned out to be the backup and recovery service for the agency’s Netscape-based mail system. Since the Southern Region hosts mail services for nearly 1,000 users working in 41 offices, it’s a critical system, particularly for those employees who travel extensively and maintain a large amount of mail on the central server. The technical team found that many vendors considered the Netscape system ‘non-standard,’ and either would not support it or proposed additional customization fees that negatively impacted the total cost of ownership.
After reviewing all of the proposals, the agency ultimately chose an online backup and recovery system from DS3 DataVaulting. The selected service is based on Asigra agentless software and enables automated, daily backups of critical data residing on multiple platforms ranging from HP-UX, Linux and Sun Solaris servers to Windows XP and Windows 2003 systems. In one step, data is automatically encrypted, backed up and vaulted in remote, fault-tolerant data centers. The service is also portable: any data can be restored to any user, anywhere. Another attractive feature is that the system captures software versions to ensure restoration of old files, so there is no concern about aging technology and tape formats that might not be readable in the future. The cost of this online solution was one-third lower than the nearest competitor, even with the additional functionality of disk-to-disk (D2D) backup and recovery. This would be an important consideration for any business, but was particularly relevant for the very cost-conscious government agency.
Today, the NWS Southern Region can recover data within hours, sometimes minutes. Also, because the service can restore to anywhere with an Internet connection, business is protected even in the event of a wide-impact disaster or a catastrophic loss of the facility. There is no longer any worry about retrieving tapes from a remote location that might also have suffered major damage.
Recently, a National Weather Service user lost a hard drive. The online service allowed him to be back up and running, with all data restored, in less than four hours. In the past, the recovery process would at best have taken five to six hours even if the backup was onsite — longer if the tape had already rotated offsite.
The new automated e-mail backup process is performing at least 10 times faster than what was achieved with tape. This improvement is particularly important for the e-mail system because it never really stops, so quick backups mean minimal interference to users. In addition, the Asigra message-level restore technology allows NWS system administrators to be very granular, picking up and restoring a single e-mail for a single user at any location.
One user returned from vacation after the new system was in place to find that several important e-mail messages were missing, but was not sure of their exact content. Previously, the administrator would have had to retrieve the correct backup tape, load it (and possibly interrupt current processes), search through 1,000 e-mail accounts, identify and pull the right files, load them back into the server and restore them to the user. With the new system, the administrator was able to go directly to the backup and retrieve only the missing messages.
With the old system, the administrator would have to manually go to the systems and check the logs to determine the status of the backup. With the new system, the application generates e-mail daily to those who need to know what is happening with the backups. An added advantage with this type of notification is that, if the system fails over a weekend or holiday, the administrators will be notified right away, and are able to take immediate corrective actions instead of having a surprise waiting for them when they come back to work.
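The daily notification described above can be pictured with a short sketch. This is not the Asigra or DS3 DataVaulting implementation, whose internals are not described here; the `BackupJob` record, the `build_status_email` helper, the host names, the date and the addresses are all hypothetical, and only Python's standard `email` library is used.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class BackupJob:
    """Hypothetical record of one night's backup for one system."""
    host: str
    succeeded: bool
    detail: str = ""


def build_status_email(jobs, sender, recipients, date_label):
    """Summarize the night's jobs in one message, flagging failures in the subject."""
    failed = [job for job in jobs if not job.succeeded]
    status = "FAILURES" if failed else "OK"

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = (
        f"Backup status {date_label}: {status} "
        f"({len(failed)} of {len(jobs)} failed)"
    )

    # One line per system so a failure is visible at a glance.
    lines = []
    for job in jobs:
        flag = "ok" if job.succeeded else "FAIL"
        lines.append(f"{flag:<5}{job.host} {job.detail}".rstrip())
    msg.set_content("\n".join(lines))
    return msg


# Illustrative data only; none of these names come from the NWS environment.
jobs = [
    BackupJob("mail-server", True),
    BackupJob("oracle-db", False, "connection timed out"),
]
report = build_status_email(
    jobs, "backup@example.org", ["admin@example.org"], "2024-01-01"
)
print(report["Subject"])
```

A message built this way could be handed to `smtplib.SMTP(...).send_message(report)` by a scheduled job. Putting the failure count in the subject line is one way to make a weekend or holiday failure stand out without anyone opening the logs.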
Finally, the online backup and recovery system enables password protection and 256-bit AES in-flight and at-rest encryption for added security in both data transfer and storage. These features enable the NWS to address the security aspects of regulatory requirements, and also to more easily implement data retention policies.
The National Weather Service Southern Region’s new online backup and recovery system has addressed all of the shortcomings that were originally identified by its technical team, allowing the agency to maintain business operations and to recover lost data more quickly. The ability to restore data to anywhere with an Internet connection meets the requirements of the office’s Continuity of Operations Plan (COOP), which was something the tape-based system could not do in a timely manner. This, along with other features, makes online data backup a more viable solution for dynamic data storage and retrieval for an enterprise.
Also, since the service enables scalability, the agency will be able to add capacity to address future growth as it is needed, or to enhance the speed and flexibility of the recovery system by adding multiple versions of backup data.
Mario Valverde is Chief of the Systems Integration Branch for the National Weather Service’s Southern Region. He may be reached at firstname.lastname@example.org