SAS Disaster Recovery Policy
Disaster-recovery planning is important for any critical business system, including production systems running the SAS® Intelligence Platform and SAS® solutions. SAS customers should and usually do have disaster-recovery plans for their SAS deployments, SAS applications, and SAS data files. Because the implementation of the SAS Intelligence Platform and SAS solutions is often highly customized and each customer can have different requirements for replicating SAS content, there is no single tool or process that comprehensively meets all of the SAS disaster-recovery needs.
This position statement assumes that you are familiar with the concepts in “Best Practices for Backing Up and Restoring Your SAS Content” in the SAS® 9.4 Intelligence Platform: System Administration Guide, Fourth Edition.
Disaster recovery is not the same as high availability. Though both concepts are related to business continuity, high availability is about providing undisrupted continuity of operations whereas disaster recovery involves some amount of downtime, typically measured in days. This position statement addresses disaster recovery. SAS recommends that disaster recovery be predicated on regular, full system-image backups or system clones, and that disaster recovery processes be validated on a regular basis. The following approaches reflect supported methods for recovering SAS content in the event of a disaster.
Note: The approaches stated below are not mutually exclusive. More than one supported approach might be required to meet your disaster-recovery objectives.
A. Full-System Backups (also known as Disk Cloning or Disk Imaging)
Use disk cloning or disk imaging of all disks to create and maintain a full-system backup or clone of the production operating environment. As necessary, supplement regularly scheduled disk cloning/imaging with scheduled file-system backups, third-party database backups, and other backup mechanisms supported by your operating-system vendor.
Note: Throughout this position statement, the word “clone” means an exact copy of, at a minimum, the following from the production system:
- The operating system
- The operating environment. For example: User Identifiers (UIDs) and Group Identifiers (GIDs), environment variables, mount points, kernel settings, Windows registry, and so on
- All user home directories
- All third-party applications
- The SAS deployment (SASHome and all SAS configuration directories)
- SAS data files
- External data files (relational database management systems [RDBMSs] and other non-SAS data files)
If the SAS production environment is virtual or container-enabled, you can maintain a full-system clone using a cloning process supported by your virtualization/container software provider. If the SAS production environment is on bare metal, use a disk-cloning or system-imaging technology that is supported and accepted by the operating-system vendor.
Note: A copy of ONLY the SAS deployment is NOT considered a full-system clone.
B. SAS Batch Import and SAS Batch Export Tools
This approach uses SAS promotion tools running in batch mode to back up metadata-based SAS content. Under this approach, packages are exported (“backed up”) from production and imported into the disaster-recovery environment. Any physical content for which a metadata definition exists or can be created can be backed up using this approach. Under this approach, the disaster-recovery environment does not need to be a clone of the SAS production environment.
C. SAS® Migration Utility Packages
This approach uses a migration package created from production to configure a new disaster-recovery system after a failure occurs. The SAS Migration Utility creates a package of metadata and SAS content necessary for the SAS® Deployment Wizard to configure a new disaster-recovery environment¹. SAS® 9.4 migration packages generally build quickly and without disruption to business processes. The SAS Migration Utility can be launched from a script that is scheduled to run in accordance with the Recovery Point Objective (RPO) that you have agreed on with your business. With respect to the SAS deployment, only the SAS installation² would exist on the disaster-recovery system. Under this approach, the disaster-recovery environment must have the same topology as the production system. However, it does not need to be a clone of the SAS production system.
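As a minimal sketch of the scheduling idea above, the following Python fragment decides whether a new migration package is due, given the time of the last successful build and the RPO. The function names are hypothetical, and the command that launches the SAS Migration Utility is deliberately left as a caller-supplied argument, because the exact invocation depends on your site and is not specified in this position statement.

```python
from datetime import datetime, timedelta, timezone
import subprocess
from typing import Optional, Sequence


def package_is_stale(last_built: Optional[datetime], now: datetime, rpo: timedelta) -> bool:
    """Return True when no package exists or the newest one is older than the RPO."""
    if last_built is None:
        return True
    return now - last_built > rpo


def build_package_if_needed(
    last_built: Optional[datetime],
    rpo: timedelta,
    command: Sequence[str],
    now: Optional[datetime] = None,
) -> bool:
    """Run the caller-supplied command only when the current package is stale.

    `command` is an assumption: it stands for whatever invocation your site
    uses to launch the SAS Migration Utility. Returns True if it was run.
    """
    now = now or datetime.now(timezone.utc)
    if not package_is_stale(last_built, now, rpo):
        return False
    # check=True raises CalledProcessError if the command exits with a nonzero status.
    subprocess.run(list(command), check=True)
    return True
```

In practice, the scheduler interval would likely need to be shorter than the RPO so that a failed run can be retried before the objective is missed.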
- Each approach has its own limitations. Several considerations for Option A are included in this position statement. Limitations of the other options are discussed in more detail in the documentation linked under each approach.
- Some SAS solutions use SAS data sets as the data store necessary to sync metadata content with content stored in third-party data providers. To be backed up, these data sets must be unlocked at the time the backup occurs. Alternatively, customers might choose to use industry-standard replication software that offers synchronous data-replication options³.
- The SAS® Content Server, located on the SAS middle tier, maintains a lock when it is running. File-system backups created against a locked content server cannot be recovered. Note: The SAS Migration Utility and the Deployment Backup and Recovery Tool are not affected by this lock because they use the JCR specification to create a backup of the SAS Content Server’s Java Content Repository.
- Disaster-recovery strategies that make use of the Deployment Backup and Recovery Tool are supported only if the disaster-recovery environment is a clone of the production system as discussed in Option A above.
The tool was originally designed to recover specific SAS and user content onto the exact same hosts. It assumes that nothing other than user-related content has changed between the time of the last backup and the time of the recovery. For example, a simple change to the SAS® Web Infrastructure Platform Database password between backup and recovery causes the tool to fail to recover content. Therefore, to use the Deployment Backup and Recovery Tool as part of a disaster-recovery approach, the disaster-recovery operating environment must be a clone of the production system at all times. In addition, the machine names of the disaster-recovery system must match the machine names used throughout the deployment on the production system. These machine names must remain unchanged until the recovery has been validated on the target. The use of canonical names or aliases as the machine names is supported.
- When canonical names or aliases are used, it is important to use Domain Name System (DNS) names rather than local host names throughout the SAS environment. This includes the SAS Deployment Wizard, the SAS® Deployment Manager, SAS® Management Console, and any manual edits to configuration files.
- Using tools like sas-backup-metadata to back up metadata on production and then recover it onto the disaster-recovery system is supported only under Option A. Note: The Deployment Backup and Recovery Tool also calls these routines.
- When considering Option A, your business is responsible for ensuring that the operating environment of the disaster-recovery system is a clone of production at all times. Subtle differences between the disaster-recovery system and the production environment will cause either the recovery process to fail or the deployment tooling to be unable to maintain, administer, and back up the recovered system. If the administration tooling does not function on the disaster-recovery environment, you will not be able to recover your users back to a new production system when it becomes available. Examples of subtle differences that will result in failures to recover or execute include:
- deltas in the configuration of the kernel security modules (Example: SELinux, Solaris Trusted Extensions)
- deltas in Discretionary Access Controls (DACs)
- deltas in the way file systems such as tmp are mounted (Example: noexec)
- differences in UIDs/GIDs of any SAS accounts (or user accounts)
- different drive-letter mappings on Windows operating systems
- different operating-system patch-levels
- different environment variables
- different PATH variables
- different versions or locations of Java
- different versions, configurations, or locations of Java or third-party products and drivers used by SAS or SAS users
- Copying only the SAS deployment (the SASHome or SAS Configuration directories) from one system to another is not supported. In this scenario, SAS Deployment Manager tasks cannot properly manage and maintain the deployment and the SAS Deployment Agent (which is necessary to perform future backups and middle-tier clustering). Copying the SAS deployment as part of Option A is supported as long as the disaster-recovery system remains a clone of the production system.
- You are responsible for ensuring that you have the appropriate processes, resources, holistic-system backups, skills, and competencies to guarantee your success in disaster-recovery activities. Recovery processes should be exercised and validated on a regular basis. Validation should include ensuring the following:
- deployment-tooling functions are successful
- backups via the Deployment Backup and Recovery Tool are successful
- SAS Deployment Manager updates passwords, rebuilds and redeploys web apps, updates certificates, and so on
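The subtle differences listed above (environment variables, PATH, UIDs/GIDs, patch levels, and so on) lend themselves to an automated comparison. The following is a minimal sketch, not a SAS-provided tool: it assumes that you capture a flat key/value snapshot on each host and then compare the two. The key naming scheme (`env:`, `os:`) and both function names are hypothetical, and the snapshot shown here is far from exhaustive.

```python
import os
import platform
from typing import Dict, List, Mapping, Optional


def capture_snapshot() -> Dict[str, str]:
    """Collect a few comparable facts about the current host (illustrative only)."""
    snapshot = {f"env:{name}": value for name, value in os.environ.items()}
    snapshot["os:release"] = platform.release()
    return snapshot


def diff_snapshots(prod: Mapping[str, str], dr: Mapping[str, str]) -> List[str]:
    """Return one finding per key whose value differs or exists on only one side."""
    findings: List[str] = []
    for key in sorted(set(prod) | set(dr)):
        prod_value: Optional[str] = prod.get(key)
        dr_value: Optional[str] = dr.get(key)
        if prod_value != dr_value:
            findings.append(f"{key}: production={prod_value!r} dr={dr_value!r}")
    return findings
```

A snapshot taken on production could be saved (for example, as JSON) and compared against one taken on the disaster-recovery host as part of each validation exercise. An empty result from `diff_snapshots` indicates only that the captured items match, not that the systems are true clones.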
A disaster-recovery plan for SAS environments needs to incorporate disaster-recovery procedures for the external systems and processes that SAS uses or depends upon. These external systems and processes might be as simple as a DNS server address or more complex such as another production application exchanging data with SAS processes. Additional considerations include:
- External customer data: Customer data is often located in databases or network-file systems that are external to the machines hosting the SAS deployment. Because SAS does not provide tools to back up or restore such data, you must consider data backups as part of your disaster-recovery plan. For example, SAS® Customer Intelligence solutions deliver a SAS Customer Intelligence Common Data Model that is created in the customer’s preferred third-party database. To successfully recover SAS Customer Intelligence solutions, the data needs to be recovered to the same point in time as the SAS system, so you must synchronize these data backups with SAS system-image backups.
- External systems and processes: SAS deployments frequently interact with other systems and processes, such as those described above. SAS does not provide tools to address such external systems and processes, so their recovery procedures must be part of your disaster-recovery plan.
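The requirement that external data be recovered to the same point in time as the SAS system can be checked roughly by comparing backup timestamps. The sketch below assumes that you record a completion time for the system-image backup and for each external data backup. The function name, the backup names, and the tolerance are hypothetical. Timestamp proximity is only a proxy for point-in-time consistency.

```python
from datetime import datetime, timedelta
from typing import List, Mapping


def unsynchronized_backups(
    image_time: datetime,
    data_backups: Mapping[str, datetime],
    tolerance: timedelta,
) -> List[str]:
    """Return names of data backups taken more than `tolerance` away from the system image."""
    return sorted(
        name
        for name, taken_at in data_backups.items()
        if abs(taken_at - image_time) > tolerance
    )
```

For example, with a system image taken at 02:00 and a 30-minute tolerance, a database backup taken at 02:10 would pass and one taken at 05:00 would be reported.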
Solely relying on system imaging is a sufficient disaster-recovery plan for many customers; however, this plan depends on the disaster-recovery system being a clone of the production system. When a clone cannot be achieved, additional options are available. Each approach has its limitations and more than one supported approach might be required to meet the RPO and Recovery Time Objective (RTO) of your business. Your disaster-recovery procedures should be well documented, carefully validated, and exercised on a regular basis.
1 Many SAS solutions require additional, manual configuration steps specific to your environment. Work with your SAS Account Team to ensure that you have documented any additional “post-migration” steps that you will need to perform.
2 SAS installation means only the SAS installation binaries found under SASHome. It specifically does not include any content under the SAS configuration directory.
3 SAS does not make recommendations, promote, or provide support for the usage of third-party replication software or disk-imaging, disk-cloning, or virtualization software tools.