
Successful Migration

Q&A with a SAS®9 Early Adopter

In this case study, SAS R&D director David Shamlin interviews SAS software manager Don Williams about his migration experience. Williams was the project leader for a migration of the internal defect tracking system at SAS Institute.

The migration took place during the development of SAS 9. This is a standard SAS practice that uncovers bugs under real-life usage. SAS management defines a point during development when the software is stable enough to put into production, but developers still have plenty of time to fix any bugs that are found.


Objectives and Scope of This Document

Numerous planning ideas and best practices are examined. Although the application was migrated from SAS 8.2 to SAS 9.0, the information is applicable to all releases of SAS 9.

The application was migrated not just to a later release of SAS, but also to a newer operating system version and to 64-bit hardware. But regardless of the project's complexity, any migration team can benefit from reading about Williams's experience.

Note that when SAS data libraries were migrated, the team did not have access to the new MIGRATE procedure with validation tools, which is available beginning in SAS 9.1.


Description of the DEFECTS Application

Shamlin: What is the purpose of the application?

Williams: The DEFECTS system is essentially a transaction-based system that employs a variety of technologies to enable SAS staff to enter, track, and matriculate defects (bugs) and other quality issues discovered in our SAS software products, documentation, and internal tools.

Shamlin: What SAS products are used?

Williams: Base SAS, SAS/SHARE, SAS/AF, and SAS/IntrNet.

Shamlin: What third-party products are used?

Williams: Apache Tomcat Web servers and containers. Perl and Korn shell scripting languages.

Shamlin: Organizationally who are the users, and how large is the user base?

Williams: This global system is utilized by R&D, Sales and Marketing, Publications, and Technical Support staffs. The system has roughly 2000 registered users, most of whom communicate with the system via the company’s intranet.

Shamlin: Quantify the data volume and client usage load.

Williams: It’s rare to find a minute when users aren't accessing our data. We have one of the most open environments you can imagine. We have to; it touches too many parts of our company not to. Users access our data from around the world, 24 hours a day, seven days a week.

This translates to thousands of transactions per day, including approximately 200 adds, 700 updates, and 4,000-5,000 signals, plus probably tens of thousands of queries.

Shamlin: What are typical work flows/usage scenarios?

Williams: Connections to the data from this system include access through traditional SAS sessions (DATA steps and SAS procedures), and from SQL and JDBC. Connections are made across multiple hardware/operating system configurations and across multiple versions of SAS. Because of the system's role in our R&D processes and the SAS affiliated offices throughout the world, the data is almost constantly being updated or queried.

As noted above, users add and update defects through our company’s intranet. Transactions are generated and passed to a SAS/SCL process that executes as a server process. This SCL process polls for transactions, and when transactions arrive in an input queue they are evaluated, validated, and then processed. Once a transaction has been applied, it is passed to one of a set of signal processes. These are again SAS/SCL processes executing as servers: they poll for transactions from an input queue (in this case, a SAS catalog library). When a transaction is detected, the signal processor compares the transaction to a SAS data set of signal subscriptions. For every user whose signal subscription matches the transaction, an e-mail signal is sent. This mechanism notifies the appropriate interested parties whenever a defect is added or updated.

Many reports and queries against this data are run each day, and most of them are real-time and Web-based. Frequently, a user’s browser sends a request to a SAS/IntrNet application server, which returns an HTML page for the browser to render. Multiple SAS products and procedures are used to generate these Web pages.
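The signal-matching step described above can be sketched in a few lines. The production processes are SAS/SCL servers polling a SAS catalog library; this Python sketch only illustrates the matching logic. The `Transaction` and `Subscription` fields, the product-only matching rule, and the in-memory `Queue` standing in for the catalog-library input queue are hypothetical assumptions, not the actual DEFECTS design.

```python
from dataclasses import dataclass
from queue import Empty, Queue


@dataclass(frozen=True)
class Transaction:
    defect_id: str
    action: str    # "add" or "update"
    product: str


@dataclass(frozen=True)
class Subscription:
    email: str
    product: str   # hypothetical rule: notify on any transaction for this product


def matches(sub: Subscription, txn: Transaction) -> bool:
    """Return True if the subscription applies to the transaction."""
    return sub.product == txn.product


def process_signals(inbox: Queue, subscriptions: list) -> list:
    """Drain the input queue and return (email, defect_id) pairs to notify.

    A long-running server would sleep and poll again when the queue is
    empty; this sketch simply stops so that it terminates.
    """
    notifications = []
    while True:
        try:
            txn = inbox.get_nowait()
        except Empty:
            break
        # Compare the transaction with every subscription, as the signal
        # processor compares it with the subscriptions data set.
        for sub in subscriptions:
            if matches(sub, txn):
                notifications.append((sub.email, txn.defect_id))
    return notifications


if __name__ == "__main__":
    queue = Queue()
    queue.put(Transaction("D1", "add", "BASE"))
    subs = [Subscription("dev@example.com", "BASE")]
    for email, defect_id in process_signals(queue, subs):
        print(f"would e-mail {email} about {defect_id}")
```

Keeping the match rule in a single `matches` function means richer subscription criteria (component, severity, and so on) could be added without touching the polling loop.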


Illustration of the DEFECTS Workflow

The DEFECTS staff provided the following illustration of the application's workflow.

Illustration of the DEFECTS application's workflow


Migration Specifics

Shamlin: Describe the source and target environments involved in the migration.

Williams:

Shamlin: Can you quantify the SAS files and third-party files that were moved to the target?

Williams: In terms of SAS libraries, we use about 10 GB of disk space. We use 6 GB for third-party files. We have five SAS libraries, hundreds of SAS data sets, several hundred thousand SAS catalogs, and millions of SLIST catalog entries. We have a few hundred thousand native files (for example, test programs to replicate the defect, screen captures).


Initial Objectives for Migration

Shamlin: Why did you upgrade to SAS 9?

Williams: We purposefully migrated DEFECTS to pre-production SAS 9 software in order to validate the robustness of SAS. Moving DEFECTS to the development release during final development stages identifies core problems in the software related to high-demand applications that stress a large number of SAS components. This is an excellent test of the software at a point early enough in the development cycle when we can flush out bugs and fix them before the release.

Shamlin: What was your schedule?

Williams: There were 3-4 months of modest planning and then, after we secured management approval and backing, a couple of weeks of incremental migration. The schedule for the migration took into account some pre-scheduled server outages, so we would have the option of troubleshooting during the outages, or the migration team could have downtime.

Shamlin: How did you budget for migration?

Williams: When thinking about what it would cost to complete the migration, I had to imagine the tasks involved, who would have to tend to each, and approximately how long it would take (i.e., typical project planning cost assessments). You can't predict how many errors you will encounter or how much effort each will take to resolve, but you do need to allow for such events in your planning.


Migration Team Roles

Shamlin: Organizationally, who did you expect to be involved in the work?

Williams: Staff for the project included a core team of four roles:

Shamlin: Did you and your team have experience with previous SAS upgrades?

Williams: Yes. I had completed this type of migration with earlier versions of this system, starting with Version 6 (from V5). Other team members had participated with me in the previous migrations of this system, from 6.12 to 8.1 and from 8.1 to 8.2.

Shamlin: As the project reached completion, what did your real team look like?

Williams: Partly because we were working with pre-release SAS, it took a large group of people with varying skills to make this migration happen. Much of the work was debugging and troubleshooting, while some required HP-UX host and system administration skills. The bugs that were flushed out and fixed will help SAS 9 migration experiences in the field. The staff included:

Shamlin: Whose involvement was critical?

Williams: Over the years we've had a nice opportunity to tune for HP-UX 10.20, but we found that we needed one of our UNIX platform developers to talk us through tuning for 11.11. For example, he helped us configure the UNIX kernel for DEFECTS. Another UNIX developer helped to debug system errors.


Migration Strategy

Shamlin: What was the strategy?

Williams: We broke up the project into big chunks. Each phase probably took a total of 3-4 days to complete. Note that we ran in 32-bit mode for a couple of days before moving over to 64-bit mode.

Shamlin: You deconstructed the system into chunks?

Williams: The DEFECTS system is structured in such a way that we could convert it a segment at a time to run in 64-bit mode. So we tested it in 64-bit mode one chunk at a time. Then once we felt good about that chunk we could simply flip a virtual switch, and that part was then being used in the production application.

I asked myself: How much of a hybrid environment can we have? I considered it both with respect to the hardware/OS and the components that make up DEFECTS itself. The strategy was to go through a fluid migration where a piece at a time went from running production in the source environment to running production in the target environment. So there was a window of time where some portion of the production DEFECTS system was running in the target arena while the rest was running in the source arena. As time moved forward, more and more of the system moved to the target arena until the entire production system was running in the target environment.
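One way to picture this fluid migration is as a routing table that records, for each component, which environment is serving production. The following Python sketch is a hypothetical illustration: the component names and the `MigrationRouter` class are assumptions, not part of DEFECTS, where the "virtual switch" was an operational change rather than a program.

```python
from enum import Enum


class Env(Enum):
    SOURCE = "HP-UX 10.20 / SAS 8.2"
    TARGET = "HP-UX 11.11 / SAS 9"


class MigrationRouter:
    """Records which environment runs production for each component."""

    def __init__(self, components):
        # Everything starts in production on the source environment.
        self.routes = {name: Env.SOURCE for name in components}

    def _check(self, name):
        if name not in self.routes:
            raise KeyError(f"unknown component: {name}")

    def flip(self, name):
        """Cut one validated component over to the target environment."""
        self._check(name)
        self.routes[name] = Env.TARGET

    def roll_back(self, name):
        """Send one component back to the source environment."""
        self._check(name)
        self.routes[name] = Env.SOURCE

    def is_hybrid(self):
        """True while some components run on each side."""
        return len(set(self.routes.values())) > 1

    def is_complete(self):
        """True once every component runs on the target."""
        return all(env is Env.TARGET for env in self.routes.values())


if __name__ == "__main__":
    # Hypothetical component names.
    router = MigrationRouter(["transaction_server", "signal_processors", "intranet_reports"])
    router.flip("signal_processors")
    print(router.is_hybrid(), router.is_complete())
```

Because each component can be flipped or rolled back on its own, the hybrid window Williams describes is simply the period during which `is_hybrid()` is true.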

Shamlin: How much of the planning was formalized? Step 1, step 2, step 3, etc.?

Williams: This was our checklist for migration:

  1. Suspend the crontab table on HP-UX so that scheduled jobs do not run during the migration.
  2. Create the Apache server .htaccess file so that only our development team can access our Web pages. This keeps users from executing our pages while the migration is in progress but lets the development team verify that the migration was successful.
  3. Quiesce the signaling (e-mail) processes.
  4. Quiesce the back-end transaction process.
  5. Stop the DEFECTS SAS/SHARE server.
  6. Migrate the DEFECTS libraries (which have both catalog and data set members) from the HP-UX 10.20 platform to our new HP-UX 11.11 platform.
  7. Restart the DEFECTS SAS/SHARE server.
  8. Restart the DEFECTS back-end transaction process.
  9. Test and evaluate the system as completely as possible by using our intranet clients (add, update, and query pages) and by examining output from our integrity reports.
  10. Remove the Apache server .htaccess file, thereby allowing users to return to the system.
  11. Return the crontab table to normal and return the system to operational status.
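Because the order of these steps matters (the SAS/SHARE server is stopped before its libraries are moved, and users stay locked out until testing passes), a checklist like this lends itself to a runner that stops at the first failed step. This is a minimal Python sketch; the step actions are placeholder stubs standing in for the real shell, Apache, and SAS operations, which the interview does not show.

```python
from typing import Callable, List, Tuple

# Step names follow the checklist above; the actions are placeholders.
CHECKLIST = [
    "Suspend crontab table",
    "Install restrictive .htaccess file",
    "Quiesce signaling (e-mail) processes",
    "Quiesce back-end transaction process",
    "Stop DEFECTS SAS/SHARE server",
    "Migrate DEFECTS libraries to HP-UX 11.11",
    "Restart DEFECTS SAS/SHARE server",
    "Restart back-end transaction process",
    "Test via intranet clients and integrity reports",
    "Remove .htaccess file",
    "Restore crontab table and operational status",
]

Step = Tuple[str, Callable[[], bool]]


def run_checklist(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure.

    Returns the names of the steps that succeeded, so the operator knows
    where to resume, or which snapshot to roll back to.
    """
    completed = []
    for name, action in steps:
        if not action():
            print(f"STOPPED at step {len(completed) + 1}: {name}")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical stubs: every step "succeeds" immediately.
    steps = [(name, lambda: True) for name in CHECKLIST]
    print(run_checklist(steps) == CHECKLIST)  # True
```

Returning the list of completed steps, rather than just a pass/fail flag, records exactly how far the migration got, which is what a rollback decision needs.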

Shamlin: Any daily regularly scheduled events?

Williams: We had someone on the team analyze integrity reports every night for 15-20 minutes. We were looking for "silent server problems" but didn't find any. Another regular milestone is backups for rollback contingency.

Shamlin: What do you mean by "silent server problems"?

Williams: Silent problems are the worst. They are inevitable and often unforeseeable. Silent server problems are a class of bugs that are neither flagged as errors by any part of the system nor cause an exception to be thrown. They tend to surface as data integrity problems and are sometimes due to user error as well as to bugs in the system. Regardless, the integrity reports are valuable tools for identifying these kinds of problems. It’s very important to plan for extra resource time in case you do encounter these types of problems.

Shamlin: Please go into more detail about integrity reports.

Williams: The reports hierarchically cross-check possible relationships (multi-variable edits). We look at them every day to ensure data integrity, and we have been doing so for years. Integrity reports are posted online, noting errors in the data, such as duplicate or invalid values. Each day a designated staff member investigates and corrects the reported issues.
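A multi-variable edit of the kind described here can be sketched as a function that scans records and reports problems rather than fixing them, leaving correction to the designated staff member. The field names (`defect_id`, `status`, `closed_date`), the set of valid status codes, and the closed-date rule are hypothetical examples, not the actual DEFECTS edits.

```python
from collections import Counter
from typing import Dict, List

VALID_STATUSES = {"open", "fixed", "closed"}  # hypothetical code list


def integrity_report(records: List[Dict]) -> List[str]:
    """Return one line per problem found; the records are left untouched."""
    problems = []

    # Single-variable edit: each defect ID should appear exactly once.
    counts = Counter(r["defect_id"] for r in records)
    for defect_id, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"{defect_id}: duplicate ({n} rows)")

    for r in records:
        if r["status"] not in VALID_STATUSES:
            problems.append(f"{r['defect_id']}: invalid status {r['status']!r}")
        # Multi-variable edit: a closed defect must carry a close date,
        # and a defect that is not closed must not.
        elif (r["status"] == "closed") != (r.get("closed_date") is not None):
            problems.append(f"{r['defect_id']}: status/closed_date mismatch")
    return problems


if __name__ == "__main__":
    rows = [
        {"defect_id": "D1", "status": "open", "closed_date": None},
        {"defect_id": "D2", "status": "closed", "closed_date": None},
    ]
    for line in integrity_report(rows):
        print(line)
```

Checks like the status/date rule are what catch "silent" problems: no component raises an error, yet the combination of values is impossible.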

Shamlin: What was your strategy for rollback and recovery?

Williams: Good backup copies are necessary for rollback/recovery. We backed up in triplicate, in fact, and also between incremental migrations. Take snapshots of your data, catalogs, and so on, so that you don't have to go all the way back to the beginning if a snag occurs. You can't be too safe with respect to backing things up each step of the way.

We did have some "trap doors" when we migrated the SAS libraries from V8 to V9 and when we changed operating systems. We considered these trap doors because once you've gone through them, you really don't want to have to back out. In our case, while it would have been possible to fall back, it would have been a last resort.
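The per-step snapshots Williams describes can be sketched as copying a library directory to a numbered backup location before each step, with a matching restore for rollback. On UNIX a SAS library is a directory of files, so a file-level copy is plausible once the SAS/SHARE server using the library has been stopped; the directory layout and the `snapshot`/`restore` names are assumptions for illustration. A real plan would also write copies to more than one destination, as the team's triplicate backups did.

```python
import shutil
from pathlib import Path


def snapshot(library: Path, backup_root: Path, step: int) -> Path:
    """Copy a library directory to backup_root/step-NN/<library name>.

    Refuses to overwrite an existing snapshot, so a known-good copy is
    never clobbered by a later, possibly bad, state.
    """
    dest = backup_root / f"step-{step:02d}" / library.name
    if dest.exists():
        raise FileExistsError(str(dest))
    shutil.copytree(library, dest)  # also creates missing parent directories
    return dest


def restore(snapshot_dir: Path, library: Path) -> None:
    """Roll back: replace the live library directory with a snapshot."""
    if library.exists():
        shutil.rmtree(library)
    shutil.copytree(snapshot_dir, library)
```

Numbering snapshots by step means a failure at step N can be undone by restoring the step N snapshot, instead of going "all the way back to the beginning."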

Shamlin: For minimizing downtime due to rollback, did you have specific strategies?

Williams: You've got to be able to turn off functionality of one piece and turn it back on incrementally. Almost like pulling the lever and letting it through. Every step was already done in pieces for validation reasons. We had our staff in place, because most problems occurred during the business day.


Training

Shamlin: Were there any training/education aspects to the migration?

Williams: Learning what was in SAS 9. And learning what it would take to go to HP-UX 11.11 in terms of resources and technology, researching whether one dedicated person with exceptional training in hardware and software could move a large application in a relatively acceptable timeframe.

Shamlin: Did you have to train anyone involved in migrating the application regarding the target environment?

Williams: Other than me, no. That was one of my tasks in the project.

Shamlin: Did you have to retrain any of the user base on how to use the migrated application because of functional differences?

Williams: No. Our system functioned exactly as it did before the migration.


Validation

Shamlin: How did you validate the migrated system?

Williams: Best to do testing during the business day rather than at night. We consciously chose to expose the majority of the user community to an unproven system at times to flush out problems ASAP. We chose to risk wide impact of problems to reduce cost of testing. In other words, we let users do the testing.

Shamlin: Any formal testing?

Williams: We checked the expected results of each step. This is basic testing common sense. As you complete each step in your process, check to see whether your deliverables are in the expected state. This applies to any pieces of the system that were touched during this step. Don't assume based on spot checking; a thorough check is best.

We did simple tests first. For example, how will the data be moved? Can it be moved in stages? Test for problems such as lost indexes. It’s best to take samples and work with those for a day or two to gather results.

Shamlin: What tests did you run? Ad hoc or standard regression?

Williams: Our tests were to exercise the system. We'd enter transactions into the system and follow them through. We also did a significant amount of querying against the system to check for response time and accuracy.

Shamlin: Did you specify a quality expectation before beginning validation?

Williams: Not formally. The expectation was clear though—migrate the system to SAS 9 and to HP-UX 11.11 with the resulting system being at or above the previous level of quality, performance, availability, and reliability.

[Editor: See the validation topics of the Migration Community for more resources.]


Lessons Learned

Shamlin: What tasks were easier than expected?

Williams: Once we got past the debugging, the migration happened without a hitch. Once everything was planned and prepared, moving from 32-bit to 64-bit mode took only about 30 minutes. We're fortunate; we've got fast servers.

Planning and support are what take the most time. The time and effort to migrate the bulk of the SAS files from their SAS 8.2 format to the appropriate SAS 9 format was trivial (on the order of minutes). What made that possible was complete organizational commitment and good planning.

Shamlin: Any problems you narrowly missed?

Williams: Part of the DEFECTS application has a lot of legacy SCL code, 10-20K lines, that no one has touched in a long time. We got a bit lucky with those. If an error had appeared in that source code, it would've been expensive to debug and fix.

Shamlin: What were the biggest costs?

Williams: Migrating to the new version of the operating system was the largest cost. We needed tuning information for the new host and how it interacts with SAS/SHARE and SAS in general. It's very important to know or gain intricate details about your hardware and your operating systems, especially if you are moving to a new environment.

[Editor: See a tuning topic for links to SAS partner pages. From there you can find white papers about tuning for optimal performance.]

The IT team also spent a large block of resources. The list of what was different or missing in the new OS version included items such as missing utilities, mail issues, kernel differences, and Perl scripting issues.

We had the good fortune to tune a dedicated server—meaning we didn't have to share it with other departments or processes. That allowed us to tune the UNIX kernel specifically to our needs.

Shamlin: How was production usage of the system impacted during the migration?

Williams: We did much of the migration during off-peak hours or weekends. We gave users notice of pending changes, so that if they had critical work they'd have good opportunity to use the system prior to any scheduled outages.


Best Practices and Advice

Shamlin: What were the most valuable resources you relied on?

Williams: Because SAS 9 was pre-production, we had expectations of finding problems, but we also had the good fortune of excellent internal SAS support (R&D, Technical Support, for example). But probably even more important, the benefit was that our team had management backing. The company was not going to ship the product (SAS 9) to our first customer until the application was stable. Our management wanted to use DEFECTS to test the new code going out to our users. It is an application that many developers, Technical Support, and others rely on to get their day's work done. If DEFECTS goes down, there is an impact to the entire work day. The application is important to everyone in the company.

There was an expectation by all parties that show-stopping problems would be encountered. There was a supporting agreement among management that resources would be applied immediately when critical problems arose. Acknowledging this expectation and honoring the agreement proved to be a major factor in overall success.

The metamessage here to users in the field is that successful migration was a direct result of complete organizational commitment to the process. In the DEFECTS case, once management committed, all resources within the R&D community made the migration top priority. We acknowledged going in that issues could arise to delay progress. So instead of allowing those to derail the overall organization’s day-to-day business, we agreed as a community to support the migration first, and sacrifice some other business in the short term. While this choice did result in some disruption to routine day-to-day activity, the end result was a shorter time to meet overall objectives than in the past.

Shamlin: Did you discover any other best practices?

Williams: The human factor is one. For example, some staff took vacations beforehand so they wouldn't get burned out. Your staff needs to stay fresh. The optimal time for a problem to surface is the beginning of the day, when you have the most resources available. We had license to force errors during peak usage, which comes back to management commitment again.

Another suggestion is to do some simple pilot testing with the system first to see what issues may appear. This involves doing quick and dirty migrations of isolated pieces of the system and/or partial samples of the data just to see what happens. Testing in stages, with trial runs, gives you a feel for the hurdles and for who is going to be affected.

You never want to disenfranchise someone, so keep everybody in the loop. You need to understand what to look for and the most likely negative events that will be encountered. And it’s important for end users to agree that the move to a new release will be positive. It is vital that you have a virtual team of experts. These include system administrators, Web administrators (e.g., for our Apache Web server), scripting experts, SAS support experts (software, development, Tech Support), and application support experts. Getting this virtual team together requires only a moment of empowerment. Keep in mind that communication and quick response are very valuable during a migration.

Know the number of variables, observations, and indexes in every table, and check that the correct numbers were brought over.
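The count check can be sketched as a comparison of per-table metadata captured before and after migration. In practice the numbers might come from PROC CONTENTS output or the SQL DICTIONARY.TABLES and DICTIONARY.INDEXES views; here the `TableInfo` records and table names are hand-built, hypothetical inputs.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class TableInfo:
    nvars: int                # number of variables
    nobs: int                 # number of observations
    indexes: FrozenSet[str]   # index names


def compare_libraries(before: Dict[str, TableInfo], after: Dict[str, TableInfo]) -> List[str]:
    """Report missing or extra tables, changed counts, and lost indexes."""
    issues = []
    for table in sorted(before.keys() | after.keys()):
        if table not in after:
            issues.append(f"{table}: missing after migration")
            continue
        if table not in before:
            issues.append(f"{table}: unexpected new table")
            continue
        b, a = before[table], after[table]
        if b.nvars != a.nvars:
            issues.append(f"{table}: variables {b.nvars} -> {a.nvars}")
        if b.nobs != a.nobs:
            issues.append(f"{table}: observations {b.nobs} -> {a.nobs}")
        lost = b.indexes - a.indexes
        if lost:
            issues.append(f"{table}: lost indexes {sorted(lost)}")
    return issues


if __name__ == "__main__":
    before = {"defects": TableInfo(40, 1000, frozenset({"defect_id", "product"}))}
    after = {"defects": TableInfo(40, 1000, frozenset({"defect_id"}))}
    print(compare_libraries(before, after))
```

An empty result is the pass condition; anything else names the exact table and property to investigate, which covers the "test for loss of indexes" advice as well.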

Shamlin: What did the experience teach you?

Williams: Preparation and planning are key.


Benefits of Migrating to SAS 9

Shamlin: Can you quantify the benefit of the migrated system?

Williams: As always, our first priority was to show that the new system maintained performance when compared to the previous system. We were pleased to not only meet that goal but also experience better throughput with queries and response times with 11.11 and SAS 9 than we did with 10.20 and SAS 8. In terms of RAM, disk and swap space, and other hardware issues, our prior 10.20 and new 11.11 nodes are similarly configured. So it's clear that properly configured 11.11 together with SAS 9 is faster.

Shamlin: Can you provide more specifics?

Williams: Processes that may have taken five seconds to complete were completing in three seconds instead. Essentially this was the measure of how fast each individual transaction was processed. Not very scientific, but it was enough for us to realize that HP-UX 11.11 along with SAS 9 improved performance.
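The five-to-three-second example corresponds to a 40% reduction in elapsed time, or a speedup of about 1.67x. A slightly more repeatable version of the same comparison takes the median of several timings per configuration, as in this Python sketch; the sample timings are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def compare_timings(before_s: List[float], after_s: List[float]) -> Tuple[float, float]:
    """Return (percent reduction in median elapsed time, speedup factor)."""
    before, after = median(before_s), median(after_s)
    reduction_pct = 100.0 * (before - after) / before
    speedup = before / after
    return reduction_pct, speedup


if __name__ == "__main__":
    # Hypothetical per-transaction timings, in seconds.
    old = [5.1, 4.9, 5.0, 5.3, 4.8]
    new = [3.0, 3.1, 2.9, 3.2, 3.0]
    pct, factor = compare_timings(old, new)
    print(f"{pct:.0f}% less elapsed time, {factor:.2f}x speedup")
```

Using the median rather than a single run keeps one slow or fast outlier from dominating the comparison.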

[Editor: In formal studies with Hewlett-Packard, SAS 9.1 shows better performance than SAS 8.2 on most HP-UX 11.11i (PA-RISC) servers. The performance has improved because SAS 9.1 has been compiled to take advantage of the PA-RISC chipset that only HP-UX 11i supports, because the HP Math Libraries are now supported, and because several SAS procedures have introduced threading.]

Shamlin: Have you incorporated any new SAS 9 functionality?

Williams: The reality for most SAS customers is to first ensure that the migrated system won't break. Customers are conservative and afraid of change. The first questions are "Will it port? Will it migrate?" So just moving forward is of utmost importance.

Then the next priority is changing code to take advantage of new features. Since the migration, another team has replaced some of our SCL code with Java. This update will make maintenance easier and execution faster.


Conclusion

Williams indicates that "preparation and planning are key." We saw this principle in action when the team encountered a stumbling block while trying to tune the new OS version. Because of their emphasis on planning and testing, the team was able to identify that need and bring in OS experts at a noncritical time. This case study shows that thorough planning, as well as management buy-in, can significantly contribute to a migration's success.