Act

The team at Groceryrama and GreenVillage can begin to act by designing a system for dealing with their data needs and executing the processes defined in that system. Start the design phase by taking stock of all of the different structures, formats, data sources, and data feeds that you identified in the define and discover phases of the data management methodology. Then you can create an environment that accommodates the needs of your business. In the design phase, you consolidate and coordinate your data management activities by concentrating on the following imperatives:
  • Consistency of rules: Ultimately, an organization needs one set of business rules that can be stored centrally but deployed across all data sources, applications, and lines of business.
  • Consistency of the data model: The data model is the single, definitive source for how your data maps to your business. Through the process of creating a well-structured data model, you identify the appropriate source systems and begin to reconcile multiple views, if required.
  • Consistency of business processes: During the planning stage, you identified processes that are potentially impacted. Now, the task is to provide consistency across these processes.
This is the time to gather teams of business analysts, data architects, and IT specialists and begin to make practical decisions about how the data will be organized and regulated.
For example, you need to make sure that the content of the data files works together. Do your customer names have the same format? Or do some sources put the surname first and others put it last? Are all of the dates in the same format? Do your product lists contain duplicated records? These questions and others like them can be addressed with the data quality and master data management functions in applications such as DataFlux Data Management Studio and SAS Master Data Management.
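The kinds of checks described above can be illustrated with a minimal sketch in plain Python. It is not how DataFlux Data Management Studio or SAS Master Data Management implements them. The function names, the "Surname, Given" convention, and the list of accepted date formats are all assumptions made for this illustration.

```python
from datetime import datetime

# Assumed set of date layouts found in the source files (illustrative only).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def normalize_name(name):
    """Return 'Given Surname'; a comma is assumed to mean 'Surname, Given'."""
    name = " ".join(name.split())  # collapse repeated whitespace
    if "," in name:
        surname, given = [part.strip() for part in name.split(",", 1)]
        return f"{given} {surname}"
    return name


def normalize_date(text):
    """Convert a date in any assumed layout to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized date format: {text!r}")


def find_duplicates(products):
    """Return entries that repeat an earlier one, ignoring case and spacing."""
    seen = set()
    duplicates = []
    for product in products:
        key = " ".join(product.lower().split())
        if key in seen:
            duplicates.append(product)
        else:
            seen.add(key)
    return duplicates
```

A sketch like this only covers exact matches after simple normalization. Real data quality tools typically add fuzzy matching and locale-aware parsing on top.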
Another common data problem is choosing among similar, but not identical, rows of data. For example, you might have records in the supplier information tables for both of the merging entities that appear to refer to the same supplier. The supplier records for Acme Pork Products for one company spell out the name of the state where the supplier is located and include a postal code. However, the supplier records for the other company use a two-letter state code and omit the postal code. Which supplier records should the merged Groceryrama and GreenVillage enterprise use?
You can use the entity resolution tools in SAS Master Data Management to diagnose the extent of the problem. Then you can designate one survivor record for suppliers that you can use throughout the enterprise. Business users have established how the data and rules should be defined. The IT staff can now ensure that databases and applications comply with the definitions.
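The idea of a survivor record can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the entity resolution logic in SAS Master Data Management. The field names, the small state lookup table, the sample postal code, and the rule "keep the first non-empty value for each field" are all hypothetical.

```python
# Illustrative subset of a state-name lookup (assumption for this sketch).
STATE_CODES = {"iowa": "IA", "nebraska": "NE"}


def standardize_state(state):
    """Map a spelled-out state name to its two-letter code when known."""
    state = state.strip()
    return STATE_CODES.get(state.lower(), state.upper())


def match_key(record):
    """Records with the same normalized name and state are treated as one supplier."""
    name = " ".join(record["name"].lower().split())
    return (name, standardize_state(record["state"]))


def choose_survivors(records):
    """Cluster matching records and build one survivor record per cluster."""
    clusters = {}
    for record in records:
        clusters.setdefault(match_key(record), []).append(record)

    survivors = []
    for (_, state), cluster in clusters.items():
        survivor = {}
        for record in cluster:
            for field, value in record.items():
                # Hypothetical survivorship rule: first non-empty value wins.
                if value and not survivor.get(field):
                    survivor[field] = value
        survivor["state"] = state  # always store the standardized code
        survivors.append(survivor)
    return survivors


# Made-up sample data that mirrors the Acme Pork Products scenario.
suppliers = [
    {"name": "Acme Pork Products", "state": "Iowa", "postal_code": "50309"},
    {"name": "ACME Pork Products", "state": "IA", "postal_code": ""},
]
merged = choose_survivors(suppliers)
```

Here the survivor keeps the postal code from one company's record and the standardized state code that both records agree on. Which fields should win in practice is a business rule that the merged enterprise has to decide.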
SAS Data Integration Studio is an enterprise extract, transform, load (ETL) application that boosts the productivity of SAS coders who do data preparation and data management. It contains several features that help you get your data working together. First, you can register your data as metadata and group it into libraries of source and target tables. Then, you add the items in the libraries to job flows that enable you to perform the extract, transform, and load tasks at the core of data integration. Finally, you can deploy these jobs and schedule them for execution in batches. The application also supports related processes such as SQL queries, table loading, analytics, and reporting.
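For readers who have not worked with ETL before, the extract, transform, and load steps of a job flow can be sketched generically in Python with the standard sqlite3 module. This is not SAS Data Integration Studio code. The table names, column names, and sample rows are assumptions invented for the illustration.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with a source table and an empty target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_products (sku INTEGER, name TEXT, price_cents INTEGER)")
    conn.execute("CREATE TABLE target_products (sku INTEGER, name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO source_products VALUES (?, ?, ?)",
        [(1001, "  whole milk ", 249), (1002, "EGGS", 300)],
    )
    conn.commit()
    return conn


def extract(conn):
    """Extract: read the raw rows from the source table."""
    return conn.execute("SELECT sku, name, price_cents FROM source_products").fetchall()


def transform(rows):
    """Transform: tidy product names and convert cents to currency units."""
    return [(sku, name.strip().title(), cents / 100) for sku, name, cents in rows]


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.executemany("INSERT INTO target_products (sku, name, price) VALUES (?, ?, ?)", rows)
    conn.commit()


def run_job(conn):
    """Run the three steps in order, as a single job."""
    load(conn, transform(extract(conn)))
```

In a scheduled batch environment, a function such as the hypothetical `run_job` would be the unit that is deployed and run on a timetable.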
DataFlux Data Management Studio supports data job flows and process job flows to improve data quality. DataFlux Data Management Studio is designed to work with DataFlux Data Management Server. Any authorized user can review and work with the jobs on the server. SAS Visual Process Orchestration adds the ability to integrate executable files from various systems into a single process flow. It enables you to build orchestration jobs, which are process jobs that run other jobs.
SAS Data Integration Studio and DataFlux Data Management Studio are often used by data management specialists. SAS Data Loader brings data management within reach of the general business user. SAS Data Loader simplifies the process of working with large distributed Hadoop data sources. It provides a series of wizard-based directives that help you perform tasks like transforming, profiling, and querying data in Hadoop.
You can use SAS software to quickly and efficiently acquire data from a wide variety of data sources. For example, you can use SAS/ACCESS interfaces for critical sources such as the following:
  • Oracle, Sybase, DB2, Microsoft SQL Server, and Teradata databases
  • Hadoop and Impala data
  • data from enterprise resource planning applications such as SAP
Then, you can use the external file wizards in SAS Data Integration Studio to acquire data from fixed-width, delimited, and user-written external files. You can also use SAS Data Integration Studio to register metadata from all of your data sources into libraries that are used in jobs. You can work with all of this data in your SAS applications.
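The difference between delimited and fixed-width external files can be shown with a short Python sketch. It does not reproduce the external file wizards in SAS Data Integration Studio. The function names, the pipe delimiter, and the column layout are assumptions chosen for the example.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse delimited text whose first line holds the column names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_fixed_width(text, layout):
    """Parse fixed-width text.

    layout is a list of (field_name, start, end) tuples with 0-based,
    end-exclusive column positions (an assumed convention for this sketch).
    """
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            rows.append({name: line[start:end].strip() for name, start, end in layout})
    return rows


# Made-up sample inputs describing the same two products.
delimited_text = "sku|name\n1001|Whole Milk\n1002|Eggs\n"
fixed_text = "1001Whole Milk  \n1002Eggs        \n"
product_layout = [("sku", 0, 4), ("name", 4, 16)]

from_delimited = read_delimited(delimited_text, delimiter="|")
from_fixed = read_fixed_width(fixed_text, product_layout)
```

A delimited file carries its structure in the separator characters, whereas a fixed-width file needs a separate layout definition. That layout is the kind of information that is registered as metadata before the file can be used in a job.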