SAS Namespace Submodels |
The Transform submodel is used to:
describe stored processes
represent ETL processes
define queries
schedule processes
give initialization information for software components.
Metadata Types |
The following metadata types, relevant to the Transform submodel, are classified as Abstract, Event, Query, Process, and Scheduling:
Abstract Metadata Types |
The AbstractJob, AbstractTransformation, and QueryClause metadata types are supertypes that aren't expected to be instantiated; they exist so that their subtypes will inherit appropriate attributes and associations.
Event Metadata Type |
The Event metadata type is used to describe conditions that must occur to drive other processes.
Query Metadata Types |
The Select metadata type represents a query process. The query is stored as text in the SourceCode association of the Select metadata object. The query may contain substitution strings that should be replaced by values. The Variable metadata type contains information and associations that help determine which strings should be replaced and which values should be used.
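As a rough illustration of the substitution described above, the following Python sketch replaces placeholder strings in a stored query with values. It is not the SAS Open Metadata API: the `Variable` class, its `placeholder` and `value` fields, the `resolve_query` function, and the `&NAME` placeholder syntax are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    # Hypothetical stand-in for a Variable metadata object.
    placeholder: str  # string to look for in the query text (assumed syntax)
    value: str        # value that replaces the placeholder


def resolve_query(source_code: str, variables: list[Variable]) -> str:
    """Return the query text with every placeholder replaced by its value."""
    resolved = source_code
    for var in variables:
        resolved = resolved.replace(var.placeholder, var.value)
    return resolved


# Query text as it might be stored in the SourceCode association.
query = "SELECT name FROM orders WHERE region = '&REGION' AND year = &YEAR"
resolved = resolve_query(
    query,
    [Variable("&REGION", "EMEA"), Variable("&YEAR", "2004")],
)
```

Plain string replacement is the simplest choice here; a real resolver would also need to handle placeholders whose names are prefixes of one another.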
The RowSelector and OrderByClause metadata types are used by the Transformation metadata type and subtypes to further qualify the transformation.
Process Metadata Types |
The process metadata types are used to define a process. A process may be a stored process, in which case the code for the process is stored, and additional associations describe the inputs, the outputs, and where the process can be run. Alternatively, an application may generate the code for the process, based on the associated inputs and outputs and on the location or DeployedComponent that will run the generated process.
Impact analysis is one of the benefits of defining the metadata in this way. The metadata for a process contains all of the information about its sources and targets; therefore, if a change is made to any source, it is easy to identify the processes and targets that might be affected by the change.
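The idea behind impact analysis can be sketched as a traversal from a changed source through every process that reads it. The Python below is only a minimal model under stated assumptions: the `Process` class, the `impacted_by` function, and the table and process names are hypothetical, and sources and targets are reduced to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    # Hypothetical, simplified view of a process and its associations.
    name: str
    sources: list[str] = field(default_factory=list)
    targets: list[str] = field(default_factory=list)


def impacted_by(changed: str, processes: list[Process]) -> tuple[list[str], list[str]]:
    """Return (process names, target names) downstream of a changed source."""
    impacted_processes: list[str] = []
    impacted_targets: list[str] = []
    frontier = [changed]
    seen = {changed}
    while frontier:
        current = frontier.pop()
        for proc in processes:
            if current in proc.sources:
                if proc.name not in impacted_processes:
                    impacted_processes.append(proc.name)
                # A target of an impacted process may feed further processes.
                for target in proc.targets:
                    if target not in seen:
                        seen.add(target)
                        impacted_targets.append(target)
                        frontier.append(target)
    return impacted_processes, impacted_targets


processes = [
    Process("load_staging", ["RAW_ORDERS"], ["STG_ORDERS"]),
    Process("build_mart", ["STG_ORDERS", "DIM_CUSTOMER"], ["ORDER_MART"]),
    Process("load_customers", ["RAW_CUSTOMERS"], ["DIM_CUSTOMER"]),
]
procs, targets = impacted_by("RAW_ORDERS", processes)
```

In this example a change to `RAW_ORDERS` reaches `load_staging` and `build_mart`, but not `load_customers`, which reads from an unrelated source.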
The TransformationActivity metadata type represents a grouping of TransformationStep metadata objects. At this level, the TransformationSources and TransformationTargets associations represent the initial inputs to the activity and the final output of the activity. For more detailed information about what is happening within the activity, the application should drill down first to the TransformationStep objects, then the Transformation metadata objects.
A TransformationStep is a grouping of Transformation metadata objects. Transformation subtypes include ClassifierMap and Select (discussed under Query Metadata Types). A ClassifierMap shows the mapping between Classifier metadata types. Examples of Classifier metadata types include PhysicalTable and Report. A Classifier often has features; for example, a PhysicalTable has Columns. These features are mapped by using a FeatureMap metadata type. The StepPrecedence metadata type is used to show the order of steps within an activity. If a StepPrecedence object is not defined for a Transformation, then it is assumed that the steps may run in parallel.
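The ordering rule above (steps with a defined precedence run in order, steps without one may run in parallel) can be illustrated with a topological sort. This is a sketch only: precedence is modeled as hypothetical `(predecessor, successor)` name pairs rather than as StepPrecedence metadata objects, and the `execution_waves` function and step names are assumptions. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def execution_waves(steps: list[str], precedence: list[tuple[str, str]]) -> list[list[str]]:
    """Group steps into waves; steps in the same wave may run in parallel."""
    # Start with every step having no predecessors.
    sorter = TopologicalSorter({step: set() for step in steps})
    for before, after in precedence:
        sorter.add(after, before)  # 'after' must wait for 'before'
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


steps = ["extract", "clean", "lookup", "load"]
precedence = [
    ("extract", "clean"),
    ("extract", "lookup"),
    ("clean", "load"),
    ("lookup", "load"),
]
ordered = execution_waves(steps, precedence)
unordered = execution_waves(steps, [])
```

With the precedence pairs shown, `clean` and `lookup` land in the same wave because nothing orders them relative to each other; with no pairs at all, every step lands in a single wave.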
The SyncStep and ConditionalPrecedence metadata types are used with the preceding types when modeling a workflow.
Scheduling Metadata Types |
Once a process has been defined and tested, the process can be scheduled. The Job metadata type groups TransformationActivity metadata objects into a runtime unit to be scheduled. The JFJob metadata type represents a job that is scheduled in the LSF Job Flow.
Usage |
Case 1: Describe Stored Processes
The stored process begins with a ClassifierMap and has associations to the SourceCode that is to be run, the component(s) that can run the process (ComputeLocations), the inputs (ClassifierSources), and the outputs (ClassifierTargets) of the stored process.
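To make the associations in this case concrete, the following Python sketch models a stored process as a simple record. It is an illustration under stated assumptions, not a SAS client API: the `StoredProcess` class, the `can_run_on` method, and the sample values are hypothetical, and each association is reduced to a list of names.

```python
from dataclasses import dataclass, field


@dataclass
class StoredProcess:
    # Hypothetical flattening of a ClassifierMap and its associations.
    name: str
    source_code: str                                              # SourceCode
    compute_locations: list[str] = field(default_factory=list)    # ComputeLocations
    classifier_sources: list[str] = field(default_factory=list)   # ClassifierSources
    classifier_targets: list[str] = field(default_factory=list)   # ClassifierTargets

    def can_run_on(self, component: str) -> bool:
        """True if the named component is listed as a compute location."""
        return component in self.compute_locations


summary = StoredProcess(
    name="monthly_summary",
    source_code="proc means data=sales.orders; run;",
    compute_locations=["WorkspaceServerA"],
    classifier_sources=["ORDERS"],
    classifier_targets=["ORDER_SUMMARY"],
)
```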
Case 2: Represent ETL Processes
The ETL process uses a ClassifierMap and the associated FeatureMaps to show the mapping of data through a process flow. ClassifierMaps are grouped together in TransformationSteps, TransformationSteps are grouped into TransformationActivities, and TransformationActivities are grouped into a Job. The Job is the unit that defines the process that is to be run. The Job may be scheduled to run as a batch process that is triggered by various external or internal events. The StepPrecedence and ConditionalPrecedence metadata types are used to show the order of TransformationSteps. Each level of the ETL process can have different locations where the process should be run. The ComputeLocations association is used here also, to show the components that are capable of performing the process.
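The nesting described in this case (Job, TransformationActivity, TransformationStep, ClassifierMap, FeatureMap) can be sketched with plain Python data classes. The class names mirror the metadata types, but the fields, the `column_lineage` function, and the sample table and column names are assumptions made for the example; the real types carry many more attributes and associations.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureMap:
    source: str  # e.g. a source column
    target: str  # e.g. a target column


@dataclass
class ClassifierMap:
    name: str
    classifier_sources: list[str]
    classifier_targets: list[str]
    feature_maps: list[FeatureMap] = field(default_factory=list)


@dataclass
class TransformationStep:
    name: str
    transformations: list[ClassifierMap] = field(default_factory=list)


@dataclass
class TransformationActivity:
    name: str
    steps: list[TransformationStep] = field(default_factory=list)


@dataclass
class Job:
    name: str
    activities: list[TransformationActivity] = field(default_factory=list)


def column_lineage(job: Job) -> list[tuple[str, str, str, str]]:
    """Drill down from the Job to each FeatureMap and list the mappings."""
    return [
        (activity.name, step.name, fmap.source, fmap.target)
        for activity in job.activities
        for step in activity.steps
        for cmap in step.transformations
        for fmap in cmap.feature_maps
    ]


orders_map = ClassifierMap(
    "orders_to_staging",
    classifier_sources=["RAW_ORDERS"],
    classifier_targets=["STG_ORDERS"],
    feature_maps=[FeatureMap("RAW_ORDERS.amt", "STG_ORDERS.amount")],
)
job = Job(
    "nightly_load",
    [TransformationActivity("stage", [TransformationStep("copy_orders", [orders_map])])],
)
lineage = column_lineage(job)
```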
Case 3: Define Queries
A query uses the Select metadata type, which is a subtype of ClassifierMap, to define the SQL query. The query is stored as SourceCode and may contain substitution strings. The associated Variable objects contain information about which string to replace and where to get the value that should be used. The Select object also uses the ClassifierSources and ClassifierTargets associations to document the inputs and outputs of the query.
Case 4: Schedule Processes
Any process defined in the metadata may be scheduled to run as a job. The process must be part of a TransformationActivity. TransformationActivities may then be grouped together as a Job.
Case 5: Give Initialization Information for Software Components
A DeployedComponent (or one of its subtypes) may need initialization information that is used at startup. In that case, the DeployedComponent has an associated InitProcess. The Transformation metadata type is usually used to represent this process, and the startup information that is needed is associated with the Transformation by using the TransformationSources association.
Copyright © SAS Institute Inc. All rights reserved.