Glossary of terms relating to SAS Query Services

Attribute is a single piece of information related to an object.

Aggregation is a collection of values or objects that makes a whole. An aggregation can be a concrete or conceptual set of whole-part relationships among objects. An aggregate function performs a calculation on a column in a set of row and returns a single value. An aggregate query summarizes information from multiple row by including an aggregate function such as SUM or AVG.

Authentication is a process of validating a set of credentials, such as a userid and password, for obtaining access to some protected asset.

Axis is a set of tuples. Each tuple is a vector of members. A set of axes defines the coordinates of a multidimensional data set.

BriefInformationMap is a thin object also referred to as a smart object that represents a SAS Query Services' information map in a metadata repository.

Business Model is a persistable container of business items where the items are defined based on a context of physical metadata and are meant to be used in specifying queries. Each business item provides for specifying a portion of a query – which are meant to be in terms that are more user-centric rather than how the data is necessarily stored. There are two concrete types of business models: the information map and data selection..There are three categories of objects contained within a business model for modeling their data. They are data sources, business rules, and business items.

Business Model Object Inter-Model ID is unique within the chain of business models it's object's business model is based upon. This ID is made up of the business model's ID concatenated with a "." concatenated with the intra-model ID for the object. Essentially this view of the ID adds the business model's ID to provide a view of the ID that is unique within the environment (the chain of business models being used).

Business items provide a layer of business information above the physical information that adds the logical layer for shielding, sharing and transparency from the physical information.

Business Query models the information necessary to define and manipulate a query based on business items contained in a business model. A data selection is an example of a concrete type of business query.

Business rule refers to information describing how to use the data from the data sources. For example, how to join tables or limit (govern) the amount of output of data.

By-group, by-grouping
See slicing.

Cardinality refers to the number of members in the set. When specifically applied to database theory, the cardinality of a table refers to the number of rows (or tuples) contained in a table. The cardinality of a relationship defines how many instances of each entity relate to each other.

Calculated member is a member of a dimension whose value is determined from other members' values (for example, by application of a mathematical or logical operation). Calculated members may be part of the OLAP server database or may have been specified by the user during an interactive session. A calculated member is any member that is not an input member.

Column refers to a vertical set of data values. Many times this is referred to in the context of how data is organized within a database ... such as a Table with columns in it. Other times this is referred to in the context of how data is visually displayed to the user (for example, after queries are run). On column contains data of one and the same kind.

Crosstabs are a way of displaying aggregated data. OLAP or "ROLAP" queries may produce crosstabs. Below is an example of an interactive OLAP crosstab. See also relational crosstabs.

Cube is a set of data that is organized and summarized into a multidimensional structure defined by a set of dimensions and measures.

Database is a collection of information organized into interrelated tables of data and specifications of data objects. It is a collection of tables, with related data.

Data items provides for mapping of physical data to a single unit of business data. It adds the ability to describe things like formatting and ordering. It represents the finest granularity that can be used for defining a result set. This is essentially equivalent to a column in a result set.

Data item reference is essentially a reference to a data item along with some set of overridden attributes of the base data item which can be a custom data item or another data item reference. Data item references allow overriding the functionality of certain attributes of data items and is useful when creating queries that need tweaks to the base data items.

Data Selection is a modeled query specification based upon business metadata defined in information maps. They are passed to the Data Services portion of SAS Query Services in order to run the query and obtain a result set containing the data for the query results.

Data Source describes the data that is to be accessed and information on how to access it. In IQ, a data source is a table, view or cube along with information on how to access the database object. For tables and views, the access information is composed of the server and SASLibrary to access the table or view. For cubes, it is composed of the server and OLAP schema to access the cube.

Descendent is a member in a dimension hierarchy that is related to a member of a higher level within the same dimension. For example, in a Time dimension containing the levels Year, Quarter, Month, and Day, January is a descendant of 1997.

Dimension is a structural attribute of a cube, which is an organized hierarchy of categories (levels) that describe data in the fact table. These categories typically describe a similar set of members upon which the user wants to base an analysis. For example, a geography dimension might include levels for Country, Region, State or Province, and City.

Drill down/drill up is technique for navigating through levels of data ranging from the most summarized (up) to the most detailed (down). For example, when viewing the details of sales data by year, a user can drill down to display sales data by quarter, and further to display data by month.

Entity is a single object about which data can be stored. It is the "subject" of a table. Entities and their interrelationships are modeled through the use of entity-relationship diagrams.

Entity integrity is a state in which all the rows in a database have a not-null primary key value, all tables have primary keys, and no table has any duplicate primary key values. This ensures that there are no duplicate entries for anything represented in the database.

Expression is a combination of symbols and operators that evaluate to a single data value. Simple expressions can be a constant, variable, column, or scalar function. Complex expressions are one or more simple expressions connected by operators.

Extended Attribute is a piece of metadata about an existing object that is not part of the standard metadata for that object. For example, the standard metadata for a column object contains attributes like name, length, type, format, etc.

Executing a query is the term used for transforming a data selection query and submitting it to be processed by a data server. The result of this is a result set (either tabular or multi-dimensional).

Field is a single unit of data stored as part of a database record. Each record is made up of one or more fields, which correspond to the columns in a database table. Fields are often also referred to as attributes.

Filters provides a means for restricting the set of data returned from queries.

Filtering is the ability to restrict data based upon criteria set in the WHERE clause of an SQL or a MDX statement.

Flat files are data files that contain records with no structured relationships.

Foreign key is a field in a relational table that matches the primary key column of another table. The foreign key can be used to cross-reference tables. A foreign key is the linking pin between two tables. Given a value from a row in one table you can access another table to find the right row with related data.

Formatting A mechanism for converting data values into character strings of a specific pattern or format.

Format Definition A set of information that provides the rules on how data values are to be converted into formatted values. There are two providers of format definitions: SAS and users. That is, there are SAS-defined formats and User-defined formats. For example, the COMMAw.d SAS-defined format provides for formatting numeric values into character strings with commas that separate every three digits and a period that separates the decimal fraction.

Format or Format Value A metadata value that when applied to data values causes the data to be "formatted" into character strings of the form specified by the format instance. A format value provides further information than the format definition on how the data is to be formatted like character width constraints and such. For example, the data value of 1254.71 can be converted to a formatted value of " $1,254.71" using a DOLLAR10.2 format value. The DOLLARw.d format definition is SAS-supplied.

Formatted Value A data value that has been converted to a character string of a specific pattern according to the rules specified in the format applied to the data value and the format's definition. Usually width information is specified in the format value and general pattern information is specified in the format's definition.

Governing is a business rule, which is used to limit the output of a business query when the query is run.

HAVING is an SQL clause that specifies data filtering conditions similar to the WHERE clause but operates after aggregate functions have been calculated.

Hierarchy is logical tree structure that organizes the members of a dimension such that each member has one parent member and zero or more child members.

Hierarchized - members are first ranked in an OLAP information map according to position within the hierarchy, then according to each level involved, based upon a string or numeric expression.

Join is a SQL JOIN statement is used to combine the data contained in two relational database tables based upon a common attribute.

GROUP BY groups together similar rows; usually for purpose of applying an aggregate function.

Index resembles an index at the back of a book. Using a keyword you can find the right page number of rows with a table.

Information Map (“map” for short) is a business model. Not only does it allow for modeling business metadata but it provides a few other things that a business model does not. It will provide for organizing the contained business items within subfolders in the map. Subfolders within maps are called “information map folders”. It will provide for storage in a folder within a repository root folder in an OMA repository. Folders within a repository root folder extend the FolderInterface class managed by SAS Foundation Services’ Information Service.

Information Map Folder is a folder used for organizing business items within an information map.

Key is a database key is a attribute utilized to sort and/or identify data in some manner. Each table has a primary key which uniquely identifies records. Foreign keys are utilized to cross-reference data between relational tables. Keys are usually annotated as PK - primary key; FK - foreign key; and AK - alternative key (an AK is a column that meets the requirements of a PK but is not designated as such; it is a good candidate for sorts and indexes). The primary key column(s) is designated with a 'key' icon to the left as shown below:

SAS Foundation Services promote sharing through the utilization of global-, user-, and session-scoped resources where appropriate. At the same time, partitioning by access context (for example, privileged-user and group/user membership) will be preserved across protected resource pools.

A variety of basic platform service categories are

Discovery Service – the discovery service will support the late binding of a client to a deployed service implementation.
Session Management Service – this service will provide a control structure for maintaining state within a bounded session and facilitate resource management and context passing.
User Service – the user service will provide for the identification of a user and the storage and retrieval of user-specific information (the concept of "user" can be extended to groups of users, or "roles").
Security Services – there are a variety of services in the security area that must be provided by the platform. These include support for an authenticated session login, propagation of user identity context across distributed security domains, and protected-resource access policy administration and enforcement.
Event Services – a variety of asynchronous notification and request management services will be provided to support dynamic, event-driven, processes
Information Services – in order to exploit storage environments on an enterprise scale, information services across a variety of repository implementations are required.
Communication Services – connection management and protocol services.

Logging – activity/error tracing; file-based, for debugging purposes

JOIN combines two tables with an inner join; displays only values from each table that match;

Normalization is the process of structuring relational database schema such that most ambiguity is removed.

Many-to-many relationships occur when each record in TableA may have many linked records in TableB and vice-versa. For example, one newspaper has many readers and one reader reads many newspapers.

MDX which stands for Multi-Dimensional Expression is a declarative query language used to define and manipulate multidimensional objects and data.

Metadata is literally "data about data." This term refers to information about data itself -- perhaps the origin, size, formatting or other characteristics of a data item. In the database field, metadata is essential to understanding and interpreting the contents of a data warehouse.

Member is an item in a dimension representing one or more occurrences of data. A member can be either unique or nonunique. For example, 1997 and 1998n represent unique members in the year level of a time dimension, whereas January represents non-unique members in the month level because there can be more than one January in the time dimension if it contains data for more than one year.

Measure is a set of values that are based on a column in the cube's fact table and are usually numeric. Measures are the central values that are aggregated and analyzed.

Multidimensional Structure is database paradigm that treats data not as relational tables and columns, but as information cubes that contain dimension and summary data in cells. Each cell is addressed by a set of coordinates that specify a position in the structure's dimensions. For example, the cell at coordinates {SALES, 1997, WASHINGTON, SOFTWARE} would contain the summary of software sales in Washington in 1997.

Multi-dimensional results are query results which are usually displayed as crosstabs. Both OLAP and "ROLAP" queries produce multi-dimensional results.

Nonhierarchized - ranks the members of the set, but without regard to hierarchy.

Ordered set is set of members returned in a specific order. The ORDER function in a Multidimensional Expressions (MDX) query returns an ordered set

ORDER BY is a query clause specifying the sort order of the result set.

OLAP allows for the real-time analysis of data stored in a database. The OLAP server is normally a separate component that contains specialized algorithms and indexing tools to efficiently process data mining tasks with minimal impact on database performance.

One-to-one relationships occur when there is exactly one record in TableA that corresponds to exactly one record in TableB.

One-to-many relationships occur when each record in TableA may have many linked records in TableB but each record in TableB may have only one corresponding record in TableA.

PathUrl is a repository path to a MetadataInterface object located in or below a repository root folder. This path is similar to a filepath in a file system.

Persistence is saving of an object definition so it will be available after the current session ends.

Primary key of a relational table uniquely identifies each record in the table. It can either be a normal attribute that is guaranteed to be unique or it can be generated by the DBMS. Primary keys may consist of a single attribute (for example, column) or multiple attributes in combination. Imagine we have a student records database that contains three tables. The first table, STUDENTS, contains a record for each student at the university. The second table, CLASSES, contains a record for each class session offered. The third table, ENROLLMENT, contains student enrollment records (for example, each record represents a single student enrolling in a single course). There would be multiple records for each student (representing all the classes that student is enrolled in) and multiple records for each class session (representing all the students enrolled in that class). A student's unique student ID number would be a good choice for a primary key in the STUDENTS table.

Property is a named attribute of a control, field, or database object that you set to define one of the object's characteristics (such as size, color, or screen location) or an aspect of its behavior (such as whether it is hidden).

Prompt is associated with a Data Item or Filter (or other items) will result in the request of further information. Itprovides the ability to specify information or data values at runtime. This information can be categorized as:

o Values

§ A constant value to be entered by the user. For example: Age > ___

§ Values can come from:

· Pick List of predefined values (that is, Lookup Values)

· User specified

o Data Items

§ The choice of a Data Item, to be entered by the user: For example ___ > 30

§ Values should come from a pick list of the available Data Items in the Collection.

Query is the primary mechanism for retrieving information from a database and consist of questions presented to the database in a predefined format using either SQL (relational tables) or MDX (OLAP tables).

Ragged Hierarchy is a dimension hierarchy in which one or more levels do not contain members in one or more branches of the hierarchy. For example, the state or province level in a geography hierarchy contains no members for countries or regions that do not have states or provinces.

Ranking is a way of filtering data based on values. An example could be to return only the top 5 sales figures.

Relational is a collection of information organized in tables. Each table models a class of objects of interest to the organization (for example, Customers , Parts , Suppliers ). Each column in a table models an attribute of the object (for example, LastName , Price , Color ). Each row in a table represents one entity in the class of objects modeled by the table (for example, the customer name John Smith or the part number 1346). Queries can use data from one table to find related data in other tables.

Relational crosstabs are summarizations of relational data in a crosstab, for example:

Relationship is a link between tables that references the primary key in one table to a foreign key in another table. The relationship line is represented in a database diagram by a solid line if referential integrity between the tables is enforced, or a dashed line if referential integrity is not enforced for INSERT and UPDATE transactions. The endpoints of a relationship line show a primary key symbol to denote a primary key-to-foreign key relationship, or they show an infinity symbol to denote the foreign key side of a one-to-many relationship. In Metadata, a relationship is an association between a pair of objects, where one object is an origin and the other object is a destination. The association repeats for each subsequent pair of objects, so that the destination of one relationship becomes the origin in the next relationship. In this way, all objects in an information model are associated through a chain of relationships that extend from one object to the next throughout the information model.

Record consists of one set of tuples for a given table. In a relational database, records correspond to rows in each table. In a OLAP database, records corresponds to the intersection of columns and row in a table.

Recursive relationship refers to a relationship established within a table. In other words, a table is 'related' to itself. Recursive relationships are usually established in situations where it is useful to perform a self-join. The classic example of a self-join is the 'employee-manager' report. In this situation, the 'employees' table contains name, ID and 'reports to' columns. The 'reports to' column contains the ID (not the name) of the employee's manager. In order to prepare a report that lists employees and their managers by their names, a self-join is required.

Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, the concept of referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the linked table. It also includes the techniques known as cascading update and cascading delete, which ensure that changes made to the linked table are reflected in the primary table.

Resources used when querying are server connections and result sets. These resources are often what limits the amount of work an application can do.

Result items are data items that are to be used to define columns, tuples, etc. which are returned in the result set when a business query is executed.

Result set is a set of rows returned from a SELECT statement. The format of the rows in the result set is defined by the column-list of the SELECT statement.

"ROLAP" is a term that is sometimes used to describe relational crosstabs. This is somewhat misleading, since relational crosstabs do not support the same interactive features (drill, expand) as OLAP, and don't support advanced filters.

Row consists of one set of attributes (or one tuple) corresponding to one instance of the entity that a table schema describes.

SAS Query Services is a set of components that provide robust query functionality resulting. A key aspect of this functionality is metadata-driven queries. It includes components that leverage the information provided by the metadata, components for creating and updated this metadata, as well as ad-hoc query capabilities.

Schema is a collection of meta-data that describes the relations in a database. A schema can be simply described as the "layout" of a database or the blueprint that outlines the way data is organized into tables.

SELECT is a statement in SQL or MDX that is the primary mechanism for retrieving information from a database.

Examples: Given the following table:

Members

ID	LastName	Age
1	Smith	25
2	Jones	42
3	Reynolds	36

This SQL statement:

SELECT LastName
FROM Members
WHERE Age>30

Would produce the following results:

LastName

Jones

Reynolds

Server connections are network connections to the back-end data servers such as workspace servers or OLAP servers.

ServerProperties contain a set of function and operator definitions along with attributes describing a particular type of data server supported by SAS Query Services.

Simple table is a table of columns and rows. See slicing for examples of simple tables.

SQL is a structured query language is an industry-standard language used for manipulation of data in a relational database. The major SQL commands of interest to database users are SELECT, INSERT, JOIN and UPDATE.

Slice is a subset of the data in a cube, specified by limiting one or more dimensions by members of the dimension. For example, set of data items or a data selection containing age and name data items might be sliced by age into individual data selections returning only rows with one particular age, for example, "12":

Sort Order is a set of rules in a collation that define how characters are evaluated in comparison operations and the sequence in which they are sorted.

Step is an operation that modifies the state of an OLAP query.

Table in a relational or OLAP database is a predefined format of rows and columns that define an entity.

Tabular results are results which are usually displayed as simple tables of columns and rows.

Tuple is a term from set theory which refers to a collection of one or more attributes.

Union combines the results from one or more queries.

WHERE is an SQL & MDX clause that specifies data filtering conditions used for row selection when executing queries.

Workspace servers are data servers capable of processing SAS code and information map stored processes.

XML which stands for Extensible Markup Language is a text markup language for interchange of structured data.