Understanding Data Federation and Best Practices

Overview

Successful data federation projects require careful preparation and attention in two areas:
  • The data model
  • Data security
Proper understanding of your organization's data access needs is key to a successful data federation deployment. The data model will control the type of underlying work required at run time to satisfy requests for data. A poorly constructed data model can result in inefficient performance and perhaps incorrect results if the data is not well understood. Data security design should consider how your users will gain access to the back-end data. Will users be accessing the data under their individual authorizations? Will you establish one or more data owners who will act as data access proxies for the end users? There are several options to consider in the design phase so that SAS Federation Server can be properly configured.

Data Model

A good starting place is to identify your data sources, understand the relationships between different sets of data, and derive a data model that meets the needs of your business. This is the same type of background work that typically goes into a data warehousing project, where the end result is often a set of tables or views that are loaded in a data mart. But unlike data warehousing, with SAS Federation Server, the data does not need to be copied from the source into a data mart. Instead, the data can be fetched from their source locations and operated on in real time during the data requests. These operations need to be as efficient as possible, and this will require a well-thought data model.
Data caching should be considered when designing the data model. It can provide a vital role in optimizing query performance by pre-executing and storing intermediate results. This can be particularly useful for back-end data sources that have low availability, slow access speeds, expensive access fees, or contain data where real time values are not required. A data model can consist of a set of user-visible FedSQL views that are dependent on other restricted FedSQL views which can be dependent on back-end data (for example, tables and database views). FedSQL views are very similar to relational database views, except that the data can come from a heterogeneous set of data sources.
Any combination of FedSQL views can be cached, though the views must be definer’s rights views. The cached views can be refreshed on a periodic basis through the scheduler, or refreshed at any time through direct administration SQL commands. If certain data sets will be used frequently or involve complicated SQL operations to return the results, you will want to consider data caching for those results. Cached results can be joined with uncached data to provide quick responses to user queries.
Another consideration is to use the Memory Data Store (MDS) to store frequently used or temporary data. MDS allows tables to be stored in the memory of the server process. This allows for extremely fast data access performance. These tables must be managed manually and are automatically deleted when the server is shut down. FedSQL can join an MDS table with tables from any other data source, whether they are cached or uncached, including other MDS tables.

Data Security

Overview

In a federated system, data can be acquired from a large variety of data sources, and your users might not always have direct access to those systems. Even if they do, administering a large number of database credentials and setting up database privileges such that all users can access all data with their personal credentials can be burdensome. The Federation Server security capabilities provide some alternatives which can help ease security administration.

Invoker’s Rights Views

If your users already have direct access to the back-end systems and you want them to access the data under their individual or shared logins, then you can configure SAS Federation Server to access the data under the rights of the invoking user. This is done by creating invoker’s rights FedSQL views. When these views execute, any connections to back-end data will require that the invoking user have proper credentials to access the data. In addition, you will want to ensure that SAS Federation Server security settings allow invokers to access the required data. For example, if a FedSQL view is defined to select data from an Oracle™ data source and a DB2™ data source, the administrator will need to ensure that proper privileges are assigned to each view invoker on the Oracle™ and DB2™ data, in addition to the FedSQL view itself. The privileges can be granted to a group, or set of groups that the invokers are members of, instead of to each individual invoker.
Because the user must be able to directly access all of his data, you will need to secure each object for each user. And invoker’s rights views need to be sensitive to these security settings on underlying objects, particularly column level security. Consider a view that selects columns C1 and C2 from table T, for example,
SELECT C1, C2 FROM T
If table T has been secured through the Federation Server such that User1 does not have SELECT privilege on the table, then when User1 selects from the view, he will get an error as it attempts to access table T. You might consider denying SELECT privilege on the view to User1 in this case. Furthermore, if User2 has SELECT privilege on table T1 and column C1 but not on column C2, then User2 will get an error when selecting from the view. Putting column level security on the view to disallow SELECT privilege on column C2 to User2 will not help, since the underlying view definition specifically requires access to column C2 in table T. When creating views that require data from other objects that have column-level security, you might consider using the ‘*’ to select all columns, for example:
SELECT * FROM T
When the Federation Server is configured with SelectStarExpansion=VISIBLE, the ‘*’ will expand to all columns for which the invoking user has SELECT privilege. This enables you to create a single view usable by all invokers, yet each invoker will see the columns for which he has SELECT privilege only.

Definer’s Rights Views

If your users do not have direct access to the back-end systems, or you want to read objects under the authorizations of a single user, then you can configure the Federation Server to access the data under the rights of the defining user. This is done by creating definer’s rights FedSQL views. When these views are executed by any user, any connections to back-end data will be done under the definer of the view (the definer is actually the schema owner containing the view).
In this security model, you will generally deny access to the back-end data to all users except the view definer. Any column-, table- or row-level security will be set on the view itself. For example, if your view is defined as:
SELECT C1, C2 FROM T
You will want to ensure that the view definer has full access to columns C1 and C2 in table T. All other users do not require access to back-end data. And in many cases, if your users are interfacing only with exposed top-level objects of the data model, then you can deny privileges to those users on back-end objects. Individual Federation Server security settings can then be consolidated on just the exposed objects of the data model (usually FedSQL views). Users then access only the views; no other credentials are required. This greatly simplifies the security settings in the Federation Server and reduces administration on all the back-end data sources as well.

Mixed Models

Note that you can use a combination of approaches with both invoker’s and definer’s rights views intermingled. FedSQL views can be nested, and view types are honored. For example, you might create view V1 that is owned by Owner1 and data is accessed through the credentials of Owner1. However, if view V1 selects from view V2 which is another definer’s rights view but owned by Owner2, then all data within view V2 is accessed under the authorizations of Owner2. If view V2 selects from view V3 which is an invoker’s rights view, then any data within view V3 continues to be accessed by Owner2, who is the invoker of the view.

Dropping Objects

SAS Federation Server persists metadata for security settings in its set of system tables. The server attempts to synchronize this metadata with the actual objects that it represents. For example, if the administrator has granted SELECT privilege on table T1 to User1, and subsequently table T1 is dropped, the server will likewise drop its security metadata onto T1. If T1 is subsequently re-created, it will have no security set on the object itself. All security definitions for T1 will come from its inheritance objects, including the schema, catalog, data service and server.
If you prefer that the security metadata is retained on the server, there are a couple suggested possibilities. First, it might be that a job is dropping the table only to re-create it in a subsequent step. This is often the case when refreshing the contents of a table. Rather than dropping and re-creating the table (which often has the additional side-effect of losing any indexes on the table), consider issuing a DELETE FROM TABLE command to delete all the rows from the table. Here, the underlying table still remains intact, as do the security metadata stored in the system tables.
A second approach is to use definer’s rights views to assist with individual object security. Often times, data in back-end data sources is being manipulated (created and dropped) by a few “power” users only. These users will require CREATE, DROP TABLE, and VIEW privileges. The CREATE privilege set applies only to container objects (schema, catalog, data service, and server) and not individual tables or views. The DROP privilege can be applied at the object level, but if designed correctly, can be used from the container object.
More often, business users are the ones requiring table and column level privileges to access the data. If you choose the security model using definer's rights views, then you can place your individual privileges on the views, which do not go away when a back-end table is dropped and re-created. Back-end data can be secured by denying privileges on the container object to business users, but granting SELECT to view owners. An approach similar to this can be used to eliminate the need for table-, column- and row-level security on back-end data source objects.