Contents SAS IOM Data Provider Previous Next

Format Processing

The SAS IOM data provider supports SAS format processing. You can use any format that is defined within the context of the IOM workspace that you are using with the IOM provider, user written format, and SAS formats, when reading data.

Persisted Format Information

Format and informat information is associated with columns that are in a SAS data set. It is possible for this information to persist in a SAS data set.  You can discover which, if any, SAS formats or informats are associated with a column using the ISASColumnsInfo custom rowset interface. Along with some other SAS specific column metadata, this interface returns the names, lengths and decimal widths of formats/informats stored in the data set. See the specification of the ISASColumnsInfo interface for more information.

A format name that is returned by ISASColumnsInfo::GetColumnsInfo() might not be supported by the IOM provider (or not found in the list above). If you attempt to apply formatting services with a format that is not supported, a best fit strategy is applied based on the data type of the column. If this default behavior is not acceptable, it is up to you as the application writer to adjust accordingly if the persisted format/informat is not supported.

The COLUMNS schema rowset also returns format/informat metadata. Given your application needs, this might be a more suitable method. While the ISASColumnsInfo interface is only available on rowsets (or already opened data sets), the COLUMNS schema rowset can be used to learn about a table's column metadata without having to explicitly open the table via a rowset. Furthermore, the COLUMNS schema rowset is the only means available from ADO to obtain this metadata; being a custom interface, the ISASColumnsInfo interface cannot be integrated with ADO. You must look carefully at both the ISASColumnsInfo interface and the COLUMNS schema rowset in order to decide which best serves your purpose.

SASFORMAT Structure

SASExtensions.idl defines a structure that is named SASFORMAT. Instances of this structure can be used to bind columns using formatting. The basic strategy that is used to read rows of a SAS data set that has formatted columns is to create a SASFORMAT structure with the appropriate values and pass it to the provider on a IAccessor::CreateAccessor() call. We have overloaded the pBindExt member of the DBBINDING structure to point to instances of SASFORMAT on CreateAccessor() calls.

Fetching Data Using the Default Format

The simplest way to format data is to use the default format that is associated with a variable.  A SAS variable can have a format specification that is persisted in the data set itself.  In this case when you request that a column of data be returned using the default format, this persisted information is used.  If no such format information is stored on the data set file for a variable, then a best fit strategy is used for each column when the format is applied.  The actual choice that is made depends on the variable type.

To tell the provider that you want a particular column to be returned using its default format, create an instance of SASFORMAT. Set the wType member of this structure to SASFMT_FORMAT and the remaining members to zero or NULL, as appropriate. In general, a SASFORMAT instance that has these member values means default format.

Create a DBBINDEXT structure, which is defined by OLE DB and  is used to associate a SASFORMAT structure with a DBBINDING structure.  Set the pExtension member of the DBBINDEXT structure to be the address of your SASFORMAT structure; set the ulExtension member to one.

Finally, with the DBBINDING structure that you are using to bind the column that you want formatted, set the pBindExt member of the DBBINDING structure to the address of the DBBINDEXT structure you just built. Set the wType member of this DBBINDING structure to one of the OLE DB character types: DBTYPE_STR, DBTYPE_WSTR or DBTYPE_BSTR.  The provider will return data for this column that has the default format that is applied using the specified OLE DB type to physically represent the data in memory.

For example, suppose you have a data set with three columns: SALEDATE, QUANTITY, and PRICE.  Your objective is to fetch all three of these columns, using the default format that is associated with each. You know that you indicate this when you bind to each column. First you need a SASFORMAT structure that says that you want to use the default format.  You can reuse the same instance of this structure for all three columns as follows:

#include "SASExtensions.h"                                  // needed for SAS
                                                            //extensions to OLE DB
                                                            //(i.e., definition of the

SASFORMAT   formatOverride;
                                                            // SASFORMAT structure

memset( &formatOverride, 0, sizeof( formatOverride ) ); //Initialize all members
                                                            //to NULL

formatOverride.wType = SASFMT_FORMAT;                       // Set the override type
                                                            // to "formatting"

You need a DBBINDEXT structure to link your SASFORMAT structure to your DBBINDING structures. Again, in this example, you only need one DBBINDEXT structure that you can reuse for all your columns as follows:

DBBINDEXT bindingExtension;
bindingExtension.pExtension = (BYTE *) &formatOverride;
bindingExtension.ulExtension = 1;

Now you need to tie  bindingExtension to each column that we want to format. You are going to bind three columns so you need the following three DBBINDING structures:

DBBINDING rgBindings[3];

// We're going to use this structure to lay out how we're going to organize
// the data returned by the provider structure ACOLUMN
{
        DWORD           status;
        BSTR            value;
};
memset( rgBindings, 0, sizeof( rgBindings ) );
// defensive programming
// The following loop binds all the columns as DBTYPE_BSTR with default formatting requested.
for( ULONG i =   0;  i < sizeof(rgBindings); i++ )
{
//We bind to the columns in order skipping the self bookmark because it can not
// be formattedrgBindings[i].iOrdinal = i+1;   // We don't need the column length to be returned because we are asking for the // data back as BSTRs and that has a fixed length.  Asking for the status back
// is interesting though because applying a format to a particular data item could
// fail.

rgBindings[i].dwPart = DBPART_VALUE | DBPART_STATUS;
rgBindings[i].obValue = (i * sizeof(structure ACOLUMN)) + offsetof(structure ACOLUMN, value);
rgBindings[i].obStatus = (i * sizeof(structure ACOLUMN)) + offsetof(structure ACOLUMN, status);
// The client knows when it is done with each data item and can free it.

rgBindings[i].dwMemOwner = DBMEMOWNER_CLIENTOWNED;      
// We could use a different character data type, but then we would have to make
// sure and specify a maximum column width. This would be based on the width
// of the format being applied.

        rgBindings[i].wType = DBTYPE_BSTR;
// Here is our formatting  information. We can reuse the same DBBINDEXT and         
// SASFORMAT structure multiple times when we want when we want the default format.

rgBindings[i].pBindExt = &bindingExtension;
}

With the given array of DBBINDING structure, a call can be made to IAccessor::CreateAccessor() to create an HACCESSOR that will retrieve columns using SAS formats.  You should not use such an accessor for output operations like IRowsetChange::SetData() or IRowsetChange::InsertRow().  The IOM provider does not support the use of informats, so using such an HACCESSOR with these methods would cause unpredictable results.

Overriding the Default Format

Using a format other than the default format is not difficult. In this case, when you create a SASFORMAT structure, you need to specify a format name in addition to SASFMT_FORMAT for the wType member. You can also specify a format length (or width) and number of decimals (for numeric formats) if you want something other than the default values for the format that you chose.

For this example, use the following sample columns: SALEDATE, QUANTITY, and PRICE. For an application, you might want SALEDATE to be returned using MMDDYY8. formatting and PRICE to be returned with DOLLAR8.2.   The application might want to do some calculations on the PRICE column as well as display it using the DOLLAR format, so you must bind to it as a numeric value as well.   So Although you have three actual columns, you will have four bindings as follows:

The following code will create just such an array of DBBINDING structures:

DBBINDING rgBindings[4];                // We are going to create four bindings
DBBINDEXT rgBindingExtensions[2];       // Two of our bindings will have formatting
SASFORMAT rgFormatOverrides[2];         //      overrides
ULONG dwOffset;                         // Used to track the offset of a bound data point into
                                     //      an I/O buffertypedef structure                          // In this scenario we will want each binding to include
{                                       //      a status and value part. This structure will help
        DWORD    status;                //      us lay that out as we create each binding.
        BYTE     value[1];
} COLUMNDEF;

// Be defensive! Initialize all our structures.

memset( rgBindings, 0, sizeof( rgBindings ) );
memset( rgBindingExtensions, 0, sizeof( rgBindingExtensions ) );
memset( rgFormatOverrides, 0, sizeof( rgFormatOverrides ) );

// We start laying parts into our buffer at offset 0.  By the time

// we're done creating our bindings, dwOffset should be the total
// number of bytes needed to hold one row in memory.dwOffset = 0;
// Bind SALEDATE as DBTYPE_BSTR with the MMYYDD8. format applied.  The basic
// OLE DB binding info comes first.rgBindings[0].iOrdinal = 1;

rgBindings[0].dwPart = DBPART_VALUE | DBPART_STATUS;
rgBindings[0].obValue =  dwOffset + offsetof( COLUMNDEF, value );
rgBindings[0].obStatus =  dwOffset + offsetof( COLUMNDEF, status );
rgBindings[0].dwMemOwner = DBMEMOWNER_CLIENTOWNED;
rgBindings[0].wType = DBTYPE_BSTR;

// Now we set up our format override for this column.

rgFormatOverrides[0].wType = SASFMT_FORMAT;
rgFormatOverrides[0].pwszName = L"MMYYDD";
rgFormatOverrides[0].iLength = 8;

// Finally, we tie the SASFORMAT structure to the DBBINDING struct

rgBindingExtensions[0].pExtension = (BYTE *) &(rgFormatOverrides[0]);
rgBindingExtensions[0].ulExtension = 1;
rgBindings[0].pBindExt = &(rgBindingExtensions[0]);

// We increment our offset past the data points for this column.

dwOffset += sizeof(BSTR) + offsetof( COLUMNDATA, value );

// Bind QUANTITY as DBTYPE_R8. This case is simple because we just
// have to give basic OLE DB binding information.rgBindings[1].iOrdinal = 2;

rgBindings[1].dwPart = DBPART_VALUE | DBPART_STATUS;
rgBindings[1].obValue =  dwOffset + offsetof( COLUMNDEF, value );
rgBindings[1].obStatus =  dwOffset + offsetof( COLUMNDEF, status );
rgBindings[1].wType = DBTYPE_R8;dwOffset += sizeof(double) + offsetof( COLUMNDATA, value );

// Bind PRICE as DBTYPE_BSTR with the DOLLAR8.2 format applied.
// This is very much like the first binding we created.

rgBindings[2].iOrdinal = 3;
rgBindings[2].dwPart = DBPART_VALUE | DBPART_STATUS;
rgBindings[2].obValue =  dwOffset + offsetof( COLUMNDEF, value );
rgBindings[2].obStatus =  dwOffset + offsetof( COLUMNDEF, status );
rgBindings[2].dwMemOwner = DBMEMOWNER_CLIENTOWNED;
rgBindings[2].wType = DBTYPE_BSTR;

rgFormatOverrides[1].wType = SASFMT_FORMAT;
rgFormatOverrides[1].pwszName = L"DOLLAR";
rgFormatOverrides[1].iLength = 8;
rgFormatOverrides[1].iDecimals = 2;

rgBindingExtensions[1].pExtension = (BYTE *) &(rgFormatOverrides[0]);
rgBindingExtensions[1].ulExtension = 1;
rgBindings[2].pBindExt = &(rgBindingExtensions[0]);
dwOffset += sizeof(BSTR) + offsetof( COLUMNDATA, value );// Bind PRICE as DBTYPE_R8
rgBindings[3].iOrdinal = 3;
rgBindings[3].dwPart = DBPART_VALUE | DBPART_STATUS;
rgBindings[3].obValue =  dwOffset + offsetof( COLUMNDEF, value );
rgBindings[3].obStatus =  dwOffset + offsetof( COLUMNDEF, status );
rgBindings[3].wType = DBTYPE_R8;

// After execution, the following dwOffset will be the number of bytes
// the application will need to store a row of data when fetched using
// this binding information.dwOffset += sizeof(double) + offsetof( COLUMNDATA, value );

The DBPROP_SAS_FORMATS Custom Rowset Property

The SAS IOM data provider implements a custom rowset property that is called DBPROP_SAS_FORMATS.  When it is called from ADO, the IOM provider uses this property to access formatting features.  An application that calls the IOM provider directly through the OLE DB interfaces should NOT use the DBPROP_SAS_FORMATS property to format columns. Instead, the application should use the SASFORMAT structure with the DBBINDING extensions as previously defined. The method for formatting previously shown is optimized to work with the OLE DB interfaces much more efficiently than the DBPROP_SAS_FORMATS property.

The DBPROP_SAS_FORMATS property takes a string value. The string is consists of a comma separated list of column names, including any format overriding information that you want to associate with the column. You specify a column name and an optional override with a string of the following form

[+]COLUMN[=FORMAT[w].[d]]

where

Prefixing a column name with a plus sign (+) indicates that you want the column to be returned with the indicated format that is applied in addition to its unformatted form. You are asking for the one physical column of data to be exposed in the ADO rowset as two columns: one formatted and one not. When this is requested, the name of the formatted column is of the form COLUMN_FORMAT.

To format all the columns in a data set at once using the default or persisted formats, use the _ALL_ keyword.  When you give _ALL_ as the value of the DBPROP_FORMATS property, all columns are returned formatted.  If you want all columns returned in both their formatted and unformatted forms, you can use the plus sign (+) modifier: +_ALL_.

The ADO alias for this property is SAS Formats.  Use this alias to reference the property in an ADO properties collection. Also, the value of a DBPROP_FORMATS property is not case-sensitive.

A column name alone in the list indicates that data from that column should be returned using the default format. A column name that is preceded by a plus sign indicates that the column should be returned in both ways (formatted and unformatted). This is achieved by creating a pseudo column on the recordset. The pseudo column contains formatted data items and will precedes the column of unformatted values.

Following a column name with the optional format specifier overrides the default format that is associated with the column. The behavior of a plus sign before the column name results in the column's data being returned both unformatted and formatted with the specified information.

Note that the order in which columns are named in the DBPROP_SAS_FORMATS string does not affect the order in which they are returned by an ADO recordset.

Let's look at an example for our data set with three variables:

A value of

SALEDATE=MMDDYY8.,+PRICE=DOLLAR8.2

results in four columns on the ADO recordset.  Assuming the columns are found in the data set in the order listed above, the columns (and their types/formats) surfaced by the recordset would be

Note that the fourth column is not persisted beyond the life of the ADO recordset it is exposed on.

Because ADO recordset properties are read-only after the recordset is opened, you must specify a DBPROP_SAS_FORMATS property value before calling the recordset Open() method. Because of a limitation in ADO, you must use an ADO command object to set the custom DBPROP_SAS_FORMATS option.  The basic algorithm is as follows:

  1. Open a connection to your data source.
  2. Set the ActiveConnection command property to the connection that you just opened.
  3. Set the CommandText command property to the name of the data set to open.
  4. Set the CommandType command property to adCmdTableDirect.
  5. Set the DBPROP_SAS_FORMATS property value using the command objects Properties collection.
  6. Open the recordset by using the command object.

The following Visual Basic code fragment show how to do this in more detail.

Dim cnt As New ADODB.Connection
Dim rst As New ADODB.Recordset
Dim cmd As New ADODB.Command
' STEP 1: Open a connection to your data source
cnt.Open "Provider=sas.IOMProvider.1;
Data Source=c:\mydata"

' STEP 2: Set the command objects ActiveConnection property
     ' to the connection you just opened. This will bring the IOM
     ' provider's custom rowset properties into the command object's
     ' property collection.
     Set cmd.ActiveConnection = cnt

     ' STEP 3
     cmd.CommandText = "sales"

     ' STEP 4: Set the DBPROP_SAS_FORMATS option value to your desired value.
     cmd.Properties("SAS Formats") = "saledate=mmddyy8.,+price=dollar8.2."

     ' STEP 5: Open the recordset. The custom properties you set on the
     ' command object will be passed to the underlying OLE DB rowset when
     ' it is opened.  After the Open() method returns you should be able
     ' to see that the value of rst.Properties("SAS Formats") is the same as
     ' the value you set on the command object above. The fields (that is,
     ' columns) of the recordset will include the formatting as indicated
     ' by the DBPROP_FORMATS value.
     rst.Open cmd,,,,adCmdTableDirect

     ' Use the open recordset

     ' When finished, close the recordset
     rst.Close

     ' Close the connection
     cnt.Close

If you do not know the geometry of the data set that you are working with when you write the code, you can discover a table's column metadata at run time using the OpenSchema() connection object method with the adSchemaTables query type. The recordset that is returned contains the persisted SAS format information for each column along with the defined OLE DB schema information.

 

Contents SAS IOM Data Provider Previous Next