| File Format-Specific Reference for the IMPORT and EXPORT Procedures |
| Overview of Microsoft Excel Files Essentials |
SAS/ACCESS Interface to PC Files Server works with Microsoft Excel workbook 5, 95, 97, 2000, 2002, 2003, 2007. Throughout this document these files are collectively referred to as .xls files, except for version 2007. Microsoft Excel workbook 2007 file formats are referred to as .xlsb, .xlsm, or .xlsx files. Files that are created with Excel 2007 can create and work with all Excel file formats.
An Excel file represents an Excel workbook. An Excel workbook is a collection of worksheets. A worksheet that was created with a version before Excel 2007 can contain up to 256 columns and 65,536 rows in an .xls file.
A cell is the intersection of a column and a row. It is referenced by a column number and a row number. For example, B5. A cell is the basic unit that saves data in the worksheet. A cell can contain a numeric value or a text value of up to 32,767 characters.
A range is a subset of cells in a worksheet. Its address identifies it: It begins with the name of the top-left cell and ends with the name of the bottom-right cell, separated by two periods. For example, the range B2..E8 is the range address for a rectangular block of 12 cells, where the top-left cell is B2 and the bottom-right cell is E8 (shown as shaded, below). A range name identifies a range.
Excel 2007 has been enhanced to support 16,384 columns and 1,048,576 rows in a worksheet. Files that are created with Excel 2007 can have have an .xlsb, .xlsx, or .xlsm extension. Due to the limitations of the ACE engine and the Jet engine, the SAS LIBNAME engine is limited to 255 columns in an Excel file.
A Range of Data in an Excel Worksheet
![[Range of Data in an Excel Worksheet]](images/excel-capture.gif)
SAS/ACCESS Interface to PC Files Server treats an Excel workbook as a database, and a range (subset of cells in a worksheet) as a table. A range name must be defined in the Excel file before SAS can use it. A worksheet is treated as a special range. A worksheet name appended with a $ character is treated as a range.
For example, Sheet1 is a sheet name in an Excel file. SAS treats Sheet1$ as a valid range name and uses it to refer to the whole worksheet. You need to use SAS n-literal when referring to the sheet name. For example, 'Sheet1$'n. The first row of data in a range is usually treated as a column heading and used to create a SAS variables name.
Excel 4 files, only one spreadsheet is allowed per file
Excel 4, Excel 5, and Excel 95 limits are 256 columns, and 16,384 rows
Excel 97, 2000, 2002, 2003 limits are 256 columns, and 65,536 rows
Excel 2007 limits are 16,384 columns, and 1,048,576 rows.
Excel 95 files are treated as the same format as Excel 5 files.
Excel 2000, 2002, and 2003 files with an .xls file extension are treated as the same format as Excel 97 files.
Excel 2007 has three different file extensions, .xlsb, .xlsm, and .xlsx.
| Excel Data Types |
Microsoft Excel software has two data types: character and numeric.
Character data as labels or formula strings. Character data is generally considered text and can include character type dates and numbers. A cell can save up to 32,767 characters.
Numeric data can be entered as numbers or formulas. Numeric data can include numbers (0 through 9), formulas, or error values (such as #NULL!, #N/A, #VALUE!).
| Excel Numeric Data and Time Values |
The conversion of date and time values between SAS data sets and Microsoft Excel spreadsheets is transparent to users. However, you are encouraged to understand the differences between them.
In Microsoft Excel software, a date value is the integer portion of a number that can range from 01 January 1900 (saved as integer value: 1) to 31 December 9999 (saved as integer value: 2,958,465). A Microsoft Excel software time value is the decimal portion of a number that represents time as a proportion of a day. For example, 0.0 is midnight, 0.5 is noon, and 0.999988 is 23:59:59 (on a 24-hour clock). While a number can have both a date and a time portion, the formats in Microsoft Excel display a number in a date, time, or date/time format.
In SAS software, SAS dates are valid back to AD 1582 and ahead to AD 9,999. A date value is represented by the number of days between January 01, 1960, and that date. A time value is represented by the number of seconds between midnight and that time of day. A datetime value is represented by the number of seconds between midnight January 01, 1960, and that datetime.
When you export a SAS time value to an Excel file, the value could be displayed as "1/0/1900" in the Excel file. Format the cell with a Time format to see the time value displayed correctly.
| Excel File Formats |
If you use 'Save As' and select 'Save As type' of 'Excel 97-2003 Workbook', the file is saved as an .xls file in that format. Under 'Save As' you can select the other 2007 formats:
Excel Workbook creates a .xlsx file.
Excel Macro-Enabled Workbook creates an .xlsm file.
Excel Binary Workbook creates an .xlsb file.
Excel 97-2003 Workbook creates an .xls file.
| SAS Import and Export Utilities Support for Excel Files |
SAS Import and Export Utilities provide three components to access Microsoft Excel files.
| LIBNAME statement |
generates a LIBNAME statement and PROC SQL code for the PC files engine read data from or write data to an Excel file. This component supports Excel versions 5, 95, 97, 2000, 2002, 2003, and 2007.
| ||||||
| The ACCESS and DBLOAD procedures |
generate ACCESS procedure code to read data from an Excel file. Also, generates the DBLOAD procedure code to write data to an Excel file. This component supports only Excel versions 4 and 5/95. This is for SAS 6 compatibility support and is available only on Microsoft Windows.
| ||||||
| .xls file formats |
translates Excel .xls file formats to read data from or write data to an Excel file. This component supports Excel versions 5/95, 97, 2000, 2002, and 2003. For details about available statements.
|
| LIBNAME Statement Options to Import or Export Excel Data |
The LIBNAME statement component for importing and exporting Microsoft Excel workbook data generates SAS LIBNAME statement code. The LIBNAME statement uses the Microsoft Ace engine or Microsoft Jet engine to access data in Microsoft Excel workbook files.
| Data Source | Syntax | Valid Value | Default Value | PROC IMPORT | PROC EXPORT |
|
EXCEL
EXCEL97 EXCEL2000 EXCEL2002 EXCEL2003 EXCEL2007 |
DBDSOPTS= | Excel data set options |
|
Yes | Yes |
| DBSASLABEL= | Compat|None | Compat | Yes | No | |
| GETNAMES= | Yes|No | Yes | Yes | No | |
| MIXED= | Yes|No | No | Yes | No | |
| NEWFILE= | Yes|No | No | No | Yes | |
| RANGE= | range name |
|
Yes | No | |
| SCANTEXT= | Yes|No | Yes | Yes | No | |
| SCANTIME= | Yes|No | Yes | Yes | No | |
| SHEET= | sheet name |
|
Yes | Yes | |
| TEXTSIZE= | 1 to 32767 | 1024 | Yes | No | |
| USEDATE= | Yes|No | Yes | Yes | No |
SAS LIBNAME statement options to import or export Excel Data files on Linux, Windows and UNIX. This component requires that the PC Files Server is running on a Microsoft Windows operating platform where the PC files reside.
| Data Source | Syntax | Valid Value | Default Value | PROC IMPORT | PROC EXPORT |
| EXCELCS | DBDSOPTS= | 'Excel data set options' |
|
Yes | Yes |
| DBSASLABEL= | Yes|No | Yes | Yes | No | |
| PORT= | 1 to 65535 |
|
Yes | Yes | |
| RANGE= | 'range name' |
|
Yes | No | |
| SCANTEXT= | Yes|No | Yes | Yes | No | |
| SCANTIME= | Yes|No | Yes | Yes | No | |
| SERVER= | 'server name' |
|
Yes | Yes | |
| SERVERPASS= | 'server password' |
|
Yes | Yes | |
| SERVERUSER= | 'server User ID' |
|
Yes | Yes | |
| SERVICE= | 'service name' |
|
Yes | Yes | |
| SHEET= | 'sheet name' |
|
Yes | Yes | |
|
SSPI=
64-bit Windows only |
Yes| No | No | Yes | Yes | |
| TEXTSIZE= | 1 to 32767 | 1024 | Yes | No | |
| USEDATE= | Yes|No | Yes | Yes | No | |
| VERSION= | '5' | '95' | '97' | '2000' | '2002' | '2003' | '2007' | '97' | Yes | Yes |
| IMPORT and EXPORT Procedure Options for Excel Files |
enables data set options for the LIBNAME engine such as READBUFF=, INSERTBUFF=, DBTYPE=, DROP=, FIRSTOBS=, and OBS=. These options are for advanced users who are familiar with the PC Files LIBNAME engine.
| Default: | 1for UNIX users who use the client server model with an Excel workbook. |
| Note: | To improve performance for reading data, set the READBUFF= option to 25 or higher. |
Enclose the options in single or double quotation marks as shown in these examples: DBDSOPTS='FIRSTOBS=10 READBUFF=25'; .
If the option string that you specify contains single quotations marks, use double quotation marks around it in your statement.
For example: DBDSOPTS="DBTYPE=(BilledTo='CHAR(8)')"; .
See Data Set Options and Data Set Options for PC Files on Linux and UNIX, and 64-Bit Windows for a list of supported data set options.
specifies the data source column names.
| COMPAT |
specifies that the data source column headings are saved as the corresponding SAS label names.
| ||
| NONE |
specifies that the data source column headings are not saved as SAS label names. The SAS label names are then left as blanks.
|
specifies whether the IMPORT procedure is to generate SAS variable names from the data values in the first row of the import file.
If data in the first row of the input file contains special characters for a SAS variable name, such as a blank; SAS converts the character to an underscore. For example, the column heading.
| YES |
specifies that the IMPORT procedure generate SAS variable names from the data values in the first row of the imported Excel file. |
| NO |
specifies that the IMPORT procedure generate SAS variable names as F1, F2, F3, and so on. |
assigns a SAS character type for the column and converts all numeric data values to character data values when mixed data types are found.
| Default: | NO |
| Restriction: | Supported only when DBMS=EXCEL. |
| Restriction: | If MIXED=YES, the connection is set to import mode and updates are not allowed. |
| Note: | Due to a limitation in the Microsoft Ace engine and the Microsoft Jet Excel engine, using MIXED=YES could result in improper text variable lengths. |
| NO |
assigns numeric or character type for the column, depending on the majority of the type data that is found.
|
These registry settings can affect the behavior of the MIXED= option. The options are located in the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel key of the Microsoft Windows registry. To change the value of TypeGuessRows, follow these steps in Microsoft Windows:
Start
Run
Enter Regedt32 select OK.
Open this key in the Registry Editor window:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel
Double-click TypeGuessRows.
Select Decimal, enter 0 in the Value field, and select OK.
Exit from the Registry Editor window.
| TypeGuessRows |
An integer type with a default value of 8. You can use the number of
rows in the worksheet range in scans to determine column types. If you set
this type to 0, all rows up to 16384 in the range are checked. Microsoft states
that the valid range of TypeGuessRows is 0-16. However, you could set
as high as 16384, and it would still operate correctly.
|
| ImportMixedTypes |
A string type with a default value of Text. If a column contains more
than one type of data while scanning TypeGuessRows rows, the column type is
determined to be Text if the setting value is Text. If the setting value is
Majority Type, the most common column type determines the column type.
For the MIXED=YES option to work correctly, you should you change TypeGuessingRows to 0 in the Microsoft Windows registry so that all rows in the specified range are scanned. As a result, when you use MIXED=YES, the Microsoft Ace engine and the Microsoft Jet engine always assign character type for columns with data of mixed data types. The numeric data is converted to character data. |
when exporting a SAS data set to an existing Excel file, specifies whether to delete the specified Excel file, and load the SAS data set to a sheet in a new Excel file.
| YES |
specifies that the EXPORT procedure deletes the specified Excel file, if it exists, and loads the SAS data set to a sheet in the new Excel file. |
| NO |
specifies that the EXPORT procedure loads the SAS data set to a sheet and appends it to the existing Excel file. If the specified Excel file does not exist, an Excel file is created, and the SAS data set is loaded. |
| Restriction: | Available only when DBMS=EXCEL. |
specifies the number of the port that is listening on the PC Files Server. The valid value is between 1 and 65535. This port or service name displays on the PC Files Server display when the application is started in server mode.
| Default: | 8621 |
| Alias: | PORT_NUMBER |
| Restriction: | Available only for the client server model. |
| Restriction: | The PORT= statement option and the SERVICE= statement option should not be used in the same procedure |
subsets a spreadsheet by identifying the rectangular set of cells to import. The range name is the user defined spreadsheet name, to represent a range of cells within the spreadsheet in the Excel file. The range name is not case sensitive and does not allow any special character, except an underscore. You must define range name with a workbook scope so that the name is visible to SAS.
When data is imported from an Excel file, a sheet name that is appended with a $ character is treated as a range name. The range name refers to the whole sheet.
You can use the DATASETS procedure to list the data set names, that are mapped to the range names. If the displayed range name contains single quotes, keep the single quotes as part of the range name to access the sheet. For example, to retrieve data from the spreadsheet for two separate sheet names, My#Test and CustomerOrders:
RANGE="'My#Test$'"; RANGE="' CustomerOrders$'";
Absolute range identifies the top-left cell that begins the range and the bottom-right cell that ends the range.
Example 1. C2:F12 represents cells within Column C, Row 2, and Column F, Row 12. Here is another example:
RANGE="'summary$a4:b20'";
Example 2, $ indicates the end of the sheet name, colon (:) separates the values for upper left (UL) and lower right (LR) of the cell range. If this statement is not specified, the IMPORT procedure reads the entire spreadsheet as a range.
If the range name is available, it is recommended that you use RANGE= option without the SHEET= option for the IMPORT procedure. To use the absolute range address, it is strongly recommended that you use the full range address with quotes. For example, specify 'sheet_name$A1:C7'n . Use the SHEET= option only when you want to access the whole sheet.
| Restriction: | Supported only for the IMPORT procedure. |
specifies whether to scan the column to determine the length of the text data for each data source column. This option applies only to character data type columns.
| YES |
scans the length of text data for a data source column and uses the length of the longest string of data that it finds as the SAS variable width. If the maximum length that it finds is greater than the value in the TEXTSIZE= option, the smaller value in TEXTSIZE= option is applied as the SAS variable width. |
| NO |
does not scan the length of text data for a data source column. The column length returned from the Microsoft Ace or Microsoft Jet engine is used as the SAS variable width. If the returned column width is greater than what is specified in the TEXTSIZE= option, the smaller value specified in TEXTSIZE= option is applied as the SAS variable width. Specify SCANTEXT=NO when you need to update data in a Microsoft Excel workbook. |
| Alias: | SCAN_TEXT|SCANMEMO |
| Note: | So that the SCANTEXT=YES option works correctly, it is strongly recommended that you change TypeGuessingRows to 0 in the Microsoft Windows registry. This ensures that all rows in the specified range are scanned. |
To correct truncated text data for DBCS users (including Chinese, Japanese, and Korean), you can set the environment variable DBE_DBCS to YES. This doubles the scanned text data length and therefore corrects the problem of truncated text data. To set the DBE_DBCS environment variable, submit this statement: Options set=DBE_DBCS YES; .
The Microsoft Ace or Microsoft Jet engine handles the SCANTEXT= option.
These registry settings could affect the behavior of the SCANTEXT= option. These are located in the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel key of the Microsoft Windows registry.
specifies whether to scan the date/time data while importing data from a date/time column from the Microsoft Excel workbook.
| YES |
scans the date/time column and assigns the TIME. format for a date/time column only if time values are found in the column. |
| NO |
specifies not to scan the date/time column.
|
specifies the PC Files Server name. Excel supports spreadsheet names up to 31 characters. SAS supports data set names up to 32 characters. You must bring up the listener on the PC Files Server before you can establish a connection. You can configure the service name, port number, maximum number of concurrent connections allowed, and data encryption on your PC Files Server.
| Alias: | SERVER_NAME. |
| Note: | Required statement. |
| Restriction: | Available only for client server model. |
| See: | PC Files Server Administration. |
specifies the service name that is defined on your service file for your client and server machines. This port number or service name is displayed on the PC Files Server control panel screen when it is started on the PC. The service name needs to be defined on your UNIX machine and your PC Files Server.
| Alias: | SERVER_NAME |
| Restriction: | SERVICE= and PORT= options should not be used in the same procedure. |
| Restriction: | Available only for client server model. |
identifies a particular spreadsheet in an Excel workbook. If the sheet-name contains a special character such as space in the the EXPORT procedure, SAS converts it to an underscore.
For example, a sheet name "Employee Information" becomes ''Employee_Information".
To ensure that the IMPORT procedure reads the data that you want, identify the spreadsheet by specifying the RANGE= option or the SHEET= option. If the sheet name contains special characters, you can use the RANGE= the RANGE= instead of the SHEET= option to import data.
If you do not specify this statement, the EXPORT procedure writes the spreadsheet name as the SAS data set name.
If your sheet name contains special characters, use PROC DATASETS to list the available data set names. If the sheet name shown contains single quotes, keep the single quotes as part of the sheet name in order to be able to access the sheet.
For example, specify SHEET="'My#Test'";
| Note: | it is recommended that you use the RANGE= option without the SHEET option in the IMPORT procedure. Use the SHEET= option only when you want to import the entire spreadsheet. |
| Note: | If both the range name and the sheet name are missing, the IMPORT procedure reads the first spreadsheet in the workbook. If you do not specify this statement, the IMPORT procedure reads the first spreadsheet that was physically saved in the Excel file. |
| Restriction: | use the SHEET option only when you want to import the entire spreadsheet. |
| Restriction: | Avoid sheet-names that look like cell references, which have <1-3 characters> plus 1 or more digits. For example, A1, IV65536, TRY123, XFD1048576. |
enables the server administrator to allow Integrated Windows Authentication. This is a mechanism for the Windows client and server to exchange credentials.
| Default: | No |
| Note: | SSPI can also be enabled by specifying the -SSPI option on the SAS command line. |
| Restriction: | Valid only when used on a 64-bit Windows PC. |
specifies the SAS maximum variable length that is allowed while importing data from Microsoft Excel text columns. Any text data in Excel whose length exceeds this value is truncated when it is imported into SAS.
| Alias: | DBMAX_TEXT |
specifies whether to assign a DATE or a DATETIME format while importing a date/time column from a Microsoft Excel workbook.
| YES |
assigns the DATE format for the corresponding date/time column in the Microsoft Excel table. |
| NO |
assigns the DATETIME format for the corresponding date/time column in the Microsoft Excel table. |
| See: | The SCANTIME= statement, to assign the appropriate TIME format |
| See also: | Processing Date and Time Values between SAS and Microsoft Excel. |
specifies the version of the file that you want to create.
| Default: | the default value is 97 for .xls files |
| Note: | Always enclose the version in single quotes. |
| Restriction: | If the file exists on the PC Files Server, this statement is ignored. |
| Restriction: | Available only for client server model. |
| Valid Values: | 2007, 2003, 2002, 2000, 97, 95, and 5 |
This example exports a SAS data set called SDF.ORDERS to an Excel 2007 .xlsb file with the Orders sheet name. In this case, SHEET supports only an .xlsb file.
LIBNAME SDF V9 "&sasdir"; PROC EXPORT DATA=SDF.ORDERS FILE="c:\temp\demo.xlsb" DBMS=EXCEL REPLACE; SHEET="Orders"; RUN;
This example imports a range named INVOICE from an Excel workbook file and performs these tasks:
uses the first row of data as column names
scans length for character type columns
assigns DATETIME. format for date/time columns
leaves SAS labels blank
limits the size of text fields to be less than or equal to 512 characters
PROC IMPORT OUT=WORK.INVOICE FILE="&demodir.demo.xlsb" DBMS=EXCEL REPLACE; RANGE="INVOICE"; GETNAMES=YES; SCANTEXT=YES; USEDATE=NO; SCANTIME=YES; DBSASLABEL=NONE; TEXTSIZE=512; /* default is 1024 */ RUN;
The code in this example imports a range named Orders from an Excel file on a PC Files Server. It assigns the DATETIME. format for date/time columns.
PROC IMPORT OUT=WORK.ORDERS DATAFILE="&pcfdir.demo.xls" DBMS=EXCELCS REPLACE; RANGE="Orders"; SERVER="&server"; PORT=8621; USEDATE=NO; SCANTIME=YES; RUN;
This code exports a SAS data set named SDF.CUSTOMER to an Excel workbook file, with Customer sheet name on the PC Files Server.
LIBNAME SDF V9 "&sasdir"; PROC EXPORT DATA=SDF.CUSTOMER FILE="c:\temp\demo.xls" DBMS=EXCELCS REPLACE; SHEET="Customer"; SERVER="&server"; SERVICE=SASPCFILE; RUN;
proc import out=work.test datafile='C:\myFile.xlsb' dbms=excelcs replace; server=filesrv; port=8621; sspi=yes; run;
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.