| Importing XML Documents |
This example imports the following XML document, which was created from a Microsoft Access database. Because the XML document contains an embedded XML schema, you must specify the MSACCESS format type rather than the default GENERIC format type. MSACCESS obtains a variable's attributes from the embedded schema.
<?xml version="1.0" encoding="UTF-8"?> <root xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" xmlns:od="urn:schemas-microsoft-com:officedata"> <xsd:schema> <xsd:element name="dataroot"> <xsd:complexType> <xsd:choice maxOccurs="unbounded"> <xsd:element ref="Suppliers"/> </xsd:choice> </xsd:complexType> </xsd:element> <xsd:element name="Suppliers"> <xsd:annotation> <xsd:appinfo> <od:index index-name="PrimaryKey" index-key="SupplierID " primary="yes" unique="yes" clustered="no"/> <od:index index-name="CompanyName" index-key="CompanyName " primary="no" unique="no" clustered="no"/> <od:index index-name="PostalCode" index-key="PostalCode " primary="no" unique="no" clustered="no"/> </xsd:appinfo> </xsd:annotation> <xsd:complexType> <xsd:sequence> <xsd:element name="SupplierID" od:jetType="autonumber" od:sqlSType="int" od:autoUnique="yes" od:nonNullable="yes"> <xsd:simpleType> <xsd:restriction base="xsd:integer"/> </xsd:simpleType> </xsd:element> <xsd:element name="CompanyName" minOccurs="0" od:jetType="text" od:sqlSType="nvarchar"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:maxLength value="40"/> </xsd:restriction> </xsd:simpleType> </xsd:element> <xsd:element name="ContactName" minOccurs="0" od:jetType="text" od:sqlSType="nvarchar"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:maxLength value="30"/> </xsd:restriction> </xsd:simpleType> </xsd:element> . . . </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> <dataroot xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"> <Suppliers> <SupplierID>1</SupplierID> <CompanyName>Exotic Flowers</CompanyName> <ContactName>Charlotte Smith</ContactName> <ContactTitle>Purchasing Manager</ContactTitle> <Address>49 Franklin St.</Address> <City>London</City> <PostalCode>EC1 4SD</PostalCode> <Country>UK</Country> <Phone>(272) 444-2222</Phone> </Suppliers> <Suppliers> <SupplierID>2</SupplierID> <CompanyName>New Orleans Cajun Foods</CompanyName> <ContactName>Shelley Martin</ContactName> <ContactTitle>Order Administrator</ContactTitle> <Address>P.O. Box 78934</Address> <City>New Orleans</City> <Region>LA</Region> <PostalCode>70117</PostalCode> <Country>USA</Country> <Phone>(512) 284-3677</Phone> <HomePage>#MYCAJUN.HTM#</HomePage> </Suppliers> . . . </dataroot> </root>
The following SAS program interprets the XML document as a SAS data set:
libname access xml '/u/myid/myfiles/suppliers.xml' xmltype=msaccessxmlmeta=schemadata;
proc print data=access.suppliers (obs=2);
var companyname contactname; run;
The LIBNAME statement assigns the libref ACCESS to the physical location of the XML document (complete pathname, filename, and file extension), and specifies the XML engine. By default, the XML engine expects GENERIC format, so you must include the XMLTYPE= option in order to read the XML document in MSACCESS format and to obtain a variable's attributes from the embedded schema. The option XMLMETA=SCHEMADATA specifies to import both data and metadata-related information from the input XML document.
The PRINT procedure produces the output. The procedures uses the OBS= data set option to print only the first two observations and the VAR statement to print only specific variables (columns).
PROC PRINT Output for ACCESS.SUPPLIERS
The SAS System 1 Obs COMPANYNAME CONTACTNAME 1 Exotic Flowers Charlotte Smith 2 New Orleans Cajun Foods Shelley Martin
Using PROC CONTENTS, the output displays the file's attributes as well as the attributes of each interpreted column (variable), such as the variable's type and length, which are obtained from the embedded XML schema. Without the embedded XML schema, the results for the attributes would be default values.
proc contents data=access.suppliers; run;
PROC CONTENTS Output for ACCESS.SUPPLIERS
The SAS System 2
The CONTENTS Procedure
Data Set Name ACCESS.SUPPLIERS Observations .
Member Type DATA Variables 12
Engine XML Indexes 0
Created . Observation Length 0
Last Modified . Deleted Observations 0
Protection Compressed NO
Data Set Type Sorted NO
Label
Data Representation Default
Encoding Default
Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat Label
5 Address Char 60 $60. $60. Address
6 City Char 15 $15. $15. City
2 CompanyName Char 40 $40. $40. CompanyName
3 ContactName Char 30 $30. $30. ContactName
4 ContactTitle Char 30 $30. $30. ContactTitle
9 Country Char 15 $15. $15. Country
11 Fax Char 24 $24. $24. Fax
12 HomePage Char 256 $256. $256. HomePage
10 Phone Char 24 $24. $24. Phone
8 PostalCode Char 10 $10. $10. PostalCode
7 Region Char 15 $15. $15. Region
1 SupplierID Num 8 F8. F8. SupplierID
Copyright © 2007 by SAS Institute Inc., Cary, NC, USA. All rights reserved.