SAS Data Files |
Definition of Integrity Constraints |
Integrity constraints are a set of data validation rules that you can specify in order to restrict the data values that can be stored for a variable in a SAS data file. Integrity constraints help you preserve the validity and consistency of your data. SAS enforces the integrity constraints when the values associated with a variable are added, updated, or deleted.
There are two categories of integrity constraints: general and referential.
General integrity constraints enable you to restrict the values of variables within a single file. There are four types of general constraints:
A referential integrity constraint is created when a primary key integrity constraint in one data file is referenced by a foreign key integrity constraint in another data file.
The foreign key constraint links the data values of one or more variables in the foreign key data file, to corresponding variables and values in the primary key data file. Data values in the foreign key data file must have a matching value in the primary key data file, or they must be null. When data is updated or deleted in the primary key data file, the modifications are controlled by a referential action that is defined as part of the foreign key constraint.
Separate referential actions can be defined for the update and delete operations. There are three types of referential actions:
The requirements for establishing a referential relationship are as follows:
The primary key and foreign key must reference the same number of variables, and the variables must be in the same order.
The variables must be of the same type (character or numeric) and length.
If the foreign key is being added to a data file that already contains data, the data values in the foreign key data file must either match existing values in the primary key data file, or the values must be null.
The foreign key data file can exist in the same SAS library as the referenced primary key data file (intra-libref), or in a different SAS library (inter-libref). However, if the library that contains the foreign key data file is temporary, the library that contains the primary key data file must be temporary as well. In addition, referential integrity constraints cannot be assigned to data files in concatenated libraries.
There is no limit to the number of foreign keys that can reference a primary key. However, additional foreign keys can adversely impact the performance of update and delete operations.
When a referential constraint exists, a primary key integrity constraint is not deleted until all of the foreign keys that reference it have been deleted. There are no restrictions on deleting foreign keys.
Variables in a SAS data file can be part of both a primary key (general integrity constraint) and a foreign key (referential integrity constraint). However, there are restrictions when you define a primary key and a foreign key constraint that use the same variables:
The foreign key's update and delete referential actions must both be RESTRICT.
When the same variables are used in a primary key and foreign key definition, the variables must be defined in a different order.
For an example, see Defining Overlapping Primary Key and Foreign Key Constraints.
Preservation of Integrity Constraints |
These procedures preserve integrity constraints when their operation results in a copy of the original data file:
in Base SAS software, the APPEND, COPY, CPORT, CIMPORT, MIGRATE, and SORT procedures
PROC APPEND
for an existing BASE= data file, integrity constraints in the BASE= file are preserved, but integrity constraints in the DATA= file that is being appended to the BASE= file are not preserved.
for a non-existent BASE= data file, general integrity constraints in the DATA= file that is being appended to the new BASE= file are preserved. Referential constraints in the DATA= file are not preserved.
PROC SORT, and PROC UPLOAD, and PROC DOWNLOAD, when an OUT= data file is not specified
the SAS Explorer window.
You can also use the CONSTRAINT option to control whether integrity constraints are preserved for the COPY, CPORT, CIMPORT, UPLOAD, and DOWNLOAD procedures.
General integrity constraints are preserved in an active state. The state in which referential constraints are preserved depends on whether the procedure causes the primary key and foreign key data files to be written to the same or different SAS libraries (intra-libref versus inter-libref integrity constraints). Intra-libref constraints are preserved in an active state. Inter-libref constraints are preserved in an inactive state; that is, the primary key portion of the integrity constraint is enforced as a general integrity constraint but the foreign key portion is inactive. You must use the DATASETS procedure statement IC REACTIVATE to reactivate the inactive foreign keys.
The following table summarizes the circumstances under which integrity constraints are preserved.
Procedure | Condition | Constraints That Are Preserved |
---|---|---|
APPEND | DATA= data set does not exist |
General constraints
Referential constraints are not affected |
COPY | CONSTRAINT=yes |
General constraints
Intra-libref constraints are referential in an active state Inter-libref constraints are referential in an inactive state |
CPORT/CIMPORT | CONSTRAINT=yes |
General constraints
Intra-libref constraints are referential in an active state Inter-libref constraints are referential in an inactive state |
SORT | OUT= data set is not specified |
General constraints
Referential constraints are not affected |
UPLOAD/DOWNLOAD | CONSTRAINT=yes and OUT= data set is not specified |
General constraints
Intra-libref constraints are referential in an active state Inter-libref constraints are referential in an inactive state |
SAS Explorer window |
|
General constraints |
Indexes and Integrity Constraints |
The unique, primary key, and foreign key integrity constraints store data values in an index file. If an index file already exists, it is used; otherwise, one is created. Consider the following points when you create or delete an integrity constraint:
When a user-defined index exists, the index's attributes must be compatible with the integrity constraint in order for the integrity constraint to be created. For example, when you add a primary key integrity constraint, the existing index must have the UNIQUE attribute. When you add a foreign key integrity constraint, the index must not have the UNIQUE attribute.
The unique integrity constraint has the same effect as the UNIQUE index attribute; therefore, when one is used, the other is not necessary.
The NOMISS index attribute and the not-null integrity constraint have different effects. The integrity constraint prevents missing values from being written to the SAS data file and cannot be added to an existing data file that contains missing values. The index attribute allows missing data values in the data file but excludes them from the index.
When any index is created, it is marked as being "owned" by the user, the integrity constraint, or both. A user cannot delete an index that is also owned by an integrity constraint and vice versa. If an index is owned by both, the index is deleted only after both the integrity constraint and the user have requested the index's deletion. A note in the log indicates when an index cannot be deleted.
Locking Integrity Constraints |
Integrity constraints support both member-level and record-level locking. You can override the default locking level with the CNTLLEV= data set option. For more information, see the CNTLLEV= data set option in SAS Language Reference: Dictionary.
Specifying Integrity Constraints |
You can create integrity constraints in the SQL procedure, the DATASETS procedure, or in SCL (SAS Component Language). The constraints can be specified when the data file is created or can be added to an existing data file. When you add integrity constraints to an existing file, SAS verifies that the existing data values conform to the constraints that are being added.
When you specify integrity constraints, you must specify a separate statement for each constraint. In addition, you must specify a separate statement for each variable to which you want to assign the not-null integrity constraint. When multiple variables are included in the specification for a primary key, foreign key, or a unique integrity constraint, a composite index is created and the integrity constraint enforces the combination of variable values. The relationship between SAS indexes and integrity constraints is described in Indexes and Integrity Constraints. For more information, see Understanding SAS Indexes.
When you add an integrity constraint in SCL, open the data set in utility mode. See Creating Integrity Constraints by Using SCL for an example. Integrity constraints must be deleted in utility open mode. For detailed syntax information, see SAS Component Language: Reference.
When generation data sets are used, you must create the integrity constraints in each data set generation that includes protected variables.
If you add a CHECK constraint to an existing SAS data set or create a SAS data set that includes a CHECK constraint, the data set cannot be accessed by a release before SAS 9.2.
Specifying Physical Location for Inter-Libref Referential Integrity Constraints When Sharing Disk Space |
When you share disk space over a network and access referential integrity constraints in which the foreign key and primary key data files are in different SAS libraries, a standard should be established for the physical location of the shared files. A standard is required when you create the shared files so that network machines use the same physical name in order to access the files. If the physical names do not match, SAS cannot open the referenced foreign key or primary key SAS data file.
For example, a standard might be established that all shared files are placed on disk T: so that network machines use the same pathname in order to access the files.
Here is an example of a problem regarding files that were created without a standard. Suppose a primary key and a foreign key SAS data file were created on machine D4064 in different directories C:\Public\pkey_directory and C:\Public\fkey_directory. The pathnames are stored in the descriptor information of the SAS data files.
To access the primary key data file from a different machine such as F2760, the following LIBNAME statement would be executed:
libname pkds '\\D4064\Public\pkey_directory';
When the primary key data file is opened for update processing, SAS automatically tries to open the foreign key data file by using the foreign key data file's physical name that is stored in the primary key data file, which is C:\Public\fkey_directory. However, that directory does not exist on machine F2760, and so opening the foreign key data file fails.
Listing Integrity Constraints |
PROC CONTENTS and PROC DATASETS report integrity constraint information without special options. In addition, you can print information about integrity constraints and indexes to a data set by using the OUT2= option. In PROC SQL, the DESCRIBE TABLE and DESCRIBE TABLE CONSTRAINTS statements report integrity constraint characteristics as part of the data file definition or alone, respectively. SCL provides the ICTYPE, ICVALUE, and ICDESCRIBE functions for getting information about integrity constraints. Refer to Base SAS Procedures Guide and SAS Component Language: Reference for more information.
Rejected Observations |
You can customize the error message that is associated with an integrity constraint when you create the constraint by using the MESSAGE= and MSGTYPE= options. The MESSAGE= option enables you to prepend a user-defined message to the SAS error message associated with an integrity constraint. The MSGTYPE= option enables you to suppress the SAS portion of the message. For more information, see the PROC DATASETS, PROC SQL, and SCL documentation.
Rejected observations can be collected in a special file by using an audit trail.
Integrity Constraints and CEDA Processing |
When a SAS data file requires processing with CEDA, integrity constraints are not supported. For example, if you transfer a SAS data file with defined integrity constraints from one operating environment such as Windows to a different operating environment such as UNIX, CEDA translates the file for you, but the integrity constraints are not available. For information about CEDA processing, see Processing Data Using Cross-Environment Data Access (CEDA).
The MIGRATE procedure preserves integrity constraints when migrating data files. For more information, see the MIGRATE Procedure in Base SAS Procedures Guide. The CPORT and CIMPORT procedures preserve integrity constraints when transporting SAS data files from one operating environment to another operating environment. The CPORT procedure makes a copy of the data file in a transportable format, and the CIMPORT procedure reads the transport file and creates a new host-specific copy of the data file. For more information, see the CPORT Procedure and CIMPORT Procedure in Base SAS Procedures Guide.
Examples |
The following sample code creates integrity constraints by means of the DATASETS procedure. The data file TV_SURVEY checks the percentage of viewing time spent on networks, PBS, and other channels, with the following integrity constraints:
data tv_survey(label='Validity checking'); length idnum age 4 gender $1; input idnum gender age network pbs other; datalines; 1 M 55 80 . 20 2 F 36 50 40 10 3 M 42 20 5 75 4 F 18 30 0 70 5 F 84 0 100 0 ; proc datasets nolist; modify tv_survey; ic create val_gender = check(where=(gender in ('M','F'))) message = "Valid values for variable GENDER are either 'M' or 'F'."; ic create val_age = check(where=(age >= 18 and age <= 120)) message = "An invalid AGE has been provided."; ic create val_new = check(where=(network <= 100)); ic create val_pbs = check(where=(pbs <= 100)); ic create val_ot = check(where=(other <= 100)); ic create val_max = check(where=((network+pbs+other)<= 100)); quit;
The following sample program creates integrity constraints by means of the SQL procedure. The data file PEOPLE lists employees and contains employment information. The data file SALARY contains salary and bonus information. The integrity constraints are as follows:
proc sql; create table people ( name char(14), gender char(6), hired num, jobtype char(1) not null, status char(10), constraint prim_key primary key(name), constraint gender check(gender in ('male' 'female')), constraint status check(status in ('permanent' 'temporary' 'terminated')) ); create table salary ( name char(14), salary num not null, bonus num, constraint for_key foreign key(name) references people on delete restrict on update set null ); quit;
To add integrity constraints to a data file by using SCL, you must create and build an SCL catalog entry. The following sample program creates and compiles catalog entry EXAMPLE.IC_CAT.ALLICS.SCL.
INIT: put "Test SCL integrity constraint functions start."; return; MAIN: put "Opening WORK.ONE in utility mode."; dsid = open('work.one', 'V');/* Utility mode.*/ if (dsid = 0) then do; _msg_=sysmsg(); put _msg_=; end; else do; if (dsid > 0) then put "Successfully opened WORK.ONE in" "UTILITY mode."; end; put "Create a check integrity constraint named teen."; rc = iccreate(dsid, 'teen', 'check', '(age > 12) && (age < 20)'); if (rc > 0) then do; put rc=; _msg_=sysmsg(); put _msg_=; end; else do; put "Successfully created a check" "integrity constraint."; end; put "Create a not-null integrity constraint named nn."; rc = iccreate(dsid, 'nn', 'not-null', 'age'); if (rc > 0) then do; put rc=; _msg_=sysmsg(); put _msg_=; end; else do; put "Successfully created a not-null" "integrity constraint."; end; put "Create a unique integrity constraint named uq."; rc = iccreate(dsid, 'uq', 'unique', 'age'); if (rc > 0) then do; put rc=; _msg_=sysmsg(); put _msg_=; end; else do; put "Successfully created a unique" "integrity constraint."; end; put "Create a primary key integrity constraint named pk."; rc = iccreate(dsid, 'pk', 'Primary', 'name'); if (rc > 0) then do; put rc=; _msg_=sysmsg(); put _msg_=; end; else do; put "Successfully created a primary key" "integrity constraint."; end; put "Closing WORK.ONE."; rc = close(dsid); if (rc > 0) then do; put rc=; _msg_=sysmsg(); put _msg_=; end; put "Opening WORK.TWO in utility mode."; dsid2 = open('work.two', 'V'); /*Utility mode */ if (dsid2 = 0) then do; _msg_=sysmsg(); put _msg_=; end; else do; if (dsid2 > 0) then put "Successfully opened WORK.TWO in" "UTILITY mode."; end; put "Create a foreign key integrity constraint named fk."; rc = iccreate(dsid2, 'fk', 'foreign', 'name', 'work.one','null', 'restrict'); if (rc > 0) then do; put rc=; _msg_=sysmsg(); put _msg_=; end; else do; put "Successfully created a foreign key" "integrity constraint."; end; put "Closing WORK.TWO."; rc = close(dsid2); if (rc > 0) then do; put rc=; _msg_=sysmsg(); put _msg_=; end; return; TERM: put "End of test SCL integrity constraint" "functions."; return;
The previous code creates the SCL catalog entry. The following code creates two data files, ONE and TWO, and executes the SCL entry EXAMPLE.IC_CAT.ALLICS.SCL:
/* Submit to create data files. */ data one two; input name $ age; datalines; Morris 13 Elaine 14 Tina 15 ; /* after compiling, run the SCL program */ proc display catalog= example.ic_cat.allics.scl; run;
The following sample program segments remove integrity constraints. In the code that deletes a primary key integrity constraint, note that the foreign key integrity constraint is deleted first.
This program segment deletes integrity constraints using PROC SQL.
proc sql; alter table salary DROP CONSTRAINT for_key; alter table people DROP CONSTRAINT gender DROP CONSTRAINT _nm0001_ DROP CONSTRAINT status DROP CONSTRAINT prim_key ; quit;
This program segment removes integrity constraints using PROC DATASETS.
proc datasets nolist; modify tv_survey; ic delete val_max; ic delete val_gender; ic delete val_age; run; quit;
This program segment removes integrity constraints using SCL.
TERM: put "Opening WORK.TWO in utility mode."; dsid2 = open( 'work.two' , 'V' ); /* Utility mode. */ if (dsid2 = 0) then do; _msg_=sysmsg(); put _msg_=; end; else do; if (dsid2 > 0) then put "Successfully opened WORK.TWO in Utility mode."; end; rc = icdelete(dsid2, 'fk'); if (rc > 0) then do; put rc=; _msg_=sysmsg(); end; else do; put "Successfully deleted a foreign key integrity constraint."; end; rc = close(dsid2); return;
The following program segment reactivates a foreign key integrity constraint that has been inactivated as a result of a COPY, CPORT, CIMPORT, UPLOAD, or DOWNLOAD procedure.
proc datasets; modify SAS-data-set; ic reactivate fkname references libref; run; quit;
The following code illustrates defining overlapping primary key and foreign key constraints:
data Singers1; input FirstName $ LastName $ Age; datalines; Tom Jones 62 Kris Kristofferson 66 Willie Nelson 69 Barbra Streisand 60 Paul McCartney 60 Randy Travis 43 ; data Singers2; input FirstName $ LastName $ Style $; datalines; Tom Jones Rock Kris Kristofferson Country Willie Nelson Country Barbra Streisand Contemporary Paul McCartney Rock Randy Travis Country ; proc datasets library=work nolist; modify Singers1; ic create primary key (FirstName LastName); 1 run; modify Singers2; ic create foreign key (FirstName LastName) references Singers1 on delete restrict on update restrict; 2 run; modify Singers2; ic create primary key (LastName FirstName); 3 run; modify Singers1; ic create foreign key (LastName FirstName) references Singers2 on delete restrict on update restrict; 4 run; quit;
Defines a primary key constraint for data set Singers1, for variables FIRSTNAME and LASTNAME. | |
Defines a foreign key constraint for data set Singers2 for variables FIRSTNAME and LASTNAME that references the primary key defined in Step 1. Because the intention is to define a primary key using the same variables, the foreign key update and delete referential actions must both be RESTRICT. | |
Defines a primary key constraint for data set Singers2 for variables LASTNAME and FIRSTNAME. Because those exact same variables are already defined as a foreign key, the order must be different. | |
Defines a foreign key constraint for data set Singers1 for variables LASTNAME and FIRSTNAME that references the primary key defined in Step 3. Because those exact same variables are already defined as a primary key, the order must be different. Because a primary key is already defined using the same variables, the foreign key's update and delete referential actions must both be RESTRICT. |
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.