The UNIVARIATE Procedure

Example 4.3 Identifying Extreme Observations and Extreme Values

This example, which uses the data set BPressure introduced in Example 4.1, illustrates how to produce a table of the extreme observations and a table of the extreme values in a data set. The following statements generate the "Extreme Observations" tables for Systolic and Diastolic, which enable you to identify the extreme observations for each variable:

title 'Extreme Blood Pressure Observations';
ods select ExtremeObs;
proc univariate data=BPressure;
   var Systolic Diastolic;
   id PatientID;
run;

The ODS SELECT statement restricts the output to the "ExtremeObs" table; see the section ODS Table Names. The ID statement requests that each extreme observation be identified by its value of PatientID as well as by its observation number. By default, the five lowest and five highest observations are displayed. You can use the NEXTROBS= option in the PROC UNIVARIATE statement to request a different number of extreme observations.
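As a brief sketch of the NEXTROBS= option, the following statements request the three lowest and three highest observations instead of the default five. The value 3 and the title text are illustrative choices and are not part of the original example:

```sas
title 'Three Most Extreme Blood Pressure Observations';
ods select ExtremeObs;
/* NEXTROBS=3 is an illustrative value; the default is 5 */
proc univariate data=BPressure nextrobs=3;
   var Systolic Diastolic;
   id PatientID;
run;
```

The remainder of this example uses the default of five observations.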

Output 4.3.1 shows that the patient identified as 'CP' (Observation 7) has the highest values for both Systolic and Diastolic. To visualize extreme observations, you can create histograms; see Example 4.14.

Output 4.3.1 Blood Pressure Extreme Observations
Extreme Blood Pressure Observations

The UNIVARIATE Procedure
Variable: Systolic

Extreme Observations

Lowest                     Highest
Value  PatientID  Obs      Value  PatientID  Obs
96     SS         2        130    JW         14
100    FR         3        133    RW         11
108    KD         12       134    JW         16
110    DS         13       140    BL         5
110    JI         8        165    CP         7

Extreme Blood Pressure Observations

The UNIVARIATE Procedure
Variable: Diastolic

Extreme Observations

Lowest                     Highest
Value  PatientID  Obs      Value  PatientID  Obs
40     JI         8        80     JW         14
50     DS         13       80     JW         16
50     CK         1        82     HH         22
54     KD         12       90     BL         5
60     RW         11       110    CP         7

The following statements generate the "Extreme Values" tables for Systolic and Diastolic, which tabulate the tails of the distributions:

title 'Extreme Blood Pressure Values';
ods select ExtremeValues;
proc univariate data=BPressure nextrval=5;
   var Systolic Diastolic;
run;

The ODS SELECT statement restricts the output to the "ExtremeValues" table; see the section ODS Table Names. The NEXTRVAL= option specifies the number of extreme values at each end of the distribution to be shown in the tables in Output 4.3.2.

Output 4.3.2 shows that the values 78 and 80 each occurred twice for Diastolic and that the maximum of Diastolic is 110. Output 4.3.1 displays the value 80 twice for Diastolic because two observations have that value. Output 4.3.2 displays the value 80 only once, with a frequency of 2.

Output 4.3.2 Blood Pressure Extreme Values
Extreme Blood Pressure Values

The UNIVARIATE Procedure
Variable: Systolic

Extreme Values

Lowest                  Highest
Order  Value  Freq      Order  Value  Freq
1      96     1         11     130    1
2      100    1         12     133    1
3      108    1         13     134    1
4      110    2         14     140    1
5      112    1         15     165    1

Extreme Blood Pressure Values

The UNIVARIATE Procedure
Variable: Diastolic

Extreme Values

Lowest                  Highest
Order  Value  Freq      Order  Value  Freq
1      40     1         11     78     2
2      50     2         12     80     2
3      54     1         13     82     1
4      60     2         14     90     1
5      62     1         15     110    1
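
To compare the two kinds of table directly for a single variable, you could request both in one PROC UNIVARIATE step. The following statements are a sketch under that assumption and are not part of the sample program; the title text and the choice of Diastolic are illustrative:

```sas
title 'Diastolic Extremes: Observations and Values';
/* Select both tables by their ODS table names */
ods select ExtremeObs ExtremeValues;
/* NEXTRVAL=5 requests the "Extreme Values" table; */
/* "Extreme Observations" uses the default of five */
proc univariate data=BPressure nextrval=5;
   var Diastolic;
   id PatientID;
run;
```

In the resulting output, a repeated value such as 80 would appear once per observation in the "Extreme Observations" table and once, with its frequency, in the "Extreme Values" table.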

A sample program for this example, uniex01.sas, is available in the SAS Sample Library for Base SAS software.