Sample 31582: Creating measures for the course over time.
For data mining and statistics it is very important to derive measures on the analysis subject level (e.g. customer) that describe the course over time of transactional data.
Note:
- PROC REG ist used to calculate the linear trend over time. This is a very efficient method even on large datasets (millions of observations)
- Do not forget to specify the NOPRINT option in PROC REG
- You can specify time horizons for the calcuation with a WHERE-statement
- For segmentation and prediction purposes it is very usefull to create a Group Indicatior variable
This sample was authored by Gerhard Svolba. His book, Data Preparation for Analytics is available from the SAS Publishing online bookstore.
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
*** Step 1 - Create Sample Data;
data wide;
input CustID M1 M2 M3 M4 M5 M6 8.;
Cards;
1 52 54 58 47 38 22
2 22 24 30 28 31 30
3 100 120 110 115 100 95
4 43 43 43 . 42 41
5 20 29 35 39 28 44
6 16 24 18 25 30 24
7 80 70 60 50 60 70
8 90 95 80 100 100 90
9 47 47 47 47 47 47
10 50 52 0 50 0 52
;
run;
PROC TRANSPOSE DATA = WIDE OUT = LONG;
BY custId;
RUN;
DATA LONG;
SET LONG;
FORMAT Month 8.;
RENAME col1 = Usage;
Month = compress(_name_,'M');
DROP _name_;
RUN;
*** Step 2 - Create Measures for Course over time with Proc Reg;
*** a) Long-Term Trend - all six months;
PROC REG DATA = long NOPRINT
OUTEST=Est_LongTerm(KEEP = CustID month
RENAME = (month=LongTerm));
MODEL usage = month;
BY CustID;
RUN;
*** b) Short-Term Trend - only last3 months;
PROC REG DATA = long NOPRINT
OUTEST=Est_ShortTerm(KEEP = CustID month
RENAME = (month=ShortTerm));
MODEL usage = month;
BY CustID;
WHERE month in (4 5 6);
RUN;
DATA mart;
MERGE wide
est_longTerm
est_shortTerm;
BY CustID;
Format ShortTerm LongTerm 8.1;
RUN;
PROC FORMAT;
VALUE est LOW -< -1 = '-'
-1 - 1 = '='
1 <- HIGH = '+';
RUN;
DATA mart;
SET mart;
Format LongShortInd $2.;
LongShortInd = CAT(put(LongTerm,est.),put(ShortTerm,est.));
RUN;
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
Cust Short Long
ID M1 M2 M3 M4 M5 M6 LongTerm Term ShortInd
1 52 54 58 47 38 22 -6.0 -12.5 --
2 22 24 30 28 31 30 1.7 1.0 +=
3 100 120 110 115 100 95 -2.3 -10.0 --
4 43 43 43 . 42 41 -0.4 -1.0 ==
5 20 29 35 39 28 44 3.5 2.5 ++
6 16 24 18 25 30 24 1.9 -0.5 +=
7 80 70 60 50 60 70 -2.6 10.0 -+
8 90 95 80 100 100 90 1.0 -5.0 =-
9 47 47 47 47 47 47 0.0 0.0 ==
10 50 52 0 50 0 52 -2.7 1.0 -=
This tip shows how measures that describe the course over time of a certain variable per analysis subject using Proc Reg and SAS Datasteps.
Date Modified: | 2008-04-24 10:26:19 |
Date Created: | 2008-03-25 13:43:21 |
Operating System and Release Information
SAS System | Base SAS | z/OS | | |
OpenVMS VAX | | |
Microsoft® Windows® for 64-Bit Itanium-based Systems | | |
Microsoft Windows Server 2003 Datacenter 64-bit Edition | | |
Microsoft Windows Server 2003 Enterprise 64-bit Edition | | |
Microsoft Windows XP 64-bit Edition | | |
Microsoft® Windows® for x64 | | |
OS/2 | | |
Microsoft Windows 95/98 | | |
Microsoft Windows 2000 Advanced Server | | |
Microsoft Windows 2000 Datacenter Server | | |
Microsoft Windows 2000 Server | | |
Microsoft Windows 2000 Professional | | |
Microsoft Windows NT Workstation | | |
Microsoft Windows Server 2003 Datacenter Edition | | |
Microsoft Windows Server 2003 Enterprise Edition | | |
Microsoft Windows Server 2003 Standard Edition | | |
Microsoft Windows XP Professional | | |
Windows Millennium Edition (Me) | | |
Windows Vista | | |
64-bit Enabled AIX | | |
64-bit Enabled HP-UX | | |
64-bit Enabled Solaris | | |
ABI+ for Intel Architecture | | |
AIX | | |
HP-UX | | |
HP-UX IPF | | |
IRIX | | |
Linux | | |
Linux for x64 | | |
Linux on Itanium | | |
OpenVMS Alpha | | |
OpenVMS on HP Integrity | | |
Solaris | | |
Solaris for x64 | | |
Tru64 UNIX | | |
SAS System | SAS Enterprise Miner | z/OS | | |
Microsoft® Windows® for 64-Bit Itanium-based Systems | | |
Microsoft Windows Server 2003 Datacenter 64-bit Edition | | |
Microsoft Windows Server 2003 Enterprise 64-bit Edition | | |
Microsoft Windows XP 64-bit Edition | | |
Microsoft® Windows® for x64 | | |
Microsoft Windows 95/98 | | |
Microsoft Windows 2000 Advanced Server | | |
Microsoft Windows 2000 Datacenter Server | | |
Microsoft Windows 2000 Server | | |
Microsoft Windows 2000 Professional | | |
Microsoft Windows NT Workstation | | |
Microsoft Windows Server 2003 Datacenter Edition | | |
Microsoft Windows Server 2003 Enterprise Edition | | |
Microsoft Windows Server 2003 Standard Edition | | |
Microsoft Windows XP Professional | | |
Windows Millennium Edition (Me) | | |
Windows Vista | | |
64-bit Enabled AIX | | |
64-bit Enabled HP-UX | | |
64-bit Enabled Solaris | | |
ABI+ for Intel Architecture | | |
AIX | | |
HP-UX | | |
HP-UX IPF | | |
Linux | | |
Linux for x64 | | |
Linux on Itanium | | |
Solaris | | |
Solaris for x64 | | |
Tru64 UNIX | | |