Sample Data Sets


Baseball Data

The Baseball data set contains performance measures and salary levels for regular hitters and leading substitute hitters in Major League Baseball for the year 1986 (Reichler, 1987). There is one observation per hitter.

The following list describes each variable.

name        player’s name
no_atbat    number of times at bat (in 1986)
no_hits     number of hits (in 1986)
no_home     number of home runs (in 1986)
no_runs     number of runs (in 1986)
no_rbi      number of runs batted in (in 1986)
no_bb       number of bases on balls (in 1986)
yr_major    years in the major leagues
cr_atbat    career at-bats
cr_hits     career hits
cr_home     career home runs
cr_runs     career runs
cr_rbi      career runs batted in
cr_bb       career bases on balls
league      player’s league at the end of 1986
division    player’s division at the end of 1986
team        player’s team at the end of 1986
position    positions played (in 1986)
no_outs     number of putouts (in 1986)
no_assts    number of assists (in 1986)
no_error    number of errors (in 1986)
salary      salary, in thousands of dollars (in 1986)
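When the data set is exported to another tool, it can be useful to confirm that every documented variable is present. The following is a minimal sketch in Python, not part of the original documentation. The variable names are taken from the list above; the helper `missing_variables` and the idea of a header row are hypothetical and assume the data has been exported with one column per variable.

```python
# Variable names transcribed from the list above, in the order described.
BASEBALL_VARIABLES = [
    "name", "no_atbat", "no_hits", "no_home", "no_runs", "no_rbi", "no_bb",
    "yr_major", "cr_atbat", "cr_hits", "cr_home", "cr_runs", "cr_rbi", "cr_bb",
    "league", "division", "team", "position",
    "no_outs", "no_assts", "no_error", "salary",
]


def missing_variables(header):
    """Return the documented variables that are absent from a header row.

    Hypothetical helper: trimming whitespace and lower-casing column names
    are assumptions about how an exported file might label its columns.
    """
    present = {column.strip().lower() for column in header}
    return [name for name in BASEBALL_VARIABLES if name not in present]


# Example: a header that lacks the salary column.
example_header = [name.upper() for name in BASEBALL_VARIABLES if name != "salary"]
print(missing_variables(example_header))  # ['salary']
```

An empty list means that all 22 documented variables were found.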

The position variable in the Baseball data set is encoded as follows:

Table A.1: Values for the Position Variable

13    First base and third base
1B    First base
1O    First base and outfield
23    Second base and third base
2B    Second base
2S    Second base and shortstop
32    Third base and second base
3B    Third base
3O    Third base and outfield
3S    Third base and shortstop
C     Catcher
CD    Center field and designated hitter
CF    Center field
CS    Center field and shortstop
DH    Designated hitter
DO    Designated hitter and outfield
LF    Left field
O1    Outfield and first base
OD    Outfield and designated hitter
OF    Outfield
OS    Outfield and shortstop
RF    Right field
S3    Shortstop and third base
SS    Shortstop
UT    Utility
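
The encoding in Table A.1 can be applied as a simple lookup when the position values need readable labels. The following is a minimal sketch in Python, not part of the original documentation. The codes and descriptions are transcribed from Table A.1; the function name `describe_position` and the trimming and upper-casing of input are assumptions about how the raw values might be stored.

```python
# Lookup table transcribed from Table A.1 (position code -> description).
# Note that "1O", "3O", "DO", and "O1" use the letter O, not the digit zero.
POSITION_LABELS = {
    "13": "First base and third base",
    "1B": "First base",
    "1O": "First base and outfield",
    "23": "Second base and third base",
    "2B": "Second base",
    "2S": "Second base and shortstop",
    "32": "Third base and second base",
    "3B": "Third base",
    "3O": "Third base and outfield",
    "3S": "Third base and shortstop",
    "C": "Catcher",
    "CD": "Center field and designated hitter",
    "CF": "Center field",
    "CS": "Center field and shortstop",
    "DH": "Designated hitter",
    "DO": "Designated hitter and outfield",
    "LF": "Left field",
    "O1": "Outfield and first base",
    "OD": "Outfield and designated hitter",
    "OF": "Outfield",
    "OS": "Outfield and shortstop",
    "RF": "Right field",
    "S3": "Shortstop and third base",
    "SS": "Shortstop",
    "UT": "Utility",
}


def describe_position(code):
    """Return the Table A.1 description for a position code.

    Assumption: stored values may carry padding or lower-case letters,
    so the code is trimmed and upper-cased before the lookup.
    """
    key = code.strip().upper()
    if key not in POSITION_LABELS:
        raise ValueError(f"unknown position code: {code!r}")
    return POSITION_LABELS[key]


print(describe_position("1O"))   # First base and outfield
print(describe_position(" ss"))  # Shortstop
```

Raising an error for an unrecognized code, rather than returning a default label, keeps typos such as a digit zero in place of the letter O from passing silently.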