---
title: "MLB Statistics and Salaries"
author: "Kevin O'Connell"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "darkred"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
<style>
.chart-title {  /* chart titles */
  font-size: 24px;
}
body {  /* normal text */
  font-size: 18px;
}
</style>

<head>
    <base target="_blank">
</head>
  
```{r setup, include=FALSE}
# Load (and install if missing) the packages used throughout the dashboard
pacman::p_load(corrplot, DT, flexdashboard, ggplot2, tidyverse, plotly, maps, leaflet, scales, viridis)

full_data <- read_csv("C:\\RStudio_MTH_209\\final_project.csv")

# Mapping each teamID (including older codes for the same franchise) to one team name to remove duplicates
full_data <-  mutate(full_data, team_name = case_when(
  teamID == "ATL" ~ "Atlanta Braves",
  teamID == "BAL" ~ "Baltimore Orioles",
  teamID == "BOS" ~ "Boston Red Sox",
  teamID == "CAL" ~ "Los Angeles Angels",
  teamID == "ANA" ~ "Los Angeles Angels",
  teamID == "LAA" ~ "Los Angeles Angels",
  teamID == "CHA" ~ "Chicago White Sox",
  teamID == "CHN" ~ "Chicago Cubs",
  teamID == "CIN" ~ "Cincinnati Reds",
  teamID == "CLE" ~ "Cleveland Indians",
  teamID == "DET" ~ "Detroit Tigers",
  teamID == "HOU" ~ "Houston Astros",
  teamID == "KCA" ~ "Kansas City Royals",
  teamID == "LAN" ~ "Los Angeles Dodgers",
  teamID == "MIN" ~ "Minnesota Twins",
  teamID == "ML4" ~ "Milwaukee Brewers",
  teamID == "MIL" ~ "Milwaukee Brewers",
  teamID == "MON" ~ "Washington Nationals",
  teamID == "WAS" ~ "Washington Nationals",
  teamID == "NYA" ~ "New York Yankees",
  teamID == "NYN" ~ "New York Mets",
  teamID == "OAK" ~ "Oakland Athletics",
  teamID == "PHI" ~ "Philadelphia Phillies",
  teamID == "PIT" ~ "Pittsburgh Pirates",
  teamID == "SDN" ~ "San Diego Padres",
  teamID == "SEA" ~ "Seattle Mariners",
  teamID == "SFN" ~ "San Francisco Giants",
  teamID == "SLN" ~ "St Louis Cardinals",
  teamID == "TEX" ~ "Texas Rangers",
  teamID == "TOR" ~ "Toronto Blue Jays",
  teamID == "COL" ~ "Colorado Rockies",
  teamID == "FLO" ~ "Miami Marlins",
  teamID == "MIA" ~ "Miami Marlins",
  teamID == "TBA" ~ "Tampa Bay Rays",
  teamID == "ARI" ~ "Arizona Diamondbacks"
))

full_data <- select(full_data, -teamID)

# Separating data based on batting versus pitching
batting <- select(full_data, yearID, team_name, playerID, Name, salary, G, AB, R, 
                  H, '2B', '3B', HR, RBI, SB, BB, SO)
batting <- batting %>%
  filter(!is.na(AB))

# Filtering data for qualified hitters
batting <- batting %>% 
  filter(AB >= 400)

# Adding column for Batting Average
batting <- batting %>%
  mutate(BA = round(H / AB, 3))

pitching <- select(full_data, yearID, team_name, playerID, Name, salary, G.Pitching,
                   GS, IPouts, W, L, CG, SHO, SV, H.Pitching, ER, HR.Pitching,
                   BB.Pitching, SO.Pitching, BAOpp, R.Pitching)
pitching <- pitching %>%
  filter(!is.na(G.Pitching))

# Filtering data for qualified starting pitchers (486 outs = 162 innings pitched)
pitching <- pitching %>% 
  filter(IPouts >= 486)
```

Introduction
===

Column {data-width=450}
-----------------------------------------------------------------------

### Background

**An Exploration into MLB Statistics and MLB Salaries, 1985-2016**

The purpose of this study is to analyze the relationship between batting and pitching statistics and player salaries in Major League Baseball.

Team front offices must place a monetary value on players each season, which requires an understanding of the most important performance metrics for team success. This study will explore such metrics and their relationship with yearly player salaries.

The dataset is split into two categories, Batting and Pitching, and contains information for all seasons from 1985 to 2016. The batting data holds **5,034** observations of **17** variables, and the pitching data holds **2,396** observations of **20** variables. The data was gathered from Sean Lahman's Baseball Database and can be found [here](http://seanlahman.com/download-baseball-database/).



### Research Questions

1. Which batting statistics have the strongest impact on player salary?

2. Which pitching statistics have the strongest impact on player salary?

3. Which aspect of baseball has a stronger correlation with higher salaries: batting or pitching?


Column {.tabset data-width=550}
-----------------------------------------------------------------------

### Batting Variable Classification

**The variables studied in the Batting dataset are as follows:**

yearID: Year

team_name: Team

playerID: Unique identification code assigned to each player

Name: Player's name

salary: Player's salary for given year

G: Games played by batter

AB: At bats by batter

R: Runs scored by batter

H: Hits by batter

2B: Doubles hit by batter

3B: Triples hit by batter

HR: Home runs hit by batter

RBI: Runs batted in

SB: Stolen bases by batter

BB: Walks by batter

SO: Strikeouts by batter

BA: Batting average

### Pitching Variable Classification

**The variables for the Pitching dataset are as follows:**

yearID: Year

team_name: Team

playerID: Player ID code

Name: Player's name

salary: Player's salary for given year

G.Pitching: Games played by pitcher

GS: Games started by pitcher

IPouts: Outs pitched (innings pitched multiplied by 3)

W: Wins by pitcher

L: Losses by pitcher

CG: Complete Games pitched

SHO: Shutouts pitched

SV: Saves

H.Pitching: Hits allowed by pitcher

ER: Earned runs allowed by pitcher

HR.Pitching: Home runs allowed by pitcher

BB.Pitching: Walks allowed by pitcher

SO.Pitching: Strikeouts recorded by pitcher

BAOpp: Opponent's batting average against pitcher

R.Pitching: Runs allowed by pitcher
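
The `IPouts` definition above is simple arithmetic: three outs make one inning. As a minimal sketch on made-up numbers (the `toy` data frame and its values are hypothetical, not rows from the dataset), the 486-out cutoff used in the setup chunk corresponds to 162 innings:

```r
# Hypothetical pitcher seasons; the values are illustrative only.
toy <- data.frame(
  Name   = c("Pitcher A", "Pitcher B", "Pitcher C"),
  IPouts = c(486, 645, 300)
)

# Three outs make one inning.
toy$innings <- toy$IPouts / 3

# Same cutoff as the setup chunk: 486 outs, i.e. 162 innings.
toy$qualified <- toy$IPouts >= 486

toy
```

In this toy example, Pitcher C (100 innings) would be dropped by the filter.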


Data
===

Column {data-width=550}
-----------------------------------------------------------------------

### Batting Data

```{r batting data table}
DT::datatable(batting, rownames = FALSE, options = list(
                columnDefs = list(list(className = 'dt-center', 
                                       targets = 1:5)), pageLength = 10))
```

Column {data-width=550}
-----------------------------------------------------------------------

### Pitching Data

```{r pitching datatable}
DT::datatable(pitching, rownames = FALSE, options = list(
                columnDefs = list(list(className = 'dt-center', 
                                       targets = 1:5)), pageLength = 10))
```


Exploratory Analysis
===

Column {.tabset data-width=550}
-----------------------------------------------------------------------
### Distribution of Batter Salaries


```{r batting salary bar chart}
batting <- mutate(batting, SalaryGroup = case_when(
  salary < 60000 ~ "<60k",
  salary >= 60000  & salary < 250000 ~ "60k-250k",
  salary >= 250000 & salary < 500000 ~ "250k-500k",
  salary >= 500000 & salary < 1000000 ~ "500k-1M",
  salary >= 1000000 & salary < 2000000 ~ "1M-2M",
  salary >= 2000000 & salary < 5000000 ~ "2M-5M",
  salary >= 5000000 & salary < 10000000 ~ "5M-10M",
  salary >= 10000000 & salary < 20000000 ~ "10M-20M",
  salary >= 20000000 ~ ">20M",
))

batting <- batting %>%
  mutate(SalaryGroup = factor(
    SalaryGroup, levels = c(
      "<60k", "60k-250k", "250k-500k", "500k-1M", "1M-2M", 
      "2M-5M", "5M-10M", "10M-20M", ">20M")))

bat_salaries <- ggplot(batting, aes(x = SalaryGroup)) +
  geom_bar(fill = "darkorange", col = "black") +
  labs(title = "Distribution of Batter Salaries",
       x = "Salary Group",
       y = "Number of Players in Salary Group")

ggplotly(bat_salaries)
```

### Distribution of Pitcher Salaries

```{r pitching salary bar chart}
pitching <- mutate(pitching, SalaryGroup = case_when(
  salary < 60000 ~ "<60k",
  salary >= 60000  & salary < 250000 ~ "60k-250k",
  salary >= 250000 & salary < 500000 ~ "250k-500k",
  salary >= 500000 & salary < 1000000 ~ "500k-1M",
  salary >= 1000000 & salary < 2000000 ~ "1M-2M",
  salary >= 2000000 & salary < 5000000 ~ "2M-5M",
  salary >= 5000000 & salary < 10000000 ~ "5M-10M",
  salary >= 10000000 & salary < 20000000 ~ "10M-20M",
  salary >= 20000000 ~ ">20M",
))

pitching <- pitching %>%
  mutate(SalaryGroup = factor(
    SalaryGroup, levels = c(
      "<60k", "60k-250k", "250k-500k", "500k-1M", "1M-2M", 
      "2M-5M", "5M-10M", "10M-20M", ">20M")))

pitch_salaries <- ggplot(pitching, aes(x = SalaryGroup)) +
  geom_bar(fill = "darkgreen", col = "black") +
  labs(title = "Distribution of Pitcher Salaries",
       x = "Salary Group",
       y = "Number of Players in Salary Group")

ggplotly(pitch_salaries)
```

### Change in Batter Salaries Over Time

```{r batting over time}
bat_time <- ggplot(batting, aes(x = as.factor(yearID), y = salary)) +
  geom_boxplot(fill = "purple") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Batter Salaries Over Time",
       x = "Year (1985-2016)",
       y = "Salary ($)") +
  scale_x_discrete(breaks = seq(1985, 2016, by = 5))

ggplotly(bat_time)
```

### Change in Pitcher Salaries Over Time

```{r pitching over time}
pitch_time <- ggplot(pitching, aes(x = as.factor(yearID), y = salary)) +
  geom_boxplot(fill = "lightblue") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Pitcher Salaries Over Time",
       x = "Year (1985-2016)",
       y = "Salary ($)") +
  scale_x_discrete(breaks = seq(1985, 2016, by = 5))

ggplotly(pitch_time)
```

### Correlation of Batting Statistics

```{r batting correlation}
# Column 5 is salary; columns 7:17 run from AB through BA (G is left out)
bat_cor <- cor(batting[c(5, 7:17)])

corrplot(bat_cor, type = "upper", order = "hclust",
         tl.col = "black", tl.srt = 45)
```

### Correlation of Pitching Statistics

```{r pitching correlation}
# Columns 5:18 run from salary through SO.Pitching (BAOpp and R.Pitching are left out)
pitch_cor <- cor(pitching[5:18])

corrplot(pitch_cor, type = "upper", order = "hclust",
         tl.col = "black", tl.srt = 45)
```         

Column {data-width=300}
-----

### Analysis

The two bar charts display the distribution of salaries split into 8 groups. These groupings show how players are concentrated within salary ranges, which gives a sense of what salaries separate the highest-value players from the rest of the league. Between 1985 and 2016, the median salary was \$2 million for batters and \$1.5 million for pitchers. The middle 50% of batters earned between \$500,000 and \$5 million, and the middle 50% of pitchers earned between \$439,375 and \$4.75 million.
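
The median and the middle 50% quoted above are the 50th, 25th, and 75th percentiles of `salary`. A minimal sketch of that calculation, using a small made-up vector (`toy_salary` is hypothetical; in the dashboard the same calls could be applied to `batting$salary` or `pitching$salary`):

```r
# Hypothetical salaries in dollars; not taken from the dataset.
toy_salary <- c(300000, 500000, 1000000, 2000000, 4000000, 5000000, 12000000)

# 25th, 50th (median), and 75th percentiles.
q <- quantile(toy_salary, probs = c(0.25, 0.50, 0.75))
q

# Width of the middle 50% (interquartile range).
iqr <- IQR(toy_salary)
iqr
```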

MLB experienced a financial boom during the period studied, which led to more lucrative salaries and a wider distribution of salaries overall. As the box plots show, salaries for both batters and pitchers have varied more widely over time. Television deals and the international growth of baseball have increased league revenue, which has allowed team owners to pay their players higher salaries.

By studying the correlation plots for the batting and pitching datasets, I discerned which statistics have the strongest relationship with salary. For batters, home runs, RBI, and walks have a positive correlation with salary. As each of these stats concerns scoring, it is natural that players who generate more runs and get on base are paid more money. For pitchers, strikeouts and wins had strong positive correlations, and walks allowed had a strong negative relationship. Again, this aligns with basic baseball principles: pitchers who get more outs, control their pitches better, and win more games are bound to be paid more.

There were two surprises in these correlation plots. Triples and stolen bases have a negative correlation with salary for batters, and games pitched has a negative correlation with salary for pitchers. Triples are hit more rarely than home runs, and stealing bases creates scoring opportunities, so I expected these variables to relate positively to salary. Similarly, I assumed that pitchers who appeared in more games would be valued more than pitchers who did not, as this can be an indicator of health and longevity.
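
One way to read correlation plots like these is to pull the `salary` column out of the correlation matrix and sort it. The sketch below does that on simulated data. The `toy` data frame, its values, and the coefficients used to generate `salary` are assumptions for illustration and do not reproduce the study's results.

```r
set.seed(1)
n <- 200

# Simulated batter seasons (hypothetical values).
toy <- data.frame(
  HR = rpois(n, 18),
  SB = rpois(n, 8),
  BB = rpois(n, 50)
)
# Salary is built to rise with HR and BB and to ignore SB, plus noise.
toy$salary <- 1e5 * toy$HR + 3e4 * toy$BB + rnorm(n, sd = 5e5)

# Correlation of every statistic with salary, strongest first.
cors <- cor(toy)[, "salary"]
cors <- cors[names(cors) != "salary"]
sort(cors, decreasing = TRUE)
```

The same two lines after the simulation step could be pointed at `bat_cor` or `pitch_cor`, since both are already correlation matrices that include `salary`.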


Correlation
===

Column {.tabset data-width=300}
-----------------------------------------------------------------------

### Home Runs by Salary Group

```{r hr by salary}
HR_box <- ggplot(batting, aes(x = SalaryGroup, y = HR)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "Distribution of Home Runs by Salary Group",
        x = "Salary Group",
        y = "HR")
ggplotly(HR_box)
```

### RBI by Salary Group
```{r rbi by salary}
RBI_box <- ggplot(batting, aes(x = SalaryGroup, y = RBI)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "Distribution of Runs Batted In by Salary Group",
        x = "Salary Group",
        y = "RBI")
ggplotly(RBI_box)
```

### Walks by Salary Group
```{r bb by salary}

BB_box <- ggplot(batting, aes(x = SalaryGroup, y = BB)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "Distribution of Walks by Salary Group",
        x = "Salary Group",
        y = "BB")
ggplotly(BB_box)
```

Column {.tabset data-width=300}
-----------------------------------------------------------------------

### Walks Allowed by Salary Group

```{r pitch bb salary}
# BB.Pitching is walks allowed, so the panel title and plot title say so
BB_pitch_box <- ggplot(pitching, aes(x = SalaryGroup, y = BB.Pitching)) +
  geom_boxplot(fill = "maroon") +
  labs(title = "Distribution of Walks Allowed by Salary Group",
        x = "Salary Group",
        y = "BB")
ggplotly(BB_pitch_box)
```



### Strikeouts by Salary Group

```{r so salary}
SO_box <- ggplot(pitching, aes(x = SalaryGroup, y = SO.Pitching)) +
  geom_boxplot(fill = "maroon") +
  labs(title = "Distribution of Strikeouts by Salary Group",
        x = "Salary Group",
        y = "SO")
ggplotly(SO_box)
```


### Wins by Salary Group
```{r w salary}
W_box <- ggplot(pitching, aes(x = SalaryGroup, y = W)) +
  geom_boxplot(fill = "maroon") +
  labs(title = "Distribution of Wins by Salary Group",
       x = "Salary Group",
       y = "W")
ggplotly(W_box)
```

Batting v. Pitching
===

Column {data-width=450}
-----------------------------------------------------------------------


### Major Batting Variables

```{r bat heatmap}
corcoeff_batting <- cor(batting[c("HR", "RBI", "BB", "salary")])
bat_heat <- corrplot(corcoeff_batting, method = "color", addCoef.col = "black", number.cex = 1)
```

### Major Pitching Variables

```{r pitch heatmap}
corcoeff_pitching <- cor(pitching[c("SO.Pitching", "W", "BB.Pitching", "salary")])
pitch_heat <- corrplot(corcoeff_pitching, method = "color", addCoef.col = "black", number.cex = 1)
```

Column {data-width=450}
-----------------------------------------------------------------------

### Analysis

Of the three standout statistics for batting and for pitching, the correlation coefficients show that batting has a stronger relationship with salary than pitching. Home runs (.29), runs batted in (.27), and walks (.21) each have higher correlation coefficients than every pitching statistic except strikeouts, which exceeds RBI by just .01. As run creators, batters are often valued more than pitchers. These heatmaps give a rough picture of the value teams place on a player's skills when determining salary. The best hitters tend to be offered the highest salaries, as run creation leads to more wins, which is every team's goal.

The pitching statistics most strongly correlated with salary reflect pitch control and game management. Pitchers with better control and strategy strike out more batters and give up fewer walks. To qualify for a win, a starting pitcher must pitch at least 5 innings. Wins are therefore a good indicator of how well a pitcher can manage a game and keep the opposing offense from taking the lead. This requires precise and accurate pitching, as well as durability. A pitcher who cannot last until the fifth inning is far less valuable than one who can regularly reach the sixth or seventh inning while maintaining quality pitch control.
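
Because some of these coefficients differ by as little as .01, it may help to look at confidence intervals before ranking them. The following is a minimal sketch with base R's `cor.test()`, run on simulated data. The vectors `stat_a`, `stat_b`, and `salary` are hypothetical stand-ins, not the study's columns, and comparing intervals is only a rough check, not a formal test of the difference.

```r
set.seed(42)
n <- 500

# Simulated stand-ins for two statistics and salary (hypothetical values).
stat_a <- rnorm(n)
stat_b <- rnorm(n)
salary <- 0.30 * stat_a + 0.28 * stat_b + rnorm(n)

# Pearson correlation with a 95% confidence interval for each statistic.
ci_a <- cor.test(stat_a, salary)$conf.int
ci_b <- cor.test(stat_b, salary)$conf.int

# Overlapping intervals mean the data do not clearly separate the two correlations.
overlap <- ci_a[1] <= ci_b[2] && ci_b[1] <= ci_a[2]
list(ci_a = ci_a, ci_b = ci_b, overlap = overlap)
```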



Map
===

Column {.tabset data-width=450}
-----------------------------------------------------------------------

### Map of Each Team's Highest Paid Batter

```{r batting map}
batting_highest_salary <- batting %>%
  group_by(team_name) %>%
  arrange(desc(salary)) %>%
  slice_head(n = 1) %>%
  ungroup()

team_map <- read_csv("C:\\RStudio_MTH_209\\MLB_map.csv")

team_map <- team_map %>%
  rename(team_name = Teams)

batting_salary_map <- batting_highest_salary %>% left_join(team_map, 
                               by = c("team_name"))

batting_salary_map <- batting_salary_map %>%
  mutate(longitude = as.numeric(longitude),
         latitude = as.numeric(latitude))

icons <- makeAwesomeIcon(
  icon = 'circle',
  library = 'ion',
  markerColor =
    ifelse(batting_salary_map$SalaryGroup == "10M-20M", "#CC0000", "purple"))
 
makeColorsandNames <- data.frame(groups = c('10M-20M','Greater than 20M'),
                                 groups.cent = c('#CC0000','purple'))

batting_map <- leaflet(batting_salary_map) %>%
  setView(lng = -98.5795, lat = 39.8283, zoom = 4.5) %>%
  addTiles() %>%
  addAwesomeMarkers(
    lat = ~latitude,
    lng = ~longitude,
    icon = ~icons,
    popup = ~paste("Name: ", Name, "<br>",
                   "Team: ", team_name, "<br>",
                   "Salary: ", dollar(salary), "<br>",
                   "Year: ", yearID)) %>%
  addLegend(position = 'bottomleft',
            colors = makeColorsandNames[,2],
            labels = makeColorsandNames[,1],
            opacity = 1, title = 'Salary Groups')

batting_map
```

### Map of Each Team's Highest Paid Pitcher

```{r pitching map}
pitching_highest_salary <- pitching %>%
  group_by(team_name) %>%
  arrange(desc(salary)) %>%
  slice_head(n = 1) %>%
  ungroup()


team_map <- read_csv("C:\\RStudio_MTH_209\\MLB_map.csv")

team_map <- team_map %>%
  rename(team_name = Teams)

pitching_salary_map <- pitching_highest_salary %>% left_join(team_map, 
                               by = c("team_name"))

pitching_salary_map <- pitching_salary_map %>%
  mutate(longitude = as.numeric(longitude),
         latitude = as.numeric(latitude))

icons <- makeAwesomeIcon(
  icon = 'circle',
  library = 'ion',
  markerColor =
    ifelse(pitching_salary_map$SalaryGroup == "10M-20M", "#CC0000",
           ifelse(pitching_salary_map$SalaryGroup == ">20M", "purple", "blue"))
)
 
makeColorsandNames <- data.frame(groups = c('10M-20M','Greater than 20M', "Less than 10M"),
                                 groups.cent = c('#CC0000','purple', "blue"))

pitching_map <- leaflet(pitching_salary_map) %>%
  setView(lng = -98.5795, lat = 39.8283, zoom = 4.5) %>%
  addTiles() %>%
  addAwesomeMarkers(
    lat = ~latitude,
    lng = ~longitude,
    icon = ~icons,
    popup = ~paste("Name: ", Name, "<br>",
                   "Team: ", team_name, "<br>",
                   "Salary: ", dollar(salary), "<br>",
                   "Year: ", yearID)) %>%
  addLegend(position = 'bottomleft',
            colors = makeColorsandNames[,2],
            labels = makeColorsandNames[,1],
            opacity = 1, title = 'Salary Groups')

pitching_map

```

Conclusion
===

Column {data-width=450}
---


### Limitations

The main limitation of this study is its time range. The data contains only 32 seasons of salaries (1985 to 2016), compared to the 151 years of Major League Baseball's existence. As baseball has become more popular, the league has generated more revenue, which leads to larger salaries on top of naturally occurring inflation. Had I been able to control for the growth in the amount of money allocated to salaries, I would have gained a clearer understanding of how performance relates to salary.
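
One possible way to control for league-wide salary growth, sketched here as an assumption rather than something this study did, is to express each salary relative to the median salary of its season. The `toy` data frame below is hypothetical.

```r
# Hypothetical player seasons; values are illustrative only.
toy <- data.frame(
  yearID = c(1985, 1985, 1985, 2016, 2016, 2016),
  salary = c(200000, 400000, 800000, 2000000, 4000000, 8000000)
)

# Median salary within each season, aligned to the rows of `toy`.
toy$year_median <- ave(toy$salary, toy$yearID, FUN = median)

# Salary as a multiple of that season's median.
toy$rel_salary <- toy$salary / toy$year_median

toy
```

In this toy example, a 1985 salary of \$800,000 and a 2016 salary of \$8,000,000 both come out as twice the season median, so correlations computed on `rel_salary` would compare players with their contemporaries rather than across eras.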

### References

[Sean Lahman's Baseball Database](http://seanlahman.com/download-baseball-database/)


Column {data-width=550}
---

### Conclusion

Based on this study, home runs, runs batted in, and walks are the batting statistics most strongly associated with higher salaries. Walks allowed, strikeouts, and wins have the strongest relationship with pitcher salaries. The top batting statistics have a slightly stronger correlation with salary than the top pitching statistics, which suggests that teams value batting somewhat more than pitching.

This result fits how the league rewards players. In a league where scoring runs wins championships, the last year a pitcher won the Most Valuable Player award was 2014, and only six pitchers have won the World Series MVP award in the 21st century. The best pitchers in the game cannot win by themselves; teams need batters to score.

Author
===

Column {data-width=500}
---

### About the Author

My name is Kevin O'Connell, and I am a junior at the University of Dayton pursuing a Bachelor of Arts in Economics with minors in Data Analytics and Sociology. My projected graduation date is May 2025.

This summer, I will be working for The Mather Group, a registered investment advisory firm, as a wealth management intern in Chicago, Illinois. Upon graduation, I hope to work in the financial advising industry.

As for experience, I currently work as the Product Manager of Heritage Coffeehouse, a division of Flyer Enterprises. I am responsible for all ordering, inventory tracking, and product pricing for the entirely student-run coffee shop. Last summer, I interned with the Illinois Department of Employment Security, where I worked on data validation in Microsoft Excel and managed a social media campaign on long- and short-term employment projections.

Feel free to connect with me on [LinkedIn](https://www.linkedin.com/in/kevino-connell/).