Author Archives: Patrick

R Tips & Tricks: Write Data to Separate Excel Sheet Tabs

When working with colleagues and managers, sometimes they prefer to have data or analysis written out to an excel workbook so that they can look at the information and sort or filter it, however they see fit. Rather than saving each output to a single csv file and then spending time copying and pasting them all into a single excel sheet (different data outputs on different tabs), let’s see how we can write everything out to a multi-tab excel sheet from R in one shot using the {openxlsx} package.

Data

We will simulate some sports data. In this example, we have two data sources:

  1. Data about teams
  2. Data about players
### Libraries -----------------------
library(tidyverse)
library(openxlsx)


### Create Data -----------------------

teams <- data.frame(
  team = c("Bats", "Sharks", "Tigers", "Bisons"),
  stat1 = round(rnorm(n = 4, mean = 100, sd = 20), 1),
  stat2 = round(rnorm(n = 4, mean = 250, sd = 70), 1),
  stat3 = round(rnorm(n = 4, mean = 0, sd = 3), 2)
)

players <- data.frame(
  player = c("Tom", "Joe", "Karl", "Bob", "Ben", "Albert", "Simon", "Harold", "Ken", "Cal"),
  cool_stat1 = round(rnorm(n = 10, mean = 300, sd = 10), 1),
  cool_stat1 = round(rnorm(n = 10, mean = 20, sd = 5), 1),
  cool_stat1 = round(rnorm(n = 10, mean = 500, sd = 50), 1),
  cool_stat1 = round(rnorm(n = 10, mean = 10, sd = 15), 1)
)

 

Store the data frames in a named list

Next, we need to store each of the data frames we want in our excel output into a named list. Naming of the elements in the list is important as these will be the names given to the excel tabs.

## Store each data frame as a named list -----------------------
list_of_dataframes <- list(teams = teams,
                   players = players)

 

Write the Data to an Excel Workbook

Once we have the data sets we are interested in structured in a named list we are ready to use the {openxlsx} package to write the data to a multi-tab excel workbook.

First, we create an empty excel workbook object to store our results.

## Create an empty excel workbook object -----------------------
blank_excel <- createWorkbook()

 

Next, we loop over our list of data sets and store each data set on its own named tab.

## loop over the list storing each data set on its own tab in the blank excel workbook -----------
Map(function(df, tab_name){     
  
  addWorksheet(blank_excel, tab_name)
  writeData(blank_excel, tab_name, df)
  }, 
  
  list_of_dataframes, names(list_of_dataframes)
)

 

If you’ve done it correctly, in your R console you should see an output indicating each of the data sets that were placed into the workbook.

 

 

Finally, we want to save the results into an excel workbook that we can email out to our co-workers. We use the saveWorkbook() function to save the results to our working directory.

## Save the now populated excel workbook --------------------------------------------
saveWorkbook(blank_excel, file = "league_data.xlsx", overwrite = TRUE)

 

When we open the excel  workbook we see that we have both tabs, properly named for the data they contain.

Shiny – User Defined Chart Parameters

A colleague was working on a web app for his basketball team and asked me if there was a way to create a {shiny} web app that allowed the user to define which parameters they would like to see on the plot. I figured this would be something others might be interested in as well, so here we go!

Load Packages, Helper Functions & Data

I’ll use data from the Lahman baseball database (seasons 2017 – 2019). I’m also going to create two helper functions, one for calculating the z-scores for our stats of interest and one for calculating the t-value from the z-score. The t-value will put the z-score on a 0 to 100 scale for plotting purposes in our polar plot. Additionally, we will use these standardized scores to conditionally format colors on our {gt} table (but we will hide the standardized columns so that the user only sees the raw data and colors). Finally, I’m going to create both a wide and long format of the data as it will be easier to use one or the other, depending on the type of plot or table I am building.

#### Load packages ------------------------------------------------
library(tidyverse)
library(shiny)
library(Lahman)
library(gt)
library(plotly)

theme_set(theme_minimal() + 
            theme(
              axis.text = element_text(face = "bold", size = 12),
              legend.title = element_blank(),
              legend.position = "none"
            ) )

#### helper functions -------------------------------------------

z_score <- function(x){
  z = (x - mean(x, na.rm = T)) / sd(x, na.rm = T)
  return(z)
}

t_score <- function(x){ t = (x * 10) + 50 t = ifelse(t > 100, 100, 
             ifelse(t < 0, 0, t))
  return(t)
}


#### Get Data ---------------------------------------------------

dat <- Batting %>%
  filter(between(yearID, left = 2017, right = 2019),
         AB >= 200) %>% 
  group_by(yearID, playerID) %>%
  summarize(across(.cols = G:GIDP,
         ~sum(.x)),
         .groups = "drop") %>%
  mutate(ba = H / AB,
         obp = (H + BB + HBP) / (AB + HBP + SF),
         slg = ((H - X2B - X3B - HR) + X2B*2 + X3B*3 + HR*4) / AB,
         ops = obp + slg,
         hr_rate = H / AB) %>%
  select(playerID, yearID, AB, ba:hr_rate) %>%
  mutate(across(.cols = ba:hr_rate,
                list(z = z_score)),
         across(.cols = ba_z:hr_rate_z,
                list(t = t_score))) %>%
  left_join(People %>%
              mutate(name = paste(nameLast, nameFirst, sep = ", ")) %>%
              select(playerID, name)) %>%
  relocate(name, .before = yearID)


dat_long <- Batting %>%
  filter(between(yearID, left = 2017, right = 2019),
         AB >= 200) %>% 
  group_by(playerID) %>%
  summarize(across(.cols = G:GIDP,
                   ~sum(.x)),
            .groups = "drop") %>%
  mutate(ba = H / AB,
         obp = (H + BB + HBP) / (AB + HBP + SF),
         slg = ((H - X2B - X3B - HR) + X2B*2 + X3B*3 + HR*4) / AB,
         ops = obp + slg,
         hr_rate = H / AB) %>%
  select(playerID, AB, ba:hr_rate) %>%
  mutate(across(.cols = ba:hr_rate,
                list(z = z_score)),
         across(.cols = ba_z:hr_rate_z,
                list(t = t_score))) %>%
  left_join(People %>%
              mutate(name = paste(nameLast, nameFirst, sep = ", ")) %>%
              select(playerID, name)) %>%
  relocate(name, .before = AB) %>%
  select(playerID:AB, ends_with("z_t")) %>%
  pivot_longer(cols = -c(playerID, name, AB),
               names_to = "stat") %>%
  mutate(stat = case_when(stat == "ba_z_t" ~ "BA",
                          stat == "obp_z_t" ~ "OBP",
                          stat == "slg_z_t" ~ "SLG",
                          stat == "ops_z_t" ~ "OPS",
                          stat == "hr_rate_z_t" ~ "HR Rate"))


dat %>%
  head()

dat_long %>%
  head()

 

The Figures for our App

Before I build the {shiny} app, I wanted to first construct the three figures I will include. The code for these will be accessible in Github, but here is what they look like:

  • For the polar plot, I will allow the user to define which variables they want on the chart.
  • For the time series plot, I am going to create an interactive {plotly} chart that allows the user to select the stat they want to see and then hover over the player’s points and obtain information like the raw value and the number of at bats in the given season via a simple tool tip.
  • The table, as discussed above, will user conditional formatting to provide the user with extra context about how that player performed relative to his peers in a given season.

Because I don’t like to clutter up my {shiny} apps, I tend to build my plots and tables into custom functions. That way, all I need to do is set up a reactive() in the server to obtain the user selected data and then call the function on that data. Here are the functions for the three figures above.

## table function
tbl_func <- function(NAME){ dat %>%
  filter(name == NAME) %>%
  select(yearID, AB:hr_rate, ends_with("z_t")) %>%
  gt(rowname_col = "yearID") %>%
  fmt_number(columns = ba:hr_rate,
             decimals = 3) %>%
  cols_label(
    AB = md("**AB**"),
    ba = md("**Batting Avg**"),
    obp = md("**OBP**"),
    slg = md("**SLG**"),
    ops = md("**OPS**"),
    hr_rate = md("**Home Run Rate**")
  ) %>%
  tab_header(title = NAME) %>%
  opt_align_table_header(align = "left") %>%
  tab_options(column_labels.border.top.color = "transparent",
              column_labels.border.top.width = px(3),
              table.border.top.color = "transparent",
              table.border.bottom.color = "transparent") %>%
  cols_align(align = "center") %>%
  cols_hide(columns = ends_with("z_t")) %>%
    tab_style(
      style = cell_fill(color = "palegreen"),
      location = cells_body(
        columns = ba,
        rows = ba_z_t > 60
      )
    )  %>%
    tab_style(
      style = cell_fill(color = "red"),
      location = cells_body(
        columns = ba,
        rows = ba_z_t < 40 ) ) %>%
    tab_style(
      style = cell_fill(color = "palegreen"),
      location = cells_body(
        columns = obp,
        rows = obp_z_t > 60
      )
    )  %>%
    tab_style(
      style = cell_fill(color = "red"),
      location = cells_body(
        columns = obp,
        rows = obp_z_t < 40 ) ) %>%
    tab_style(
      style = cell_fill(color = "palegreen"),
      location = cells_body(
        columns = slg,
        rows = slg_z_t > 60
      )
    )  %>%
    tab_style(
      style = cell_fill(color = "red"),
      location = cells_body(
        columns = slg,
        rows = slg_z_t < 40 ) ) %>%
    tab_style(
      style = cell_fill(color = "palegreen"),
      location = cells_body(
        columns = ops,
        rows = ops_z_t > 60
      )
    )  %>%
    tab_style(
      style = cell_fill(color = "red"),
      location = cells_body(
        columns = ops,
        rows = ops_z_t < 40 ) ) %>%
    tab_style(
      style = cell_fill(color = "palegreen"),
      location = cells_body(
        columns = hr_rate,
        rows = hr_rate_z_t > 60
      )
    )  %>%
    tab_style(
      style = cell_fill(color = "red"),
      location = cells_body(
        columns = hr_rate,
        rows = hr_rate_z_t < 40
      )
    ) 
}


## Polar plot function
polar_plt <- function(NAME, STATS){ dat_long %>%
    filter(name == NAME,
           stat %in% STATS) %>%
    ggplot(aes(x = stat, y = value, fill = stat)) +
    geom_col(color = "white", width = 0.75) +
    coord_polar(theta = "x") +
    geom_hline(yintercept = seq(50, 50, by = 1), size = 1.2) +
    labs(x = "", y = "") +
    ylim(0, 100)
  
} 


## time series plot function
time_plt <- function(NAME, STAT){
  
  STAT <- case_when(STAT == "BA" ~ "ba",
                    STAT == "OBP" ~ "obp",
                    STAT == "SLG" ~ "slg",
                    STAT == "OPS" ~ "ops",
                    STAT == "HR Rate" ~ "hr_rate")
  
  stat_z <- paste0(STAT, "_z")
  
  p <- dat %>% 
    filter(name == NAME) %>%
    select(yearID, AB, STAT, stat_z) %>%
    setNames(., c("yearID", "AB", "STAT", "stat_z")) %>%
    ggplot(aes(x = as.factor(yearID), 
               y = stat_z,
               group = 1,
               label = NAME,
               label2 = AB,
               lable3 = STAT)) +
    geom_hline(yintercept = 0,
               size = 1.1,
               linetype = "dashed") +
    geom_line(size = 1.2) +
    geom_point(shape = 21,
               size = 6,
               color = "black",
               fill = "white") +
    ylim(-4, 4) 
  
  
  ggplotly(p)
  
}

 

Build the {shiny} app

The below code will construct the {shiny} app. We allow the user to select a player, select the stats of interest for the polar plot, and select the stat they’d like to track over time.

If you’d like to see a video of the app in use, CLICK HERE <shiny – user defined chart parameters>

If you want to run this yourself or build one similar to it you can access my code on GitHub.

 

#### Shiny App ---------------------------------------------------------------

## User Interface
ui <- fluidPage(
  
  titlePanel("MLB Hitters Shiny App\n2017-2019"),
  
  
  sidebarPanel(width = 3,
             selectInput("name",
                             label = "Choose a Player:",
                             choices = unique(dat$name),
                             selected = NULL,
                             multiple = FALSE),
              
              selectInput("stat",
                          label = "Choose stats for polar plot:",
                          choices = unique(dat_long$stat),
                          selected = NULL,
                          multiple = TRUE),
              
              selectInput("time_stat",
                          label = "Choose stat for time series:",
                          choices = unique(dat_long$stat),
                          selected = NULL,
                          multiple = FALSE)
  ),
  
  
  mainPanel(
    
    gt_output(outputId = "tbl"),
    
    fluidRow(
      
      column(6, plotOutput(outputId = "polar")),
      column(6, plotlyOutput(outputId = "time"))
    )
    
  )
)



server <- function(input, output){
  
  ## get player selected for table
  NAME <- reactive({ dat_long %>%
      filter(name == input$name) %>%
      distinct(name, .keep_all = FALSE) %>%
      pull(name)
    })
  
  ## get stats for polar plot
  polar_stats <- reactive({ dat_long %>%
      filter(stat %in% c(input$stat)) %>%
      pull(stat)
    
  })
  
  ## get stat for time series
  ts_stat <- reactive({ dat %>%
      select(ba:hr_rate) %>%
      setNames(., c("BA", "OBP", "SLG", "OPS", "HR Rate")) %>%
      select(input$time_stat) %>% 
      colnames()
    
  })
  
  ## table output
  output$tbl <- render_gt(
      tbl_func(NAME = NAME())
  )
  
  ## polar plot output
  output$polar <- renderPlot(
    
    polar_plt(NAME = NAME(),
              STAT = polar_stats())
    
  )
  
  ## time series plot output
  output$time <- renderPlotly(
    
    time_plt(NAME = NAME(),
             STAT = ts_stat())
    
  )

  
}





shinyApp(ui, server)