What's the most successful Dancing With the Stars "Profession"? Visualizing with {gt}

Nov 24, 2020 58 min read R, DataViz

Motivation

During this pandemic I’ve found a source of comfort in Dancing with the Stars (DWTS). I’ve never watched any other season before and I think a large part of starting now are:

Lack of anything else to watch
The rapper Nelly (and the St. Lunatics) have a near and dear place in my heart.

On the R front, I’ve wanted to mess around with the gt package for a while now but hadn’t had a great reason to. However, I had originally wanted to do a post on whether DWTS has “score inflation” throughout the season, but that wound up being more complicated than I would have liked. So instead why not answer what is the most successful type of star on Dancing with the Stars.

And on the gt front a huge shout-out to Kaustav Sen whose post on gt for the Great American Beer Festival served as a large design inspiration for this post.

The Final Output

At the end of this post, the final output for the table will look like:

The Pre-Processing

Load the Libraries

The main focus of this post is on the gt package to make the table, however, other packages are used to get and work with the data.

library(rvest) #Web Scrape Wikipedia
library(tidyverse) #Data Manipulation / Plots
library(lubridate) #Date Manipulation
library(gt) #Making Fancy Tables / The Focus of This Post
library(glue) #Text Manipulation

Getting all the DWTS Contestants

In order to find the most successful star type, we need to get a list of all the contestants. Fortunately, Wikipedia has a page for every season and on those pages has a list of information about the contestants including, their name, what their known for, and their status from the season.

Since there are 29 completed Dancing with the Stars seasons this seems like a job for a function to iterate through each season’s Wikipedia page to extract that table. One note is that Season 15 was an all-star season so it will be excluded from this analysis Unfortunately, the contestant table isn’t always in the same place on the page, so the function will need to be a little flexible.

dwts_constants <- function(season_number, tbl_number){
  read_html(glue('https://en.wikipedia.org/wiki/Dancing_with_the_Stars_(American_season_{season_number})')) %>% 
    html_nodes('table') %>% 
    .[[tbl_number]] %>%
    html_table() %>% 
    mutate(season = season_number) %>% 
    janitor::clean_names()
}

Given a season number, and the number table on the page to extract the above function will extract and lightly clean the data. The following code will append all of the contestants on top of each other.

contestants <- dwts_constants(1, 2) %>% 
  bind_rows(dwts_constants(2, 3) ) %>% 
  bind_rows(dwts_constants(3, 3) ) %>% 
  bind_rows(dwts_constants(4, 3) ) %>% 
  bind_rows(dwts_constants(5, 2) ) %>% 
  bind_rows(dwts_constants(6, 2) ) %>% 
  bind_rows(dwts_constants(7, 2) ) %>% 
  bind_rows(dwts_constants(8, 2) ) %>% 
  bind_rows(dwts_constants(9, 2) ) %>% 
  bind_rows(dwts_constants(10, 2)) %>% 
  bind_rows(dwts_constants(11, 2)) %>% 
  bind_rows(dwts_constants(12, 2)) %>% 
  bind_rows(dwts_constants(13, 2)) %>% 
  bind_rows(dwts_constants(14, 3)) %>% 
  #bind_rows(dwts_constants(15, 2)) %>%  #Season 15 is an All-Star Season
  bind_rows(dwts_constants(16, 2)) %>% 
  bind_rows(dwts_constants(17, 2)) %>% 
  bind_rows(dwts_constants(18, 2)) %>% 
  bind_rows(dwts_constants(19, 2)) %>% 
  bind_rows(dwts_constants(20, 2)) %>% 
  bind_rows(dwts_constants(21, 2)) %>% 
  bind_rows(dwts_constants(22, 2)) %>% 
  bind_rows(dwts_constants(23, 2)) %>% 
  bind_rows(dwts_constants(24, 2)) %>% 
  bind_rows(dwts_constants(25, 2)) %>% 
  bind_rows(dwts_constants(26, 2)) %>% 
  bind_rows(dwts_constants(27, 2)) %>% 
  bind_rows(dwts_constants(28, 2)) %>% 
  bind_rows(dwts_constants(29, 2))

Directly from this function the raw data looks like:

celebrity	notability_known_for	professional_partner	status	season	result	professional_partner_a	ref	professional_partner_a_7	celebrity_12_13
Trista Sutter	The Bachelorette star	Louis Van Amstel	Eliminated 1ston June 8, 2005	1	NA	NA	NA	NA	NA
Evander Holyfield	Heavyweight boxer	Edyta Sliwinska	Eliminated 2ndon June 15, 2005	1	NA	NA	NA	NA	NA
Rachel Hunter	Supermodel	Jonathan Roberts	Eliminated 3rdon June 22, 2005	1	NA	NA	NA	NA	NA

Cleaning the data

Looking at the raw data there is a lot of data cleaning to be done:

The contestant’s result shows up in two different columns (result, status)
The result field has both placing information as well as dates for when they were either eliminated or won. For example Eliminated 1st needs to be turned into placed last (depending on how many contestants there were that season)
The data contains contestants who withdrew so their place had nothing to do with their “Profession”
The result field can be cleaned up to be standardized
The notability field needs to be standardized

All of these steps are handled in the following code:

contestant_clean <- contestants %>%
  mutate(
    #Compress Fields That Have Different Names Per Season
    result = coalesce(result, status),
    #Get the dates when Eliminations / Wins Happen
    status_date = mdy(str_extract(result, "\\w+ \\d+, \\d{4}")),
    #Get the Order of Elimination
    eliminated_state = str_extract(result, "Eliminated \\d+") %>% 
      str_remove('Eliminated ') %>%
      as.numeric()
  ) %>% 
  # Remove Contestants that Withdraw
  filter(!str_detect(result, 'Withdrew')) %>% 
  group_by(season) %>% 
  # Add the number of contestants for each season
  mutate(n_contestants = n()) %>% 
  ungroup() %>% 
  #Overwrite Places for 1st/2nd/3rd
  mutate(
    place = case_when(
      str_detect(result, "Winner") ~ 1,
      str_detect(result, "Runner|Second") ~ 2,
      str_detect(result, "Third") ~ 3,
      str_detect(result, "Fourth") ~ 4,
      TRUE ~ n_contestants - eliminated_state + 1
    ),
    # Standardize What Contestants Are "Known For"
    known_for = case_when(
      str_detect(str_to_lower(notability_known_for), 
                 'actor|actress|disney') ~ 'Actor/Actress',
      str_detect(str_to_lower(notability_known_for), 
                 'singer|rapper|band|composer') ~ 'Musician',
      str_detect(str_to_lower(notability_known_for), 
                 'model|miss usa') ~ 'Model',
      str_detect(str_to_lower(notability_known_for),
                 'nhl|nfl|nba|boxer|olympi|diva|tennis|soccer|football|lakers|swim|ufc|nascar|snowboard|wwe|mlb|basketball|rodeo|skier|race car|jockey|dolphins|steelers|packers|lakers|indy 500') ~ 'Athlete',
      str_detect(str_to_lower(notability_known_for), 
                 'journ|anchor|host|caster|personality') ~ 'Media Personality',
      str_detect(str_to_lower(notability_known_for), 
                 'bachelor|star|chef') ~ 'Reality TV Star',
      str_detect(str_to_lower(notability_known_for), 
                 'comedian|magician|entertainer') ~ 'Entertainer',
      str_detect(str_to_lower(notability_known_for), 
                 'owner|co-founder|business|designer') ~ 'Businessperson',
      TRUE ~ "Other"
    )
  ) %>% 
  # Fix Celebrity Column for Season 29
  mutate(celebrity = if_else(is.na(celebrity), celebrity_12_13, celebrity)) %>% 
  # Remove Unneeded Columns
  select(-contains('professional'), -ref, -status, 
         -eliminated_state, -celebrity_12_13) %>% 
  #Want Scores to be between 0 and 1 where 1 is Last Place and 0 is first place.
  mutate(scaled_place = (place-1)/(n_contestants-1))

The scaled_place variable will be used to create a standardized density plot by putting each season on a 1 (Last Place) to 0 (1st Place) scale regardless of the number of contestants in the season. The cleaned data now looks like:

celebrity	notability_known_for	season	result	status_date	n_contestants	place	known_for	scaled_place
Trista Sutter	The Bachelorette star	1	Eliminated 1ston June 8, 2005	2005-06-08	6	6	Reality TV Star	1.0
Evander Holyfield	Heavyweight boxer	1	Eliminated 2ndon June 15, 2005	2005-06-15	6	5	Athlete	0.8
Rachel Hunter	Supermodel	1	Eliminated 3rdon June 22, 2005	2005-06-22	6	4	Model	0.6
Joey McIntyre	New Kids on the Block singer	1	Third placeon June 29, 2005	2005-06-29	6	3	Musician	0.4
John O’Hurley	Actor & game show host	1	Runner-upon July 6, 2005	2005-07-06	6	2	Actor/Actress	0.2

Using Regular Expressions, I’ve collapsed 237 different levels into 9 which are:

Profession	Examples
Actor/Actress	Zendaya, Alexa PenaVega, Amber Riley
Athlete	Jamie Anderson, Antonio Brown, Martina Navratilova
Businessperson	Steve Wozniak, Robert Herjavec, Mark Cuban
Entertainer	Penn Jillette, Marie Osmond, Margaret Cho
Media Personality	Jerry Springer, Bobby Bones, Giselle Fernandez
Model	Bonner Bolton, Shandi Finnessey, Sailor Brinkley-Cook
Musician	Joey McIntyre, Gavin DeGraw, Nick Carter
Other	Sean Spicer, Buzz Aldrin, Noah Galloway
Reality TV Star	The Situation, Lisa Vanderpump, Terra Jolé

Constructing The Table

Organizing the Data

For the table, the information we want is for each “Profession”:

How many contestants were there?
What percentages came in 1st, 2nd, 3rd, and Last?

Some quick dplyr magic will allow us to collapse the list of contestants into the structure we want. We’ll also set the order of the table by the descending percentage of first place wins by “profession”.

contestant_summary <- contestant_clean %>% 
  group_by(known_for) %>% 
  summarize(
    num_stars = n(),
    pct_1st_place = sum(place == 1)/n(),
    pct_2nd_place = sum(place == 2)/n(),
    pct_3rd_place = sum(place == 3)/n(),
    pct_last_place = sum(n_contestants == place) / n()
  ) %>% 
  arrange(-pct_1st_place)

Using {gt} to Build the Table

Now onto actually constructing the table with gt. The gt package provides a grammar for tables similar to what ggplot2 does for charts. The package provides this visualization to show the different parts of a table:

Step 1: The basic construction

The most basic construction of a table is done by using the gt() function.

(g1 <- gt(contestant_summary))

known_for	num_stars	pct_1st_place	pct_2nd_place	pct_3rd_place	pct_last_place
Athlete	79	0.13924051	0.10126582	0.06329114	0.10126582
Musician	38	0.07894737	0.10526316	0.15789474	0.13157895
Actor/Actress	130	0.07692308	0.09230769	0.06923077	0.04615385
Reality TV Star	26	0.07692308	0.07692308	0.07692308	0.03846154
Model	14	0.07142857	0.07142857	0.00000000	0.07142857
Media Personality	21	0.04761905	0.04761905	0.04761905	0.23809524
Businessperson	5	0.00000000	0.00000000	0.00000000	0.20000000
Entertainer	5	0.00000000	0.00000000	0.20000000	0.40000000
Other	9	0.00000000	0.00000000	0.44444444	0.00000000

When I said basic, I meant basic.

Step 2: Adding Titles and Subtitles

The tab_header() function allows alterations to the header of the table. The title and subtitle arguments create the title and subtitle respectively. A nice feature of gt is the html() function will allows the use of HTML and CSS to style these titles. There is also a md() function that allows for markdown rendering.

(g2 <- g1 %>% 
  tab_header(
    title = html('Most <span style="color:#F2CB05">Successful</span> Dancing With the Stars <i>"Professions"</i>'),
    subtitle = html(
      "<span style = 'color: grey'>Covering Seasons 1 to 29 (excluding All-Star Season 15)</span>"
    )
  ))

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
known_for	num_stars	pct_1st_place	pct_2nd_place	pct_3rd_place	pct_last_place
Athlete	79	0.13924051	0.10126582	0.06329114	0.10126582
Musician	38	0.07894737	0.10526316	0.15789474	0.13157895
Actor/Actress	130	0.07692308	0.09230769	0.06923077	0.04615385
Reality TV Star	26	0.07692308	0.07692308	0.07692308	0.03846154
Model	14	0.07142857	0.07142857	0.00000000	0.07142857
Media Personality	21	0.04761905	0.04761905	0.04761905	0.23809524
Businessperson	5	0.00000000	0.00000000	0.00000000	0.20000000
Entertainer	5	0.00000000	0.00000000	0.20000000	0.40000000
Other	9	0.00000000	0.00000000	0.44444444	0.00000000

Step 3: Adding More Style to the Title

The tab_style() function adds various formatting to the table rows and cells. The style section of the arguments tells gt what the style will be and the location argument says where that style should be applied.

The google_font() function allows access to all the fonts on the Google Fonts site.

In this step I’m making the title left-justified, size XX-Large, and using the Anton font.

(g3 <- g2 %>% 
  tab_style(
    style = cell_text(
      font = google_font("Anton"), 
      align = "left", 
      size = "xx-large"
    ),
    locations = cells_title("title")
  )
)

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
known_for	num_stars	pct_1st_place	pct_2nd_place	pct_3rd_place	pct_last_place
Athlete	79	0.13924051	0.10126582	0.06329114	0.10126582
Musician	38	0.07894737	0.10526316	0.15789474	0.13157895
Actor/Actress	130	0.07692308	0.09230769	0.06923077	0.04615385
Reality TV Star	26	0.07692308	0.07692308	0.07692308	0.03846154
Model	14	0.07142857	0.07142857	0.00000000	0.07142857
Media Personality	21	0.04761905	0.04761905	0.04761905	0.23809524
Businessperson	5	0.00000000	0.00000000	0.00000000	0.20000000
Entertainer	5	0.00000000	0.00000000	0.20000000	0.40000000
Other	9	0.00000000	0.00000000	0.44444444	0.00000000

Step 4: Add Styling to the Subtitles

Similar to step 3, this step applies formatting to the subtitle

(g4 <- g3 %>% 
  tab_style(
    style = cell_text(
      font = google_font("Caveat"),
      align = "left", 
      size = "x-large"
    ),
    locations = cells_title("subtitle")
  ) 
)

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
known_for	num_stars	pct_1st_place	pct_2nd_place	pct_3rd_place	pct_last_place
Athlete	79	0.13924051	0.10126582	0.06329114	0.10126582
Musician	38	0.07894737	0.10526316	0.15789474	0.13157895
Actor/Actress	130	0.07692308	0.09230769	0.06923077	0.04615385
Reality TV Star	26	0.07692308	0.07692308	0.07692308	0.03846154
Model	14	0.07142857	0.07142857	0.00000000	0.07142857
Media Personality	21	0.04761905	0.04761905	0.04761905	0.23809524
Businessperson	5	0.00000000	0.00000000	0.00000000	0.20000000
Entertainer	5	0.00000000	0.00000000	0.20000000	0.40000000
Other	9	0.00000000	0.00000000	0.44444444	0.00000000

Step 5: Adding a Spanner Column

A spanner column is a column header that is merged across a number of different columns. It is added with the tab_spanner() function:

(g5 <- g4 %>% 
  tab_spanner(
    label = "Distribution of Results",
    columns = 3:6
  )
)

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
known_for	num_stars	Distribution of Results
known_for	num_stars	pct_1st_place	pct_2nd_place	pct_3rd_place	pct_last_place
Athlete	79	0.13924051	0.10126582	0.06329114	0.10126582
Musician	38	0.07894737	0.10526316	0.15789474	0.13157895
Actor/Actress	130	0.07692308	0.09230769	0.06923077	0.04615385
Reality TV Star	26	0.07692308	0.07692308	0.07692308	0.03846154
Model	14	0.07142857	0.07142857	0.00000000	0.07142857
Media Personality	21	0.04761905	0.04761905	0.04761905	0.23809524
Businessperson	5	0.00000000	0.00000000	0.00000000	0.20000000
Entertainer	5	0.00000000	0.00000000	0.20000000	0.40000000
Other	9	0.00000000	0.00000000	0.44444444	0.00000000

Step 6: Styling the Spanner

Similar to the title and subtitle, we can use tab_style() to apply specific styles to the spanner via the cells_column_spanners() function.

(g6 <- g5 %>% 
  tab_style(
    style = cell_text(
      font = google_font("Courgette"), 
      size = "medium", 
      weight = "bold"
    ),
    locations = cells_column_spanners("Distribution of Results")
  )
)

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
known_for	num_stars	Distribution of Results
known_for	num_stars	pct_1st_place	pct_2nd_place	pct_3rd_place	pct_last_place
Athlete	79	0.13924051	0.10126582	0.06329114	0.10126582
Musician	38	0.07894737	0.10526316	0.15789474	0.13157895
Actor/Actress	130	0.07692308	0.09230769	0.06923077	0.04615385
Reality TV Star	26	0.07692308	0.07692308	0.07692308	0.03846154
Model	14	0.07142857	0.07142857	0.00000000	0.07142857
Media Personality	21	0.04761905	0.04761905	0.04761905	0.23809524
Businessperson	5	0.00000000	0.00000000	0.00000000	0.20000000
Entertainer	5	0.00000000	0.00000000	0.20000000	0.40000000
Other	9	0.00000000	0.00000000	0.44444444	0.00000000

Step 7: Style the Column Headers and the Profession Column

You can apply the same style to different parts of the table by using a list() for the locations argument. Here the style is being applied to all column labels (cells_column_labels(everything())) and to the values in the first column (cells_body(columns = 1)).

(g7 <- g6 %>% 
  tab_style(
    style = cell_text(
      font = google_font("Secular One"), 
      size = "large"
    ),
    locations = list(
      cells_column_labels(everything()), 
      cells_body(columns = 1)
    )
  )  
)

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
known_for	num_stars	Distribution of Results
known_for	num_stars	pct_1st_place	pct_2nd_place	pct_3rd_place	pct_last_place
Athlete	79	0.13924051	0.10126582	0.06329114	0.10126582
Musician	38	0.07894737	0.10526316	0.15789474	0.13157895
Actor/Actress	130	0.07692308	0.09230769	0.06923077	0.04615385
Reality TV Star	26	0.07692308	0.07692308	0.07692308	0.03846154
Model	14	0.07142857	0.07142857	0.00000000	0.07142857
Media Personality	21	0.04761905	0.04761905	0.04761905	0.23809524
Businessperson	5	0.00000000	0.00000000	0.00000000	0.20000000
Entertainer	5	0.00000000	0.00000000	0.20000000	0.40000000
Other	9	0.00000000	0.00000000	0.44444444	0.00000000

Step 8: Styling the cells

Applying a center alignment to the 2nd through 6th columns.

(g8 <- g7 %>% 
  tab_style(
    style = cell_text(
      font = google_font("Spartan"), 
      size = "medium",
      align = 'center'
    ),
    locations = cells_body(columns = 2:6)
  )
)

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
known_for	num_stars	Distribution of Results
known_for	num_stars	pct_1st_place	pct_2nd_place	pct_3rd_place	pct_last_place
Athlete	79	0.13924051	0.10126582	0.06329114	0.10126582
Musician	38	0.07894737	0.10526316	0.15789474	0.13157895
Actor/Actress	130	0.07692308	0.09230769	0.06923077	0.04615385
Reality TV Star	26	0.07692308	0.07692308	0.07692308	0.03846154
Model	14	0.07142857	0.07142857	0.00000000	0.07142857
Media Personality	21	0.04761905	0.04761905	0.04761905	0.23809524
Businessperson	5	0.00000000	0.00000000	0.00000000	0.20000000
Entertainer	5	0.00000000	0.00000000	0.20000000	0.40000000
Other	9	0.00000000	0.00000000	0.44444444	0.00000000

Step 9: Turn Cell Decimals to Percentages

There are a number of fmt_* functions to handle formatting for values. The fmt_percent function will apply a percent format to all the columns beginning with “pct_”. While this is the first instance of using tidyselect syntax for telling gt what columns to use, the package can take names, column numbers, or tidyselect.

(g9 <- g8 %>% 
  fmt_percent(
    columns = starts_with('pct'),
    decimals = 1,
    drop_trailing_zeros = TRUE
  )
 )

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
known_for	num_stars	Distribution of Results
known_for	num_stars	pct_1st_place	pct_2nd_place	pct_3rd_place	pct_last_place
Athlete	79	13.9%	10.1%	6.3%	10.1%
Musician	38	7.9%	10.5%	15.8%	13.2%
Actor/Actress	130	7.7%	9.2%	6.9%	4.6%
Reality TV Star	26	7.7%	7.7%	7.7%	3.8%
Model	14	7.1%	7.1%	0%	7.1%
Media Personality	21	4.8%	4.8%	4.8%	23.8%
Businessperson	5	0%	0%	0%	20%
Entertainer	5	0%	0%	20%	40%
Other	9	0%	0%	44.4%	0%

Step 10: Have some fun by turning column headers into emojis

Like other markdown text in R gt can also support emojis! Here we can add in medals for 1st, 2nd, and 3rd…. and a personal favorite emoji to represent last. Emojis can be added into markdown through the emo::ji() function.

(g10 <- g9 %>% 
  cols_label(
    known_for = "",
    num_stars = paste0("# ",emo::ji('star'), "s"),
    pct_1st_place = paste0(emo::ji("1st_place_medal"), "(1st)"),
    pct_2nd_place = paste0(emo::ji("2nd_place_medal"), "(2nd)"),
    pct_3rd_place = paste0(emo::ji("3rd_place_medal"), "(3rd)"),
    pct_last_place = paste0(emo::ji("poo"), " (last)")
  )
)

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
	# ⭐s	Distribution of Results
	# ⭐s	🥇(1st)	🥈(2nd)	🥉(3rd)	💩 (last)
Athlete	79	13.9%	10.1%	6.3%	10.1%
Musician	38	7.9%	10.5%	15.8%	13.2%
Actor/Actress	130	7.7%	9.2%	6.9%	4.6%
Reality TV Star	26	7.7%	7.7%	7.7%	3.8%
Model	14	7.1%	7.1%	0%	7.1%
Media Personality	21	4.8%	4.8%	4.8%	23.8%
Businessperson	5	0%	0%	0%	20%
Entertainer	5	0%	0%	20%	40%
Other	9	0%	0%	44.4%	0%

Step 11: Add a source and do some formatting

There a couple things going on in this step:

I’m adding a source line with tab_source_note() and using md() to allow me to use markdown style formatting.
I’m using tab_options() to remove the top border from the table and shrink the gaps between the rows in the table.
I’m using cols_width() to tell gt to make the first column 200px wide

(g11 <- g10 %>%
  tab_source_note(md("**Data:** DWTS Wikipedia Articles | **Table Author:** JLaw")) %>%
  tab_options(
    table.border.top.color = "white",
    data_row.padding = px(0),
  ) %>% 
  cols_width(
    1 ~ px(200),
  )
)

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
	# ⭐s	Distribution of Results
	# ⭐s	🥇(1st)	🥈(2nd)	🥉(3rd)	💩 (last)
Athlete	79	13.9%	10.1%	6.3%	10.1%
Musician	38	7.9%	10.5%	15.8%	13.2%
Actor/Actress	130	7.7%	9.2%	6.9%	4.6%
Reality TV Star	26	7.7%	7.7%	7.7%	3.8%
Model	14	7.1%	7.1%	0%	7.1%
Media Personality	21	4.8%	4.8%	4.8%	23.8%
Businessperson	5	0%	0%	0%	20%
Entertainer	5	0%	0%	20%	40%
Other	9	0%	0%	44.4%	0%
Data: DWTS Wikipedia Articles \| Table Author: JLaw

Step 12: Adding a Color Scale for the % Columns

The data_color function allows for doing conditional formatting based on the values in the columns. The columns argument allows to specific which colors should receive the formatting. The colors argument defines the palette. And the apply_to argument can take the values of “fill” to fill the background or “text” to change the color of the text.

(g12 <- g11 %>% 
  data_color(
    columns = vars(pct_1st_place, pct_2nd_place, pct_3rd_place, pct_last_place),
    colors = scales::col_numeric(
      palette = c("white", "#3fc1c9"),
      #F2CB05 = Gold COlor
      domain = NULL
    ),
    apply_to = "fill",
  )
 )

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
	# ⭐s	Distribution of Results
	# ⭐s	🥇(1st)	🥈(2nd)	🥉(3rd)	💩 (last)
Athlete	79	13.9%	10.1%	6.3%	10.1%
Musician	38	7.9%	10.5%	15.8%	13.2%
Actor/Actress	130	7.7%	9.2%	6.9%	4.6%
Reality TV Star	26	7.7%	7.7%	7.7%	3.8%
Model	14	7.1%	7.1%	0%	7.1%
Media Personality	21	4.8%	4.8%	4.8%	23.8%
Businessperson	5	0%	0%	0%	20%
Entertainer	5	0%	0%	20%	40%
Other	9	0%	0%	44.4%	0%
Data: DWTS Wikipedia Articles \| Table Author: JLaw

This looks pretty good… but we can do better!!!

Turning it up to 11 by adding in Density Plots

In order to add in ggplots into a row in the table we need to:

Build a function to create the plot for each row of the table
Use purrr:map() to add the plot as a list-column to the table
Use gt::text_transform to insert the image into the table

NOTE: Since this required making a new data set much of the gt code is repeating the first section but is provided in its entirely for completeness.

Writing the function to build the chart

For the function I want it to take a “profession” and return a density part using the scaled_place variable defined at the top. The function takes in a profession label and a dataset and returns a density plot.

plot_dens <- function(profession, data) {
  
  plot_data <- 
    data %>% 
    filter(known_for == {{ profession }}) 
  
  plot <- 
    plot_data %>% 
    ggplot(aes(x = scaled_place)) +
    geom_density(aes(y = ..scaled..), fill = 'gold') +
    annotate("text", x = 0, y = -.05, 
             label = "1st\nPlace", size = 10, color = "grey40", vjust = 1) +
    annotate("text", x = 1, y = -.05, 
             label = "Last\nPlace", size = 10, color = "grey40", vjust = 1) +
    coord_cartesian(
      xlim = c(-.1, 1.1),
      ylim = c(-.7, NA)
    ) + 
    theme_void()
  
  plot
  
}

Adding the plots into the data set

The main part of this step is using the map() function to iterate through the professions and use them as input into the function defined above. The column plots is a list-column containing all the ggplot information.

The left join is because I want to add in a column for the most recent winner in each category.

contestant_summary_with_graph <- contestant_summary %>% 
  mutate(plots = purrr::map(contestant_summary$known_for %>% unique, 
                            plot_dens, data = contestant_clean)) %>% 
  left_join(
  ###Add in Recent Winner Images
  contestant_clean %>% 
    filter(place == 1) %>% 
    group_by(known_for) %>% 
    slice_max(season, n = 1) %>% 
    select(celebrity, season, known_for) %>% 
    ungroup() %>% 
    transmute(
      known_for,
      lbl = paste0(celebrity,' (Season ',season,")")
    )
  )

Creating the Final Table

In order to turn the plots into columns the text_transform() function is used to take the plots column and run a function that calls ggplot_image with certain height and aspect ratio parameters on each row in the table.

text_transform(
    locations = cells_body(vars(plots)),
    fn = function(x) {
      map(contestant_summary_with_graph$plots, ggplot_image, 
          height = px(120), aspect_ratio = 1.5)
    }
  )

Now we can put it all together. Besides adding in the plots, there’s a few steps to format the Most Recent Winner cell. But nothing that hasn’t been covered earlier.

#Base Table
gt(contestant_summary_with_graph) %>% 
  #Add Titles
  tab_header(
    title = html('Most <span style="color:#F2CB05">Successful</span> Dancing With the Stars <i>"Professions"</i>'),
    subtitle = html(
      "<span style = 'color: grey'>Covering Seasons 1 to 29 (excluding All-Star Season 15)</span>"
    )
  ) %>% 
  #Format Title
  tab_style(
    style = cell_text(
      font = google_font("Anton"), 
      align = "left", 
      size = "xx-large"
    ),
    locations = cells_title("title")
  ) %>% 
  #Format Subtitle
  tab_style(
    style = cell_text(
      font = google_font("Caveat"),
      align = "left", 
      size = "x-large"
    ),
    locations = cells_title("subtitle")
  )  %>% 
  #Adding Spanning Column
  tab_spanner(
    label = "Distribution of Results",
    columns = 3:7
  ) %>% 
  #Style The Spanner Column
  tab_style(
    style = cell_text(
      font = google_font("Courgette"), 
      size = "medium", 
      weight = "bold"
    ),
    locations = cells_column_spanners("Distribution of Results")
  ) %>% 
  #Style the Column Labels and Profession Column
  tab_style(
    style = cell_text(
      font = google_font("Secular One"), 
      size = "large"
    ),
    locations = list(
      cells_column_labels(everything()), 
      cells_body(columns = 1)
    )
  )  %>% 
  #Style the Cells
  tab_style(
    style = cell_text(
      font = google_font("Spartan"), 
      size = "medium",
      align = 'center'
    ),
    locations = cells_body(columns = 2:6)
  ) %>% 
  #Format Cells to %s
  fmt_percent(
    columns = starts_with('pct'),
    decimals = 1,
    drop_trailing_zeros = TRUE
  ) %>% 
  #Turn Headers to Emojis
  cols_label(
    known_for = "",
    num_stars = paste0("# ",emo::ji('star'), "s"),
    pct_1st_place = paste0(emo::ji("1st_place_medal"), "(1st)"),
    pct_2nd_place = paste0(emo::ji("2nd_place_medal"), "(2nd)"),
    pct_3rd_place = paste0(emo::ji("3rd_place_medal"), "(3rd)"),
    pct_last_place = paste0(emo::ji("poo"), " (last)"),
    plots = "",
    lbl = "Most Recent Winner"
  ) %>% 
  ###Add in Source and Doing Some Minor Formatting
  tab_source_note(md("**Data:** DWTS Wikipedia Articles | **Table Author:** JLaw")) %>%
  tab_options(
    table.border.top.color = "white",
    data_row.padding = px(0),
  ) %>% 
  cols_width(
    1 ~ px(200)
  ) %>% 
###Add a Color Scale for 1st Place
  data_color(
    columns = vars(pct_1st_place, pct_2nd_place, pct_3rd_place, pct_last_place),
    colors = scales::col_numeric(
      palette = c("white", "#3fc1c9"),
      #F2CB05 = Gold COlor
      domain = NULL
    ),
    apply_to = "fill",
  ) %>% 
  ######################NEW THINGS START HERE#########################
  # Add In Density Plots (NEW)
  text_transform(
    locations = cells_body(vars(plots)),
    fn = function(x) {
      map(contestant_summary_with_graph$plots, ggplot_image, 
          height = px(120), aspect_ratio = 1.5)
    }
  ) %>% 
  text_transform(
    locations = cells_body(vars(lbl)),
    fn = function(x){
      if_else(!is.na(x), str_replace_all(x, " \\(", "<br> \\("), "")
    }
  ) %>% 
  tab_style(
    style = cell_text(
      style = 'italic',
      size = px(13),
      v_align = 'middle',
      align = 'left'
    ),
    locations = cells_body(columns = vars(lbl))
  ) %>%
  cols_width(
    8 ~ px(100)
  )

Most Successful Dancing With the Stars "Professions"
Covering Seasons 1 to 29 (excluding All-Star Season 15)
	# ⭐s	Distribution of Results				Most Recent Winner
	# ⭐s	🥇(1st)	🥈(2nd)	🥉(3rd)	💩 (last)	Most Recent Winner
Athlete	79	13.9%	10.1%	6.3%	10.1%	Adam Rippon (Season 26)
Musician	38	7.9%	10.5%	15.8%	13.2%	Kellie Pickler (Season 16)
Actor/Actress	130	7.7%	9.2%	6.9%	4.6%	Jordan Fisher (Season 25)
Reality TV Star	26	7.7%	7.7%	7.7%	3.8%	Kaitlyn Bristowe (Season 29)
Model	14	7.1%	7.1%	0%	7.1%	Brooke Burke (Season 7)
Media Personality	21	4.8%	4.8%	4.8%	23.8%	Bobby Bones (Season 27)
Businessperson	5	0%	0%	0%	20%
Entertainer	5	0%	0%	20%	40%
Other	9	0%	0%	44.4%	0%
Data: DWTS Wikipedia Articles \| Table Author: JLaw

So what is the most successful “profession” in DWTS?

Seems pretty clearly to be the athletes as close to 14% of the Athletes have wound up winning. On the other end of the spectrum, the Media Personalities have faired less well with the lowest winning percentage of a group with 10+ stars and nearly 1 in 4 coming in last place… Reality TV Stars, while in the middle of the pack has been surging with ex-Bachelorettes winning the last two seasons (28 and 29).

gt rvest