Author Archives: Henry

Learning ggplot2 on Paper – Components

Kindle Vocabulary Builder Export For Free

Over the past two months, I spent most of my time on the Tidyverse and the Harry Potter books, and I recently finished all of them, which made me feel grateful during quarantine life.

My quarantine life in the past two months: #tidyverse and #HarryPotter. Feel like #hogwarts having a data science course now. #RStats #QuarantineLife pic.twitter.com/K1oEmlUpyi
— Henry Wang (@henrywangnl) May 18, 2020

These two seemingly unrelated things came together when I was looking for a way to export the Kindle Vocabulary Builder, a list of the words I look up on my Kindle.


Probability in R

Binomial Distribution

Let’s say I flip a fair coin (50% chance of “heads”, 50% chance of “tails”) 10 times in an experiment. How many “heads” will I end up with?

At one extreme I could get 10 “heads”; at the other, I could get none at all. Most likely the count falls somewhere in between. Suppose the number of “heads” (X) is 4 on my first try.
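The number of “heads” X in this experiment follows a binomial distribution with n = 10 and p = 0.5, so P(X = k) = choose(n, k) * p^k * (1 - p)^(n - k). A minimal sketch using base R's binomial functions from the stats package (`dbinom()`, `pbinom()`, `rbinom()`) could look like the following; the seed and the 10,000 simulated repetitions are arbitrary choices for illustration, not values from the post.

```r
# Binomial model of the coin experiment: n = 10 flips, p = 0.5 chance of "heads"
n <- 10
p <- 0.5

# P(X = 4): probability of exactly 4 "heads"
p_four <- dbinom(4, size = n, prob = p)
p_four  # 0.2050781

# The same value computed directly from the formula C(n, k) * p^k * (1 - p)^(n - k)
p_four_manual <- choose(n, 4) * p^4 * (1 - p)^(n - 4)

# Full probability mass function of X for k = 0, ..., 10
pmf <- dbinom(0:n, size = n, prob = p)

# P(X <= 4): probability of at most 4 "heads"
p_at_most_four <- pbinom(4, size = n, prob = p)
p_at_most_four  # 0.3769531

# Simulate 10,000 repetitions of the 10-flip experiment (seed chosen arbitrarily)
set.seed(2020)
heads <- rbinom(10000, size = n, prob = p)

# Empirical share of experiments with exactly 4 "heads"; should be close to p_four
mean(heads == 4)
```

Comparing the simulated share with `dbinom(4, 10, 0.5)` is a quick sanity check that the analytical result matches what repeated experiments produce.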


ggplot2 Theme Elements Demonstration


ggplot2 is one of the most powerful R packages for data visualization, and mastering its underlying grammar of graphics is essential to making full use of it. The theming system of ggplot2 lets you customize the appearance of a plot to suit your needs, but it can be frustrating to identify the elements on the plot you want to change, because the element names and the functions used to modify them are hard to remember. At least, that is the case for me.

You may find it difficult to identify the theme elements on the plot you want to change, let alone remember the element names or the corresponding functions to modify. I hope this demonstration and the example could help you: https://t.co/EnPjCMiXKn. #RStats #ggplot2 #tidyverse pic.twitter.com/oOYkACPGa9
— Henry Wang (@henrywangnl) May 10, 2020
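As a small illustration of how the theming system works (a sketch using the built-in `mtcars` dataset, not the demonstration from the post), each argument passed to `theme()` is an element name, and the function used to modify it depends on the kind of element: `element_text()` for text, `element_line()` for lines, `element_rect()` for backgrounds and borders, and `element_blank()` to remove an element entirely. The specific colours and sizes below are arbitrary.

```r
library(ggplot2)

# A basic scatter plot to customize
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel efficiency vs. weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon")

# Modify individual theme elements by name
p_themed <- p +
  theme(
    plot.title       = element_text(face = "bold", size = 14),            # text element
    axis.title       = element_text(colour = "grey30"),                   # text element
    panel.background = element_rect(fill = "white", colour = "grey50"),   # rectangle element
    panel.grid.major = element_line(colour = "grey90"),                   # line element
    panel.grid.minor = element_blank()                                    # element removed
  )

print(p_themed)
```

Because theme settings are added with `+`, the same `theme()` call can be reused across plots, and elements not mentioned keep the values of the current base theme.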


Media Mix Modeling in R


While attribution models are widely used in digital marketing, Media Mix Modeling (MMM) still plays an important role in evaluating marketing effectiveness across multiple channels at a higher, aggregate level. Here is an example of how to perform MMM in R using a free dataset from Kaggle.

Data Preparation

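library(tidyverse)

# Import data (assumes the Kaggle file mediamix.csv is in the working directory)
media.raw <- read_csv("mediamix.csv")

# Tidy data
media <- media.raw %>%
  mutate(
    # Combine the three TV-related channels into a single TV variable
    TV = tv_sponsorships + tv_cricket + tv_RON,
    # Sum the digital channels (columns 9 to 13 of the raw data) into Digital
    Digital = rowSums(.[9:13])
  ) %>%
  # Keep the five media variables and the response variable
  select(TV, radio, Magazines, OOH, Digital, sales) %>%
  rename(Radio = radio, Sales = sales)

# Examine data interactively (View() opens a spreadsheet-style viewer)
View(media)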

The three TV-related channels are combined into a single TV variable. Similarly, the Digital variable is computed from channels such as Social, Display, and Search. The final data structure is shown below. This article examines the relationship between the dependent variable, Sales, and the independent variables TV, Radio, Magazines, OOH, and Digital. The values represent media cost in each channel.
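The excerpt stops before the modeling step, so the following is only a minimal sketch of one common MMM baseline: an ordinary least squares regression of Sales on channel spend with `lm()`. It is not necessarily the model used in the full article; real media mix models often also include adstock (carryover) and saturation effects, which this sketch leaves out. Because the Kaggle file is not bundled here, the sketch runs on simulated data, and `fit_mmm()`, `fake_media`, and the coefficients used to generate it are hypothetical.

```r
# Baseline media mix model: OLS of Sales on spend per channel.
# Hypothetical helper; expects columns TV, Radio, Magazines, OOH, Digital, Sales.
fit_mmm <- function(media) {
  lm(Sales ~ TV + Radio + Magazines + OOH + Digital, data = media)
}

# Illustration only: simulated weekly spend and sales (NOT the Kaggle data)
set.seed(42)
n_weeks <- 52
fake_media <- data.frame(
  TV        = runif(n_weeks, 0, 100),
  Radio     = runif(n_weeks, 0, 30),
  Magazines = runif(n_weeks, 0, 20),
  OOH       = runif(n_weeks, 0, 40),
  Digital   = runif(n_weeks, 0, 60)
)
# Hypothetical "true" effects, used only to generate the fake Sales column
fake_media$Sales <- 500 + 3 * fake_media$TV + 2 * fake_media$Radio +
  0.5 * fake_media$Magazines + 1 * fake_media$OOH + 4 * fake_media$Digital +
  rnorm(n_weeks, sd = 10)

model <- fit_mmm(fake_media)
summary(model)  # each slope estimates the change in Sales per unit of spend

# Decompose fitted sales into a base (intercept) plus per-channel contributions
channels <- c("TV", "Radio", "Magazines", "OOH", "Digital")
contrib <- sweep(as.matrix(fake_media[, channels]), 2, coef(model)[channels], `*`)

# Share of total fitted sales attributed to each channel
colSums(contrib) / sum(fitted(model))
```

On the real data, `fit_mmm(media)` would be called with the `media` data frame built in the data preparation step. A linear model keeps the contribution decomposition simple, since fitted sales are exactly the intercept plus the sum of the channel contributions.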
