Tag Archives: R

Kindle Vocabulary Builder Export For Free

Why Export Kindle Vocabulary Builder?

You may want to export your Kindle Vocabulary Builder for two reasons. First, Kindle keeps a maximum of 2000 words, so any new words you look up after the 2000th will not be displayed. Exporting and cleaning the vocabulary lets you back it up regularly and keep reading.

Second, exporting the vocabulary from Kindle lets you learn it more efficiently. Kindle Vocabulary Builder records the context (“usage”) every time you look up a word, so the number of lookups per word, its lookup frequency, can be recovered from those records. For example, below is the lookup frequency for my vocabulary from the Harry Potter books.

[Figure: Harry Potter vocabulary lookup frequency]

It shows that most of the vocabulary (77%) was looked up only once across the seven Harry Potter books, so I don’t think it makes sense to study those words. However, you cannot single out these words in Kindle Vocabulary Builder, which makes studying there less efficient.
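To make the idea of lookup frequency concrete, here is a minimal sketch in base R. It assumes the exported lookups are already in a data frame with one row per lookup; the `lookups` data frame and its toy words are hypothetical and are not taken from my actual export.

```r
# Hypothetical export: one row per lookup (toy data, not real results)
lookups <- data.frame(
  word = c("bezoar", "cauldron", "bezoar", "parchment",
           "goblet", "bezoar", "cauldron"),
  stringsAsFactors = FALSE
)

# Count how many times each word was looked up
freq <- as.data.frame(table(lookups$word), stringsAsFactors = FALSE)
names(freq) <- c("word", "lookups")

# Most frequently looked-up words first
freq <- freq[order(-freq$lookups), ]

# Share of words that were looked up exactly once
share_once <- mean(freq$lookups == 1)

# Words looked up more than once are the ones worth studying
to_study <- freq$word[freq$lookups > 1]
```

In this toy example two of the four distinct words were looked up only once, so `share_once` is 0.5 and `to_study` contains the other two words.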


Probability in R

I can still remember the nightmare of first studying Statistics during my Bachelor’s a few years ago. It was a nightmare because we had to do all the exercises and exams on paper, without cheat sheets or even calculators. The Statistics course in my Master’s turned out to be much more interesting, as all the exercises and exams were conducted online and we were allowed to use Excel formulas. A couple of days ago I came across Foundations of Probability in R on DataCamp and realized that learning statistics could be much better with random simulation and data visualization in R. The course is quite fundamental and nothing in it was new to me, but the idea behind it is very appealing.

Binomial Distribution

Let’s say I flip a fair coin (50% chance of “heads” and 50% chance of “tails”) 10 times in an experiment. How many “heads” will I end up with?

If I’m lucky, I may get as many as 10 “heads”; at the other extreme, I may get none at all. Most likely the result falls somewhere in between. Let’s suppose the number of “heads” (X) is 4 on my first try.
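The experiment can be sketched with random simulation in base R. This is a minimal illustration rather than the course's own code; the seed and the number of repetitions are arbitrary choices of mine.

```r
set.seed(2020)  # arbitrary seed, only for reproducibility

# One experiment: number of "heads" in 10 flips of a fair coin
one_experiment <- rbinom(1, size = 10, prob = 0.5)

# Repeat the experiment many times to see how the outcomes are distributed
many_experiments <- rbinom(100000, size = 10, prob = 0.5)

# Simulated estimate of P(X = 4)
simulated_p4 <- mean(many_experiments == 4)

# Exact probability from the binomial density: choose(10, 4) / 2^10
exact_p4 <- dbinom(4, size = 10, prob = 0.5)

# Distribution of the simulated number of "heads"
barplot(table(many_experiments) / length(many_experiments),
        xlab = "Number of heads (X)", ylab = "Proportion of experiments")
```

The exact value is 210/1024, about 0.205, and the simulated estimate should land close to it.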


ggplot2 Theme Elements Demonstration

ggplot2 is one of the most powerful packages in R for data visualization, and mastering the underlying grammar of graphics is essential to use it fully. The theming system of ggplot2 lets you customize the appearance of a plot to your needs. In practice, though, it is frustrating to identify which element on the plot you want to change, because the element names and the functions that modify them are hard to remember. At least this is the case for me.

You may find it difficult to identify the theme elements on the plot you want to change, let alone remember the element names or the corresponding functions to modify. I hope this demonstration and the example could help you: https://t.co/EnPjCMiXKn. #RStats #ggplot2 #tidyverse pic.twitter.com/oOYkACPGa9
— Henry Wang (@henrywangnl) May 10, 2020
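As a small illustration of how theme elements pair with element functions, here is a sketch using the built-in `mtcars` dataset. The dataset, labels, and styling values are my own choices for this example and are not the demonstration linked above.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy vs. weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme(
    # Text elements are modified with element_text()
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(colour = "grey30"),
    # Rectangle elements are modified with element_rect()
    panel.background = element_rect(fill = "white", colour = "grey80"),
    # Line elements are modified with element_line()
    panel.grid.major = element_line(colour = "grey90"),
    # Any element can be removed with element_blank()
    panel.grid.minor = element_blank()
  )

p
```

Each argument of `theme()` names one element of the plot, and the function on the right-hand side matches the type of that element: text, rectangle, or line.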


Media Mix Modeling in R

While attribution measurements are widely used in the digital marketing field, Media Mix Modeling (MMM) still plays an important role in evaluating marketing effectiveness across multiple channels at a higher level. Here is an example of how to do MMM in R with a free dataset from Kaggle.

Data Preparation

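library(tidyverse)

# Import data
media.raw <- read_csv("mediamix.csv")

# Tidy data
media <- media.raw %>%
  # Combine the three TV-related channels into one TV variable
  mutate(TV = tv_sponsorships + tv_cricket + tv_RON) %>%
  # Sum the digital channels, selected by position (columns 9 to 13)
  mutate(Digital = rowSums(.[9:13])) %>%
  select(TV, radio, Magazines, OOH, Digital, sales) %>%
  rename(Radio = radio, Sales = sales)

# Examine data (View() opens the data viewer in an interactive session)
View(media)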

The three TV-related channels are combined into a single TV variable. Similarly, the Digital variable is computed from channels such as Social, Display, and Search. The final data structure is shown as follows. This article examines the relationship between the dependent variable, Sales, and the independent variables TV, Radio, Magazines, OOH, and Digital. The numbers represent the media cost across channels.
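One common way to examine such a relationship is a multiple linear regression. The sketch below is only an illustration under my own assumptions: it runs on simulated data (`media_sim`, with made-up coefficients) rather than on the Kaggle dataset, and the full post may use a different model.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 200

# Simulated media cost per channel (hypothetical values, not the Kaggle data)
media_sim <- data.frame(
  TV        = runif(n, 0, 300),
  Radio     = runif(n, 0, 50),
  Magazines = runif(n, 0, 40),
  OOH       = runif(n, 0, 60),
  Digital   = runif(n, 0, 150)
)

# Simulated sales: only TV, Radio, and Digital have a true effect here
media_sim$Sales <- 50 + 0.05 * media_sim$TV + 0.2 * media_sim$Radio +
  0.1 * media_sim$Digital + rnorm(n, sd = 2)

# Regress Sales on all five channels
fit <- lm(Sales ~ TV + Radio + Magazines + OOH + Digital, data = media_sim)

# Coefficients estimate the change in Sales per unit of media cost
summary(fit)
```

With the real `media` data frame, the same `lm()` call would apply with `data = media`.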


Mediation Analysis in SPSS and R

Background

For the past few months I have been working on my Master’s Thesis. I have just completed the data analysis, which leaves the discussion and conclusion. For some reason, the data analysis was conducted in SPSS, and I have kept wondering whether I could repeat it in R. The topic of my Master’s Thesis is “The influence of IT Capability on New Product Development Performance”. Specifically, my research question is “To what extent does IT Capability relate to NPD Performance and to what extent does NPD Process mediate the relationship?”.
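In regression terms, a mediation question of this kind is often approached with three linear models: the total effect, the effect of the predictor on the mediator, and the direct effect with the mediator included. The sketch below uses simulated data with hypothetical variable names and made-up effect sizes; it is not my thesis data, and it is not necessarily the procedure I followed in SPSS.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 300

# Simulated constructs (hypothetical values)
it_capability   <- rnorm(n)
npd_process     <- 0.5 * it_capability + rnorm(n)
npd_performance <- 0.4 * npd_process + 0.2 * it_capability + rnorm(n)

# Total effect of IT Capability on NPD Performance (path c)
total_model <- lm(npd_performance ~ it_capability)

# Effect of IT Capability on the mediator, NPD Process (path a)
mediator_model <- lm(npd_process ~ it_capability)

# Direct effect (path c') and mediator effect (path b)
direct_model <- lm(npd_performance ~ it_capability + npd_process)

a        <- unname(coef(mediator_model)["it_capability"])
b        <- unname(coef(direct_model)["npd_process"])
c_total  <- unname(coef(total_model)["it_capability"])
c_direct <- unname(coef(direct_model)["it_capability"])

# Indirect (mediated) effect as the product of paths a and b
indirect <- a * b
```

For ordinary least squares fitted on the same sample, the total effect equals the direct effect plus the indirect effect, so `c_total` matches `c_direct + indirect`. Testing whether the indirect effect is significant needs an additional step, such as bootstrapping, which is not shown here.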

The data was collected with a questionnaire designed on the basis of a thorough literature review. After data cleaning, construct validity tests with Exploratory Factor Analysis, and construct reliability tests, the latent constructs were computed. The enhanced operational model is presented below:
