In this section, we review the very basic structure of R. R has many built-in functions, ranging from basic arithmetic operations to regression models. They are like the basic apps that come preinstalled on your smartphone: you can use them right away, without going to the app store and downloading anything. Of course, you can also download more advanced “packages” in R, just like you can download other apps on your phone! (More on this in a bit.)
First, let’s start with very basic arithmetic operations.
# +: addition
5 + 3
[1] 8
# -: subtraction
5 - 3
[1] 2
# *: multiplication
5 * 3
[1] 15
# /: division
5 / 3
[1] 1.666667
# ^: exponentiation
5 ^ 3
[1] 125
R is an object-based language, which means that we can save values into “objects”. To do so, you can use either = or <- (<- is the conventional choice). An object name can be anything of your choice, as long as it does NOT start with a number or contain a space.
# use "<-" or "=" to assign
result <- 5 + 3
result = 5 + 3 # "=" works too, but "<-" is preferred for assignments
class(result) # check class of the object
[1] "numeric"
You can also assign text to objects (use quotation marks). If you forget the quotation marks, R will give you an error.
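A minimal sketch of text assignment. The object name `fruit` and its value are illustrative assumptions, not taken from the original lesson.

```r
# Assign text (a character string) using quotation marks
fruit <- "apple"
class(fruit)  # "character"

# Without quotation marks, R looks for an object named apple.
# If no such object exists, R stops with an "object not found" error.
# fruit <- apple
```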
Reassigning a value to an existing object replaces the original.
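As a small illustration, reusing the `result` object from earlier (the new value 10 is an arbitrary choice for this sketch):

```r
result <- 5 + 3
result        # 8

result <- 10  # assigning again overwrites the earlier value
result        # 10
```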
[1] TRUE
[1] FALSE
[1] TRUE
[1] TRUE
[1] FALSE
[1] TRUE
[1] "logical"
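Logical values (TRUE or FALSE) like the ones printed above typically come from comparisons. The expressions below are chosen purely for illustration and are not necessarily the ones that produced that output.

```r
5 > 3             # TRUE
5 == 3            # FALSE
5 != 3            # TRUE

# A comparison result can be stored in an object like any other value
is_bigger <- 5 > 3
class(is_bigger)  # "logical"
```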
A vector combines multiple values of the same type.
[1] 25 21 18 29 35
[1] "numeric"
[1] 5
[1] "cake" "banana" "dog" "apple"
[1] "character"
[1] 4
[1] TRUE FALSE TRUE FALSE
[1] "logical"
[1] 4
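One way to produce vectors like those printed above is the `c()` function. The object names `ages`, `words`, and `flags` are assumptions made for this sketch; the values are taken from the output.

```r
ages  <- c(25, 21, 18, 29, 35)
words <- c("cake", "banana", "dog", "apple")
flags <- c(TRUE, FALSE, TRUE, FALSE)

class(ages)    # "numeric"
length(ages)   # 5

class(words)   # "character"
length(words)  # 4

class(flags)   # "logical"
length(flags)  # 4
```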
Combine multiple vectors:
[1] "25" "21" "18" "29" "35" "cake" "banana" "dog"
[9] "apple" "TRUE" "FALSE" "TRUE" "FALSE"
[1] "character"
[1] 13
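A sketch of how such a combined vector can be built, again with `c()`. The object names are assumptions; because one of the inputs is character, every element ends up as text.

```r
ages  <- c(25, 21, 18, 29, 35)
words <- c("cake", "banana", "dog", "apple")
flags <- c(TRUE, FALSE, TRUE, FALSE)

# c() also accepts whole vectors and joins them end to end
combined <- c(ages, words, flags)

class(combined)   # "character"
length(combined)  # 13
```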
Reminder: A vector must have a single type. Mixed types will be coerced.
Coercion hierarchy: character > numeric > logical
[1] 25 21 18 29 35 1 0 1 0
[1] "numeric"
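The numeric-over-logical step of the hierarchy can be sketched as follows (object names are assumptions). With no character values present, TRUE becomes 1 and FALSE becomes 0.

```r
ages  <- c(25, 21, 18, 29, 35)
flags <- c(TRUE, FALSE, TRUE, FALSE)

mixed <- c(ages, flags)
mixed         # 25 21 18 29 35 1 0 1 0
class(mixed)  # "numeric"
```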
Subset a vector:
[1] 25
[1] 21 18 29 35
[1] 21 18 29
[1] 25 21 18 29
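Square brackets select elements by position. The index expressions below are one plausible way to get the results shown above; the exact calls used in the original are an assumption.

```r
ages <- c(25, 21, 18, 29, 35)

ages[1]    # first element: 25
ages[-1]   # everything except the first: 21 18 29 35
ages[2:4]  # second through fourth: 21 18 29
ages[-5]   # everything except the fifth: 25 21 18 29
```

Note that R counts positions from 1, and a negative index drops that position rather than counting from the end.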
Create numeric sequences:
[1] 1 2 3 4 5 6 7 8 9
[1] 1 1 1 1 1 1 1 1 1
[1] 1 2 3 4 5 6 7 8 9
[1] 1 3 5 7 9
[1] 9 7 5 3 1
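Sequences like these can be generated with the colon operator, `rep()`, and `seq()`. This is a sketch of calls that give the same results as above, not necessarily the original code.

```r
1:9                  # 1 2 3 4 5 6 7 8 9
rep(1, 9)            # the value 1 repeated nine times
seq(1, 9)            # same as 1:9
seq(1, 9, by = 2)    # 1 3 5 7 9
seq(9, 1, by = -2)   # 9 7 5 3 1
```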
Basic vector calculations:
[1] 35
[1] 18
[1] 18 35
[1] 128
[1] 25.6
[1] 9591750
[1] 6.69328
[1] 44.8
[1] 5.000000 4.582576 4.242641 5.385165 5.916080
[1] 18 21 25 29 35
[1] "apple" "banana" "cake" "dog"
[1] FALSE FALSE TRUE TRUE
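The results above are consistent with the following base R functions applied to the example vectors. The object names are assumptions, and the pairing of each function with each output line is inferred from the values.

```r
ages  <- c(25, 21, 18, 29, 35)
words <- c("cake", "banana", "dog", "apple")
flags <- c(TRUE, FALSE, TRUE, FALSE)

max(ages)    # 35
min(ages)    # 18
range(ages)  # 18 35
sum(ages)    # 128
mean(ages)   # 25.6
prod(ages)   # 9591750
sd(ages)     # standard deviation, about 6.69
var(ages)    # variance, 44.8
sqrt(ages)   # square root of each element

sort(ages)   # 18 21 25 29 35
sort(words)  # "apple" "banana" "cake" "dog"
sort(flags)  # FALSE FALSE TRUE TRUE
```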
Just like smartphones, R has many ready-to-use applications. We call them “packages”. Inside a package, you can find different functions and datasets.
Just like you have to install and open smartphone applications, you also need to install and load packages.
⚠️ You have to “load” the packages every time you open RStudio, using the library() function.
If you get an error saying the package cannot be found, install it first with install.packages().
You only have to install a package ONCE. (Just like you only need to download an app ONCE.)
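The install-once, load-every-session pattern might look like the sketch below, using tidyverse as the example package. The `requireNamespace()` guard is an addition for this sketch so the code runs whether or not the package is installed; in practice most people simply call `library()` directly.

```r
# Run once per computer (commented out so it is not repeated every session):
# install.packages("tidyverse")

# Run in every new R session:
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)
} else {
  message('tidyverse is not installed yet. Run install.packages("tidyverse") first.')
}
```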
Now, I am finally introducing the tidyverse package, one of the most commonly used packages for cleaning, manipulating, and visualizing data. It is actually a collection of multiple packages, such as dplyr and ggplot2. If you want to know more, please refer to this website on tidyverse.
Let’s first load the tidyverse package (install it beforehand if you have not done so) and read in the data. ✅ If you would like to follow along, the data used here is available on the Topics page.
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
✖ dplyr::select() masks MASS::select()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 44 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (8): id, gender, division, year, participation, homework, midterm, final...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
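Messages like the ones above are what `read_csv()` from readr (part of the tidyverse) prints when it reads a file. The original file name is not shown, so this sketch writes the first two rows of the data to a temporary CSV as a stand-in. With the real file you would pass its path to `read_csv()` instead. The object name `grades` is an assumption.

```r
library(readr)  # loaded automatically by library(tidyverse)

# Hypothetical stand-in for the course data file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,gender,division,year,participation,homework,midterm,final_exam",
  "1,0,1,2,100,99,89,64",
  "2,0,1,2,100,97,85,82"
), csv_path)

# show_col_types = FALSE quiets the column specification message
grades <- read_csv(csv_path, show_col_types = FALSE)
dim(grades)  # 2 8
```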
Below is a set of basic R functions that are useful to explore your data. (In the next section, we will start to use other functions in tidyverse to manipulate and wrangle the data.)
# A tibble: 6 × 8
id gender division year participation homework midterm final_exam
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 2 100 99 89 64
2 2 0 1 2 100 97 85 82
3 3 0 1 2 100 98 74 81
4 4 0 1 2 100 99 85 89
5 5 1 1 2 100 100 90 80
6 6 0 1 2 100 100 94 96
# A tibble: 6 × 8
id gender division year participation homework midterm final_exam
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 39 0 3 4 25 56 40 65
2 40 0 3 4 50 95 65 76
3 41 1 3 4 25 62 60 70
4 42 0 4 2 100 94 78 74
5 43 1 4 3 100 100 90 90
6 44 1 4 4 100 100 98 85
[1] "id" "gender" "division" "year"
[5] "participation" "homework" "midterm" "final_exam"
[1] 44 8
id gender division year
Min. : 1.00 Min. :0.0000 Min. :1.000 Min. :2.000
1st Qu.:11.75 1st Qu.:0.0000 1st Qu.:1.000 1st Qu.:2.000
Median :22.50 Median :0.0000 Median :1.000 Median :2.000
Mean :22.50 Mean :0.3182 Mean :1.932 Mean :2.523
3rd Qu.:33.25 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.:3.000
Max. :44.00 Max. :1.0000 Max. :4.000 Max. :4.000
participation homework midterm final_exam
Min. : 0.00 Min. : 56.00 Min. : 40.00 Min. :46.00
1st Qu.: 75.00 1st Qu.: 82.75 1st Qu.: 70.00 1st Qu.:66.50
Median :100.00 Median : 97.00 Median : 81.00 Median :80.50
Mean : 84.09 Mean : 90.39 Mean : 78.43 Mean :76.39
3rd Qu.:100.00 3rd Qu.: 99.00 3rd Qu.: 89.00 3rd Qu.:86.25
Max. :100.00 Max. :100.00 Max. :100.00 Max. :97.00
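The output above has the shape produced by `head()`, `tail()`, `names()`, `dim()`, and `summary()`. As a self-contained sketch, the code below rebuilds only the first six rows shown above as a plain data frame called `grades` (a hypothetical name), so the numbers it prints will differ from the full 44-row dataset.

```r
grades <- data.frame(
  id            = 1:6,
  gender        = c(0, 0, 0, 0, 1, 0),
  division      = rep(1, 6),
  year          = rep(2, 6),
  participation = rep(100, 6),
  homework      = c(99, 97, 98, 99, 100, 100),
  midterm       = c(89, 85, 74, 85, 90, 94),
  final_exam    = c(64, 82, 81, 89, 80, 96)
)

head(grades)     # first six rows (the default)
tail(grades, 3)  # last three rows
names(grades)    # column names
dim(grades)      # number of rows and columns: 6 8
summary(grades)  # min, quartiles, mean, and max for each column
```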
You can access variables (columns) by linking the column name to the dataset name with the $ operator, in the form dataset$column.
[1] 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 0 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
[39] 0 0 1 0 1 1
[1] 100 100 100 100 100 100 100 100 75 75 100 100 75 100 100 100 75 75 0
[20] 100 75 50 75 100 50 100 100 100 100 100 100 100 100 75 50 75 75 100
[39] 25 50 25 100 100 100
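A small sketch of column access with `$`, using a cut-down data frame built from the first six rows of the data. The name `grades` is an assumption.

```r
grades <- data.frame(
  id            = 1:6,
  gender        = c(0, 0, 0, 0, 1, 0),
  participation = rep(100, 6)
)

grades$gender               # 0 0 0 0 1 0
grades$participation        # 100 100 100 100 100 100
mean(grades$participation)  # a column is a vector, so vector functions apply: 100
```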
You should also remember the “brackets” method for pulling out a specific row, column, or cell.
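Inside the brackets, the row index comes first and the column index second, separated by a comma. A sketch with a small data frame built from the first six rows of the data (the name `grades` is an assumption):

```r
grades <- data.frame(
  id      = 1:6,
  gender  = c(0, 0, 0, 0, 1, 0),
  midterm = c(89, 85, 74, 85, 90, 94)
)

grades[1, ]             # first row, all columns
grades[, 2]             # all rows, second column: 0 0 0 0 1 0
grades[1, 2]            # cell in row 1, column 2: 0
grades[5, "gender"]     # columns can also be selected by name: 1
grades[1:3, "midterm"]  # rows 1 to 3 of midterm: 89 85 74
```

With a plain data frame, selecting a single column this way returns a vector. A tibble (what `read_csv()` returns) keeps the result as a one-column tibble instead.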