Independence: Otherwise, you rely on someone else having given you exactly the right tool
Honesty: Otherwise, you end up distorting your problem to match the tools you have
Clarity: Making your method something a machine can do disciplines your thinking and makes it public; that's science
Download R and browse its documentation at https://www.r-project.org/
Basic interaction with R is by typing in the console, a.k.a. terminal or command-line
You type in commands, R gives back answers (or errors)
Menus and other graphical interfaces are extras built on top of the console
2 sorts of things (objects): data and functions
Data: things like 7, “seven”, \( 7.000 \), the matrix \( \left[ \begin{array}{ccc} 7 & 7 & 7 \\ 7 & 7 & 7\end{array}\right] \)
Functions: things like \( \log{} \), \( + \) (two arguments), \( < \) (two), \( \mod{} \) (two), mean (one)
A function is a machine which turns input objects (arguments) into an output object (return value), possibly with side effects, according to a definite rule
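A small sketch of these ideas, using only base R. An operator such as + is an ordinary function and can be called with prefix syntax by wrapping its name in backquotes. print() shows a side effect (displaying its argument) while still returning a value. The variable names are illustrative only.

```r
# Operators are functions: backquotes let us call `+` like any other function
sum.via.function <- `+`(7, 5)
sum.via.operator <- 7 + 5
identical(sum.via.function, sum.via.operator)  # TRUE

# log() takes a number plus an optional base argument
log.result <- log(100, base = 10)  # equals 2

# print() has a side effect (it displays its argument) and also
# returns that argument, invisibly, as its value
printed <- print("seven")
```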
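Different kinds of data object
All data is represented in binary format, by bits (TRUE/FALSE, YES/NO, 1/0)
Booleans: direct binary values, TRUE or FALSE in R
Missing or ill-defined values: NA, NaN, etc.
Arithmetic operators are binary operators: they take two numbers and give a number, including %% (modulo) and %/% (integer division):
7+5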
[1] 12
7-5
[1] 2
7*5
[1] 35
7^5
[1] 16807
7/5
[1] 1.4
7 %% 5
[1] 2
7 %/% 5
[1] 1
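A short sketch of how %/% and %% fit together: the quotient and the remainder always rebuild the original number. The variable names are made up for illustration.

```r
# Integer division (%/%) and modulo (%%) satisfy:
# dividend == divisor * (dividend %/% divisor) + (dividend %% divisor)
dividend <- 7
divisor <- 5
quotient <- dividend %/% divisor    # 1
remainder <- dividend %% divisor    # 2
divisor * quotient + remainder == dividend  # TRUE

# With a negative dividend, R rounds the quotient down (towards -Inf),
# so the remainder takes the sign of the divisor
neg.quotient <- (-7) %/% 5   # -2
neg.remainder <- (-7) %% 5   # 3
```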
Comparisons are also binary operators; they take two objects, like numbers, and give a Boolean
7 > 5
[1] TRUE
7 < 5
[1] FALSE
7 >= 7
[1] TRUE
7 <= 5
[1] FALSE
7 == 5
[1] FALSE
7 != 5
[1] TRUE
Boolean operators: basically “and” (&) and “or” (|):
(5 > 7) & (6*7 == 42)
[1] FALSE
(5 > 7) | (6*7 == 42)
[1] TRUE
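A minimal sketch of the remaining Boolean basics: how & and | behave on single values, the unary “not” operator !, and base R's xor() for “exactly one”. The variable names are illustrative.

```r
# & is TRUE only when both sides are TRUE; | when at least one is
both <- TRUE & FALSE       # FALSE
either <- TRUE | FALSE     # TRUE
# ! is the unary "not" operator, flipping a Boolean
negated <- !(5 > 7)        # TRUE
# xor() is TRUE when exactly one of its two arguments is TRUE
exactly.one <- xor(5 > 7, 6*7 == 42)  # TRUE
```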
typeof() function returns the type
is.foo() functions return Booleans for whether the argument is of type foo
as.foo() (tries to) “cast” its argument to type foo, i.e., to translate it sensibly into a foo-type value
typeof(7)
[1] "double"
is.numeric(7)
[1] TRUE
is.na(7)
[1] FALSE
is.na(7/0)
[1] FALSE
is.na(0/0)
[1] TRUE
Why is 7/0 not NA, but 0/0 is?
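One way to explore the question above, using base R's is.infinite() and is.nan(): dividing a nonzero number by zero has a sensible limit (infinity), while 0/0 has none. The variable names are illustrative.

```r
# Dividing a nonzero number by zero gives an infinite value, not a missing one
seven.over.zero <- 7/0
is.infinite(seven.over.zero)  # TRUE
# 0/0 has no sensible value, so it is NaN ("not a number")
zero.over.zero <- 0/0
is.nan(zero.over.zero)        # TRUE
# is.na() treats NaN as missing, but is.nan() does not treat NA as NaN
is.na(NaN)   # TRUE
is.nan(NA)   # FALSE
```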
is.character(7)
[1] FALSE
is.character("7")
[1] TRUE
is.character("seven")
[1] TRUE
is.na("seven")
[1] FALSE
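A string is not itself NA, but casting can produce one. This sketch shows as.numeric() failing on a word (R returns NA and issues a warning, suppressed here with suppressWarnings()) and as.integer() truncating a double. The variable names are illustrative.

```r
# as.numeric() can only translate strings that look like numbers;
# otherwise it returns NA (with a warning, suppressed here)
from.digits <- as.numeric("7")                       # 7
from.word <- suppressWarnings(as.numeric("seven"))   # NA
is.na(from.word)                                     # TRUE
# Casting a double to integer drops the fractional part
truncated <- as.integer(7.9)                         # 7
typeof(truncated)                                    # "integer"
```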
as.character(5/6)
[1] "0.833333333333333"
as.numeric(as.character(5/6))
[1] 0.8333333
6*as.numeric(as.character(5/6))
[1] 5
5/6 == as.numeric(as.character(5/6))
[1] FALSE
(why is that last FALSE?)
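A sketch for investigating that last FALSE: it prints both numbers with more digits than R shows by default, and compares them with all.equal(), which allows a small numerical tolerance. It assumes, as the output above suggests, that as.character() keeps only 15 significant digits.

```r
exact <- 5/6
round.trip <- as.numeric(as.character(exact))
# The two values differ by a tiny amount lost when converting to text
difference <- exact - round.trip
print(exact, digits = 17)       # shows more of the stored digits
print(round.trip, digits = 17)
# all.equal() compares up to a small numerical tolerance instead of exactly
isTRUE(all.equal(exact, round.trip))  # TRUE
```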
We can give names to data objects; these give us variables
A few variables are built in:
pi
[1] 3.141593
Variables can be arguments to functions or operators, just like constants:
pi*10
[1] 31.41593
cos(pi)
[1] -1
Most variables are created with the assignment operator, <- or =
approx.pi <- 22/7
approx.pi
[1] 3.142857
diameter.in.cubits = 10
approx.pi*diameter.in.cubits
[1] 31.42857
The assignment operator also changes values:
circumference.in.cubits <- approx.pi*diameter.in.cubits
circumference.in.cubits
[1] 31.42857
circumference.in.cubits <- 30
circumference.in.cubits
[1] 30
Using names and variables makes code: easier to design, easier to debug, less prone to bugs, easier to improve, and easier for others to read
Avoid “magic constants”; use named variables. You will be graded on this!
Named variables are a first step towards abstraction
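To illustrate the difference, here is the same calculation written twice, first with magic constants and then with named variables. The overtime-pay rule (time and a half past 40 hours) and all the numbers are hypothetical, chosen only for the example.

```r
# With magic constants: a reader has to guess what 40, 15 and 1.5 mean
pay.magic <- 40 * 15 + (45 - 40) * 15 * 1.5

# With named variables, the same calculation explains itself
hourly.wage <- 15            # hypothetical wage, dollars per hour
hours.worked <- 45
regular.hours <- 40          # hypothetical overtime threshold
overtime.multiplier <- 1.5   # hypothetical "time and a half"
overtime.hours <- max(hours.worked - regular.hours, 0)
pay.named <- (hours.worked - overtime.hours) * hourly.wage +
  overtime.hours * hourly.wage * overtime.multiplier
```

Changing the wage or the threshold now means editing one clearly labeled line, rather than hunting for every 15 or 40 in the code.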
What names have you defined values for?
ls()
[1] "approx.pi" "circumference.in.cubits"
[3] "diameter.in.cubits"
objects()
[1] "approx.pi" "circumference.in.cubits"
[3] "diameter.in.cubits"
Getting rid of variables:
rm("circumference.in.cubits")
ls()
[1] "approx.pi" "diameter.in.cubits"
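A quick way to check whether a name is still defined is base R's exists(). This sketch recreates approx.pi, removes it, and checks again. The common idiom rm(list = ls()) removes every variable in the workspace at once, so it appears only as a comment here.

```r
approx.pi <- 22/7
defined.before <- exists("approx.pi")   # TRUE
rm("approx.pi")
defined.after <- exists("approx.pi")    # FALSE: the name is gone
# rm(list = ls()) would remove every variable in the workspace
```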
Group related data values into one object, a data structure
A vector is a sequence of values, all of the same type
x <- c(7, 8, 10, 45)
x
[1] 7 8 10 45
is.vector(x)
[1] TRUE
c() function returns a vector containing all its arguments in order
x[1] is the first element, x[4] is the 4th element
x[-4] is a vector containing all but the fourth element
vector(length=6) returns a vector of length 6 (logical, filled with FALSE, unless another mode is given); helpful for filling things up later
weekly.hours <- vector(length=5)
weekly.hours[5] <- 8
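A sketch of what happens to weekly.hours along the way. vector(length = 5) starts out as a logical vector of FALSE values. Storing a number in one slot converts the whole vector to double, so the FALSEs become 0. numeric(5) is a base R alternative that starts with doubles directly.

```r
weekly.hours <- vector(length = 5)
initial.type <- typeof(weekly.hours)   # "logical": five FALSEs
weekly.hours[5] <- 8
# Storing a number coerces the whole vector to double; FALSE becomes 0
final.type <- typeof(weekly.hours)     # "double"
weekly.hours                           # 0 0 0 0 8
# numeric(5) creates a double vector of five zeros from the start
zeros <- numeric(5)
```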
Operators apply to vectors “pairwise” or “elementwise”:
y <- c(-7, -8, -10, -45)
x+y
[1] 0 0 0 0
x*y
[1] -49 -64 -100 -2025
Recycling repeats elements of the shorter vector when it is combined with a longer one:
x + c(-7,-8)
[1] 0 0 3 37
x^c(1,0,-1,0.5)
[1] 7.000000 1.000000 0.100000 6.708204
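A sketch of recycling when the lengths do not divide evenly. R still recycles the shorter vector, but it warns, because this is often a mistake.

```r
x <- c(7, 8, 10, 45)
# Length 2 divides length 4: c(-7, -8) is reused as c(-7, -8, -7, -8)
even <- x + c(-7, -8)     # 0 0 3 37
# Length 3 does not divide 4: c(1, 2, 3) is reused as c(1, 2, 3, 1),
# and R warns "longer object length is not a multiple of shorter object length"
uneven <- x + c(1, 2, 3)  # 8 10 13 46
```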
Single numbers are vectors of length 1 for purposes of recycling:
2*x
[1] 14 16 20 90
Can also do pairwise comparisons:
x > 9
[1] FALSE FALSE TRUE TRUE
Note: returns Boolean vector
Boolean operators work elementwise:
(x > 9) & (x < 20)
[1] FALSE FALSE TRUE FALSE
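Boolean vectors are also handy for counting. In arithmetic, R treats TRUE as 1 and FALSE as 0, so sum() counts matches and mean() gives the fraction of matches. A short sketch using the same x:

```r
x <- c(7, 8, 10, 45)
in.range <- (x > 9) & (x < 20)   # FALSE FALSE TRUE FALSE
# TRUE counts as 1 and FALSE as 0
count.in.range <- sum(in.range)  # 1: how many elements are between 9 and 20
share.above <- mean(x > 9)       # 0.5: fraction of elements above 9
```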
To compare whole vectors, best to use identical() or all.equal():
x == -y
[1] TRUE TRUE TRUE TRUE
identical(x,-y)
[1] TRUE
identical(c(0.5-0.3,0.3-0.1),c(0.3-0.1,0.5-0.3))
[1] FALSE
all.equal(c(0.5-0.3,0.3-0.1),c(0.3-0.1,0.5-0.3))
[1] TRUE
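One caution about all.equal(): when the objects differ, it does not return FALSE but a character string describing the difference. To use the result as a Boolean (for instance in a test), wrap it in isTRUE(). A minimal sketch:

```r
a <- c(0.5 - 0.3, 0.3 - 0.1)
b <- c(0.3 - 0.1, 0.5 - 0.3)
# When the vectors are close, all.equal() returns TRUE
close.result <- all.equal(a, b)
# When they are not, it returns a description of the difference, not FALSE
far.result <- all.equal(1, 1.1)
is.character(far.result)   # TRUE
isTRUE(far.result)         # FALSE
```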
Lots of functions take vectors as arguments:
mean(), median(), sd(), var(), max(), min(), length(), sum(): return single numbers
sort() returns a new vector
hist() takes a vector of numbers and produces a histogram, a highly structured object, with the side-effect of making a plot
ecdf() produces an empirical cumulative distribution function object
summary() gives a five-number summary (plus the mean) of numerical vectors
any() and all() are useful on Boolean vectors
Vector of indices:
x[2];x[4]
[1] 8
[1] 45
x[c(2,4)]
[1] 8 45
Vector of negative indices
x[c(-1,-3)]
[1] 8 45
(why that, and not 8 10?)
Boolean vector:
x[x>9]
[1] 10 45
y[x>9]
[1] -10 -45
which() turns a Boolean vector into a vector of the indices where it is TRUE:
places <- which(x > 9)
places
[1] 3 4
y[places]
[1] -10 -45
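A sketch of what which() computes: the same positions come out of indexing the sequence 1, 2, ..., length(x) with the Boolean vector. Base R also has which.max() and which.min() for the position of the largest and smallest elements.

```r
x <- c(7, 8, 10, 45)
places <- which(x > 9)               # 3 4
# The same positions, found by subsetting the index vector 1..length(x)
same.places <- seq_along(x)[x > 9]   # 3 4
# Positions of the largest and smallest elements
biggest <- which.max(x)              # 4
smallest <- which.min(x)             # 1
```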
You can give names to elements or components of vectors
names(x) <- c("v1","v2","v3","fred")
names(x)
[1] "v1" "v2" "v3" "fred"
x[c("fred","v1")]
fred v1
45 7
Note the labels in what R prints; they are not actually part of the value
names(x) is just another vector (of characters):
names(y) <- names(x)
sort(names(x))
[1] "fred" "v1" "v2" "v3"
which(names(x)=="fred")
[1] 4
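Names can also be attached when a vector is created, and dropped again with base R's unname(). Double brackets, x[["fred"]], pull out a single value without its label. A short sketch reusing the same names:

```r
# Names can be given when the vector is created
x <- c(v1 = 7, v2 = 8, v3 = 10, fred = 45)
x["fred"]                 # prints the label fred above the value 45
# unname() drops the labels, leaving just the values
plain <- unname(x)        # 7 8 10 45
# Double brackets return one element without its name
fred.value <- x[["fred"]] # 45
```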