Control Structures and Functions

BUAN 327 Yegin Genc

Agenda

• Conditionals: switching between different calculations
• Iteration: Doing something over and over
• Vectorizing: Avoiding explicit iteration

Control Structures: if

if(<condition>) {
  ## do something
} else {
  ## do something else
}

if(<condition1>) {
  ## do something
} else if(<condition2>)  {
  ## do something different
} else {
  ## do something different
}


if

This is a valid if/else structure.

if(x > 3) {
  y <- 10
} else {
  y <- 0
}


So is this one.

y <- if(x > 3) {
  10
} else {
  0
}


if

Of course, the else clause is not necessary.

if(<condition1>) {

}

if(<condition2>) {

}
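A minimal sketch of what omitting else means in practice, using made-up values for x, y, and z. When the condition is FALSE the body is simply skipped, so any default has to be set beforehand; used as an expression, an if without else yields NULL.

```r
x <- 2
y <- 0               # default value, set before the test
if (x > 3) {
  y <- 10            # only runs when the condition is TRUE
}
print(y)             # 0, because the condition was FALSE

# Used as an expression, an if without else yields NULL when the test fails
z <- if (x > 3) 10
print(is.null(z))    # TRUE
```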


Loops: for

for loops take an iterator variable and assign it successive values from a sequence or vector. They are most commonly used for iterating over the elements of an object (list, vector, etc.)

for(i in 1:10) {
  print(i)
}


This loop takes the i variable and in each iteration of the loop gives it values 1, 2, 3, …, 10, and then exits.

for

These four loops have the same behavior.

x <- c("a", "b", "c", "d")

for(i in 1:4) {
  print(x[i])
}

for(i in seq_along(x)) {
  print(x[i])
}

for(letter in x) {
  print(letter)
}

for(i in 1:length(x)) print(x[i])
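For a non-empty x the index-based loops are interchangeable, but they differ when x is empty. The sketch below, with illustrative counter names n_bad and n_good, shows why seq_along(x) is usually the safer choice: 1:length(x) becomes 1:0, which counts down instead of being empty.

```r
x <- character(0)          # an empty vector

print(1:length(x))         # 1 0  (counts down from 1 to 0)
print(seq_along(x))        # integer(0)

n_bad <- 0
for (i in 1:length(x)) n_bad <- n_bad + 1      # body runs twice

n_good <- 0
for (i in seq_along(x)) n_good <- n_good + 1   # body never runs

print(c(n_bad, n_good))    # 2 0
```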


Nested for loops

for loops can be nested.

x <- matrix(1:6, 2, 3)

for(i in 1:nrow(x)) {
  for(j in 1:ncol(x)) {
    print(x[i, j])
  }
}


Be careful with nesting though. Nesting beyond 2–3 levels is often very difficult to read/understand.

while

While loops begin by testing a condition. If it is true, then they execute the loop body. Once the loop body is executed, the condition is tested again, and so forth.

count <- 0
while(count < 10) {
  print(count)
  count <- count + 1
}


While loops can potentially result in infinite loops if not written properly. Use with care!

while

Sometimes there will be more than one condition in the test.

z <- 5

while(z >= 3 && z <= 10) {
  print(z)
  coin <- rbinom(1, 1, 0.5)

  if(coin == 1) {  ## random walk
    z <- z + 1
  } else {
    z <- z - 1
  }
}


Conditions are always evaluated from left to right.
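Left-to-right evaluation matters because && stops as soon as the answer is known. A small sketch with a made-up variable x: the cheap or safe test goes first, so the second test is never evaluated when it would not make sense.

```r
x <- NULL

# !is.null(x) is FALSE, so the right-hand side (x > 0) is never evaluated
if (!is.null(x) && x > 0) {
  result <- "positive"
} else {
  result <- "missing or non-positive"
}
print(result)   # "missing or non-positive"
```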

repeat

Repeat initiates an infinite loop; these are not commonly used in statistical applications but they do have their uses. The only way to exit a repeat loop is to call break (or, inside a function, return).

x0 <- 1
tol <- 1e-8

repeat {
  x1 <- computeEstimate()  ## placeholder for your own update step, not a built-in function

  if(abs(x1 - x0) < tol) {
    break
  } else {
    x0 <- x1
  }
}


repeat

The loop in the previous slide is a bit dangerous because there’s no guarantee it will stop. Better to set a hard limit on the number of iterations (e.g. using a for loop) and then report whether convergence was achieved or not.
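One way to apply that advice is sketched below. It assumes a hypothetical update_estimate() function (here a Newton step toward sqrt(2), chosen only so the example runs) and an arbitrary cap of 100 iterations; neither comes from the slides.

```r
# Hypothetical stand-in for the update step: one Newton step toward sqrt(2)
update_estimate <- function(x) (x + 2 / x) / 2

x0 <- 1
tol <- 1e-8
max_iter <- 100          # hard limit on iterations (an arbitrary choice)
converged <- FALSE

for (iter in seq_len(max_iter)) {
  x1 <- update_estimate(x0)
  if (abs(x1 - x0) < tol) {
    converged <- TRUE
    break
  }
  x0 <- x1
}

# Report whether convergence was achieved
if (converged) {
  message("Converged after ", iter, " iterations: ", x1)
} else {
  message("No convergence within ", max_iter, " iterations")
}
```

The for loop guarantees termination, and the converged flag makes it explicit whether the final value can be trusted.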

next, return

next is used to skip an iteration of a loop

for(i in 1:100) {
  if(i <= 20) {
    ## Skip the first 20 iterations
    next
  }
  ## Do something here
}


return signals that a function should exit and return a given value
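A short sketch of return ending a function early, using a hypothetical helper first_negative() that is not part of the slides. Calling return inside the loop exits both the loop and the function.

```r
# Hypothetical helper: return the first negative value in a vector, or NA
first_negative <- function(v) {
  for (value in v) {
    if (value < 0) {
      return(value)    # exits the function (and the loop) immediately
    }
  }
  return(NA)           # reached only if no negative value was found
}

found <- first_negative(c(3, 5, -2, -7))   # -2
none  <- first_negative(c(1, 2, 3))        # NA
```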

Control Structures

Summary

• Control structures like if, while, and for allow you to control the flow of an R program

• Infinite loops should generally be avoided, even if they are theoretically correct.

• Control structures mentioned here are primarily useful for writing programs; for command-line interactive work, the *apply functions are more useful.
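As a rough illustration of the last point, the sketch below computes the same per-element means once with an explicit for loop and once with sapply(). The scores list is made-up data.

```r
scores <- list(a = c(90, 80), b = c(70, 75, 80), c = 100)

# Explicit loop: allocate the result, then fill it element by element
means_loop <- numeric(length(scores))
for (i in seq_along(scores)) {
  means_loop[i] <- mean(scores[[i]])
}

# Same computation in one line; sapply() also keeps the element names
means_apply <- sapply(scores, mean)
print(means_apply)   # a = 85, b = 75, c = 100
```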

Avoiding iteration

R has many ways of avoiding iteration, by acting on whole objects

• It's conceptually clearer
• It leads to simpler code
• It's faster (sometimes a little, sometimes drastically)

Vectorized arithmetic

How many languages add 2 vectors:

c <- numeric(length(a))
for (i in seq_along(a)) {  c[i] <- a[i] + b[i]  }


How R adds 2 vectors:

a+b


or a triple for() loop for matrix multiplication vs. a %*% b

• Clarity: the syntax is about what we're doing
• Concision: we write less
• Abstraction: the syntax hides how the computer does it
• Generality: same syntax works for numbers, vectors, arrays, …
• Speed: modifying big vectors over and over is slow in R; work gets done by optimized low-level code
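The matrix multiplication comparison can be sketched as follows. The matrices a and b are small made-up examples; the triple loop is the explicit version that a %*% b replaces.

```r
a <- matrix(1:6, nrow = 2)    # 2 x 3
b <- matrix(1:12, nrow = 3)   # 3 x 4

# Explicit triple loop: entry [i, j] is the sum over k of a[i, k] * b[k, j]
prod_loop <- matrix(0, nrow(a), ncol(b))
for (i in seq_len(nrow(a))) {
  for (j in seq_len(ncol(b))) {
    for (k in seq_len(ncol(a))) {
      prod_loop[i, j] <- prod_loop[i, j] + a[i, k] * b[k, j]
    }
  }
}

# Whole-object version
prod_vec <- a %*% b

all.equal(prod_loop, prod_vec)   # TRUE
```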

Vectorized calculations

Many functions are set up to vectorize automatically

abs(-3:3)

 3 2 1 0 1 2 3

log(1:7)

 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101


See also apply() from last week

We'll come back to this in great detail later

Vectorized conditions: ifelse()

ifelse(x^2 > 1, 2*abs(x)-1, x^2)


1st argument is a Boolean vector, then pick from the 2nd or 3rd vector arguments as TRUE or FALSE
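A worked instance of the call above, assuming a small made-up vector x. Elements with x^2 > 1 take the value 2*abs(x)-1; the rest take x^2.

```r
x <- c(-2, -0.5, 0, 0.5, 2)

# Condition by element: TRUE FALSE FALSE FALSE TRUE
out <- ifelse(x^2 > 1, 2 * abs(x) - 1, x^2)
print(out)   # 3.00 0.25 0.00 0.25 3.00
```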

Summary

• if, nested if, switch
• Iteration: for, while
• Avoiding iteration with whole-object (“vectorized”) operations
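A minimal sketch of switch(), which selects one of several calculations by name. The helper center() and its method names are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: pick a summary statistic by name
center <- function(values, method) {
  switch(method,
    mean   = mean(values),
    median = median(values),
    stop("unknown method: ", method)   # unnamed last argument is the default
  )
}

m1 <- center(c(1, 2, 10), "mean")     # 4.333333
m2 <- center(c(1, 2, 10), "median")   # 2
```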

Functions - Agenda

• Defining functions: Tying related commands into bundles
• Interfaces: Controlling what the function can see and do
• Example: Parameter estimation code

Why Functions?

Data structures tie related values into one object

Functions tie related commands into one object

In both cases: easier to understand, easier to work with, easier to build into larger things

For example

circle.area <- function(r) { return(pi*r^2) }
circle.area(2)

 12.56637


Our functions get used just like the built-in ones:

x=seq(1, 10, 1)
circle.area(x)

    3.141593  12.566371  28.274334  50.265482  78.539816 113.097336
 153.938040 201.061930 254.469005 314.159265


Area of different shapes

shapes <- data.frame(type = c('circle', 'square', 'circle', 'square'), dimension = c(1, 2, 3, 4))


type    dimension       area
circle          1   3.141593
square          2   4.000000
circle          3  28.274334
square          4  16.000000

Pseudocode


areas <- vector of 0s to be populated with results

for a shape in shapes:

if type is circle
then shape.area is  (shape's dimension)^2*pi

else if type is square
then shape.area is  (shape's dimension)^2

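area <- function(my.dataframe) {
  # Input: a data frame with the shape type in column 1 and its dimension in column 2
  # Output: a numeric vector of areas, one per row

  # nrow() gives the number of rows; dim() would return both rows and columns
  rownum <- nrow(my.dataframe)

  # create the vector that will keep the results
  results <- rep(0, rownum)

  # iterate through the entries and calculate the area based on the type
  for (i in seq_len(rownum)) {
    dimension <- my.dataframe[i, 2]
    if (my.dataframe[i, 1] == 'circle') {
      results[i] <- pi * dimension^2
    } else if (my.dataframe[i, 1] == 'square') {
      results[i] <- dimension^2
    }
  }
  return(results)
}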

m.results=area(shapes)
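The same calculation can also be written without an explicit loop. The sketch below is one possible vectorized version using ifelse(); the function name area_vectorized and the column names type and dimension are assumptions for illustration. Rows that are neither circle nor square get 0, matching the loop version.

```r
# Assumed layout: a `type` column and a numeric `dimension` column
shapes <- data.frame(
  type = c("circle", "square", "circle", "square"),
  dimension = c(1, 2, 3, 4)
)

area_vectorized <- function(my.dataframe) {
  d <- my.dataframe$dimension
  type <- my.dataframe$type
  # circles: pi * d^2; squares: d^2; anything else: 0
  ifelse(type == "circle", pi * d^2,
         ifelse(type == "square", d^2, 0))
}

shapes$area <- area_vectorized(shapes)
print(shapes)
```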


Interfaces: the inputs or arguments; the outputs or return value

Calls other functions such as rep(), and the operators ^ and ==
could also call other functions we've written

return() says what the output is
alternately, return the last evaluation; I like explicit returns better

Comments: Not required by R, but a Very Good Idea
One-line description of purpose; listing of arguments; listing of outputs
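One possible layout for such comments, applied to the circle.area function from earlier. The exact wording and ordering of the comment lines is a suggestion, not a required convention.

```r
# Compute the area of a circle.
# Inputs: r, a numeric vector of radii
# Output: a numeric vector of areas, one per radius
circle.area <- function(r) {
  return(pi * r^2)
}

circle.area(2)   # 12.56637
```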

What should be a function?

• Things you're going to re-run, especially if it will be re-run with changes
• Chunks of code you keep highlighting and hitting return on
• Chunks of code which are small parts of bigger analyses
• Chunks which are very similar to other chunks

will say more about design later

What the function can see and do

• Each function has its own environment
• Names here over-ride names in the global environment
• Internal environment starts with the named arguments
• Assignments inside the function only change the internal environment
There are ways around this, but they are difficult and best avoided; see Chambers, ch. 5, if you must
• Names undefined in the function are looked for in the environment where the function was defined (lexical scoping), not the environment it gets called from
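A minimal sketch of name lookup for a variable that is not defined inside a function. R resolves such free names through the environment in which the function was created (lexical scoping). The names show_x and caller are made up for illustration.

```r
x <- "global"

show_x <- function() x     # x is not defined inside show_x

caller <- function() {
  x <- "local to caller"   # a different x, visible only inside caller
  show_x()                 # show_x still looks where it was defined
}

seen <- caller()
print(seen)   # "global"
```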

Internal environment examples

x <- 7
y <- c("A","C","G","T","U")
adder <- function(y) { x<- x+y; return(x) }
adder(1)

 8

x

 7

y

 "A" "C" "G" "T" "U"


Internal environment examples cont'd.

circle.area <- function(r) { return(pi*r^2) }
circle.area(c(1,2,3))

  3.141593 12.566371 28.274334

truepi <- pi
pi <- 3   # Valid in 1800s Indiana, or drowned R'lyeh
circle.area(c(1,2,3))

  3 12 27

pi <- truepi      # Restore sanity
circle.area(c(1,2,3))

  3.141593 12.566371 28.274334


Summary

• Functions bundle related commands together into objects: easier to re-run, easier to re-use, easier to combine, easier to modify, less risk of error, easier to think about
• Interfaces control what the function can see (arguments, environment) and change (its internals, its return value)
• Calling functions we define works just like calling built-in functions: named arguments, defaults
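A short sketch of named arguments and defaults in a user-defined function. The function rectangle.area is hypothetical; its height argument defaults to width, so a single argument describes a square.

```r
# Hypothetical example: height defaults to width
rectangle.area <- function(width, height = width) {
  return(width * height)
}

a1 <- rectangle.area(3)                      # 9, default height is used
a2 <- rectangle.area(3, height = 2)          # 6
a3 <- rectangle.area(height = 2, width = 3)  # 6, names allow any order
```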