Control Structures and Functions

BUAN 327
Yegin Genc

Agenda

Conditionals: switching between different calculations
Iteration: Doing something over and over
Vectorizing: Avoiding explicit iteration

Control Structures: if

if(<condition>) {
        ## do something
} else {
        ## do something else
}
if(<condition1>) {
        ## do something
} else if(<condition2>)  {
        ## do something different
} else {
        ## do something different
}

if

This is a valid if/else structure.

if(x > 3) {
        y <- 10
} else {
        y <- 0
}

So is this one.

y <- if(x > 3) {
        10
} else { 
        0
}

if

Of course, the else clause is not necessary.

if(<condition1>) {

}

if(<condition2>) {

}

loops

for

for loops take an iterator variable and assign it successive values from a sequence or vector. They are most commonly used for iterating over the elements of an object (list, vector, etc.).

for(i in 1:10) {
        print(i)
}

This loop takes the i variable and in each iteration of the loop gives it values 1, 2, 3, …, 10, and then exits.

for

These four loops have the same behavior.

x <- c("a", "b", "c", "d")

for(i in 1:4) {
        print(x[i])
}

for(i in seq_along(x)) {
        print(x[i])
}

for(letter in x) {
        print(letter)
}

for(i in 1:length(x)) print(x[i])
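
The forms stop being interchangeable when the vector is empty: 1:length(x) becomes 1:0, which counts down from 1 to 0, while seq_along(x) yields nothing. A minimal sketch, where the names empty, n_colon, and n_seq are made up for illustration:

```r
empty <- character(0)

# 1:length(empty) is 1:0, i.e. c(1, 0), so the body runs twice
n_colon <- 0
for(i in 1:length(empty)) {
        n_colon <- n_colon + 1
}

# seq_along(empty) is integer(0), so the body never runs
n_seq <- 0
for(i in seq_along(empty)) {
        n_seq <- n_seq + 1
}

print(c(n_colon, n_seq))   # 2 0
```

For that reason seq_along(x) is usually the safer choice when the length of x is not known in advance.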

Nested for loops

for loops can be nested.

x <- matrix(1:6, 2, 3)

for(i in 1:nrow(x)) {
        for(j in 1:ncol(x)) {
                print(x[i, j])
        }   
}

Be careful with nesting though. Nesting beyond 2–3 levels is often very difficult to read/understand.

while

While loops begin by testing a condition. If it is true, then they execute the loop body. Once the loop body is executed, the condition is tested again, and so forth.

count <- 0
while(count < 10) {
        print(count)
        count <- count + 1
}

While loops can potentially result in infinite loops if not written properly. Use with care!

while

Sometimes there will be more than one condition in the test.

z <- 5

while(z >= 3 && z <= 10) {
        print(z)
        coin <- rbinom(1, 1, 0.5)

        if(coin == 1) {  ## random walk
                z <- z + 1
        } else {
                z <- z - 1
        } 
}

Conditions are always evaluated from left to right.
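
With &&, evaluation also stops as soon as the result is known, so a guard condition can go on the left. A small sketch; the variable names are made up for illustration:

```r
# The left side is FALSE, so the right side (which would raise an error) never runs
result <- FALSE && stop("never evaluated")
print(result)   # FALSE

# A typical use: check for a missing value before comparing
z <- NA
safe <- !is.na(z) && z >= 3
print(safe)     # FALSE
```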

repeat

Repeat initiates an infinite loop; these are not commonly used in statistical applications but they do have their uses. The only way to exit a repeat loop is to call break (or, inside a function, return).

x0 <- 1
tol <- 1e-8

repeat {
        x1 <- computeEstimate()  ## placeholder: not defined in these slides

        if(abs(x1 - x0) < tol) {
                break
        } else {
                x0 <- x1
        } 
}

repeat

The loop in the previous slide is a bit dangerous because there’s no guarantee it will stop. Better to set a hard limit on the number of iterations (e.g. using a for loop) and then report whether convergence was achieved or not.
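
One way to follow this advice is sketched below with a for loop capped at max_iter passes. computeEstimate() is not defined in these slides, so the sketch swaps in a stand-in, next_estimate(), which takes one Newton step toward sqrt(2). That helper and the names max_iter and converged are assumptions for illustration.

```r
# Stand-in for computeEstimate(): one Newton step toward sqrt(2)
next_estimate <- function(x) {
        (x + 2 / x) / 2
}

x0 <- 1
tol <- 1e-8
max_iter <- 100      # hard limit on the number of iterations
converged <- FALSE

for(iter in seq_len(max_iter)) {
        x1 <- next_estimate(x0)
        if(abs(x1 - x0) < tol) {
                converged <- TRUE
                break
        }
        x0 <- x1
}

# Report whether convergence was achieved
if(converged) {
        print(paste("Converged after", iter, "iterations"))
} else {
        print(paste("No convergence after", max_iter, "iterations"))
}
```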

next, return

next is used to skip an iteration of a loop

for(i in 1:100) {
        if(i <= 20) {
                ## Skip the first 20 iterations
                next 
        }
        ## Do something here
}

return signals that a function should exit and return a given value
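
A minimal sketch of return ending a function early. The function first_negative is a made-up example, not part of the slides:

```r
# Return the first negative element of v, or NA if there is none
first_negative <- function(v) {
        for(value in v) {
                if(value < 0) {
                        return(value)   # exits the whole function, not just the loop
                }
        }
        return(NA)
}

first_negative(c(3, 5, -2, 7, -9))   # -2
first_negative(c(1, 2, 3))           # NA
```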

Control Structures

Summary

Control structures like if, while, and for allow you to control the flow of an R program
Infinite loops should generally be avoided, even if they are theoretically correct.
Control structures mentioned here are primarily useful for writing programs; for command-line interactive work, the *apply functions are more useful.
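
As a small illustration of the last point, sapply() applies a function to each element of a list and collects the results, replacing an explicit loop. The data below are made up for the example:

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# Loop version: pre-allocate, then fill in one element at a time
means_loop <- numeric(length(scores))
for(i in seq_along(scores)) {
        means_loop[i] <- mean(scores[[i]])
}

# sapply version: one call, and the element names are kept
means_apply <- sapply(scores, mean)

print(means_apply)
```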

Avoiding iteration

R has many ways of avoiding iteration, by acting on whole objects

It's conceptually clearer
It leads to simpler code
It's faster (sometimes a little, sometimes drastically)

Vectorized arithmetic

How many languages add 2 vectors:

c <- numeric(length(a))
for (i in seq_along(a)) {  c[i] <- a[i] + b[i]  }

How R adds 2 vectors:

a+b

or a triple for() loop for matrix multiplication vs. a %*% b
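
For comparison, here is a sketch of the triple loop written out next to %*%. The matrices are small made-up examples:

```r
a <- matrix(1:6, nrow = 2, ncol = 3)
b <- matrix(1:12, nrow = 3, ncol = 4)

# Explicit version: loops over rows of a, columns of b, and the shared dimension
prod_loop <- matrix(0, nrow = nrow(a), ncol = ncol(b))
for(i in seq_len(nrow(a))) {
        for(j in seq_len(ncol(b))) {
                for(k in seq_len(ncol(a))) {
                        prod_loop[i, j] <- prod_loop[i, j] + a[i, k] * b[k, j]
                }
        }
}

# Vectorized version
prod_vec <- a %*% b

all.equal(prod_loop, prod_vec)
```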

Advantages of vectorizing

Clarity: the syntax is about what we're doing
Concision: we write less
Abstraction: the syntax hides how the computer does it
Generality: same syntax works for numbers, vectors, arrays, …
Speed: modifying big vectors over and over is slow in R; work gets done by optimized low-level code

Vectorized calculations

Many functions are set up to vectorize automatically

abs(-3:3)

[1] 3 2 1 0 1 2 3

log(1:7)

[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101

Vectorized conditions: ifelse()

ifelse(x^2 > 1, 2*abs(x)-1, x^2)

1st argument is a Boolean vector, then pick from the 2nd or 3rd vector arguments as TRUE or FALSE
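
A short worked example of the call above, with a made-up input vector x:

```r
x <- c(-2, -0.5, 0, 0.5, 3)

# Where x^2 > 1, take 2*abs(x) - 1; elsewhere take x^2
y <- ifelse(x^2 > 1, 2 * abs(x) - 1, x^2)
print(y)   # 3.00 0.25 0.00 0.25 5.00
```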

Summary

if, nested if, switch
Iteration: for, while
Avoiding iteration with whole-object (“vectorized”) operations
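
switch is listed here but not shown on the earlier slides. A minimal sketch, using circle and square areas as a made-up example (the names type, dimension, and shape_area are illustrative):

```r
type <- "square"
dimension <- 4

# Pick the branch whose name matches type; the unnamed last argument is the fallback
shape_area <- switch(type,
                     circle = pi * dimension^2,
                     square = dimension^2,
                     stop("unknown shape type: ", type))
print(shape_area)   # 16
```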

Functions - Agenda

Defining functions: Tying related commands into bundles
Interfaces: Controlling what the function can see and do
Example: Parameter estimation code

Why Functions?

Data structures tie related values into one object

Functions tie related commands into one object

In both cases: easier to understand, easier to work with, easier to build into larger things

For example

circle.area <- function(r) { return(pi*r^2) }
circle.area(2)

[1] 12.56637

Our functions get used just like the built-in ones:

x=seq(1, 10, 1)
circle.area(x)

 [1]   3.141593  12.566371  28.274334  50.265482  78.539816 113.097336
 [7] 153.938040 201.061930 254.469005 314.159265

Area of different shapes

shapes <- data.frame(type = c('circle', 'square', 'circle', 'square'), dimension = c(1, 2, 3, 4))

type	dimension	area
circle	1	3.141593
square	2	4.000000
circle	3	28.274334
square	4	16.000000

Pseudocode


areas <- vector of 0s to be populated with results

for each shape in shapes:

  if type is circle
      then shape.area is  (shape's dimension)^2*pi
        store shape.area in areas

  else if type is square
      then shape.area is  (shape's dimension)^2
        store shape.area in areas

area<- function(my.dataframe) {

  #start with getting number of entries in the table
  rownum= dim(my.dataframe)[1]

  #create vector that will keep the results
  results<-rep(0, rownum)

  #iterate through the entries calculate the area based on the type
  for (i in 1:rownum){
    dimension=my.dataframe[i,2]
    if (my.dataframe[i,1]=='circle')
        {
        results[i]<-pi*dimension^2
        }
    else if (my.dataframe[i,1]=='square')
        {
        results[i]<- dimension^2
        }
  }
  return(results)
  }

m.results=area(shapes)
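
Since these slides recommend avoiding explicit iteration, the same result can also be sketched with ifelse(). This is an alternative, not the version used above. The name area.vectorized is made up; it assumes the same column layout (type first, dimension second) and, like the loop version, gives 0 for unrecognized types.

```r
area.vectorized <- function(my.dataframe) {
  type <- my.dataframe[, 1]
  dimension <- my.dataframe[, 2]

  # circle: pi * d^2; square: d^2; anything else: 0
  results <- ifelse(type == "circle", pi * dimension^2,
                    ifelse(type == "square", dimension^2, 0))
  return(results)
}

shapes <- data.frame(type = c("circle", "square", "circle", "square"),
                     dimension = c(1, 2, 3, 4))
area.vectorized(shapes)
```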

Interfaces: the inputs or arguments; the outputs or return value

Calls other functions dim(), rep(), operators ^ and ==
could also call other functions we've written

return() says what the output is
alternately, return the last evaluation; I like explicit returns better

Comments: Not required by R, but a Very Good Idea
One-line description of purpose; listing of arguments; listing of outputs

What should be a function?

Things you're going to re-run, especially if it will be re-run with changes
Chunks of code you keep highlighting and hitting return on
Chunks of code which are small parts of bigger analyses
Chunks which are very similar to other chunks

will say more about design later

What the function can see and do

Each function has its own environment
Names here over-ride names in the global environment
Internal environment starts with the named arguments
Assignments inside the function only change the internal environment
There are ways around this, but they are difficult and best avoided; see Chambers, ch. 5, if you must
Names undefined in the function are looked for in the environment where the function was defined (lexical scoping)
not the environment the function gets called from
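
A minimal sketch of how R resolves a name that is not defined inside a function. R uses lexical scoping, so the lookup goes to the environment where the function was defined. The names rate, scale.value, and caller are made up for the example:

```r
rate <- 10

# rate is neither an argument nor a local variable, so R looks it up
# in the environment where scale.value was defined
scale.value <- function(v) {
        return(v * rate)
}

caller <- function() {
        rate <- 1000       # local to caller; scale.value does not see it
        return(scale.value(2))
}

caller()   # 20, not 2000
```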

Internal environment examples

x <- 7
y <- c("A","C","G","T","U")
adder <- function(y) { x<- x+y; return(x) }
adder(1)

[1] 8

x

[1] 7

y

[1] "A" "C" "G" "T" "U"

Internal environment examples cont'd.

circle.area <- function(r) { return(pi*r^2) }
circle.area(c(1,2,3))

[1]  3.141593 12.566371 28.274334

truepi <- pi
pi <- 3   # Valid in 1800s Indiana, or drowned R'lyeh
circle.area(c(1,2,3))

[1]  3 12 27

pi <- truepi      # Restore sanity
circle.area(c(1,2,3))

[1]  3.141593 12.566371 28.274334

Summary

Functions bundle related commands together into objects: easier to re-run, easier to re-use, easier to combine, easier to modify, less risk of error, easier to think about
Interfaces control what the function can see (arguments, environment) and change (its internals, its return value)
Calling functions we define works just like calling built-in functions: named arguments, defaults
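
A minimal sketch of named arguments and defaults in a user-defined function. rect.area and its default are made up for the example:

```r
# height defaults to width, so a single argument gives a square
rect.area <- function(width, height = width) {
        return(width * height)
}

rect.area(3)                       # 9
rect.area(3, 4)                    # 12
rect.area(height = 2, width = 5)   # 10
```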
