Control Structures and Functions

BUAN 327
Yegin Genc

Agenda

  • Conditionals: switching between different calculations
  • Iteration: doing something over and over
  • Vectorizing: Avoiding explicit iteration

Control Structures: if

if(<condition>) {
        ## do something
} else {
        ## do something else
}
if(<condition1>) {
        ## do something
} else if(<condition2>)  {
        ## do something different
} else {
        ## do something different
}
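
As a concrete instance of the templates above, here is a minimal sketch that labels a number by its sign. The variable names and the example value are made up for illustration. Note that the condition inside if() should be a single TRUE or FALSE value.

```r
x <- -2   # example value, chosen for illustration

if (x > 0) {
  label <- "positive"
} else if (x < 0) {
  label <- "negative"
} else {
  label <- "zero"
}

print(label)
```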

if

This is a valid if/else structure.

if(x > 3) {
        y <- 10
} else {
        y <- 0
}

So is this one.

y <- if(x > 3) {
        10
} else { 
        0
}

if

Of course, the else clause is not necessary.

if(<condition1>) {

}

if(<condition2>) {

}

loops

for

for loops take an iterator variable and assign it successive values from a sequence or vector. They are most commonly used for iterating over the elements of an object (list, vector, etc.).

for(i in 1:10) {
        print(i)
}

This loop takes the i variable and in each iteration of the loop gives it values 1, 2, 3, …, 10, and then exits.

for

These four loops have the same behavior.

x <- c("a", "b", "c", "d")

for(i in 1:4) {
        print(x[i])
}

for(i in seq_along(x)) {
        print(x[i])
}

for(letter in x) {
        print(letter)
}

for(i in 1:length(x)) print(x[i])
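
The loops above agree for a non-empty x, but the index-based forms differ when the vector is empty. A small sketch, using an empty character vector as an assumed edge case: 1:length(x) becomes 1:0, which counts down from 1 to 0, while seq_along(x) produces an empty sequence.

```r
empty <- character(0)   # a vector with no elements

n_colon <- 0
for (i in 1:length(empty)) {     # 1:0 is c(1, 0), so the body runs twice
  n_colon <- n_colon + 1
}

n_seq <- 0
for (i in seq_along(empty)) {    # empty sequence, so the body never runs
  n_seq <- n_seq + 1
}

print(n_colon)
print(n_seq)
```

For that reason seq_along(x) is usually the safer choice when the length of x is not known in advance.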

Nested for loops

for loops can be nested.

x <- matrix(1:6, 2, 3)

for(i in 1:nrow(x)) {
        for(j in 1:ncol(x)) {
                print(x[i, j])
        }   
}

Be careful with nesting though. Nesting beyond 2–3 levels is often very difficult to read/understand.

while

While loops begin by testing a condition. If it is true, then they execute the loop body. Once the loop body is executed, the condition is tested again, and so forth.

count <- 0
while(count < 10) {
        print(count)
        count <- count + 1
}

While loops can potentially result in infinite loops if not written properly. Use with care!

while

Sometimes there will be more than one condition in the test.

z <- 5

while(z >= 3 && z <= 10) {
        print(z)
        coin <- rbinom(1, 1, 0.5)

        if(coin == 1) {  ## random walk
                z <- z + 1
        } else {
                z <- z - 1
        } 
}

Conditions are always evaluated from left to right.
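
A minimal sketch of why the order matters: && stops as soon as the left-hand side is FALSE, so a later condition can safely depend on an earlier one. The variable names here are illustrative only.

```r
v <- NULL   # nothing to inspect yet

# !is.null(v) is FALSE, so && never evaluates v[1] > 0.
# Evaluated on its own, v[1] > 0 would be a zero-length value,
# which if() rejects with an error.
if (!is.null(v) && v[1] > 0) {
  msg <- "first element is positive"
} else {
  msg <- "no usable first element"
}

print(msg)
```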

repeat

Repeat initiates an infinite loop; these are not commonly used in statistical applications but they do have their uses. The only way to exit a repeat loop is to call break.

x0 <- 1
tol <- 1e-8

repeat {
        x1 <- computeEstimate()

        if(abs(x1 - x0) < tol) {
                break
        } else {
                x0 <- x1
        } 
}

repeat

The loop in the previous slide is a bit dangerous because there’s no guarantee it will stop. Better to set a hard limit on the number of iterations (e.g. using a for loop) and then report whether convergence was achieved or not.
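
A sketch of that advice, under stated assumptions: computeEstimate() is not defined in these slides, so the update step below is a stand-in (one Newton step toward the square root of 2), and the cap max_iter is an arbitrary choice.

```r
# Stand-in for computeEstimate(): one Newton step toward sqrt(2)
# (an assumption for illustration; the slides do not define the update)
update_estimate <- function(x) 0.5 * (x + 2 / x)

x0 <- 1
tol <- 1e-8
max_iter <- 100      # hard limit on the number of iterations
converged <- FALSE

for (iter in seq_len(max_iter)) {
  x1 <- update_estimate(x0)
  if (abs(x1 - x0) < tol) {
    converged <- TRUE
    break
  }
  x0 <- x1
}

# Report whether convergence was achieved
if (converged) {
  cat("Converged after", iter, "iterations; estimate =", x1, "\n")
} else {
  cat("No convergence within", max_iter, "iterations\n")
}
```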

next, return

next is used to skip an iteration of a loop

for(i in 1:100) {
        if(i <= 20) {
                ## Skip the first 20 iterations
                next 
        }
        ## Do something here
}

return signals that a function should exit and return a given value
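
A minimal sketch combining next and return: the function below skips values it does not care about and exits as soon as it finds a match. The name first_negative is hypothetical.

```r
# Hypothetical helper: position of the first negative element, or NA if none
first_negative <- function(v) {
  for (i in seq_along(v)) {
    if (v[i] >= 0) {
      next            # skip non-negative values
    }
    return(i)         # exit the function immediately with this position
  }
  NA_integer_         # reached only if the loop finishes without returning
}

first_negative(c(3, 0, -1, -5))
first_negative(c(1, 2))
```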

Control Structures

Summary

  • Control structures like if, while, and for allow you to control the flow of an R program

  • Infinite loops should generally be avoided, even if they are theoretically correct.

  • Control structures mentioned here are primarily useful for writing programs; for command-line interactive work, the *apply functions are more useful.

Avoiding iteration

R has many ways of avoiding iteration, by acting on whole objects

  • It's conceptually clearer
  • It leads to simpler code
  • It's faster (sometimes a little, sometimes drastically)

Vectorized arithmetic

How many languages add 2 vectors:

c <- numeric(length(a))
for (i in 1:length(a)) {  c[i] <- a[i] + b[i]  }

How R adds 2 vectors:

a+b

or a triple for() loop for matrix multiplication vs. a %*% b
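
A small sketch checking that the explicit loops and the vectorized forms give the same answers. The vectors and matrices are arbitrary example data.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# Explicit loop, as many languages would write it
loop_sum <- numeric(length(a))
for (i in seq_along(a)) {
  loop_sum[i] <- a[i] + b[i]
}

vec_sum <- a + b                 # the R way
identical(loop_sum, vec_sum)

# Matrix product: triple loop versus %*%
A <- matrix(1:6, nrow = 2)       # 2 x 3
B <- matrix(1:6, nrow = 3)       # 3 x 2
loop_prod <- matrix(0, nrow(A), ncol(B))
for (i in seq_len(nrow(A))) {
  for (j in seq_len(ncol(B))) {
    for (k in seq_len(ncol(A))) {
      loop_prod[i, j] <- loop_prod[i, j] + A[i, k] * B[k, j]
    }
  }
}
all.equal(loop_prod, A %*% B)
```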

Advantages of vectorizing

  • Clarity: the syntax is about what we're doing
  • Concision: we write less
  • Abstraction: the syntax hides how the computer does it
  • Generality: same syntax works for numbers, vectors, arrays, …
  • Speed: modifying big vectors over and over is slow in R; work gets done by optimized low-level code

Vectorized calculations

Many functions are set up to vectorize automatically

abs(-3:3)
[1] 3 2 1 0 1 2 3
log(1:7)
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101

See also apply() from last week

We'll come back to this in great detail later

Vectorized conditions: ifelse()

ifelse(x^2 > 1, 2*abs(x)-1, x^2)

1st argument is a Boolean vector, then pick from the 2nd or 3rd vector arguments as TRUE or FALSE
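
To make the element-by-element selection concrete, here is a sketch with an example vector (the values are chosen for illustration), together with the explicit loop that ifelse() replaces.

```r
x <- c(-2, -0.5, 0, 0.5, 3)   # example values, chosen for illustration

res <- ifelse(x^2 > 1, 2 * abs(x) - 1, x^2)
print(res)

# The same calculation with an explicit loop, for comparison
res_loop <- numeric(length(x))
for (i in seq_along(x)) {
  if (x[i]^2 > 1) {
    res_loop[i] <- 2 * abs(x[i]) - 1
  } else {
    res_loop[i] <- x[i]^2
  }
}
all.equal(res, res_loop)
```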

Summary

  • if, nested if, switch
  • Iteration: for, while
  • Avoiding iteration with whole-object (“vectorized”) operations

Functions - Agenda

  • Defining functions: Tying related commands into bundles
  • Interfaces: Controlling what the function can see and do
  • Example: Parameter estimation code

Why Functions?

Data structures tie related values into one object

Functions tie related commands into one object

In both cases: easier to understand, easier to work with, easier to build into larger things

For example

circle.area <- function(r) { return(pi*r^2) }
circle.area(2)
[1] 12.56637

Our functions get used just like the built-in ones:

x=seq(1, 10, 1)
circle.area(x)
 [1]   3.141593  12.566371  28.274334  50.265482  78.539816 113.097336
 [7] 153.938040 201.061930 254.469005 314.159265

Area of different shapes

shapes <- data.frame(type = c('circle', 'square', 'circle', 'square'),
                     dimension = c(1, 2, 3, 4))

type    dimension       area
circle          1   3.141593
square          2   4.000000
circle          3  28.274334
square          4  16.000000

Pseudocode


areas <- vector of 0s, one per shape, to be filled with results

for each shape in shapes:

  if type is circle
      then shape.area is pi * (shape's dimension)^2
        store shape.area in areas

  else if type is square
      then shape.area is (shape's dimension)^2
        store shape.area in areas

area <- function(my.dataframe) {

  # Purpose: compute the area of each shape in a table
  # Input: data frame with the type in column 1 and the dimension in column 2
  # Output: numeric vector of areas, one per row (0 for unrecognized types)

  # start by getting the number of entries in the table
  rownum <- nrow(my.dataframe)

  # create the vector that will hold the results
  results <- rep(0, rownum)

  # iterate through the entries, calculating the area based on the type
  for (i in seq_len(rownum)) {
    dimension <- my.dataframe[i, 2]
    if (my.dataframe[i, 1] == 'circle') {
      results[i] <- pi * dimension^2
    } else if (my.dataframe[i, 1] == 'square') {
      results[i] <- dimension^2
    }
  }
  return(results)
}

m.results=area(shapes)
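
The same calculation can also be written without an explicit loop. This is a sketch, assuming shapes has a character column type and a numeric column dimension as in the table above. The name area_vectorized is made up here, and the data frame is rebuilt so the block runs on its own.

```r
shapes <- data.frame(
  type = c("circle", "square", "circle", "square"),
  dimension = c(1, 2, 3, 4)
)

# Hypothetical vectorized variant of area(): no explicit loop
area_vectorized <- function(my.dataframe) {
  dim_sq <- my.dataframe$dimension^2
  # circles get pi * d^2, squares get d^2, anything else gets 0
  ifelse(my.dataframe$type == "circle", pi * dim_sq,
         ifelse(my.dataframe$type == "square", dim_sq, 0))
}

shapes$area <- area_vectorized(shapes)
shapes
```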

Interfaces: the inputs or arguments; the outputs or return value

Calls other functions such as rep(), and operators ^ and ==
could also call other functions we've written

return() says what the output is
alternately, return the last evaluation; I like explicit returns better

Comments: Not required by R, but a Very Good Idea
One-line description of purpose; listing of arguments; listing of outputs

What should be a function?

  • Things you're going to re-run, especially if it will be re-run with changes
  • Chunks of code you keep highlighting and hitting return on
  • Chunks of code which are small parts of bigger analyses
  • Chunks which are very similar to other chunks

We will say more about design later

What the function can see and do

  • Each function has its own environment
  • Names here over-ride names in the global environment
  • Internal environment starts with the named arguments
  • Assignments inside the function only change the internal environment
    There are ways around this, but they are difficult and best avoided; see Chambers, ch. 5, if you must
  • Names undefined in the function are looked for in the environment where the function was defined
    (lexical scoping), not the environment it is called from
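
A minimal sketch of the lookup rule for names that are not defined inside a function. R resolves them in the environment where the function was created. The function names below are hypothetical.

```r
x <- "global"

show_x <- function() {
  x                        # x is not defined here, so R looks outward
}

caller <- function() {
  x <- "caller's local"    # local to caller(); show_x() does not see it
  show_x()
}

caller()
```

Because show_x() was defined at the top level, it finds the global x even when it is called from inside caller().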

Internal environment examples

x <- 7
y <- c("A","C","G","T","U")
adder <- function(y) { x<- x+y; return(x) }
adder(1)
[1] 8
x
[1] 7
y
[1] "A" "C" "G" "T" "U"

Internal environment examples cont'd.

circle.area <- function(r) { return(pi*r^2) }
circle.area(c(1,2,3))
[1]  3.141593 12.566371 28.274334
truepi <- pi
pi <- 3   # Valid in 1800s Indiana, or drowned R'lyeh
circle.area(c(1,2,3))
[1]  3 12 27
pi <- truepi      # Restore sanity
circle.area(c(1,2,3))
[1]  3.141593 12.566371 28.274334

Summary

  • Functions bundle related commands together into objects: easier to re-run, easier to re-use, easier to combine, easier to modify, less risk of error, easier to think about
  • Interfaces control what the function can see (arguments, environment) and change (its internals, its return value)
  • Calling functions we define works just like calling built-in functions: named arguments, defaults
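
A short sketch of named arguments and defaults in a user-defined function. The function sector.area and its default are invented for illustration and follow the style of circle.area above.

```r
# Hypothetical function: area of a circular sector with radius r;
# angle defaults to a full circle
sector.area <- function(r, angle = 2 * pi) {
  return(0.5 * r^2 * angle)
}

sector.area(2)                   # uses the default angle
sector.area(r = 2, angle = pi)   # named arguments
sector.area(angle = pi, r = 2)   # names allow any order
```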