Exploring Data with Graphics

BUAN 327
Yegin Genc

  • One of the great strengths of R is its graphics capabilities.
  • Not only is it very easy to generate great-looking graphs, but it is also very simple to extend the standard graphics abilities to include conditional graphics (the same plot drawn separately for each subset of the data).
  • These are very useful both when exploring data and when doing statistical analysis.

Graphical Environments

  • The base package provides the simplest graphs: easy to remember, but suited only to a basic level of analysis.
    plot(), hist()

  • Lattice offers more options for a higher level of analysis.

    • syntax is similar to the base functions
    • visual aspects (color, font, etc.) are harder to customize than in its alternatives (e.g. ggplot2)
  • ggplot2 is also good for a higher level of analysis.

    • very detailed and well-thought-out visual functions
    • syntax is harder to learn (but not too hard to remember once learned.)

Base Graphics

  • plot: generic x-y plotting
  • barplot: bar plots
  • boxplot: box-and-whisker plot
  • hist: histograms
  • pie: pie charts
  • dotchart: Cleveland dot plots
  • image, heatmap, contour, persp: functions to generate image-like plots
  • qqnorm, qqline, qqplot: distribution comparison plots
  • pairs, coplot: display of multivariate data
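
As a minimal sketch of a few of the functions listed above, the following draws three base plots side by side. It assumes the built-in mtcars data set as a stand-in (the slides themselves use mpg from ggplot2), so that no extra package is needed.

```r
# Base graphics sketch using the built-in mtcars data set
# (mtcars is a stand-in chosen for illustration).
data(mtcars)

# Draw three plots side by side.
op <- par(mfrow = c(1, 3))

# plot: generic x-y plot of fuel economy against weight
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "plot()")

# hist: distribution of fuel economy; the return value holds the bin counts
h <- hist(mtcars$mpg, xlab = "Miles per gallon", main = "hist()")

# boxplot: fuel economy grouped by number of cylinders
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Cylinders", ylab = "Miles per gallon",
             main = "boxplot()")

# Restore the previous graphics settings.
par(op)
```

Both hist() and boxplot() return their computed summaries invisibly, so the values behind the picture can be inspected through h and b.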

Lattice vs ggplot2

The jury is still out on which is better.

#install.packages(c('lattice', 'ggplot2')) #if not installed already
library(lattice)
library(ggplot2) # provides the mpg data set used throughout
histogram(~hwy | as.factor(year), data = mpg)

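library(ggplot2)
# binwidth = 2 is an illustrative choice; without it ggplot2 falls back to 30 bins
ggplot(mpg) +
  geom_histogram(aes(x = hwy, fill = as.factor(year)), binwidth = 2) +
  facet_grid(~ year)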

Histograms

# histogram of highway mileage over all cars
histogram(~hwy, mpg)

# one panel per model year (year is numeric, so convert it to a factor)
histogram(~hwy|as.factor(year), mpg)

# one panel per combination of model year and cylinder count
histogram(~hwy|as.factor(year)+as.factor(cyl), mpg)
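
The histogram calls in this section differ only in their formula: the part before | names the variable to plot, and the part after it names the conditioning variables that define the panels. A small sketch of the same pattern, assuming the built-in mtcars data set as a stand-in for mpg:

```r
library(lattice)

# ~ x           : one histogram of x
# ~ x | g       : one panel per level of g
# ~ x | g1 + g2 : one panel per combination of g1 and g2
p1 <- histogram(~ mpg, data = mtcars)
p2 <- histogram(~ mpg | factor(cyl), data = mtcars)
p3 <- histogram(~ mpg | factor(cyl) + factor(am), data = mtcars)

# lattice functions return "trellis" objects; printing one draws it.
print(p3)
```

At the interactive prompt the print() call is implicit, but inside a function or a loop it has to be written out for the plot to appear.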

Density plots

# density of highway mileage, one panel per vehicle class
densityplot(~hwy|class, mpg)

# highway and city mileage overlaid within each panel
densityplot(~hwy+cty|class, mpg)

Q-Q Plots

  • Used to check whether distributional assumptions (e.g. normality) hold.
  • If the points fall close to a straight line, then the assumed distribution is plausible.
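
To make the idea concrete, here is a minimal sketch of what a normal Q-Q plot computes: the sorted observations are plotted against the quantiles a normal distribution would give at the same positions. The data are simulated for illustration, and the names x, sample_q, theo_q, and r are hypothetical.

```r
# Sketch of what a normal Q-Q plot computes (simulated data, for illustration).
set.seed(1)
x <- rnorm(200, mean = 25, sd = 5)

# Sample quantiles: the sorted observations.
sample_q <- sort(x)

# Theoretical quantiles: standard normal quantiles at evenly spread probabilities.
theo_q <- qnorm(ppoints(length(x)))

# Points close to a straight line suggest the normal assumption is reasonable.
plot(theo_q, sample_q,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")

# Base R offers the same plot directly, with a reference line.
qqnorm(x)
qqline(x)

# A correlation near 1 is a rough numeric summary of "close to a line".
r <- cor(theo_q, sample_q)
```

Skewed or heavy-tailed data bend away from the line at the ends, which is usually easier to see in the plot than in a single number such as r.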

qqmath(~hwy, mpg)

#conditional plot
qqmath(~hwy | class, mpg)

Box plots

Also known as box-and-whisker plots, hence the function name bwplot().

bwplot(~hwy, mpg)

# grouped: one box per vehicle class
bwplot(hwy~class, mpg)

# conditional: one box per class, one panel per model year
bwplot(hwy~class|as.factor(year), mpg)
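
For readers who want the numbers behind a box-and-whisker plot, the sketch below computes them with base R. It assumes the built-in mtcars data set as a stand-in for mpg, so that no extra package is needed; the variable name s is hypothetical.

```r
# Numbers behind a box-and-whisker plot, using the built-in mtcars data
# (a stand-in for mpg so that no extra package is needed).
s <- boxplot.stats(mtcars$mpg)

# s$stats holds, in order: lower whisker end, lower hinge, median,
# upper hinge, upper whisker end.
s$stats

# s$out lists the points drawn individually beyond the whiskers.
s$out

# fivenum() gives minimum, lower hinge, median, upper hinge, maximum.
fivenum(mtcars$mpg)
```

The whisker ends differ from the minimum and maximum only when some observations lie far enough from the box to be drawn as separate points.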

Multivariate Plots

Scatter plots

xyplot(hwy~cty, mpg)

xyplot(hwy~cty|manufacturer, mpg)

xyplot(hwy~cty|as.factor(displ), mpg)

Summary

  • One of the best ways to explore data is to visualize it.
  • Graphs help us
    • understand and describe the data
    • spot interesting phenomena
    • ask the right questions