ACM/ESE 118
Methods in Applied Statistics and Data Analysis
Winter 2007


Instructor
Tapio Schneider
112 N. Mudd
tapio@gps.caltech.edu
Office Hours: T 3–4 (or by appointment)

Lectures
TTh 10:30–11:55
Arms, Room 155

First meeting: 4 January 2007

 



Description: Introduction to fundamental ideas and techniques of statistical modeling, with an emphasis on conceptual understanding and on the analysis of real data sets. Assignments will draw on data analysis problems in various science and engineering fields.

Prerequisite: Ma 2 or another introductory course in probability and statistics; knowledge of linear algebra.

Syllabus:

  • Simple linear regression (least squares estimation, analysis of residuals) (Weisberg, ch. 1-2.9)
  • Inferences about model parameters, multiple linear regression (Weisberg, ch. 3.1-3.4)
  • Analysis of variance, comparison of models, model selection (Weisberg, ch. 3.5, 10)
  • Assessing goodness-of-fit, outliers, influential observations (Weisberg, ch. 8, 9)
  • Collinearity and rank-deficiency, singular value decomposition, regularization
  • Ridge regression, regularization by truncated singular value decomposition
  • Choosing regularization parameters (generalized cross-validation, L-curve)
  • Principal component analysis, linear discriminant analysis
  • Hierarchical cluster analysis
  • Resampling methods and the bootstrap

Textbooks:

  1. Weisberg, S., Applied Linear Regression, 3rd ed., Wiley (2005) [required].
  2. Efron, B., and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall (1993).
  3. Hansen, P.-C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, Society for Industrial and Applied Mathematics (1998).
  4. Johnson, R. A., and D. W. Wichern, Applied Multivariate Statistical Analysis, 5th ed., Prentice Hall (2002).
  5. Venables, W. N., and B. D. Ripley, Modern Applied Statistics with S, 4th ed., Springer (2002).

Handouts: All handouts will be stored in a binder in 150 S. Mudd and/or posted online.

Recitation session: Wednesdays 3–4, 162 S. Mudd (but in 155 Arms on January 10)

Teaching assistants and office hours:

  • Agostino Capponi (acapponi@caltech): Wednesdays 6:30–7:30, 160J Jorgensen
  • Roger Donaldson (rdonald@acm): Mondays 1:30–2:30, Red Door
  • Hannes Helgason (hannes@acm): Wednesdays 2–3, 226 Guggenheim

Introduction to statistical computing with R: Wednesday, January 10, 3–3:50, 155 Arms

Grading:

  • Homework assignments: 60%
    • Homework assignments will be distributed on Thursdays and are due in class the following Thursday.
    • Late homework sets will be penalized by 25% of the achievable score per day late (exceptions for medical reasons).
    • There will be 7 or 8 assignments; the lowest score will be dropped from the final grade.
  • Final exam (take-home): 40%.
  • Use of sources without citing them in homework sets or in the final exam will result in a failing grade for the course.
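
As a rough illustration of how the grading rules above might combine, here is a minimal Python sketch. It is not the instructor's official formula. It assumes that all homework sets carry equal weight, that scores are expressed as fractions of the achievable score, that the late penalty cannot push a score below zero, and that the homework component is the average of the sets remaining after the lowest is dropped. The function names homework_score and course_grade are hypothetical.

```python
def homework_score(raw_fraction, days_late=0):
    """Score for one homework set as a fraction of the achievable score.

    Assumption: the penalty is 25% of the achievable score per day late,
    and the score is floored at zero.
    """
    penalty = 0.25 * days_late
    return max(0.0, raw_fraction - penalty)


def course_grade(homework_fractions, final_fraction):
    """Combine homework (60%) and the take-home final (40%).

    Assumption: homework sets are weighted equally and the single
    lowest score is dropped before averaging.
    """
    if len(homework_fractions) < 2:
        raise ValueError("need at least two homework scores to drop one")
    kept = sorted(homework_fractions)[1:]  # drop the lowest score
    homework_average = sum(kept) / len(kept)
    return 0.6 * homework_average + 0.4 * final_fraction


# Hypothetical student: seven sets, one handed in a day late, one missed.
scores = [
    homework_score(0.9),
    homework_score(0.8, days_late=1),
    homework_score(1.0),
    homework_score(0.0),  # missed set, dropped as the lowest score
    homework_score(0.85),
    homework_score(0.95),
    homework_score(0.9),
]
print(round(course_grade(scores, final_fraction=0.8), 3))
```

Under these assumptions, a single missed set costs nothing once it is dropped, while a set handed in four or more days late earns no credit.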