Description:
Introduction to fundamental ideas and techniques of statistical
modeling, with an emphasis on conceptual understanding and on the
analysis of real data sets. Assignments will draw on
data analysis problems in various science and
engineering fields.
Prerequisite:
Ma 2 or another introductory course in probability
and statistics; knowledge of linear algebra
Syllabus:
- Simple linear regression (least squares estimation,
analysis of residuals) (Weisberg, ch. 1-2.9)
- Inferences about model parameters, multiple linear
regression (Weisberg, ch. 3.1-3.4)
- Analysis of variance, comparison of models, model
selection (Weisberg, ch. 3.5, 10)
- Assessing goodness-of-fit, outliers, influential
observations (Weisberg, ch. 8, 9)
- Collinearity and rank-deficiency, singular value
decomposition, regularization
- Ridge regression, regularization by truncated
singular value decomposition
- Choosing regularization parameters (generalized
cross-validation, L-curve)
- Principal component analysis, linear discriminant
analysis
- Hierarchical cluster analysis
- Resampling methods and the bootstrap
Textbooks:
- Weisberg, S., Applied Linear Regression,
3rd ed., Wiley (2005) [required].
- Efron, B., and R. J. Tibshirani, An Introduction to the
Bootstrap, Chapman and Hall (1993).
- Hansen, P.-C., Rank-Deficient and Discrete Ill-Posed
Problems: Numerical Aspects of Linear Inversion,
Society for Industrial and Applied Mathematics (1998).
- Johnson, R. A., and D. W. Wichern, Applied
Multivariate Statistical Analysis, 5th ed.,
Prentice Hall (2002).
- Venables, W. N., and B. D. Ripley, Modern Applied Statistics
with S, 4th ed., Springer (2002).
Handouts:
All handouts will be stored in a binder in 150 S. Mudd and/or
posted online.
Recitation session:
Wednesdays 3–4, 162 S. Mudd (but in 155 Arms on January 10)
Teaching assistants and office hours:
- Agostino Capponi (acapponi@caltech): Wednesdays
6:30–7:30, 160J Jorgensen
- Roger Donaldson (rdonald@acm): Mondays 1:30–2:30, Red Door
- Hannes Helgason (hannes@acm): Wednesdays 2–3, 226
Guggenheim
Introduction to statistical
computing with R:
Wednesday, January 10, 3–3:50, 155 Arms
Grading:
- Homework assignments: 60%
- Homework assignments will be distributed on Thursdays
and are due in class the following Thursday.
- Late homework sets will be penalized by 25% of the
achievable score per day late (exceptions for medical
reasons).
- There will be 7 or 8 assignments; the
lowest score will be dropped from the final grade.
- Final exam (take-home): 40%.
- Use of sources without citing them in homework sets or
in the final exam will result in a failing grade for the course.
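
The grading arithmetic above can be illustrated with a minimal sketch. It rests on assumptions the syllabus does not state: all assignments carry equal weight, scores are expressed as fractions of the achievable score, the late penalty is subtracted from the earned score (floored at zero), and the function names are hypothetical.

```python
def late_penalized(score, max_score, days_late):
    """Subtract 25% of the achievable score per day late (assumed reading), floored at 0."""
    penalty = 0.25 * max_score * days_late
    return max(0.0, score - penalty)


def course_grade(homework_fractions, final_fraction):
    """Combine homework (60%) and final exam (40%), dropping the lowest homework score.

    Assumes equally weighted assignments, each given as a fraction in [0, 1].
    """
    kept = sorted(homework_fractions)[1:]  # drop the single lowest score
    homework_average = sum(kept) / len(kept)
    return 0.6 * homework_average + 0.4 * final_fraction


# Example: an 18/20 assignment handed in one day late, and a term with 7 assignments.
one_day_late = late_penalized(18, 20, 1)
overall = course_grade([0.5, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9], 0.8)
```

Under these assumptions, the single weak assignment (0.5) does not affect the overall grade because it is the one dropped.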