ACM/ESE 118

Methods in Applied Statistics and Data Analysis

Instructor

Tapio Schneider
112 N. Mudd
tapio@caltech.edu

Textbooks (on SFL reserve)

Montgomery, D. C., E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis, 4th ed., Wiley (2006) [required].

Efron, B., and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall (1993).

Hansen, P.-C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, Society for Industrial and Applied Mathematics (1998).

Johnson, R. A., and D. W. Wichern, Applied Multivariate Statistical Analysis, 5th ed., Prentice Hall (2002).

Teaching Assistants

John Bruer (jbruer@caltech.edu)
Tuesdays 4-5pm, 243 Annenberg

Zhihong Tan (tan@caltech.edu)
Mondays 5-6pm, 176 S. Mudd

Schedule

Lectures
TTh 1:00-2:25pm, 155 Arms

Recitation Session
Wednesdays 5:15-6pm, 162 S. Mudd (starting January 12)

 



 

Announcements

1/4 There will be a special recitation session to introduce the R statistics package on Friday, January 7th, 3-4pm in 155 Arms. Example script from tutorial: Rintro.r; example data: forbes.dat
3/1 The solutions to HW sets 1–6 are available at the mailboxes outside 150 S. Mudd.
3/2 Due date for homework 7 moved to 3/9.
3/30 Graded finals are available at the mailboxes outside 150 S. Mudd.

Homework

Homework solutions will be handed out in class and can be picked up at the mailboxes outside 150 S. Mudd.


Grading Policy

  • Homework assignments: 60%
    • Homework assignments will be distributed on Thursdays and are due in class the following Thursday.
    • Late homework sets will be penalized by 25% of the achievable score per day late (exceptions are made for medical reasons).
    • There will be 7 assignments; the lowest score will be dropped from the final grade.
    • Collaboration on homework sets is encouraged, but please turn in solutions individually and state on your solutions with whom you collaborated.
  • Final exam: 40%.
  • Use of sources without citing them in homework sets or in the final exam results in a failing grade for the course.

Schedule and Handouts

Week | Description | Reading/Handouts
1/4 | Simple linear regression (least squares estimation, analysis of residuals) | Syllabus; mean and variance estimation; Montgomery et al., ch. 1-2
1/11 | Inferences about model parameters, confidence intervals, analysis of variance | Montgomery et al., ch. 2; simple linear regression example; sampling distribution of parameter estimates
1/18 | Multiple linear regression, estimation and inferences about parameters | Montgomery et al., ch. 3; multiple regression example (sales)
1/25 | Comparison of models, model selection | Montgomery et al., ch. 8, 9; model comparison example (bats) and original paper; model selection example (crimes)
2/1 | Variable selection with Mallows' Cp and cross-validation; assessing goodness-of-fit, outliers, influential observations | Montgomery et al., ch. 9 and 6; R script to illustrate outliers and influence statistics: outliers.R
2/8 | Collinearity and rank-deficiency, singular value decomposition, regularization by truncated singular value decomposition | Montgomery et al., ch. 11; singular value decomposition
2/15 | Ridge regression; choosing regularization parameters (generalized cross-validation, L-curve); principal component analysis | Ridge regression; choice of regularization parameter
2/22 | Principal component analysis, linear discriminant analysis | PCA example: spike sorting
3/1 | Linear discriminant analysis; resampling methods | LDA chapter from Mardia et al., Multivariate Analysis
3/8 | Resampling methods and the bootstrap | Example 1; example 2

Computing