In the fall of 2013, Harvard University offered CS109 Data Science, an excellent introductory course for anyone who wants a jump start in this exciting field. Most of the class materials, including video lecture archives and slides, are freely available online. This is a fantastic way to get an Ivy League-quality education, albeit without university credit. The course was taught by two Harvard professors: Hanspeter Pfister (Computer Science) and Joe Blitzstein (Statistics).
The course introduces the following aspects of data science:
- Data munging, cleaning, and sampling
- Data management for accessing big data quickly and reliably
- Exploratory data analysis to generate hypotheses and intuition
- Prediction based on statistical methods such as regression and classification
- Communication of results through visualization, stories, and summaries
The course uses Python for all programming assignments and projects. The IPython notebooks for CS109 are available at https://github.com/cs109/content