Course Description

Data mining is used to discover patterns and relationships in data. Emphasis is placed on large, complex data sets such as those found in very large databases or gathered through web mining. In this course, we will study the most common methods and techniques used in analyzing and modeling real-world data. Course topics include linear models, classification, regularization, decision trees, association rules, clustering, and case-based methods.

Course Instructor



Time & Location

Summer Quarter: Jun. - Aug. 2024
Lecture: Monday, Wednesday 4:30 PM - 5:50 PM
Review: Friday 4:30 PM - 5:50 PM
Location: Packard 101

Office Hours

You can find an up-to-date list of times here. We will be hosting office hours both in person and over Zoom (using QueueStatus). Please check specific office hour sessions for details.

Grade Breakdown

  • Four Homeworks: 50 pts each
  • In-person midterm: 100 pts
  • Final Exam: 200 pts
    • Optional: Final Project (200 pts)
    • If both the Final Exam & Project are completed, we will take the max of the two scores, i.e. max(Final Exam, Final Project)
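
As a rough illustration of the breakdown above, the sketch below adds up a point total in Python. The function name `total_points` and the assumption that the components are simply summed (500 points maximum) are ours for illustration; the syllabus does not state how point totals map to letter grades.

```python
# Hypothetical helper: sums course points under the breakdown listed above.
# Assumes a plain sum of components; this is an illustration, not official policy.

MAX_POINTS = 4 * 50 + 100 + 200  # four homeworks + midterm + final component


def total_points(homeworks, midterm, final_exam, final_project=None):
    """Return the point total, using max(Final Exam, Final Project) if both exist."""
    if len(homeworks) != 4:
        raise ValueError("expected exactly four homework scores")
    # The project is optional; when present, the higher of the two scores counts.
    if final_project is None:
        final_component = final_exam
    else:
        final_component = max(final_exam, final_project)
    return sum(homeworks) + midterm + final_component


if __name__ == "__main__":
    score = total_points([40, 45, 50, 35], midterm=80, final_exam=150, final_project=180)
    print(f"{score} / {MAX_POINTS}")
```

In this example the project score (180) replaces the lower exam score (150), so completing the optional project can only help under the max rule.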

Course Discussions

We use Ed for course communication. Any questions regarding course content and course organization should be posted on Ed. You are strongly encouraged to answer other students' questions when you know the answer.

Assignment Details

See here for more details concerning assignments.

Course Project Details

See here for more details concerning the course project.

FAQ

How will the classes be taught?

All lectures this quarter will be presented in person. Recordings will subsequently be uploaded to Canvas.

What are the prerequisites?

For your reference, here are some reviews (taken from CS229):

To supplement lecture material, additional lectures will be held on certain Fridays in person from 4:30 - 5:20 (announced ahead of time). Attendance at these lectures is optional but encouraged.

Is there a textbook for this course?

We rely heavily on An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (Springer, 2nd ed., 2021) for this course. The book is also available at the Stanford Bookstore and free online through the Stanford Libraries.

We also occasionally rely on material and readings from The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman (Springer, 2nd ed., 2009).

Can I audit or sit in?

In general, we are very open to guests sitting in if you are a member of the Stanford community (registered student, staff, and/or faculty). Out of courtesy, we would appreciate it if you first emailed us or talked to the instructor after the first class you attend. If the class is too full and we're running out of space, we ask that you please allow registered students to attend.

What if I cannot make the exam dates?

We must receive prior notification and justification of your impending absence in order to authorize a make-up exam. Messages must be sent by email at least a week prior to the start of the exam. An exam must be made up within one week of the original exam date. There will be no exceptions.

What if I'm taking the exam remotely through SCPD?

Remote SCPD students must designate an "exam monitor" to proctor their exams (local students have the option of taking the exam at Stanford at the standard in-class time in the standard classroom). You will find general information on SCPD exam monitor protocol here.

Please call or email SCPD directly for more information on choosing an exam monitor, where to send exam solutions, etc. You will have a window of 24 hours after the exam time at Stanford to complete and return the exam. Exam-specific instructions (e.g., resources allowed and time limit) will be provided within each exam and also in advance through the website and/or mailing list.


Acknowledgments. HTML taken from various CS courses given at Stanford: cs229, cs231a, cs231n, and cs236.