The course will be lecture-based and will also offer some hands-on tutorials. The project component will be flexible and will involve data collection, manipulation, and analysis. For further details on the course content, please refer to the course outline (pdf). This course is offered by the School of Computer Science at Carleton University.
Seminars are held every Tuesday from 11:35 AM to 2:25 PM in SA 518 (Olga's class), SA 501 (Elio's class), and SA 318 (Majid's class).
Announcements
- Due to COVID-19, the class on March 17th is cancelled and Data Day 7.0 is postponed. Classes on March 24 and April 7 will be offered online (see Slack for details).
- We will be using Slack for course communication, news, and reminders. Please join the DATA 5000 channel by January 8.
- Project teams must be formed and emailed to the instructors no later than January 13.
- Welcome to DATA 5000! Lectures start on January 7th.
Content Overview
The course covers topics relevant to data science: working with data, exploratory data analysis, data mining, and machine learning. The concepts are illustrated using the R language. Students also receive hands-on tutorials (e.g., Tableau, IBM Cognos Analytics). Students will be evaluated on their course projects.
Instructors
Olga Baysal
Email: olga.baysal@carleton.ca
Office: HP 5414
Office hours: by appointment or via Slack
Website: http://olgabaysal.com/
Elio Velazquez
Email: elio.velazquez@carleton.ca
Office: N/A
Office hours: by appointment or via Slack
Website:
Majid Komeili
Email: majid.komeili@carleton.ca
Office: HP 5436
Office hours: by appointment or via Slack
Website: https://people.scs.carleton.ca/~majidkomeili/
Tentative Schedule
This schedule is tentative and may change depending on how the class progresses.
Tuesday January 7, 2020 - Lecture 1: What is Data Science? / Introduction to R (in SA 518).
Tuesday January 14, 2020 - Lecture 2: Working with Data (in SA 518).
Tuesday January 21, 2020 - Lecture 3: Visualization and Exploration.
Tuesday January 28, 2020 - Lecture 4: Data Mining and Machine Learning I.
Tuesday February 4, 2020 - Lecture 5: Machine Learning II.
Tuesday February 11, 2020 - Guest Lecture: Matthew Holden (School of Computer Science) and James Green (Systems and Computer Engineering) (in SA 518).
Tuesday February 18, 2020 - NO CLASS (Winter Break)
Tuesday February 25, 2020 - IBM Watson Studio Tutorial (in SA 518).
Tuesday March 3, 2020 - Tableau Tutorial (in SA 518).
Tuesday March 10, 2020 - IBM Cognos Analytics Tutorial by Omar Khan, IBM Canada (in SA 518).
Tuesday March 17, 2020 - Cancelled due to COVID-19.
Tuesday March 24, 2020 - Guest Lecture: Tracey Lauriault, School of Journalism and Communication (via cuLearn).
Tuesday March 31, 2020 - Data Day 7.0 is postponed.
Tuesday April 7, 2020 - Project Presentations (via cuLearn).
Evaluation
- Paper presentation: 10% (paper selection due January 21)
- Project proposal: 10% (due January 21, 11:59 PM)
- Presentation outline: 5% (due March 10, 11:59 PM)
- Poster presentation: 10% (March 24)
- Project presentation: 15% (April 7)
- Project report: 50% (due April 14, 11:59 PM)
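As a rough illustration of how these weights combine, the following Python sketch computes a weighted course grade. It assumes each component is marked out of 100 and that the components are combined linearly; the page states only the weights, so both points are assumptions. The names `WEIGHTS`, `final_grade`, and `example_scores`, and the sample scores themselves, are hypothetical.

```python
# Component weights (percent), copied from the Evaluation list above.
WEIGHTS = {
    "paper_presentation": 10,
    "project_proposal": 10,
    "presentation_outline": 5,
    "poster_presentation": 10,
    "project_presentation": 15,
    "project_report": 50,
}


def final_grade(scores):
    """Return the weighted grade (0-100) from per-component scores (0-100).

    Assumption: components are combined as a simple weighted average.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "paper_presentation": 80,
    "project_proposal": 90,
    "presentation_outline": 100,
    "poster_presentation": 70,
    "project_presentation": 85,
    "project_report": 75,
}

if __name__ == "__main__":
    print(final_grade(example_scores))
```

Because the project report carries half of the total weight, a given change in the report score moves the final grade five times as much as the same change in the paper presentation score.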
Paper presentation
Each group needs to choose a conference publication on the topic of Data Science to present in class (15-minute talk). Paper selection is due January 21, 2020. The paper should be an 8-12 page publication from conference proceedings (e.g., IEEE International Conference on Data Science, SIGKDD/KDD Conference) and must be approved by the instructor. Presentations will be scheduled throughout the course, during class time (11:35 to 14:25).
Project proposal
The project forms an integral part of this course. The project is to be completed in groups of two students.
You have two options: you can choose to mine and analyze one of the provided datasets or come up with an idea of your own that relates to the course material. In either case, the project topic will require my approval (via the project proposal).
Before you undertake your project, you will need to submit a proposal for approval. The proposal should be short (a PDF of at most 2 pages in ACM format). It should include a problem statement, the motivation for the project, and the set of objectives you aim to accomplish. I will read these and provide comments. The proposal is due on January 21 by 11:59 PM via email to your class's instructor (Olga, Elio, or Majid).
Presentation outline
You will need to submit a project presentation outline describing the structure of your slides and their preliminary content (in PDF format). The outline is due on March 10 by 11:59 PM via email.
Poster presentation
Each group will have the opportunity to present their project poster during the poster presentation day on March 24 (in SA 518). An independent jury will evaluate the posters and select winners.
Project presentation
Each group will have the opportunity to present their project in class on April 7. This presentation should take the form of a 15-minute (hard maximum) conference-style talk and describe the motivation for your work, what you did, and what you found. If a demo is the best way to describe what you did, feel free to include one in the middle of the talk. Please allow 3-5 minutes for questions after the presentation.
The proposed structure of your presentation:
- Introduction (describe the problem and motivation)
- Research questions
- Methodology: data collection, data cleanup, data mining, data analysis (statistics, machine learning), etc.
- Results (achieved, preliminary, or anticipated)
- Implications (why does this study matter? how can your findings be used?)
- Conclusion (summary, main contributions)
Project report
The required length of the written report varies from project to project (8-10 pages, double-column format); all reports must be formatted according to the ACM format and submitted as a PDF. The report constitutes 50% of the course grade and is due on April 14 by 11:59 PM via email.
Datasets
- Defence Research and Development Canada / Government of Canada (posted on Slack)
- GitHub repository via GHTorrent
- MSR Mining Challenge datasets (various datasets for different years)
- Tera-PROMISE repository
- Open Data @ Government of Canada
- Machine learning data set repo
- Kaggle Datasets
- Kaggle Competitions
- IAPR
- Datamob
- KDnuggets:
- in R
Resources
The following books are suggested but not required:
- "Doing Data Science: Straight Talk From the Frontline" by Cathy O'Neil and Rachel Schutt, O'Reilly Media, 2013
- "Data Mining and Business Analytics with R" by Johannes Ledolter, Wiley, 2013
- "Data Science for Business: what you need to know about data mining and data-analytic thinking" by Foster Provost and Tom Fawcett, O'Reilly Media, 2013.
The following books are good references for data mining and machine learning algorithms:
- "An Introduction to Statistical Learning: with Applications in R" by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, Springer, 2013
- "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer, 2011.
The following are good references for R (just to name a few):
- "Cookbook for R" by Winston Chang
- "The R Inferno" by Patrick Burns
- Quick-R
- "Software for Data Analysis: Programming with R" by John Chambers, Springer, 2008.
Contact
The best way to get in touch with an instructor is via email: olga.baysal[at]carleton.ca, elio.velazquez[at]cmail.carleton.ca, or majid.komeili[at]carleton.ca. However, for any public course-related communication we will be using the DATA 5000 Slack channel. For private messages, please email the instructor directly or send a private message on Slack.
University Policies
Academic Integrity
Academic Integrity is everyone’s business because academic dishonesty affects the quality of every Carleton degree. Each year students are caught in violation of academic integrity and found guilty of plagiarism and cheating. In many instances they could have avoided failing an assignment or a course simply by learning the proper rules of citation. See the academic integrity page for more information.
Academic Accommodations for Students with Disabilities
The Paul Menton Centre for Students with Disabilities (PMC) provides services to students with Learning Disabilities (LD), psychiatric/mental health disabilities, Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorders (ASD), chronic medical conditions, and impairments in mobility, hearing, and vision. If you have a disability requiring academic accommodations in this course, please contact PMC at 613-520-6608 or pmc@carleton.ca for a formal evaluation. If you are already registered with the PMC, contact your PMC coordinator to send me your Letter of Accommodation at the beginning of the term, and no later than two weeks before the first in-class scheduled test or exam requiring accommodation (if applicable). After requesting accommodation from PMC, meet with me to ensure accommodation arrangements are made. Please consult the PMC website for the deadline to request accommodations for the formally-scheduled exam (if applicable).
Religious Obligation
Write to the instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website.