DATA 5000: Introduction to Data Science

The course will be lecture-based and will also offer some hands-on tutorials. The project component will be flexible and will involve data collection, manipulation, and analysis. For further details on the course content, please refer to the DATA 5000 Outline (PDF). This course is offered by the School of Computer Science at Carleton University.

Seminars are held every Thursday from 11:35 AM to 2:25 PM via Zoom (see Discord for links).

Announcements

  • We will be using Discord for course communication, announcements, and reminders. You will receive an email invitation to join our channel (please email the instructors if you have any issues).
  • Project teams must be formed and emailed to the instructors no later than January 18.
  • Welcome to DATA 5000! Lectures start on January 12th.

Content Overview

The course covers topics relevant to data science: working with data, exploratory data analysis, data mining, and machine learning. The concepts are illustrated using the R language. Students also take part in hands-on tutorials (e.g., Tableau, IBM Cognos Analytics). Students will be evaluated primarily on their course projects.

Instructors

Olga Baysal
Email: olga.baysal[at]carleton.ca
Office: HP 5431
Office hours: by appointment via Zoom or Discord
Website: http://olgabaysal.com/

Ahmed El-Roby
Email: ahmed.elroby[at]carleton.ca
Office: HP 5433
Office hours: by appointment via Zoom or Discord
Website: https://people.scs.carleton.ca/~ahmedelroby/

Majid Komeili
Email: majid.komeili[at]carleton.ca
Office: HP 5436
Office hours: by appointment via Zoom or Discord
Website: https://people.scs.carleton.ca/~majidkomeili/

Teaching Assistant

Yingjun Dai
Email: yingjundai[at]cmail.carleton.ca
Office hours: TBA

Tentative Schedule

Note that this schedule is tentative and may change depending on how the class progresses.

Thursday January 12, 2023 - Lecture 1: What is Data Science?

Thursday January 19, 2023 - Lecture 2: Working with Data.

Thursday January 26, 2023 - Lecture 3: Visualization and Exploration.

Thursday February 2, 2023 - Lecture 4: Data Mining and Machine Learning I.

Thursday February 9, 2023 - Lecture 5: Machine Learning II.

Thursday February 16, 2023 - Paper presentations.

Thursday February 23, 2023 - No Class (Winter Break)

Thursday March 2, 2023 - Tableau Tutorial by Josh Gillmore.

Thursday March 9, 2023 - IBM Cognos Analytics Tutorial by Matthew Denham.

Thursday March 16, 2023 - Microsoft Tutorial by Mohamed Sharaf.

Thursday March 23, 2023 - Guest Lectures by Prof. Tracey Lauriault, School of Journalism and Communication, and Prof. Mohamed Al Guindy, Sprott School of Business.

Thursday March 30, 2023 - No Class (Poster presentations are on March 28 at Data Day 9.0).

Thursday April 6, 2023 - Project Presentations.

Evaluation

  • Paper presentation: 10% (paper selection due February 2)
  • Project proposal: 10% (due January 26, 11:59 PM)
  • Poster draft: 5% (due March 16, 11:59 PM)
  • Poster submission: not graded separately (due March 23, 11:59 PM)
  • Poster presentation: 15% (March 28)
  • Project presentation: 10% (April 6)
  • Project report: 50% (due April 13, 11:59 PM)

Paper presentation

Each group needs to choose a conference publication on the topic of data science to present in class (15-minute talk). The paper selection is due February 2, 2023. The chosen paper must be an 8-12 page conference paper (e.g., from the IEEE International Conference on Data Science or the SIGKDD/KDD Conference) and must be approved by the instructor. Papers will be presented on February 16.

Project proposal

The project forms an integral part of this course. The project is to be completed in groups of two to three students. Each group should have one technical expert (a student from Computer Science, Systems and Computer Engineering, Information Technology, Physics, or Chemistry) and one or two domain experts (e.g., from Communication, Geography, Biology, History, Psychology, Economics, Business, Health Sciences, Cognitive Science, Public Policy and Administration, or International Affairs). Domain experts may contribute by identifying the right problem, justifying why it is important to study, and articulating the value and implications of the work. Technical experts do the heavy lifting of building models. The main goal is for students to learn how to work in a multidisciplinary team: domain experts learn technical terminology, while technical experts learn how to work productively with domain experts.

You have two options: you can mine and analyze one of the provided datasets, or you can come up with your own idea that relates to the course material. In either case, the project topic requires the instructors' approval (via the project proposal).

Before you undertake your project, you will need to submit a proposal for approval. The proposal should be short (a PDF of at most 2 pages in ACM format). It should include a problem statement, the motivation for the project, and a set of objectives you aim to accomplish. The instructors will read the proposals and provide comments. The proposal is due on January 26 by 11:59 PM via email to the appropriate instructor (Olga, Majid, or Ahmed).

Poster draft

You need to submit a draft of your poster, including its structure and content, in PDF format. The instructors will review the drafts and offer feedback. The draft is due on March 16 by 11:59 PM via email.

Poster presentation

Each group will have the opportunity to present its project poster during the Data Day 9.0 poster competition on March 28. An independent jury will evaluate the posters and select the winners.

Project presentation

Each group will have the opportunity to present its project in class on April 6. The presentation should take the form of a 20-minute (hard maximum) conference-style talk that describes the motivation for your work, what you did, and what you found. If a demo is the best way to show what you did, feel free to include one in the middle of the talk. Please allow 3-5 minutes for questions after your presentation.

The proposed structure of your presentation:

  1. Introduction (describe the problem and motivation)
  2. Research questions
  3. Methodology: data collection, data cleanup, data mining, data analysis (statistics, machine learning), etc.
  4. Results (achieved, preliminary, or anticipated)
  5. Implications (why does this study matter? how can your findings be used?)
  6. Conclusion (summary, main contributions)

Project report

The length of the written report will vary from project to project (8-10 pages, double-column format); all reports must follow the ACM or IEEE format and be submitted as a PDF. The report constitutes 50% of the course grade. It is due on April 13 by 11:59 PM via email.

Datasets

Resources

The following books are suggested but not required:

The following books are good references for data mining and machine learning algorithms:

The following are good references for R (just to name a few):

Contact

The best way to get in touch with the instructors is via email: olga.baysal[at]carleton.ca, ahmedelroby[at]cunet.carleton.ca, or majid.komeili[at]carleton.ca. However, for any public course-related communication, we will use the DATA 5000 Discord channel. For private messages, please email an instructor directly or send a private message on Discord.

University Policies

Student Academic Integrity Policy

Academic Integrity is everyone's business because academic dishonesty affects the quality of every Carleton degree. Each year students are caught in violation of academic integrity and found guilty of plagiarism and cheating. In many instances they could have avoided failing an assignment or a course simply by learning the proper rules of citation. See Academic Integrity for more information.

Plagiarism

As defined by Senate, "plagiarism is presenting, whether intentional or not, the ideas, expression of ideas or work of others as one's own". Such reported offences will be reviewed by the office of the Dean of Science. Standard penalty guidelines can be found here: https://science.carleton.ca/academic-integrity/.

Academic Accommodations for Students with Disabilities

The Paul Menton Centre for Students with Disabilities (PMC) provides services to students with Learning Disabilities (LD), psychiatric/mental health disabilities, Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorders (ASD), chronic medical conditions, and impairments in mobility, hearing, and vision. If you have a disability requiring academic accommodations in this course, please contact PMC at 613-520-6608 or pmc@carleton.ca for a formal evaluation. If you are already registered with the PMC, contact your PMC coordinator to send me your Letter of Accommodation at the beginning of the term, and no later than two weeks before the first in-class scheduled test or exam requiring accommodation (if applicable). After requesting accommodation from PMC, meet with me to ensure accommodation arrangements are made. Please consult the PMC website for the deadline to request accommodations for the formally-scheduled exam (if applicable).

Religious Obligation

Write to the instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website.

Pregnancy Obligation

Write to the instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website.

Survivors of Sexual Violence

As a community, Carleton University is committed to maintaining a positive learning, working and living environment where sexual violence will not be tolerated, and survivors are supported through academic accommodations as per Carleton's Sexual Violence Policy. For more information about the services available at the university and to obtain information about sexual violence and/or support, visit: carleton.ca/sexual-violence-support.