DATA 5000: Introduction to Data Science

The course is lecture-based and also offers some hands-on tutorials. The project component is flexible and involves data collection, manipulation, and analysis. For further details on the course content, please refer to the course outline (pdf). This course is offered by the School of Computer Science at Carleton University.

Seminars are held every Monday from 8:35 AM to 11:25 AM in SA 311/SA 505.


  • Project reports are due April 11th by 11:59 pm.
  • Classes on March 28th and April 4th will be held in SA 311 (Olga's teams) and SA 505 (Boyan's teams). Please bring your laptops to class!
  • Guest Lectures on March 14 and March 21 will be in HP 5345.
  • IBM Watson Analytics lecture is on Monday March 07 in RB 3201.
  • IBM SPSS lecture is on Monday February 22 in Tory 447.
  • Remember to bring your laptops for the IBM lecture on Monday February 1 (HP 5345).
  • Project teams are formed and posted on Slack. Project proposals are due Monday January 25.
  • You should have received an invitation email to join Slack today (January 11). If you haven't, please email me your email address.
  • Project teams must be formed and emailed to the instructor no later than January 17.
  • Welcome to DATA 5000! Lectures start on January 11th.

Content Overview

The course covers topics relevant to data science: working with data, exploratory data analysis, data mining, and machine learning. The concepts are illustrated using the R language. Students also receive an introduction to IBM Cognos Workspace, IBM Watson Analytics, and IBM SPSS Modeler. Students are evaluated on their course projects.


Olga Baysal
Office: HP 5125D
Office hours: by appointment or via Slack

Boyan Bejanov
Office: N/A
Office hours: by appointment or via Slack

Tentative Schedule

This schedule is tentative and may change depending on how the class progresses.

Monday January 11, 2016 - Lecture 1

Monday January 18, 2016 - Lecture 2

Working with Data

Monday February 1, 2016 - IBM Cognos Workspace Tutorial (in HP 5345) by Michael Symchych.

Monday February 15, 2016 - NO CLASS (reading week)

Monday February 22, 2016 - IBM SPSS Tutorial (in Tory 447) by Rebecca Twose.

Monday February 29, 2016 - Lecture 5

Machine Learning II (data files: pts_2clusters.csv, MarketBasket.csv)

Introduction to NoSQL

Monday March 7, 2016 - IBM Watson Analytics Tutorial (RB 3201) by Mark Zlamal.

Monday March 14, 2016 - Guest Lectures (HP 5345)

  • Stephan Jou (Interset) (slides)
  • Prof. Alex Ramirez (Sprott) (slides)
  • Prof. Alex Wong (Biology) (slides)

Monday March 21, 2016 - Guest Lectures (HP 5345)

  • Prof. Merlyna Lim (Journalism and Communication) (slides)
  • Prof. James Green (Systems and Computer Engineering) (slides)

Monday March 28, 2016 - Project Presentations (SA 311 - Olga's teams; SA 505 - Boyan's teams)

SA 311:

SA 505:

Monday April 4, 2016 - Project Presentations (SA 311 - Olga's teams; SA 505 - Boyan's teams)

SA 311:

SA 505:


  • Project proposal: 10%
  • Presentation outline: 10%
  • Project presentation: 30%
  • Project report: 50%
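
As a quick illustration of how these weights combine into a final grade, here is a minimal Python sketch. It assumes each component is marked out of 100; the function name, the dictionary keys, and the example scores are hypothetical and are not part of the course materials.

```python
# Weights (in percent) taken from the grading breakdown above.
WEIGHTS = {
    "proposal": 10,
    "presentation_outline": 10,
    "presentation": 30,
    "report": 50,
}


def final_grade(scores):
    """Return the weighted final grade (0-100).

    `scores` maps each component name in WEIGHTS to a mark out of 100.
    Assumption: every component is marked on the same 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical marks for one team, for illustration only.
example_scores = {
    "proposal": 80,
    "presentation_outline": 90,
    "presentation": 70,
    "report": 85,
}
print(final_grade(example_scores))
```

Because the report carries half of the weight, a change of 10 marks on the report moves the final grade by 5 points, while the same change on the proposal moves it by only 1.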

Project proposal

The project forms an integral part of this course. The project is to be completed in groups of two students.

You have two options: you can choose to mine and analyze one of the provided datasets or come up with an idea of your own that relates to the course material. In either case, the project topic will require my approval (via the project proposal).

Before you undertake your project, you need to submit a proposal for approval. The proposal should be short (a PDF of at most 2 pages in ACM format). It should include a problem statement, the motivation for the project, and the set of objectives you aim to accomplish. I will read the proposals and provide comments. The proposal is due on January 25 by 11:59 PM via email.

Presentation outline

You need to submit a project presentation outline (in PDF format) describing the structure of your slides and their preliminary content. The outline is due on March 17 by 11:59 PM via email.

Project presentation

Each group will present its project in class on March 28 or April 4. The presentation should take the form of a 20-minute (hard maximum) conference-style talk that describes the motivation for your work, what you did, and what you found. If a demo is the best way to describe what you did, feel free to include one in the middle of the talk. Please allocate 3-5 minutes for questions after the presentation.

The proposed structure of your presentation:

  1. Introduction (describe the problem and motivation)
  2. Research questions
  3. Methodology: data collection, data cleanup, data mining, data analysis (statistics, machine learning), etc.
  4. Results (achieved, preliminary, or anticipated)
  5. Implications (why does this study matter? how can your findings be used?)
  6. Conclusion (summary, main contributions)

Project report

The written report should be 8-10 pages in double-column format, with the exact length varying from project to project. All reports must follow the ACM format and be submitted as a PDF. The report constitutes 50% of the course grade and is due on April 11 by 11:59 PM via email.



The following books are suggested but not required:

The following books are good references for data mining and machine learning algorithms:

The following are good references for R (just to name a few):


The best way to get in touch with the instructors is via email: olga.baysal[at] or boyanbejanov[at]. However, for any public course-related communication we will use the DATA 5000 Slack channel. For private messages, please email the instructor directly.

University Policies

Academic Integrity

Academic integrity is everyone’s business because academic dishonesty affects the quality of every Carleton degree. Each year, students are caught violating academic integrity and found guilty of plagiarism and cheating. In many instances, they could have avoided failing an assignment or a course simply by learning the proper rules of citation. See the academic integrity page for more information.

Academic Accommodations for Students with Disabilities

The Paul Menton Centre for Students with Disabilities (PMC) provides services to students with Learning Disabilities (LD), psychiatric/mental health disabilities, Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorders (ASD), chronic medical conditions, and impairments in mobility, hearing, and vision. If you have a disability requiring academic accommodations in this course, please contact PMC at 613-520-6608 or for a formal evaluation. If you are already registered with the PMC, contact your PMC coordinator to send me your Letter of Accommodation at the beginning of the term, and no later than two weeks before the first in-class scheduled test or exam requiring accommodation (if applicable). After requesting accommodation from PMC, meet with me to ensure accommodation arrangements are made. Please consult the PMC website for the deadline to request accommodations for the formally-scheduled exam (if applicable).

Religious Obligation

Write to the instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website.