COMP 5117: Mining Software Repositories

Software development projects generate impressive amounts of data. Mining software repositories research aims to extract information from the various artifacts produced during the evolution of a software system and to infer the relationships between them. This course will introduce the methods and tools of mining software repositories and artifacts used by software developers and researchers. The course will be seminar-based and will involve weekly reading and discussion. The project component will be flexible but will likely involve some programming. For further details on the course content, please refer to the course outline (PDF). This course is offered by the School of Computer Science at Carleton University.

Seminars are held every Tuesday from 11:35 AM to 2:25 PM via Zoom Meeting (meeting details are posted on Slack).


  • Submit your paper review (due 11:59 PM every Monday; latest by 11:00 AM on Tuesday).
  • Please send me your list of three to five preferred papers by Tuesday September 22, 2020.
  • We will be using Slack for course communication, news, and reminders. Please join the COMP5117 channel.
  • Welcome to COMP5117! Our lectures start on Tuesday September 15, 2020.

Content Overview

The course will be adjusted according to students’ interests and experience. This is an overview of the kinds of topics the course could cover:

  • Mining software repositories (data extraction and analysis)
  • Development team processes
  • Software development tools and environments
  • Software analytics
  • Software visualization
  • Mining social data
  • Software evolution
  • Quantitative and qualitative evaluation of software engineering research

Tentative Schedule

Note that this schedule is tentative and will change based on your interests and on how the class is progressing.

Tuesday September 15 - Introduction

  1. Introduction to the course.
    Presented by Olga Baysal.

Tuesday September 22 - Vision

  1. Danielle Gonzalez, Thomas Zimmermann, and Nachiappan Nagappan. "The State of the ML-universe: 10 Years of Artificial Intelligence & Machine Learning Software Development on GitHub". MSR 2020.
    Presented by Nicholas Dmytryk.
  2. Nicolas Gold, Jens Krinke. "Ethical Mining – A Case Study on MSR Mining Challenges". MSR 2020.
    Presented by Shriya Satish.
  3. Rolf-Helge Pfeiffer. "What constitutes Software? An Empirical, Descriptive Study of Artifacts". MSR 2020.
    Presented by Geetika Sharma.

Tuesday September 29 - Program Comprehension

  1. Sarah Fakhoury, Devjeet Roy, Adnan Hassan, and Venera Arnaoudova. "Improving Source Code Readability: Theory and Practice". ICPC 2019.
    Presented by Yaqing Zhu.
  2. Nischal Shrestha, Colton Botta, Titus Barik, and Chris Parnin. "Here We Go Again: Why Is It Difficult for Developers to Learn Another Programming Language?". ICSE 2020.
    Presented by Zixun Xiang.
  3. Shubhankar Suman Singh and Smruti R. Sarangi. "SoftMon: A tool to compare similar open-source software from a performance perspective". MSR 2020.
    Presented by Saranya Kamangode.

Tuesday October 6 - NO CLASS

Tuesday October 13 - Mining Social Media Data

  1. Ahmad Abdellatif, Diego Costa, Khaled Badran, Rabe Abdalkareem, Emad Shihab. "Challenges in Chatbot Development: A Study of Stack Overflow Posts". MSR 2020.
    Presented by Projna Saha.
  2. Hongbo Fang, Daniel Klug, Hemank Lamba, James Herbsleb, Bogdan Vasilescu. "Need for tweet. How open-source developers use Twitter to talk about their GitHub work". MSR 2020.
    Presented by Ravisha Sharma.
  3. Ross C. Phillips, Denise Gorse. "Mutual-Excitation of Cryptocurrency Market Returns and Social Media Topics". ICFET 2018.
    Presented by Davoud Saljoughi.

Tuesday October 20 - Bugs and Failures

  1. João Felipe Pimentel, Leonardo Murta, Vanessa Braganholo, Juliana Freire. "A large-scale study about quality and reproducibility of Jupyter notebooks". MSR 2019.
    Presented by Rezwan Hassan Khan.
  2. Yiwen Wu, Yang Zhang, Tao Wang, Huaimin Wang. "A Tale of Docker Build Failures: A Preliminary Study". MSR 2020.
    Presented by Srikanth Bandapally.

Tuesday October 27 - NO CLASS (Reading Week)

Tuesday November 3 - Mining Android Data

  1. Paolo Calciati, Konstantin Kuznetsov, Alessandra Gorla, Andreas Zeller. "Automatically Granted Permissions in Android apps". MSR 2020.
    Presented by Sreeram Sankarasubramanian.
  2. John Jenkins, Haipeng Cai. "Leveraging Historical Versions of Android Apps for Efficient and Precise Taint Analysis". MSR 2018.
    Presented by Pushkar Thakkar.
  3. Haoyu Wang, Hao Li, Li Li, Yao Guo, Guoai Xu. "Why are Android Apps Removed From Google Play? A Large-scale Empirical Study". MSR 2018.
    Presented by Komal Nayyar.

Tuesday November 10 - Collaborative Development

  1. Viktoria Stray, Nils Brede Moe, Mehdi Noroozi. "Slack Me If You Can! Using Enterprise Social Networking Tools in Virtual Agile Teams". ICGSE 2019.
    Presented by Hema Sri Kambhampati.
  2. Reed Milewicz, Gustavo H Pinto, Paige Rodeghero. "Characterizing the roles of contributors in open-source scientific software projects". MSR 2019.
    Presented by Saneer Gera.
  3. Mohamad Mortada, Hamdy Michael Ayas, Regina Hebig. "Why do Software Teams Deviate from Scrum? Reasons and Implications". ICSSP 2020.
    Presented by Shriya Nitin Singhania.

Tuesday November 17 - Sentiment Analysis

  1. Nicole Novielli, Daniela Girardi, Filippo Lanubile. "A Benchmark Study on Sentiment Analysis for Software Engineering Research". MSR 2018.
    Presented by Lakshmi Prasanna.
  2. Eeshita Biswas, K Vijay-Shanker, Lori L Pollock. "Exploring Word Embedding Techniques to Improve Sentiment Analysis of Software Engineering Texts". MSR 2019.
    Presented by Alekhya Ketharaju.

Tuesday November 24 - Machine Learning/Deep Learning

  1. Hadhemi Jebnoun, Houssem Ben Braiek, Mohammad Masudur Rahman, and Foutse Khomh. "The Scent of Deep Learning Code: An Empirical Study". MSR 2020.
    Presented by Ummey Tanin.
  2. Tyson Bulmer, Lloyd Montgomery, Daniela Damian. "Predicting Developers’ IDE Commands with Machine Learning". MSR 2018.
    Presented by Nerosha Senthil Kumar.
  3. Saikat Mondal, Mohammad Masudur Rahman, Chanchal K. Roy. "Can Issues Reported at Stack Overflow Questions be Reproduced? An Exploratory Study". MSR 2019.
    Presented by Nawab Haider Ghani.

Tuesday December 1 - Mixed Topics (Technical Debt, Games, Vulnerability Prediction)

  1. Saulo Soares de Toledo, Antonio Martini, Agata Przybyszewska, Dag I.K. Sjøberg. "Architectural Technical Debt in Microservices: A Case Study in a Large Company". TechDebt 2019.
    Presented by Srivathsan Morkonda.
  2. Luca Pascarella, Fabio Palomba, Massimiliano Di Penta, Alberto Bacchelli. "How Is Video Game Development Different from Software Development in Open Source?". MSR 2018.
    Presented by Ikram Hussain.
  3. Chen Yang, Andrew Santosa, Ang Ming Yi, Abhishek Sharma, Asankhaya Sharma, David Lo. "A Machine Learning Approach for Vulnerability Curation". MSR 2020.
    Presented by Dafei Zhao.

Tuesday December 8 - Project Presentations (15 minutes for each team)

Grading

  • Weekly paper reviews: 10%
  • Class participation and discussion: 20%
  • Paper presentation: 10%
  • Course Project: 60% (10% project presentation + 50% project report)

Weekly Paper Reviews

Each week you are expected to carefully read the assigned papers (typically two or three). In addition, you are to submit a review of one of them (your choice). If you are presenting a paper that week, you are excused from submitting a review.

Reviews are due by 11:00 AM on the morning of class. Please email your review to me with the subject line "[COMP 5117] Paper Review Student_Name".

A review should be about 500-1000 words long and submitted as a PDF file.

Your review should address the following points:

  1. What were the primary contributions of the paper as the authors see them?
  2. What were the main contributions of the paper as you (the reader) see them?
  3. How does this work move the research forward (or how does the work apply to you)?
  4. How was the work validated?
  5. How could this research be extended?
  6. How could this research be applied in practice?

Class Participation

Each week you are expected to read all of the papers being presented and to participate in the class discussion.

Paper Presentations

In a typical week, we will examine two or three research papers. I may present a few of them myself; the remaining presentations will be given by students.

You will select three to five papers that you would like to present, ranked from your first to your last preference. Please make your selections from the proceedings of the MSR or ICSE conferences (2018-2020): MSR 2020, MSR 2019, MSR 2018, ICSE 2020, ICSE 2019, ICSE 2018. Once you have selected your papers, email me your ranked list by Tuesday September 22. I will put together a cohesive class schedule once everyone has made their selections. Each student will be assigned to present one or two papers in class, depending on the class size.

You are then to design a presentation of about 20-25 minutes that is both informative and entertaining. Don't feel limited to just the content of the papers.

You should also come prepared with a set of questions for a 15-20 minute discussion that you will lead after your presentation (this is where the other students earn their class participation marks).

When you design your talk, keep in mind that the audience has already read the papers. Remind us of the motivation, the big ideas, the context of the problem being addressed, and how all of this relates to what we've already seen in the course.

Presentations can be prepared using OpenOffice, PowerPoint, Keynote, or as a PDF. You must share your slides (PDF only) on the Slack channel before your talk.

Course Project

The project forms an integral part of this course. Projects can be done individually or in groups of two students.

You have two options: either create a submission for the 2021 MSR Mining Challenge or come up with an idea of your own that relates to the course material. In either case, the project topic will require my approval (via the proposal).

If you choose the MSR challenge, you may also submit your work to the conference; note, however, that the submission deadline is in February 2021. Talk to me if you are interested in exploring this. Otherwise, you can simply do the challenge as your class project and skip the conference submission.

There are three deliverables for your project:

  1. Project proposal. Before you undertake your project, you will need to submit a proposal for approval. The proposal should be short (at most two pages, PDF, ACM format) and should include a problem statement, the motivation for the project, and a set of objectives you aim to accomplish. I will read these and provide comments. The proposal is not graded but must be completed in order to pass the course. It is due on September 29 by 11:59 PM via email.

  2. Written report. The length of the written report will vary from project to project but should be 8-10 pages in double-column format; all reports must follow the ACM format and be submitted as a PDF. The report constitutes the entire project report grade. It is due on December 15 by 11:59 PM via email.

  3. Project presentation. Each group will present their project in class on December 8. The presentation should take the form of a 15-minute (hard maximum) conference-style talk describing the motivation for your work, what you did, and what you found. If a demo is the best way to show what you did, feel free to include one in the middle of the talk. Please allow 3-5 minutes for questions after your presentation. The proposed structure of your presentation is:

    1. Introduction (describe the problem and motivation)
    2. Research questions
    3. Methodology: data collection, data cleanup, data mining, data analysis (statistics, machine learning), etc.
    4. Results (achieved, preliminary, or anticipated)
    5. Implications (why does this study matter? how can your findings be used?)
    6. Conclusion (summary, main contributions)


The best way to get in touch with me is via email: olga.baysal[at]

University Policies

Academic Integrity

Academic integrity is everyone's business because academic dishonesty affects the quality of every Carleton degree. Each year, students are caught violating academic integrity and found guilty of plagiarism and cheating. In many instances they could have avoided failing an assignment or a course simply by learning the proper rules of citation. See Carleton's academic integrity policy for more information.

Academic Accommodations for Students with Disabilities

The Paul Menton Centre for Students with Disabilities (PMC) provides services to students with Learning Disabilities (LD), psychiatric/mental health disabilities, Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorders (ASD), chronic medical conditions, and impairments in mobility, hearing, and vision. If you have a disability requiring academic accommodations in this course, please contact PMC at 613-520-6608 or for a formal evaluation. If you are already registered with the PMC, contact your PMC coordinator to send me your Letter of Accommodation at the beginning of the term, and no later than two weeks before the first in-class scheduled test or exam requiring accommodation (if applicable). After requesting accommodation from PMC, meet with me to ensure accommodation arrangements are made. Please consult the PMC website for the deadline to request accommodations for the formally-scheduled exam (if applicable).

Pregnancy Obligation

Write to the instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website.

Religious Obligation

Write to the instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website.