Code review as part of a research workflow

2020-Jan-28

We have adopted a process of code review, in which research code is evaluated by other lab or project members before it enters the primary code base. Here, I describe this approach, outline its advantages (and some challenges), and give some tips on implementing it.

Research projects in my lab involve a lead researcher (typically an honours student, PhD student, or postdoc) and myself as supervisor, and potentially also research assistants and collaborators. The key elements in implementing such projects are:

Code that prepares experiments (such as generating stimuli and running simulations).
Code that runs the experiments and stores the data.
Code that analyses the data and produces textual and visual summaries.
Manuscripts and conference material (slides and posters) that report the project.

In typical research projects, the elements are only actively reviewed and accepted by multiple people on the project at the final stage, when the project is reported. However, there would seem to be great potential benefits in also applying a review process to the earlier stages, in which the project is developed and implemented.

Here, I will describe and reflect upon our adoption of such a code review workflow.

Repositories and code review

In our project organisation system, each project element has its own repository, which is under version control using git and hosted on GitHub under our lab. The main repository state, called main, is protected and cannot be changed directly. Instead, alterations are made within a separate branch, which can then be incorporated into main via a pull request. As part of the pull request, another member of the project is invited to review the proposed changes. The reviewer can add comments and questions and request changes, which often leads to several iterations within the pull request. When the reviewer is satisfied with the changes, they mark them as accepted and merge them into the main state.

For example, consider a situation in which the stimuli for an experiment are being created:

The lead researcher and the supervisor discuss the requirements, and the lead researcher is assigned the task of writing the code to generate the stimuli.
They would start by creating a new branch, perhaps called stim_gen, and would then add and commit the new functionality into this branch.
They then create a pull request, which notifies the supervisor of the changes and requests their review.
The supervisor can then update their local copy of the code with the contents of the stim_gen branch, allowing them to read and run the changes locally. They might spot an aspect of the generated stimuli that is unlikely to be desirable (perhaps a parameter needs tweaking), so they leave a comment on the relevant section of code.
The author can then read the comment and either justify their choice or make a change and update the branch.
Once satisfied, the reviewer can accept the changes and merge them into the main state. The code can then be used to generate the stimuli for the experiment, with the lead researcher and the supervisor sharing knowledge of, and responsibility for, the stimuli's characteristics.
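To make the rules of this workflow concrete, here is a toy model in Python of the process described above: a protected main branch, work committed to a separate branch, and a merge that only succeeds once the invited reviewer has approved the pull request. It is a sketch under simplifying assumptions, not how git or GitHub actually work: commits are represented as plain message strings, only fast-forward merges are modelled, and all class and method names (`Repository`, `PullRequest`, `approve`, and so on) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ProtectedBranchError(Exception):
    """Raised when someone tries to commit directly to a protected branch."""


@dataclass
class PullRequest:
    branch: str
    author: str
    reviewer: str
    comments: list[str] = field(default_factory=list)
    approved: bool = False
    merged: bool = False

    def comment(self, text: str) -> None:
        # Reviewer questions and change requests accumulate on the pull request.
        self.comments.append(text)

    def approve(self, who: str) -> None:
        # Only the invited reviewer can accept the changes.
        if who != self.reviewer:
            raise PermissionError(f"{who!r} is not the reviewer of this pull request")
        self.approved = True


@dataclass
class Repository:
    # Hypothetical model: each branch is just a list of commit messages.
    branches: dict[str, list[str]] = field(default_factory=lambda: {"main": []})
    protected: frozenset[str] = frozenset({"main"})

    def create_branch(self, name: str, source: str = "main") -> None:
        self.branches[name] = list(self.branches[source])

    def commit(self, branch: str, message: str) -> None:
        # Direct changes to a protected branch are refused.
        if branch in self.protected:
            raise ProtectedBranchError(f"{branch!r} is protected; open a pull request")
        self.branches[branch].append(message)

    def open_pull_request(self, branch: str, author: str, reviewer: str) -> PullRequest:
        if author == reviewer:
            raise ValueError("the reviewer must be someone other than the author")
        return PullRequest(branch, author, reviewer)

    def merge(self, pr: PullRequest) -> None:
        if not pr.approved:
            raise PermissionError("the pull request has not been approved")
        main = self.branches["main"]
        history = self.branches[pr.branch]
        # Simplification: only fast-forward merges are modelled.
        if history[: len(main)] != main:
            raise RuntimeError("the branch is out of date with main")
        self.branches["main"] = list(history)
        pr.merged = True


# The stim_gen example, replayed against the toy model.
repo = Repository()
repo.create_branch("stim_gen")
repo.commit("stim_gen", "Add stimulus generation code")
pr = repo.open_pull_request("stim_gen", author="lead_researcher", reviewer="supervisor")
pr.comment("Should this parameter be tweaked?")
repo.commit("stim_gen", "Tweak parameter following review")
pr.approve("supervisor")
repo.merge(pr)
print(repo.branches["main"])
```

After the merge, main holds both commits from the stim_gen branch. Calling `repo.commit("main", ...)` instead raises `ProtectedBranchError`, and merging a pull request that has not been approved raises `PermissionError`, which mirrors the two guarantees the workflow relies on.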

Advantages

This might seem like a complicated sequence of events, and one might wonder why anyone would introduce such complexity into their workflow. Here are a few advantages that I consider to be key.

Training

Students do not typically begin research with much experience in computer programming, and must acquire such skills during their research training. Code review is a great way to gradually build skills and confidence in the necessary programming concepts. At first, the student can serve as a reviewer of code written by others on the project. This lets them see a direct connection between the requirements and goals of their specific research and how those are met in code, which is typically a much more productive and stimulating way of learning than following generic instructional material. Importantly, they retain a connection to, and a feeling of ownership over, the project that they might lack if the code were hidden from them or their input on key implementation decisions were not sought.

As students advance in their skills, they can transition towards writing code and creating pull requests for their supervisor or others on the project to review. The review process gives them confidence that the responsibility for producing correct code is not theirs alone, and provides feedback that helps them advance their skills rapidly. It also allows the supervisor to balance the students' need for autonomy and freedom to direct and implement their projects against the supervisor's own responsibility for oversight of the project and its integrity.

Collaboration

Research projects are often done with outside collaborators, that is, with people who are not within the group of the lead researcher and the supervisor. These collaborators typically bring some specific expertise to a project, and that expertise is often technical. This can make it challenging to ensure that all project members agree on the details of the project.

Code review can help with this challenge by allowing the lead researcher and/or the supervisor to inspect the collaborator's contributions before they enter the main code base. Even if the collaborator is not part of the code review process, it can still be useful for the researchers within the lab to introduce the code via a pull request, because this can facilitate discussions about the suitability of the code and a shared understanding of its functionality.

Communication

Code is often the most fundamental source of information about a research project, and we are encouraged to make code available alongside publications about the associated research for inspection by the broader community. The usefulness of such code is likely to be related to how readily it can be comprehended.

Having a review process means that the code has already been written with comprehension in mind, since the reviewer must be able to follow and understand the code in order to approve it. This is likely to make the code more readable as a matter of course. Indeed, I have found that knowing someone else will be trying to understand my code can have dramatic effects on its resulting structure, even more so when I know that a specific person will be looking at it in detail.

Disadvantages and challenges

It is important to note a few disadvantages and challenges in adopting a code review workflow:

It is another thing for students to learn. Students are confronted with an array of methods, techniques, and skills that are not directly related to their research domain, and each involves a trade-off against time that could be spent on other things. In my experience, however, the time that students spend learning and applying version control and code review is worthwhile.
The review process must be handled with care. As with all forms of feedback, it is important that reviewers (and authors) be encouraging and empathetic in their interactions. This is particularly important for coding, because many people are likely to feel trepidation about participating and receiving feedback while they are still learning.
The review interface could be improved. On the GitHub platform, some desirable review features are currently missing or clunky, such as commenting on code that a pull request has not changed and commenting on images.

Tips for implementing a code review process

On balance, I consider a code review process to be a beneficial addition to a research workflow. Here are a few tips and suggestions if you are considering implementing such a process:

Consider starting simply. If you are not currently using a version control system and an interface like GitHub, you can still get started with a code review process by simply sharing files.
Enforce a consistent coding style. We use black to automatically reformat Python code into a prescribed style. This makes reviewing code much easier and removes the need to discuss stylistic preferences.
Seek and provide code review. Perhaps you are a student in a lab where a code review process is not available or feasible. You might consider reaching out to some of your peers and offering to review their code, or asking them to review yours (it is best to run this by your supervisor and others on the project beforehand). You often do not need to be a domain expert to give feedback on potential bugs or points of misunderstanding. Being a trusted reviewer, and having trusted reviewers, can be a great motivation for writing the code that will allow you to implement your research agenda.
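For the simplest setup, where files are shared rather than tracked with version control, a reviewer can still see exactly what changed between two versions of a file by generating a unified diff. The sketch below uses Python's standard-library `difflib` module; the function name `review_diff`, the file names, and the file contents are hypothetical, and this is only one way to approximate the line-by-line view that a pull request provides.

```python
import difflib


def review_diff(old_text: str, new_text: str, old_name: str, new_name: str) -> str:
    """Return a unified diff between two versions of a shared file."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff_lines)


# Hypothetical contents of two versions of a shared stimulus script.
old_version = "n_trials = 100\ncontrast = 0.8\n"
new_version = "n_trials = 100\ncontrast = 0.5\n"

diff_text = review_diff(old_version, new_version, "stim_gen_v1.py", "stim_gen_v2.py")
print(diff_text)
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added, so the reviewer can focus their comments on the changed value rather than re-reading the whole file. If the two versions are identical, the function returns an empty string.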

Summary

Code review is a process in which other members of a research team examine and evaluate code before it is incorporated into the primary code base. I suggest that code review can provide significant advantages for research training, collaboration, and communication. Even after weighing its disadvantages and challenges, I think that code review is a worthwhile addition to a research workflow.

Acknowledgements

Thanks to the members of my lab for their willingness to apply a code review process. Thanks in particular to Lindsay Peterson, who gave a talk on this topic at ResBaz Sydney 2019—you can watch Lindsay’s 5-minute ‘lightning’ talk on “Code review to enhance research training, collaboration, and communication” (begins at 27:30), which gives an overview of many of the points raised here.

Resources of interest

Awesome code review
Blischak, J. D., Davenport, E. R., & Wilson, G. (2016) A Quick Introduction to Version Control with Git and GitHub, PLoS Computational Biology, 12(1), e1004668.
Perez-Riverol, Y. et al. (2016) Ten Simple Rules for Taking Advantage of Git and GitHub, PLoS Computational Biology, 12(7), e1004947.