Automation for Git Repositories
When I look back at my own (open source) work, I notice that the way I use Git repositories has changed dramatically over the past two years.
In the past, a repository was simply a place for source code. It was one of many tools used to organize a project. But with GitHub, more and more functionality moved into a central place. Now a repository is also the home of issues, discussions, and even the project's website.
Additionally, configuration-as-code has been a growing trend, and these configurations have moved into the repository as well: pipelines for continuous integration, deployment configurations for Heroku or other platforms, and configuration for a wide variety of bots that automate development tasks.
As a result, I find myself with a growing number of repositories, each with more and more configuration files. As someone who values consistency and a great developer experience, I want these files to stay up to date. When I learn how to improve my CI in one project, I want to update every other project as well.
I started to think about this problem in terms of templates that I want to keep up to date. But while toying with the idea, I noticed that this is actually just a concrete implementation of a generic algorithm. What I really want is to take an action based on a trigger. So when a file is updated in one repository (the trigger), I want to copy it to another (the action). Some actions I only want to perform under certain circumstances, which I can control through a condition. For example, only perform the action when the default branch is updated.
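The trigger, condition, and action model can be sketched in a few lines of TypeScript. This is a minimal illustration, not the service's actual design: the names `RepoEvent`, `Workflow`, and `runWorkflow`, the repository names, and the assumption that the default branch is called `main` are all hypothetical, and the action only returns a description instead of calling a platform API.

```typescript
// Hypothetical sketch of the trigger/condition/action model.

// An event reported by a platform, e.g. a push to a repository.
interface RepoEvent {
  type: string; // e.g. "push"
  repository: string; // e.g. "owner/source-repo"
  branch: string; // the branch that was updated
  changedFiles: string[]; // paths touched by the event
}

interface Workflow {
  trigger: string; // event type that starts the workflow
  condition: (event: RepoEvent) => boolean; // must hold for the action to run
  action: (event: RepoEvent) => string; // returns a description of what it would do
}

// Runs the action only if the event matches the trigger and the condition holds.
function runWorkflow(workflow: Workflow, event: RepoEvent): string | null {
  if (event.type !== workflow.trigger) return null;
  if (!workflow.condition(event)) return null;
  return workflow.action(event);
}

// Example: propagate the CI file, but only when it changes on the default
// branch (assumed here to be "main").
const CI_FILE = ".github/workflows/ci.yml";

const syncCiConfig: Workflow = {
  trigger: "push",
  condition: (event) =>
    event.branch === "main" && event.changedFiles.includes(CI_FILE),
  action: (event) => `copy ${CI_FILE} from ${event.repository} to owner/target-repo`,
};

const result = runWorkflow(syncCiConfig, {
  type: "push",
  repository: "owner/source-repo",
  branch: "main",
  changedFiles: [CI_FILE],
});
// result: "copy .github/workflows/ci.yml from owner/source-repo to owner/target-repo"
```

Keeping the three parts as separate values is what makes them reusable: the same condition can guard different actions, and the same action can respond to different triggers.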
What I want to build over the next few weeks is a service that can do that.
Features of a Prototype
Given that time is limited, I want to be strategic in the design and implementation of the product. There are a lot of things that I can do, but what is it that I need to do?
For a functional product, at least the following features must work:
- Allow users to configure workflows
- Subscribe to events from platforms like GitHub
- Execute workflows when their conditions are met
Each of these has varying levels of complexity depending on the implementation. For example, it would be great to have a graphical user interface to configure workflows. But it is also possible to define a workflow in a configuration file. The latter can be built in much less time and is thus probably the better solution at this stage.
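The other two features, subscribing to events and executing workflows whose conditions are met, could start out as a small in-memory dispatcher. The `Dispatcher` class and the event shape below are assumptions for illustration; a real service would receive events through webhooks from GitHub or another platform. The `ref` field mirrors the one found in GitHub's push event payload.

```typescript
// Hypothetical in-memory dispatcher: workflows subscribe to an event name,
// and each incoming event runs the subscribed workflows whose condition holds.

interface PlatformEvent {
  name: string; // e.g. "push"
  payload: Record<string, unknown>; // raw data sent by the platform
}

interface Workflow {
  id: string;
  condition: (event: PlatformEvent) => boolean;
  action: (event: PlatformEvent) => void;
}

class Dispatcher {
  // Workflows indexed by the event name that triggers them.
  private readonly byTrigger = new Map<string, Workflow[]>();

  subscribe(trigger: string, workflow: Workflow): void {
    const list = this.byTrigger.get(trigger) ?? [];
    list.push(workflow);
    this.byTrigger.set(trigger, list);
  }

  // Executes every matching workflow and returns the ids of those that ran.
  dispatch(event: PlatformEvent): string[] {
    const executed: string[] = [];
    for (const workflow of this.byTrigger.get(event.name) ?? []) {
      if (workflow.condition(event)) {
        workflow.action(event);
        executed.push(workflow.id);
      }
    }
    return executed;
  }
}

// Usage: run "sync-ci" only for pushes to the main branch.
const log: string[] = [];
const dispatcher = new Dispatcher();
dispatcher.subscribe("push", {
  id: "sync-ci",
  condition: (event) => event.payload["ref"] === "refs/heads/main",
  action: () => {
    log.push("sync-ci ran");
  },
});

const ran = dispatcher.dispatch({ name: "push", payload: { ref: "refs/heads/main" } });
// ran: ["sync-ci"]
```

Indexing workflows by trigger keeps dispatch cheap: an event only evaluates the conditions of workflows that subscribed to it.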
Besides these functional requirements, there are also non-functional requirements. One category of these is the restrictions and recommendations imposed by the platforms. GitHub, for example, provides a few best practices with its Probot framework. One of them is to keep the configuration as code in the repository whenever possible. These need to be considered when designing the features as well.
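Following that recommendation, a workflow could be defined in a file that lives in the repository itself. The format below, including every field name, is a hypothetical example. It uses JSON only so that the sketch needs no dependencies, and the parser trusts the file's shape instead of validating it.

```typescript
// Hypothetical workflow definition as it might be stored in a repository file.
const configFile = `{
  "workflows": [
    {
      "name": "sync-ci",
      "trigger": "push",
      "condition": { "branch": "main" },
      "action": {
        "type": "copy-file",
        "path": ".github/workflows/ci.yml",
        "to": ["owner/repo-a", "owner/repo-b"]
      }
    }
  ]
}`;

interface WorkflowConfig {
  name: string;
  trigger: string;
  condition?: { branch?: string };
  action: { type: string; path: string; to: string[] };
}

// Parses the file; the cast trusts the shape, so real input would need validation.
function parseWorkflows(source: string): WorkflowConfig[] {
  const parsed = JSON.parse(source) as { workflows?: WorkflowConfig[] };
  return parsed.workflows ?? [];
}

// A workflow without a branch condition runs on every branch.
function shouldRun(workflow: WorkflowConfig, trigger: string, branch: string): boolean {
  if (workflow.trigger !== trigger) return false;
  const requiredBranch = workflow.condition?.branch;
  return requiredBranch === undefined || requiredBranch === branch;
}

const workflows = parseWorkflows(configFile);
// shouldRun(workflows[0], "push", "main") is true; with "feature" it is false.
```

Because the file is part of the repository, changing a workflow goes through the same review process as any other code change.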
Features of a Future Product
While I am limited in what I can build now, I still want to consider what the product could look like in the far future. I have to be careful not to fall into the trap of premature optimization, though. I want the vision to guide my design, without forcing me into a product that is too complex for the current moment.
In yesterday's post, I mentioned that I see two long-term futures for this project. The first is open sourcing the project once I am done or have run out of time. The second is offering the product as a service so that I can learn more about business as well. While there are some differences, the two share a lot of features.
Allowing users to run the software themselves is an important feature for both an open source and a commercial product. Given the nature of the tool, it requires far-reaching access to source code. This might be an acceptable risk for open source or side projects, but others might want to run the software themselves to ensure that only they have access to their code.
Self-hosting limits how the application can be built. A monolith is better than a complex microservice architecture, and a widely available database like PostgreSQL is better than a vendor-dependent solution like DynamoDB. Both choices make it easier for someone else to pick up the product and deploy it to their own infrastructure.
Graphical User Interface
A graphical user interface can greatly improve the configuration of the software. It shortens the feedback loop and can highlight information when it is relevant to the user. For example, validations can catch mistakes in real time, so the user no longer has to wait for the software to run and report an issue. And the UI can surface features that the user might not even know exist.
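As an example of real-time validation, a UI could run a check like the following on every change to a workflow draft and display the returned messages immediately. The lists of known triggers and actions and the `WorkflowDraft` shape are assumptions for illustration; `push`, `pull_request`, and `release` are GitHub event names.

```typescript
// Hypothetical set of built-in triggers and actions the UI knows about.
const KNOWN_TRIGGERS = new Set(["push", "pull_request", "release"]);
const KNOWN_ACTIONS = new Set(["copy-file", "open-issue"]);

interface WorkflowDraft {
  name: string;
  trigger: string;
  action: string;
  targets: string[]; // repositories in "owner/name" form
}

// Returns human-readable problems; an empty list means the draft is valid.
function validateDraft(draft: WorkflowDraft): string[] {
  const errors: string[] = [];
  if (draft.name.trim() === "") errors.push("Name must not be empty.");
  if (!KNOWN_TRIGGERS.has(draft.trigger)) errors.push(`Unknown trigger "${draft.trigger}".`);
  if (!KNOWN_ACTIONS.has(draft.action)) errors.push(`Unknown action "${draft.action}".`);
  if (draft.targets.length === 0) errors.push("Select at least one target repository.");
  for (const target of draft.targets) {
    // Very loose check for "owner/name"; the real rules are stricter.
    if (!/^[\w.-]+\/[\w.-]+$/.test(target)) {
      errors.push(`"${target}" is not in owner/name form.`);
    }
  }
  return errors;
}
```

A draft with an empty name, the trigger `pushh`, and the target `not-a-repo` would produce three messages at once, before the workflow is ever saved or run.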
A UI also makes it possible to manage workflows across repositories. Instead of having to copy configuration files, a user can simply click a button to add or remove a workflow from a repository. When viewing a workflow, a user can quickly see where it is being used. This is especially valuable for larger organizations with many repositories.
Triggers, conditions, and actions will probably be hard-coded in the first version of the product. Users can combine them in workflows, but they cannot define their own. This should be fine for most projects, but power users will run into limitations. Since this is a product for developers, it is only logical to allow them to create their own actions.
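Hard-coded building blocks map naturally onto a closed registry: users combine names from a fixed set, but cannot add new entries. In this sketch the condition and action names are hypothetical, and TypeScript's union types reject a workflow defined in code that references a name outside the registry (configuration loaded from a file would still need a runtime check).

```typescript
// Hypothetical built-in building blocks. The registries are fixed in code:
// workflows combine these names, but users cannot add their own.

type ConditionName = "isDefaultBranch" | "touchesWatchedPath";
type ActionName = "copyFile" | "openIssue";

interface Context {
  branch: string;
  defaultBranch: string;
  changedFiles: string[];
  watchedPath: string; // the file this workflow is about
}

const conditions: Record<ConditionName, (ctx: Context) => boolean> = {
  isDefaultBranch: (ctx) => ctx.branch === ctx.defaultBranch,
  touchesWatchedPath: (ctx) => ctx.changedFiles.includes(ctx.watchedPath),
};

// Actions return a description here instead of calling a platform API.
const actions: Record<ActionName, (ctx: Context) => string> = {
  copyFile: (ctx) => `copy ${ctx.watchedPath}`,
  openIssue: (ctx) => `open issue about ${ctx.watchedPath}`,
};

// A workflow is only a combination of built-in names.
interface ComposedWorkflow {
  conditions: ConditionName[];
  action: ActionName;
}

// Runs the action if every listed condition holds.
function execute(workflow: ComposedWorkflow, ctx: Context): string | null {
  const allMet = workflow.conditions.every((name) => conditions[name](ctx));
  return allMet ? actions[workflow.action](ctx) : null;
}

const syncCi: ComposedWorkflow = {
  conditions: ["isDefaultBranch", "touchesWatchedPath"],
  action: "copyFile",
};
```

Supporting custom actions later would mean opening up these registries, which is exactly where the question of running untrusted code comes in.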
I don't know yet how to execute custom code in a safe and responsible way, which makes this a feature for later.
Most of the requirements and constraints don't matter for the technical architecture. The idea is generic enough that it can be implemented with any web framework, be it in Ruby, Rust, or Node. Some things will be easier in one tool than in another, but overall it balances out.
To me, the biggest challenge is building a secure product that follows the platform guidelines.
On GitHub, for example, it is the convention to keep the configuration as code in the repository. This works well for workflows that only touch a single repository. But what about organization-wide automation? Where does its configuration go?
Another concern is honoring access controls. A company might use code reviews to enforce the four-eyes principle. What does this mean for workflows that are created outside of GitHub or GitLab? Do they need to be reviewed as well? Does every execution of a workflow need to be reviewed?
These are far more challenging questions than the technical implementation. These kinds of questions are probably the reason we have so much configuration-as-code now. Adding a workflow to the configuration file automatically follows the same process as any other code change.
Goals for Today
I will be spending the rest of the day thinking about the questions above. Configuration-as-code seems like absolutely the right path for a variety of reasons. But the whole project was started because managing files across repositories is painful. So what would a good user experience look like?
There are two ideas specifically that I want to explore. The first is centralizing the configuration somewhere, for example in a dedicated repository. The other is using the automation I am building to also manage the configuration.
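The first idea could work roughly like this: workflows defined in a central repository apply everywhere, and each repository may disable or override them by name. The merge rules, the `WorkflowEntry` shape, and the `owner/.automation` repository name are assumptions made for this sketch, not a settled design.

```typescript
// Hypothetical sketch: workflows defined in a central repository
// (e.g. "owner/.automation") apply to every repository, and a repository's
// own file can disable or override them by name.

interface WorkflowEntry {
  name: string;
  enabled: boolean;
  settings: Record<string, string>;
}

function resolveWorkflows(central: WorkflowEntry[], local: WorkflowEntry[]): WorkflowEntry[] {
  // Start from the central definitions, keyed by name.
  const byName = new Map<string, WorkflowEntry>();
  for (const entry of central) byName.set(entry.name, entry);

  // Local entries override central ones; their settings are merged key by key.
  for (const entry of local) {
    const base = byName.get(entry.name);
    byName.set(
      entry.name,
      base ? { ...entry, settings: { ...base.settings, ...entry.settings } } : entry,
    );
  }

  // Disabled workflows are dropped from the result.
  return Array.from(byName.values()).filter((entry) => entry.enabled);
}
```

With a central `sync-ci` entry using branch `main` and a local entry that sets branch `develop`, the resolved workflow keeps every other central setting and only replaces `branch`. A local entry with `enabled: false` opts a repository out entirely.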
Day 2 will be a tough one.