Fabian Zeindl, B.Sc.

The code quality pyramid

For discussion and comments, please contact me or use this thread on Reddit.

Introduction

One of the problems in software development is dividing up work into tasks and choosing which ones to do first. In complex systems, it is not trivial to decide which changes to prioritize, since the consequences of changes are often hard to foresee. Especially during a refactoring, a plan is needed to ensure the effort leads in the right direction.

To make such planning easier, I present a visual way of ranking the dimensions of the quality of a software system. The code quality pyramid described below can be used to classify already identified issues and steer decisions on which problems to tackle first. Practical tips are also provided, which can be used as starting points for improvements.

The pyramid

The various qualities of software systems can be grouped and seen as hierarchical layers in a pyramid, where the lower parts support the parts further up. Improvements at the bottom lead to stabilization and gains at the top. These gains are not only technical but also translate into more freedom in changing the software in the future. Work at the top is important as well (features are situated there, after all), but it does not directly contribute to the quality of the entire system.

[Figure: The code quality pyramid]

It is possible to consult the pyramid when planning a larger architectural refactoring and to see which layers would benefit from a certain change. Issues in the backlog can be checked against this model to find out whether solving them will lead to more agility when solving other issues in the future. If, for example, a refactoring eliminates a component or makes it so generic and reusable that it never needs to be adjusted again, then build performance will be improved, since that component's build will no longer be needed. If a refactoring allows a component to be mocked or simulated more easily, it benefits testability.

The layers

Build performance

Almost every task in development includes a build or deployment step at some point, which is why having a stable and performant build system will save time in all other parts of the software development process:

  • Faster CI/CD deployments mean manual testing work can start earlier.

  • When developers work on code, the performance of the build determines how often they can adjust and run it per hour. Fast local builds mean they can evaluate more ideas in the same amount of time and restart tests within a couple of seconds when fixing errors.

Long build times, on the other hand, create problems:

  • Tasks will be delayed, and their completion will be shifted from one day to the next.

  • While waiting for builds, developers need to start other work. This requires them to context switch, which is the mental work needed to prepare for a different task while multitasking. Context switching and interruptions have been shown to reduce the effectiveness of developers drastically. According to various studies, a programmer needs about 10 to 15 minutes to resume work after an interruption.

For these reasons, it is imperative to ensure that builds are correct, reproducible, and very fast.

Tips for developers on improving build performance

  • Use the latest and fastest build system for your language, e.g. esbuild for JavaScript or Gradle for Java.

  • Leverage caching mechanisms to save and restore compilation output between runs (see the configuration sketch after this list).

  • Use incremental compilation. This enables faster local testing since only changed code needs to be recompiled.

  • Use the profiler that is available in many build systems to identify and investigate performance hotspots.
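
As an illustration of the caching tip above, here is a minimal gradle.properties sketch, assuming a Gradle-based Java project (the property names are Gradle's; the setup itself is only a starting point and should be checked against your build):

    # gradle.properties
    # reuse task outputs from previous runs via the build cache
    org.gradle.caching=true
    # build independent sub-projects in parallel
    org.gradle.parallel=true
    # keep the build daemon warm between invocations
    org.gradle.daemon=true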

Test performance

Tests are the backbone of a development process that produces reliable software. Proper testing not only enables changes to be introduced in a safe and straightforward manner, but the written tests can also serve as implicit documentation and specification of the code's behavior.

An example of software backed by thorough testing is SQLite. It is widely deployed and embedded in all phones and almost all operating systems. SQLite has about 150,000 lines of code, whose functionality is secured by 92 million lines of test code. The only way to develop and maintain a test suite like that is to be able to run it fast. SQLite's "veryquick" test suite contains 300,000 test cases and runs in a couple of minutes.

Improving test performance is needed so developers can write more tests, and write them faster, making your software increasingly robust and reliable.

Tips for developers on improving test performance

  • Use a profiler to find slow code paths.

  • Refactor tests so that they can be run independently and in parallel (see the configuration sketch after this list).

  • Improve test startup. In Java, Spring's reflection-based context initialization is a common culprit that slows down test execution.

  • Use defunctionalization to abstract behavior. The concept is described in detail in the next section about testability. Defunctionalization makes it possible to convert slow integration tests into fast-running unit tests.

  • Allocating memory takes time. Do not blindly raise the maximum memory of your processes to a couple of gigabytes (-Xmx in Java); instead, profile your tests to see how much memory they actually need. Java's Flight Recorder can show you where memory is allocated.

  • On Windows, disabling antivirus scanning for source code directories tends to boost performance, especially in projects with many source files.
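
To make the parallelization tip concrete: if your project uses JUnit 5, parallel execution can be switched on declaratively. A minimal junit-platform.properties sketch (the property names are JUnit's; whether concurrent execution is safe depends on your tests being independent, as noted above):

    # src/test/resources/junit-platform.properties
    junit.jupiter.execution.parallel.enabled=true
    junit.jupiter.execution.parallel.mode.default=concurrent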

Testability

The most difficult problems in a codebase are those that cannot be reproduced, so, after improving test performance, it is important to think about what you are able to test in the first place. The standard workflow for fixing a bug starts by reproducing it, which means writing a test for it. The developer needs to recreate the circumstances in which an error occurs so that the problem can be understood. If a bug cannot be reproduced, it is still possible to build a mental model of what you think might be happening and implement a fix, but you can never actually be sure whether your assumptions, and therefore the bug fix, are correct. Reproducibility is one of the most important principles in science, and it should be applied to software development as well.

Unfortunately, this can be tricky. Testing would be easy if all programs were simple command line tools with input and output and no external requests or dependencies, but modern systems typically consist of many independent components working together. Examples of environments that are hard to simulate are mobile apps, distributed systems like a network of microservices, or systems involving hardware. Bugs occurring in such applications are often fixed without ever being reproduced.

Nevertheless, it can be done, and it is essential to extend testing to include conditions in such environments as well. You should model your system in a way that makes it possible to write a reproducing automatic test for every bug that occurs, and only fix it afterward. This is especially important in production, because an error occurring there means it somehow slipped past all the stages, reviews, and testing before. If an error made it this far, there is a good chance that it is an example of an entire category of problems you are not yet able to rule out by testing. If you find that you cannot test for a certain bug, you need to go back to the drawing board and find a way to abstract your system so that you can recreate and test for it in the future.

The benefit of having a robust system for abstraction, simulation, and testing in place is that you can even recreate issues that are normally notoriously hard to track down, like interactions between distributed servers or race conditions where the behavior of the system depends on the timing of certain events. Since modern backends are often split up into microservices, these kinds of issues are becoming more common and you will be well-equipped to deal with them.

Defunctionalization

A useful way of abstracting behavior is defunctionalization. In the context of a program, it means splitting a function with complex behavior into one function that returns a static description of that behavior and another function that runs that description. This decouples the system, so you can write tests that check descriptions without actually executing them. In the context of microservices, this can be done by creating a workflow specification that describes which system is contacted at which step of the way. The specification can then be interpreted by a process engine; for testing, another engine can be used that simulates the real services and provides custom responses for the scenarios you want to test.
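
To make this more concrete, here is a minimal Java sketch of the idea (the domain, the Command types, and the service interfaces are hypothetical and only serve as illustration; it requires Java 16 or newer for records). The business logic returns a plain description of its side effects, and a separate executor interprets that description against real services, so tests can assert on the description without any network, database, or mocks:

    import java.util.List;

    public class CheckoutExample {

        // The description: plain, immutable values with no side effects.
        interface Command {}
        record ChargeCard(String customerId, long amountCents) implements Command {}
        record SendEmail(String recipient, String subject) implements Command {}

        // Pure business logic: returns what should happen instead of doing it.
        static List<Command> checkoutSteps(String customerId, long totalCents) {
            return List.of(
                    new ChargeCard(customerId, totalCents),
                    new SendEmail(customerId, "Your order is confirmed"));
        }

        // The only place that touches real services.
        interface PaymentService { void charge(String customerId, long amountCents); }
        interface EmailService { void send(String recipient, String subject); }

        static void execute(List<Command> commands, PaymentService payments, EmailService email) {
            for (Command command : commands) {
                if (command instanceof ChargeCard c) {
                    payments.charge(c.customerId(), c.amountCents());
                } else if (command instanceof SendEmail e) {
                    email.send(e.recipient(), e.subject());
                }
            }
        }

        public static void main(String[] args) {
            // A fast unit test can simply inspect the returned description.
            List<Command> steps = checkoutSteps("customer-42", 1999);
            System.out.println(steps);
        }
    }

In a test, checkoutSteps can be exercised with many scenarios per second, while the small execute method is covered by a handful of integration tests.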

Tips for developers on improving testability

  • For every bug that occurs in production, write a matching test case that at least partially reproduces the bug before fixing it.

  • Find a way to run end-to-end tests of your system in which you can simulate conditions that are normally ignored, like network problems or software crashes. OkHttp's MockWebServer and WireMock can help with this (see the sketch after this list).

    This does not necessarily mean classical end-to-end test frameworks that simulate browsers or devices. Such frameworks can be very useful for smoke testing, but they tend to be rather slow, which contradicts the goals of the prior test performance layer in the pyramid.

    A good mental model for splitting systems into logical parts that can be tested separately is ports and adapters architecture (originally known as "hexagonal architecture").

  • Use defunctionalization to split programs into description and execution, then run tests against the description.

  • If possible, don't test against mocks but against real services instead. Mocks require unrealistic custom configuration code that is hard to maintain. Testcontainers can spin up instances of your real services for testing.

  • Make it easy to generate test data, so developers don't lose time when developing a new test. Custom domain-specific languages or builders are very useful here, even more so than in normal code, since in tests it is often necessary to create complex object hierarchies with slight variations. DSLs are great for doing this in a concise way. There are also many tools that automatically generate test data.

  • If you have an app or program, implement a debug mode which you can use to simulate crashes or network outages. You can even provide an offline mode by shipping a mocked abstraction of your API within your app that answers requests with test data. That way you can run the app without a network connection or an implemented backend.

  • The documentation about How SQLite Is Tested is very well written and might give you more ideas to improve testability in your project.
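
As an example for the end-to-end tip above, here is a minimal sketch using OkHttp's MockWebServer to simulate a misbehaving backend (the URL, response body, and status codes are made-up examples; in a real test, the code under test would be pointed at server.url(...) instead of the production API):

    import okhttp3.OkHttpClient;
    import okhttp3.Request;
    import okhttp3.Response;
    import okhttp3.mockwebserver.MockResponse;
    import okhttp3.mockwebserver.MockWebServer;
    import okhttp3.mockwebserver.SocketPolicy;

    public class FlakyBackendExample {
        public static void main(String[] args) throws Exception {
            MockWebServer server = new MockWebServer();
            // First request: the backend answers with an error.
            server.enqueue(new MockResponse().setResponseCode(503).setBody("maintenance"));
            // A dropped connection can be queued up as well to simulate network problems.
            server.enqueue(new MockResponse().setSocketPolicy(SocketPolicy.DISCONNECT_AFTER_REQUEST));
            server.start();

            OkHttpClient client = new OkHttpClient();
            Request request = new Request.Builder().url(server.url("/users/42")).build();
            try (Response response = client.newCall(request).execute()) {
                System.out.println("Status: " + response.code()); // 503
            }

            server.shutdown();
        }
    }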

Component code quality

While the code quality pyramid describes the system as a whole and how improvements should be structured and prioritized, this layer is about code quality in the classical sense, focused on how individual components should be written.

To prevent bugs from making it into a codebase in the first place, it is necessary to ensure that code is well-written and maintainable. A good starting point for measuring code quality is static analysis, which is available for most languages and platforms. A recent study shows that improving code quality leads to a considerable reduction in the development time needed to resolve issues. The study found that:

  • Low-quality code contains 15 times more defects than high-quality code.

  • Resolving issues in low-quality code takes on average 124% more time in development.

  • Estimations of development time in low-quality code are much less predictable. The maximum time spent on an issue is 9 times longer compared to high-quality code.

The rules of these tools are highly configurable. Unfortunately, just enabling and adhering to all available rules will not automatically lead to better code, because measures and principles that are important on one platform or in one language are less important in others, and this might even depend on the project. Applying static analysis too rigorously can be problematic and take time away from larger refactorings that might be more worthwhile. The maintainers of the already mentioned SQLite library say that static analysis proved to be less valuable compared to testing and actually introduced new bugs in a number of cases.

A good rule of thumb is to use the most severe or highest category of rules in your static analysis toolchain, because these rules tend to make sense in any project.

Tips for developers on improving component code quality

  • Fix your Technical clutter.

  • Use a null-safe language if possible to avoid NullPointerExceptions, the billion-dollar mistake.

  • Use static analysis to check for issues in the highest category of your ruleset and eliminate the blocker and major bugs. SonarQube is the industry standard when it comes to static analysis, but also check out tools that are specialized in finding bugs, like SpotBugs.

  • Update your dependencies often, so you won't fall behind and get the latest improvements.

  • Don't use too many dependencies. Using a couple of well-maintained libraries like Guava in Java will give you all the utility classes you need.

  • Less code is more. If you can generate something, generate it. If a feature toggle is not necessary, remove it. If a configuration setting can be derived from another, derive it.

  • Many bugs can be caught by the generous usage of assertions, like "checkState" from Guava's Preconditions. Assertions are an excellent way of noting down the assumptions within your thought process while writing code. When implementing a tricky method, it is not unusual that edge cases come up. Thinking through all of them blocks the development flow, but this can be worked around using assertions. If you aren't sure which possible values a parameter or variable can assume, just add an assertion and the code will fail should it be any different. Used with a proper error message, assertions are far more useful than comments, because they are integrated into the code and cannot get outdated. (See the sketch after this list.)

  • Avoid "primitive obsession", which is the over-usage of primitive parameter types. Having methods with many boolean or String parameters instead of domain-specific value objects leads to mistakes where parameters are being confused. It also makes it harder to check whether certain types or conditions are used at all. A triplet of boolean flags can often be replaced by an enum enumerating all eight possible values with a proper description.

  • When designing APIs, make sure they follow some standard pattern for ordering, limiting fields, and filtering to be reusable. You should aim for consistency here: Your APIs don't have to be perfect, but they should be consistent within your business, and what works for one of your software systems should work for the others as well. A good resource on API creation is this article: Best Practices for Designing a Pragmatic RESTful API.

  • When it comes to multithreading, look into the concepts of actors, communicating sequential processes (CSP), and blocking queues. Most multithreading issues stem from the problem that threads communicate by sharing read and write access to data. Actors and CSP turn this around: threads share data by communicating with messages. This eliminates many race conditions and the need for locking and synchronization, except at the handover points.

  • For Java, Guava's User Guide is a good read and gives many insights into how modern code should be written.
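
To illustrate the points about assertions and primitive obsession, here is a small Java sketch using Guava's Preconditions together with a domain-specific value object (the Iban type and its rules are simplified, made-up examples):

    import com.google.common.base.Preconditions;

    // A value object instead of passing account numbers around as plain Strings.
    // The compact constructor enforces assumptions that would otherwise live in comments.
    record Iban(String value) {
        Iban {
            Preconditions.checkNotNull(value, "IBAN must not be null");
            Preconditions.checkArgument(value.length() >= 15 && value.length() <= 34,
                    "IBAN has invalid length: %s", value.length());
        }
    }

    class TransferService {
        void transfer(Iban from, Iban to, long amountCents) {
            // Assertions note down the assumptions made while writing the method.
            Preconditions.checkArgument(amountCents > 0, "amount must be positive, was %s", amountCents);
            Preconditions.checkState(!from.equals(to), "source and target account must differ");
            // ... actual transfer logic ...
        }
    }

Confusing the from and to accounts is still possible, but accidentally passing an unrelated String where an account is expected now fails at compile time instead of in production.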

Features

When the technical foundation of the pyramid is sound, you have the proper environment to implement business logic and features.

As everyone working in software development knows, features tend to change. In theory, there are detailed specifications, an implementation step, and testing. But in the real world, requirements shift, and what you planned on implementing in a certain way might end up being done differently once circumstances are better understood. There are even programming contests where rules are changed without prior notice halfway through the event to see how contestants and languages cope when it comes to adapting to new requirements.

That is why features are underpinned by the four layers of build and test performance, testability, and component code quality. Doing this groundwork is not optional diligence, but a necessary prerequisite for achieving agility in your code base. To support changing requirements, you need to be able to rapidly prototype, throw away code, rearrange, and rewrite components, all while making sure that everything still works as expected. Ideally, you should be able to rewrite your code without rewriting your tests. Since nobody can foresee the future of a business, it is essential to build a system that supports a variety of futures, which is exactly what the code quality pyramid aims to help with.

When investing time in the base of the pyramid, you are paving the way for implementing stable features with high velocity. Having proper builds and testing in place implies that developers have already structured the code base in a performant and composable way. This in turn means that it is flexible enough to prevent your project from being thrown off-balance by the inevitable changes in requirements from the outside.

Side note: The waterfall model for implementing projects and its drawbacks are widely known, but what is less known is that the author of the model, Winston W. Royce, recommended doing the entire process twice. It is hard to imagine how the world of development would look today had companies used the waterfall process twice on each project.

Tips on improving (the planning of) features

  • Slice your work packages along technical boundaries and testability instead of business boundaries. When writing a program that reads input, transforms it, and writes output, developers don't have to implement input first, then the transform function, then the output. They can also write a sample data structure in memory, implement the transform and output functions and test them, and implement the input function last (example taken from an interview with Kent Beck, the author of JUnit; see the sketch after this list). The aforementioned offline mode for an app is another example of this: you can implement an app against an abstracted API and test a lot of functionality without having even started implementing the backend.

  • Always estimate in ranges. This is much more important than deciding whether hours, man-days, or story points should be used. There are several ways to condense these ranges into a single number, like Three-point estimation.

  • When dividing up a non-trivial project that spans several months into work packages, split the packages that look complex in two. Assuming some Feature A, you could have a work package called "Feature A 1/2" in your first milestone and schedule another work package "Feature A 2/2" in a later milestone. The goal is not to split the work in half, but to implement the entire feature in the first of the two tasks. Should you realize while implementing it that your time is running out, you still have the 2/2 as a backup. If it is not needed after all, you gain time. Be sure to reserve time for the second slot, even though it might get skipped.
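
As a small illustration of the first tip, here is a hypothetical Java sketch where the transform step is implemented and exercised against in-memory sample data before any input or output code exists (the domain types and numbers are made up):

    import java.util.List;

    public class ReportExample {

        // Hypothetical domain type: what the input step will eventually produce.
        record OrderLine(String product, int quantity, long priceCents) {}

        // The transform step: pure logic, implemented and tested first.
        static long totalCents(List<OrderLine> lines) {
            return lines.stream()
                    .mapToLong(line -> line.quantity() * line.priceCents())
                    .sum();
        }

        public static void main(String[] args) {
            // In-memory sample data stands in for the not-yet-written input step.
            List<OrderLine> sample = List.of(
                    new OrderLine("keyboard", 2, 4999),
                    new OrderLine("mouse", 1, 1999));
            System.out.println(totalCents(sample)); // 11997
        }
    }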

Code performance

After features are implemented, measures can be taken to improve the run-time performance of the code. These optimizations come last, at the top of the pyramid, for two reasons. The first is that while performance should be considered when designing the software, it is important not to get lost in details. A famous quote by Donald Knuth puts it like this:

"The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming." ā€“ Donald Ervin Knuth, Literate Programming

In line with this comment, you need to be careful about which parts to optimize. A system needs to be fast and responsive to be useful for users, but it's not necessary to discuss the specifics of some algorithm if it is seldom used or might get swapped out later. Instead, look closely at flows where performance drops and which are so integrated into the system that they are hard to remove later. Also, keep the order of magnitude in mind. The difference between an interaction taking 100ms or 2 seconds is vital, but whether it takes 100ms or 120ms is generally not. Keep in mind that only after the features are done can you show them to users and get feedback on how they are actually used.

The second reason for code performance being at the top of the pyramid is that there's a good chance this step isn't needed anymore if you have followed the advice in the rest of this article. Especially after having optimized test performance and testability, the code tends to already be in a shape that can be easily optimized and brought to match your users' expectations.

Tips for developers on improving code performance

  • Always measure performance before starting to implement an optimization.

  • When using profilers to measure CPU usage, always use sampling profilers, which collect a statistical sample of function calls, instead of tracing profilers, which track all calls. Tracing will show you an accurate count of calls to your functions and lead you towards optimizing the function that is called most often, but that can be very wrong, since the overhead of tracing the individual calls will actually shift the performance problem towards these hotspots. Sampling will give you a realistic profile of your application. The Java Flight Recorder can even run in production, since its overhead is less than 2%.

  • Read up on optimization for your stack and languages before taking opinions for granted. Java is supposedly slow, but it's hard to outsmart the Java Virtual Machine when it comes to optimization. The JVM recompiles and restructures your code in various ways during the runtime of your application, and the result can be faster than C++.

  • Use memory profiling as well. An important factor that is often overlooked is memory usage and garbage collection. Your application will run much faster if it allocates fewer objects in hotspots.

  • Database performance is very important, especially in components that use object-relational mappers like Hibernate. It is good practice to count the number of SQL statements that are executed. For medium to complex projects, evaluate alternatives to classical ORMs like JDBI, which let you write the SQL yourself (see the sketch below).
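
For illustration, here is a minimal JDBI sketch (the table, the query, and the in-memory connection URL are made-up examples; the point is that the SQL is written, and therefore counted and reviewed, explicitly):

    import org.jdbi.v3.core.Jdbi;
    import java.util.List;

    public class UserQueries {
        public static void main(String[] args) {
            // Placeholder connection URL; use your real data source here.
            Jdbi jdbi = Jdbi.create("jdbc:h2:mem:example");

            List<String> activeUsers = jdbi.withHandle(handle ->
                    handle.createQuery("SELECT name FROM users WHERE active = :active")
                          .bind("active", true)
                          .mapTo(String.class)
                          .list());

            System.out.println(activeUsers);
        }
    }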

Unmentioned qualities

A note on security: The security of software can be seen as a functional requirement and therefore falls into the features layer.

A note on operations and alerting: Operational monitoring and alerting is a quality of a system that arguably also falls into the category of testability, since alerts, like tests, tell you if something is not behaving according to specification. Here it is very important to have remote crash-logging set up so that you can debug uncaught exceptions.

Summary

Although there is no canonical definition of well-written code, we all know what constitutes good software. The code quality pyramid aims to structure this, describing the system as a whole, how its qualities should be prioritized, and how they support each other.