Assessment System for Tests’ Architecture Design


Usually, people want to improve their tests but lack quality metrics to determine which version of their improvements is most beneficial to their projects. The presented assessment framework can help you figure out the best possible enhancement to introduce into your system tests, making them more stable, reliable, and maintainable.

I am going to present eight criteria for assessing system test architecture designs. You can find some of them in various books and blog posts, but this list is unique: my teammates and I created it specifically for our system test design improvements.

What Problems Did We Have?

We had over 1,000 tests that ran for over 6 hours on a single machine. Sometimes all of them were green, but sometimes there were problems, so we needed to troubleshoot them repeatedly. The biggest problem was that we could not trust our UI tests. They verified a big part of the system but were brittle because they were not designed to be easily modified. Small changes in the main workflow usually caused regressions in a random group of tests. Our challenge was to find a better design so that we could refactor them and make them more maintainable, more readable, and always green.

Before we came up with the system, we tried to patch up the tests and find quick solutions, hoping that this way we could fix the regression problems and still be able to add new tests. However, for quite some time we did not have the whole picture. As you will see, by analysing and comparing the different ideas we can achieve much better results.

Assessment System Criteria

Now it is time to present the different levels of the system. Each level represents a characteristic of the tests. They are listed in order of importance. However, this order depends highly on the context of your team and the skill of its members, so you can reorder the criteria if you want.

  • Maintainability
  • Readability
  • Code Complexity Index
  • Usability
  • Flexibility
  • Learning Curve
  • Principle of Least Knowledge
  • Keep It Simple, Stupid (KISS)

Our team is responsible for a complex legacy licensing system, so we need many regression tests and the ability to extend and modify them easily; because of that, maintainability holds the first spot. Since we have many tests, they need to be readable, because sometimes the tests serve as documentation too. The third criterion is the Code Complexity Index (CCI); it represents how complex our code is, and we want our code to be simple. It is also the only tool-calculated metric. We do not want to reinvent the wheel, so usability is important. Next comes flexibility. The learning curve measures how easy it is to learn to write tests. The seventh criterion is connected with maintainability. Our last resort for comparison is that the simplest design wins if all other criteria are equal; it is not a metric but a principle.

Available Levels

For every criterion, there will be a rating assigned. You can find the possible ratings below. In addition, they have a number representation.

(1) Very Poor

(2) Poor

(3) Good

(4) Very Good

(5) Excellent

Steps to Apply

Here are some pragmatic steps to apply the system.

1. Create a research & development branch

2. Create separate projects

3. Choose a small set of tests

4. Implement the set for each design

5. Present the designs to your team

6. Have every participant apply the system

7. Hold a final triage meeting

Pilot Project

First, create a completely new “Research & Development” branch. Then create separate projects to test your new ideas. Do not refactor your existing test framework’s code before you are completely sure which idea is best for your case.

To evaluate and assess the different ideas effectively, choose a small set of identical tests to implement for each design, and create a separate folder for each idea. If the tests you create are different, how do you expect the assessment to be accurate? Usually, we do not refactor all of our tests directly because it costs a lot of time that we usually do not have. In any case, the system will work for any number of tests.

Present the designs to your team. Use the provided eight-level evaluation system to assess the different solutions. It is best if a couple of people participate in the process because some of the points are subjective (such as what a readable test is, or which design is easier to learn). Finally, hold a triage meeting with your whole team and decide which idea to implement based on the results of the assessment.

Before we proceed with examples of how we use the system, I am going to explain what each criterion means.

Assessment Criteria Definitions


Maintainability

The official definition by Wikipedia is the following: Maintainability has been defined as "the ease with which a software system or component can be modified to correct faults, improve performance, or other attributes, or adapt to a changed environment". The keyword here is ease.

The most important part for me is troubleshooting. How much time do you need to find out whether there is a bug in the functionality that the test is asserting, or a problem with the test itself? When there is an issue in the code, you look into the logs. You are all sweaty, searching and searching, unable to locate it, debugging deeper and deeper to find the root cause. I am sure you have experienced this more than once. This is what I mean by maintainability.



Readability

Readable code is code that clearly communicates its intention to the reader. Code that is not readable takes longer to understand and increases the likelihood of defects. Some programmers tend to use comments as a substitute for readable code, or simply comment for the sake of commenting. I believe tests’ readability means how easy it is to see what a test does without the need for huge comments or long test descriptions. I am sure all of you have, at least once, seen a test name that is two rows long.


Code Complexity Index

The code complexity index is our custom-made metric. We created a formula for it. It contains four important parts that can be calculated with tools such as Microsoft Visual Studio IDE. This is the only metric from the system that is tool calculated. All others are based on the participants’ opinion.

Code Complexity Index Rating = (Depth of Inheritance Rating + Class Coupling Rating + Maintainability Index Rating + Cyclomatic Complexity Rating)/4
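The formula above is a plain average of the four per-metric ratings. A minimal sketch, assuming each metric has already been mapped onto the 1-5 rating scale (the function name is my own):

```python
def code_complexity_index_rating(depth_of_inheritance_rating,
                                 class_coupling_rating,
                                 maintainability_index_rating,
                                 cyclomatic_complexity_rating):
    """Average the four per-metric ratings (each on the 1-5 scale)."""
    return (depth_of_inheritance_rating
            + class_coupling_rating
            + maintainability_index_rating
            + cyclomatic_complexity_rating) / 4

# Example: ratings of 5, 4, 5 and 2 give a CCI rating of 4.0.
print(code_complexity_index_rating(5, 4, 5, 2))
```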

First, Depth of Inheritance – the deeper the hierarchy, the more difficult it might be to understand where particular methods and fields are defined. Next, Class Coupling – good software design dictates that types and methods should have high cohesion and low coupling. High coupling indicates a design that is difficult to reuse and maintain because of its interdependencies on other types. These metrics’ calculations are available in the development editions of the application, even in the free one, the Community edition.

Visual Studio Editions

Maintainability Index – calculates an index value between 0 and 100 that represents the relative ease of maintaining the code. A high value means better maintainability. Most of the formulas used to calculate the metrics are not public. However, I found an unofficial one for the Maintainability Index. I am not going to decipher it; I just want to emphasise that real mathematics stands behind these metrics.

Maintainability Index = MAX(0, (171 – 5.2 * ln(Halstead Volume) – 0.23 * Cyclomatic Complexity – 16.2 * ln(Lines of Code)) * 100 / 171)
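The unofficial formula translates directly into code. A sketch, assuming the three inputs have been obtained from a metrics tool:

```python
import math

def maintainability_index(halstead_volume, cyclomatic_complexity, lines_of_code):
    """Unofficial Visual Studio formula, normalised to the 0-100 range."""
    raw = (171
           - 5.2 * math.log(halstead_volume)      # ln(Halstead Volume)
           - 0.23 * cyclomatic_complexity
           - 16.2 * math.log(lines_of_code))      # ln(Lines of Code)
    # MAX(0, ...) clamps negative raw values to 0.
    return max(0, raw * 100 / 171)
```

Note how a large, complex body of code (high volume, high complexity, many lines) drives the raw value negative, which the MAX clamps to 0.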

Cyclomatic Complexity – below you can find the formula for cyclomatic complexity, which is based on the number of decisions in a program. A sample control-flow graph with seven nodes (shapes) and eight edges (lines) gives, using the formula, a cyclomatic complexity of 8 – 7 + 2 = 3.

Cyclomatic Complexity: CC = E – N + 2

E = the number of edges of the graph

N = the number of nodes of the graph
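The computation for the sample graph above can be sketched in a couple of lines:

```python
def cyclomatic_complexity(edges, nodes):
    """CC = E - N + 2 for a connected control-flow graph."""
    return edges - nodes + 2

# The sample control-flow graph: 7 nodes, 8 edges.
print(cyclomatic_complexity(edges=8, nodes=7))  # -> 3
```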

I could not find any official values published by Microsoft for assessment of these criteria. So I did some research and read blog posts of Microsoft MVPs that suggested a sample assessment system. I modified it a little bit to fit our needs. You can observe the result in the presented table. We use the table to calculate the rating for the different parts of the formula.

Rating        | Maintainability Index | Cyclomatic Complexity | Class Coupling | Depth of Inheritance
(5) Excellent | > 70                  | < 10                  | < 10           | <= 3
(4) Very Good | > 60                  |                       | < 15           |
(3) Good      |                       |                       | < 20           |
(2) Poor      |                       |                       | < 30           |
(1) Very Poor | < 20                  | > 20                  | > 30           | > 8

Usability

By usability, I mean how easy it is to use the test framework API. How much effort is required to write a new, common test leveraging the existing test API? How much code do you need to write a single simple test? If you use complex design patterns and many classes, your tests may become complex. Writing tests should be a straightforward process that brings joy and pleasure to the writer.



Flexibility

By flexibility, I mean how easy it is to add a new step to the existing workflow. If you have 100 tests that all use one primary method in which the whole process is described, then supporting 20 different use cases means adding many conditions to your code. Conditions tend to make the code more complex and less maintainable. In addition, such a design does not follow the Open/Closed Principle, which states that software entities should be open for extension but closed for modification. Every change in this imaginary method can affect all of the tests that use it. The best test framework designs let you add new steps quickly, without the risk of affecting all other tests.
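To make the contrast concrete, here is a minimal Python sketch; all class and method names are illustrative, not taken from our actual framework:

```python
# A rigid design: one primary method with a condition for every use case.
def run_purchase_workflow(client, use_trial_license=False, use_volume_discount=False):
    steps_run = ["login"]
    if use_trial_license:
        steps_run.append("activate trial")    # trial-specific branch
    if use_volume_discount:
        steps_run.append("apply discount")    # discount-specific branch
    # Every new use case means editing this method and risking all tests.
    return steps_run

# A flexible design: each step is its own class; tests compose only what they need.
class LoginStep:
    def execute(self, context):
        context["logged_in"] = True

class ApplyDiscountStep:
    def execute(self, context):
        context["discount"] = 0.2

def run_workflow(steps, context=None):
    """Execute an arbitrary sequence of steps; new steps never touch old ones."""
    context = context if context is not None else {}
    for step in steps:
        step.execute(context)
    return context

result = run_workflow([LoginStep(), ApplyDiscountStep()])
```

Adding a new step in the second design means writing a new class, not editing a shared method, which is exactly what the Open/Closed Principle asks for.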

Learning Curve of the Test Framework API

If a new member joins your team and needs to read 100 pages of documentation before being ready to write a first test, or worse, if you have no documentation at all and must spend countless hours teaching each new member how to get started, then your test framework API has a poor learning curve.

Principle of Least Knowledge (Law of Demeter)

When the assessment system was designed, most of our tests shared the currently executed test’s data through a static class. Most of the time, the different components of the design did not need the whole information, so we decided to include this principle in our list. For example, suppose you have a client that has a first name, last name, email, country, and so on, and you have a test for resetting a password. If you pass only the email, everything is fine; if the test needs the whole object, that is a problem.
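The password-reset example can be sketched as follows; the class and function names are hypothetical:

```python
class Client:
    """A client record with more data than most operations need."""
    def __init__(self, first_name, last_name, email, country):
        self.first_name = first_name
        self.last_name = last_name
        self.email = email
        self.country = country

# Violates the principle: demands the whole object but uses only the email,
# coupling the function to every field of Client.
def reset_password_coupled(client):
    return f"Reset link sent to {client.email}"

# Follows the principle: asks only for what it actually uses.
def reset_password(email):
    return f"Reset link sent to {email}"

client = Client("John", "Doe", "john@example.com", "US")
print(reset_password(client.email))  # -> Reset link sent to john@example.com
```

The second version can be called from any test that knows an email address, without that test having to construct or share a full Client object.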

Keep It Simple, Stupid (KISS)

Keeping things simple is, ironically, not simple! It requires abstract thinking. Let me quote Martin Fowler: “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” Think about it for a second: how much code have you seen that was easy to read and simple enough to understand? Probably not a lot.

This is not a metric like the previous ones, and we do not assign a rating for it. We simply apply this principle when all other criteria are equal, though usually that is not necessary.



We can have many ideas and approaches, but we need to analyse them well and decide which one is best. In the next articles in the series, I will show you how to use our system in practice, using some of the real designs that we evaluated in the past. I will briefly explain the specifics of each one, assign ratings for each level described in our assessment system, and clarify the reasoning behind my rating decisions.

You can watch my conference talk dedicated to the system or download the whole slide deck.