Combined with a few system design practices, Mock can effectively meet the testing needs of multiple components within a system. Specifically:
- In system design, partition complexity along the time and space dimensions, clearly define and constrain the states through which modules interact, and thereby keep the testing complexity between modules under control.
- Within a single module, use Base Class + Mock + Dependency Injection to implement the module's functions, so that unit tests can cover as many branches as possible.
Recently, I realized that the unit tests of Tiflow have significantly improved. This improvement is mainly due to the inclusion of more multi-component Mock tests, which can cover a wider range of multi-component interaction scenarios.
This idea has a history. When I first graduated, I mainly wrote logic code. The older projects I worked on did have unit tests, but they mostly checked whether an individual function behaved as expected: covering as many branches as possible, constructing boundary values, and ideally using fuzzing for random testing. I wrote unit tests now and then, but neither I personally nor the team placed particular emphasis on them. In short, the prevailing view was simply that having test coverage is still necessary.
After changing jobs to PingCAP, I found that whether it is TiDB/TiKV/PD, or CDC/DM, each module basically has rich unit tests. Is it a team culture issue? Why do teammates like writing unit tests so much? So much so that every PR basically needs sufficient unit tests to be approved.
When I first joined, I was just getting in touch with Go, so I did some miscellaneous work on migrating the testing framework. In the first month, I even wanted to increase the test coverage of CDC to 80%. I guess my bold words at that time must have scared my colleagues a bit. Although they might have thought ‘this guy is something’ in their hearts, they still calmly said, ‘That might be a bit difficult, you can prioritize other high-priority tasks first.’
Later, after getting used to unit tests, I realized that unit tests have some tricks. Besides testing functions and boundaries, a very important capability is to solidify interface semantics. In a system with multi-component interactions, the system complexity increases exponentially with the number of components. This problem is particularly prominent in complex projects with multiple collaborators. Different people in the team focus on different components, and even a function within a component may be continuously iterated. If the function semantics change due to modifications in the function implementation, it may cause a chain of failures. Therefore, providing basic unit tests for each function can solidify the semantics of the function or interface itself, allowing problems to be discovered in the development stage during continuous iteration.
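As a rough illustration of what "solidifying semantics" can look like, the sketch below pins down the contract of a small helper with a table of cases. `ParseTableName` and its contract (exactly one dot, both parts non-empty) are hypothetical and invented for this example; they are not taken from Tiflow. In a real project the table would live in a `_test.go` file; a plain function is used here so the snippet runs on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseTableName is a hypothetical helper that splits "schema.table".
// Assumed contract: exactly one dot and both parts non-empty,
// otherwise ok is false and both strings are empty.
func ParseTableName(s string) (schema, table string, ok bool) {
	i := strings.Index(s, ".")
	if i <= 0 || i == len(s)-1 || strings.Count(s, ".") != 1 {
		return "", "", false
	}
	return s[:i], s[i+1:], true
}

// checkSemantics plays the role of a table-driven unit test.
// Each case fixes one piece of the contract, so a later change to the
// implementation that alters the semantics is reported immediately.
func checkSemantics() error {
	cases := []struct {
		in     string
		schema string
		table  string
		ok     bool
	}{
		{"test.t1", "test", "t1", true},
		{"t1", "", "", false},    // no schema part
		{".t1", "", "", false},   // empty schema
		{"test.", "", "", false}, // empty table
		{"a.b.c", "", "", false}, // more than one dot
	}
	for _, c := range cases {
		schema, table, ok := ParseTableName(c.in)
		if schema != c.schema || table != c.table || ok != c.ok {
			return fmt.Errorf("ParseTableName(%q) = (%q, %q, %v), want (%q, %q, %v)",
				c.in, schema, table, ok, c.schema, c.table, c.ok)
		}
	}
	return nil
}

func main() {
	if err := checkSemantics(); err != nil {
		fmt.Println("semantics changed:", err)
		return
	}
	fmt.Println("semantics unchanged")
}
```

If a collaborator later decides that `"a.b.c"` should be accepted, the corresponding case fails during development, and the change in semantics becomes an explicit decision rather than a surprise for callers.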
Another problem is that the test coverage of some modules is not high. Sometimes this is due to laziness, with only the core and commonly used branches covered. In other cases the branches simply cannot be covered, which is a bigger problem. Let me give an example:
// Simulated MySQL client.
// MysqlConfig, error1, error2 and ErrConnTimeout are placeholders defined elsewhere.
type mysql struct{}

func NewMysql(conf MysqlConfig) (*mysql, error) {
	// do something
	return &mysql{}, nil
}

func (m *mysql) Exec(sql string) error  { return nil }
func (m *mysql) Query(sql string) error { return nil }

func DealWithMysql(conf MysqlConfig) error {
	// do something
	my, err := NewMysql(conf)
	if err != nil {
		if err == error1 {
			// branch 1
		} else if err == error2 {
			// branch 2
		}
		return err // do not continue with an unusable client
	}
	err = my.Exec("create table xxx")
	if err != nil {
		if err == ErrConnTimeout {
			// branch 3
		} else {
			// branch 4
		}
		return err
	}
	return nil
}
For the above DealWithMysql function, it is difficult to cover various exception branches inside when doing unit tests. This is because the mysql object is created using the conf configuration. The initialization process generally involves some normal network interactions, and there may be very complex logic branches internally, such as maintaining a connection pool internally.
Therefore, it is difficult to cover various exception handling branches (branch 1/branch 2) that occur when creating the mysql object. Then, when calling other methods of the mysql object internally, it is also impossible to customize the return values of these methods, so it is impossible to cover branch 3/branch 4.
The branches that cannot be covered in actual projects are much more than this simple demo. For example, the KV module of CDC is like this. It is a component that establishes a connection with TiKV nodes and synchronizes data changes in real-time. It is very difficult to fully cover its branches in unit tests. If modules are nested within modules, and some network interactions are added, the situation becomes even more severe.
So how was the improvement in Tiflow’s unit tests mentioned at the beginning achieved? Through Base Class + Mock + Dependency Injection. Most components in Tiflow’s master-slave framework define base classes (interfaces, in Go) with clear interface semantics, or use base classes to wrap concrete types that cannot be derived from. Unit tests then use Mock objects that ignore the internal implementation of methods and directly return whatever values the test needs, which reduces the dependencies between components’ internal implementations.
For the DealWithMysql function above, how can we mock the internal mysql object? With dependency injection. The principle is simple: objects that used to be initialized internally are passed in as parameters instead. Combined with Base Class + Mock, the control flow inside the function can be steered precisely. For example, the logic becomes:
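// mysqlBase is the "base class": the interface that callers depend on.
type mysqlBase interface {
	Exec(sql string) error
	Query(sql string) error
}

// mysql is the real ("derived") implementation of mysqlBase.
type mysql struct {
	// ...
}

func (m *mysql) Exec(sql string) error  { /* ... */ return nil }
func (m *mysql) Query(sql string) error { /* ... */ return nil }

// DealWithMysql receives its dependency as a parameter instead of creating it.
func DealWithMysql(base mysqlBase) error {
	// ...
	return nil
}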
In unit tests, we can customize different mysqlBase Mock subclasses according to the test needs to test different branches, as follows:
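// mysqlMock is a mock implementation of mysqlBase.
type mysqlMock struct {
	// ...
}

func (m *mysqlMock) Exec(sql string) error  { /* ... */ return nil }
func (m *mysqlMock) Query(sql string) error { /* ... */ return nil }

// xxx_test.go
func TestXXX(t *testing.T) {
	mock := &mysqlMock{}
	// Assert whatever this test case expects; here, no error.
	if err := DealWithMysql(mock); err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
}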
With tools like Base Class + Mock + Dependency Injection, can we cover all test scenarios? Not really. Suppose each component has $N$ states. Then $M$ components combined have $N^{M}$ states. It is practically impossible to test all of them, but we can borrow the idea of pruning from algorithm design to keep system complexity under control.
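To get a feel for these numbers, here is a small worked example. The values $N = 10$ and $M = 4$ are assumptions picked for easy arithmetic, not measurements from Tiflow, and the second line is only a rough best case in which interfaces constrain interactions well enough that each component can be checked on its own against mocked neighbors:

$$
\begin{aligned}
\text{all combinations:} \quad & N^{M} = 10^{4} = 10000 \\
\text{each component in isolation:} \quad & M \cdot N = 4 \times 10 = 40
\end{aligned}
$$

The gap between the two lines is roughly what pruning can buy. Isolated tests do not cover every real interaction between components, which is why constraining the interaction states in the design matters as much as the mocks themselves.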
To control system complexity, I think quoting a colleague’s comment would be more illustrative. This comment was made when discussing TiDB Cloud Architecture in the company:
After reading this document, I have two strong feelings to share with everyone for reference:
When we design the architecture, we seem to focus too much on functionality, with insufficient attention from the perspectives of construction, deployment, and operation management, and insufficient attention to physical units.
Thinking about the architecture ideas of k8s and the architecture design of large global distributed systems, a large and complex system must be a recursive architecture of units, such as Region, AZ, Cluster, Node, Pod.
Only such an architecture can simplify construction, deployment, and operation management by managing units at different levels. Each layer of these recursive architecture units has a clear physical counterpart.
Our architecture is basically flat, with many processes or Pods on the same plane.
Due to the lack of hierarchy, the overall system architecture looks very complex and chaotic, with too many things to take care of and handle at the same level.
When it exceeds people’s ability to handle complexity, many design and quality defects will arise, and it is difficult to predict the impact when solving them.
From the most basic software design ideas, it lacks abstraction and information hiding. In more specific terms, the modularity is not good, and more specifically, the architecture lacks hierarchical structure.
Another problem is that the design perspective focuses more on the design of local components, but the fundamental challenge of microservices lies in the relationships between so many services, that is, the design of certain key scenarios, processes, or overall capabilities.
Each overall capability may involve the collaboration of many services.
When people want to understand how a certain overall function works, it is difficult to piece together this overall process from this design document.
Such overall functions include:
- How does metering and billing work?
- How are the accounts, configurations, and metadata of the many third-party services that the system depends on managed?
- How is the authentication and authorization of the production system managed, and how are automatic accounts and operation accounts managed?
- If the deployment process is automated, what steps are included in this process, and what is the execution order at different recursive levels (2-3 levels)?
- How does the complete flow and usage process of log data work?
- How does the end-to-end process of upgrading cluster versions and changing cluster configurations work? It is not an operation manual, but the general principle inside.
- How are monitoring alarms generated, and how do they work until the operation and maintenance personnel and users see them? What services and stages are involved in this process?
- …
All these overall functions and process concerns are not the details after breaking down each service, nor are they operational commands, but how to put the broken clock back together and explain how the clock’s function works from the perspective of an overall function.
Only with a sufficient global process perspective can we truly understand this complex system and see the problems and bottlenecks, rather than discovering problems from online issues and debugging processes every time.
These accidentally discovered problems are often like legends, and it takes half an hour to tell them once. It is difficult to reach a consensus on priority when understanding problems and optimization tasks.
The design of a complex system is actually a description of space (structure) and time (process).
- The spatial structure mainly relies on the hierarchical abstraction and combination of objects.
- The time process mainly relies on the hierarchical abstraction, combination, and concurrency of stage processes.
The main task of system architecture design is to clarify these two things from the perspective of system structure and overall function.
- As for how detailed the hierarchy should be, it mainly depends on the level of detail the reader audience is concerned with.
- As for API interfaces, data structures, and key algorithms, they should not be the main content of the architecture document.
They belong to the DesignDoc category of specific functions or components. Even if APIs are presented, they should be relatively simple prototypes (mainly including entity space and function names, that is, the meta part).
Even the best software engineers have their limits in handling system complexity.
If one blindly increases system complexity because they can handle it, it will greatly reduce system maintainability, and users may abandon the system due to bugs that cannot be resolved for a long time.
On the other hand, keeping system complexity under control is itself a way of handling it. Good system design keeps partitioning modules along the time and space dimensions, reduces coupling between system components, and minimizes the states through which components interact. With that in place, we may be able to verify system functions effectively in unit tests.