In general, building robust systems that encompass every point of possible failure is difficult because of the vast quantity of possible inputs and input combinations. Since all inputs and input combinations would require too much time to test, developers cannot run through all cases exhaustively. Instead, the developer will try to generalize such cases. For example, imagine inputting some integer values. Some selected inputs might consist of a negative number, zero, and a positive number. When using these numbers to test software in this way, the developer generalizes the set of all reals into three numbers. This is a more efficient and manageable method, but more prone to failure. Generalizing test cases is an example of just one technique to deal with failure—specifically, failure due to invalid user input. Systems generally may also fail due to other reasons as well, such as disconnecting from a network. Regardless, complex systems should still handle any errors encountered gracefully. There are many examples of such successful systems. Some of the most robust systems are evolvable and can be easily adapted to new situations.
Challenges
Programs and software are tools focused on a very specific task, and thus aren't generalized and flexible. However, observations in systems such as the internet or biological systems demonstrate adaptation to their environments. One of the ways biological systems adapt to environments is through the use of redundancy. Many organs are redundant in humans. The kidney is one such example. Humans generally only need one kidney, but having a second kidney allows room for failure. This same principle may be taken to apply to software, but there are some challenges. When applying the principle of redundancy to computer science, blindly adding code is not suggested. Blindly adding code introduces more errors, makes the system more complex, and renders it harder to understand. Code that doesn't provide any reinforcement to the already existing code is unwanted. The new code must instead possess equivalent functionality, so that if a function is broken, another providing the same function can replace it, using manual or automated software diversity. To do so, the new code must know how and when to accommodate the failure point. This means more logic needs to be added to the system. But as a system adds more logic, components, and increases in size, it becomes more complex. Thus, when making a more redundant system, the system also becomes more complex and developers must consider balancing redundancy with complexity. Currently, computer science practices do not focus on building robust systems. Rather, they tend to focus on scalability and efficiency. One of the main reasons why there is no focus on robustness today is because it is hard to do in a general way.
Areas
Robust programming
Robust programming is a style of programming that focuses on handling unexpected termination and unexpected actions. It requires code to handle these terminations and actions gracefully by displaying accurate and unambiguous error messages. These error messages allow the user to more easily debug the program.
Principles
Paranoia - When building software, the programmer assumes users are out to break their code. The programmer also assumes that his or her own written code may fail or work incorrectly. Stupidity - The programmer assumes users will try incorrect, bogus and malformed inputs. As a consequence, the programmer returns to the user an unambiguous, intuitive error message that does not require looking up error codes. The error message should try to be as accurate as possible without being misleading to the user, so that the problem can be fixed with ease. Dangerous implements - Users should not gain access to libraries, data structures, or pointers to data structures. This information should be hidden from the user so that the user doesn't accidentally modify them and introduce a bug in the code. When such interfaces are correctly built, users use them without finding loopholes to modify the interface. The interface should already be correctly implemented, so the user does not need to make modifications. The user therefore focuses solely on his or her own code. Can't happen - Very often, code is modified and may introduce a possibility that an "impossible" case occurs. Impossible cases are therefore assumed to be highly unlikely instead. The developer thinks about how to handle the case that is highly unlikely, and implements the handling accordingly.
Robust machine learning
Robust machine learning typically refers to the robustness of machine learning algorithms. For a machine learning algorithm to be considered robust, either the testing error has to be consistent with the training error, or the performance is stable after adding some noise to the dataset.
Robust network design is the study of network design in the face of variable or uncertain demands. In a sense, robustness in network design is broad just like robustness in software design because of the vast possibilities of changes or inputs.
Robust algorithms
There exists algorithms that tolerate errors in the input or during the computation. In that case, the computation eventually converges to the correct output. This phenomenon has been called "correctness attraction".