Reverse engineering


Reverse engineering, also called back engineering, is the process by which a man-made object is deconstructed to reveal its design, architecture, or code, or to extract knowledge from it. It is similar to scientific research, the only difference being that scientific research concerns natural phenomena.
Reverse engineering is applicable in the fields of computer engineering, mechanical engineering, electronic engineering, software engineering, chemical engineering, and systems biology.

Overview

There are many reasons for performing reverse engineering in various fields. Reverse engineering has its origins in the analysis of hardware for commercial or military advantage. However, the reverse engineering process, as such, is not concerned with creating a copy or changing the artifact in some way; it is only an analysis in order to deduce design features from products with little or no additional knowledge about the procedures involved in their original production. In some cases, the goal of the reverse engineering process can simply be a redocumentation of legacy systems. Even when the reverse-engineered product is that of a competitor, the goal may not be to copy it, but to perform competitor analysis. Reverse engineering may also be used to create interoperable products, and despite some narrowly tailored United States and European Union legislation, the legality of using specific reverse engineering techniques for this purpose has been hotly contested in courts worldwide for more than two decades.
Software reverse engineering can improve understanding of the underlying source code for the maintenance and improvement of the software. Relevant information can be extracted to inform software development decisions, and graphical representations of the code can provide alternate views of the source code, which can help to detect and fix a software bug or vulnerability. As software develops, its design information and improvements are often lost over time, but this lost information can usually be recovered with reverse engineering. The process can also cut down the time required to understand the source code, reducing the overall cost of software development. Reverse engineering can also help to detect and eliminate malicious code written into the software with better code detectors. Reversing source code can be used to find alternate uses of it, such as detecting unauthorized replication of the source code where it wasn't intended to be used, or revealing how a competitor's product was built. This process is commonly used for "cracking" software and media to remove their copy protection, or to create a copy or even a knockoff, which is usually the goal of a competitor or a hacker. Malware developers often use reverse engineering techniques to find vulnerabilities in an operating system in order to build a computer virus that can exploit them. Reverse engineering is also used in cryptanalysis to find vulnerabilities in substitution ciphers, symmetric-key algorithms, or public-key cryptography.
In addition to these purposes there are other uses to reverse engineering:

Reverse engineering of machines

As computer-aided design has become more popular, reverse engineering has become a viable method to create a 3D virtual model of an existing physical part for use in 3D CAD, CAM, CAE or other software. The reverse-engineering process involves measuring an object and then reconstructing it as a 3D model. The physical object can be measured using 3D scanning technologies like CMMs, laser scanners, structured light digitizers, or industrial CT scanning. The measured data alone, usually represented as a point cloud, lacks topological information and design intent. The former may be recovered by converting the point cloud to a triangular-faced mesh. Reverse engineering aims to go beyond producing such a mesh, and to recover the design intent in terms of simple analytical surfaces where appropriate, as well as possibly NURBS surfaces, to produce a boundary-representation CAD model. Recovery of such a model allows a design to be modified to meet new requirements, a manufacturing plan to be generated, etc.
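As a rough illustration of recovering a simple analytical surface from measured data, the following sketch fits a plane to a small point cloud by least squares. It assumes the surface can be written as z = a*x + b*y + c (so it cannot represent a vertical plane), and the function name `fit_plane` is a hypothetical helper, not part of any CAD package.

```python
def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) samples.

    Assumption: the plane is not vertical, so z is a function of x and y.
    """
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)

    # Normal equations: m * [a, b, c] = v
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    v = [sxz, syz, sz]

    def det3(q):
        return (q[0][0] * (q[1][1] * q[2][2] - q[1][2] * q[2][1])
                - q[0][1] * (q[1][0] * q[2][2] - q[1][2] * q[2][0])
                + q[0][2] * (q[1][0] * q[2][1] - q[1][1] * q[2][0]))

    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("points do not determine a unique plane")

    # Cramer's rule: replace one column at a time with the right-hand side.
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)


# Samples taken from the plane z = 2x - y + 3
cloud = [(0, 0, 3), (1, 0, 5), (0, 1, 2), (1, 1, 4), (2, 1, 6)]
a, b, c = fit_plane(cloud)
```

Real scan data is noisy and contains many surfaces, so a practical tool would first segment the cloud into regions before fitting a primitive to each one.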
Hybrid modeling is a commonly used term when NURBS and parametric modeling are implemented together. Using a combination of geometric and freeform surfaces can provide a powerful method of 3D modeling. Areas of freeform data can be combined with exact geometric surfaces to create a hybrid model. A typical example of this would be the reverse engineering of a cylinder head, which includes freeform cast features, such as water jackets, and high-tolerance machined areas.
Reverse engineering is also used by businesses to bring existing physical geometry into digital product development environments, to make a digital 3D record of their own products, or to assess competitors' products. It is used to analyze, for instance, how a product works, what it does, and what components it consists of, estimate costs, and identify potential patent infringement, etc.
Value engineering is a related activity also used by businesses. It involves de-constructing and analyzing products, but the objective is to find opportunities for cost-cutting.

Reverse engineering of software

In 1990, the Institute of Electrical and Electronics Engineers defined reverse engineering as "the process of analyzing a subject system to identify the system's components and their interrelationships and to create representations of the system in another form or at a higher level of abstraction", where the "subject system" is the end product of software development. Reverse engineering is a process of examination only: the software system under consideration is not modified. Reverse engineering can be performed from any stage of the product cycle, not necessarily from the functional end product.
There are two components in reverse engineering: redocumentation and design recovery. Redocumentation is the creation of a new representation of the computer code so that it is easier to understand. Design recovery, meanwhile, is the use of deduction or reasoning from general knowledge or personal experience of the product in order to fully understand its functionality. It can also be seen as "going backwards through the development cycle". In this model, the output of the implementation phase is reverse-engineered back to the analysis phase, in an inversion of the traditional waterfall model. Another term for this technique is program comprehension. The Working Conference on Reverse Engineering has been held yearly to explore and expand the techniques of reverse engineering. Computer-aided software engineering and automated code generation have contributed greatly to the field of reverse engineering.
Software anti-tamper technology like obfuscation is used to deter both reverse engineering and re-engineering of proprietary software and software-powered systems. In practice, two main types of reverse engineering emerge. In the first case, source code is already available for the software, but higher-level aspects of the program, perhaps poorly documented or documented but no longer valid, are discovered. In the second case, there is no source code available for the software, and any efforts towards discovering one possible source code for the software are regarded as reverse engineering. This second usage of the term is the one most people are familiar with. Reverse engineering of software can make use of the clean room design technique to avoid copyright infringement.
On a related note, black box testing in software engineering has a lot in common with reverse engineering. The tester usually has the API, but their goals are to find bugs and undocumented features by bashing the product from outside.
Other purposes of reverse engineering include security auditing, removal of copy protection, circumvention of access restrictions often present in consumer electronics, customization of embedded systems, in-house repairs or retrofits, enabling of additional features on low-cost "crippled" hardware, or even mere satisfaction of curiosity.

Binary software

Binary reverse engineering is performed if the source code for a piece of software is unavailable. This process is sometimes termed reverse code engineering, or RCE. As an example, decompilation of binaries for the Java platform can be accomplished using Jad. One famous case of reverse engineering was the first non-IBM implementation of the PC BIOS, which launched the historic IBM PC compatible industry that has been the overwhelmingly dominant computer hardware platform for many years. Reverse engineering of software is protected in the U.S. by the fair use exception in copyright law. The Samba software, which allows systems that are not running Microsoft Windows to share files with systems that are, is a classic example of software reverse engineering, since the Samba project had to reverse-engineer unpublished information about how Windows file sharing worked so that non-Windows computers could emulate it. The Wine project does the same thing for the Windows API, and OpenOffice.org is one party doing this for the Microsoft Office file formats. The ReactOS project is even more ambitious in its goals, as it strives to provide binary compatibility with the current Windows OSes of the NT branch, allowing software and drivers written for Windows to run on a clean-room reverse-engineered free software counterpart. WindowsSCOPE allows for reverse-engineering the full contents of a Windows system's live memory, including a binary-level, graphical reverse engineering of all running processes.
Another classic, if not well-known, example is that in 1987 Bell Laboratories reverse-engineered the Mac OS System 4.1, originally running on the Apple Macintosh SE, so they could run it on RISC machines of their own.

Binary software techniques

Reverse engineering of software can be accomplished by various methods.
The three main groups of software reverse engineering are:
  1. Analysis through observation of information exchange, most prevalent in protocol reverse engineering, which involves using bus analyzers and packet sniffers, for example, for accessing a computer bus or computer network connection and revealing the traffic data thereon. Bus or network behavior can then be analyzed to produce a stand-alone implementation that mimics that behavior. This is especially useful for reverse engineering device drivers. Sometimes, reverse engineering on embedded systems is greatly assisted by tools deliberately introduced by the manufacturer, such as JTAG ports or other debugging means. In Microsoft Windows, low-level debuggers such as SoftICE are popular.
  2. Disassembly using a disassembler, meaning the raw machine language of the program is read and understood in its own terms, only with the aid of machine-language mnemonics. This works on any computer program but can take quite some time, especially for someone not used to machine code. The Interactive Disassembler is a particularly popular tool.
  3. Decompilation using a decompiler, a process that tries, with varying results, to recreate the source code in some high-level language for a program only available in machine code or bytecode.
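
To make the second group concrete, here is a minimal sketch of a table-driven disassembler. The instruction set is entirely hypothetical (two bytes per instruction: one opcode, one operand) and was invented for this illustration; real architectures have variable-length encodings and far larger opcode tables.

```python
# Hypothetical opcode table for an invented two-byte instruction format.
MNEMONICS = {0x01: "LOAD", 0x02: "ADD", 0x03: "STORE", 0xFF: "HALT"}


def disassemble(code: bytes):
    """Translate raw bytes into one mnemonic line per instruction.

    Unknown opcodes are emitted as raw data so that nothing is hidden
    from the analyst. A trailing odd byte, if any, is ignored.
    """
    lines = []
    for offset in range(0, len(code) - 1, 2):
        opcode, operand = code[offset], code[offset + 1]
        name = MNEMONICS.get(opcode)
        if name is None:
            lines.append(f"{offset:04x}: .byte 0x{opcode:02x}, 0x{operand:02x}")
        else:
            lines.append(f"{offset:04x}: {name} 0x{operand:02x}")
    return lines


program = bytes([0x01, 0x10, 0x02, 0x11, 0x03, 0x12, 0xFF, 0x00])
for line in disassemble(program):
    print(line)
```

The listing only attaches mnemonics to bytes; understanding what the program does is still left to the reader, which is why disassembly can be time-consuming.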

Software classification

Software classification is the process of identifying similarities between different software binaries in order to detect code relations between software samples. This task was traditionally done manually, but nowadays it can be done somewhat automatically for large numbers of samples.
This method is used mostly for long and thorough reverse engineering tasks. Statistical classification is generally considered a hard problem, and this is also true for software classification, so there are few tools that handle the task well.
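One simple way to score similarity between two binaries, shown here only as a sketch, is to compare their sets of byte n-grams with the Jaccard index. This is a swapped-in illustrative measure, not a method described in the text above; the function names and the choice of n = 4 are assumptions.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all length-n byte windows in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard index of the two n-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # both inputs too short to have n-grams; treat as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


sample_a = b"abcdefgh"
sample_b = b"abcdXfgh"  # one byte changed in the middle
score = jaccard_similarity(sample_a, sample_b)
```

A single changed byte disturbs every window that overlaps it, so raw n-gram scores fall quickly; practical tools typically normalize the code (for example, by masking addresses) before comparing.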

Reverse engineering of protocols

Protocols are sets of rules that describe message formats and how messages are exchanged. Accordingly, the problem of protocol reverse-engineering can be partitioned into two subproblems: message format and state-machine reverse-engineering.
The message formats have traditionally been reverse-engineered through a tedious manual process, which involved analysis of how protocol implementations process messages, but recent research has proposed a number of automatic solutions. Typically, these automatic approaches either group observed messages into clusters using various clustering analyses, or emulate the protocol implementation and trace how it processes messages.
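Once messages of the same type have been grouped together, one very crude way to guess field boundaries is to check which byte positions never change across the group. The sketch below assumes all messages in the group have equal length; `infer_fields` is a hypothetical helper, and published approaches are considerably more sophisticated.

```python
def infer_fields(messages):
    """Split equal-length messages into runs of constant and variable bytes.

    Returns a list of (start, end, kind) tuples with end exclusive and
    kind being "const" or "var".
    """
    if not messages:
        return []
    length = len(messages[0])
    if any(len(m) != length for m in messages):
        raise ValueError("all messages must have the same length")

    # A position is constant if every message carries the same byte there.
    labels = [
        "const" if len({m[i] for m in messages}) == 1 else "var"
        for i in range(length)
    ]

    # Merge neighbouring positions with the same label into one field.
    fields = []
    start = 0
    for i in range(1, length + 1):
        if i == length or labels[i] != labels[start]:
            fields.append((start, i, labels[start]))
            start = i
    return fields


captured = [
    bytes.fromhex("cafe0001aa"),
    bytes.fromhex("cafe0002bb"),
    bytes.fromhex("cafe0003cc"),
]
print(infer_fields(captured))
```

Here the first three bytes look like a fixed header and the last two like payload or counters, though with so few samples a constant byte may simply not have varied yet.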
There has been less work on reverse-engineering of state-machines of protocols. In general, the protocol state-machines can be learned either through a process of offline learning, which passively observes communication and attempts to build the most general state-machine accepting all observed sequences of messages, or through online learning, which allows interactive generation of probing sequences of messages and listening to responses to those probing sequences. In general, offline learning of small state-machines is known to be NP-complete, while online learning can be done in polynomial time. An automatic offline approach has been demonstrated by Comparetti et al. and an online approach by Cho et al.
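A common starting point for offline learning is a prefix tree built from the observed message sequences, sketched below. Note the limitation: this tree is the most specific machine consistent with the observations, not the general one the text describes; generalizing it requires merging states, which is where the computational difficulty lies. The message names in the example are hypothetical.

```python
def build_prefix_tree(traces):
    """Build a prefix tree acceptor from observed message sequences.

    States are integers, 0 is the initial state, and the result maps
    each state to a dict of {message: next_state}.
    """
    transitions = {0: {}}
    next_state = 1
    for trace in traces:
        state = 0
        for message in trace:
            if message not in transitions[state]:
                transitions[state][message] = next_state
                transitions[next_state] = {}
                next_state += 1
            state = transitions[state][message]
    return transitions


def accepts(transitions, trace):
    """Return True if the trace follows existing transitions from state 0."""
    state = 0
    for message in trace:
        if message not in transitions[state]:
            return False
        state = transitions[state][message]
    return True


observed = [["HELO", "AUTH", "DATA"], ["HELO", "QUIT"]]
machine = build_prefix_tree(observed)
```

The resulting machine accepts every observed sequence and its prefixes, but nothing else, so it would reject a perfectly valid session that simply was not captured.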
Other components of typical protocols, like encryption and hash functions, can be reverse-engineered automatically as well. Typically, the automatic approaches trace the execution of protocol implementations and try to detect buffers in memory holding unencrypted packets.

Reverse engineering of integrated circuits/smart cards

Reverse engineering is an invasive and destructive form of analyzing a smart card. The attacker uses chemicals to etch away layer after layer of the smart card and takes pictures with a scanning electron microscope. With this technique, it is possible to reveal the complete hardware and software part of the smart card. The major problem for the attacker is to bring everything into the right order to find out how everything works. The makers of the card try to hide keys and operations by mixing up memory positions, for example, bus scrambling.
In some cases, it is even possible to attach a probe to measure voltages while the smart card is still operational. The makers of the card employ sensors to detect and prevent this attack. This attack is not very common because it requires a large investment in effort and special equipment that is generally only available to large chip manufacturers. Furthermore, the payoff from this attack is low since other security techniques are often employed, such as shadow accounts. It is uncertain at this time whether attacks against CHIP/PIN cards to replicate encryption data and consequently crack PINs would provide a cost-effective attack on multifactor authentication.
Full reverse engineering proceeds in 5 major steps. The first step after images have been taken with a SEM is stitching the images together. This is necessary because each layer can't be captured by a single shot. A SEM needs to sweep across the area of the circuit and take several hundred images to cover the entire layer. Image stitching takes as input these several hundred pictures and outputs a single, properly overlapped picture of the complete layer. Next, the stitched layers need to be aligned. This is because the sample, after etching, cannot be put into the exact same position relative to the SEM each time. Therefore the stitched versions won't overlap in the correct fashion as on the real circuit. Usually three corresponding points are selected and a transformation applied on the basis of this. To extract the circuit structure, the aligned, stitched images need to be segmented. Segmentation highlights the important circuitry and separates it from the uninteresting background and insulating materials. Now, the wires can be traced from one layer to the next and the netlist of the circuit can be reconstructed which contains all information of the circuit.
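The alignment step, in which three corresponding points determine a transformation, can be sketched as solving for a 2D affine map. This is an illustrative assumption about the kind of transformation used; the text does not specify it, and the function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Solve the 3x3 linear system m * x = v with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("reference points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        solution.append(det3(mc) / d)
    return solution


def affine_from_points(src, dst):
    """Find (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = solve3(m, [p[0] for p in dst])
    d, e, f = solve3(m, [p[1] for p in dst])
    return (a, b, c, d, e, f)


def apply_affine(t, point):
    """Map a point from the source layer into the target layer's frame."""
    a, b, c, d, e, f = t
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Three landmarks on one layer and where they appear on the next layer
# (here: a 90-degree rotation followed by a shift of (5, -2)).
layer_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
layer_b = [(5.0, -2.0), (5.0, -1.0), (4.0, -2.0)]
transform = affine_from_points(layer_a, layer_b)
```

With more than three landmarks, a least-squares fit would average out small errors in picking the points.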

Reverse engineering for military applications

Reverse engineering is often used by people in order to copy other nations' technologies, devices, or information that have been obtained by regular troops in the fields or by intelligence operations. It was often used during the Second World War and the Cold War. Well-known examples from WWII and later include:

Reverse engineering of gene networks

Reverse engineering concepts have been applied to biology as well, and specifically to the task of understanding the structure and function of gene regulatory networks. Gene regulatory networks regulate almost every aspect of biological behavior and allow cells to carry out physiological processes as well as responses to perturbations. Understanding the structure and dynamic behavior of gene networks is therefore one of the paramount challenges of systems biology, with immediate, practical repercussions in several applications beyond basic research.
There are several methods for reverse engineering gene regulatory networks using molecular biology and data science methods, and these have been generally divided into six classes:
Often, gene network reliability is tested by genetic perturbation experiments followed by dynamic modelling, based on the principle that removing one network node will have predictable effects on the functioning of the remaining nodes of the network.
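The principle can be illustrated with a toy synchronous Boolean model, in which each gene is either on or off and removing a node changes the predicted steady state. The three-gene cascade (A activates B, B activates C) and all names below are hypothetical, and Boolean modelling is only one of many possible formalisms.

```python
# Hypothetical cascade: A is constitutively maintained, A activates B,
# and B activates C. Each rule maps the current state to a gene's next value.
RULES = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}


def steady_state(rules, initial, knocked_out=(), max_steps=50):
    """Update all genes in lockstep until the state stops changing.

    Genes listed in knocked_out are forced off at every step.
    """
    state = dict(initial)
    for gene in knocked_out:
        state[gene] = False
    for _ in range(max_steps):
        new_state = {
            gene: False if gene in knocked_out else bool(rule(state))
            for gene, rule in rules.items()
        }
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no steady state reached")


start = {"A": True, "B": False, "C": False}
wild_type = steady_state(RULES, start)
b_knockout = steady_state(RULES, start, knocked_out=("B",))
```

In this model the knockout of B predicts that C stays off; if an experiment showed C still active, the inferred wiring would be called into question.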
Applications of reverse engineering gene networks range from understanding mechanisms of plant physiology to highlighting new targets for anticancer therapy.

Overlap with patent law

Reverse engineering applies primarily to gaining understanding of a process or artifact, where the manner of its construction, use, or internal processes is not made clear by its creator.
Patented items do not of themselves have to be reverse-engineered to be studied, since the essence of a patent is that the inventor provides detailed public disclosure themselves, and in return receives legal protection of the invention involved. However, an item produced under one or more patents could also include other technology that is not patented and not disclosed. Indeed, one common motivation of reverse engineering is to determine whether a competitor's product contains patent infringements or copyright infringements.

Legality

United States

In the United States even if an artifact or process is protected by trade secrets, reverse-engineering the artifact or process is often lawful as long as it has been legitimately obtained.
Reverse engineering of computer software in the US often falls under contract law, as a breach of contract, as well as any other relevant laws. This is because most end user license agreements specifically prohibit it, and U.S. courts have ruled that if such terms are present, they override the copyright law which expressly permits it. Sec. 103 of the DMCA says that a person who is in legal possession of a program is permitted to reverse-engineer and circumvent its protection if this is necessary in order to achieve "interoperability", a term broadly covering other devices and programs being able to interact with it, make use of it, and use and transfer data to and from it in useful ways. A limited exemption exists that allows the knowledge thus gained to be shared and used for interoperability purposes.

European Union

The EU directive on the legal protection of computer programs, which superseded an earlier directive, governs reverse engineering in the European Union.