TIOBE coding standard methodology

16 May 2005

Towards a company wide coding standard

TIOBE Coding Standard Methodology

TIOBE has developed a step-by-step methodology to support companies to introduce coding standards into their organization. The methodology consists of 7 logical steps and is based on years of experience in the field. The independent nature of the steps allows organizations to take a sufficient break after each step to get familiar with the new situation. The 7 steps are shown in the figure below.

Step #1: Adopt/Define Coding Standards

The first action to be taken is to define or adopt a coding standard. Usually a coding standard consists of a set of programming rules (e.g. “Do not compare two floating point numbers.”), naming conventions (e.g. “Classes should start with capital C.”) and layout specifications (e.g. “Indent 4 spaces.”).

It is strongly advised to adopt an existing coding standard instead of defining one yourself. The main advantage of this is that it saves a lot of effort. An extra reason for adoptation is that if you take a well-known coding standard there will probably be checking tools available that support this standard. It can even been put the other way around: purchase a code checking tool and declare (a selection of) the rules in it as your coding standard!

Here is a list of most frequently used coding standards for the mainstream industrial programming languages.

There might be good reasons to define an own coding standard. For instance, if a lot of existing code already adheres to undocumented but consistently used rules it is better to make these rules explicit in a proprietary standard.

It is convenient to have your coding standard available in a database or at least in hyperlinked HTML or XML instead of a plain document. This will help you to reuse coding standard info in later stages, e.g. jump from violations in the IDE to the corresponding description of the coding rule in the standard or generate a list of coding rules that must be used during code review because they are not checked automatically by a tool.

Step #2: Select Code Checking Tool(s)

Code checking tools are software products that check compile-time whether a software program adheres to a certain set of rules of a coding standard. Without such code checkers, the enforcement of a coding standard in an organization is likely to fail. There are two main causes for this: (1) the number of rules in a coding standard is usually so large that nobody can remind them all (except the author) and (2) some context-sensitive rules that demand inspection of several files are very hard to check by human beings.

Although code checkers can reduce formal code review time considerably, it certainly does not mean that code reviews are not needed any more. Some coding rules cannot be checked at all (e.g. “Comments should be in English”) or only partly at compile-time (e.g. “Use delete, whenever you use new”). On average about 65% of a coding standard can be covered with code checking tools.

There are several characteristics that should be assessed during the selection process of a code checker. There are 6 issues that are most important in our opinion. Obvious requirements such as whether it is available for the used platform are not taken into account.

  • Availability to define own rules. Experience has shown that only about 20% of a propietary coding standard is available as built-in rule in a code checker. As a consequence, there should be a possibility to define own rules. This can be via an API (complex in use but powerful) or a graphical user interface (easy to use but limited).
  • Integration in programming IDE. Most end users of code checkers are software engineers. Therefore a code checker should fit smoothly in the used programming environment. Preferrably via a toolbar with an easy way of running the tool (one button push).
  • Presence of command-line version. In order to be able to integrate the code checker in the software development process and/or in the programming IDE (if no plug-in is available) it should be command-line oriented. Command-line versions also allow for nightly batch runs of the code checker to collect quality data.
  • High performance. Code checkers are slower than compilers by nature because they perform a more in-depth semantical analysis of the source code. However, performance is a key issue for end user acceptance of the tooling. Response times of more than 15 seconds per file are perceived as not workable.
  • Low cost of ownership. Cost of ownership means purchase price and maintenance costs. Since code checkers are not used continuously by end users, floating licenses appear to be more cost effective in most cases. About 1 floating end user license is needed per 10 developers.
  • Available support. Decent product support is an essential feature of a code checker. Good support does not only mean prompt and adequate response to user questions and problems, but also the frequency of new product release. “We will solve this in the next release” is not interesting if the average product release cycle is 1 year.

Apart from the requirements mentioned above, it is recommended to select a mainstream code checker of a market leader. Have a look at the current market leaders (as we encounter them at our customer sites) for the various programming languages below.

Step #3: Customize Code Checking Tool(s)

The first obvious action of customizing a code checker concerns switching on all available rules that are part of the coding standard. This usually takes more effort than expected. Comparing say 250 built-in rules with 100 rules of a coding standard is a laborious task, especially because the meaning of the rules may differ in subtle ways.

Rules that can be checked automatically but are not provided as built-in rules can be implemented by means of a provided API or a graphical configuration tool. This is a specialized task that can be done best by engineers that are experienced in this field.

Companies that want to increase coverage of their coding standard even further can decide to write extra code checking tools of their own. This can be interesting in case there are no off-the-shelf code checkers available that allow the implementation of their complex, context dependent coding rules. But it can also be initiated because a non-standard dialect of a programming language is used (e.g. for a specific processor), or a substantial part of the code is written in a special, non-mainstream programming language or even in case the used platform is unusual. Unfortunately most multinational companies are forced to develop add-on customized code checking tools for these reasons.

Despite the fact that proprietary tooling introduces a maintenance problem, there is also an advantage: more control. Think of direct hyperlinked connection between coding standard rules and rules defined in the tooling, think of applying different coding standards for different kinds of files (plain C++ vs. C++/CLI, for instance), think of introducing proprietary levels or categories, etcetera, etcetera. In short, own tooling allows for adjusting it in such a way that it fits perfectly into the existing development processes instead of the other way around. For large organizations it is most of the times harder to change established processes than to invest in proprietary tooling.

Step #4: Integration in Software Environment

As soon as your code checker has been configured according to the available coding standard, it is almost ready for use in daily practice. The only thing that remains to be done is integration in the software environment. This comes primarily down to:

  • Providing a plug-in for the used programming environment
  • Alignment with the compilation process
  • Arrange up-to-date access to all parts of the code archive

These three aspects are dealt with in more detail below.

Plug-in in Programming Environment

A code checker should be available as a set of simple buttons within the used programming environment in order to ease its introduction in the organization. That’s why code checker providers usually offer plug-ins for mainstream environments. Examples of popular programming environments are Eclipse (Java/C++) and Visual Studio (C++/C#).

If for some reason more than one code checker is in use -e.g. because two code checkers guarantee more coverage of the coding standards than one, or different languages are used within one programming environment- the outcomes of these checkers should be combined. This means that an appropriate plug-in must be provided that acts as an umbrella for the individual coding checkers.

Alignment Compilation Process

Static code checking is based on structural analysis of source code. If a file “includes”, “imports”, or “uses” another file, the code in that other file needs to be examined too because it could contain important information. But where to find these used files? Fortunately, information about the location of these other files should be present already because it is also needed for the compilation process.

In case a mainstream compilation process has been adopted such as Unix’ make or the Microsoft project approach, the appropriate information will probably be provided by the code checker vendor. However, most large organizations deviate from standard compilation solutions (for good reasons by the way). This means that the compilation process must be simulated in some way to extract the location of the include files (such as the “-n option” for most make facilities) or the information must be gathered from some kind of log file after compilation. An example of the last is examining configuration records of ClearCase after having applied “omake”.

Another commonly used approach is to integrate code checking into the compilation process. Instead of applying the compiler the code checker is invoked. In this way duplication of information is avoided. Besides this it is sometimes easy to accomplish. The disadvantage to this approach is that it becomes impossible to check individual files, such as a separate header file, since checking is bound to a compilation target.

Access to the Code Archive

This might seem a trivial constraint. For large code archives with complex, multi-site configuration management systems with many parallel branches this is certainly not the case. The complexity of this problem may even increase if one makes heavy use of symbolic links, mapped network drives, and/or a mixture of file systems (e.g. using Unix and Windows in parallel).

Step #5: Set up Quality Database

As long as code is checked by individual developers all calculated quality data is volatile. This means that it is not stored and thus cannot be retrieved at a later moment in time. In this way it is impossible for organizations to get a clear overview of the current status of violations in the code archive. A “quality database” that contains all violations helps organizations to make structural improvements with respect to the coding standard through time.

The quality database runs on a “quality server” and collects new violation data by determining what files have been changed in the archive since its last check. Only deltas are checked because it is most of the time impossible to check all available files within 24 hours. A code archive of 10,000 files and an average check time of 30 seconds per file results into a through put time of about 3.5 days for the entire archive. It is important to note that only accepted (sometimes called consolidated) files should be checked by the quality server. An intermediate file of an individual developer that has been saved but not checked in and that has not yet been accepted by the system integrator should not be taken into account.

What information should be collected in a quality database? If we are talking about large code archives, a lot of data will be generated. In order to keep control over the size of the database, one should be careful about what to store. Since large organizations tend to branch (complete archive copies that remain relatively unmodified) and merge their archives lots of time, file based data collection appears to be far too time and space consuming in practice. Instead of taking files as the leading principle, which might seem to be the most natural entity at first sight, we propose a combination of file name and its checksum. This means that only if a file changes, a new entity will be created. The major advantage of this approach is that in this way copies of a file in different branches are checked and stored only once.

The following basic data is stored for file name/checksum entities:

  • Complete file path (canonical);
  • Lines of code;
  • All violations (rule ID, line number) per file.

On top of a quality database a web application can be built that shows all kinds of quality data. Historical comparisons, comparing different modules, absolutely or relative (i.e. per lines of code) views, organizational (comparing departments) or hierarchical (comparing directories), the sky is the limit. An example of such a web application is TIOBE’s TiCS framework as shown to the left.