After finishing hard-coded passwords detector, I have focused on improving the detection of the most serious security bugs, which could be found by static taint analysis. SQL injection, OS command injection and Cross-site scripting (XSS) are placed as top first, second and fourth in CWE Top 25 most dangerous software errors (while well-known buffer overflow, not applicable to Java, is placed third). Path Traversal, Unvalidated Redirect, XPath injection or LDAP injection are also related types of weaknesses – unvalidated user input can exploit syntax of an interpreter and cause a vulnerability. Injections in general are the risk number one in OWASP Top 10 too, so a reliable open-source static analyser for those kinds of weaknesses could really make the world a more secure place 🙂

FindBugs already has detectors for some kinds of injections, but many bugs is missed due to insufficient flow analysis, unknown taint sources and sinks and also targeting zero false positives (even though there are some). In contrast, the aim of bug detectors in FindSecurityBugs is to be helpful during security code review and not to miss any vulnerability – there was some effort to reduce false positives, but before my contribution almost all taint sinks were reported in practice. Unfortunately, searching a real problem among many false warnings is quite tedious. The aim of the new detection mechanism is to report more high-confidence bugs (with minimum of false positives) than FindBugs detectors plus report lower-confidence bugs with decreased priority not missing any real bugs while having false positives rate much lower than FindSecurityBugs had originally.

For a reliable detection, we need a good data-flow analysis. I have already mentioned OpcodeStackDetector class in previous articles, but there is a more advanced and general mechanism in FindBugs. We can create and register classes performing a custom data-flow analysis and request those results later in detectors. Methods are symbolically executed after building control flow graph made of blocks of instructions connected by different types of edges (such as goto, ifcmp or exception handling), which are attempted to be pruned for impossible flow. We have to create a class to represent facts at different code locations – we want to remember some information (called a fact) for every reachable instruction, which can later help us to decide, whether a particular bug should be reported at that location. We need to model effects of instructions and edges on facts, specify the way of merging facts from different flow branches and make everything to work together. Fortunately, there are existing classes designed for extension to make this process easier. In particular, FrameDataflowAnalysis models values in the operand stack and local variables, so we can concentrate on the sub-facts about these values. The actual fact is then a frame of these sub-facts. This class models effects of instructions by pushing the default sub-fact on the modelled stack and popping the right amount of stack values. It also automatically moves sub-facts between the stack and the part of the frame with local variables.

Lets have a look, which classes had to be implemented for taint analysis. If we want to run custom data-flow analysis, a special class implementing IAnalysisEngineRegistrar must be created and referenced from findbugs.xml.

<!-- Registers engine for taint analysis dataflow -->
<EngineRegistrar
    class="com.h3xstream.findsecbugs.taintanalysis.EngineRegistrar"/>

This simple class (called EngineRegistrar) makes a new instance of TaintDataflowEngine and registers it with global analysis cache.

public class EngineRegistrar implements IAnalysisEngineRegistrar {

    @Override
    public void registerAnalysisEngines(IAnalysisCache cache) {
        new TaintDataflowEngine().registerWith(cache);
    }
}

Thanks to this, in the right time, method analyze of TaintDataflowEngine (implementing ImethodAnalysisEngine) is called for each method of analyzed code. This method requests objects needed for analysis, instantiates two custom classes (mentioned in next two sentences) and executes the analysis.

public class TaintDataflowEngine
    implements IMethodAnalysisEngine<TaintDataflow> {

    @Override
    public TaintDataflow analyze(IAnalysisCache cache)
            throws CheckedAnalysisException {
        CFG cfg = cache.getMethodAnalysis(CFG.class, descriptor);
        DepthFirstSearch dfs = cache
            .getMethodAnalysis(DepthFirstSearch.class, descriptor);
        MethodGen methodGen = cache
            .getMethodAnalysis(MethodGen.class, descriptor);
        TaintAnalysis analysis = new TaintAnalysis(
            methodGen, dfs, descriptor);
        TaintDataflow flow = new TaintDataflow(cfg, analysis);
        flow.execute();
        return flow;
    }

    @Override
    public void registerWith(IAnalysisCache iac) {
        iac.registerMethodAnalysisEngine(TaintDataflow.class, this);
    }
}

TaintDataflow (extending Dataflow) is really simple and used to store results of performed analysis (used later by detectors).

public class TaintDataflow
        extends Dataflow<TaintFrame, TaintAnalysis> {

    public TaintDataflow(CFG cfg, TaintAnalysis analysis) {
        super(cfg, analysis);
    }
}

TaintAnalysis (extending FrameDataflowAnalysis) implements data-flow operations on TaintFrame but it mostly delegates them to other classes.

public class TaintAnalysis
        extends FrameDataflowAnalysis<Taint, TaintFrame> {

    private final MethodGen methodGen;
    private final TaintFrameModelingVisitor visitor;

    public TaintAnalysis(MethodGen methodGen, DepthFirstSearch dfs,
            MethodDescriptor descriptor) {
        super(dfs);
        this.methodGen = methodGen;
        this.visitor = new TaintFrameModelingVisitor(
            methodGen.getConstantPool(), descriptor);
    }

    @Override
    protected void mergeValues(TaintFrame frame, TaintFrame result,
            int i) throws DataflowAnalysisException {
        result.setValue(i, Taint.merge(
            result.getValue(i), frame.getValue(i)));
    }

    @Override
    public void transferInstruction(InstructionHandle handle,
            BasicBlock block, TaintFrame fact)
            throws DataflowAnalysisException {
        visitor.setFrameAndLocation(
            fact, new Location(handle, block));
        visitor.analyzeInstruction(handle.getInstruction());
    }

    // some other methods
}

TaintFrame is just a concrete class for abstract Frame<Taint>.

public class TaintFrame extends Frame<Taint> {

    public TaintFrame(int numLocals) {
        super(numLocals);
    }
}

Effects of instructions are modelled by TaintFrameModelingVisitor (extending AbstractFrameModelingVisitor) so we can code with the visitor pattern again.

public class TaintFrameModelingVisitor
    extends AbstractFrameModelingVisitor<Taint, TaintFrame> {

    private final MethodDescriptor methodDescriptor;

    public TaintFrameModelingVisitor(ConstantPoolGen cpg,
            MethodDescriptor method) {
        super(cpg);
        this.methodDescriptor = method;
    }

    @Override
    public Taint getDefaultValue() {
        return new Taint(Taint.State.UNKNOWN);
    }

    @Override
    public void visitACONST_NULL(ACONST_NULL obj) {
        getFrame().pushValue(new Taint(Taint.State.NULL));
    }

    // many more methods
}

The taint fact – information about a value in the frame (stack item or local variable) is stored in a class called just Taint.

The most important piece of information in Taint is the taint state represented by an enum with values TAINTED, UNKNOWN, SAFE and NULL. TAINTED is pushed for invoke instruction with a method call configured to be tainted (e.g. getParameter from HttpServletRequest or readLine from BufferedReader), SAFE is stored for ldc (load constant) instruction, NULL for aconst_null and UNKNOWN is a default value (this description is a bit simplified). Merging of taint states is defined such that if we could compare them as TAINTED > UNKNOWN > SAFE > NULL, then merge of states is the greatest value (e.g. TAINTED + SAFE = TAINTED). Not only this merging is done where there are more input edges to a code block of control flow graph, but I have also implemented a mechanism of taint transferring methods. For example, consider calling toLowerCase method on a String before passing it to a taint sink – instead of pushing a default value (UNKNOWN), we can copy the state of the parameter not to forget the information. Merging is also done in more complicated examples such as for append method of StringBuilder – the taint state of the argument is merged with the taint state of StringBuilder instance and returned to be pushed on the modelled stack.

There were two problems with taint state transfer which had to be solved. First, taint state must be transferred directly to mutable classes too, not only to their return values (plus the method can be void). Not only we set the taint state for an object when it is being seen for the first time in the analysed method and then the state is copied, but we also change it according to instance methods calls. For example, StringBuilder is safe, when a new instance is created with non-parametric constructor, but it can taint itself by calling its append method. If only methods with safe parameters are called, the taint state of StringBuilder object remains safe too.  For this reason, effect of load instructions is modified to mark index of loaded local variable to Taint instance of corresponding stack item. Then we can transfer taint state to a local variable with index stored in Taint for specified methods in mutable classes. Second, taint transferring constructors (methods <init> in bytecode) must be handled specifically, because of the way of creating new objects in Java. Instruction new is followed by dup and invokespecial, which consumes duplicated value and initializes the object remaining at the top of the stack. Since the new object is not stored in any variable, we must transfer the taint value from merged parameters to the stack top separately.

Bugs related to taint analysis are identified by TaintDetector (implementing Detector). For better performance, before methods of some class are analyzed, constant pool (part of the class file format with all needed constants) is searched and the analysis continues only if there are references for some taint sinks. Then TaintDataflow instance is loaded for each method and locations of its control flow graph are iterated until taint sink method is found. This means, we find all invoke instructions used in a currently analysed method and check, whether the called methods are related to the searched weaknesses. Facts (instances of Taint class) from TaintDataFlow are extracted for each sink parameter of a sink method. Bug is reported with high confidence (and priority), if the taint state is TAINTED, with medium confidence for UNKNOWN taint state and with low confidence for SAFE and NULL (just for the case of a bad analysis, these warnings are not normally shown anywhere). Taint class also contains references for taint source locations, so these are shown in bug reports to make review easier – you should see a path between taint sources and the taint sink. TaintDetector itself is abstract, so it must be extended to detect concrete weakness types (like command injection) and InjectionSource interface implemented to specify taint sinks (the name of the interface is a bit misleading) and items in a constant pool to specify candidate classes.

public class CommandInjectionDetector extends TaintDetector {

    public CommandInjectionDetector(BugReporter bugReporter) {
        super(bugReporter);
    }

    @Override
    public InjectionSource[] getInjectionSource() {
        return new InjectionSource[] {new CommandInjectionSource()};
    }
}

CommandInjectionSource overwrites method getInjectableParameters, which returns an instance of InjectionPoint containing parameters, that cannot be tainted, and the weakness type to report. Boolean method isCandidate looks up constant pool for the names of taint sink classes and return true if present.

TaintDetector is currently used to detect command, SQL, LDAP and script (for eval method of ScriptEngine) injections and unvalidated redirect. More bug types and taint sinks should follow soon. Test results are looking quite promising so far. Inter-procedural analysis (not restricted to a method scope) should be the next big improvement, which could make this analysis really helpful. Then everything should be tested with a large amount of real code to iron out the kinks. You can see the discussed classes in taintanalysis package and try the new version of FindSecurityBugs.

FindBugs GUI

In the previous article, I was describing the creation of a new FindBugs detector for hard-coded passwords and cryptographic keys. I also mentioned some imperfections and I have decided to learn more about FindBugs and improve the detection.

Java virtual machine has a stack architecture – operands must be pushed on the stack before method is invoked, given number of stack values is consumed during invocation and produced return value (if any) is pushed subsequently. My detector class extends OpcodeStackDetector, which implements abstract interpretation technique to collect approximated information about values at the operand stack for each code location. These pieces of information (usually called facts) are kept only for those locations, where a derived value of the fact does not depend on the preceding control flow (for example, the value is the same for each possible branch executed in earlier conditional statements).

One of the facts available for stack values is the actual value of a number or String (related to constant propagation performed by compilers during optimization). We can use this to detect hard-coded values – known constant means hard-coded value. However, we also need to track other other data types besides Strings (numbers can be ignored) to detect passwords in a char array and hard-coded keys. In addition, there is one more issue with this approach…

Tracking concrete values is unnecessarily complicated and the value often becomes unknown – we only need to know whether the value is constant, not which constant is on the stack. Consider a piece of code like this:

private Connection getConnection(String user) {
    String password;
    if ("root".equals(user)) {
        password = "superSecurePassword";
    } else {
        password = "differentPassword";
    }
    return DriverManager.getConnection(DB_URL, user, password);
}

The constant value in the password variable is known inside both branches, but these values are forgotten after the conditional statement, since the values differ. For this reason, weakness like this was not reported by the previous version of the detector nor by the original FindBugs detector for constant database passwords. Even if there is only one possible constant, analysis can fail because of null values, see this code (looking a bit unreal, but demonstrating the problem):

String pwd = null;
if (shouldConnect()) {
    pwd = "hardcoded";
}
if (pwd != null) {
    Connection connection = DriverManager.getConnection(url, user, pwd);
    // some code working with database
}

We can easily see that the password variable has always value “hardcoded”, but the performed analysis is linear and the fact is forgotten right after the first conditional statement. Second condition cannot return the forgotten constant back and weakness is not detected again.

Fortunately, these issues can be solved by setting and reading user value fact, which OpcodeStackDetector allows (if annotation CustomUserValue is added to the extending class). Our fact has only one value for hard-coded stack items or it is null to indicate unknown state (default). We can periodically check, whether a value on the stack is a known constant or null and set the user value for it if it is, propagation is done automatically. If then analysis merges facts from different control flow branches with different constants (or null), user value is the same and not reset to default. Custom user value is also used to mark hard-coded passwords and keys with the other password and key data types, detection of those objects remains similar as in the previous version of the detector. Weakness is reported if sink parameter has non-null user value and stack value is not null (null passwords are not considered to be hard-coded).

The detector is using proper flow analysis after this improvement; however, it is restricted to a method scope and hard-coded values in fields are reported only if used in the same class. Inter-method and inter-class analysis is a future challenge, but I have kept reporting of hard-coded fields with suspicious names and unknown sink not to miss important positives. In contrast to the previous version, these fields are reported with lower priority and only if not reported before using proper flow analysis technique to prevent duplicate warning. Moreover, all fields are reported in a single warning for a class to make possible false positives less distracting.

Another improvement is the possibility to configure more method parameters as password or key sinks. If more than one parameter is hard-coded, only single warning is produced and the parameters are mentioned in the detailed message. The last important change is that hard-coded cryptographic keys are reported in a separated bug pattern since both hard-coded passwords and keys have a different CWE identifier (259 and 321) and are equally important. Decision between reported warnings is done automatically based on the data types of hard-coded parameters.

I have tested the detector with Juliet Test Suite and using proper analysis it can reveal both types of weakness in 17 flow variants (out of 37) and all sink variants with no false positives. Original FindBugs detector reveals weaknesses in 10 flow variants and only for database passwords, other password methods and hard-coded keys are not detected in any variant.

You can see the detector class on GitHub. Happy coding with no hard-coded passwords!

software bug

FindBugs is a great open source tool for detection of software bugs in Java. It uses static analysis to search compiled classes for hundreds of bug patterns and even more can be found using FindSecurityBugs and fb-contrib plugins. However, before my recent contribution there was no general detector for hard-coded passwords and cryptographic keys. Hard-coded password are identical for each installation and can be easily extracted, which is likely to be exploited (see CWE-259 for more information). FindBugs and FindSecurityBugs could already detect this vulnerability, but only for constant database passwords and two very specific cases. I have created a detector (accepted to FindSecurityBugs), which is able to find hard-coded values of Strings, char and byte arrays or BigIntegers used as an input parameter for one of the configured methods such as KeyStore.load or KeySpec constructors.

To add a new detector, we have to create a class that implements Detector or extends a prepared class with helper functionality (I have used OpcodeStackDetector). An instance of BugReporter passed in constructor and its method reportBug are used to report problems (BugInstance objects). We also need to add the class name to findbugs.xml to be executed and edit messages.xml for information about detections. Good start for thinking about the detection logic is writing a bunch of flawed code samples and looking to their bytecode (plugin for IDEA can be used). We can write unit tests for them by mocking BugReporter.

Detector class can use visitor design pattern (if it implements Visitor) to react on events while analyzed class is scanned. I have started by overwriting method sawOpcode, which is called every time an instruction is read. Since we are interested in invocations of specific methods, we need to check if it is one of the invoke instructions and get full called method name such as java/security/KeyStore.load(Ljava/io/InputStream;[C)V, which contains method class with package, argument types ([C is char array, parameters of object types starts with L and ends with semicolon) and return type (V for void). Method name can be obtained by calling methods getClassConstantOperand, getNameConstantOperand and getNameSigOperand inherited from DismantleByteCode. If it is one of the problematic methods (loaded from resource files), we can create a BugInstance, add current analyzed line plus some info to it and report the bug. Now we have a password usage detector but not hard coded password detector, so it is time to eliminate false positives (detections that are not real bugs).

For String passwords, we can utilize OpcodeStackDetector and check nullness of stack.getStackItem(0).getConstant() to detect usage of constant String as a first method parameter. Unfortunately, it is not so easy for the other variable types. To detect that an array is initialized with hard-coded values, I am checking whether instruction for new array creation is followed by push and array store instructions while for example not calling methods. Constant arrays are also converted from constant Strings using methods toCharArray and getBytes. After implementing this, we can detect BigIntegers too, since they can be constructed from Strings or byte arrays.

In terms of so called taint analysis, we are able to detect vulnerability source (hard-coded data) and sink (usage of password or cryptographic function), but bug should be reported only if there is a flow from source to sink (we cannot be sure that hard-coded password is really a password until it is used as a password parameter). In the current implementation, no complex flow analysis is performed, we assume that a taint source followed by a taint sink of a matching type inside the same method body are always related. For this reason, false positives are easy to demonstrate, but are quite uncommon in practice. On the other hand, local hard-coded declarations are forgotten while another method is analyzed (visit method is overwritten to reset the state), so passwords are not detected if they are passed as a parameter and used in another method.

Class fields are also taken into an account – if constant data is stored to them, we remember that and consider it as a taint source when they are read. Because of this, the order of the methods matters and as static initializer section is added to the class end by compiler, its analysis is run ‘manually’ by calling doVisitMethod when class analysis starts. In addition, if the field stores hard-coded data and its name is suspicious (like password, secret or aesKey), the bug is reported immediately, since there is a high bug confidence and if it was used in a different class, it would not be reported otherwise (one the other hand, it can be reported twice now).

You can see the whole code on GitHub. I have mentioned some imperfections, but I think the detector is working quite well. Unfortunately, there is not much information about writing detectors, so creating them can be just a matter of trial and error. If you have an idea for improvement or a new detector, don’t hesitate to contact me or pull the code directly. 🙂