This is an introduction to the UAM project – a management server for Universal Appliances (the new TPv4, eDee and SafeQube) – and the Vert.x framework on which it is developed.

There was a need for easy installation and upgrade of the OS and applications running on a UA, so we needed a system to manage UAs with as little user interaction as possible.
The main requirements are robustness (we will deal with unreliable networks and possible power outages), small executables (there is limited storage on the SafeQube) and little disk access (we don’t want to wear out our SafeQube flash disks).

The system has two parts:

  • UA management server – a server application which will be deployed next to CML and will contain APIs for system administration
  • UA management server proxy – an application cluster which will be deployed on SafeQubes and will cache application binaries and configuration

Both applications are developed using the Vert.x framework. The UAM server also uses a database as configuration storage and the proxy applications use Hazelcast for clustering.

Vert.x is an application framework for programming servers (especially web servers). It is built on top of Netty, so no application container like Tomcat or Jetty is necessary.

I will show several examples of working with Vert.x so you can try out writing your own applications.

Starting a web server is simple:

vertx.createHttpServer()
    .requestHandler(request -> {
        System.out.println("Received request from " + request.remoteAddress().host());
    })
    .listen(80);

This approach is fast, but you have to decide yourself which business logic to call for which endpoint, and it can get quite messy.
Fortunately, we can do better using the Vert.x web module:

Router router = Router.router(vertx);
router.route("/hello").handler(context -> {
    String name = context.request().getParam("name");
    context.response().end("Hello " + name);
});

router.get("/ping").handler(context -> {
    context.response().end("Pong!");
});

vertx.createHttpServer()
     .requestHandler(router::accept)
     .listen(80);

This way, we can easily separate business logic for our endpoints.

Do you need request logging or security? You can add more routes for the same endpoints. Routes are processed in the order in which they were added to the router, and each route can either call the next route in the chain or just end the response.

Router router = Router.router(vertx);
router.route().handler(CookieHandler.create());
router.get("/secured-resource").handler(context -> {
    if ("password".equals(context.getCookie("securityToken"))) {
        context.next();
    } else {
        context.response().setStatusCode(401).end("sorry, you are not authorized to see the secured stuff");
    }
});

router.get("/secured-resource").handler(context -> {
    context.response().end("top secret stuff");
});

Do you want more than just APIs? There are template engines for Handlebars, MVEL, Thymeleaf or Jade if you want to use their functionality.

router.get().handler(TemplateHandler.create(JadeTemplateEngine.create()));

Things get more interesting when you configure Vert.x instances to form a cluster. Currently, the only production-ready cluster manager is built on top of Hazelcast, so we will use it.

VertxOptions options = new VertxOptions();
options.setClustered(true);
options.setClusterHost("192.168.1.2");

Vertx.clusteredVertx(options, result -> {
    if (result.succeeded()) {
        Vertx vertx = result.result();
        //initialize what we need
    } else {
        LOGGER.error("Failed to start vertx cluster", result.cause());
    }
});

As you can see, starting a Vert.x cluster isn’t that complicated. Just be careful to set the correct cluster host address, or you may find out that the cluster doesn’t behave the way you would expect.
And what functionality is available for clustered applications? There is a clustered event bus, where you can use point-to-point or publish-subscribe messaging. Just register your consumers and send them messages to logical addresses, without ever caring on which node the consumer resides.

vertx.eventBus().consumer("/echo", message -> {
    message.reply(message.body());
});

vertx.eventBus().send("/echo", "ping");

I hope this short introduction to Vert.x will make you curious to look at its documentation and start writing your own applications on top of it.

One of the systems our team develops is a UI for end users, where users can view and manage their print-related data.

The system is designed as a simple web application where we make AJAX calls to Spring controllers, which delegate the calls to two other systems; no database is present.

One of the requirements on the system was to support about 1000 concurrent users. Since Tomcat has 200 threads by default and the calls to the other systems may take long (fortunately, that is not the case at the moment), we have decided to make use of Servlet 3.0 async support. This way, each AJAX call from the browser uses up a Tomcat thread only for the preparation of a call to the other system. The calls are handled by our asynchronous library for communication with SafeQ and by an asynchronous HTTP client for communication with the Payment System, which both have their own thread pools and fill in the response when they get a reply.
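
To illustrate the idea, here is a minimal sketch of such an asynchronous controller method. It is only an assumption of how the wiring can look; the endpoint, the SafeQClient class, its getPrintJobs method and the PrintJob domain class are made-up names, not our actual code. Spring’s DeferredResult releases the Tomcat thread as soon as the remote call is initiated, and the client’s callback completes the response later from its own thread pool.

@RestController
public class PrintDataController {

    @Autowired
    private SafeQClient safeQClient; // hypothetical async client with its own thread pool

    @RequestMapping("/print-jobs")
    public DeferredResult<List<PrintJob>> printJobs(@RequestParam String userGuid) {
        DeferredResult<List<PrintJob>> result = new DeferredResult<>();

        // the Tomcat thread only initiates the call; the callback fills in the response
        safeQClient.getPrintJobs(new ListenableFutureCallback<List<PrintJob>>() {
            @Override
            public void onSuccess(List<PrintJob> jobs) {
                result.setResult(jobs);
            }

            @Override
            public void onFailure(Throwable ex) {
                result.setErrorResult(ex);
            }
        }, userGuid);

        return result; // returned immediately, the response is written when the callback fires
    }
}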

Since we depend so much on other systems’ performance, we wanted to monitor the execution time of the requests for better tracking and debugging of production problems.

There are several endpoints for each system and more can come later, so in order to avoid duplication, we have decided to leverage Spring’s support for aspect-oriented programming. We have created an aspect for each business service (one for SafeQ, one for the Payment System) and then it was time to implement the measurement logic.

In the synchronous scenario, things are pretty simple. Just intercept the call, note the start and end times and, if it took too long, log it in a logfile.

@Around("remoteServiceMethod()")
public Object restTimingAspect(ProceedingJoinPoint joinPoint) throws Throwable {
    long start = System.currentTimeMillis();

    Object result = joinPoint.proceed();

    long end = System.currentTimeMillis();

    long executionInMillis = end - start;
    if (executionInMillis > remoteServiceCallDurationThresholdInMillis) {
        LOGGER.warn("Execution of {} took {} ms.", joinPoint, executionInMillis);
    }

    return result;
}

This won’t work in our asynchronous case. The call to joinPoint.proceed(), which makes the call to the other system, returns immediately without waiting for a reply. The reply is processed in a callback provided to one of the async communication libraries. So we have to do a bit more.

We know the signature of our business methods. One of the arguments is always a callback, which will process the reply.

public void getEntitlement(ListenableFutureCallback callback, String userGuid, Long costCenterId)

If we want to add our monitoring logic in a transparent way, we have to create a special callback implementation, which will wrap the original callback and track the total execution time.

class TimingListenableFutureCallback implements ListenableFutureCallback {

    private ListenableFutureCallback delegate;
    private StopWatch timer = new StopWatch();
    private String joinPoint;

    public TimingListenableFutureCallback(ListenableFutureCallback delegate, String joinPoint) {
        this.delegate = delegate;
        this.joinPoint = joinPoint;
        timer.start();
    }
        
    @Override
    public void onSuccess(Object result) {
        logExecution(timer, joinPoint);
        delegate.onSuccess(result);
    }

    @Override
    public void onFailure(Throwable ex) {
        logExecution(timer, joinPoint);
        delegate.onFailure(ex);
    }
}

And then we have to call the target business method with a properly wrapped callback argument.

@Around("remoteServiceMethod() && args(callback,..)")
public Object restTimingAspect(ProceedingJoinPoint joinPoint, ListenableFutureCallback callback) throws Throwable {
    String joinPointName = computeJoinPointName(joinPoint);
    
    Object[] wrappedArgs = Arrays.stream(joinPoint.getArgs()).map(arg -> {
        return arg instanceof ListenableFutureCallback ? wrapCallback((ListenableFutureCallback) arg, joinPointName) : arg;
    }).toArray();
    
    LOGGER.trace("Calling remote service operation {}.", joinPointName);
    
    return joinPoint.proceed(wrappedArgs);
}
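
The helper methods used above are not part of the original snippet, so here is one possible shape of them. It is only a sketch assuming that wrapCallback, computeJoinPointName and logExecution behave as described earlier (LOGGER and remoteServiceCallDurationThresholdInMillis are the same fields as in the synchronous aspect; in practice logExecution has to be reachable from TimingListenableFutureCallback, e.g. as a shared utility):

private ListenableFutureCallback wrapCallback(ListenableFutureCallback original, String joinPointName) {
    return new TimingListenableFutureCallback(original, joinPointName);
}

private String computeJoinPointName(ProceedingJoinPoint joinPoint) {
    // e.g. "PaymentSystemService.getEntitlement"
    return joinPoint.getSignature().getDeclaringType().getSimpleName()
            + "." + joinPoint.getSignature().getName();
}

private void logExecution(StopWatch timer, String joinPointName) {
    timer.stop();
    long executionInMillis = timer.getTotalTimeMillis();
    if (executionInMillis > remoteServiceCallDurationThresholdInMillis) {
        LOGGER.warn("Execution of {} took {} ms.", joinPointName, executionInMillis);
    }
}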

There is a similar implementation for the other async messaging library.

I hope this solution will help you solve similar problems in your applications in an elegant manner 🙂

Black Hat Europe 2015

Black Hat is a famous computer security conference held annually in the USA, Europe and Asia. Philippe Arteau, the original developer of FindSecurityBugs, was accepted as a presenter and, because of my significant contribution to the tool, he suggested that we present it together. Y Soft then supported me in taking this opportunity.

Black Hat Europe 2015 was held in November in Amsterdam and over 1500 people from 61 countries attended the event. The conference was opened with a keynote called What Got Us Here Won’t Get Us There, mentioning the upcoming security apocalypse, complexity as a problem, security anti-patterns, taking wrong turns, developing bad habits, missing opportunities and re-examining old truths (using many slides). The Forum and three smaller rooms were then used for parallel briefings – usually one-hour presentations with slides and some demonstrations of the latest developments in information security. There was also a Business Hall with sponsors, free coffee and a Tool/Demo area for the sub-event called Arsenal. Here we and mostly independent researchers showed our “weapons” – every tool had a kiosk for two hours and people came closer to see short demonstrations and to ask questions.

Despite presenting during lunch time, a fair number of Black Hat attendees came to discuss and see FindSecurityBugs in action. The most common questions were about what it can analyse (Java and other JVM languages like Scala, no source code needed), which security weaknesses it can detect (the list is here; a few people were happy that there are also detectors for Android) and whether it is free or not (yes, it is open source). People seemed to like the tool and someone was quite surprised that it is not a commercial tool 🙂

There were many interesting briefings during the two days. One of the most successful presentations (with applause in the middle of it) explained a reliable software-only attack to defeat full disk encryption (BitLocker) in Windows. If there was no pre-boot authentication and the attacked machine had joined a domain, it was possible to change the password at the login screen (and gain full access to the system) by setting up a mock domain controller with the user’s password expired. Microsoft released a patch during the conference. Other briefings included breaking into buildings, using continuous integration tools as an attack surface, fooling and tracking self-driving cars or delivering exploits “with style” (encoded in an image/HTML polyglot). One presentation (about flaws in banking software) was cancelled (or probably postponed to another event) because the findings were too serious to disclose at that time, but there was an interesting talk about backend-as-a-service instead – many Android applications contain hard-coded credentials to access cloud services and the researchers were able to extract 56 million sensitive user records. You can download many of the presentations (and some white papers) from the Black Hat website.

I also had time for a little sightseeing – there was a special night bus for attendees to the city centre and I was able to see it again before my flight home too. Amsterdam has nice architecture separated by kilometres of canals and it is a very interesting city in general. What surprised me the most were the bicycles everywhere – I had known people ride a lot here, but I didn’t expect to see such a huge number of bikes going around and parked in the centre. The riders don’t wear helmets, sometimes carry a few children in a cart and don’t seem to be very careful (so I had to be a bit more careful than I’m used to when crossing streets). Walking through the red-light district De Wallen with adult theatres and half-naked ladies behind glass doors is a remarkable experience too (don’t try to take photos of them, they’ll be angry). Sex shops and “coffee shops” (selling cannabis) are quite common, not only in this area.

Another surprise came at the airport, where the inspection was very thorough and I was forced to take everything out of my bag (which I was not very happy about, because it had taken me a long time to pack it into a single piece of cabin baggage). Just afterwards (when I connected to the airport Wi-Fi) I realized what had happened in Paris recently. The plane was also surprisingly small and, for the first time, I received special instructions for opening the emergency door (since my seat happened to be next to the right wing). Nevertheless, the whole trip was a nice experience for me and I was happy to be there, so maybe next time in London 🙂

While caring about the security of our own code is arguably important, it is not enough for building a secure product. Vulnerabilities might also arise from a 3rd party component. Handling those vulnerable libraries is thus another essential aspect of building a secure product.

Database of vulnerabilities

A simple way of managing a large number of 3rd party libraries is using a vulnerability database. There is the well-known National Vulnerability Database (NVD) that contains information about many vulnerabilities. This is also what OWASP Dependency Check uses. Let me introduce it.

NVD aims to contain all known vulnerabilities of publicly available software. An entry about a vulnerability has a CVE number. (CVE means “Common Vulnerabilities and Exposures”, so CVEs are not limited to vulnerabilities, but that is not so important for now.) You can look at an example of a CVE. Various details might be available, but the level of detail may depend on multiple factors. For example, a not-yet-publicly-disclosed vulnerability might have a CVE number, but its description will obviously not be very verbose.

CVE numbers are usually assigned to CPEs (“Common Platform Enumeration”). A CPE is an identifier of vulnerable software. For example, the mentioned CVE-2005-1234 contains information that it affects cpe:/a:phpbb_group:phpbb-auction:1.0m and cpe:/a:phpbb_group:phpbb-auction:1.2m.

How does OWASP Dependency Check work?

In a nutshell, it scans a project (JARs, POM files, ZIP files, native libraries, .NET assemblies, …) and tries to assign a CPE to each entity. Note that there is a lot of heuristics involved. Obviously, there is also a lot that can go wrong.

When CPEs are assigned, it looks for vulnerabilities (CVEs) assigned to those CPEs. A scan is written to a result file; multiple formats are supported. The two most prominent formats are HTML (for direct reading) and XML (for further processing). We use the XML output in order to process multiple reports of multiple projects and assign them to particular teams. (One team is usually responsible for multiple projects.)
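
As a rough illustration of that processing step (this is not our internal tool, just a sketch using the standard Java XML APIs; the element names follow the ODC XML report schema as I recall it, so they may need adjusting for your ODC version):

// counts dependencies with at least one known vulnerability in one XML report
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
Document report = factory.newDocumentBuilder().parse(new File("dependency-check-report.xml"));

XPath xpath = XPathFactory.newInstance().newXPath();
// local-name() is used so the query works regardless of the report namespace version
NodeList vulnerableDependencies = (NodeList) xpath.evaluate(
        "//*[local-name()='dependency'][.//*[local-name()='vulnerability']]",
        report, XPathConstants.NODESET);

System.out.println("Dependencies with known vulnerabilities: " + vulnerableDependencies.getLength());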

Integration with projects

Maven

OWASP Dependency Check has a mature Maven plugin. It basically works out of the box and you can adjust probably any parameter supported by OWASP Dependency Check. We didn’t have to modify the project, as it can be run by mvn org.owasp:dependency-check-maven:check and the configuration can be adjusted by passing -Dproperty=value parameters.

Gradle

There is also a plugin for Gradle. Unfortunately, it is not as mature as the Maven plugin. We had to modify it in order to make it work in our environment. I’d like to merge those changes into the original project. Even after the modifications, it is still not ideal. The most prominent issue is that it includes test dependencies. Excluding them in the Gradle plugin is not as simple as with the Maven plugin, because Gradle can have many configurations and each of those configurations might have different dependencies. I have no simple way to distinguish the important configurations from the others. However, this was not a really painful issue, as these dependencies are considerably less common and usually don’t have any known vulnerability in NVD, since they usually aren’t touched by untrusted input.

How do we scan our projects?

Running those scans manually on multiple subprojects and evaluating all the reports manually would take too much time. So, we have developed a tool that automates some of the manual work.

First, scans are run automatically every day. There is no need to run them manually on every single subproject. The results of these scans are stored in XML format. Some adaptations specific to our environment are applied. For example, scans are able to run without any connection to the Internet. (The Maven plugin can be configured this way without any modification, while the Gradle plugin had to be modified for that.) Of course, we have to download the whole vulnerability database separately, outside of the scan process.

Second, our tool downloads the results from our Bamboo build server and processes them. There are some sanity checks that should warn us if something breaks. For example, we check freshness of the vulnerability database.

Third, issues related to particular subprojects are automatically assigned to corresponding teams. Issues can be prioritized by severity and number of occurrences.

While ODC is a great tool, there are some issues connected to using it. I’ll discuss some of them here.

Complexity of library identifiers

How many ways of identifying a library do you know? Maybe you will suggest using groupId, artifactId and version (i.e. GAV), by which you can locate the library in a Maven repository. (Or you can pick some similar identifier used on a platform other than Java.) This is where we usually start. However, there are multiple other identifiers:

  • GAV identifier: as mentioned above.
  • file hash: ODC uses the SHA1 hash for lookups in the Maven database. Note that there might be multiple GAV identifiers for one SHA1 hash, so it is not a 1:1 relation. Moreover, when we consider snapshots, we theoretically get an m:n relation.
  • CPE identifier: This one is very important for ODC, as ODC can’t match vulnerabilities without it. Unfortunately, there is no exact algorithm for computing a CPE from a GAV or vice versa. Moreover, multiple GAVs might be assigned to one CPE. For example, Apache Tomcat consists of many libraries, but all of them have just one CPE per version. Unfortunately, the ODC heuristic matching algorithm might also assign multiple CPEs to one GAV in some cases.
  • GA identifier: This is just a simplification of the GAV identifier which omits the version number. There is nothing much special about this identifier for ODC, but we have to work with it, too.
  • intuitive sense: For example, when you mention “Hibernate”, you probably mean multiple GAs at the same time.

Note that this complexity is not introduced by ODC. The complexity is introduced by the ecosystem that ODC uses (e.g. CPEs) and by the ecosystem ODC supports (e.g. Maven artifacts).

Bundled libraries

While überJARs might be useful in some cases, they make inspection of all the transitive dependencies harder. While ODC can detect some bundled libraries (e.g. by package names or by POM manifests if included), this is somewhat suboptimal. Moreover, if ODC finds a bundled dependency, it might look like a false positive at first sight.

Libraries without a CPE

Some libraries don’t have a CPE. So, when ODC can’t assign any CPE, it might mean either that there is no CPE for the library (and hopefully no publicly known vulnerability) or that ODC has failed to assign the CPE. There is no easy way to know which is the case.

False positives

False positives are implied by the heuristics used mainly for detecting the CPE identifier. We can’t get rid of all of them until a better mechanism (e.g. CPE identifiers in POMs) is developed and widely used. Especially the latter would be rather a long run.

False positives can be suppressed in ODC in two ways. One can suppress the assignment of a specific CPE to a specific library (identified by its SHA1 hash). If more fine-grained control is needed, one can suppress the assignment of a single vulnerability to a specific library (identified by its SHA1 hash). Both of them make sense when handling false positives.

Handling a CPE collision

For example, there is the NLog library for logging in .NET. ODC assigns it cpe:/a:nlog:nlog. This might look correct, but this CPE has been used for some Perl project and there is a vulnerability for this CPE that is not related to the logging library.

If one suppressed matching of the CPE, one could get false negatives in the future, as vulnerabilities of the NLog library might be published under the same CPE, according to a discussion post by Jeremy Long (the author of the tool).

In this case, we should not suppress those CPEs. We should just suppress the CVEs for them.

Obviously mismatched CPEs

A CPE might also be obviously mismatched. While this case is similar to the previous one, the CPE name makes it obvious that it does not cover the library. In this case, we can suppress the CPE for the SHA1 hash.

Missing context

A vulnerability description usually contains a score. However, even if the score is 10 out of 10, it does not imply that the whole system is vulnerable. In general, there are multiple other aspects that might change the severity for a particular case:

  • The library might be isolated from untrusted input.
  • The untrusted input might be somehow limited.
  • The library might run in sandboxed environment.
  • Preconditions of the vulnerability are unrealistic for the context.

None of these can be detected by ODC, and automatic evaluation of those aspects is very hard, if not impossible.

Conclusion

We have integrated OWASP Dependency Check with our environment. We have evaluated the quality of reports. The most fragile part is matching a library to a CPE identifier, which may lead to both false negatives (rarely seen) and false positives (sometimes seen). Despite those issues, we have found ODC to be useful enough and adaptable to our environment.

As many of you may know, Android supports native printing since Android 4.4. This means that there is a new API handling communication between the application from which the user prints and the application that later sends the job to the printer.


So how does it work? First, let’s have a look at the applications from which the user prints.

The main responsibility of these applications is to prepare the output for printing in PDF format. This includes, for example, paging or adjusting the image to landscape or portrait mode.

The application from which the user prints then uses the system PrintManager service.

PrintManager printManager = (PrintManager) getSystemService(Context.PRINT_SERVICE);

The document output is prepared with a PrintDocumentAdapter, which is passed as the second parameter of PrintManager’s print function.

printManager.print(jobName, new PrintDocumentAdapter(...), printAttributes);
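
The adapter is where the paging and PDF rendering happen. A stripped-down sketch of one could look like this (the actual PDF rendering is omitted, so treat it only as an outline of the two callbacks):

PrintDocumentAdapter adapter = new PrintDocumentAdapter() {

    @Override
    public void onLayout(PrintAttributes oldAttributes, PrintAttributes newAttributes,
            CancellationSignal cancellationSignal, LayoutResultCallback callback, Bundle extras) {
        // describe the document to the system (name, content type, optionally page count)
        PrintDocumentInfo info = new PrintDocumentInfo.Builder("print_output.pdf")
                .setContentType(PrintDocumentInfo.CONTENT_TYPE_DOCUMENT)
                .build();
        callback.onLayoutFinished(info, !newAttributes.equals(oldAttributes));
    }

    @Override
    public void onWrite(PageRange[] pages, ParcelFileDescriptor destination,
            CancellationSignal cancellationSignal, WriteResultCallback callback) {
        // render the requested pages into 'destination' as PDF here (omitted),
        // then report which pages were actually written
        callback.onWriteFinished(new PageRange[]{PageRange.ALL_PAGES});
    }
};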

Now we are heading to the second part of job printing, where we have to discover printers and send them our job. This is the responsibility of a PrintService.

Printer discovery

We can either add printers manually, setting their IP address and port, or we can look for network printers in the local network.

Let’s have a look at how to find printers which support Zeroconf discovery in the local network. Implementations of Zeroconf are, for example, the Avahi daemon or Bonjour.

When printer discovery in Android is started, the onCreatePrinterDiscoverySession() method of the PrintService is called. Here we have to create our PrinterDiscoverySession.
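
For completeness, a minimal PrintService skeleton could look roughly like this (the class name matches the service declared later in the manifest; OurDiscoverySession and handleQueuedPrintJob are our own names used further below, so this is only an assumed outline):

public class OurPrintService extends PrintService {

    @Override
    protected PrinterDiscoverySession onCreatePrinterDiscoverySession() {
        // called when the system starts printer discovery
        return new OurDiscoverySession(this);
    }

    @Override
    protected void onPrintJobQueued(PrintJob printJob) {
        // called when a job is ready to be sent to the printer
        handleQueuedPrintJob(printJob.getId(), printJob.getInfo().getPrinterId());
    }

    @Override
    protected void onRequestCancelPrintJob(PrintJob printJob) {
        printJob.cancel();
    }
}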

The responsibilities of a PrinterDiscoverySession are pretty straightforward:

  • find and add discovered printers
  • remove previously added printers that disappeared
  • update already added printers

In this example we will use NsdManager. NsdManager is the native Android solution for finding Zeroconf services. Its functionality is quite limited, but for the purpose of this demo it is satisfactory; there are other, better solutions, for example JmDNS. A current limitation of NsdManager is that it is not able to load the TXT records of mDNS messages.

In order to use NsdManager we have to implement two interfaces: DiscoveryListener (handles discovery callbacks) and ResolveListener (handles resolving callbacks). I will call them OurDiscoveryListener and OurResolveListener.

First, in the onStartPrinterDiscovery() method, we create a new instance of the DiscoveryListener and start the actual discovery.

discoveryListener = new OurDiscoveryListener(nsdManager);
nsdManager.discoverServices(SERVICE_TYPE, NsdManager.PROTOCOL_DNS_SD, discoveryListener);

This is pretty self-explanatory. We specify the discovery service type, which is either "_ipps._tcp." or "_ipp._tcp.", depending on whether we want to encrypt the IPP messages or not.

When a service is found, OurDiscoveryListener handles what happens in the individual states of discovery.

In the following code we can see that, for example, when a service is found, we try to resolve it with NsdManager.

public void onServiceFound(NsdServiceInfo service) {
    nsdManager.resolveService(service, new OurResolveListener());
}

Resolving a service means that we try to get more information about it, such as the host IP and port. OurResolveListener then handles what should happen when resolving succeeds or fails. When resolving succeeds, we process the obtained data and save it for future use.
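
A sketch of such a resolve listener might look like this (savePrinter is just a placeholder for whatever the application does with the discovered host and port):

private class OurResolveListener implements NsdManager.ResolveListener {

    @Override
    public void onServiceResolved(NsdServiceInfo serviceInfo) {
        // the resolved service contains the host and port we need to talk to the printer
        String name = serviceInfo.getServiceName();
        InetAddress host = serviceInfo.getHost();
        int port = serviceInfo.getPort();
        savePrinter(name, host, port); // placeholder: store the data for future use
    }

    @Override
    public void onResolveFailed(NsdServiceInfo serviceInfo, int errorCode) {
        Log.w(TAG, "Resolving " + serviceInfo.getServiceName() + " failed: " + errorCode);
    }
}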

The last part of printer discovery is finding more details about the selected printer and checking whether this printer is still available. This is handled in the onStartPrinterStateTracking() method.

Discovering details about the printer can be done, for example, with the IPP operation Get-Printer-Attributes, setting the printer information according to the received data. The second function is to keep tracking the printer state.

The following code sample just shows how to set a few printer capabilities, which should be set according to the printer attributes. It doesn’t contain tracking of the printer state.

@Override
public void onStartPrinterStateTracking(PrinterId printerId) {
    // check for info we found when printer was discovered
    PrinterInfo printerInfo = findPrinterInfo(printerId);

    if (printerInfo != null) {
        PrinterCapabilitiesInfo capabilities = new PrinterCapabilitiesInfo.Builder(printerId)
                .setMinMargins(new PrintAttributes.Margins(200, 200, 200, 200))
                .addMediaSize(PrintAttributes.MediaSize.ISO_A4, true)
                .addMediaSize(PrintAttributes.MediaSize.ISO_A5, false)
                .addResolution(new PrintAttributes.Resolution("R1", "200x200", 200, 200), false)
                .addResolution(new PrintAttributes.Resolution("R2", "200x300", 200, 300), true)
                .setColorModes(PrintAttributes.COLOR_MODE_COLOR |
                        PrintAttributes.COLOR_MODE_MONOCHROME, PrintAttributes.COLOR_MODE_COLOR)
                .build();

        printerInfo = new PrinterInfo.Builder(printerInfo)
                .setCapabilities(capabilities).build();

        // We add printer info to system service
        List<PrinterInfo> printers = new ArrayList<>();
        printers.add(printerInfo);
        addPrinters(printers);
    }
}

When a different printer is selected, onStopPrinterStateTracking() is called, followed by onStartPrinterStateTracking() again.

 

Printing

Android itself doesn’t contain an implementation of any printing protocol. Because of this, I created a small IPP parser, but that’s a topic for another day.

Here I will only show an example of handling a queued print job.

In the following code we pick one job by its id from the saved processed jobs and set its state to printing. The PrintTask class in the following example is just an Android AsyncTask which creates the IPP request in the background and appends the job data.

public void handleQueuedPrintJob(PrintJobId printJobId, PrinterId printerId) {
    final PrintJob printJob = mProcessedPrintJobs.get(printJobId);
    if (printJob == null) {
        return;
    }

    if (printJob.isQueued()) {
        printJob.start();
    }

    OurPrinter ourPrinter = ourDiscoverySession.getPrinter(printerId);

    if (ourPrinter != null) {
        AsyncTask<Void, Void, Void> printTask =
                new PrintTask(printJob, ourPrinter);
        printTask.execute();
    }
}

In case we have decided to use IPPS, we also have to set the correct certificate. The next step is to create a new HttpURLConnection (or HttpsURLConnection for secure transfer).

The last thing we have to do is write our IPP message into the output stream, send it and wait for the response from the server.
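
A simplified sketch of this last step might look as follows (the IPP request bytes come from the parser mentioned above and are represented here by a plain byte array):

private void sendIppRequest(URL printerUrl, byte[] ippRequest) throws IOException {
    HttpURLConnection connection = (HttpURLConnection) printerUrl.openConnection();
    connection.setRequestMethod("POST");
    connection.setDoOutput(true);
    connection.setRequestProperty("Content-Type", "application/ipp");

    // write the serialized IPP message (with the job data appended) into the request body
    try (OutputStream out = connection.getOutputStream()) {
        out.write(ippRequest);
    }

    // wait for the printer's response and parse the IPP reply from it
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
        try (InputStream in = connection.getInputStream()) {
            // parse the IPP response (status code and attributes) here, omitted
        }
    }
    connection.disconnect();
}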

Android manifest file

We have to set the necessary permissions in the Android manifest file in order to be able to run the PrintService.
Add android.permission.BIND_PRINT_SERVICE when declaring the service. Example:

...
<service android:name=".OurPrintService" 
    android:permission="android.permission.BIND_PRINT_SERVICE">

    <intent-filter>
        <action android:name="android.printservice.PrintService" />
    </intent-filter>
</service>
...

This allows the system to bind to the PrintService. Otherwise, the service wouldn’t be shown in the system.

Also

<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.START_PRINT_SERVICE_CONFIG_ACTIVITY" />

permissions are needed for printer discovery and for accessing files in external storage.

We use Liquibase in our project as a DB change management tool. We use it to create our DB schema with the basic configuration our application needs to run.

This is, however, not enough for development or (unit) testing. Why? Because for each test case, we need to have the data in the database in a particular state. E.g. I need to test that the system rejects the action of a user that does not have the required quota for the action he/she wants to perform. So I create a user with a 0 quota, then try to perform the action and see whether the system allows it or rejects it. To avoid setting up our test data repeatedly, we use special Liquibase scripts that set up what we need in our test environment (such as a user with a 0 quota), so that we do not have to do this manually.

For the Payment System project, we used Liquibase to run plain SQL scripts to insert the data that we needed. It worked well enough, but this approach has some disadvantages.

Mainly, the person writing the scripts has to have knowledge of our database and the relations between the various tables, so there is a steep learning curve. Another issue is that all of the scripts have to be updated when a larger DB schema change takes place.

Therefore, during our implementation of the Quota System, I took inspiration from the work of a colleague, who used a kind of high-level DSL as a convenient way to set up the Quota system, and I turned it into a production-ready feature on top of Liquibase. This solved the problem of manual execution (the scripts always run during application startup and are guaranteed to run exactly once).

For the DSL, I chose Groovy, since we already use it for our tests, and there is no interoperability issue with the Java-based Liquibase.

Liquibase has many extension points and implementing a custom changelog parser seemed the way to go.

The default XML parser parses the input file, generates internal Liquibase objects representing the particular changes, and then transforms them to SQL, which is executed on the DB.

I created a similar parser, which was registered to accept Groovy scripts.

The parser executes the script file, which creates internal Liquibase objects; of course, one DSL keyword can create several insert statements. What is more, when the DB schema changes, the only place that needs to change is the DSL implementation, not all of the already created scripts.
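
The skeleton of such a parser could look roughly like this. It is only a sketch using the Liquibase 3.x ChangeLogParser interface as I recall it, not our actual production class; the real parser evaluates the Groovy DSL inside the parse method and fills the changelog with the generated changes:

public class GroovyTestDataChangeLogParser implements ChangeLogParser {

    @Override
    public boolean supports(String changeLogFile, ResourceAccessor resourceAccessor) {
        // take over parsing only for our Groovy test data scripts
        return changeLogFile.endsWith(".groovy");
    }

    @Override
    public int getPriority() {
        return 100; // higher than the built-in parsers, so this one is picked for .groovy files
    }

    @Override
    public DatabaseChangeLog parse(String physicalChangeLogLocation,
            ChangeLogParameters changeLogParameters, ResourceAccessor resourceAccessor)
            throws ChangeLogParseException {
        DatabaseChangeLog changeLog = new DatabaseChangeLog(physicalChangeLogLocation);
        // evaluate the main DSL script concatenated with the test data script (omitted)
        // and append the produced Liquibase changes to a change set of 'changeLog'
        return changeLog;
    }
}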

Example Groovy script:

importFile('changelog.xml')

bw = '3' //using guid of identifier which is already present in database
color = identifier name:'color'
print = identifier name:'PRINT', guid:'100' //creating identifier with specific guid

q1 = quota guid:'q1', name:'quota1', limit:50, identifiers:[bw, print], period:weekly('MONDAY')
q2 = quota guid:'q2', name:'quota2', limit:150, identifiers:[color, print], period:monthly(1)

for (i in 1..1000)
    quotaSubject guid: 'guid' + i, name:'John' + i, quotas:[q1,q2]

This example shows that the Groovy script can reference another Liquibase script file – e.g., the default changelog file, which creates the basic DB structure and initial data.

It also shows the programmatic creation of quotaSubjects (accounts in the system), where we can use a normal Groovy for loop to easily create many accounts for load testing.

QuotaSubjects have assigned quotas, which are identified by quota identifiers. These can be either created automatically, or we can reference already existing ones.

The keywords identifier, quota, quotaSubject, weekly, and monthly are just normal Groovy functions that take a Map as an argument, which allows us to pass them named parameters.

Before execution, the script is concatenated with the main DSL script, where the keywords are defined.

Part of the main DSL script that processes identifiers:

QuotaIdentifier identifier(Map args) {
    assert args.name, 'QuotaIdentifier name is not specified, available params are: ' + args
    String guid = args.guid ? args.guid : nextGuid()

    addInsertChange('QUOTA_IDENTIFIER', columns('GUID', guid) << column('NAME', args.name) << column('STATUS', 'ENABLED'))
    new QuotaIdentifier(guid)
}

private def addInsertChange(String tableName, List<ColumnConfig> columns) {
    InsertDataChange change = new InsertDataChange()
    change.setTableName(tableName)
    change.setColumns(columns)

    groovyDSL_liquibaseChanges << change
}

The calls produce Liquibase objects, which are appended to variables accessible within the Groovy script. The content of the variables constitutes the final Liquibase changelog, which is created after the processing of the script is done.

This way, the test data file is simple to read, write, and maintain. A small change in the changelog parser also allowed us to embed the test data scripts in our Spock test specifications so that we can see the test execution logic and test data next to each other.

@Transactional @ContextConfiguration(loader = YSoftAnnotationConfigContextLoader.class)
class QuotaSubjectAdministrationServiceTestSpec extends UberSpecification {

    // ~ test configuration overrides ==================================================================

    @Configuration @ImportResource("/testApplicationContext.xml")
    static class OverridingContext {

        @Bean
        String testDataScript() {
            """groovy: importFile('changelog.xml')

                       bw = identifier name:'bw'
                       color = identifier name:'color'
                       print = identifier name:'print'
                       a4 = identifier name:'a4'

                       p_a4_c = quota guid:'p_a4_c', name:'p_a4_c', limit:10, identifiers:[print, a4, color], period:weekly('MONDAY')
                       p_a4_bw = quota guid:'p_a4_bw', name:'p_a4_bw', limit:20, identifiers:[print, a4, bw], period:weekly('MONDAY')
                       p_a4 = quota guid:'p_a4', name:'p_a4', limit:30, identifiers:[print, a4], period:weekly('MONDAY')
                       p = quota guid:'p', name:'p', limit:40, identifiers:[print], period:weekly('MONDAY')
                       q1 = quota guid:'q1', name:'q1', limit:40, identifiers:[print], period:weekly('MONDAY')   
                       q2 = quota guid:'q2', name:'q2', limit:40, identifiers:[print], period:weekly('MONDAY')
                       q_to_delete = quota guid:'q_to_delete', name:'q_to_delete', limit:40, identifiers:[print], period:weekly('MONDAY')

                       quotaSubject guid: 'user1', name:'user1', quotas:[p_a4_c, p_a4_bw, p_a4]
                       quotaSubject guid: 'user2', name:'user2', quotas:[p_a4_c, p_a4_bw, p_a4]
                       quotaSubject guid: 'user3', name:'user3', quotas:[p_a4_c, p_a4_bw, p_a4, p]
                       quotaSubject guid: 'user4', name:'user4', quotas:[p_a4_c]"""
        }
    }

    // ~ instance fields ===============================================================================

    @Autowired private QuotaSubjectAdministrationService quotaSubjectAdministrationService;
    @Autowired private QuotaAdministrationService quotaAdministrationService;

    // ~ findByGuid ====================================================================================

    def "findByGuid should throw IllegalInputException for null guid"() {
        when:
        quotaSubjectAdministrationService.findByGuid(null)

        then:
        thrown(IllegalInputException)
    }

Do you want to attend the famous developer conference GeeCON in the Czech Republic? No problem! Y Soft and GeeCON’s organizers bring it to our capital again. We can therefore proudly announce that Y Soft is the platinum sponsor, co-organizer and key partner of GeeCON 2015 in Prague, on October 22-23.

 

GeeCON focuses on news and hacks all around Java and Java Virtual Machine based technologies. It was first organized in 2009 and since then has grown into a big conference with over 80 speakers and sessions over three days, from an initial 350 participants to 2000+ attendees today. From its originally wider focus, it has crystallized into one specialized, though no less rich, topic – all about Java technology.

GeeCON is a conference focused on Java and Java Virtual Machine based technologies, with special attention to dynamic languages like Groovy and Ruby. GeeCON is a forum for sharing experiences about modern software development methodologies, enterprise architectures, software craftsmanship, design patterns, distributed computing and more!

The fact that participants are literally flocking in from all over Europe, with some even coming from other continents, says a lot about the quality of GeeCON. Traditionally, representatives of Czech developers come in large numbers. Lectures take place in several halls in parallel, so that all participants can choose exactly according to their interests. You will find famous names from around the world among the speakers – Kevlin Henney, Milen Dyankov, Simon Brown, Grant Ingersoll or Antonio Goncalves.

We encourage you to visit GeeCON Prague at CineStar – Cerny Most. All you have to do is book the date in your diary for the 22nd to 23rd of October; everything else you will discover over the following weeks directly at www.geecon.cz.

After finishing the hard-coded passwords detector, I have focused on improving the detection of the most serious security bugs that can be found by static taint analysis. SQL injection, OS command injection and Cross-site scripting (XSS) are ranked first, second and fourth in the CWE Top 25 most dangerous software errors (while the well-known buffer overflow, not applicable to Java, is placed third). Path Traversal, Unvalidated Redirect, XPath injection or LDAP injection are related types of weaknesses – unvalidated user input can exploit the syntax of an interpreter and cause a vulnerability. Injections in general are also the number one risk in the OWASP Top 10, so a reliable open-source static analyser for these kinds of weaknesses could really make the world a more secure place 🙂

FindBugs already has detectors for some kinds of injections, but many bugs are missed due to insufficient flow analysis, unknown taint sources and sinks, and also due to targeting zero false positives (even though there are some). In contrast, the aim of the bug detectors in FindSecurityBugs is to be helpful during security code review and not to miss any vulnerability – there was some effort to reduce false positives, but before my contribution almost all taint sinks were reported in practice. Unfortunately, searching for a real problem among many false warnings is quite tedious. The aim of the new detection mechanism is to report more high-confidence bugs (with a minimum of false positives) than the FindBugs detectors, plus report lower-confidence bugs with decreased priority, not missing any real bugs while having a false positive rate much lower than FindSecurityBugs had originally.

For a reliable detection, we need a good data-flow analysis. I have already mentioned the OpcodeStackDetector class in previous articles, but there is a more advanced and general mechanism in FindBugs. We can create and register classes performing a custom data-flow analysis and request those results later in detectors. Methods are symbolically executed after building a control flow graph made of blocks of instructions connected by different types of edges (such as goto, ifcmp or exception handling), which the analysis attempts to prune for impossible flow. We have to create a class to represent facts at different code locations – we want to remember some information (called a fact) for every reachable instruction, which can later help us to decide whether a particular bug should be reported at that location. We need to model the effects of instructions and edges on facts, specify the way of merging facts from different flow branches and make everything work together. Fortunately, there are existing classes designed for extension to make this process easier. In particular, FrameDataflowAnalysis models values in the operand stack and local variables, so we can concentrate on the sub-facts about these values. The actual fact is then a frame of these sub-facts. This class models the effects of instructions by pushing the default sub-fact on the modelled stack and popping the right number of stack values. It also automatically moves sub-facts between the stack and the part of the frame with local variables.

Let’s have a look at which classes had to be implemented for the taint analysis. If we want to run a custom data-flow analysis, a special class implementing IAnalysisEngineRegistrar must be created and referenced from findbugs.xml.

<!-- Registers engine for taint analysis dataflow -->
<EngineRegistrar
    class="com.h3xstream.findsecbugs.taintanalysis.EngineRegistrar"/>

This simple class (called EngineRegistrar) creates a new instance of TaintDataflowEngine and registers it with the global analysis cache.

public class EngineRegistrar implements IAnalysisEngineRegistrar {

    @Override
    public void registerAnalysisEngines(IAnalysisCache cache) {
        new TaintDataflowEngine().registerWith(cache);
    }
}

Thanks to this, at the right time, the analyze method of TaintDataflowEngine (implementing IMethodAnalysisEngine) is called for each method of the analyzed code. This method requests the objects needed for the analysis, instantiates two custom classes (mentioned in the next two sentences) and executes the analysis.

public class TaintDataflowEngine
    implements IMethodAnalysisEngine<TaintDataflow> {

    @Override
    public TaintDataflow analyze(IAnalysisCache cache)
            throws CheckedAnalysisException {
        CFG cfg = cache.getMethodAnalysis(CFG.class, descriptor);
        DepthFirstSearch dfs = cache
            .getMethodAnalysis(DepthFirstSearch.class, descriptor);
        MethodGen methodGen = cache
            .getMethodAnalysis(MethodGen.class, descriptor);
        TaintAnalysis analysis = new TaintAnalysis(
            methodGen, dfs, descriptor);
        TaintDataflow flow = new TaintDataflow(cfg, analysis);
        flow.execute();
        return flow;
    }

    @Override
    public void registerWith(IAnalysisCache iac) {
        iac.registerMethodAnalysisEngine(TaintDataflow.class, this);
    }
}

TaintDataflow (extending Dataflow) is really simple and is used to store the results of the performed analysis (used later by detectors).

public class TaintDataflow
        extends Dataflow<TaintFrame, TaintAnalysis> {

    public TaintDataflow(CFG cfg, TaintAnalysis analysis) {
        super(cfg, analysis);
    }
}

TaintAnalysis (extending FrameDataflowAnalysis) implements the data-flow operations on TaintFrame, but it mostly delegates them to other classes.

public class TaintAnalysis
        extends FrameDataflowAnalysis<Taint, TaintFrame> {

    private final MethodGen methodGen;
    private final TaintFrameModelingVisitor visitor;

    public TaintAnalysis(MethodGen methodGen, DepthFirstSearch dfs,
            MethodDescriptor descriptor) {
        super(dfs);
        this.methodGen = methodGen;
        this.visitor = new TaintFrameModelingVisitor(
            methodGen.getConstantPool(), descriptor);
    }

    @Override
    protected void mergeValues(TaintFrame frame, TaintFrame result,
            int i) throws DataflowAnalysisException {
        result.setValue(i, Taint.merge(
            result.getValue(i), frame.getValue(i)));
    }

    @Override
    public void transferInstruction(InstructionHandle handle,
            BasicBlock block, TaintFrame fact)
            throws DataflowAnalysisException {
        visitor.setFrameAndLocation(
            fact, new Location(handle, block));
        visitor.analyzeInstruction(handle.getInstruction());
    }

    // some other methods
}

TaintFrame is just a concrete class for abstract Frame<Taint>.

public class TaintFrame extends Frame<Taint> {

    public TaintFrame(int numLocals) {
        super(numLocals);
    }
}

The effects of instructions are modelled by TaintFrameModelingVisitor (extending AbstractFrameModelingVisitor), so we can code with the visitor pattern again.

public class TaintFrameModelingVisitor
    extends AbstractFrameModelingVisitor<Taint, TaintFrame> {

    private final MethodDescriptor methodDescriptor;

    public TaintFrameModelingVisitor(ConstantPoolGen cpg,
            MethodDescriptor method) {
        super(cpg);
        this.methodDescriptor = method;
    }

    @Override
    public Taint getDefaultValue() {
        return new Taint(Taint.State.UNKNOWN);
    }

    @Override
    public void visitACONST_NULL(ACONST_NULL obj) {
        getFrame().pushValue(new Taint(Taint.State.NULL));
    }

    // many more methods
}

The taint fact – information about a value in the frame (a stack item or a local variable) – is stored in a class called simply Taint.

The most important piece of information in Taint is the taint state, represented by an enum with values TAINTED, UNKNOWN, SAFE and NULL. TAINTED is pushed for an invoke instruction with a method call configured to be tainted (e.g. getParameter from HttpServletRequest or readLine from BufferedReader), SAFE is stored for the ldc (load constant) instruction, NULL for aconst_null and UNKNOWN is the default value (this description is a bit simplified). Merging of taint states is defined such that if we could compare them as TAINTED > UNKNOWN > SAFE > NULL, then the merge of two states is the greater value (e.g. TAINTED + SAFE = TAINTED). Not only is this merging done where there are multiple input edges to a code block of the control flow graph, but I have also implemented a mechanism of taint-transferring methods. For example, consider calling the toLowerCase method on a String before passing it to a taint sink – instead of pushing a default value (UNKNOWN), we can copy the state of the parameter so that the information is not forgotten. Merging is also done in more complicated cases such as the append method of StringBuilder – the taint state of the argument is merged with the taint state of the StringBuilder instance and returned to be pushed on the modelled stack.
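
A tiny illustration of that merge rule (just the idea, not the actual Taint class from FindSecurityBugs):

public enum State {
    // ordered so that a higher ordinal means "more dangerous"
    NULL, SAFE, UNKNOWN, TAINTED;

    // the merge of two states is the more dangerous one, e.g. TAINTED + SAFE = TAINTED
    public static State merge(State a, State b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }
}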

There were two problems with taint state transfer which had to be solved. First, the taint state must be transferred directly to mutable classes too, not only to their return values (plus the method can be void). Not only do we set the taint state for an object when it is seen for the first time in the analysed method (after which the state is copied), but we also change it according to calls of instance methods. For example, a StringBuilder is safe when a new instance is created with the non-parametric constructor, but it can taint itself by a call of its append method. If only methods with safe parameters are called, the taint state of the StringBuilder object remains safe too. For this reason, the effect of load instructions is modified to mark the index of the loaded local variable in the Taint instance of the corresponding stack item. Then we can transfer the taint state to the local variable with the index stored in Taint for specified methods of mutable classes. Second, taint-transferring constructors (methods <init> in bytecode) must be handled specially because of the way new objects are created in Java. The instruction new is followed by dup and invokespecial, which consumes the duplicated value and initializes the object remaining at the top of the stack. Since the new object is not stored in any variable, we must transfer the taint value from the merged parameters to the stack top separately.

Bugs related to the taint analysis are identified by TaintDetector (implementing Detector). For better performance, before the methods of a class are analyzed, the constant pool (the part of the class file format with all needed constants) is searched and the analysis continues only if there are references to some taint sinks. Then a TaintDataflow instance is loaded for each method and the locations of its control flow graph are iterated until a taint sink method is found. This means we find all invoke instructions used in the currently analysed method and check whether the called methods are related to the searched weaknesses. Facts (instances of the Taint class) from TaintDataflow are extracted for each sink parameter of a sink method. A bug is reported with high confidence (and priority) if the taint state is TAINTED, with medium confidence for the UNKNOWN taint state and with low confidence for SAFE and NULL (just in case of a bad analysis; these warnings are not normally shown anywhere). The Taint class also contains references to taint source locations, so these are shown in bug reports to make review easier – you should see a path between the taint sources and the taint sink. TaintDetector itself is abstract, so it must be extended to detect concrete weakness types (like command injection), and the InjectionSource interface must be implemented to specify taint sinks (the name of the interface is a bit misleading) and the items in a constant pool that identify candidate classes.

public class CommandInjectionDetector extends TaintDetector {

    public CommandInjectionDetector(BugReporter bugReporter) {
        super(bugReporter);
    }

    @Override
    public InjectionSource[] getInjectionSource() {
        return new InjectionSource[] {new CommandInjectionSource()};
    }
}

CommandInjectionSource overrides the method getInjectableParameters, which returns an instance of InjectionPoint containing the parameters that cannot be tainted and the weakness type to report. The boolean method isCandidate looks up the constant pool for the names of the taint sink classes and returns true if they are present.

TaintDetector is currently used to detect command, SQL, LDAP and script (for the eval method of ScriptEngine) injections and unvalidated redirects. More bug types and taint sinks should follow soon. Test results are looking quite promising so far. Inter-procedural analysis (not restricted to a method scope) should be the next big improvement, which could make this analysis really helpful. Then everything should be tested against a large amount of real code to iron out the kinks. You can see the discussed classes in the taintanalysis package and try the new version of FindSecurityBugs.


In the previous article, I described the creation of a new FindBugs detector for hard-coded passwords and cryptographic keys. I also mentioned some imperfections, and I have decided to learn more about FindBugs and improve the detection.

The Java virtual machine has a stack architecture – operands must be pushed on the stack before a method is invoked, a given number of stack values is consumed during the invocation and the produced return value (if any) is pushed subsequently. My detector class extends OpcodeStackDetector, which implements the abstract interpretation technique to collect approximate information about the values on the operand stack for each code location. These pieces of information (usually called facts) are kept only for those locations where the derived value of the fact does not depend on the preceding control flow (for example, the value is the same for each possible branch executed in earlier conditional statements).

One of the facts available for stack values is the actual value of a number or String (related to the constant propagation performed by compilers during optimization). We can use this to detect hard-coded values – a known constant means a hard-coded value. However, we also need to track other data types besides Strings (numbers can be ignored) to detect passwords in a char array and hard-coded keys. In addition, there is one more issue with this approach…

Tracking concrete values is unnecessarily complicated and the value often becomes unknown – we only need to know whether the value is constant, not which constant is on the stack. Consider a piece of code like this:

private Connection getConnection(String user) {
    String password;
    if ("root".equals(user)) {
        password = "superSecurePassword";
    } else {
        password = "differentPassword";
    }
    return DriverManager.getConnection(DB_URL, user, password);
}

The constant value in the password variable is known inside both branches, but these values are forgotten after the conditional statement, since the values differ. For this reason, a weakness like this was reported neither by the previous version of the detector nor by the original FindBugs detector for constant database passwords. Even if there is only one possible constant, the analysis can fail because of null values – see this code (a bit contrived, but demonstrating the problem):

String pwd = null;
if (shouldConnect()) {
    pwd = "hardcoded";
}
if (pwd != null) {
    Connection connection = DriverManager.getConnection(url, user, pwd);
    // some code working with database
}

We can easily see that the password variable always has the value “hardcoded”, but the performed analysis is linear and the fact is forgotten right after the first conditional statement. The second condition cannot bring the forgotten constant back, so the weakness is again not detected.

Fortunately, these issues can be solved by setting and reading a custom user value fact, which OpcodeStackDetector allows (if the CustomUserValue annotation is added to the extending class). Our fact has only one value, marking hard-coded stack items, or it is null to indicate the unknown state (the default). We can periodically check whether a value on the stack is a known constant or null and set the user value for it if it is; propagation is done automatically. If the analysis then merges facts from different control flow branches with different constants (or null), the user value is the same and is not reset to the default. The custom user value is also used to mark hard-coded passwords and keys of the other password and key data types; detection of those objects remains similar to the previous version of the detector. A weakness is reported if a sink parameter has a non-null user value and the stack value is not null (null passwords are not considered to be hard-coded).

The detector uses proper flow analysis after this improvement; however, it is restricted to a method scope, and hard-coded values in fields are reported only if they are used in the same class. Inter-method and inter-class analysis is a future challenge, but I have kept the reporting of hard-coded fields with suspicious names and an unknown sink so as not to miss important positives. In contrast to the previous version, these fields are reported with lower priority and only if not already reported using the proper flow analysis technique, to prevent duplicate warnings. Moreover, all fields are reported in a single warning per class to make possible false positives less distracting.

Another improvement is the possibility to configure more method parameters as password or key sinks. If more than one parameter is hard-coded, only a single warning is produced and the parameters are mentioned in the detailed message. The last important change is that hard-coded cryptographic keys are reported as a separate bug pattern, since hard-coded passwords and keys have different CWE identifiers (259 and 321) and are equally important. The decision between the reported warnings is made automatically based on the data types of the hard-coded parameters.

I have tested the detector with the Juliet Test Suite: using the proper analysis it can reveal both types of weakness in 17 flow variants (out of 37) and all sink variants with no false positives. The original FindBugs detector reveals weaknesses in 10 flow variants and only for database passwords; other password methods and hard-coded keys are not detected in any variant.

You can see the detector class on GitHub. Happy coding with no hard-coded passwords!


FindBugs is a great open source tool for the detection of software bugs in Java. It uses static analysis to search compiled classes for hundreds of bug patterns, and even more can be found using the FindSecurityBugs and fb-contrib plugins. However, before my recent contribution, there was no general detector for hard-coded passwords and cryptographic keys. Hard-coded passwords are identical for each installation and can be easily extracted, which is likely to be exploited (see CWE-259 for more information). FindBugs and FindSecurityBugs could already detect this vulnerability, but only for constant database passwords and two very specific cases. I have created a detector (accepted into FindSecurityBugs) which is able to find hard-coded values of Strings, char and byte arrays or BigIntegers used as an input parameter of one of the configured methods, such as KeyStore.load or KeySpec constructors.

To add a new detector, we have to create a class that implements Detector or extends a prepared class with helper functionality (I have used OpcodeStackDetector). An instance of BugReporter passed in the constructor and its method reportBug are used to report problems (BugInstance objects). We also need to add the class name to findbugs.xml for it to be executed and edit messages.xml to provide information about the detections. A good start for thinking about the detection logic is to write a bunch of flawed code samples and look at their bytecode (a plugin for IDEA can be used). We can write unit tests for them by mocking BugReporter.

A detector class can use the visitor design pattern (if it implements Visitor) to react to events while the analyzed class is scanned. I have started by overriding the method sawOpcode, which is called every time an instruction is read. Since we are interested in invocations of specific methods, we need to check whether it is one of the invoke instructions and get the full called method name, such as java/security/KeyStore.load(Ljava/io/InputStream;[C)V, which contains the method class with its package, the argument types ([C is a char array, parameters of object types start with L and end with a semicolon) and the return type (V for void). The method name can be obtained by calling the methods getClassConstantOperand, getNameConstantOperand and getNameSigOperand inherited from DismantleBytecode. If it is one of the problematic methods (loaded from resource files), we can create a BugInstance, add the currently analyzed line plus some info to it and report the bug. Now we have a password usage detector, but not a hard-coded password detector, so it is time to eliminate false positives (detections that are not real bugs).

For String passwords, we can utilize OpcodeStackDetector and check the nullness of stack.getStackItem(0).getConstant() to detect the usage of a constant String as the password parameter. Unfortunately, it is not so easy for the other variable types. To detect that an array is initialized with hard-coded values, I check whether the instruction for new array creation is followed by push and array store instructions while, for example, no methods are called. Constant arrays are also converted from constant Strings using the methods toCharArray and getBytes. After implementing this, we can detect BigIntegers too, since they can be constructed from Strings or byte arrays.
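
Put together, the core of the detection logic looks roughly like the following condensed sketch. The real detector on GitHub is more involved; passwordMethods and bugReporter are assumed fields of the detector, the bug type string comes from messages.xml, and the FindBugs calls used here are the ones I recall from the DismantleBytecode and BugInstance APIs, so they are an approximation rather than the exact code:

@Override
public void sawOpcode(int seen) {
    if (seen != INVOKEVIRTUAL && seen != INVOKESTATIC
            && seen != INVOKESPECIAL && seen != INVOKEINTERFACE) {
        return;
    }
    // e.g. "java/security/KeyStore.load(Ljava/io/InputStream;[C)V"
    String calledMethod = getClassConstantOperand() + "."
            + getNameConstantOperand() + getSigConstantOperand();
    if (!passwordMethods.contains(calledMethod)) { // configured sinks loaded from resource files
        return;
    }
    // check the password argument on the modelled operand stack (item 0 is the last pushed argument)
    OpcodeStack.Item passwordItem = stack.getStackItem(0);
    if (passwordItem.getConstant() != null) { // a known String constant means a hard-coded value
        bugReporter.reportBug(new BugInstance(this, "HARD_CODE_PASSWORD", HIGH_PRIORITY)
                .addClassAndMethod(this)
                .addSourceLine(this));
    }
}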

In terms of so-called taint analysis, we are able to detect the vulnerability source (hard-coded data) and the sink (usage of a password or cryptographic function), but a bug should be reported only if there is a flow from the source to the sink (we cannot be sure that hard-coded data is really a password until it is used as a password parameter). In the current implementation, no complex flow analysis is performed; we assume that a taint source followed by a taint sink of a matching type inside the same method body are always related. For this reason, false positives are easy to demonstrate, but they are quite uncommon in practice. On the other hand, local hard-coded declarations are forgotten when another method is analyzed (the visit method is overridden to reset the state), so passwords are not detected if they are passed as a parameter and used in another method.

Class fields are also taken into account – if constant data is stored in them, we remember that and consider it a taint source when they are read. Because of this, the order of the methods matters, and since the static initializer section is added to the end of the class by the compiler, its analysis is run ‘manually’ by calling doVisitMethod when the class analysis starts. In addition, if a field stores hard-coded data and its name is suspicious (like password, secret or aesKey), the bug is reported immediately, since there is a high bug confidence and, if it were used in a different class, it would not be reported otherwise (on the other hand, it can be reported twice now).

You can see the whole code on GitHub. I have mentioned some imperfections, but I think the detector is working quite well. Unfortunately, there is not much information about writing detectors, so creating them can be a matter of trial and error. If you have an idea for an improvement or a new detector, don’t hesitate to contact me or contribute the code directly. 🙂