When choosing the right security layer for messaging via ZeroMQ, there were two main options to consider: the built-in CurveZMQ security, or a more conventional TLS backend?

The first option is ZeroMQ's own security protocol based on elliptic curve cryptography and Daniel J. Bernstein's NaCl cryptographic library. CurveZMQ uses not only Bernstein's Curve25519 elliptic curve but also other primitives designed by him, e.g. Salsa20 and Poly1305. The other NaCl ciphers can be found on its web page under the Public-key and Secret-key cryptography chapters.
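To give an idea of how the built-in mechanism is used, below is a minimal sketch of a CURVE-secured REQ/REP pair. It is written against the JeroMQ binding purely for illustration; the method names (ZMQ.Curve.generateKeyPair, setCurveServer, setCurveSecretKey, setCurveServerKey) are assumptions based on that binding and should be checked against the CURVE API of the binding you actually use.

import org.zeromq.ZMQ;

public class CurveSketch {
    public static void main(String[] args) {
        // Long-term Z85-encoded CURVE key pairs (Curve25519 underneath).
        ZMQ.Curve.KeyPair serverKeys = ZMQ.Curve.generateKeyPair();
        ZMQ.Curve.KeyPair clientKeys = ZMQ.Curve.generateKeyPair();

        ZMQ.Context context = ZMQ.context(1);

        // Server side: switch the socket into CURVE server mode and install its secret key.
        ZMQ.Socket server = context.socket(ZMQ.REP);
        server.setCurveServer(true);
        server.setCurveSecretKey(ZMQ.Curve.z85Decode(serverKeys.secretKey));
        server.bind("tcp://*:5555");

        // Client side: its own key pair plus the server's public key.
        ZMQ.Socket client = context.socket(ZMQ.REQ);
        client.setCurvePublicKey(ZMQ.Curve.z85Decode(clientKeys.publicKey));
        client.setCurveSecretKey(ZMQ.Curve.z85Decode(clientKeys.secretKey));
        client.setCurveServerKey(ZMQ.Curve.z85Decode(serverKeys.publicKey));
        client.connect("tcp://localhost:5555");

        client.send("hello");
        System.out.println(server.recvStr());   // traffic on the wire is encrypted

        client.close();
        server.close();
        context.term();
    }
}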

From the beginning this option was a really hot candidate, because according to the papers on the performance of Salsa20 and the performance of Curve25519, the NaCl crypto functions are faster than the standard ones. The fact that it was already included in ZeroMQ was a big plus as well. Unfortunately, the NaCl library and its functions were created in 2008-2010 and are not as widely used as, for example, AES or SHA, so they have not matured as much as other libraries. That made us a little nervous and was ultimately the reason for not using it.

The second option was the usage of a TLS backend: TLSZMQ. This is a demo project by Ian Barber to secure ZeroMQ communication using the OpenSSL library. Unfortunately, TLS isn't built into ZeroMQ, so for now this is the only way to use TLS and ZeroMQ together 🙁

Implementations comparison

To find the best approach, we had a few requirements to check. The main ones were the speed of messaging and the maturity of the library and its crypto functions, but other requirements such as support and liveliness of the implementation, licensing and code quality also weighed in the decision.

From this set of metrics, only two turned out to be ultimately decisive:

  • Library and crypto functions maturity
  • Speed of messaging

While the maturity of the NaCl library was a strong argument against CurveZMQ, the NaCl performance mentioned above spoke strongly in its favor.

To finally resolve which approach to use and to determine the total overhead of each implementation, I created performance tests to find out whether TLSZMQ could be used in our environment. The test suite for each implementation was designed as a block of 20 separate tests running for 10 minutes each, to make the outcomes reliable. The message sizes were also chosen to reflect the messages actually sent in our environment.

[Graphs: total messages and messages per second]

At this point, the results clearly spoke in favor of CurveZMQ. On the other hand, there remained the concern about the maturity of the crypto functions used in CurveZMQ and its reliance on a single elliptic curve for key generation. This could cause many problems if the curve or the other functions were compromised in the future.

Then I realized that the tests had been run with the cipher suite OpenSSL chooses by default, RSA-AES256-SHA384 (OpenSSL version 1.0.1p). There is nothing wrong with this cipher suite, but we have to admit that these algorithms were unnecessarily secure (read: with high overhead) for our purposes.

Yes, I know how that sounds, but we must also consider the environment in which the messaging (and its encryption) runs and the performance of the system as a whole. Only after a thorough examination should we fit the ciphers to the real requirements, rather than overkill it with cipher suites of unnecessarily high security (and overhead).

Therefore, the tests were rerun with more appropriate (weaker, but still strong) cipher suites. The goal was to get as close to CurveZMQ performance as possible:

  • elliptic curves used everywhere possible
    • key exchange [ECDH]
    • authentication [ECDSA]

    (any elliptic curve could be used, not just a single one)

  • AES doesn't have to use a 256-bit key, as 128-bit keys are still considered strong enough.

From the few acceptable cipher suites, one in particular stood out: ECDSA-AESGCM128-SHA256.
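Just to illustrate what such a restriction looks like in code: TLSZMQ itself configures the suite through OpenSSL, but the equivalent pinning expressed with Java's JSSE API is shown below. The standard suite name TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 is my assumption of what the chosen ECDSA/AES-128-GCM/SHA-256 combination corresponds to.

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class CipherSuitePinning {
    public static void main(String[] args) throws Exception {
        SSLContext context = SSLContext.getDefault();
        SSLEngine engine = context.createSSLEngine();

        // Allow only TLS 1.2 with the single ECDHE-ECDSA / AES-128-GCM / SHA-256 suite.
        engine.setEnabledProtocols(new String[] { "TLSv1.2" });
        engine.setEnabledCipherSuites(new String[] {
                "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256"
        });

        System.out.println(String.join(", ", engine.getEnabledCipherSuites()));
    }
}

(Completing a handshake with this suite of course also requires an ECDSA certificate on the server side.)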

The rerun showed that the new cipher suite increased messaging performance considerably (sometimes effectively doubling the speed) [see graphs].

From the results one can see that TLSZMQ isn't that much inferior to CurveZMQ, and thanks to TLS's maturity it is the better choice. The final argument is that the TLSZMQ results, although worse, were sufficient for the YSoft SafeQ environment, so the choice of implementation was pretty straightforward.

The truth is that NaCl and Curve25519 are really fun (and much easier) to play with, and I encourage you to at least have a look at them. In the end, though, a TLS backend is more flexible in terms of crypto primitives (ciphers, key generation) and raises fewer security concerns. And that's what we are looking for in secured ZeroMQ messaging.

As I wrote some time ago, we use OWASP Dependency Check to scan our product for known vulnerabilities. But when you have reports from many components of the product, you need to manage them somehow. In the previous article, I briefly mentioned our tool that processes the reports. In this article, I'll present this tool. FWIW, we have open-sourced it. You can use it, you can fork it, you can send us pull requests for it.

Vulnerable libraries view

The vulnerable libraries view summarizes vulnerabilities grouped by libraries. The libraries are sorted by priority, which is computed as the number of affected projects multiplied by the score of the highest-severity vulnerability. This is probably the most important view. If you choose a vulnerable library, you can check all the details about its vulnerabilities, including the affected projects. When you decide to update a library to a newer version, you will probably wonder if there is some known vulnerability present in the new library version. ODC Analyzer allows you to check this by simply entering the desired library version.
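The priority formula is simple enough to capture in a few lines. The sketch below uses made-up types and numbers just to illustrate the computation; the real ODC Analyzer data model is richer.

import java.util.List;

// Hypothetical, simplified types for illustration only.
record Vulnerability(String id, double cvssScore) {}
record VulnerableLibrary(String name, List<Vulnerability> vulnerabilities, int affectedProjects) {}

public class PriorityExample {

    // Priority = number of affected projects * score of the highest-severity vulnerability.
    static double priority(VulnerableLibrary library) {
        double highest = library.vulnerabilities().stream()
                .mapToDouble(Vulnerability::cvssScore)
                .max()
                .orElse(0.0);
        return library.affectedProjects() * highest;
    }

    public static void main(String[] args) {
        VulnerableLibrary library = new VulnerableLibrary(
                "some-library-1.2.3",
                List.of(new Vulnerability("CVE-0000-0001", 9.3),
                        new Vulnerability("CVE-0000-0002", 4.3)),
                5);
        System.out.println(priority(library));   // 5 * 9.3 = 46.5
    }
}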

One might find the ordering controversial, mostly because it uses only the highest-rated vulnerability. Maybe the scoring system is not perfect (and no automatic scoring system can be), but I find it reasonable. I assume that the highest-scored vulnerability is likely some remote code execution triggerable by a remote unauthenticated attacker. In such a case, does having multiple such vulnerabilities make the situation worse? Well, it might slightly increase the probability that the vulnerability will be exposed to the attacker, but having two such vulnerabilities can hardly make it twice as bad as having just one. So I wanted the single highest-rated vulnerability to be a ceiling on the risk introduced by the vulnerable library to the project. Maybe we could improve the algorithm to rate multiple low-severity vulnerabilities higher than the severity of the highest-rated vulnerability alone. I already have an idea how to do this, but it has to be discussed.

Vulnerabilities view

You can also list all vulnerabilities affecting one of the projects scanned. They are sorted by severity multiplied by number of affected projects. Details available there are mostly the same as in the vulnerable libraries view.

Another interesting feature is vulnerability suppression. Currently, ODC Analyzer can generate a suppression that can be pasted into suppressions.xml, so it is taken into account the next time the vulnerability scans run. We are considering making it smarter and moving the suppression handling into ODC Analyzer itself.

Filters

Unless you have just a single team, most people are probably not interested in all the vulnerabilities of all the projects. They will want to focus on their own projects instead. ODC Analyzer allows this by filtering on a single project or even a subproject. One can also define a team and filter all projects belonging to that team.

Teams are currently defined in a configuration file; there is no web interface for that yet.

E-mail notifications

One can subscribe to new vulnerabilities of a project by e-mail. This allows one to watch all relevant vulnerabilities without periodic polling of the page.

Export to issue tracking system

We are preparing an export to an issue tracking system. There is some preliminary implementation, but we might still perform some redesigns. We can export one issue per vulnerability. The rationale behind this grouping is that it allows efficient discussion when multiple projects are affected by the same vulnerability. A side effect of such a design is that you need an extra project in your issue tracker, and you might want to create child issues for particular projects.

Currently, only JIRA export is implemented. However, if you want to export to <insert name of your favourite issue tracker here>, you can simply implement one interface, add a few lines of code for configuration parsing and send us a pull request 🙂

Some features under consideration

Affected releases

We perform scans on releases. It would be great to add an affected releases field to vulnerabilities. We, however, have to be careful to explain its meaning. We are not going to perform these scans on outdated, unsupported releases, so they will not appear there. It must therefore be clear that the list of affected releases is not exhaustive in this way.

Discussion on vulnerabilities

We are considering adding a discussion to vulnerabilities. One might want to discuss the impact on a project. These discussions should be shared across projects, because that allows knowledge sharing across teams. However, this will probably be handled in issue tracking systems like JIRA, as we don't want to duplicate their functionality. We want to integrate them, though.

Better branches support

If a project has two branches, it can currently be added only as two separate projects. It might be useful to take into account that multiple branches of the same software belong to the same project. There are some potential advantages:

  • Issues in production might have higher urgency than issues in development.
  • Watching a particular project would imply watching all the branches.

However, it is currently not clear how to implement this, and we don't want to start implementing the feature without a proper design. For example:

  • Some projects might utilize build branches in Bamboo.
  • Some projects can’t utilize build branches, because there are some significant differences across branches. For example, some projects switch from Maven to Gradle.
  • Is it useful to allow per-branch configuration (e.g., two branches belonging to different teams or watching only one branch)?
  • Should the branches be handled somewhat automatically?
  • How to handle different branching models (e.g. master + feature branches, master + devel + feature branches, …)?

Library tagging

Library tagging is useful for knowing what types of libraries are in the system. It is partially implemented. It works, but it has to be controlled through direct access to the database. There was never a GUI for creating a tag. When some tags already exist, there is a GUI for assigning them to a library, but there is no way to add a new one, nor permissions for that.

Homepage

The project was originally designed as more or less a single page. This did not scale, so we added some additional views. The current homepage is rather a historical leftover and maybe it should be completely redesigned.

List of all libraries

Non-vulnerable libraries are not so interesting, but one might still want to list them for some purposes. While there is a hidden page containing all the libraries (including a hidden CSV output), it is not integrated with the rest of the application. We have needed this in the past, but we are not sure how to integrate it meaningfully with the rest of the system without creating too much clutter.

Conclusion

We have implemented a tool useful for handling libraries with known vulnerabilities and for cooperation across teams. This tool is in active development. If you find it useful, you can try it. If you miss a feature, you can contribute your own code. If you are unsure whether your contribution is welcome, you can discuss it first.

Black Hat Europe 2015

Black Hat is a famous computer security conference held annually in the USA, Europe and Asia. Philippe Arteau, the original developer of FindSecurityBugs, was accepted as a presenter, and because of my significant software contribution he suggested that we present the tool together. Y Soft then supported me in taking this opportunity.

Black Hat Europe 2015 was held in November in Amsterdam and over 1500 people from 61 countries attended the event. The conference was opened with a keynote called What Got Us Here Won't Get Us There, mentioning the upcoming security apocalypse, complexity as a problem, security anti-patterns, taking wrong turns, developing bad habits, missing opportunities and re-examining old truths (using many slides). The Forum and three smaller rooms were then used for parallel briefings – usually one-hour presentations with slides and some demonstrations of the latest developments in information security. There was also a Business Hall with sponsors, free coffee and a Tool/Demo area for the sub-event called Arsenal. Here we and mostly independent researchers showed our “weapons” – every tool had a kiosk for two hours and people came by to see short demonstrations and to ask questions.

Despite presenting during lunch time, a fair number of Black Hat attendees came to discuss and see FindSecurityBugs in action. The most common questions were about what it can analyse (Java and other JVM languages like Scala, no source code needed), which security weaknesses it can detect (the list is here; a few people were happy that there are also detectors for Android) and whether it is free or not (yes, it is open source). People seemed to like the tool and someone was quite surprised it is not a commercial product 🙂

There were many interesting briefings during the two days. One of the most successful presentations (with applause in the middle of it) explained a reliable software-only attack to defeat full disk encryption (BitLocker) in Windows. If there was no pre-boot authentication and the attacked machine had joined a domain, it was possible to change the password at the login screen (and gain full access to the system) by setting up a mock domain controller with the user password expired. Microsoft released a patch during the conference. Other briefings included breaking into buildings, using continuous integration tools as an attack surface, fooling and tracking self-driving cars, and delivering exploits “with style” (encoded in an image/HTML polyglot). One presentation (about flaws in banking software) was cancelled (or probably postponed to another event) because the findings were too serious to disclose at that time, but there was an interesting talk about backend-as-a-service instead – many Android applications contain hard-coded credentials to access cloud services, and the researchers were able to extract 56 million sensitive user records. You can download many of the presentations (and some white papers) from the Black Hat website.

I also had time for a little sightseeing – there was a special night bus for attendees to the city centre, and I was able to see it again before my flight home too. Amsterdam has nice architecture criss-crossed by kilometres of canals and is a very interesting city in general. What surprised me the most were the bicycles everywhere – I had known people ride a lot here, but I didn't expect to see such a huge number of bikes riding around and parked in the centre. Riders don't wear helmets, sometimes carry a few children in a cart and don't seem to be very careful (so I had to be a bit more careful than I'm used to when crossing streets). Walking through the red-light district De Wallen, with adult theatres and half-naked ladies behind glass doors, is a remarkable experience too (don't try to take photos of them, they'll be angry). Sex shops and “coffee shops” (selling cannabis) are quite common, and not only in this area.

Another surprise came at the airport, where the inspection was very thorough and I was forced to take everything out of my bag (which I was not very happy about, because it had taken me a long time to pack it all into a single piece of cabin baggage). Just afterwards (when I connected to the airport Wi-Fi) I realized what had happened in Paris recently. The plane was also surprisingly small and, for the first time, I received special instructions for opening the emergency door (since my seat happened to be next to the right wing). Nevertheless, the whole trip was a nice experience for me and I was happy to be there, so maybe next time in London 🙂

In November 2015, the third year of the Czech Hackathon was held in Prague. The challenge was to create, in less than two days, a sports application using modern technology such as Oculus Rift, Google Cardboard, Apple Watch, Arduino and others. The use of technology in sport is becoming more popular, so Y Soft decided to set up a hacking team.

From Friday evening on, seven of us, including a skilled teammate from sli.do (Dominik Paľo), enjoyed the great atmosphere, lots of interesting people, a chill-out zone massage, perfect food and beer and then…

We hadn't done any special preparation; we had a project idea which we had clarified at work and for which we had divided the tasks. After work on Friday we met at the Impact Hub, a welcoming environment where cool t-shirts and friendly organizers were awaiting us. The evening began with a welcome speech and lectures on Arduino, Apple TV and Apple Watch. Professional athletes also spoke and explained what their training looks like, how technology helps them and what enhancements would be welcome. The brainstorming and team-assembly sessions started after 22:00. We had agreed on our moves in advance, so we assembled a line for beer and dinner instead. Hacking started straight after the team pitches.

At 3 AM we were still working on the first prototype and the application design. Our project was an application suite for fitness centres. Customers are identified by an iBeacon keyring, which is paired with their web profile. On each machine in the gym there is a tablet that detects whether an iBeacon keyring is nearby and greets the user with the question of whether they wish to work out or to see the statistics of their previous exercises. They can even directly choose the exercise program for the whole period spent in the gym. Whenever a problem occurs, the user may call a coach, who can see on his own tablet the position and status of all the machines in the gym.

[Image: Gym Web Portal]

The Saturday wake-up was smooth; motivation was our morning alarm. We hacked, chatted and drank beer the whole day. The chill-out massage planned before dinner was an awesome pleasure. The work was divided into the tablet application, the watch application, the web portal and the backend. We went home a bit earlier, a little after midnight. The next morning, the final hours of hacking, testing and fine-tuning awaited us. The final presentations and voting started at 13:00.

[Image: Mobile App Animation]

So a great atmosphere and litres of beer and then … we came 2nd! The Hackathon was great; we enjoyed it because we strengthened our friendships, learned to do only what is necessary under time constraints and spent the weekend in the pleasant company of hacking enthusiasts. It was our second Hackathon. We came 3rd at the first one and 2nd at this one, so you're welcome to join our Y Soft RnD team of enthusiastic hackers and win first place at the 2016 Hackathon.
[Photo: Hacking Team]
Hack on and run along now to the gym!
Y Soft Hacking Team

 

Wishing you good luck and great success in 2016.

Here is a small puzzle game for you:

PF 2016 Puzzle Game

http://ysofters.com/pf2016

Source code is available at GitHub.

You can play the game on tablet, phone or computer.

Based on: Enigma game, Kiwi.JS, Font Awesome.

A limitation for Windows 10 users with touch displays: the mouse does not work; you can only use touch. The bug has already been reported to the authors of the Kiwi.JS framework.

You can also enjoy the PF games from previous years: PF 2015, PF 2014, PF 2013, PF 2012, PF 2011, PF 2010.

Opting for a distributed solution of a problem is mostly motivated by two goals: resilience to failures (via redundancy) and better response time. In this post we will focus on the former. More specifically, we will investigate the conditions necessary to maintain a consistent distributed key-value store.

CAP

From the theoretical side, any possible solution is limited by the so-called CAP theorem. Informally, the theorem states that in an asynchronous network no database can remain

  • Consistent and
  • Available when
  • Partitioning of the network is possible.

A realistic environment for a distributed system must assume that server failures are possible and thus that there is no reasonable fixed bound on the time needed to exchange messages between two servers. And while some may find the formulation of the theorem ambiguous, its inevitable corollary is that a distributed database cannot guarantee consistency of replicas and availability of read/write operations at the same time.

Consensus

Leslie Lamport's Paxos protocol demonstrates the feasibility of maintaining consistency whenever a majority of servers is functioning. The above theorem thus implies that any distributed system using the protocol cannot always be available. There are, however, some highly available solutions apart from Paxos.

Etcd

Etcd is an always-consistent, highly available key-value data store. Consistency is maintained by majority consensus (quorum). The quorum is established and maintained by the Raft algorithm, i.e. Raft maintains a consistent replicated log among the servers, each of which executes the commands stored in the log on its key-value store. Like any other consensus protocol, Raft requires that among n servers at most f=⌊(n-1)⁄2⌋ are faulty; a 5-node cluster, for example, tolerates at most 2 faulty servers. If more than f servers become faulty, Raft times out all write operations, but read operations may still work if consistency is not required.

Maintaining consistency is expensive even if only the possibility of partitioning has to be accounted for. Rather than describing the algorithm itself, it might be more illustrative to show performance results of running etcd in a real application. The figure below depicts the transaction throughput in a 5-node cluster.

Etcd throughput in a 5-node cluster.

The script that measured the performance increased the load in three steps: first 2,500 transactions were sent every second (for 30 seconds), then 5,000 for another 30 seconds, and then 10,000. The data in the plot report how many transactions successfully finished each second.

Leader Election

Partitions of the cluster are handled by coordinated replication, i.e. a leader is elected who is responsible for all communication (both within the cluster and with the outside world). If a partition separates the leader from most of the nodes, a new leader is elected. The leader election starts whenever the communication between the leader and a follower is lost for more than a randomly chosen time period called the election timeout. After the timeout, the follower begins the election by broadcasting a request for votes. Everyone votes for the server that sent the request first, and the new leader is the server with a majority of votes. Together with the transaction throughput, the performance of leader election is one of the most important metrics to measure for a consensus protocol.

Etcd leader election time.

The graphs above show the time to detect and replace a crashed leader. Each line represents 1,000 trials (except for 100 trials for “150–150 ms”) and corresponds to a particular choice of election timeouts; for example, “150–155 ms” means that election timeouts were chosen randomly and uniformly between 150 ms and 155 ms. The steps that appear on the graphs show when split votes occur (the cluster must wait for another election timeout before a leader can be elected).

Etcd leader election -- number of packets.

Finally, above is the distribution of the number of packets needed to elect a leader. The figure  shows the number of packets sent to elect a leader in the 150–200 ms case. The minimum number of packets is six, representing a single node dispatching requests to the other four nodes and receiving votes from the first two. The plot clusters around multiples of eight: traces under eight represent the first node timing out and winning the election, traces between 8 and 16 represent two nodes timing out before a leader is elected.
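To make the election-timeout mechanism concrete, here is a toy sketch of a follower deciding to become a candidate. The 150–300 ms range mirrors the timeouts used in the experiments above; it is not an etcd default.

import java.util.concurrent.ThreadLocalRandom;

public class ElectionTimeoutSketch {
    public static void main(String[] args) throws InterruptedException {
        // Each follower picks its timeout at random, which is what keeps
        // split votes (several followers timing out together) rare.
        long timeoutMs = ThreadLocalRandom.current().nextLong(150, 300);
        long lastHeartbeat = System.currentTimeMillis();

        while (true) {
            Thread.sleep(10);
            // A real follower would reset lastHeartbeat whenever the leader's
            // heartbeat arrives; in this sketch it never does, so the timeout fires.
            if (System.currentTimeMillis() - lastHeartbeat > timeoutMs) {
                System.out.println("No heartbeat for " + timeoutMs
                        + " ms: become candidate and broadcast a request for votes");
                break;
            }
        }
    }
}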

Problem Definition

Let us assume that our servers need to share certain configuration. The servers can be restarted, thus becoming temporarily unavailable, and the network can have arbitrarily long delays. The stored data should be available whenever at least one server is alive. Some data must be consistent, but for other data a possibly stale version suffices.

Case 1: Consistency of Logs

Some servers store log data as a single continuous string. It is required that the logging information be extracted from this string, starting from the last extraction point up until the last log. Each log entry must be accepted exactly once. We can thus generalise the problem to implementing the following function:


string getLastLog( Server s, string rawLogs )

  • the string rawLogs is a sequence of log records, i.e. <(p1, t1, ..), .., (pn, tn, ..)>
  • the function returns a suffix lastLogs of rawLogs such that lastLogs = <(pi, ti, ..), .., (pn, tn, ..)>, where ti-1 < ts, ti >= ts, and ts is the time-stamp of the last retrieval
  • the function must be thread-safe in accepting each log entry at most once, regardless of the number of concurrent calls
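Below is a minimal single-process sketch of this contract. It assumes, purely for illustration, that records are newline-separated, start with a millisecond timestamp and are ordered chronologically; in the distributed setting the extraction point would live in the shared data store and be advanced with an atomic swap instead of a local compare-and-set.

import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

public class LogExtractor {

    // Timestamp of the last retrieval (ts in the specification above).
    private final AtomicLong extractionPoint = new AtomicLong(Long.MIN_VALUE);

    public String getLastLogs(String rawLogs) {
        String[] records = rawLogs.isEmpty() ? new String[0] : rawLogs.split("\n");

        while (true) {
            long since = extractionPoint.get();

            // Locate the suffix of records with timestamp >= since.
            int firstNew = records.length;
            long newest = since;
            for (int i = 0; i < records.length; i++) {
                long ts = Long.parseLong(records[i].split(" ", 2)[0]);
                if (ts >= since && firstNew == records.length) {
                    firstNew = i;
                }
                newest = Math.max(newest, ts);
            }
            if (firstNew == records.length) {
                return "";   // nothing new since the last retrieval
            }

            // Claim the suffix atomically: only one caller can move the extraction
            // point past these records, so each record is accepted at most once
            // even under concurrent calls.
            if (extractionPoint.compareAndSet(since, newest + 1)) {
                return String.join("\n",
                        Arrays.copyOfRange(records, firstNew, records.length));
            }
            // Another caller claimed records first; retry with the new extraction point.
        }
    }
}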

Case 2: Quorum-less access

It is required that the configuration data that need not be consistent can be retrieved and stored even when the quorum cannot be established.

  • The get operation returns a value that was either last written by another node (while under quorum), or by this node (possibly without quorum).
  • The set operation remembers the new value and stores it when the quorum is re-established.

Candidate: Etcd

Etcd has limits in solving both of the above cases:

  1. The CAS operation is implemented as a set operation that specifies the expected previous value and returns either a Boolean value, an exception, or an error. Yet when an exception or error is returned, we have no guarantees regarding the value in the data store; e.g. the request may have timed out, but was afterwards accepted by Raft and subsequently reflected in the data store (see the sketch after this list).
  2. Etcd does allow get to operate without quorum but only if the leader is working (any or indeed all other servers may fail). The set operation is not allowed without quorum.
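For reference, the CAS in question is etcd's conditional PUT (the v2 HTTP API of that time). The sketch below assumes an etcd instance on 127.0.0.1:2379; note that anything other than a clear 2xx/412 answer (a timeout in particular) leaves the caller with exactly the ambiguity described in point 1.

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EtcdCasSketch {

    // Compare-and-swap: set key to newValue only if its current value is oldValue.
    public static boolean compareAndSwap(String key, String oldValue, String newValue)
            throws IOException {
        URL url = new URL("http://127.0.0.1:2379/v2/keys/" + key
                + "?prevValue=" + URLEncoder.encode(oldValue, "UTF-8"));

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

        byte[] body = ("value=" + URLEncoder.encode(newValue, "UTF-8"))
                .getBytes(StandardCharsets.UTF_8);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }

        // 2xx means the swap was applied; 412 (Precondition Failed) means the
        // expected previous value did not match. A timeout tells us nothing.
        int status = conn.getResponseCode();
        return status >= 200 && status < 300;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(compareAndSwap("config/mode", "old", "new"));
    }
}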

Requirements for Sufficient Solution

Three interface functions are required, and while the concrete implementation can vary, we specify them exactly to avoid confusion (a Java rendering of the interface follows the specification).

type Response = { bool valid, bool timeout, string value }

Response get( string key, bool consistent, bool blocking )

  • consistent=true the returned value was stored on a majority of servers at the time when this request was accepted by the consensus protocol
  • consistent=false the returned value is the most recent value stored on the server that first received this request (other servers may have different value)
  • blocking=true the function returns only after the desired data is retrieved
  • blocking=false the function may timeout

void set( string key, string value, bool consistent, bool blocking )

  • consistent=true the new value is accepted by a majority of servers and, until another set for the same key is requested, the value will be returned by any consistent get
  • consistent=false the new value is accepted by at least the server that first received this request; until another set is requested at this server, this value will be returned; furthermore, once quorum is re-established, the guarantees of a consistent set apply

Response swap( string key, string new_value, string old_value, bool blocking )

  • consistent by default
  • blocking=true the function returns only after the quorum is established; it returns true if the old_value was stored under key on a majority at the time this request was accepted, AND the new_value will be returned by any consistent get until the value is rewritten; most importantly, no other value can be returned by get between old_value and new_value. If old_value was not stored under key, the function does not modify the data store and returns false.
  • blocking=false same guarantees as above but the function may timeout without modifying the data store.
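Rendered as a Java interface, the same contract looks as follows. This is only a sketch; the names mirror the pseudo-declarations above and add no extra semantics.

public interface DistributedStore {

    final class Response {
        public final boolean valid;     // false if the request could not be served
        public final boolean timeout;   // true if a non-blocking call timed out
        public final String value;

        public Response(boolean valid, boolean timeout, String value) {
            this.valid = valid;
            this.timeout = timeout;
            this.value = value;
        }
    }

    Response get(String key, boolean consistent, boolean blocking);

    void set(String key, String value, boolean consistent, boolean blocking);

    // Consistent by default: newValue is stored only if oldValue is currently under key.
    Response swap(String key, String newValue, String oldValue, boolean blocking);
}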

Any distributed data-store conforming to the above requirements would be sufficient to solve the above problem. Yet the cost of consistency is high and only a few data-stores offer adequate protection from partitioning. Etcd seems the closest candidate but its functionality needs to be extended.

While caring about security of our code is arguably important, it is not enough for building a secure product. Vulnerabilities might also arise from a 3rd party component. Handling of those vulnerable libraries is thus another essential aspect of building a secure product.

Database of vulnerabilities

A simple way of managing a large number of 3rd party libraries is to use a vulnerability database. There is the well-known National Vulnerability Database (NVD) that contains information about many vulnerabilities. This is also what OWASP Dependency Check uses. Let me introduce it.

NVD aims to contain all known vulnerabilities of publicly available software. An entry about a vulnerability has a CVE number. (CVE means “Common Vulnerabilities and Exposures”, so CVEs are not limited to vulnerabilities, but that is not so important for now.) You can look at some example of a CVE. Various details might be available, but the level of detail depends on multiple factors. For example, a not-yet-publicly-disclosed vulnerability might have a CVE number, but its description will obviously not be very verbose.

CVE numbers are usually assigned to CPEs (“Common Platform Enumeration”). A CPE is an identifier of vulnerable software. For example, the mentioned CVE-2005-1234 contains information that it affects cpe:/a:phpbb_group:phpbb-auction:1.0m and cpe:/a:phpbb_group:phpbb-auction:1.2m.

How does OWASP Dependency Check work?

In a nutshell, it scans a project (JARs, POM files, ZIP files, native libraries, .NET assemblies, …) and tries to assign a CPE to each entity. Note that a lot of heuristics is involved. Obviously, there is also a lot that can go wrong.

When CPEs are assigned, it looks for vulnerabilities (CVEs) assigned to those CPEs. The scan result is written to a report file. Multiple formats are supported; the two most prominent are HTML (for direct reading) and XML (for further processing). We use the XML output in order to process multiple reports from multiple projects and assign them to particular teams. (One team is usually responsible for multiple projects.)
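Our processing of the XML output essentially walks the report and collects the vulnerabilities per dependency. A bare-bones sketch follows; the element names (dependency, fileName, vulnerability, name, cvssScore) are assumptions based on the report schema as we use it and should be verified against the XSD shipped with your ODC version.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OdcReportReader {

    public static void main(String[] args) throws Exception {
        Document report = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("dependency-check-report.xml"));

        NodeList dependencies = report.getElementsByTagName("dependency");
        for (int i = 0; i < dependencies.getLength(); i++) {
            Element dependency = (Element) dependencies.item(i);
            String fileName = text(dependency, "fileName");

            NodeList vulnerabilities = dependency.getElementsByTagName("vulnerability");
            for (int j = 0; j < vulnerabilities.getLength(); j++) {
                Element vulnerability = (Element) vulnerabilities.item(j);
                System.out.printf("%s: %s (CVSS %s)%n",
                        fileName, text(vulnerability, "name"), text(vulnerability, "cvssScore"));
            }
        }
    }

    // First matching child element's text, or an empty string.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}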

Integration with projects

Maven

OWASP Dependency Check has a mature Maven plugin. It basically works out of the box and you can adjust probably any parameter supported by OWASP Dependency Check. We didn't have to modify the project, as it can be run by mvn org.owasp:dependency-check-maven:check and the configuration can be adjusted by passing -Dproperty=value parameters.

Gradle

There is also a plugin for Gradle. Unfortunately, it is not as mature as the Maven plugin. We had to modify it in order to make it work in our environment. I'd like to merge those changes with the original project. Even after the modifications, it is still not ideal. The most prominent issue is that it includes test dependencies. Excluding them in the Gradle plugin is not as simple as with the Maven plugin, because Gradle can have many configurations and each of those configurations might have different dependencies. I see no simple way to distinguish the important configurations from the others. However, this was not a really painful issue, as these dependencies are considerably less common and usually don't have any known vulnerability in NVD, since they usually aren't touched by untrusted input.

How we scan our projects?

Running those scans manually on multiple sub projects and evaluating all the reports manually would take too much time. So, we have developed a tool for automating some of the manual work.

First, scans are run automatically every day. There is no need to run them manually on every single subproject. Results of these scans are stored in XML format. Some adaptations specific to our environment are applied. For example, the scans are able to run without any connection to the Internet. (The Maven plugin can be configured for this without any modification, while the Gradle plugin had to be modified.) Of course, we have to download the whole vulnerability database separately, outside of the scan process.

Second, our tool downloads the results from our Bamboo build server and processes them. There are some sanity checks that should warn us if something breaks. For example, we check freshness of the vulnerability database.

Third, issues related to particular subprojects are automatically assigned to corresponding teams. Issues can be prioritized by severity and number of occurrences.

While ODC is a great tool, there are some issues connected to using it. I'll discuss some of them here.

Complexity of library identifiers

How many ways of identifying a library do you know? Maybe you will suggest using groupId and artifactId and version (i.e. GAV) by which you can locate the library on a Maven repository. (Or you can pick some similar identifier used on another platform than Java.) This is where we usually start. However, there are multiple other identifiers:

  • GAV identifier: as mentioned above.
  • file hash: ODC uses the SHA1 hash for lookups in the Maven database. Note that there might be multiple GAV identifiers for one SHA1 hash, so the relation is not 1:1. Moreover, when we consider snapshots, we theoretically get an m:n relation.
  • CPE identifier: This one is very important for ODC, as ODC can’t match vulnerabilities without that. Unfortunately, there is no exact algorithm for computation of CPE from GAV or vice versa. Moreover, multiple GAVs might be assigned to one CPE. For example, Apache Tomcat consists of many libraries, but all of them have just one CPE per version. Unfortunately, the ODC heuristic matching algorithm might also assign multiple CPEs to one GAV in some cases.
  • GA identifier: This is just some simplification of GAV identifier, which misses the version number. There is nothing much special about this identifier for ODC, but we have to work with that, too.
  • intuitive sense: For example, when you mention “Hibernate”, you probably mean multiple GAs at the same time.

Note that this complexity is not introduced by ODC. The complexity is introduced by the ecosystem that ODC uses (e.g. CPEs) and by the ecosystem ODC supports (e.g. Maven artifacts).

Bundled libraries

While überJARs might be useful in some cases, they make inspection of all the transitive dependencies harder. While ODC can detect some bundled libraries (e.g. by package names or by POM manifests if included), this is somewhat suboptimal. Moreover, if ODC finds a bundled dependency, it might look like a false positive at first sight.

Libraries without a CPE

Some libraries don't have a CPE. So, when ODC can't assign any CPE, it might mean either that there is no CPE for the library (and hopefully no publicly known vulnerability) or that ODC has failed to assign the CPE. There is no easy way to know which is the case.

False positives

False positives are implied by the heuristics used mainly for detecting the CPE identifier. We can't get rid of all of them until a better mechanism (e.g. CPE identifiers in POMs) is developed and widely used. Especially the latter would be a rather long run.

False positives can be suppressed in ODC in two ways. One can suppress assignment of a specific CPE to a specific library (identified by SHA1 hash). If a more fine-grained control is needed, one can suppress assignment of a single vulnerability to a specific library (identified by SHA1 hash). Both of them make sense when handling false positives.

Handling a CPE collision

For example, there is the NLog library for logging in .NET. ODC assigns it cpe:/a:nlog:nlog. This might look correct, but this CPE has been used for some Perl project, and there is a vulnerability for this CPE that is not related to the logging library.

If one suppressed matching of the CPE, one could get some false negatives in the future, as vulnerabilities of the NLog library might be published under the same CPE, according to a discussion post by Jeremy Long (the author of the tool).

In this case, we should not suppress those CPEs. We should just suppress CVEs for them.

Obviously mismatched CPEs

A CPE might also be obviously mismatched. While this case is similar to the previous one, the CPE name makes it obvious that it does not cover the library. In this case, we can suppress the CPE for the SHA1 hash.

Missing context

A vulnerability description usually contains a score. However, even if the score is 10 out of 10, it does not imply that the whole system is vulnerable. In general, there are multiple other aspects that might change the severity for a particular case:

  • The library might be isolated from untrusted input.
  • The untrusted input might be somehow limited.
  • The library might run in a sandboxed environment.
  • Preconditions of the vulnerability are unrealistic for the context.

None of these can be detected by ODC, and automatic evaluation of those aspects is very hard, if not impossible.

Conclusion

We have integrated OWASP Dependency Check with our environment. We have evaluated the quality of reports. The most fragile part is matching a library to a CPE identifier, which may lead to both false negatives (rarely seen) and false positives (sometimes seen). Despite those issues, we have found ODC to be useful enough and adaptable to our environment.

Y Soft Applied Research and University Relations (ARUR) is a program under which we ysofters:

  • Supervise and consult students working on their theses
  • Support student research positions at research laboratories
  • Offer student internships
  • Give lectures at universities
  • Organize and support student events

This post provides statistics (collected for the last six years) about some of the Y Soft ARUR activities mentioned above.

Theses supervision

During the last six years, 46 bachelor and master theses realized in collaboration with Y Soft were successfully defended. See Figure 1 for the number of theses defended in each year. In 2015, for example, 6 bachelor and 8 master theses were successfully defended. Most of the theses came from Y Soft R&D in collaboration with the Faculty of Informatics, Masaryk University.


Figure 1. Number of defended theses

More than 75% of all the theses defended during the last six years received an “A” or “B” grade. See Figure 2 for grade statistics for each year. In 2015, for example, 14 theses were successfully defended, receiving eight “A” and six “B” grades (i.e., 100% of the theses defended in 2015 received an “A” or “B” grade).


Figure 2. Grades of defended theses

Since 2012, five of the students received awards for their theses (see Figure 3).


Figure 3. Number of awards

Part-time positions at Y Soft

Fourteen students were given part-time jobs within the scope of ARUR during the last six years. Some of these part-timers have been working at Y Soft on their bachelor or master thesis for more than a year. There were ten students from the Faculty of Informatics (Masaryk University), two from the Faculty of Electrical Engineering and Communication (Brno University of Technology), and two from the Faculty of Information Technology (Brno University of Technology).

Student positions at research laboratories

Every year, in cooperation with the Faculty of Informatics, Masaryk University, we participate in organizing a competition for talented students who are in the first or second year of their bachelor studies. During the last six years, we have supported nine students who were winners of the competition. Some students were funded for more than one year. In 2015, six student positions at research laboratories were funded.

Thanks to all who are already participating in our ARUR program, and we are looking forward to new participants!

We are more than happy to discuss any suggestions and ideas for cooperation you might have.

Contact us: uni-rel@ysoft.com