While caring about security of our code is arguably important, it is not enough for building a secure product. Vulnerabilities might also arise from a 3rd party component. Handling of those vulnerable libraries is thus another essential aspect of building a secure product.
Database of vulnerabilities
A simple way for managing a large number of 3rd party libraries might be using a vulnerability database. There is well-known National Vulnerability Database that contains information about many vulnerabilities. This is also what OWASP Dependency Check uses. Let me introduce it.
NVD aims to contain all known vulnerabilities of publicly available software. An entry about a vulnerability has some CVE number. (CVE means “Common Vulnerabilities and Exposures”, so CVEs are not limited to vulnerabilities, but it is not so important for now.) You can look at some example of CVE. Various details might be available, but the level of details may depend on multiple factors. For example, a not-yet-publicly-disclosed vulnerability might have a CVE number, but its description will be obviously not very verbose.
CVE numbes are usually assigned to CPEs (“Common Platform Enumeration”). A CPE is an identifier of vulnerable software. For example, the mentioned CVE-2005-1234 contains information that it affects
How OWASP Dependency Check works?
In a nutshell, it scans a project (JARs, POM files, ZIP files, native libraries, .NET assemblies, …) and tries to assign a CPE to each entity. Note that there is much of heuristics involved. Obviously, there is also much what can go wrong.
When CPEs are assigned, it looks for vulnerabilities (CVEs) assigned to those CPEs. A scan is written to a result file. Multiple formats are supported. The two most prominent formats are HTML (for direct reading) and XML (for further processing). We use the XML output in order to process multiple reports of multiple projects and assign them to particular teams. (One team is usually responsible for multiple projects.)
Integration with projects
OWASP Dependency Check has a mature Maven plugin. It basically works out-of-box and you can adjust probably any parameter supported by OWASP Dependency Check. We didn’t have to modify the project, as it can be run by
mvn org.owasp:dependency-check-maven:check and the configuration can be adjusted by passing
There is also some plugin for Gradle. Unfortunately, it is not as mature as the Maven plugin. We had to modify it in order to make it working in our environment. I’d like to merge those changes with the original project. Even after the modifications, it is still not ideal. The most prominent issue is that it includes test dependencies. Excluding them in the Groovy plugin is not as simple as with Maven plugin, because Gradle can have many configurations and each of those configurations might have different dependencies. I have no simple clue how to distinguish important configurations from others. However, this was not a really painful issue, as these dependencies are considerably less common and usually don’t have any known vulnerability in NVD, as they usually aren’t touched by untrusted input.
How we scan our projects?
Running those scans manually on multiple sub projects and evaluating all the reports manually would take too much time. So, we have developed a tool for automating some of the manual work.
First, scans are run automatically every day. There is no need to run it manually on every single subproject. Results of these scans are stored in XML format. Some adaptations specific to our environment are applied. For example, scans are able to run without any connection to the Internet. (Maven plugin can be configured without any modification, while Gradle plugin required to be modified for that.) Of course, we have to download all the vulnerability database separately, outside of the scan process.
Second, our tool downloads the results from our Bamboo build server and processes them. There are some sanity checks that should warn us if something breaks. For example, we check freshness of the vulnerability database.
Third, issues related to particular subprojects are automatically assigned to corresponding teams. Issues can be prioritized by severity and number of occurrences.
While ODC is a great tool, there are some issues connected to using it. I’ll discuss some of them there.
Complexity of library identifiers
How many ways of identifying a library do you know? Maybe you will suggest using groupId and artifactId and version (i.e. GAV) by which you can locate the library on a Maven repository. (Or you can pick some similar identifier used on another platform than Java.) This is where we usually start. However, there are multiple other identifiers:
- GAV identifier: as mentioned above.
- file hash: ODC uses SHA1 hash for lookups in Maven database. Note that there might be multiple GAV identifiers for one SHA1 hash, so there is 1:1 relation. Moreover, when we consider snapshot, we theoretically get m:n relation.
- CPE identifier: This one is very important for ODC, as ODC can’t match vulnerabilities without that. Unfortunately, there is no exact algorithm for computation of CPE from GAV or vice versa. Moreover, multiple GAVs might be assigned to one CPE. For example, Apache Tomcat consists of many libraries, but all of them have just one CPE per version. Unfortunately, the ODC heuristic matching algorithm might also assign multiple CPEs to one GAV in some cases.
- GA identifier: This is just some simplification of GAV identifier, which misses the version number. There is nothing much special about this identifier for ODC, but we have to work with that, too.
- intuitive sense: For example, when you mention “Hibernate”, you probably mean multiple GAs at the same time.
Note that this complexity is not introduced by ODC. The complexity is introduced by the ecosystem that ODC uses (e.g. CPEs) and by the ecosystem ODC supports (e.g. Maven artifacts).
While überJARs might be useful in some cases, they are making inspection of all the transitive dependencies harder. While ODC can detect some bundled libraries (e.g. by package names or by POM manifests if included), this is somehow suboptimal. Moreover, if ODC finds a bundled dependency, it might look like a false positive at first sight.
Libraries without a CPE
Some libraries don’t have a CPE. So, when ODC can’t assign any CPE, it might mean either there is no CPE for the library (and hopefully no publicly known vulnerability) or ODC has failed to assign the CPE. There is no easy way to know that is the case.
False positives are implied by the heuristics used mainly for detecting CPE identifier. We ca’t get rid of all of them, until a better mechanism (e.g. CPE identifiers in POMs) is developed and widely used. Especially the latter would be rather a long run.
False positives can be suppressed in ODC in two ways. One can suppress assignment of a specific CPE to a specific library (identified by SHA1 hash). If a more fine-grained control is needed, one can suppress assignment of a single vulnerability to a specific library (identified by SHA1 hash). Both of them make sense when handling false positives.
Handling a CPE collision
For example, there is NLog library for logging in .NET. ODC assigns it
cpe:/a:nlog:nlog. This might look correct, but this CPE has been used for some Perl project and there is some vulnerability for this CPE that is not related to the logging library.
If one suppressed matching the CPE, one could get some false negatives in future, as vulnerabilities of the NLog library might be published under the same CPE, according to a discussion post by Jeremy Long (the author of the tool).
In this case, we should not suppress those CPEs. We should just suppress CVEs for them.
Obviously mismatched CPEs
CPE might be also mismatched obviously. While the case is similar to the previous one, the CPE name makes it obvious that it does not cover the library. In this case, we can suppress the CPE for the SHA1 hash.
Vulnerability description usually contains a score. However, even if the score is 10 out of 10, it does not imply that the whole system is vulnerable. In general, there are multiple other aspects that might change the severity for a particular case:
- The library might be isolated from untrusted input.
- The untrusted input might be somehow limited.
- The library might run in sandboxed environment.
- Preconditions of the vulnerability are unrealistic for the context.
None of them can be detected by ODC and automatic evaluation of those aspects is very hard, if it is not impossible.
We have integrated OWASP Dependency Check with our environment. We have evaluated the quality of reports. The most fragile part is matching a library to a CPE identifier, which may lead to both false negatives (rarely seen) and false positives (sometimes seen). Despite those issues, we have found ODC to be useful enough and adaptable to our environment.