Opting for a distributed solution to a problem is usually motivated by two goals: resilience to failures (via redundancy) and better response time. In this post we will focus on the former. More specifically, we will investigate the conditions necessary to maintain a consistent distributed key-value store.

CAP

From the theoretical side, any possible solution is limited by the so-called CAP theorem. Informally, the theorem states that in an asynchronous network no database can remain

  • Consistent and
  • Available when
  • Partitioning of the network is possible.

A realistic model of a distributed system must assume that server failures are possible and thus that there is no reasonable fixed bound on the time needed to exchange messages between two servers. And while some may find the formulation of the theorem ambiguous, its inevitable corollary is that a distributed database cannot guarantee consistency of replicas and availability of read/write operations at the same time.

Consensus

Leslie Lamport’s Paxos protocol demonstrates the feasibility of maintaining consistency whenever a majority of servers is functioning. The above theorem thus states that any distributed system using the protocol cannot always be available. But there are some highly available solutions apart from Paxos.

Etcd

Etcd is a consistent, highly available key-value data store. The consistency is maintained by a majority consensus (quorum). The quorum is established and maintained by the Raft algorithm, i.e. Raft maintains a consistent replicated log among the servers, each of which executes the commands stored in the log on its key-value store. Like any other consensus protocol, Raft requires that among n servers at most f = ⌊(n−1)/2⌋ are faulty; for example, a 5-node cluster tolerates at most 2 faulty servers, since the remaining 3 still form a majority. If more than f servers become faulty, Raft times out all write operations, but read operations may still work if consistency is not required.

Maintaining consistency is expensive even when partitioning merely has to be accounted for as a possibility. Rather than describing the algorithm itself, it might be more illustrative to show performance results of running etcd in a real application. The figure below depicts the transaction throughput in a 5-node cluster.

Etcd throughput in a 5-node cluster.

The script that measured the performance increased the load in three steps: first 2,500 transactions were sent every second (for 30 seconds), then 5,000 per second for another 30 seconds, and finally 10,000 per second. The data in the plot then report how many transactions successfully finished each second.

Leader Election

Partitions of the cluster are handled by coordinated replication, i.e. a leader is elected who is responsible for all communication (both within the cluster and with the outside world). If a partition separates the leader from the majority of the nodes, a new leader is elected. The leader election starts whenever the communication between the leader and a follower is lost for more than a randomly chosen time period called the election timeout. After the timeout, the follower begins the election by broadcasting a request for votes. Every server votes for the server whose request arrived first, and the new leader is the server with a majority of votes. Together with the transaction throughput, the performance of leader election is one of the most important metrics to measure for a consensus protocol.

Etcd leader election time.

The graphs above show the time to detect and replace a crashed leader. Each line represents 1,000 trials (except for 100 trials for “150–150 ms”) and corresponds to a particular choice of election timeouts; for example, “150–155 ms” means that election timeouts were chosen randomly and uniformly between 150 ms and 155 ms. The steps that appear on the graphs show when split votes occur (the cluster must wait for another election timeout before a leader can be elected).
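To make the timeout mechanism concrete, below is a toy, single-node illustration (in Java) of how a follower might decide to start an election. It is only a sketch of the idea described above, not etcd's or Raft's actual implementation, and the class and method names are made up for this example.

import java.util.concurrent.ThreadLocalRandom;

// Toy illustration of the randomized election timeout; real nodes exchange
// heartbeats and vote requests over the network.
public class FollowerTimer {
    private volatile long lastHeartbeatMs = System.currentTimeMillis();

    // Called whenever a heartbeat from the leader arrives.
    public void onHeartbeat() {
        lastHeartbeatMs = System.currentTimeMillis();
    }

    // True if the follower should become a candidate and broadcast a request for votes.
    public boolean shouldStartElection(long minTimeoutMs, long maxTimeoutMs) {
        // Randomizing the timeout (e.g. between 150 and 155 ms) makes it unlikely that
        // two followers time out at the same moment, which reduces split votes.
        long timeout = ThreadLocalRandom.current().nextLong(minTimeoutMs, maxTimeoutMs + 1);
        return System.currentTimeMillis() - lastHeartbeatMs > timeout;
    }
}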

Etcd leader election -- number of packets.

Finally, the figure above shows the distribution of the number of packets needed to elect a leader in the 150–200 ms case. The minimum number of packets is six, representing a single node dispatching requests to the other four nodes and receiving votes from the first two. The plot clusters around multiples of eight: traces under eight packets represent the first node timing out and winning the election, while traces between 8 and 16 represent two nodes timing out before a leader is elected.

Problem Definition

Let us assume that our servers need to share a certain configuration. The servers can be restarted, thus becoming temporarily unavailable, and the network can have arbitrarily long delays. The stored data should be available whenever at least one server is alive. Some data must be consistent, while for other data a possibly stale version suffices.

Case 1: Consistency of Logs

Some servers store log data as a single continuous string. The logging information must be extracted from this string, starting from the last extraction point and continuing up to the last log entry. Each log entry must be accepted exactly once. We can thus generalise the problem to implementing the following function (a sketch of one possible implementation follows the list below):


string getLastLog( Server s, string rawLogs )

  • the string rawLogs is a sequence of log entries, i.e. <(p1, t1, ..), .. , (pn, tn, ..)>
  • the function returns a suffix lastLogs of rawLogs such that lastLogs = <(pi, ti, ..), .. , (pn, tn, ..)>, where ti−1 < ts ≤ ti, and ts is the time-stamp of the last retrieval
  • the function must be thread-safe, accepting each log entry at most once regardless of the number of concurrent calls of the function
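As an illustration, the following Java sketch builds getLastLog on top of an atomic compare-and-swap over the stored timestamp of the last extraction. The KvStore interface and the helper methods are hypothetical stand-ins (the real store and the log parsing are not shown); the point is that whichever caller wins the swap is the only one allowed to hand out the extracted suffix, which gives the exactly-once guarantee.

public class LogExtractor {

    // Hypothetical key-value store interface offering an atomic compare-and-swap.
    interface KvStore {
        String get(String key);
        // Atomically replaces oldValue with newValue; returns false if the stored value no longer equals oldValue.
        boolean swap(String key, String newValue, String oldValue);
    }

    private final KvStore store;

    public LogExtractor(KvStore store) {
        this.store = store;
    }

    // Returns the suffix of rawLogs containing only the entries newer than the last extraction.
    public String getLastLogs(String serverId, String rawLogs) {
        String key = "lastExtraction/" + serverId;
        while (true) {
            String lastTs = store.get(key);                  // time-stamp of the previous extraction
            String newestTs = timestampOfLastEntry(rawLogs); // time-stamp of the newest entry
            // Only the caller that wins this compare-and-swap may hand the suffix out,
            // so each log entry is accepted at most once even under concurrent calls.
            if (store.swap(key, newestTs, lastTs)) {
                return suffixNewerThan(rawLogs, lastTs);
            }
            // Another caller extracted in the meantime; retry with the fresh time-stamp.
        }
    }

    private String timestampOfLastEntry(String rawLogs) {
        return "";  // placeholder: parse t_n from the last entry of rawLogs
    }

    private String suffixNewerThan(String rawLogs, String ts) {
        return "";  // placeholder: return the entries with time-stamp >= ts
    }
}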

Case 2: Quorum-less access

It is required that the configuration data that need not be consistent can be retrieved and stored even when the quorum cannot be established.

  • The get operation returns a value that was either last written by another node (while under quorum), or by this node (possibly without quorum).
  • The set operation remembers the new value and stores it once the quorum is re-established.

Candidate: Etcd

Etcd has limitations with respect to both of the above cases:

  1. The CAS operation is implemented as a set operation that specifies the expected previous value and returns either a Boolean value, an exception, or an error. Yet when an exception or error is returned, we have no guarantee about the value in the data store: the request may have timed out, yet afterwards been accepted by Raft and subsequently reflected in the data store.
  2. Etcd does allow get to operate without quorum but only if the leader is working (any or indeed all other servers may fail). The set operation is not allowed without quorum.

Requirements for Sufficient Solution

Three interface functions are required. While the concrete implementation can vary, we specify them exactly to avoid confusion; one possible rendering of the interface as Java code follows the description below.

type Response = { bool valid, bool timeout, string value }

Response get( string key, bool consistent, bool blocking )

  • consistent=true the returned value was stored on a majority of servers at the time when this request was accepted by the consensus protocol
  • consistent=false the returned value is the most recent value stored on the server that first received this request (other servers may have different value)
  • blocking=true the function returns only after the desired data is retrieved
  • blocking=false the function may timeout

void set( string key, string value, bool consistent, bool blocking )

  • consistent=true the new value is accepted by a majority of servers and until another set for the same key is requested, the value will be returned by any consistent get
  • consistent=false the new value is accepted by at least the server that first received this request; until another set for the same key is requested at this server, this value will be returned; furthermore, once the quorum is re-established, the guarantees of a consistent set apply

Response swap( string key, string new_value, string old_value, bool blocking )

  • consistent by default
  • blocking=true the function returns only after the quorum is established; it returns true if old_value was stored under key at the time this request was accepted by a majority, AND new_value will be returned by any consistent get until the value is overwritten; most importantly, no value other than old_value and new_value can be returned by get in between. If old_value was not stored under key, the function does not modify the data store and returns false.
  • blocking=false same guarantees as above but the function may timeout without modifying the data store.
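For reference, the interface above could be rendered in Java as follows; this is only a literal transcription of the specification (the names mirror the text), not an existing library API.

public interface DistributedStore {

    final class Response {
        public final boolean valid;    // true if the operation succeeded
        public final boolean timeout;  // true if a non-blocking call gave up
        public final String value;     // the retrieved value, or null

        public Response(boolean valid, boolean timeout, String value) {
            this.valid = valid;
            this.timeout = timeout;
            this.value = value;
        }
    }

    Response get(String key, boolean consistent, boolean blocking);

    void set(String key, String value, boolean consistent, boolean blocking);

    // Consistent by default: atomically replaces oldValue with newValue under key.
    Response swap(String key, String newValue, String oldValue, boolean blocking);
}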

Any distributed data-store conforming to the above requirements would be sufficient to solve the above problem. Yet the cost of consistency is high and only a few data-stores offer adequate protection from partitioning. Etcd seems the closest candidate but its functionality needs to be extended.

While caring about security of our code is arguably important, it is not enough for building a secure product. Vulnerabilities might also arise from a 3rd party component. Handling of those vulnerable libraries is thus another essential aspect of building a secure product.

Database of vulnerabilities

A simple way of managing a large number of 3rd party libraries is to use a vulnerability database. The well-known National Vulnerability Database (NVD) contains information about many vulnerabilities, and it is also what OWASP Dependency Check uses. Let me introduce it.

NVD aims to contain all known vulnerabilities of publicly available software. An entry about a vulnerability has a CVE number. (CVE means “Common Vulnerabilities and Exposures”, so CVEs are not limited to vulnerabilities, but that is not so important for now.) You can look at some example of a CVE. Various details might be available, but the level of detail may depend on multiple factors. For example, a not-yet-publicly-disclosed vulnerability might have a CVE number, but its description will obviously not be very verbose.

CVE numbers are usually assigned to CPEs (“Common Platform Enumeration”). A CPE is an identifier of vulnerable software. For example, the mentioned CVE-2005-1234 contains information that it affects cpe:/a:phpbb_group:phpbb-auction:1.0m and cpe:/a:phpbb_group:phpbb-auction:1.2m.

How does OWASP Dependency Check work?

In a nutshell, it scans a project (JARs, POM files, ZIP files, native libraries, .NET assemblies, …) and tries to assign a CPE to each entity. Note that a lot of heuristics is involved, and obviously there is also a lot that can go wrong.

When CPEs are assigned, it looks for vulnerabilities (CVEs) assigned to those CPEs. A scan is written to a result file. Multiple formats are supported. The two most prominent formats are HTML (for direct reading) and XML (for further processing). We use the XML output in order to process multiple reports of multiple projects and assign them to particular teams. (One team is usually responsible for multiple projects.)
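For illustration, processing such an XML report can be as simple as the following Java sketch. The element names (dependency, fileName, vulnerability, name) reflect the report layout as we use it, but treat them as an assumption here; the exact schema depends on the ODC version.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ReportReader {

    // Prints "library -> CVE" pairs found in a dependency-check XML report.
    public static void printVulnerabilities(File reportXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(reportXml);
        NodeList dependencies = doc.getElementsByTagName("dependency");
        for (int i = 0; i < dependencies.getLength(); i++) {
            Element dependency = (Element) dependencies.item(i);
            String fileName = textOf(dependency, "fileName");
            NodeList vulnerabilities = dependency.getElementsByTagName("vulnerability");
            for (int j = 0; j < vulnerabilities.getLength(); j++) {
                Element vulnerability = (Element) vulnerabilities.item(j);
                System.out.println(fileName + " -> " + textOf(vulnerability, "name")); // CVE id
            }
        }
    }

    private static String textOf(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }
}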

Integration with projects

Maven

OWASP Dependency Check has a mature Maven plugin. It basically works out of the box, and you can adjust probably any parameter supported by OWASP Dependency Check. We didn’t have to modify the project, as it can be run by mvn org.owasp:dependency-check-maven:check and the configuration can be adjusted by passing -Dproperty=value parameters.

Gradle

There is also a plugin for Gradle. Unfortunately, it is not as mature as the Maven plugin. We had to modify it in order to make it work in our environment; I’d like to merge those changes back into the original project. Even after the modifications, it is still not ideal. The most prominent issue is that it includes test dependencies. Excluding them in the Gradle plugin is not as simple as with the Maven plugin, because Gradle can have many configurations and each of those configurations might have different dependencies, and there is no simple way to distinguish the important configurations from the others. However, this was not a really painful issue, as these dependencies are considerably less common and usually don’t have any known vulnerability in NVD, since they usually aren’t touched by untrusted input.

How do we scan our projects?

Running those scans manually on multiple sub projects and evaluating all the reports manually would take too much time. So, we have developed a tool for automating some of the manual work.

First, scans are run automatically every day, so there is no need to run them manually on every single subproject. The results of these scans are stored in XML format. Some adaptations specific to our environment are applied; for example, the scans are able to run without any connection to the Internet. (The Maven plugin can be configured for this without any modification, while the Gradle plugin had to be modified.) Of course, we have to download the whole vulnerability database separately, outside of the scan process.

Second, our tool downloads the results from our Bamboo build server and processes them. There are some sanity checks that should warn us if something breaks. For example, we check freshness of the vulnerability database.

Third, issues related to particular subprojects are automatically assigned to corresponding teams. Issues can be prioritized by severity and number of occurrences.

While ODC is a great tool, there are some issues connected to using it. I’ll discuss some of them here.

Complexity of library identifiers

How many ways of identifying a library do you know? Maybe you will suggest using groupId, artifactId, and version (i.e. GAV), by which you can locate the library in a Maven repository. (Or you can pick a similar identifier used on a platform other than Java.) This is where we usually start. However, there are multiple other identifiers:

  • GAV identifier: as mentioned above.
  • file hash: ODC uses the SHA1 hash for lookups in the Maven database. Note that there might be multiple GAV identifiers for one SHA1 hash, so the relation is 1:n rather than 1:1. Moreover, when we consider snapshots, we theoretically get an m:n relation.
  • CPE identifier: This one is very important for ODC, as ODC can’t match vulnerabilities without that. Unfortunately, there is no exact algorithm for computation of CPE from GAV or vice versa. Moreover, multiple GAVs might be assigned to one CPE. For example, Apache Tomcat consists of many libraries, but all of them have just one CPE per version. Unfortunately, the ODC heuristic matching algorithm might also assign multiple CPEs to one GAV in some cases.
  • GA identifier: This is just a simplification of the GAV identifier which omits the version number. There is nothing much special about this identifier for ODC, but we have to work with it, too.
  • intuitive sense: For example, when you mention “Hibernate”, you probably mean multiple GAs at the same time.

Note that this complexity is not introduced by ODC. The complexity is introduced by the ecosystem that ODC uses (e.g. CPEs) and by the ecosystem ODC supports (e.g. Maven artifacts).

Bundled libraries

While überJARs might be useful in some cases, they make inspection of all the transitive dependencies harder. While ODC can detect some bundled libraries (e.g. by package names or by POM manifests if included), this is somewhat suboptimal. Moreover, if ODC finds a bundled dependency, it might look like a false positive at first sight.

Libraries without a CPE

Some libraries don’t have a CPE. So, when ODC can’t assign any CPE, it might mean either that there is no CPE for the library (and hopefully no publicly known vulnerability) or that ODC has failed to assign the CPE. There is no easy way to know which is the case.

False positives

False positives are implied by the heuristics used mainly for detecting the CPE identifier. We can’t get rid of all of them until a better mechanism (e.g. CPE identifiers in POMs) is developed and widely adopted. Especially the latter would take rather a long time.

False positives can be suppressed in ODC in two ways. One can suppress assignment of a specific CPE to a specific library (identified by SHA1 hash). If a more fine-grained control is needed, one can suppress assignment of a single vulnerability to a specific library (identified by SHA1 hash). Both of them make sense when handling false positives.
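For illustration, a suppression file covering both ways might look like the following. The element names follow ODC's suppression format as I recall it; the schema version in the xmlns attribute and the hash, CPE, and CVE values below are placeholders to adapt, not taken from a real report.

<?xml version="1.0" encoding="UTF-8"?>
<suppressions xmlns="https://jeremylong.github.io/DependencyCheck/dependency-suppression.1.3.xsd">
    <!-- Suppress a whole CPE assignment for one library, identified by its SHA1 hash -->
    <suppress>
        <notes>Obviously mismatched CPE</notes>
        <sha1>0123456789abcdef0123456789abcdef01234567</sha1>
        <cpe>cpe:/a:some_vendor:some_product</cpe>
    </suppress>
    <!-- Suppress a single vulnerability for the same library, keeping the CPE match -->
    <suppress>
        <notes>The CVE applies to an unrelated project sharing the CPE</notes>
        <sha1>0123456789abcdef0123456789abcdef01234567</sha1>
        <cve>CVE-0000-0000</cve>
    </suppress>
</suppressions>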

Handling a CPE collision

For example, there is the NLog library for logging in .NET. ODC assigns it cpe:/a:nlog:nlog. This might look correct, but this CPE has also been used for a Perl project, and there is a vulnerability for this CPE that is not related to the logging library.

If one suppressed matching the CPE, one could get some false negatives in future, as vulnerabilities of the NLog library might be published under the same CPE, according to a discussion post by Jeremy Long (the author of the tool).

In this case, we should not suppress those CPEs. We should just suppress CVEs for them.

Obviously mismatched CPEs

A CPE might also be obviously mismatched. While this case is similar to the previous one, here the CPE name makes it obvious that it does not cover the library. In this case, we can suppress the CPE for the SHA1 hash.

Missing context

A vulnerability description usually contains a score. However, even if the score is 10 out of 10, it does not imply that the whole system is vulnerable. In general, there are multiple other aspects that might change the severity for a particular case:

  • The library might be isolated from untrusted input.
  • The untrusted input might be somehow limited.
  • The library might run in a sandboxed environment.
  • The preconditions of the vulnerability might be unrealistic for the given context.

None of these can be detected by ODC, and automatic evaluation of these aspects is very hard, if not impossible.

Conclusion

We have integrated OWASP Dependency Check with our environment. We have evaluated the quality of reports. The most fragile part is matching a library to a CPE identifier, which may lead to both false negatives (rarely seen) and false positives (sometimes seen). Despite those issues, we have found ODC to be useful enough and adaptable to our environment.

Y Soft Applied Research and University Relations (ARUR) is a program under which we, Ysofters:

  • Supervise and consult students working on their theses
  • Support student research positions at research laboratories
  • Offer student internships
  • Give lectures at universities
  • Organize and support student events

This post provides statistics (collected for the last six years) about some of the Y Soft ARUR activities mentioned above.

Theses supervision

During the last six years, 46 bachelor and master theses realized in collaboration with Y Soft were successfully defended. See Figure 1 for the number of theses defended each year. In 2015, for example, 6 bachelor and 8 master theses were successfully defended. Most of the theses came from Y Soft R&D in collaboration with the Faculty of Informatics, Masaryk University.

Figure 1

Figure 1. Number of defended theses

More than 75% of all theses defended during the last six years received an “A” or “B” grade. See Figure 2 for grade statistics for each year. In 2015, for example, 14 theses were successfully defended, receiving eight “A” and six “B” grades (i.e., 100% of the theses defended in 2015 received an “A” or “B” grade).

Figure 2

Figure 2. Grades of defended theses

Since 2012, five students have received awards for their theses (see Figure 3).

Figure 3

Figure 3. Number of awards

Part-time positions at Y Soft

14 students were given part-time jobs within the scope of ARUR during the last six years. Some of these part-timers have been working at Y Soft on their bachelor or master thesis for more than a year. There were ten students from the Faculty of Informatics (Masaryk University), two from the Faculty of Electrical Engineering and Communication (Brno University of Technology), and two from the Faculty of Information Technology (Brno University of Technology).

Student positions at research laboratories

Every year, in cooperation with the Faculty of Informatics, Masaryk University, we participate in organizing a competition for talented students who are in the first or second year of their bachelor studies. During the last six years, we have supported nine students who were winners of the competition. Some students were funded for more than one year. In 2015, six student positions at research laboratories were funded.

Thanks to all who already participate in our ARUR program, and we are looking forward to new participants!

We are more than happy to discuss any suggestions and ideas for cooperation you might have.

Contact us: uni-rel@ysoft.com

For those who attended the Robot Framework workshop at the Test Crunch conference, you can find more details about the environment setup, source code, and books below.

Installing Robot Framework on your computer

  1. Download and install Python 2.7 (32/64 bit based on your OS) from https://www.python.org/downloads/
  2. Add the Python location to the Path environment variable (e.g. c:\Python27\;c:\Python27\Scripts\)
  3. Download and install wxPython 2.8 with unicode support (32/64 bit based on your Python version) from http://sourceforge.net/projects/wxpython/files/wxPython/2.8.12.1/
  4. Install pip (package manager) by opening a command line and running the command python get-pip.py
  5. Open a command line and run the following commands to install Robot Framework and additional libraries
    pip install robotframework
    pip install robotframework-ride
    pip install robotframework-selenium2library
  6. Start RIDE by running ride.py in the command line

Workshop source code

Books and other study materials

The recording from the workshop will be available soon.

Stay tuned and enjoy Robot Framework.

Members of our robot team participated in a worldwide robot competition held on 11–12 April 2015 in Vienna, Austria. There were a lot of competition categories, including Humanoid sumo, in which we participated. Over 600 robots were registered across all categories, 16 of them in Humanoid sumo.

Robot specifications

Each humanoid robot has to meet certain specifications, e.g. maximum dimensions and weight. The rules also require having a head, two legs, two arms, and a name. We named our robot YSoft Ragnarök, after the great foretold battle from Norse mythology. The weight limit is 3000 g, which was quite problematic for us: at the beginning of the competition we had to reduce the robot’s weight to 2997 g by removing some insignificant parts. Our strategy is great stability, which many other robots lack, but it demands heavy parts, especially at the bottom of the robot. A heavy body also reduces mobility and speed, so we had to develop a better solution to reliably find the opposing robot, move directly to it, and wreck it. For this purpose we used ultrasonic sensors with a maximum range of 2 meters, high precision, and low power consumption.

Arena rules

The tournament started with qualifications and continued with single-match elimination. Each match starts with two robots in opposing corners facing each other. The main goal is to push the other robot out of the arena or to knock it down. If a robot is pushed out of the arena, it can be placed within the arena again, however it must be placed face down; if the robot can autonomously stand up, the match continues. A team gains 3 points for pushing the other robot out of the arena. If a robot falls in the arena, the opposing team gains 1 point, and two points are awarded for a robot that knocks the opponent to the ground. The match ends if a robot is knocked out (and cannot stand up), if it does not move for a period of time, or when the time for the match runs out (the maximum time is 3 minutes). The competitor with the highest score wins.

Our performance

We beat every robot in our qualification group and advanced to the semifinals without suffering a loss. However, the same had happened in the past, when we then lost the next two matches and finished in 4th place; this time we tried not to repeat that outcome. The first semifinal match against the Mexican robot Speedy Gonzales was quite even, as the opponent avoided contact with us, so we only managed to knock it down once. The second match versus another Mexican robot, Atom, was more one-sided because it could not get up after a knockout (this match can be viewed here).

In the finals we faced our old rival from Poland, the robot DUE. In the end we beat them and won first place (video). The Polish robots DUE and UNO took 2nd and 3rd place.

 

In QA we use a robotic arm to autonomously operate a multifunctional device (MFD) according to a given test that is repetitive, time-consuming, or not performable by a human.

How does the robot know where the screen is located? How are the 2D screen coordinates transformed into the robot’s coordinate system?

The robot moves the end effector (stylus) in a 3D Cartesian coordinate system with the origin [0,0,0] at the center of the first servomotor. The position of the stylus is then transformed to the angles of all servomotors by inverse kinematics. It is also possible to calculate the position of the stylus given the angles of the servomotors (forward kinematics). However, the screen of the MFD is a 2D plane in the 3D space with unknown origin, dimensions, and rotation. The robot needs to know where the screen is located relative to its origin in order to correctly tap any button on the screen.

The dimensions of the screen need to be measured by hand in millimeters. We use 2D coordinates with the origin at the bottom left corner to define a position on the screen. In 3D space, the position and rotation of any plane are uniquely determined by three non-collinear points. If these three points are known, a transformation matrix can be found; this matrix multiplied by a position on the plane gives the corresponding position in 3D space.
Previously we used a ‘basic’ calibration where the robot is navigated through the bottom left corner [0, 0] (the origin of the screen), the top left corner [0, height], and the top right corner [width, height]. At each corner the stylus’ position (in 3D space) is saved and the transformation matrix is calculated. This method of calibration requires a lot of precision, because even a slight deviation from a corner leads to a similar deviation in every tap, so the robot might not accurately tap the desired button. There is no feedback from the MFD, but sometimes there is no other way to perform the calibration.
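The mapping itself is simple once the three corner points are recorded. The following Java sketch (the names and the array-based representation are only illustrative, not our actual code) converts a 2D screen position in millimeters into a 3D point by combining the two edge vectors of the screen:

public final class ScreenToRobot {

    // 3D stylus positions recorded at calibration time
    private final double[] bottomLeft, topLeft, topRight;
    private final double screenWidthMm, screenHeightMm;

    public ScreenToRobot(double[] bottomLeft, double[] topLeft, double[] topRight,
                         double screenWidthMm, double screenHeightMm) {
        this.bottomLeft = bottomLeft;
        this.topLeft = topLeft;
        this.topRight = topRight;
        this.screenWidthMm = screenWidthMm;
        this.screenHeightMm = screenHeightMm;
    }

    // Returns the 3D point corresponding to the screen position (x, y) in millimeters.
    public double[] toRobot(double xMm, double yMm) {
        double[] result = new double[3];
        for (int i = 0; i < 3; i++) {
            double xEdge = topRight[i] - topLeft[i];   // direction of growing x (screen width)
            double yEdge = topLeft[i] - bottomLeft[i]; // direction of growing y (screen height)
            result[i] = bottomLeft[i]
                    + (xMm / screenWidthMm) * xEdge
                    + (yMm / screenHeightMm) * yEdge;
        }
        return result;
    }
}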


With the new semi-automatic calibration we created our custom version of Terminal Server (the component which handles communication with the MFD) that detects any tap on the MFD screen and sends its coordinates (X, Y in pixels) to the Robot application. The screen resolution is also required, so Terminal Server sends that on demand. With the knowledge of the screen dimensions and the screen resolution, the robot is able to convert the position of a tap (in pixels) to a position on the screen (in millimeters) and save the end effector’s position. The semi-automatic calibration procedure is almost the same as the basic one, but the robot can be navigated to any point within a marked rectangle, not just a specific point at a corner, which nullifies the need for precision. However, a problem occurred in the form of inaccurate values on the Z axis. For this purpose we have developed an automatic recalibration: it takes the data gained from the semi-automatic calibration and automatically repeats the procedure with the knowledge of the existing corners of the screen. It goes through the same three corners as before, however it starts higher above each point and slowly descends to accurately measure the Z coordinate. After the recalibration, all data from the semi-automatic calibration are forgotten and replaced with the values from the automatic calibration. This procedure eliminates any error made by an engineer during calibration and makes the robot’s calibration nearly perfect.

Most systems today need to handle user authentication. That means the password entered during user registration must be stored in the system for later comparison.

It is obvious that the passwords must not be stored in plain-text form. In that case, if an attacker succeeded in getting access to the database where these passwords are stored (e.g. using SQL injection), he would obtain the whole list of user names with their corresponding passwords, and it would then be very simple for him to impersonate a valid user.

Hashing

However, to check whether the password entered by the user is correct, we do not need the original password. It is enough to have information which uniquely identifies it and can be easily computed from each password entering the system.

Such information is the password hash. A hash algorithm is a one-way function, generating a fixed-length string from the input (in this case from the given password) with no possibility to derive the input back from the computed string. Another property of a cryptographic hash function is that a change of one input bit leads to a change of many bits in the resulting hash. When the hash function is collision-free, we can assume that identical hashes imply identical inputs, from which these hashes are computed.

So instead of the password itself, only its hash will be stored in the system. Every time a user tries to log in to the system, the hash of the entered password is computed and compared to the stored one.

Slow hashing

However, cryptographic hash functions such as MD5 or SHA are not appropriate. The purpose of these functions is the calculation of a digest of a large amount of data to ensure its integrity. This digest needs to be computed in as short a time as possible, and thus these hash functions are designed to be fast. This property is, however, not desirable for password hashing.

As an example take the MD5 function. One 2.13GHz core is able to compute about 6 million MD5 hashes per second using the Cain & Abel tool. There are 36^8 ≈ 2.8 × 10^12 possible 8-character lowercase alphanumeric passwords, so trying every single one of them takes roughly 2.8 × 10^12 / (6 × 10^6) ≈ 470,000 seconds, i.e. approximately 130 hours. And that is only one core; modern computers use more of them, so for example with six such cores a password can be cracked in less than a day. Furthermore, we can definitely assume that an attacker has much better equipment.

In order to prevent an attacker from trying millions of hashes per second, we need to use a slow cryptographic hash function for password hashing. Several hash functions were specifically designed for this purpose. These functions include: PBKDF2, bcrypt, scrypt.

Work factor parameters

These hash functions are not only slow, they also take work factor parameters defining how expensive the hash computation will be. Although the scrypt function is the youngest one (designed in 2009), it has an advantage over the older ones: it defines not only the CPU cost but also the memory requirements. That is why scrypt is the recommended function for password storage, and this article talks mainly about it.

Scrypt uses following work factor parameters:

  • N – number of iterations, related to both memory and CPU cost
  • r – size of the RAM block needed, related to memory cost
  • p – parallelization, defines maximum number of threads, related to CPU cost

These parameters allow us to set the memory needed and the time it takes to compute one hash. The approximate memory usage for a single hash generation can be computed from the parameters using the following formula:

memory = N · 2 · r · 64 bytes
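For example, with N = 2^14 and r = 8 a single hash needs 16384 · 2 · 8 · 64 B = 16 MiB of memory, while the N = 2^10, r = 4 setting used in the example below needs 1024 · 2 · 4 · 64 B = 512 KiB.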

The time, on the other hand, is platform-dependent. The graph below shows the dependency of the time needed for a single hash computation on the work factor parameters N and r. The parallelization parameter is set to 1 in all cases. The values in the graph were measured using CryptSharp, a C# implementation of the scrypt function, on Windows Server 2012 with four 2.2GHz cores.

Scrypt hash computation time depending on the work factor parameters N and r (CryptSharp on Windows Server 2012).

The computation time needs to be chosen as a compromise between usability and the security provided. For example, if we have a system with only one login at a time and high security is needed, we set the parameters so that the computation takes about one second. However, in the case of many parallel logins, this time needs to be set to only a few milliseconds.

We can take the above example of password hash cracking. Using the scrypt function (CryptSharp implementation) with parameters N=2^10, r=4, and p=1, hashing one password takes approximately 10 ms, i.e. one 2.2GHz core is able to compute 100 hashes per second. Computing all possible 8-character lowercase alphanumeric passwords then takes about 895 years.

Attacker goals

Imagine an attacker who has obtained the list of user names and corresponding password hashes. There are three goals he can have:

  • Crack a password of one specific user (e.g. admin)
  • Crack a password of any user
  • Crack passwords of a longer list of users

Attacks

In the first option the attacker has a password hash and wants to find the corresponding password it was computed from. He can use a brute-force or dictionary attack, i.e. try many possible inputs to the hashing function and compare the results with the obtained password hash.
An effective method for trying so many hashes is the use of lookup tables. The general idea is to pre-compute hashes of possible passwords and store them in a lookup table data structure (or rainbow tables for lower memory requirements). Comparison of these pre-computed values with a given hash is much faster than hash computation.

The second option is simpler. The only thing needed is to compute hashes of possible inputs and compare each result with all password hashes in the obtained list. Sooner or later the attacker will hit some match.

For cracking a longer list of hashes the attacker does not need to crack one password at a time; instead, he compares each computed hash with all hashes from the list. This way, cracking a list of hashes takes approximately the same time as cracking only one specific password.

Salt

The above attacks work because each password is hashed the same way; the same password always results in the same hash. The simplest way of preventing this is salting. That means a random string (salt) is generated for each password and used together with it to create the hash.
The uniqueness of the salts must be ensured, so they really need to be randomly generated. Any random number generator can be used, however, cryptographically secure RNGs, such as RNGCryptoServiceProvider in C# or SecureRandom in Java, are recommended.

The salt is a non-secret value; it needs to be stored together with the password hash to ensure its availability to the hash function. Thus, if someone gets access to the hashes, he automatically gets all the salts as well. However, the power of the salt is not in its secrecy, but in its randomness.

With a different salt, the same passwords result in different hashes. A pre-computed hash attack becomes infeasible due to the large additional memory requirements: an attacker would need to store pre-computed hashes for each possible salt.

Cracking the password of any user is reduced to cracking the password of a specific one, since the salt is different for each user password.

Cracking a larger list of hashes is also more complicated with a different salt for each password; the attacker has no other choice than to crack one password at a time.

Pepper

In order to increase security even more, we can use another randomly generated string: the pepper. In contrast to the salt, the pepper needs to be kept secret, as it is used as an HMAC key. HMAC is a one-way algorithm based on a hash function, generating a fixed-length string from the input message and a secret key, which in our case is the generated pepper.

Since pepper is a secret key, it needs to be generated using a cryptographically secure random number generator, such as RNGCryptoServiceProvider.

public static byte[] GeneratePepper()
{
    byte[] pepper = new byte[32];
    RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
    rng.GetBytes(pepper);
    return pepper;
}

When generated, the pepper must be stored separately in a configuration file with restricted access.

Even if an attacker had enough resources to be able to crack the hash function, he would still need this secret value to obtain the user password. And with the pepper randomly generated for each system instance, if one instance is compromised, the others remain secure.

Overall scheme

The overall hashing of the password with both salt and pepper looks as follows:

scrypt ( Base64 ( HMAC ( 'SHA256', password, pepper ) ), salt, workFactors )

And the C# implementation of this scheme using CryptSharp library:

public byte[] HashPassword(String password, byte[] pepper, byte[] salt)
{
    if (salt == null)
    {
        Console.WriteLine("Password hash not created - salt is null.");
        return null;
    }

    String encodedHmac = HmacBase64(password, pepper);
    return CryptSharp.Utility.SCrypt.ComputeDerivedKey(
        Encoding.UTF8.GetBytes(encodedHmac), salt, n, r, p, null, HASH_LENGTH);
}

private static string HmacBase64(string password, byte[] pepper)
{
    if (pepper == null)
    {
        Console.WriteLine("Password hash not created - pepper is null.");
        return null;
    }
    HMACSHA256 hmac = new HMACSHA256(pepper);
    hmac.Initialize();
    byte[] buffer = Encoding.UTF8.GetBytes(password);
    byte[] rawHmac = hmac.ComputeHash(buffer);
    return System.Convert.ToBase64String(rawHmac);
}

Conclusion

User passwords must never be stored in plain text; always compute their hash using a slow cryptographic hash function. For each password, generate a random salt and use this value together with the password for the hash computation. For a higher level of security, generate a random secret pepper for each system instance.

Of course, the security of the user password depends on the password itself. An attacker could still try frequently used passwords such as “123456”; however, with secure storage we can prevent him from trying too many of them and from obtaining the strong ones.

As many of you may know, Android supports native printing since Android 4.4. This means that there is a new API handling communication between the application from which the user prints and the application that later sends the job to the printer.


So how does it work? First, let's have a look at the applications from which the user prints.

The main responsibility of these applications is to prepare the output for printing in PDF format. This includes, for example, paging or adjusting the image to landscape or portrait mode.

The application from which the user prints then uses the system PrintManager service.

PrintManager printManager = (PrintManager) getSystemService(Context.PRINT_SERVICE);

The document output is prepared with a PrintDocumentAdapter, which is passed as the second parameter of PrintManager's print function; a minimal adapter sketch follows the call below.

printManager.print(jobName, new PrintDocumentAdapter(...), printAttributes);
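For illustration, a very small PrintDocumentAdapter might look like the following sketch. It renders a single blank page with PrintedPdfDocument and is only meant to show where the layout and paging work happens, not to be a complete implementation.

import android.content.Context;
import android.graphics.pdf.PdfDocument;
import android.os.Bundle;
import android.os.CancellationSignal;
import android.os.ParcelFileDescriptor;
import android.print.PageRange;
import android.print.PrintAttributes;
import android.print.PrintDocumentAdapter;
import android.print.PrintDocumentInfo;
import android.print.pdf.PrintedPdfDocument;
import java.io.FileOutputStream;
import java.io.IOException;

public class SimplePrintAdapter extends PrintDocumentAdapter {

    private final Context context;
    private PrintedPdfDocument pdf;

    public SimplePrintAdapter(Context context) {
        this.context = context;
    }

    @Override
    public void onLayout(PrintAttributes oldAttributes, PrintAttributes newAttributes,
                         CancellationSignal cancellationSignal,
                         LayoutResultCallback callback, Bundle extras) {
        // Re-create the PDF whenever the attributes (page size, orientation, ...) change
        pdf = new PrintedPdfDocument(context, newAttributes);
        if (cancellationSignal.isCanceled()) {
            callback.onLayoutCancelled();
            return;
        }
        PrintDocumentInfo info = new PrintDocumentInfo.Builder("document.pdf")
                .setContentType(PrintDocumentInfo.CONTENT_TYPE_DOCUMENT)
                .setPageCount(1)
                .build();
        callback.onLayoutFinished(info, !newAttributes.equals(oldAttributes));
    }

    @Override
    public void onWrite(PageRange[] pages, ParcelFileDescriptor destination,
                        CancellationSignal cancellationSignal, WriteResultCallback callback) {
        PdfDocument.Page page = pdf.startPage(0);
        // ... draw the content on page.getCanvas() here ...
        pdf.finishPage(page);
        try (FileOutputStream out = new FileOutputStream(destination.getFileDescriptor())) {
            pdf.writeTo(out);
            callback.onWriteFinished(new PageRange[]{PageRange.ALL_PAGES});
        } catch (IOException e) {
            callback.onWriteFailed(e.toString());
        } finally {
            pdf.close();
        }
    }
}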

Now we are heading to the second part of job printing, where we have to discover printers and send them our job. This is the responsibility of a PrintService.

Printer discovery

We can either add printers manually, setting their IP address and port, or we can look for network printers in the local network.

Let's have a look at how to find printers which support Zeroconf discovery in the local network. Implementations of Zeroconf include, for example, the Avahi daemon and Bonjour.

When printer discovery in Android is started, the onCreatePrinterDiscoverySession() method of the PrintService is called. Here we have to create our PrinterDiscoverySession.

Responsibilities of PrinterDiscoverySession are pretty straightforward.

  • find and add discovered printers
  • remove previously added printers that disappeared
  • update already added printers

In this example we will use NsdManager, the native Android solution for finding Zeroconf services. Its functionality is quite limited, but for the purpose of this demo it is satisfactory; there are other, better solutions, for example JmDNS. A current limitation of NsdManager is that it cannot load the TXT records of mDNS messages.

In order to use NsdManager we have to implement two interfaces: DiscoveryListener (handles discovery callbacks) and ResolveListener (handles resolving callbacks). I will call our implementations OurDiscoveryListener and OurResolveListener.

First, in the onStartPrinterDiscovery() method, we create a new instance of the DiscoveryListener and start the actual discovery.

discoveryListener = new OurDiscoveryListener(nsdManager);
nsdManager.discoverServices(SERVICE_TYPE, NsdManager.PROTOCOL_DNS_SD, discoveryListener);

This is pretty self-explanatory. We specify the discovery service type, which is either “_ipps._tcp.” or “_ipp._tcp.”, depending on whether we want to encrypt the IPP messages or not.

When a service is found, OurDiscoveryListener handles what happens in the individual states of the discovery.

In the following code we can see that, for example, when a service is found, we try to resolve it with NsdManager.

public void onServiceFound(NsdServiceInfo service) {
    nsdManager.resolveService(service, new OurResolveListener());
}

Resolving a service means that we try to get more information about it, such as the host IP and port. OurResolveListener then handles what should happen when resolving succeeds or fails. When resolving succeeds, we process the gained data and save it for future use, as sketched below.
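A minimal version of such a listener might look as follows; savePrinter() stands in for our own bookkeeping (e.g. adding the printer to the discovery session) and is not part of the Android API.

private class OurResolveListener implements NsdManager.ResolveListener {

    @Override
    public void onServiceResolved(NsdServiceInfo serviceInfo) {
        // Host and port are what we later need to talk IPP to the printer
        String name = serviceInfo.getServiceName();
        java.net.InetAddress host = serviceInfo.getHost();
        int port = serviceInfo.getPort();
        savePrinter(name, host, port); // hypothetical helper storing the printer for this session
    }

    @Override
    public void onResolveFailed(NsdServiceInfo serviceInfo, int errorCode) {
        // Resolving may fail, e.g. when the service disappeared in the meantime
        Log.w("OurPrintService", "Failed to resolve " + serviceInfo.getServiceName()
                + ", error code " + errorCode);
    }
}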

The last part of printer discovery is to find more details about the selected printer and to check whether the printer is still available. This is handled in the onStartPrinterStateTracking() method.

Details about the printer can be discovered, for example, with the IPP operation Get-Printer-Attributes, and the printer information is then set according to the received data. The second task is to keep tracking the printer state.

The following code sample just shows how to set a few printer capabilities, which should be set according to the printer attributes. It does not contain tracking of the printer state.

@Override
public void onStartPrinterStateTracking(PrinterId printerId) {
    // check for info we found when printer was discovered
    PrinterInfo printerInfo = findPrinterInfo(printerId);

    if (printerInfo != null) {
        PrinterCapabilitiesInfo capabilities = new PrinterCapabilitiesInfo.Builder(printerId)
                .setMinMargins(new PrintAttributes.Margins(200, 200, 200, 200))
                .addMediaSize(PrintAttributes.MediaSize.ISO_A4, true)
                .addMediaSize(PrintAttributes.MediaSize.ISO_A5, false)
                .addResolution(new PrintAttributes.Resolution("R1", "200x200", 200, 200), false)
                .addResolution(new PrintAttributes.Resolution("R2", "200x300", 200, 300), true)
                .setColorModes(PrintAttributes.COLOR_MODE_COLOR |
                        PrintAttributes.COLOR_MODE_MONOCHROME, PrintAttributes.COLOR_MODE_COLOR)
                .build();

        printerInfo = new PrinterInfo.Builder(printerInfo)
                .setCapabilities(capabilities).build();

        // We add printer info to system service
        List<PrinterInfo> printers = new ArrayList<>();
        printers.add(printerInfo);
        addPrinters(printers);
    }
}

When a different printer is selected, onStopPrinterStateTracking() is called, followed by onStartPrinterStateTracking() for the newly selected printer.

 

Printing

Android itself does not contain an implementation of any printing protocol. Because of this I created a small IPP parser, but that is a topic for another day.

Here I will only show an example of handling a queued print job.

In the following code we pick one job by its id from the saved processed jobs and set its state to printing. The PrintTask class in the following example is just an Android AsyncTask which creates the IPP request in the background and appends the job data.

public void handleQueuedPrintJob(PrintJobId printJobId, PrinterId printerId) {
    final PrintJob printJob = mProcessedPrintJobs.get(printJobId);
    if (printJob == null) {
        return;
    }

    if (printJob.isQueued()) {
        printJob.start();
    }

    OurPrinter ourPrinter = ourDiscoverySession.getPrinter(printerId);

    if (ourPrinter != null) {
        AsyncTask<Void, Void, Void> printTask =
                new PrintTask(printJob, ourPrinter);
        printTask.execute();
    }
}

In case we have decided to use IPPS, we also have to set the correct certificate. The next step is to create a new HttpURLConnection (or HttpsURLConnection for secure transfer).

The last thing we have to do is write our IPP message into the output stream, send it, and wait for the response from the server; a sketch of this step follows.
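A possible shape of this step is sketched below. The IPP payload itself is already built elsewhere (see the parser remark above); the URL path and the port are typical values for IPP printers and may differ for a particular device.

private byte[] sendIppRequest(java.net.InetAddress host, int port, byte[] ippRequest)
        throws IOException {
    // IPP is carried over HTTP POST with the application/ipp content type; port 631 is typical
    URL url = new URL("http", host.getHostAddress(), port, "/ipp/print");
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    try {
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type", "application/ipp");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(ippRequest); // IPP header + attributes + document data
        }
        try (InputStream in = connection.getInputStream();
             ByteArrayOutputStream response = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                response.write(buffer, 0, read);
            }
            return response.toByteArray(); // IPP response, to be parsed by our parser
        }
    } finally {
        connection.disconnect();
    }
}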

Android manifest file

We have to set the necessary permissions in the Android manifest file in order to be able to run the PrintService.
Add android.permission.BIND_PRINT_SERVICE when declaring the service. Example:

...
<service android:name=".OurPrintService"
    android:permission="android.permission.BIND_PRINT_SERVICE">

    <intent-filter>
        <action android:name="android.printservice.PrintService" />
    </intent-filter>
</service>
...

This allows the system to bind to the PrintService; otherwise the service would not be shown in the system.

Also

<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.START_PRINT_SERVICE_CONFIG_ACTIVITY" />

These permissions are needed, for example, for printer discovery and for accessing files in external storage.