The first article introduced an overview of the goals and architecture of log processing; the next two articles will cover inputs and outputs – how data (both logs and metrics) can be forwarded into monitoring and how the data can be viewed after processing.
There are two ways of forwarding data into the monitoring platform: automatic and manual. The first one – automatic – is currently used in testing environments, where both logs and metrics are continuously collected and forwarded for processing. On the other hand, when YSoft SafeQ is deployed at a customer’s site, such an approach is seldom possible because of security concerns and the additional performance requirements placed on the monitoring server. Instead, only specific log files containing the problem are transferred from the customer, and these have to be uploaded manually.

Automatic log forwarding

The simplest way to forward logs would be to configure the logging framework to send logs directly over the network; however, such a solution does not cope with network outages, which can be part of tests. Some logging frameworks can be configured with a failover logging destination (if the network does not work, they write logs into files), but these files would need yet another mechanism to upload them automatically.
Instead, logs are sent to a local port, to a log forwarder which has to be installed. We currently use Logstash, which (since version 5.0) has a persistent queue. If the network works properly, logs are sent on before they are flushed to disk; if there is a network outage, logs are written to disk, so there is no danger of exhausting RAM.
Logstash serves two other goals. The first is to unify log formats, since logs generated by different logging frameworks have different formats. That could be done on the monitoring servers instead, but doing it at the source keeps the processing simpler. The other goal is to enrich logs with additional information, such as the hostname and the name of the deployment group.
Telegraf is deployed next to Logstash to collect various host metrics, which are again forwarded to the monitoring servers. Note that Telegraf does not support a persistent queue, so it sends metrics through Logstash, which provides the necessary buffering.
Logstash and Telegraf are installed by Calf, our internal tool. Calf can be easily configured and installed as a service, and it is responsible for installing, configuring and running both Logstash and Telegraf. That makes using both tools much easier.

Log and metrics collection schema

Manual log uploading

The main goal of manual log uploading is clear: forward logs to the monitoring servers in the same format as the previous method. That requires parsing the logs and adding the additional information.
The logs for automatic processing are generated directly in JSON format; logs written into files, on the other hand, are plain lines. These lines have to be parsed, and GROK patterns (basically named regexes) are used for this purpose. More can be found here, including a simple way of constructing GROK patterns.

log:
2017-05-19 10:03:19,368 DEBUG pool-9-thread-14| RemotePeerServer| [RemotePeer{name='1dc5e474-1abc-43fc-85c9-7e5e786919ef', state='ONLINE', session='ZeroMQSessio

grok pattern:
^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} *(?<thread>[\w-]*)\| *%{WORD:loggerName}\| *%{GREEDYDATA:message}

result:
{
  "timestamp": "2017-05-19 10:03:19,368",
  "level": "DEBUG",
  "thread": "pool-9-thread-14",
  "loggerName": "RemotePeerServer",
  "message": "[RemotePeer{name='1dc5e474-1abc-43fc-85c9-7e5e786919ef', state='ONLINE', session='ZeroMQSessio"
}
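
For illustration, the same parsing can be sketched in plain Python with named groups (a minimal sketch only; Logstash applies the GROK pattern directly, and the GROK macros %{TIMESTAMP_ISO8601}, %{LOGLEVEL}, %{WORD} and %{GREEDYDATA} are only approximated by the regexes below):

import re

# approximate Python equivalent of the GROK pattern above
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL) *"
    r"(?P<thread>[\w-]*)\| *(?P<loggerName>\w+)\| *(?P<message>.*)$"
)

line = ("2017-05-19 10:03:19,368 DEBUG pool-9-thread-14| RemotePeerServer| "
        "[RemotePeer{name='1dc5e474-1abc-43fc-85c9-7e5e786919ef', state='ONLINE'")
match = LOG_PATTERN.match(line)
if match:
    print(match.groupdict())  # yields the same fields as the result above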

However, when manually uploading files, it is necessary to provide additional information about the log file, specifically the hostname, the name of the deployment group and the component name, since each component of YSoft SafeQ has a different log format. Logstash is again used for the uploading, but it is wrapped in a Python script for better usability.

cat spoc*.log | python import.py -c spoc -ip localhost -g default

This article is the first one of a planned series focused on log processing, that is, YSoft SafeQ monitoring. It explains the goals of log monitoring and the intended use cases, as well as the requirements for the designed architecture. A brief high-level description of the designed architecture follows; the architecture itself will be properly described in one of the following articles.

Goals of YSoft SafeQ log processing

Logs contain information about the behaviour of a SafeQ deployment, but each log carries only limited local information (such as a single exception). This can directly lead to understanding what has happened, but sometimes more logs, even from different components, are needed.

The main goal of log processing is to collect all logs in a central location, unify them to a single format to simplify their structure, compute additional information, such as the duration of a print process, and finally index this information into a database to allow searches and visualization in graphs.

Such graphs are a much faster way to learn about YSoft SafeQ behaviour than going through individual log files; for example, spikes in printing time can be easily identified and possibly correlated with accompanying WARN or ERROR logs. That allows us to understand YSoft SafeQ faster and better.

All of that is usually used for fixing bugs, but it can serve many other purposes, such as improving performance or monitoring user-related activities. On the other hand, there are several requirements which the log processing needs to satisfy.

Requirements of log processing architecture

First, log processing shouldn’t significantly increase the performance requirements on YSoft SafeQ servers; therefore logs should be forwarded over the network to a different server. That requires some network bandwidth, but on the other hand, logs don’t have to be saved on the local hard disk. Some network bandwidth can also be saved by compression, in exchange for some CPU time.

Even with logs being sent over the network, the reliability should stay the same as with writing logs directly into files. That is provided by a number of mechanisms:

  • when a connection is temporarily unavailable, generated logs should be buffered on a hard disk and automatically resent once the connection is re-established
  • a reliable protocol (such as TCP) is used
  • the target server can work properly even under heavy load; note that multiple YSoft SafeQ servers can forward logs to a single destination, and there can also be big spikes when many logs are generated at once

Another requirement for log processing is low end-to-end latency. While collecting and aggregating data once a week is no problem for reporting purposes, for debugging or bug fixing the logs should be processed within seconds or minutes.

Lastly, the processing of logs should be scalable, so it can easily grow with bigger YSoft SafeQ deployments.

(Note: these are not the only requirements, but the most important ones. More detailed info can be found in my bachelor thesis.)

Architecture overview

The main idea is that generated logs are processed as a stream. As soon as a log is generated, it is forwarded from the YSoft SafeQ server to a queue on a monitoring server. The queue serves as a buffer when there is a spike in the number of incoming logs or when the rest of the monitoring pipeline is being reconfigured. Kafka is used as a persistent and fast queue.

Various tools and software can be used “above” the queue to unify logs, aggregate various information, compute durations of processes (such as the duration of a print job) and correlate logs and metrics. Still, logs are processed as a stream, so they can’t be replayed or queried as in conventional databases, which makes aggregation more complex. For example, the computation of elapsed time between two logs needs to consider out-of-order or missing logs; this will be explained in detail in one of the following articles. On the other hand, such an approach makes the overall latency much lower and is more efficient (despite the more complex algorithms), since each log is processed just once.
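
To illustrate one such aggregation, below is a minimal sketch (not our actual pipeline) of computing the elapsed time between a “start” and an “end” log of the same process in a stream; the jobId and event fields are hypothetical stand-ins for real correlation data:

from datetime import datetime

pending_starts = {}  # correlation id -> timestamp of the "start" log

def on_log(log):
    # log is one parsed record, e.g. {"timestamp": ..., "jobId": ..., "event": ...}
    ts = datetime.strptime(log["timestamp"], "%Y-%m-%d %H:%M:%S,%f")
    key = log["jobId"]
    if log["event"] == "start":
        pending_starts[key] = ts
    elif log["event"] == "end" and key in pending_starts:
        duration = ts - pending_starts.pop(key)
        print(f"job {key} took {duration.total_seconds():.3f} s")
    # a real implementation must also handle an "end" arriving before its
    # "start" (out-of-order logs) and evict entries whose "end" never arrives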

Finally, logs, metrics and computed data are stored and possibly visualized. Some data may be stored only in files (such as raw logs), while metrics computed from them (such as durations of processes) can be indexed into a database to allow visualization.

Moreover, this architecture can be simply extended, by another source of data, another tool for log processing or a different storage method, online and without affecting the rest of the processing pipelines.

The next article will cover data visualizations, with real examples from our testing environments.

 

In the previous post, I wrote about the testing requirements which led us to create the Modular sensor platform, and about ASP.NET Core, a technology which can simplify developing a web API server application. You could try developing your own API server. Today I am going to introduce the USB to CAN converter and the universal board for connecting sensors.

The USB to CAN converter is an STM32F4-powered device for translating USB communication to the CAN bus and vice versa. The converter is a USB HID class device. The HID class was chosen because it guarantees a maximum delay of packets, which is an important parameter in some cases when measuring the response time of tested devices. The converter is connected to the web server by USB micro, and there are two RJ12 connectors on the board. The RJ12 connectors are used for connecting sensors or actors (see image below).

Sensors and actors

Sensors and actors can be connected to the USB converter via a cable with the RJ12 connector, through which they are powered and can receive and send messages from the web API server. Each board on the CAN bus has to be addressable by a unique address, so each device has its own encoder. Using the encoder on the board, you can set the address of the device (see image below – the black box with an orange shaft). The encoder is 4-bit, so you can add up to 16 different devices.


The universal version of the board has 3 connectors (the blue ones), which you can use for connecting different kinds of SPI or I2C sensors. The following sensors are in development:

  • RGB sensor – For sensing status of LED of a tested device
  • Paper sensor – Detection of paper in printer

These sensors will be introduced in upcoming parts of this article series. The advantage of the universal board is that it simplifies developing new sensors. You do not have to develop a custom PCB (Printed Circuit Board); you can use this board, connect a sensor and write firmware specific to that sensor.

The firmware is written in pure C using the STM32 HAL library (Hardware Abstraction Layer). The initialization code was generated by STM32CubeMX, a graphical software configuration tool that allows configuring the MCU through graphical wizards. The tool allows configuration of pin multiplexing, clocks and other peripherals. You can then generate a C project for any common embedded IDE.

Both PCBs were designed in CircuitMaker by Altium, which is free even for commercial use, so there is no license to worry about. The disadvantage is that you can have only two private projects; others must be public (see circuitmaker.com).

Summary

This article described the hardware part of the Modular sensor platform: the USB to CAN converter and the universal sensor board for developing custom devices compatible with the platform. The concrete sensors and actors we have developed will appear in the next parts of the Modular sensor platform series. The post also described the tools and technologies used for developing the converter and the sensor board. If you are interested in developing embedded systems, you should definitely try STM32CubeMX and CircuitMaker.

When will the robotic revolution come and what will be its impact? What does Industry 4.0 mean and how will it change the world around us?

Come, listen and discuss this with me during a talk titled “Robotic revolution: How robots help during development and testing SW & HW” during the Žijeme IT event on the 16th of February 2018.

The event will take place at the Brno University of Technology, find out more at zijemeit.cz.

I will discuss how Y Soft’s Research and Development department uses robots for the development and testing of SW and why we have started to use them. We implement tons of automated tests which are executed as part of continuous integration. But how do we proceed when we need to test closed ecosystems which are hard to control remotely or need to be replaced by simulators? Is robotic testing better than manual testing? What are the advantages and disadvantages of the robotic approach? And why did we ultimately decide not to stay with manual testing? Lastly, what about using simulators: can they provide trustworthy test results?

I will share with you how Y Soft started its robotic development and how this is connected with students. Are students changing the world?

Y Soft uses a robotic arm for testing multi-functional devices, but the robotic arm alone is not enough for our testing purposes. We need to interact with the device in other ways than just tapping on the touchscreen. The screen of the tested device is already captured by a camera, but we also need other feedback from the device and the ability to react to it. That is why we developed the Modular sensor platform, which can be easily plugged into a computer (the Web API server) via USB. Through a REST API you can read information from, or send commands to, different kinds of sensors and actors. The following diagram illustrates how the platform is composed.

Web API server

As the diagram shows, you can connect multiple sensors to the server via the USB to CAN converter. When the web server starts, it sends a discovery packet. From the responses, the server knows what types of sensors are connected and how many. After initialization, it starts listening for sensor commands from clients.
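
As an illustration, a client could talk to the server roughly like this (a hypothetical sketch: the port, endpoint paths and JSON shape are made up here, not the platform’s actual API):

import requests

BASE = "http://localhost:5000/api"

# list the sensors discovered on the CAN bus (hypothetical endpoint)
for sensor in requests.get(f"{BASE}/sensors").json():
    print(sensor)

# read a value from the sensor with CAN address 3 (hypothetical endpoint)
print(requests.get(f"{BASE}/sensors/3/value").json())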

The web API server is written using the ASP.NET Core framework. In the following link, you can find a tutorial which shows the simplicity of creating a RESTful application and which components the server is composed of.

.NET Core is cross-platform, so the web server can run on any device running Linux, macOS or Windows.

Try to create an ASP.NET Core application based on the tutorial above, or just create a console application (see link). The created application can be built for any supported OS; for ARM only the runtime is available, not the SDK for developing applications (see SDK support, ARM Runtime).

Building for a device is as simple as running this command

dotnet publish -r <Runtime identifier>

in the directory of the project (after the -r switch you can specify any supported platform; for more information use this link). You must also install the prerequisites on the target device (see link); then you can copy this folder

<Project path>\bin\<Configuration>\netcoreapp2.0\<Runtime identifier>\publish

to the ARM device and run the application.
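
For example, targeting a 32-bit ARM Linux device could look like this (linux-arm being one of the supported runtime identifiers):

dotnet publish -r linux-arm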

Summary

This article showed how the parts of the platform are composed, how they communicate with each other, and that the platform is not limited to one operating system: it works on Windows, Linux and macOS, even on the ARM architecture. In the next part of this series, I will tell you about the development of the USB to CAN converter and the sensors.

 

Chef is an automation platform designed to help with the deployment and provisioning process during software development and in production. Chef can, in cooperation with other deployment tools, transform the whole product environment into infrastructure as code.

DSL

Chef provides a custom DSL that lets its users define the whole environment as a set of resources, together forming recipes, which can be further grouped into cookbooks. The DSL is based on Ruby, which adds a level of flexibility by offering Ruby’s language constructs to help the development. A basic example of a resource is a file with a specified content:
file 'C:\app\app.config' do
    content "server_port = #{port}"
end
Upon execution, Chef will make sure the defined file exists and has the correct content. If a file with the same content already exists, Chef will finish without updating the resource, letting developers know the environment was already in the desired state before the Chef run.

Provisioning

The resources have built-in validations ensuring that only the changes in configuration are applied to an existing environment. This lets users execute recipes repeatedly with only minor adjustments: Chef will make the necessary changes in your environment, leaving the correctly defined resources untouched.
This is especially handy in a scenario where an environment is already deployed and developers keep updating the recipes with new resources and managing the configurations of deployed components. Here, with correctly defined validations, the recipes can be executed on the target machines repeatedly, always updating the environment without modifying the parts which are already up to date.
This behavior can be illustrated by the following example:
my_tool = maven 'tool.exe' do
    artifact_id     'tool'
    group_id        'com.ysoft'
    version         '1.0.0'
    dest            'C:\utils'
    packaging       'exe'
end

execute 'run tool.exe' do
    command "#{my_tool.dest}\\#{my_tool.name} > #{my_tool.dest}\\tool.output"
    not_if { ::File.exist?("#{my_tool.dest}\\tool.output") }
end
In this example, the goal is to download an exe file and run it exactly once (only the first run of this recipe should update the environment). The maven resource internally validates whether the given artifact has already been downloaded (the file C:\utils\tool.exe would already exist).
The problem is with the execute resource, as it has no way of checking whether it has been run before, and could thus execute multiple times. Users can, however, define such restrictions themselves, in this case with the not_if attribute. It prevents the resource from executing again, as it checks for the existence of the tool output from previous runs.

Architecture

To enable environment provisioning, Chef operates in a client-server architecture with a pull-based model.
Chef server represents the storage of everything necessary for deployment and provisioning. It stores cookbooks, templates, data bags, policies and metadata describing each registered node.
Chef client is installed on every machine managed by Chef server. It is responsible for contacting Chef server and checking whether there are new configurations to be applied (hence the pull-based model).
ChefDK workstation is the machine from which the whole Chef infrastructure is operated. Here, the cookbooks are developed and Chef server is managed.
In this example, we can differentiate between the Chef infrastructure (blue) and the managed environment (green). The process of deployment and provisioning is as follows:
  1. A developer creates/modifies a cookbook and uploads it to the Chef server.
  2. Chef client requests the server for changes in the recipes.
  3. If there are changes to be made, Chef server notifies the client.
  4. The client initiates a Chef run with the new recipes.
Note here that in a typical Chef environment, Chef client is set to request the server for changes periodically, to automate the process of configuration propagation.

Serverless deployment

When only the deployment of the environment is necessary (e.g. for a simple installation of a product where no provisioning is required), in an offline deployment, or while testing, much of the operational overhead of Chef can be mitigated by leaving out the server completely.
Chef client (with additional tools from ChefDK) can operate in a local mode. In such a case, everything necessary for the deployment, including the recipes, is stored on the Chef client, which acts as a dummy server for the duration of the Chef run.
Here, you can see the architecture of a serverless deployment. The process is as follows:
  1. Chef client deploys a dummy server and points it to cookbooks stored on the same machine.
  2. Chef client from now on acts as the client in the example above and requests the server for changes in the recipes.
  3. Chef Server notifies the client of the changes and a new Chef run is initiated.

Conclusion

Chef is a promising tool that has the potential to help us improve not only the products we offer, but also the process of development and testing.
In combination with infrastructure deployment tools (like Terraform) which we are currently researching, automation of product deployment and provisioning can allow our developers to focus on important tasks instead of dealing with the deployment of testing environments or manually updating configuration files across multiple machines.

This blog post introduces Terraform, a tool which we use for deploying testing environments at Y Soft. We will cover the following topics:

  • What is Terraform?
  • How does it work?
  • Example of use

What is Terraform?

Terraform is a command line tool for building and changing infrastructure in a safe and efficient manner. It can manage resources on most of the popular service providers. In essence, Terraform is simply a tool that takes configuration files as input and generates an execution plan describing what needs to be done to reach the desired state. Do you need to add another server to your cluster? Just add another module to your configuration. Or redeploy your production environment in a matter of minutes? Then Terraform is the right tool.

How does it work?

Infrastructure as code

Configuration files that define infrastructure are written using a high-level configuration syntax. This basically means that the blueprint of your production or testing infrastructure can be versioned and treated as you would treat any other code. In addition, since we are talking about code, the configuration can be shared and re-used.

Execution plan

Before every Terraform execution there is a planning step, in which Terraform generates an execution plan. The execution plan shows what will happen when you run the execution (when you apply the plan). This way you avoid surprises when you manipulate your infrastructure.

Terraform state file

How does Terraform determine the current state of the infrastructure? The answer is the state file. The state file keeps information about all the resources that were created by the execution of the given configuration file. To ensure that the information in the state file is fresh and up to date, Terraform queries the provider for any changes to our infrastructure (and modifies the state file accordingly) before running any operation. In other words, for every plan and apply, Terraform synchronizes the state file with the provider.

Sometimes this behavior can be problematic; for example, querying large infrastructures can take a non-trivial amount of time. In these scenarios, we can turn the synchronization off, which means the cached state will be treated as the record of truth.
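
For example, the synchronization can be skipped with the -refresh flag (accepted by both plan and apply):

terraform plan -refresh=false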

Below you can see the picture of the whole execution process.

Example of use

In our example, we will be working with the Azure provider. The example configuration files can be used only with the Azure provider (configuration files for different providers may and will differ). It is also expected that we have set up Terraform on our machine and the appropriate endpoints to the provider beforehand.

Step 1: Write configuration file

The presented configuration file has no expectations regarding previously created resources and can be executed on its own, without the need to create any resources in advance.

The configuration file that we will write describes the following resources: a resource group, a virtual network, a subnet, a network interface and a virtual machine.

Now we create an empty directory on the machine where we have installed Terraform, and within it we create a file named main.tf. The contents of the main.tf file:

provider "azurerm" {
  subscription_id = "..."
  client_id       = "..."
  client_secret   = "..."
  tenant_id       = "..."
}

resource "azurerm_resource_group" "test" {
  name     = "test-rg"
  location = "West US 2"
}

resource "azurerm_virtual_network" "test" {
  name                = "test-vn"
  address_space       = ["10.0.0.0/16"]
  location            = "West US 2"
  resource_group_name = "${azurerm_resource_group.test.name}"
}

resource "azurerm_subnet" "test" {
  name                 = "test-sbn"
  resource_group_name  = "${azurerm_resource_group.test.name}"
  virtual_network_name = "${azurerm_virtual_network.test.name}"
  address_prefix       = "10.0.2.0/24"
}

resource "azurerm_network_interface" "test" {
  name                = "test-nic"
  location            = "West US 2"
  resource_group_name = "${azurerm_resource_group.test.name}"

  ip_configuration {
    name                          = "testconfiguration1"
    subnet_id                     = "${azurerm_subnet.test.id}"
    private_ip_address_allocation = "dynamic"
  }
}

resource "azurerm_virtual_machine" "test" {
  name                  = "test-vm"
  location              = "West US 2"
  resource_group_name   = "${azurerm_resource_group.test.name}"
  network_interface_ids = ["${azurerm_network_interface.test.id}"]
  vm_size               = "Standard_DS1_v2"

  delete_os_disk_on_termination = true

  storage_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "16.04-LTS"
    version   = "latest"
  }

  storage_os_disk {
    name              = "myosdisk1"
    caching           = "ReadWrite"
    create_option     = "FromImage"
    managed_disk_type = "Standard_LRS"
  }

  os_profile {
    computer_name  = "hostname"
    admin_username = "testadmin"
    admin_password = "Password1234!"
  }

  os_profile_linux_config {
    disable_password_authentication = false
  }
}


Step 2: Planning the execution

Now we browse into the directory with our main.tf file and run the command terraform init, which initializes various local settings and data that will be used by subsequent commands.

Secondly, we run the command terraform plan, which outputs the execution plan describing which actions Terraform will take in order to change the real infrastructure to match the configuration. The output format is similar to the diff format generated by tools such as Git. If terraform plan fails with an error, read the error message and fix the error that occurred. At this stage, it is most likely a syntax error in the configuration.

Step 3: Applying the plan

If terraform plan ran successfully, we are safe to execute terraform apply. Throughout the whole “apply” process Terraform informs us of the progress. Once Terraform is done, our environment is ready, and we can easily check that by logging in to our virtual machine. Also, our directory now contains a terraform.tfstate file, which is the state file corresponding to our newly created infrastructure.

Conclusion

This example was only a very simple one to show what a configuration file might look like. Terraform offers much more on top of that. Configurations can be packed into modules, self-contained packages that are managed as a group. This way we can create reusable, parameterizable components and treat these pieces of infrastructure as a black box. Besides that, Terraform can also perform provisioning of VMs and much more.

At Y Soft, we use robots to test our solutions for verification and validation aspects: we are interested in whether the system works according to the required specifications and what the qualities of the system are. To save time and money, it is possible to use a single robot to test multiple devices simultaneously. How is this done? It is very simple, so let’s look at it.

When performing actions to operate a given device, the robot knows where the device is located thanks to a calibration. The calibration file contains a transformation matrix that can transform a location on the device into the robot’s coordinate system. The file also contains information about the device that is compatible with the calibration. How the calibration is computed is covered in this article. There is also a calibration of the camera, which contains information about the region of interest, i.e. where exactly the device screen is located in the view of the camera. All of the calibration files are stored on the hard drive.

Example of calibration files for two Terminal Professionals:

{  
   "ScreenXAngle":0.35877067027057225,
   "DeviceId":18,
   "DeviceModelId":18,
   "DeviceName":"Terminal_Professional 4, 10.0.5.182",
   "MatrixArray":[[0.728756457140,0.651809992529,-0.159500429741,75.4354297376],[-0.683749126568,0.734998176419,-0.10936964140,71.1249458777],[0.0422532822652,0.187897834122,0.981120733880,-34.923427696],[0.0,0.0,0.0,1.0]] ,
   "ScreenSize":{  
      "Width":153.0,
      "Height":91.0
   }
}
{  
   "ScreenXAngle":0.25580856461537194,
   "DeviceId":27,
   "DeviceModelId":18,
   "DeviceName":"Terminal_Professional 4, 10.0.5.112",
   "MatrixArray":[ [0.713158843830,-0.686471581194,0.220191724515,-176.983055],[0.699596463347,0.6783511825194,-0.15148794414,-71.7788394],[-0.05297850752,0.2635031531279,0.963621817536,-29.83848504],[0.0,0.0,0.0,1.0] ],
   "ScreenSize":{  
      "Width":153.0,
      "Height":91.0
   }
}
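
The MatrixArray above is a 4×4 homogeneous transformation. As a minimal illustration (not the robot’s actual code), a point on the device screen, given in millimeters, maps into the robot’s coordinate system by a single matrix-vector multiplication:

import numpy as np

# MatrixArray of the first calibration file above
matrix = np.array([
    [ 0.728756457140,  0.651809992529, -0.159500429741,  75.4354297376],
    [-0.683749126568,  0.734998176419, -0.10936964140,   71.1249458777],
    [ 0.0422532822652, 0.187897834122,  0.981120733880, -34.923427696],
    [ 0.0,             0.0,             0.0,              1.0],
])

point_on_device = np.array([10.0, 20.0, 0.0, 1.0])  # x, y, z in mm, homogeneous 1
point_in_robot = matrix @ point_on_device
print(point_in_robot[:3])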


For the robot to operate on multiple devices, all of the devices must be within the robot’s operational range, which is quite limited, so this feature is currently only used for smaller devices like mobile phones and Terminal Professional. It is theoretically possible to use a single robot on more devices, but for practical purposes there are usually only 2. Also, all devices must be at relatively the same height, which limits testing on multifunctional devices with varying heights and terminal placements. Space is also limited by the camera’s range, so multiple cameras might be required; this is not a problem, as the camera calibration also contains the unique identifier of the camera. Therefore a robot can operate on multiple devices using multiple cameras, or just a single camera if the devices are very close to each other.

Before testing begins, the robot needs to have all the device calibrations available on the hard drive, and all action elements (buttons) need to be within its operational range. The test configuration contains variables such as DEVICE_ID and DEVICE2_ID, which need to contain the correct device IDs as stored in the robot’s database. Which tests will run on the devices and the duration of the tests also need to be specified. Tests used for these devices are usually measurement and endurance tests, which run in iterations. There are multiple variants of these tests. For example, let’s say we wish to run tests for 24 hours on two devices and each device should get an equal fraction of this time. One option is that the test runs for 12 hours on one device and 12 hours on the other, which is called consecutive testing. Another variant is simultaneous testing, in which the robot alternates between the devices after each iteration for a total time of 24 hours: it loads the calibration for the other device after each iteration and continues the test on that device, as sketched below. This is sometimes very useful: should one device become unresponsive, the test can continue on the second device for the remaining time. Results of each iteration of the test for each device are stored in a database along with other information about the test and can be viewed later.
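
A simultaneous run over two devices can be pictured as a simple round-robin loop. The following sketch is purely illustrative; load_calibration and run_iteration are hypothetical placeholders for the robot’s real routines:

import itertools
import time

DEVICE_IDS = [18, 27]      # DEVICE_ID and DEVICE2_ID from the test configuration
TOTAL_SECONDS = 24 * 3600  # total test duration shared by both devices

def load_calibration(device_id):
    """Placeholder: load the device's calibration file from the hard drive."""
    return {"DeviceId": device_id}

def run_iteration(device_id, calibration):
    """Placeholder: run one test iteration and store the results in the database."""
    print(f"iteration on device {device_id}")

deadline = time.time() + TOTAL_SECONDS
for device_id in itertools.cycle(DEVICE_IDS):  # switch device after each iteration
    if time.time() >= deadline:
        break
    run_iteration(device_id, load_calibration(device_id))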

Testing multiple devices with a single robot also makes it possible to test and compare different versions of an application or operating system (in this case on Terminal Professional) without ending the test, reinstalling the device and running the test again. This saves a lot of time and makes the comparison more accurate.

Reaction time measurement is the process of acquiring the timespan of how long it takes the tested device to change its state after clicking on an action element. The most common scenario is measuring the time needed to load a new screen after clicking on the button that invokes the screen change. This measurement directly reflects the user experience with the tested system.

So how does our robotic system do it?

The algorithm of reaction time measurement is based on the calculation of pixel-wise differences between two consecutive frames, which simply subtracts the pixel values of one image from those of another. Let’s have two frames labeled fr1 and fr2; there are three main types of difference computation:

  • Changes in fr1 according to fr2: diff = fr1 – fr2
  • Changes in fr2 according to fr1: diff = fr2 – fr1
  • Changes in both directions: diff = | fr1 – fr2 |

The last mentioned computation is called absolute difference and is the one used in our algorithm for reaction time measurement. In general, the input frames are grayscale 8-bit images of the same size. Computing differences for color images is possible, but it would only introduce more errors, since in the RGB color spectrum more variables depend on the surrounding lighting conditions. The final computed difference is just a number indicating the amount of change between two frames; therefore, it is perfect for detecting a screen change in a sequence of images.

Enough of theory, let’s make it work!

First of all, we need two images indicating the change of screen. For this purpose, I chose the following two pictures.

Imagine those are the two consecutive frames in the sequence of all frames we talked about earlier.

The next step is to convert them to grayscale and make sure they are of the same height and width.

After those necessary adjustments, we are ready to calculate the difference between them. As mentioned before, it is computed as an absolute subtraction of one image from the other. Nearly every computer vision library has a method implemented for this purpose; in OpenCV it is called absdiff. The following image shows the result of subtracting the two images above. As you can see, there is a visible representation of both images. That is completely fine, because every non-zero pixel tells us how much the given pixel differs from the same pixel in the second image. If the result image were black, it would mean the difference is zero and the images are identical; vice versa for a white image.

The next step is to sum the values of all pixels in the result image. Remember that each pixel value of a grayscale 8-bit image is in the range from 0 to 255. The difference for this image:

diff = 68964041

This value by itself is not very descriptive of the change between the two images, so normalization needs to be applied. The form of normalization we use transforms the computed difference into a percentage representation of the change on the screen, with a defined threshold. The threshold specifies what value of a pixel is high enough for the pixel to be classified as changed; so rather than computing the sum of all pixels in the result image, we count how many pixels are above the defined threshold. The normalized difference for this image:

diff_normed = 96.714 % (with threshold = 10)
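
Putting the steps together, the whole computation can be sketched in a few lines of Python with OpenCV (assuming the two frames are stored as frame1.png and frame2.png; the file names and the threshold are illustrative):

import cv2
import numpy as np

# load both frames directly as grayscale 8-bit images
fr1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
fr2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)
assert fr1.shape == fr2.shape, "frames must have the same width and height"

diff = cv2.absdiff(fr1, fr2)        # per-pixel |fr1 - fr2|

raw_diff = int(diff.sum())          # the raw sum, e.g. 68964041 above
threshold = 10
changed_pixels = np.count_nonzero(diff > threshold)
diff_normed = 100.0 * changed_pixels / diff.size
print(f"diff = {raw_diff}, normalized = {diff_normed:.3f} %")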

This normalized result, compared to the previous one, tells us quite precisely how much change happened between the two images. The algorithm to detect the amount of change between two images is just the first part of the whole time measurement process. In our robotic system, we have implemented two modes of reaction measurement: Forward and Backward reaction time evaluation.


Forward reaction time evaluation

Forward RTE is based on (nearly) real-time evaluation, meaning that the algorithm procedurally obtains data from an image source and processes them as they arrive. The algorithm does not have the ambition to find the desired screen immediately; it rather searches for screen changes, evaluates them and then compares them to the desired one.

The Forward RTE diagram shows the process flow of the algorithm. At the start, it sets the first frame as the reference image. Differences against this reference are then computed for incoming frames. If the computed difference is above the threshold, the frame is identified and the result is compared to the desired screen. If it does not match, the frame is set as the new reference and differences are calculated against it. If it does match, the timestamp of the image acquirement is saved and the algorithm ends. In theory, every screen change during measuring is identified only once; however, this strongly depends on the threshold value that the user needs to set. And although this algorithm tries to be real-time, the identification algorithms take so much time that this is not possible yet.
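
In pseudocode-like Python, the forward loop can be sketched as follows (a simplified sketch: normalized_diff is the difference computation from above, while frame_source and identify_screen stand in for the real camera feed and the identification step):

def forward_rte(frame_source, desired_screen, threshold):
    """frame_source yields (image, timestamp) pairs as they are acquired."""
    reference, _ = next(frame_source)        # the first frame is the reference
    for frame, timestamp in frame_source:
        if normalized_diff(reference, frame) > threshold:
            if identify_screen(frame) == desired_screen:
                return timestamp             # desired screen reached
            reference = frame                # a different screen: new reference
    return None                              # the desired screen never appeared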


Backward reaction time evaluation

Backward RTE works pretty much the other way around. Rather than searching for the desired image from the start, it waits for all images to be acquired, identifies the last frame and sets it as the reference, and after that looks for the first appearance of that reference in the sequence.

The Backward RTE diagram shows the process flow of the algorithm. First of all, it waits for all frames of the subsequence of frames. After all frames are acquired, the last frame is identified; if the last frame is the actual desired screen, the reference is set and the algorithm proceeds. If the last frame is not the desired screen, it means that the desired screen has not loaded yet or some other error has happened. For this case, the algorithm records backup sequences to provide additional consecutive frames. If there is no desired screen in those sequences either, the algorithm is aborted.

After the reference is set, the actual search starts. It looks for the first frame which is very similar to the reference one, using the difference algorithms described earlier. The found frame is identified and compared to the desired one. If the identified and desired screens match, the time of acquirement is saved and the algorithm ends. However, if they do not match, the sequence is shortened to start at the index of the falsely identified frame, and the algorithm searches further. The furthest possible index is the actual end of the sequence, because the image at the end of the sequence was identified as the desired one at the start of the RTE.
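
The backward search can be sketched in the same spirit (again simplified, with normalized_diff and identify_screen as placeholders); the shortening of the sequence after a false match corresponds to simply continuing the scan from that index:

def backward_rte(frames, desired_screen, threshold):
    """frames: the fully acquired list of (image, timestamp) pairs."""
    reference, _ = frames[-1]                # the last frame shows the desired screen
    for frame, timestamp in frames:
        if normalized_diff(reference, frame) <= threshold:   # "very similar"
            if identify_screen(frame) == desired_screen:
                return timestamp             # first appearance of the desired screen
            # false match: keep scanning from the next index onward
    return None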

Summary

This article described difference-based measurement of reaction time. It guided you through the computation of differences between two images and described our own two reaction time evaluation modes, which we use in practice.


Additional notes

To keep the description of the algorithms as readable as possible, a few adjustments have been omitted. Preprocessing of the images is an essential part, where the elimination of noise has a high impact on the stability of the whole algorithm. We have also implemented a few optimization procedures that reduce the amount of data that needs to be processed, e.g. bisection.