From question to solution, part 2: Accelerometer based movement detection

August 27, 2021

In part 1 of this blog post, we talked about the scientific method and how to apply it to business problems. In this part, we apply the method to a real-world problem, which started with a question from the client:

We would like to know when trains are present at the platform. We would like to have this information to be accurate within a few seconds. Is that possible?

– The client –

 

Answering this question required research, because no off-the-shelf solution can deliver this information. As discussed in part 1, we approached the problem using the scientific method. In short, five steps are applied iteratively until a solution is found: ‘Identify and analyse the problem’, ‘Form a hypothesis’, ‘Conduct experiments to test the hypothesis’, ‘Analyse Data’ and ‘Conclude on the results’. This part presents the process, the failures, the hurdles and finally the working solution.

Iteration 1: Where to even start?

Step 1: Identify and analyse the problem

The very first step is always to translate the problem into a research question that is measurable or quantifiable. In this case we arrived at the following:

“Is it possible to detect the arrival and departure of a train at a platform with time resolution of a few seconds using a smartphone?”

 

After this, the focus shifts towards analysing the problem. What do we know? What do we need? What can we measure? What are our limitations?

For the current problem it is important to know what characterises arrival at and departure from a platform. These are the states that need to be detected, so we need a way to identify them. Arrival can be identified as the velocity being 0 m/s while the location is at the platform; departure as the velocity no longer being 0 m/s and the location no longer being at the platform.
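
As a minimal sketch, these two definitions can be written down as a check on consecutive samples. The names `Sample`, `is_stopped_at_platform` and `transition`, and the speed tolerance of 0.5 m/s, are assumptions made for illustration; the sketch also simplifies departure to "no longer stationary at the platform".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    speed_mps: float    # measured speed in metres per second
    at_platform: bool   # True when the position lies within the platform area


def is_stopped_at_platform(sample: Sample, speed_tolerance_mps: float = 0.5) -> bool:
    """A train counts as present when it is at the platform and (almost) stationary."""
    return sample.at_platform and sample.speed_mps <= speed_tolerance_mps


def transition(previous: Sample, current: Sample) -> Optional[str]:
    """Return 'arrival', 'departure' or None for two consecutive samples."""
    was_stopped = is_stopped_at_platform(previous)
    now_stopped = is_stopped_at_platform(current)
    if now_stopped and not was_stopped:
        return "arrival"
    if was_stopped and not now_stopped:
        return "departure"
    return None
```

The tolerance is there because a measured speed is rarely exactly 0 m/s, even when the train is standing still.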

How can these properties be measured? The constraints of the problem require the use of a smartphone, and a smartphone alone. Luckily for us, all modern smartphones have GPS, which delivers position coordinates with sufficiently high accuracy. The velocity is also present in the received data with very good accuracy, as it is measured using the Doppler shift of the carrier frequencies.

In our approach, we always start by drawing on the same pool of known truths and assumed knowledge. This forms the starting point for tackling a problem, as it is the best way to make an educated proposition for a solution. When there are no known truths, a preliminary investigation is required first. In this case, however, we had our earlier work to fall back on.

Step 2: Form a hypothesis

Then it is time to form a hypothesis: a statement that forms an educated proposition for the identified problem. This will often take the form of “When we do A, then B will happen.” Based on the analysis in step 1 we proposed the following hypothesis:

“When we collect the position and velocity with a GPS, we will be able to reliably determine the arrival and departure at a platform based on the position and velocity characteristics and with a good time resolution.”

This hypothesis is measurable: we can set up an experiment in which labelled data points are used to validate whether the predicted states correspond to the labelled states.

Step 3: Conduct experiments to test the hypothesis

In step 3 the focus lies on building a proof of concept that covers only the required data and the proposed solution. This minimizes possible side effects and avoids getting lost in details that only matter once the hypothesis is proven.

We built an application that stored GPS coordinates and velocity, labelled as either driving or standing still. From earlier experience on this topic, we knew that GPS can suffer from poor accuracy, drift and jumps. Therefore, on top of storing the GPS data, filtering was applied to the location updates. Rather than ignoring some location updates, we adjusted them with filters that apply predictions based on the accelerometer, the physical limitations of trains, and the predicted route. This resulted in a significant improvement of the location data.
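
To illustrate only the "physical limitations of trains" part of such a filter, the sketch below adjusts a location update that would imply an impossible speed, instead of discarding it. It is not the filter used in the project: the function name `limit_jump`, the planar coordinates in metres and the maximum speed of 45 m/s are all assumptions.

```python
import math


def limit_jump(prev_xy, new_xy, dt_s, max_speed_mps=45.0):
    """Pull an implausible position update back to a physically reachable point.

    prev_xy, new_xy: (x, y) positions in metres in a local planar frame.
    dt_s: time between the two updates in seconds.
    max_speed_mps: assumed upper bound on the train speed.
    """
    dx = new_xy[0] - prev_xy[0]
    dy = new_xy[1] - prev_xy[1]
    distance = math.hypot(dx, dy)
    max_distance = max_speed_mps * dt_s
    if distance <= max_distance:
        return new_xy  # plausible update, keep it unchanged
    # Keep the direction of the update, but shorten it to the reachable distance.
    scale = max_distance / distance
    return (prev_xy[0] + dx * scale, prev_xy[1] + dy * scale)
```

Adjusting rather than dropping the update keeps the output rate of the location stream intact, at the cost of a position that lags behind a genuine fast movement.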

Step 4: Analyse Data

The data obtained in the experiment is then analysed to determine whether the hypothesis is proven, disproven, or should be adjusted. The conclusions from the gathered data were the following:

  • Very good results in detecting arrival and departure at a platform
  • Stability of the system was very good
  • Velocity is not reliable: depending on the origin of the location data, it is not always present
  • Time resolution is lacking: the output rate of the location is not predictable

Step 5: Conclude on the results

The results from the first iteration were good. However, the proposed solution fails on one essential requirement of the original question: the time resolution. It is not possible to build a system with a reliable output rate, because a train is effectively a gigantic Faraday cage that can block the signal.

Iteration 2: Back to the drawing board!

Step 1: Identify and analyse the problem

The first iteration did not deliver the expected results. However, this did not deter us from continuing our search for a suitable solution. The earlier results showed that time resolution had been overlooked as an important factor in the solution. Furthermore, new skills were built up during the development of the first experiment: the accelerometer gives useful data at a reliable output rate. This can be combined with an orientation vector obtained from the built-in gyroscope and magnetometer to give directional acceleration.
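
A minimal sketch of what "directional acceleration" means: the device-frame accelerometer reading is rotated into a fixed world frame using the device orientation. How the orientation is obtained is platform specific and not shown here; the function name `to_world_frame` and the representation of the orientation as a 3x3 device-to-world rotation matrix are assumptions.

```python
def to_world_frame(rotation, acceleration):
    """Rotate a device-frame acceleration vector into the world frame.

    rotation: 3x3 device-to-world rotation matrix as a sequence of rows.
    acceleration: (x, y, z) accelerometer reading in the device frame.
    """
    return tuple(
        sum(r * a for r, a in zip(row, acceleration))
        for row in rotation
    )


# Example: a device rotated 90 degrees around the vertical axis.
ROTATE_90_DEG_ABOUT_Z = (
    (0.0, -1.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0),
)
world_acceleration = to_world_frame(ROTATE_90_DEG_ABOUT_Z, (1.0, 0.0, 0.0))
```

In this example an acceleration along the device's x axis ends up along the world's y axis, independent of how the phone is held.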

We expect the acceleration pattern to differ between modes of transportation and, within each mode, between states. Standstill at a platform will show a walking acceleration pattern, while driving will show a superposition of the walking acceleration pattern and that of the train in motion. Based on successful work that we at Flow Pilots were doing at the time, we chose to use a Naïve Bayes algorithm for the pattern detection. The Naïve Bayes algorithm is a supervised learning algorithm that results in a probabilistic classifier. The goal is to predict standstill or driving based on differences in the measured properties.

Step 2: Form a hypothesis

Based on the initial research of the second iteration problem, the following hypothesis was made:

“A Naïve Bayes classifier can be used to differentiate between a walking acceleration pattern and its superposition with a train-in-motion acceleration pattern.”

Step 3: Conduct experiments to test the hypothesis

Again, an application was built to gather data labelled with the correct state. Afterwards we trained the Naïve Bayes classifier on different features of the gathered data. These included the maximum, the minimum, the mean, the norm and the standard deviation over a certain time window, as well as the distance to the platform and the minimal velocity required to reach the platform within a certain time window.
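
The sketch below shows the general idea with a small Gaussian Naïve Bayes classifier trained on windowed accelerometer features. It is an illustration under assumptions, not the classifier from the experiment: the names `window_features` and `GaussianNaiveBayes` are made up, only a subset of the features (maximum, minimum, mean and standard deviation of the acceleration norm) is used, and a Gaussian likelihood per feature is assumed.

```python
import math
from statistics import mean, pstdev


def window_features(window):
    """Features for one time window of (x, y, z) accelerometer samples."""
    norms = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
    return [max(norms), min(norms), mean(norms), pstdev(norms)]


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and per class."""

    def fit(self, rows, labels):
        self.priors = {}
        self.stats = {}
        for label in set(labels):
            members = [row for row, l in zip(rows, labels) if l == label]
            self.priors[label] = len(members) / len(rows)
            columns = zip(*members)
            # Store (mean, variance) per feature; floor the variance to avoid
            # division by zero when a feature is constant within a class.
            self.stats[label] = [
                (mean(col), max(pstdev(col) ** 2, 1e-9)) for col in columns
            ]
        return self

    def _log_posterior(self, row, label):
        total = math.log(self.priors[label])
        for value, (mu, var) in zip(row, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (value - mu) ** 2 / (2 * var)
        return total

    def predict(self, row):
        """Return the label with the highest (unnormalised) log posterior."""
        return max(self.priors, key=lambda label: self._log_posterior(row, label))
```

A classifier like this only separates the classes well when the feature distributions of standstill and driving do not overlap too much.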

Step 4: Analyse Data

Several classifiers were trained and tested, and confusion matrices were generated for all of them. A confusion matrix indicates how often the prediction matched reality and how often the data was labelled incorrectly. In the confusion matrix for one of these classifiers, driving is identified correctly 94% of the time. Standstill, however, is identified correctly only 48% of the time; the other 52% of the time it is labelled as driving by the Naïve Bayes classifier.
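
For reference, a row-normalised confusion matrix of this kind can be computed as in the sketch below. The function name `confusion_matrix` and the nested-dictionary layout are assumptions for illustration; each row holds the fractions of one actual state that were predicted as each label.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return matrix[actual_label][predicted_label] as a fraction per actual label."""
    counts = Counter(zip(actual, predicted))
    matrix = {}
    for a in labels:
        row_total = sum(counts[(a, p)] for p in labels)
        matrix[a] = {
            p: (counts[(a, p)] / row_total if row_total else 0.0)
            for p in labels
        }
    return matrix
```

Normalising per row is what makes statements such as "standstill is labelled as driving 52% of the time" directly readable from the matrix.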

Step 5: Conclude on the results

Based on all the confusion matrices that were generated, it is safe to say that the trained Naïve Bayes classifiers lack the accuracy required for our problem. The features that were used had a large overlap, and the classifiers were not able to differentiate between standstill and driving. However, while looking at the data we noticed that although the classifiers could not pinpoint the transient nature of arrival and departure, there might be another way to determine it.

Iteration 3: It is all about the change

Step 1: Identify and analyse the problem

The conclusions from the two previous experiments were that velocity and position alone lacked a good time resolution, and that probabilistic models tend to miss the transient nature of arrival and departure. With this lead on the transient nature of the problem, it becomes even clearer that the interesting property is not the velocity or the position, but the acceleration. Arrival can be defined as a deceleration followed by a period of no acceleration, while departure is an acceleration preceded by a period of no acceleration.

Step 2: Form a hypothesis

The following hypothesis was provided:

“It is possible to use the transient nature of arrival and departure at a platform to detect the arrival and departure with a good time resolution and good accuracy.”

 

Step 3: Conduct experiments to test the hypothesis

As the hypothesis focuses on using accelerometer data to determine arrival and departure, the data from the previous iteration was reused. As we had stumbled into a completely new field of research, the goal of the experiment itself was to build a signal analysis algorithm that can classify arrival and departure.

In the following steps multiple graphs are added. The red background indicates that the train was at the platform, green indicates that it was in motion. 

Step a: Look at the data

What can we see? What is there? Do we notice any patterns that can help tell a computer program what to classify as what?

Below you see a graph of the Y-channel of the accelerometer. We decided to start without projecting the acceleration onto the real-world axes, so that the data could be used as raw as possible and the required computing power kept low.

There wasn’t much to tell our program with this data: it looks mostly random.

Step b: Perform data manipulations to improve the clarity

We know that the velocity while driving and at standstill can be considered constant. As mentioned before, arrival can be defined as a deceleration followed by a period of no acceleration, while departure is an acceleration preceded by a period of no acceleration. Thus, the velocity is no longer constant at arrival and departure at a platform.

If the acceleration is accumulated over the complete timeframe, the graph below is obtained. It shows the same trend as the velocity due to the characteristics of the measurements. At arrival and departure one can see changes in the gradient of the curve. That is something that can be used to tell the program what is happening in the real world. However, it still requires more signal processing.
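
Accumulating the acceleration amounts to a running sum of the samples, which approximates the velocity up to an unknown offset. A minimal sketch, in which the function name `cumulative_acceleration` and the fixed sample interval `dt_s` are assumptions:

```python
from itertools import accumulate


def cumulative_acceleration(samples, dt_s=1.0):
    """Running sum of acceleration samples, each weighted by the sample interval."""
    return list(accumulate(a * dt_s for a in samples))
```

Any constant bias in the sensor is accumulated as well, so the resulting curve drifts over time and cannot be read as an absolute velocity.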

Step c: Calculate the gradient

Taking the gradient of the cumulative curve directly would simply return the original data, because of the way the curve was calculated. Therefore, extra data manipulation is done: the cumulative curve is smoothed using a moving average over a period of 1 second. On a macroscopic level the curve looks exactly the same as above, but the noise in the data is smoothed out. Taking the gradient now results in the following graph (all three channels are added for completeness).
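
This step can be sketched as follows: accumulate, smooth with a one-second moving average, then take the difference between consecutive values. The function names and the choice of a trailing (rather than centred) window are assumptions, and the gradient is expressed per sample for simplicity.

```python
from itertools import accumulate


def moving_average(values, window):
    """Trailing moving average; the first values use a shorter window."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the sample that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def gradient(values):
    """Difference between consecutive values (gradient per sample)."""
    return [b - a for a, b in zip(values, values[1:])]


def smoothed_gradient(acceleration, sample_rate_hz):
    """Gradient of the cumulative curve after smoothing it over one second."""
    window = max(1, int(sample_rate_hz))  # number of samples in one second
    cumulative = list(accumulate(acceleration))
    return gradient(moving_average(cumulative, window))
```

The net effect is a low-pass filtered version of the original acceleration: slow changes survive, sample-to-sample noise is averaged out.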

Step d: Clean up the data

The current data is still very noisy and very difficult for a program to interpret. In the standstill regions (red) there is clearly very little acceleration, while in the driving regions large fluctuations are seen. This contradicts the belief that velocity is constant when in motion; the fluctuations are due to vibration and to the acceleration and deceleration of the train during movement. This finding actually improves the probability of finding a working solution, because the two states themselves differ, and not only the transitions between standstill and driving.

To clean up the data, a high-pass filter is used to remove the drift (green and blue curves not centred around 0) and electronic noise (constant frequency in the green curve). This gives the following results:
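
As an illustration of the drift removal only, the sketch below implements a simple first-order high-pass filter. The actual filter type and cut-off frequency used in the experiment are not given, so the function name `high_pass` and its parameters are assumptions.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order (RC-style) high-pass filter that removes offset and slow drift."""
    if not samples:
        return []
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + dt)
    filtered = [0.0]  # start at zero so a constant offset is removed immediately
    for i in range(1, len(samples)):
        # Pass on changes between samples, let the accumulated level decay.
        filtered.append(alpha * (filtered[-1] + samples[i] - samples[i - 1]))
    return filtered
```

A constant input produces an all-zero output, which is what centres the curves around 0; a sudden change passes through and then decays.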

Step e: Ignore sign of the acceleration

As we are not interested in the directionality of the acceleration, the information about the sign and the channels can be removed by taking the absolute value and summing the 3 channels together. To reduce the effect of the noise that is still present in the data, a cut-off is applied: when the acceleration is not significant enough, it is ignored.
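
A minimal sketch of this step, with the function name `activity` and the cut-off value left as assumptions to be tuned on real data:

```python
def activity(x, y, z, cutoff):
    """Sum of absolute accelerations over the three channels, zeroed below the cut-off."""
    result = []
    for ax, ay, az in zip(x, y, z):
        total = abs(ax) + abs(ay) + abs(az)
        result.append(total if total >= cutoff else 0.0)
    return result
```

The output is a single non-negative signal that is exactly zero whenever nothing significant is happening, which makes the rules in the next step easy to express.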

 

With this data, it becomes easy to write a very simple algorithm to determine the arrival and departure. Again, as mentioned before, arrival can be defined as a deceleration followed by a period of no acceleration, while departure is an acceleration preceded by a period of no acceleration. This is clearly visible in the graph. Applying some simple rules gives us the final result. In the curve below, +1 stands for departure and -1 for arrival at the platform.
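
The rules themselves are not spelled out here, so the following is one possible sketch rather than the algorithm that was built. It assumes a cleaned-up signal in which 0 means "no significant acceleration", and a hypothetical parameter `min_quiet_samples` for how long the quiet period must last before it counts as standstill.

```python
def detect_events(activity_signal, min_quiet_samples):
    """Return a list with -1 at arrivals, +1 at departures and 0 elsewhere."""
    events = [0] * len(activity_signal)
    quiet_run = 0
    stopped = False  # assumption: the recording starts while the train is moving
    for i, value in enumerate(activity_signal):
        if value == 0:
            quiet_run += 1
            if not stopped and quiet_run == min_quiet_samples:
                # Motion followed by a long enough quiet period: arrival,
                # marked at the sample where the quiet period started.
                stopped = True
                events[i - min_quiet_samples + 1] = -1
        else:
            if stopped:
                # First motion after a confirmed quiet period: departure.
                events[i] = 1
                stopped = False
            quiet_run = 0
    return events
```

Note that in this sketch an arrival can only be confirmed once the quiet period has lasted `min_quiet_samples`, so a live system would report it with that delay.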

Step 4: Analyse Data

The analysis of the data was of course mainly done during the experiment. However, we also tested our algorithm against data that was not used to build it. There are clearly some false positives, but no false negatives were found, which makes optimizing the result much easier: you cannot create data from nothing, but you can combine the measurements with A-GPS data to filter out the false positives.
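
One way such a combination could look is to keep only the detected events whose location fix lies close to a known platform. This is a sketch under assumptions: the function names, the event layout `(kind, latitude, longitude)` and the radius of 200 m are all hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def keep_events_near_platform(events, platform, radius_m=200.0):
    """Drop events whose location fix is further than radius_m from the platform.

    events: list of (kind, latitude, longitude) tuples.
    platform: (latitude, longitude) of the platform.
    """
    return [
        event for event in events
        if haversine_m(event[1], event[2], platform[0], platform[1]) <= radius_m
    ]
```

Because the accelerometer already provides the timing, the location only needs to be roughly right here, so its irregular output rate matters much less than in the first iteration.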

Step 5: Conclude on the results

The algorithm developed in this iteration of the scientific method confirmed our last hypothesis:

“It is possible to use the transient nature of arrival and departure at a platform to detect the arrival and departure with a good time resolution and good accuracy.”

It can be used to solve the client’s problem. The results have a very good accuracy and truly match the time resolution requirement.

In this part of the blog post, we have described the complete process we went through to come up with a solution for a highly complex problem. By taking the necessary steps, iterating through them multiple times, carrying newly learned truths forward, and discarding assumed knowledge, it was possible to come up with a reliable solution.

The combination of known mathematics with GPS data can truly detect arrival and departure at a platform using only a smartphone and with a time resolution of less than 1 second.