Dr. Daniel Nikovski, Group Manager of the Data Analytics Team at Mitsubishi Electric Research Labs (MERL) in Cambridge, Massachusetts, explains his work, including his collaboration with prominent academic researchers on advanced primitives for time series analysis, and traces the progression of this technology's development. He also explains its importance, power, and uses.

Video Transcript

[0:00] Ted Hill, President and CEO of ICONICS

I'm pleased to introduce Daniel Nikovski from MERL. Daniel holds a PhD degree in robotics from Carnegie Mellon in Pittsburgh and joined MERL in 2002. He's the Group Manager of the Data Analytics Team, and his team does primary research in algorithms. Daniel has published over 150 papers and holds 44 US patents; it's a little intimidating to introduce him. Today, Daniel is going to be sharing some of the work that his team did a few years ago with the University of California on fault detection from time series data. Anyone who works with ICONICS knows that we have stored an awful lot of time series data, so I think some of the things we're investigating that we can do together are pretty exciting. We're looking at how we can add some of the capabilities that Daniel is going to talk about to our Hyper Historian and analytics products. Daniel, welcome to Connect 2021. It's great to have somebody here live. Thank you.

[1:18] Dr. Daniel Nikovski, Group Manager of the Data Analytics Team for Mitsubishi Electric Research Labs (MERL)

Yes, thank you. Thank you, Ted, for the introduction and also for the invitation to speak here at this ICONICS event today. Good morning, everybody here in Foxborough and also all over the world. My name is Daniel Nikovski. Today, I will be talking about a class of data analytics technologies that is very applicable to the analysis of large data sets in the form of time series. They're called advanced primitives for time series analysis. At this point, you might be wondering how something can be both primitive and advanced, and I'd be very happy to explain what I mean by this. But before that, let me give you an overview of the kind of work that we do in our lab, Mitsubishi Electric Research Labs, or MERL. It is, by and large, a computational research lab, and we're working on artificial intelligence, including machine learning, optimization, control, signal processing, and physical modeling and simulation. We have about 62 researchers here in the Cambridge location, but we are part of the central R&D division of our company, which is 2,000 researchers strong, and we work very closely with our collaborators in the domestic labs in Japan. At the same time, we are one of the most academically oriented labs, and we're very well connected to various university labs, from which we take a lot of technology and develop it further for the benefit of Mitsubishi Electric and its subsidiary companies. We publish a lot: more than 150 publications per year, with more than 100 patents filed every year. MERL is one of the top 10 producers of IP in the state of Massachusetts, even though only about 62 people produce it. And this year, we're celebrating the 30th year since the establishment of the lab, so it has been 30 years of great innovation. Specifically, my group works on technologies for making better decisions from data, and all of this work revolves around data, models, and decisions.
The two main classes of technologies are predictive modeling, where we analyze the data and build models, and decision making and optimization, where we take these models and, based on them, make optimal decisions about the performance of equipment.

[4:00]

And time series analysis, the topic that I'll be talking about today, is in the former category of predictive modeling. Today, Ted talked about applications; 375,000 of them deployed by ICONICS. In many of these cases, the way to build these applications is to store your data in a suitable data storage system. It could be a relational database management system, though a historian for time series data is better suited. In many cases, though, the actual application is built right on top of the data system, and it is the responsibility of the system developer to access this data in an efficient manner. This can be very laborious and very costly. At the same time, it's quite possible that many methods are reinvented and reimplemented, which results in inefficiencies. These 375,000 applications must have some commonalities which can be leveraged to make the application development process more efficient. So, a better way is to recognize these commonalities and try to reduce the applications to prototypical classes of tasks. In many of the examples that Ted gave, some of these tasks came up: fault detection, fault diagnostics, and fault prognostics. They were in the Continental example; they were in the car manufacturer example. I think they were also in the application from the previous speaker. When we recognize these commonalities, we can map them to prototypical analytics tasks. For example, fault detection can be mapped to anomaly detection; fault diagnostics is mapped to classification, since we want to know which one of several classes of faults we're dealing with; and fault prognostics is, well, prognostics. Now, these tasks, classification, anomaly detection, and prognostics, have been studied in the field of AI and machine learning as general, abstract problems. So, if we can come up with general-purpose advanced solutions, we can improve tremendously the speed of analysis and the efficiency of these algorithms.
At the same time, we increase the size of the datasets that we can possibly handle. It also allows for domain-independent technology optimization, where we can use advanced hardware in addition to advanced algorithms. Now, how exactly can we do this in the case of time series analysis?

[7:00]

It turns out that a very good way to compute analytics efficiently is to use so-called time series analysis primitives. These primitives are lower-level tasks, together with data structures and algorithms for their discovery, that we can use to solve the actual analytics problems. Moreover, each of these primitives can be handled by dedicated, very high-performance algorithms. Many research labs in academia and also in industry are working hard to optimize these algorithms. Some examples of such primitives are shapelets, discords, and motifs. The reason I'm calling them primitives is that they are below the level of the actual application, so they are more basic, more primitive; at the same time, so much work has gone into optimizing their computation that the current technology is very advanced. One example is shapelets. What are shapelets? Shapelets are patterns within time series that are maximally distinctive, and they can help distinguish between two classes of time series. Here on the left-hand side graph, you see time series from two different devices: a dishwasher and an oven cooker. These are power consumption traces; you can see that the power consumption has some common elements that distinguish one device from the other. They're not exactly the same; they're similar, but not exactly the same. So, the question is, how can we find these shapelets, these patterns, in the fastest possible manner? A lot of work has gone into this. Shapelets were proposed originally by Professor Eamonn Keogh from the University of California at Riverside and his students more than a decade ago, and since then, they have been used for all kinds of classification, anomaly detection, and clustering tasks. That's why a shapelet is such a primitive: it can be used for so many different things. And, in collaboration with this lab, we have been able to extend shapelets to the case of prognostics, where the problem is harder; one of the students from this lab came to our lab as an intern.
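[Editor's note] As a rough illustration of the core computation behind shapelets (a minimal sketch, not the optimized algorithms referenced in the talk; the spike shapelet, the two synthetic device series, and the function name are invented for this example), a candidate shapelet is scored against a time series by the smallest z-normalized distance between the shapelet and any subsequence of the same length:

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Distance from a candidate shapelet to a time series: the minimum
    z-normalized Euclidean distance over all subsequences of the same
    length as the shapelet. A small distance means the series contains
    the shapelet's shape somewhere."""
    m = len(shapelet)
    s = (shapelet - shapelet.mean()) / shapelet.std()
    best = np.inf
    for i in range(len(series) - m + 1):
        window = series[i:i + m]
        sd = window.std()
        if sd == 0:
            continue  # a flat window cannot match a non-flat shape
        w = (window - window.mean()) / sd
        best = min(best, float(np.linalg.norm(w - s)))
    return best

# Toy data: a "spike" shapelet separates a spiky device from a smooth one.
spike = np.array([0.0, 1.0, 0.0])
device_a = np.array([0, 0, 1, 0, 0, 0, 1, 0], dtype=float)  # contains spikes
device_b = np.array([0, 1, 2, 3, 3, 2, 1, 0], dtype=float)  # smooth ramp
print(shapelet_distance(device_a, spike), shapelet_distance(device_b, spike))
```

A classifier then simply thresholds this distance: series whose best match to the shapelet is close enough go into one class, all others into the other.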
So, it is a kind of classification, but from weakly labelled data. This arises when we have multiple time series from devices that are run to failure. It is very common: you record the data, and at some point your device fails; another device of the same model is run again, and it also fails. We would like to know what subsequence in these time series is predictive of the failure. We know that the devices failed, but this is a weak classification label, because we don't know what part of each time series was normal and what was not. If we knew what the normal part was, we could apply the standard shapelet algorithm and discover what distinguishes one from the other. However, in this case, we do not know it. Conversely, if we knew what the predictive pattern was, then we could also trivially separate the time series by where that pattern occurs and where it does not. However, we have to solve the two problems at the same time. It's not a trivial problem, but we were able to solve it, again in collaboration with this university lab.

[10:46]

And we were awarded a patent for this technology in 2015. So, this is an example of how we take precompetitive research from some of the top labs in the country and in the world and develop it, creating IP that we can then contribute to the subsidiaries of Mitsubishi Electric. Another class of primitives is motifs and discords. A discord of a time series is its most dissimilar pattern with respect to the same, or maybe a different, time series. Here on the left-hand side graph, you see that there is a subsequence which looks strange, right? It is the one in red; it is not similar to anything else happening in this time series. Most humans would agree this is anomalous. So how do we make the computer recognize this as an anomaly? In this case, we would like to find very quickly the subsequence in this time series that is most dissimilar to everything else. This is obviously very useful for anomaly detection. Motifs are just the opposite thing: a motif is the pair of subsequences which are most similar. Here on the right-hand side, you see that there are three patterns, three subsequences, the ones that are colored, which are quite similar to each other. They're not the same, but they're similar to each other and distinctly different from everything else. So how do we find these in the fastest possible way? If you take the usual brute-force approach, it is very computationally inefficient, and you cannot really apply it to long time series. So, a lot of algorithms for the fast computation of motifs and discords have been proposed over the last 20 years or so. Until about five years ago, you needed a separate algorithm for each primitive; there were even dozens of algorithms for each one of these primitives. That is one way of doing things, but there might be a better way, which was, again, proposed by the same lab at the University of California at Riverside.
An important innovation was the discovery and implementation of the so-called matrix profile of a time series, which can be applied to the computation of many primitives. You can think of it as the Swiss Army knife of the time series analysis domain, and it has been widely considered one of the top developments in the field of time series analysis in the last 10 years. It was introduced about five years ago. So, what is the matrix profile of a time series? It is a companion time series of approximately the same length that stores, for each subsequence (you decide on the subsequence length; in this case, it's m samples), the nearest distance from that subsequence to any other subsequence of the same length anywhere in this time series, or in a different time series if you want to compute it across time series. You also store exactly which subsequence that nearest neighbor is. So, the matrix profile can be used to discover motifs, discords, and also other primitives, and you can see why this is so easy once you have it: the discord of a time series is simply the point with the highest value in the matrix profile.
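[Editor's note] To make the definition concrete, here is a naive brute-force sketch of a matrix profile (invented toy data; for simplicity it uses plain rather than z-normalized Euclidean distances, whereas the standard matrix profile z-normalizes each subsequence, and real implementations are far faster than this O(n²) loop):

```python
import numpy as np

def matrix_profile(ts, m):
    """Naive matrix profile: for each length-m subsequence, store the
    Euclidean distance to its nearest neighbor elsewhere in the series
    (skipping trivial matches that overlap itself) and that neighbor's
    index."""
    n = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n)], dtype=float)
    profile = np.full(n, np.inf)
    index = np.zeros(n, dtype=int)
    excl = m // 2  # exclusion zone: ignore near-identical overlapping windows
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            d = float(np.linalg.norm(subs[i] - subs[j]))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

# Repeating spike pattern with one anomalous bump in the middle.
ts = [0, 1, 0, 0, 1, 0, 3, 2, 1, 0, 1, 0, 0, 1, 0]
profile, index = matrix_profile(ts, m=3)
print("discord starts near", int(np.argmax(profile)))  # overlaps the bump
print("motif distance", float(profile.min()))          # 0: exact repeats
```

The highest profile value marks the discord (the bump has no close match anywhere), and the lowest marks the motif (the repeated spike pattern matches itself exactly).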

[14:49]

This is because the meaning of the matrix profile is how similar each subsequence is to its closest neighbor in the time series. If this value is high, the subsequence is a discord; it's not similar to anything else. The motif is just the opposite: the lowest value of the matrix profile. So, once you have this magical time series, the matrix profile, everything else falls out; it's very easy to compute these primitives. And it turns out that the matrix profile can be computed very efficiently and also makes very efficient use of modern hardware, including GPUs. We're working together with the university on extensions such as the localized matrix profile and the pan matrix profile, which is for when you don't know the length of the subsequence and want to compute the profile for all possible subsequence lengths. Another development is that the proposal of this tool, the matrix profile, has enabled the creation of completely new time series primitives, and one of them is time series chains, which have an interesting history. The concept was proposed by a researcher from the Information Technology Center in Japan (we saw a presentation from the center earlier during this keynote), Dr. Imamura, who is now a professor at Osaka University. The concept here is very similar to motifs; however, there is a direction of change in these motifs. An example is given on the left-hand side: this is data from a freezer. Every half hour there is a pattern, but you can see that this pattern evolves; it doesn't stay in one place. In the middle graph, we show the difference between a time series chain and a motif in an abstract visualization space: a motif is the red points and the magenta points, which change and fluctuate but stay in the same locality. A time series chain is when the pattern evolves in a particular direction, signifying change. So, if we can model this movement, then we can detect gradual change.
And if we can establish at what point during this evolution a failure occurs, then it is very easy to build a prognostic system. Before the matrix profile was discovered, it was computationally infeasible to find these time series chains efficiently. But after this discovery, it turns out there is a very efficient algorithm to do it. Here's how it works. In the right-hand side graph, you can see a sequence of values. Suppose the length of the subsequence is just one; a time series chain is embedded there. It is the sequence 1, 2, 3, 4, and 5, and there are some spurious values in between. That's what we want to discover. How can we do it? It turns out that if we compute the left matrix profile, that is, find the nearest neighbor on the left, and the right matrix profile, that is, find the nearest neighbor on the right, then an element of this time series belongs to a time series chain if it is the left nearest neighbor of its right nearest neighbor. It's as simple as that. So, once we have the matrix profile, we can discover the chains very quickly. We discovered this algorithm, again with a student from this lab, and it won us the best student paper award at the International Conference on Data Mining in 2017.
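[Editor's note] The left/right nearest-neighbor rule described above can be sketched for the subsequence-length-one case (the input sequence and function names here are invented for illustration, with absolute difference as the distance; the real algorithm operates on z-normalized subsequences via the left and right matrix profiles):

```python
def chain_links(x):
    """For subsequence length 1: find each element's left and right
    nearest neighbors by absolute difference, then keep a link i -> r
    only when i is also the left nearest neighbor of its own right
    nearest neighbor r."""
    n = len(x)
    left = [None] * n   # index of left nearest neighbor
    right = [None] * n  # index of right nearest neighbor
    for i in range(n):
        if i > 0:
            left[i] = min(range(i), key=lambda j: abs(x[i] - x[j]))
        if i < n - 1:
            right[i] = min(range(i + 1, n), key=lambda j: abs(x[i] - x[j]))
    return {i: right[i] for i in range(n)
            if right[i] is not None and left[right[i]] == i}

def longest_chain(x):
    """Follow valid links from every start point; return the longest chain."""
    links = chain_links(x)
    best = []
    for start in range(len(x)):
        chain, i = [start], start
        while i in links:
            i = links[i]
            chain.append(i)
        if len(chain) > len(best):
            best = chain
    return best

x = [7, 1, 30, 2, 16, 3, 42, 4, -9, 5]  # 1..5 hidden among spurious values
idx = longest_chain(x)
print([x[i] for i in idx])  # prints [1, 2, 3, 4, 5]
```

The drifting pattern 1, 2, 3, 4, 5 is recovered even though spurious values sit between its elements, because each element of the chain is the left nearest neighbor of its right nearest neighbor.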

[18:57]

So, with these examples, I wanted to make the point that time series analysis primitives are a very powerful set of tools. They are domain independent; they can be used in various analytical problems in the applications that came up during the discussions here today, for fault detection, diagnostics, and prognostics, as well as abrupt and gradual change detection. The algorithms for the discovery of these primitives have been optimized and have fantastically fast performance: speedups of a thousand times, and even millions of times, are possible using advanced computer science techniques and modern hardware. This is still an area of active research in academia and industry, and our lab has been working together both with academic partners and with our collaborators in the domestic labs in Japan for the benefit of the customers of ICONICS and other parts of Mitsubishi Electric. With this, I will conclude. Thank you very much for your attention.