Machine learning allows users to exploit the information hidden within large datasets. This is not the top-level information of the raw data itself, but the secondary information that can be extracted by uncovering hidden patterns.
There are two main forms of machine learning: supervised and unsupervised.
Supervised learning requires a labelled dataset, in which the different classes of behaviour have been identified beforehand. This data is fed into a machine learning algorithm, allowing it to learn the factors that differentiate the classes. The result is a model that can classify brand new data, predicting which of the original classes it belongs to. This classification process is used for tasks such as handwriting recognition, where a system learns what the different letters look like from huge labelled datasets covering many writing styles. The system can then “read” handwritten documents it has never seen before, identifying letters with high accuracy. This is how many countries sort their post, using handwriting recognition systems to read postcodes and route letters to the correct regions.
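As a rough illustration of the train-then-classify workflow described above, the sketch below uses a nearest-centroid classifier, one of the simplest supervised methods. It is not how handwriting recognition systems are built in practice, and the function names, the two-feature samples and the class labels "A" and "B" are all hypothetical, chosen only to keep the example small.

```python
from collections import defaultdict
from math import dist


def train_centroids(samples, labels):
    """Learn one centroid (mean feature vector) per class from labelled data."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Predict the class whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labelled training data: each sample is (feature_1, feature_2).
training_samples = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
                    (4.0, 4.2), (4.2, 3.9), (3.8, 4.1)]
training_labels = ["A", "A", "A", "B", "B", "B"]

# "Training" produces the model; here the model is just the class centroids.
model = train_centroids(training_samples, training_labels)

# Classify data the model has never seen before.
prediction_low = classify(model, (0.9, 1.1))
prediction_high = classify(model, (4.1, 4.0))
print(prediction_low, prediction_high)
```

The important point is the split between the two steps: the labels are only needed while training, and the resulting model can then be applied to any number of unlabelled samples.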
However, in many cases labelled datasets aren’t available. Industrial process plants generate a large amount of data, often in the form of time series. These time series capture the state of the plant at a very fine granularity, but for a continuous process it is rarely straightforward to interrogate this data. Using unsupervised machine learning techniques, it is possible to separate different modes of operation automatically by analysing the time series. An example of this is shown in figure 1, where the top panels show process data, and the bottom panels show how unsupervised machine learning has segregated this data into different operational modes.
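To make the idea of separating operational modes without labels more concrete, here is a minimal sketch using k-means clustering on windowed averages of a single time series. K-means is used purely as a familiar example of an unsupervised technique; it is an assumption for illustration, not necessarily the method behind figure 1. The sensor readings, window size and function names are hypothetical.

```python
def window_means(series, size):
    """Summarise the series as the mean of each non-overlapping window."""
    return [
        sum(series[i:i + size]) / size
        for i in range(0, len(series) - size + 1, size)
    ]


def kmeans_1d(values, k=2, iterations=20):
    """Plain k-means on scalar values (k >= 2); returns labels and centres."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centres evenly across the range.
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        assignment = [
            min(range(k), key=lambda c: abs(v - centres[c])) for v in values
        ]
        # Move each centre to the mean of the values assigned to it.
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centres[c] = sum(members) / len(members)
    return assignment, centres


# Hypothetical sensor readings: a low mode, a high mode, then low again.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2,
            20.2, 19.7, 20.1, 19.9, 20.3, 20.0,
            10.0, 10.1, 9.9]

features = window_means(readings, size=3)
modes, centres = kmeans_1d(features, k=2)
print(modes)  # -> [0, 0, 1, 1, 0]
```

No labels were supplied at any point: the two modes emerge from the structure of the data alone. Real process data would normally need several signals and richer window features than a simple mean.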
Processing in the Cloud
Machine learning is about extracting value from Big Data efficiently. It is therefore natural to build machine learning solutions in the cloud, which provides seamless scaling of processing power to meet the demands of the data. Sabisu uses a variety of cloud technologies, such as Amazon’s Elastic MapReduce, Apache Spark, and the Spark ML libraries, to ensure rapid processing of even the largest datasets. Data noise is always a concern in any time-series processing application, so Sabisu has developed a set of custom aggregators in Python and Scala to get the best results from real-world data.
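Sabisu's aggregators are not described here, so the following is only a generic sketch of what a noise-tolerant aggregator can look like: it collapses each fixed-size window of samples into its median, which is far less sensitive to isolated spikes than the mean. The function name, window size and sample values are hypothetical.

```python
from statistics import median


def aggregate_median(samples, window):
    """Collapse each run of `window` consecutive samples into its median."""
    return [
        median(samples[i:i + window])
        for i in range(0, len(samples), window)
    ]


# Hypothetical noisy signal: steady around 5.0 with two spurious spikes.
noisy = [5.0, 5.1, 50.0, 4.9, 5.2, 5.0, 5.1, -30.0, 5.0]

smoothed = aggregate_median(noisy, window=3)
print(smoothed)  # -> [5.1, 5.0, 5.0]
```

Both spikes disappear from the aggregated series, whereas a windowed mean would have been pulled well away from 5.0 in the first and last windows.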
Simplifying Machine Learning
You don’t have to be an expert to use Sabisu’s machine learning solutions. On the contrary, the finer implementation details are abstracted away so that you can concentrate on the thing that really matters: the results!
Sabisu will handle all of the pre-processing and calibration required to make the most of this advanced technology, meaning all you have to do is select the data you want to analyse.
We’re always interested in hearing your comments or suggestions, so feel free to get in touch.