Modelling processes with big data – limitations & solutions

Process modelling is used to identify behaviour on a plant and in some cases predict what’s going to happen next. The basic goal is to build an approximate “picture” of the process that is accurate enough to be useful for analysis. In general there are two ways to approach process modelling.

First Principle Modelling

First principle modelling derives equations that describe the underlying physical processes, including chemical interactions, heat transfer in different parts of the process and other low-level phenomena.

This is invariably a complex undertaking requiring expert knowledge, but it can produce very accurate models. The drawbacks are the expense involved in creating the models and the lack of horizontal applicability: a first principle model for a cracker is of no use if you want to model an evaporator. You would have to start again, spending the time and incurring the costs a second time.

Data Driven Analytics and Big Data

It is possible to generate a process model without in-depth knowledge of the process itself by using historical data to build a statistical picture of the process, which can be used for identification and prediction in the same way.

To build an effective data driven model a large amount of historical data is required, ideally with labels that provide information about the state of the process. These labels could state that the process is starting up, shutting down, or that there was a fault with a specific piece of equipment. These data driven techniques can be used to build a model of any process or equipment, provided there is enough data available.

Big data is an umbrella term describing a set of data driven modelling techniques. Big data has become extremely well known through data mining of social networks, segmenting retail customers and driving retail/media recommendations.

Naturally, big data techniques have been applied to process data, e.g., at Sabisu we use them to identify similar features in time series.
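
As an illustration only, and not a description of Sabisu's implementation, one common way to find similar features in a time series is to slide a window along the series and compare each window's shape with a reference pattern using a z-normalised Euclidean distance. The function names and the data below are hypothetical.

```python
import math


def z_normalise(window):
    """Scale a window to zero mean and unit variance so shape, not level, is compared."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        # A flat window has no shape; represent it as all zeros.
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def find_similar(series, pattern, top_n=1):
    """Return start indices of the top_n windows closest in shape to pattern."""
    target = z_normalise(pattern)
    m = len(pattern)
    scored = []
    for start in range(len(series) - m + 1):
        candidate = z_normalise(series[start:start + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(candidate, target)))
        scored.append((dist, start))
    scored.sort()
    return [start for _, start in scored[:top_n]]


# Hypothetical data: the same spike shape occurs at two different baselines.
series = [10, 10, 10, 12, 15, 12, 10, 10, 50, 50, 52, 55, 52, 50, 50]
pattern = [0, 2, 5, 2, 0]
matches = find_similar(series, pattern, top_n=2)
print(sorted(matches))
```

Because each window is normalised before comparison, the spike is matched at both baselines even though the absolute values differ. A brute-force scan like this is fine for a sketch; long plant histories would need a faster search.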

However, these techniques provide limited insight with real industrial data, particularly in the area of fault identification and prediction.

The primary limiting factor is the lack of quality data. Although industrial sites produce a huge quantity of time series data, there are too few examples of each failure to build a classifier capable of detecting and labelling specific failure modes. A customer rarely has more than a handful of examples of most failures and, through noise or changing plant conditions, these examples can be quite dissimilar.
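
A small worked example shows why a handful of failures is so limiting. The figures are invented for illustration (10,000 labelled windows, of which 3 are faults) and do not come from any customer data. A "classifier" that never predicts a fault scores almost perfectly on accuracy while detecting nothing.

```python
def evaluate(actual, predicted, positive="fault"):
    """Return (accuracy, recall) for the given positive label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    accuracy = correct / len(actual)
    # Predictions made on the windows that really were faults.
    on_positives = [p for a, p in zip(actual, predicted) if a == positive]
    if on_positives:
        recall = sum(p == positive for p in on_positives) / len(on_positives)
    else:
        recall = 0.0
    return accuracy, recall


# Hypothetical label counts: 9,997 normal windows and 3 fault windows.
actual = ["normal"] * 9997 + ["fault"] * 3
# A trivial model that always predicts "normal".
predicted = ["normal"] * 10000

accuracy, recall = evaluate(actual, predicted)
print(f"accuracy={accuracy:.4f} recall={recall:.1f}")
```

With so few fault examples, a model has little incentive or information to learn the fault class, and headline accuracy says almost nothing about whether failures will actually be caught.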

Although great claims are made regarding big data and machine learning, “cure-all” solutions from vendors don’t translate into working real-world implementations.

Data Driven Modelling with Engineering Knowledge

Sabisu offers a third way: a combination of data driven modelling and engineering knowledge.

We start with a statistical modelling method such as our anomaly detection system, or a big data modelling technique such as support vector machines or deep neural networks.
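
To give a feel for what a statistical starting point can look like, here is a minimal sketch of a rolling z-score anomaly detector. It is a generic textbook technique chosen for illustration, not Sabisu's anomaly detection system, and the window size, threshold and readings are all assumed values.

```python
import statistics


def rolling_zscore_anomalies(values, window=10, threshold=3.0):
    """Flag indices that deviate from the trailing window mean by more than
    `threshold` standard deviations of that window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        if std == 0:
            # Flat history: treat any change at all as anomalous.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / std > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical sensor readings with one obvious excursion at index 11.
readings = [20.0, 20.1, 19.9, 20.2, 20.0, 19.8, 20.1,
            20.0, 19.9, 20.1, 20.0, 27.5, 20.1, 19.9]
flagged = rolling_zscore_anomalies(readings, window=10, threshold=3.0)
print(flagged)
```

A detector like this needs no labelled failures at all, which is why it makes a practical first step when fault examples are scarce. On its own, though, it cannot tell a genuine fault from a planned change in operation.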

We then incorporate knowledge captured through discussions with plant engineers. This provides the extra information needed to tune data driven models, information often found only in first principle models, such as process cues to determine valid regions of operation or real world P-F (potential failure to functional failure) curves to allow realistic asset health predictions.
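
One way to picture how a process cue constrains a data driven model is a simple gate: the model's output is only trusted while the plant is inside the region the engineers say it applies to. The sketch below is an assumption-laden illustration. The signal names (flow, vibration), the limits and the rule itself are hypothetical stand-ins for whatever the plant engineers would actually supply.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    flow: float       # hypothetical process cue, e.g. feed flow
    vibration: float  # hypothetical monitored signal on the equipment


def in_valid_region(sample, min_flow=5.0):
    """Engineer-supplied rule (assumed): the model only applies while the unit is running."""
    return sample.flow >= min_flow


def gated_alerts(samples, vibration_limit=4.0, min_flow=5.0):
    """Return indices of samples that breach the limit while in the valid region."""
    alerts = []
    for i, s in enumerate(samples):
        if not in_valid_region(s, min_flow):
            # Outside the valid region (e.g. shutdown): ignore the reading.
            continue
        if s.vibration > vibration_limit:
            alerts.append(i)
    return alerts


samples = [
    Sample(flow=12.0, vibration=2.1),  # normal running
    Sample(flow=0.5, vibration=6.0),   # shutdown transient, outside valid region
    Sample(flow=11.8, vibration=5.2),  # running with high vibration
]
alerts = gated_alerts(samples)
print(alerts)
```

Only the third sample raises an alert. The high reading during shutdown is suppressed by the engineering rule rather than by anything the data alone could have revealed.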

This methodology allows reliable, useful equipment models to be built rapidly using the available data. Engineering effort is minimal – just a simple chat to understand the problem and applicable techniques.

This is simply not possible with a purely data based approach and carries none of the large expenses associated with first principle modelling.


