The prevailing approach to data architecture over the last decade or so has been to centralize your data in one place, namely a data warehouse or (later) a data lake. This means physically copying the data from all the various source systems (ERP, MES, SCADA) into that one place so that it becomes the single source of truth.
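To make the centralized pattern concrete, here is a minimal sketch of such a copy job in Python. The extract functions and the `datalake/raw` directory layout are hypothetical stand-ins for real source-system connectors and lake storage, not any particular tool’s API:

```python
import json
import datetime
from pathlib import Path

# Hypothetical extract functions standing in for real connectors
# to ERP, MES and SCADA systems (assumed names, canned rows).
def extract_erp():
    return [{"order_id": 1, "material": "steel", "qty": 40}]

def extract_mes():
    return [{"batch_id": "B-17", "line": 2, "status": "done"}]

def extract_scada():
    return [{"sensor": "T-301", "value": 81.4, "unit": "C"}]

SOURCES = {"erp": extract_erp, "mes": extract_mes, "scada": extract_scada}
LAKE_ROOT = Path("datalake/raw")

def ingest_all() -> None:
    """Physically copy each source extract into the lake as timestamped files."""
    run_ts = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    for name, extract in SOURCES.items():
        target_dir = LAKE_ROOT / name
        target_dir.mkdir(parents=True, exist_ok=True)
        # The copy step is what makes the lake the "single source of truth":
        # downstream consumers read these files, never the source systems.
        (target_dir / f"{run_ts}.json").write_text(json.dumps(extract()))

if __name__ == "__main__":
    ingest_all()
```

Note that analysts here only ever read from the lake; the freshness of what they see is bounded by how often this job runs.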

More recently, a decentralized approach to data architecture has been popularized under the term data mesh. The main idea is to keep your data inside the source systems (without copying it) while making it accessible through one common “translation” layer. When an analyst requests data, the request goes to this translation layer, which retrieves the data from the source system on demand.
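Again as a rough sketch rather than a real data-mesh API: the translation layer can be thought of as a thin routing facade over per-source adapters. The adapter classes and their canned responses below are assumptions for illustration:

```python
from typing import Protocol

class SourceAdapter(Protocol):
    """Uniform interface the translation layer expects from each source."""
    def query(self, entity: str) -> list[dict]: ...

class ErpAdapter:
    def query(self, entity: str) -> list[dict]:
        # In a real mesh this would call the ERP's API or SQL endpoint
        # at query time; here it just returns canned rows.
        return [{"order_id": 1, "material": "steel"}]

class ScadaAdapter:
    def query(self, entity: str) -> list[dict]:
        return [{"sensor": "T-301", "value": 81.4}]

class TranslationLayer:
    """Routes each request to the owning source system on demand,
    so the data stays where it lives instead of being copied."""
    def __init__(self) -> None:
        self._adapters: dict[str, SourceAdapter] = {
            "erp": ErpAdapter(),
            "scada": ScadaAdapter(),
        }

    def get(self, source: str, entity: str) -> list[dict]:
        return self._adapters[source].query(entity)

# An analyst's request goes through the layer, never to the system directly:
layer = TranslationLayer()
print(layer.get("erp", "orders"))
```

The point of the design is that nothing is duplicated: every call reaches through to the owning system at query time, which is exactly what raises the latency and access-control questions below.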

The former approach of centralizing the data certainly has its drawbacks: new data has to be copied to the data lake or warehouse first, which can take a long time, so analysts often work with stale data. On the other hand, the decentralized approach is in my view not a panacea for all your data problems either. How do you deal with data that spans multiple geographical regions (e.g. two plants on different continents)? And is it secure to allow unrestricted read access to a critical control system like SCADA?

With an onslaught of new technologies, choosing the right data architecture paradigm is harder than ever and requires thorough knowledge of both the client’s business and the technological landscape.