In his recent article, Andrew Ng defines Data-Centric AI as the discipline of systematically engineering the data needed to successfully build an AI system. Instead of improving the model by working on the code, the focus is on improving the data. The reason is that foundation models (large pretrained models) are already really good: just look at GPT-3 for Natural Language Processing.

This is exciting news for manufacturers. Machine learning expertise is very scarce in this industry, but there’s a lot of domain knowledge available through process engineers and operators. With the right goal and the right tools, they are well placed to recognize bad data and improve it.
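To make this concrete, here is a minimal sketch of how a process engineer's knowledge could be turned into a simple check for bad data. Everything in it is an assumption for illustration: the `Reading` type, the temperature limits, the `MAX_REPEATS` threshold, and the `flag_bad_readings` function are hypothetical and do not come from Andrew's article or from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    timestamp_s: int       # seconds since the start of a batch (hypothetical)
    temperature_c: float   # value from a hypothetical temperature sensor


# Hypothetical limits a process engineer might supply for this process.
TEMP_MIN_C = 150.0
TEMP_MAX_C = 250.0
MAX_REPEATS = 3  # more identical consecutive values than this suggests a stuck sensor


def flag_bad_readings(readings):
    """Return (index, reason) pairs for readings that break a domain rule."""
    flagged = []
    repeats = 1
    for i, reading in enumerate(readings):
        # Rule 1: the value must be physically plausible for this process.
        if not (TEMP_MIN_C <= reading.temperature_c <= TEMP_MAX_C):
            flagged.append((i, "out of physical range"))

        # Rule 2: a real process fluctuates, so a long flat line is suspicious.
        if i > 0 and reading.temperature_c == readings[i - 1].temperature_c:
            repeats += 1
        else:
            repeats = 1
        if repeats > MAX_REPEATS:
            flagged.append((i, "sensor possibly stuck"))
    return flagged


if __name__ == "__main__":
    values = [200.0, 201.0, 999.0, 202.0, 202.0, 202.0, 202.0, 203.0]
    readings = [Reading(i * 10, v) for i, v in enumerate(values)]
    for index, reason in flag_bad_readings(readings):
        print(index, reason)
```

The point is not the code itself but where the rules come from: the limits and the "stuck sensor" heuristic are exactly the kind of knowledge that operators already have, and flagged readings can then be reviewed, corrected, or re-measured.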

Besides improving bad data, engineering the data could include getting more data of a particular type. In practice, this means setting up an experiment or measuring the production process under particular conditions, something that process engineers are very familiar with.

I suspect that a shift from “abstract coding” to “getting good data” will empower manufacturers to get hands-on with machine learning. I encourage anyone interested in what machine learning can do in their industry to read Andrew’s article.