
By Steve Lange, managing member, ProcessDev LLC
Increased availability of sensors and data provides new opportunities to improve quality and process reliability in converting processes. This three-part series covers the process of effectively leveraging data while avoiding pitfalls. Part 1: Preparing for a Converting Process Data Project covered the necessary actions to ensure the right data are gathered in the right ways for the right reasons. Part 2: Converting Process Data Wrangling described the steps of aggregating, cleaning and exploring data through visualization. In Part 3: Converting Process Data Analytics Implementation, analysis methods and tips for applying and preserving analysis insights are outlined.
Revisiting why you are analyzing the data
After the initial data project is planned and the first datasets have been assembled, cleaned and visualized, and before complex analysis and modeling are attempted, it is a good time to reassess the purpose of the project: What would a successful outcome of the analysis look like, and for whom? This step prevents spending effort on the wrong problems for the wrong customers.

Confirming the purpose of the analysis – whether it is to explain how the process works, to predict key performance indicators or to prescribe actions, and exactly who will act on it – will help narrow the analysis methods and how the results are reported (see Figure 1).
To Explain: If the converting process being developed is very new and early in its development, the purpose of a data analysis might first be to better understand how a prototype process works, or to understand how different materials or grades of materials interact with different equipment options; for example, which materials are likely to wrinkle or break out at different line speeds, and which tension dynamics affect registration or wound-roll quality. Another need might be to define relationships between process setpoint values and product quality… or how combinations of process setpoints and materials cause machine stops… or how a process step affects material properties, such as modulus or thickness.
Confirming the system of interest, such as the system’s scale or what process steps are included, will help focus the analysis. Is the analysis for a single unit operation, an entire production line or plant, or for a specific quality issue (such as a coating defect)?
To Predict: In manufacturing settings, an academic understanding of a converting process might be of less value than simply making it work: being able to predict what will happen in the future given the current conditions and avoiding particularly bad outcomes, e.g.: quality defects or major failures of components, such as motors and pumps. Knowing in advance which materials are likely to cause issues might enable changes in process setpoints to lessen the effect, or even to avoid running the material, if possible.
To Prescribe: If the ability to predict exists, a natural progression is to recommend actions that either prevent a failure or recover from one: changing out equipment, cleaning it or changing its setpoints to make the process more robust to noise factors, such as material variation or environmental conditions. For a given quality defect, instructions can be given on what changes to make to stop generating the defect. These instructions might be executed automatically in a model-based control scheme [1].
For Whom: Probably the most important reassessment is understanding who will use the insights from the data analytics about to be performed. What problems are they encountering, and in what form and frequency do they expect help? Overly complex dashboards that bury people in charts, graphs and tables without addressing the real problems at hand simply will be rejected and will stall future projects. Frequently, the real problem needs to be defined before it can be solved, and the data you have chosen might or might not contribute to the problem definition.
Choosing a learning method

There are many methods for analyzing data to gain insights to solve converting problems [2] (see Figure 2). Just as data had to be selected from a large potential set, so too must the analysis methods. The data itself can inform what methods are possible. It is common for multiple analysts to try different methods, as seen in on-line data-science contests, so it can be a team sport, if the resources are available.
Unsupervised Learning: If it is not known which variables in the available data are inputs and which are outputs (i.e., the data are “unlabeled” in data-science terms), unsupervised learning methods are used to find patterns and groupings in the data without prior knowledge of their relationships.
Correlation analysis can be used to find the strength of relationships between variables – either positive or negative relationships – although these relationships do not necessarily prove that one variable causes the change in the other.
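As a minimal sketch, such a correlation screen can be done in a few lines of Python with pandas; the file and column names here (e.g., “coat_weight”) are hypothetical placeholders, not from an actual converting line.

```python
# A minimal correlation screen, assuming a prepared pandas DataFrame loaded
# from a historian export; all column names are illustrative assumptions.
import pandas as pd

process_df = pd.read_csv("process_data.csv")  # assumed prepared dataset

# Pearson correlation between every pair of numeric variables (-1 to +1)
corr = process_df.corr(numeric_only=True)

# Rank the variables most strongly correlated with a quality output,
# remembering that correlation does not prove causation
print(corr["coat_weight"].drop("coat_weight").sort_values(key=abs, ascending=False))
```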
Cluster analysis can be used to find previously unknown homogeneous groups in the data, as well as to reduce the number of rows in a dataset without significant information loss [3], which makes modeling more efficient and computationally easier.
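A sketch of such a grouping, assuming a table with one row per run and a few numeric process variables (column names are illustrative), might use k-means clustering from scikit-learn:

```python
# A minimal k-means sketch to find groups of similar runs; the data file and
# column names are hypothetical assumptions for illustration only.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

process_df = pd.read_csv("process_data.csv")
features = process_df[["line_speed", "unwind_tension", "oven_temp"]]

# Standardize so no single variable dominates the distance calculation
X = StandardScaler().fit_transform(features)

# Ask for a handful of clusters; in practice the count is tuned (e.g., elbow method)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
process_df["cluster"] = kmeans.labels_

# Cluster averages reveal what distinguishes each group of runs
print(process_df.groupby("cluster")[list(features.columns)].mean())
```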
Principal component analysis is another technique for dimension reduction, mapping higher-order data (e.g.: many process variables) onto a smaller number of components that still are predictive of the outputs.
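A brief sketch with scikit-learn, again assuming a standardized table of numeric process variables, shows how many components are needed to retain most of the variation:

```python
# A minimal PCA sketch for dimension reduction; the dataset is an assumed
# table of numeric process variables, not data from the article.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

process_df = pd.read_csv("process_data.csv")
X = StandardScaler().fit_transform(process_df.select_dtypes("number"))

# Keep enough components to explain ~95% of the variance in the inputs
pca = PCA(n_components=0.95)
scores = pca.fit_transform(X)

print(f"{X.shape[1]} variables reduced to {pca.n_components_} components")
print(pca.explained_variance_ratio_)
```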
Supervised Learning: If the data are “labeled,” meaning it is known which variables are inputs and which are outputs, then “supervised learning” methods are used to build models that allow prediction of the outputs from the inputs. Before modeling, however, it is necessary to divide the dataset into subsets: a training set used to build the model, a validation set used to tune the model, and a third set used to test the final version of the model [4]. This prevents overfitting with a model that seems to work well on the original dataset but performs poorly when used on new data.
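One way to make such a split, sketched here with scikit-learn and a hypothetical quality output column, is to hold out a test set first and then split the remainder into training and validation sets:

```python
# A minimal train/validate/test split sketch [4]; the file name and the
# "quality_score" output column are assumptions for illustration.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("process_data.csv")
X = df.drop(columns=["quality_score"])
y = df["quality_score"]

# Hold out 20% of rows as the final test set, touched only once at the end
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Split the remainder into training (60% of total) and validation (20% of total)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Candidate models are fit on X_train, tuned against X_val,
# and the single chosen model is scored once on X_test
```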
Classification: Analysis methods where the outputs are multi-level categories or binary outcomes, such as “yes/no” or “good/bad,” are called classification methods; they include decision trees (and ensemble models of decision trees, such as random forest), logistic regression and neural networks. The goal is to predict the output with high probability; for example, will a wrinkle form or not in a span, given key information about the web and the process/equipment? Or, if a quality measure has several discrete levels, such as “good/fair/poor,” and the right set of process and material variables affecting that quality measure is captured, a classification model could be used to predict the level of the next product from those inputs.
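A sketch of the wrinkle example, using a random-forest classifier from scikit-learn with hypothetical web and process variables, might look like the following:

```python
# A minimal wrinkle/no-wrinkle classification sketch; the data file, feature
# columns and "wrinkle" label are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("span_data.csv")
X = df[["line_speed", "span_tension", "web_modulus", "caliper"]]
y = df["wrinkle"]  # 1 = wrinkle formed, 0 = no wrinkle

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Report precision/recall on held-out data and rank the input variables
print(classification_report(y_test, clf.predict(X_test)))
print(sorted(zip(clf.feature_importances_, X.columns), reverse=True))
```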
Regression: Analysis methods are termed “regression” when the outputs are measured on continuous scales and can be predicted from equations derived from numerical values of the input variables. There are many regression methods, including general linear regression, random forest and neural networks. Several model types can be tried and compared to find the best predictive model. Models also can be combined into “ensemble” models that can perform better than individual models and be included in the comparison. The same train/validate/test process is used to build a model that better predicts data not used in building it.
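A sketch of such a model comparison, assuming a continuous output such as coat weight (all names hypothetical), could look like this:

```python
# A minimal sketch comparing two regression models on the same held-out data;
# the data file, feature columns and output column are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("process_data.csv")
X = df[["line_speed", "pump_rate", "oven_temp"]]
y = df["coat_weight"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "R^2 on test set:", round(r2_score(y_test, model.predict(X_test)), 3))
```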
The key to these models succeeding in converting processes is having done the prework to measure the process, material and product variables most likely to affect the outputs but, of course, these are not always known in advance. If the models do not explain enough of the variation of the outputs of interest, it is a signal that there are key variables not being included in the model and deeper investigation is needed to identify possible missing variables and to start measuring them. The models also may provide insights into measurements that are redundant or ineffective, allowing sensors to be removed or sampling to be reduced.
Time Series Analysis: For data that are collected at a recurring time interval, time-series analysis may be used for a variety of purposes, including forecasting the future value of a single variable based on previous data (e.g.: autoregressive integrated moving average [ARIMA] models) or identifying the frequency of cyclical variations that stem from equipment, material or environmental variation (spectral analysis, e.g.: Fast Fourier Transform) [5, 6]. Cross-correlation of different time series enables determination of time-based relationships between variables; for example, how strong the relationship is at different time lags between an upstream process and a downstream process.
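As one illustrative sketch, a spectral analysis of a quality signal sampled at a fixed rate can point to the frequency of a cyclical disturbance; the column name and sample rate below are placeholders, not values from the references.

```python
# A minimal spectral-analysis sketch using NumPy's FFT; the data file,
# "caliper" column and 10-Hz sample rate are illustrative assumptions.
import numpy as np
import pandas as pd

df = pd.read_csv("scanner_data.csv")
signal = df["caliper"].to_numpy()
fs = 10.0  # assumed sample rate, Hz

# Magnitude spectrum of the mean-removed signal (positive frequencies only)
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)

# The dominant peak (ignoring the zero-frequency bin) often points to a
# rotating component, such as a roller, gear or pump
peak_hz = freqs[np.argmax(spectrum[1:]) + 1]
print(f"Dominant cyclical variation at {peak_hz:.2f} Hz")
```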
For all of the analysis methods mentioned above, and many others, there are many commercial software tools available, as well as open-source libraries in programming languages such as R and Python; a combination of commercial and open-source tools also may be used.
Post-Analysis
We are far along in the data analytics race, but we have not yet “hit the tape.” The hard work of defining the data, collecting and preparing it, and now analyzing it to extract insights has been done. The insights need to be checked for reasonableness against engineering models, with subject-matter experts and with common sense. The various models can be scored to compare them and select the best one to meet the goal.
Finally, if the results are not communicated and implemented, all will be for naught. The most actionable insights need to be integrated into the manufacturing operation. Caretakers of the system used to generate and maintain the insights need to be supported or else, like physical equipment, the system will fail to sustain the positive results or be able to replicate them in the future.
The successes and failures of the recommendations and actions from the data analysis should be tracked to enable continuous improvement, with input from users and updates based on new data. Finally, the management partners who funded and supported the effort need to be kept informed of progress to ensure their goals are being met and that they remain sustaining advocates.
Summary
Being mindful of the purpose and the people for whom data from a converting operation are being collected, choosing the right analysis methods, and sharing the insights derived from the data with both users and project sponsors will maximize the utility of the total data system of sensing, collection, storage, analysis, and communication. Treating the data system like an ecosystem that requires stewardship will ensure its longevity beyond the life of a single project.
References
1. Brosilow, Coleman, and Joseph, Babu. "Techniques of Model-Based Control." Prentice Hall, 2002.
2. https://towardsdatascience.com/types-of-machine-learning-algorithms-you-should-know-953a08248861
3. Li, Yifu; Deng, Xinwei; Ba, Shan; Myers, William R.; Brenneman, William A.; Lange, Steve J.; Zink, Ron; and Jin, Ran (2021). "Cluster-based data filtering for manufacturing big data systems," Journal of Quality Technology, DOI: 10.1080/00224065.2021.1889420.
4. https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7
5. Parent, F., and Hamel, J. (2013, June). "Evaluating the impact of non-uniform paper properties on web lateral instability on printing presses." Paper presented at the 12th International Conference on Web Handling (IWEB), Stillwater, OK. https://hdl.handle.net/11244/322005
6. Cole, K.A., Hopkins, R.W., Hotto, N.A., Scheible, J.J., Schneider, B.J., and White, T.J. (2019, June). "Real-time data analytics as applied to web-to-roller traction in manufacturing." Paper presented at the 15th International Conference on Web Handling (IWEB), Stillwater, OK. https://hdl.handle.net/11244/320265
Steve Lange is managing member of ProcessDev LLC, a manufacturing process-development consulting company. Steve is a retired Research Fellow from the Procter & Gamble Co., where he spent 35 years developing web-converting processes for consumer products and as an internal trainer of web-handling, modeling/simulation and data analytics. Steve can be reached at 513-886-4538, email: [email protected] or www.processdev.net.