Overview of methods for bilinear modeling of batch data, including theory, methodologies and examples for experienced professionals in the biotech, pharmaceutical and petrochemical industries.
Process Analytical Technologies (PAT) have become increasingly important with the establishment of the quality-by-design paradigm in industrial processes, particularly where batch operation is standard. PAT plays an instrumental role in advancing process understanding and operational efficiency, while strengthening safety and reliability to ensure consistent on-spec product quality and minimize environmental impact. Empirical methods based on latent variables, often referred to as chemometric methods, are a main component of PAT. When used alongside Batch Multivariate Statistical Process Control (BMSPC), these methods enable the timely detection and diagnosis of process upsets. Furthermore, process understanding can be improved by applying Latent Variable Models (LVMs), such as Principal Component Analysis (PCA) and Partial Least Squares (PLS). These models are particularly relevant in batch processes, where the inherent complexity of the model results in a high degree of uncertainty in the operation.
Data Science for Batch Processes: Statistical Learning, Monitoring and Understanding provides a comprehensive and rigorous examination of the bilinear modeling and monitoring of batch processes, comprising data alignment, pre-processing, three-way-to-two-way data transformation, data analysis and design of monitoring systems, including practical challenges and considerations when analyzing multi-dimensional batch data. Case studies and hands-on MATLAB examples using the MVBatch toolbox bridge theory and practice, illustrating how these methods can be applied.
Data Science for Batch Processes: Statistical Learning, Monitoring and Understanding is an essential guide for professionals and academics who seek both foundational knowledge and advanced techniques in batch processes and data analysis.
Foreword
Prologue: Challenges for the Third Millennium
About the Companion Website
1 Introduction
1.1 Industrial Batch Processes
1.2 Types of Sensors
1.3 Batch Process Modeling
1.3.1 Knowledge-based Models
1.3.2 Data-driven Models
1.3.3 Hybrid Models
1.4 Bilinear Modeling Cycle for Batch Process Monitoring
2 Data-driven Models Based on Latent Variables
2.1 Compression
2.2 Principal Component Analysis
2.2.1 Data Preprocessing
2.2.2 Selection of the Number of Principal Components
2.2.3 Parameters Stability
2.3 Regression
2.4 Regression Models based on Latent Variables
2.4.1 Principal Component Regression
2.4.2 Partial Least Squares
2.4.3 Data Preprocessing
2.4.4 Selection of the Number of Latent Variables
2.4.5 PLS Versus Other Regression Models
2.5 Multivariate Exploratory Data Analysis
2.6 Missing Data
2.6.1 Model Exploitation
2.6.2 Model Building
2.6.3 Final Reflections about Missing Data Imputation and MSPC
3 Batch Data Equalization
3.1 Introduction
3.2 Challenges in Batch Equalization
3.3 Equalization of Variables within a Batch
3.3.1 Discarding Intermediate Values
3.3.2 Estimating Missing Values
3.3.2.1 Comparison of Equalization Methods Based on Latent Variable Models
3.3.3 Rearranging Data
3.4 Multirate System
4 Batch Synchronization
4.1 Introduction
4.2 Synchronization Approaches
4.2.1 Indicator Variable
4.2.2 Time Linear Expanding/Compressing
4.2.2.1 Observation (OWU) Level and TLEC Synchronization Approach
4.2.3 Dynamic Time Warping
4.2.3.1 Warping Function Constraints
4.2.3.2 The DTW Algorithm
4.2.3.3 Optimization Problem
4.2.3.4 End-of-batch DTW Synchronization for Batch Process Monitoring
4.2.3.5 On the Use of Warping Information
4.2.4 Relaxed Greedy Time Warping
4.2.4.1 Enhanced Global Constraints
4.2.4.2 Cross-validation for the Estimation of the RGTW Parameters
4.2.5 Multisynchro
4.2.5.1 Asynchronism Detection
4.2.5.2 Specific Batch Synchronization
4.2.5.3 Iterative Batch Synchronization and Anomaly Detection Procedure
4.3 Effects of Synchronization on the Correlation Structure
5 Batch Data Preprocessing
5.1 Batch Preprocessing Operations
5.2 Mean Centering
5.3 Scaling
6 Three-way to Two-way Transformation
6.1 Introduction
6.2 Single-model Approach
6.2.1 Batch-wise Unfolding
6.2.2 Variable-wise Unfolding
6.2.3 Batch Dynamic Unfolding
6.3 K-models Approach
6.3.1 Hierarchical-model Approach
6.4 Multiphase Approach
6.4.1 Phases in Batch-wise Data
6.4.2 Phases in Variable-wise Data
6.4.3 Phases in Batch Dynamic Data
6.5 Conclusion
7 Batch Process Data Analysis and Statistical Monitoring
7.1 Introduction
7.2 Historical Batch Data Analysis
7.3 Batch Multivariate Statistical Process Control
7.3.1 Phase I
7.3.2 Phase II
7.3.2.1 Post-batch Process Monitoring
7.3.2.2 Real-time Process Monitoring
7.4 Practical Issues
List of Acronyms
Bibliography
Index
José M. González-Martínez is Manager of the Department of Chemometrics and Digital Chemistry at Shell in the Netherlands, overseeing worldwide operations and leading key consultancy efforts, new technology developments and R&D business initiatives. He specializes in Chemometrics and Statistics for Chemicals, Catalysis, Integrated Gas, CO2 Abatement and Low Carbon Fuel and Gas solutions. He has published multiple scientific articles and patents, and has been awarded several academic and industry prizes.
José Camacho is a Full Professor at the Department of Signal Theory, Telematics and Communication and leader of the Computational Data Science Laboratory (CoDaS Lab) at the University of Granada, Spain. He specializes in extracting knowledge from data and the design of new data science algorithms and software in domains like precision medicine, industrial processes, cybersecurity or ecology. He is Scientific Advisor at Datharsis.
Joan Borràs-Ferrís is a researcher and specialist in chemical engineering, applied statistics, and process modeling in digitalized industrial environments. He holds a PhD in Statistics and Optimization from the Universitat Politècnica de València, Spain. He is currently Chief Technology Officer at Kensight Solutions. He has received the ENBIS Young Statistician Award for his work introducing innovative methods that promote the use of statistics in daily practice.
Alberto Ferrer is a Full Professor of Statistics at the Universitat Politècnica de València, Spain, head of the Multivariate Statistical Engineering Group, Chief Scientific Officer at Kenko Imalytics, Scientific Advisor at Kensight Solutions, and elected member of the International Statistical Institute. His research focuses on the development and integration of machine learning and multivariate statistics to address the digitalization challenges in industry, healthcare, and technology. He is the recipient of the ENBIS Box Medal Award 2025.