New hybrid multivariate analysis approach to optimize multiple response surfaces considering correlations in both inputs and outputs

Quality control in industrial and service systems requires setting the input factors correctly so that the outputs are obtained at minimum cost with desirable characteristics. Such systems often have more than one input and output. Response surface methodology in its multivariable forms is one of the most widely applied methods for estimating and improving the quality characteristics of products with respect to control factors. When there is some degree of correlation among the variables, the existing methods might lead to misleading improvement results. This paper presents a new approach that takes advantage of principal component analysis and multivariate regression to cope with these difficulties. The global criterion method of multiobjective optimization is also used to reach a compromise solution that improves all response variables simultaneously. Finally, the proposed approach is illustrated by a numerical example.


Introduction
Making decisions about complex problems involving process optimization and engineering design strongly depends on well-identified effective factors. From the viewpoint of quality, a process should be designed so that the products satisfy the customer's needs. Quality engineering techniques try to find the interrelations between input parameters and output quality characteristics (also called response variables) as well as to improve the outputs.
A common problem in product or process design is to determine the optimal level of the control variables when there are different outputs, which are often highly correlated. This problem is called multi-response optimization (MRO) with correlated responses.
Several studies have presented approaches addressing multiple quality characteristics, but few published papers have focused primarily on the existence of correlation.
Correlation can also meaningfully affect the analysis of an MRO problem in another way. Nuisances in experiments may be classified into the following three categories (MONTGOMERY, 2005):
'Known and controllable variables', which can be controlled but whose effect is not of interest as a factor. For this kind of nuisance, a technique called blocking can be used to systematically eliminate its effect in the statistical analysis.
'Unknown and uncontrollable variables', that is, the existence of the factor is unknown and it may even be changing levels while the experiments are conducted. Randomization is the design technique used to guard against such a nuisance factor.
'Known and uncontrollable variables', in particular those that can be measured during the experimental runs, which are called covariates. In this case, finding the individual effect of each covariate and its interaction with other variables could help analysts to improve the response values.
A complex process or system may be affected by stochastic covariates, which can be correlated. The correlation among inputs adds more complexity to estimation as well as optimization.
This paper proposes a methodology that can analyze correlated multiple response surfaces fitted on control factors and correlated covariates. The global criterion (GC) method of vector optimization is also applied, since there are several output characteristics to be optimized.
The structure of the remaining part of this paper is as follows. The next section provides a summary of MRO approaches with special focus on correlated responses and correlated covariates. Afterwards, the required information about the proposed methodology is provided. Finally, section 4 illustrates the method by a numerical example.
In multiresponse modeling there are often three types of variables: factors, nuisances and responses. When a significant degree of correlation exists among the variables, the standard methods cannot estimate the model precisely and, consequently, the optimization results might be unreliable. Modeling and optimization of correlated response surfaces have recently received growing attention from many researchers. Chiao and Hamada (2001) considered experiments with correlated multiple responses whose means, variances, and correlations depended on experimental factors. Analysis of these experiments consists of modeling distributional parameters in terms of the experimental factors and finding factor settings which maximize the probability of being in a specification region, i.e., all responses simultaneously meet their respective specifications.
It is assumed that the multiresponse set has a multivariate normal distribution and that each response variable is desired to be within a predefined specification region. Kazemzadeh et al. (2008) applied a multiobjective goal programming model to provide a general framework for multiresponse optimization problems. Shah et al. (2004) used the seemingly unrelated regressions (SUR) method for estimating the regression parameters when the dependent variables are correlated. The method can be useful in MRS problems with correlated responses and leads to a more precise estimate of the optimum variable setting. PCA is a well-grounded multivariate statistical technique for dimension reduction and for deriving independent components from a set of correlated variables. Tong et al. (2005) used PCA to convert correlated response variables to ordinary response surfaces and also applied a multi-criteria decision-making method called TOPSIS to aggregate several quality characteristics. Antony (2000) used PCA with Taguchi's method. In this method, it is assumed that only those components whose eigenvalues are greater than one can be selected to form the final response variables. Thus, the method cannot be applied if the problem has more than one component with this characteristic. Tong et al. (2005) determined the optimization direction of each component based on the corresponding variation mode charts. Furthermore, Wang (2007) used TOPSIS to find an overall performance index as a criterion for optimizing the multiple quality characteristics.
In order to analyze covariates in MRO problems, some research studies have recently been conducted (HEJAZI et al., 2011). According to the literature, many works have used Principal Components Analysis (PCA) to solve correlated multiresponse problems. PCA converts several correlated columns to independent components by linear transformations. These components are then substituted for the multiple original responses. Another approach to this problem is based on predicting the correlation as an individual response variable by Response Surface Methodology (RSM). Each of these approaches has specific benefits and limitations. It seems reasonable to claim that PCA cannot provide proper directions for the optimization of components. Moreover, if the number of selected components is less than the number of original responses, some information is lost. Treating correlation coefficients as separate response variables requires a multi-replicated experimental design. Additionally, the accuracy of the estimated correlation is strongly dependent on the number of replications. However, more experimental runs are more costly and time-consuming. Furthermore, even when there are enough experimental runs, the statistical error in the response regression is unavoidable. The last approach to solving the multiresponse optimization problem is the multivariate regression method, which is very useful when the response variables are correlated.
The proposed method aims to consider all of the location effects and the correlation among the responses. In addition, probabilistic covariates are included in the multiresponse model to reduce the error terms and the unexplained variance.

Material and methods
When the problems involve several equations with common variables, it is recommended to estimate the parameters through a system of equations simultaneously. Various methods such as Ordinary Least Squares (OLS), the Cross-Equation Weighting method, SUR, Two-Stage Least Squares (2SLS), Weighted Two-Stage Least Squares (WTSLS), Three-Stage Least Squares (3SLS), Full Information Maximum Likelihood (FIML), and the Generalized Method of Moments (GMM) have been proposed to solve such problems. Among them, the SUR and FIML methods have been used in this paper to estimate the response surfaces simultaneously.
The SUR method, also known as the multivariate regression, or Zellner's method, estimates the parameters of the system, accounting for heteroscedasticity and contemporaneous correlation in the errors across equations.
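As a hedged illustration of what estimating the system "accounting for contemporaneous correlation" means, the standard feasible generalized least squares form of Zellner's estimator can be written as below. The notation ($M$ equations, $n$ runs, regressor matrices $X_i$, OLS residuals $\hat{e}_i$) is assumed here for illustration and is not taken from the paper.

```latex
% Assumed notation: M response equations, each observed over the same n runs
y_i = X_i \beta_i + \varepsilon_i, \quad i = 1, \dots, M, \qquad
E[\varepsilon_i \varepsilon_j'] = \sigma_{ij} I_n

% Stack y = (y_1', \dots, y_M')' and X = \mathrm{blockdiag}(X_1, \dots, X_M);
% estimate \Sigma = [\sigma_{ij}] from equation-by-equation OLS residuals
\hat{\sigma}_{ij} = \frac{\hat{e}_i' \hat{e}_j}{n}

% Feasible GLS (SUR) estimator
\hat{\beta}_{SUR} = \left[ X' (\hat{\Sigma}^{-1} \otimes I_n) X \right]^{-1}
                    X' (\hat{\Sigma}^{-1} \otimes I_n) \, y
```

An iterated variant recomputes the residuals from $\hat{\beta}_{SUR}$, updates $\hat{\Sigma}$, and repeats until the estimates stabilize.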
Full Information Maximum Likelihood (FIML) estimates the parameters by maximizing the likelihood function under the assumption that the contemporaneous errors have a joint normal distribution.
The aforementioned methods are compared with respect to their main characteristics in Table 2. In this study, the proposed methodology includes two main approaches to analyze correlation among the inputs as well as the outputs. The covariates are first transformed by PCA to remove their correlation; after that, the response surfaces between the correlated response variables and the inputs (including PCs and control factors) are fitted through a simultaneous equations system.
Consecutive steps of the proposed approach are as follows:
Step 1: Identify input and output variables. In this step, all potentially effective variables (namely responses, factors, covariates and other nuisances) should be identified.
Step 2: Select a proper design and run the experiments. A proper design is selected for conducting the experiments regarding the number of variables and their levels.
Step 3: Perform PCA on correlated covariates to get independent components (see appendix (A) for more details about PCA).
Step 4: Develop a system of equations.
4) a. Perform an initial RSM to gain insight into the more effective factors on each response.
4) b. Define an equation for the relations between each response and the other variables. Next, enter each response variable and its related factors as an equation into the system. In addition, let each response be considered as a predictor variable for the other ones.
Step 5: Estimate parameters of the system. If the error terms are normally distributed, use FIML; otherwise, perform the ISUR method to estimate the coefficients of the effects.
Step 6: Construct multi-objective optimization model including the following objective functions.
-Response surfaces related to quality characteristics.
-Probability function of the PCs derived by using PCA transformation equations and probability function of original covariates.
Step 7: Apply Global Criterion (GC) method to solve the multi-objective optimization model.
In Section 4 these steps are discussed in detail.
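The PCA step on two correlated covariates (Step 3) can be sketched as follows. This is a minimal illustration under stated assumptions: the humidity and temperature values are made up, the helper names are hypothetical, and the closed-form 2x2 eigen-decomposition stands in for whatever statistical package is actually used.

```python
import math

def covariance_2d(u, v):
    """Return (var_u, cov_uv, var_v) as sample (n - 1) estimates."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    s_uu = sum((a - mu) ** 2 for a in u) / (n - 1)
    s_vv = sum((b - mv) ** 2 for b in v) / (n - 1)
    s_uv = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)
    return s_uu, s_uv, s_vv

def pca_2d(s11, s12, s22):
    """Eigen-decompose a symmetric 2x2 covariance matrix.

    Returns (eigenvalues, A); the rows of A are unit eigenvectors,
    ordered by decreasing eigenvalue, so that pc = A c.
    """
    if abs(s12) < 1e-12:  # already uncorrelated: axes are the components
        if s11 >= s22:
            return [s11, s22], [(1.0, 0.0), (0.0, 1.0)]
        return [s22, s11], [(0.0, 1.0), (1.0, 0.0)]
    half_trace = (s11 + s22) / 2.0
    radius = math.hypot((s11 - s22) / 2.0, s12)
    eigenvalues = [half_trace + radius, half_trace - radius]
    rows = []
    for lam in eigenvalues:
        v0, v1 = s12, lam - s11  # solves (S - lam I) v = 0
        norm = math.hypot(v0, v1)
        rows.append((v0 / norm, v1 / norm))
    return eigenvalues, rows

def transform(A, u, v):
    """Apply pc = A c to every observation (no centering)."""
    pc1 = [A[0][0] * a + A[0][1] * b for a, b in zip(u, v)]
    pc2 = [A[1][0] * a + A[1][1] * b for a, b in zip(u, v)]
    return pc1, pc2

# Illustrative (made-up) covariate observations, not the paper's data.
humidity = [10.0, 12.0, 11.5, 13.0, 14.5, 15.0, 12.5, 13.5]
temperature = [20.0, 21.5, 21.0, 22.5, 24.0, 24.5, 22.0, 23.0]

eigenvalues, A = pca_2d(*covariance_2d(humidity, temperature))
pc1, pc2 = transform(A, humidity, temperature)
var1, cov12, var2 = covariance_2d(pc1, pc2)
print("PC variances:", var1, var2, "PC covariance:", cov12)
```

The covariance between the two component scores should vanish up to rounding error, which is the property the later steps rely on when treating the PCs as independent inputs.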

Model representation
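A general multiresponse problem can be expressed as:
$$\text{Optimize } \hat{R}_i(x) \ \text{for each response } i; \qquad \text{Max } f_j(pc) \ \text{for each PC } j$$
Subject to: $c = A^{-1}\,pc$; $lcl \le c \le ucl$ (1)
where: $\hat{R}_i(x)$ represents the response surface for the ith quality characteristic; $f_j(pc)$ is the probability function of the jth PC; $x$ is the vector of control factors; $c$ is the covariate vector calculated by inverting the PCA transformation (matrix $A$).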
Furthermore, it is assumed that the process is statistically under control and the control range for covariate vector is [ lcl, ucl ].
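The two covariate constraints described above (recovering c from the PCs and keeping it inside the control range) can be sketched as follows. The matrix, the PC values, and the limits are hypothetical, and the sketch assumes an orthogonal PCA matrix, whose inverse equals its transpose.

```python
def recover_covariates(A, pc):
    """Invert pc = A c for an orthogonal matrix A, using c = A' pc."""
    n_cov = len(A[0])
    return [sum(A[j][i] * pc[j] for j in range(len(A))) for i in range(n_cov)]

def within_limits(c, lcl, ucl):
    """Check lcl <= c <= ucl element by element."""
    return all(lo <= value <= hi for value, lo, hi in zip(c, lcl, ucl))

# Hypothetical orthogonal transformation (a plane rotation) and PC values.
A = [(0.6, 0.8), (-0.8, 0.6)]
pc = [22.0, 4.0]

c = recover_covariates(A, pc)
feasible = within_limits(c, lcl=[5.0, 15.0], ucl=[15.0, 25.0])
print(c, feasible)
```

In an optimization model these two checks would appear as an equality constraint and a pair of bound constraints rather than as function calls.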

Optimization method (Global Criterion)
This method transforms a multiobjective optimization problem into a single-objective problem. The function traditionally used in this method is a distance. In the multi-objective formulation, $T_i$ is the optimum value of the objective function when only the ith objective is considered; $w_i$ is a value representing the importance of each objective; and $d_i$ is the range of the ith response within the observed experimental runs (DONOSO; FABREGAT, 2007). In this study, the GC method was applied to convert the problem into single-objective form.
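A minimal sketch of a global-criterion scalarization is given below, assuming a weighted, range-normalized distance between each objective and its individual optimum. The exponent p, the toy objectives, the grid search, and the use of the grid range for $d_i$ are illustrative assumptions and are not taken from the paper.

```python
def global_criterion(values, targets, weights, ranges, p=1):
    """Sum of w_i * (|T_i - f_i| / d_i) ** p over all objectives."""
    return sum(
        w * (abs(t - v) / d) ** p
        for v, t, w, d in zip(values, targets, weights, ranges)
    )

def objectives(x):
    """Hypothetical toy problem: two conflicting objectives to minimize."""
    return [(x - 0.2) ** 2, (x - 0.8) ** 2]

grid = [i / 100 for i in range(101)]  # candidate settings of one factor

# T_i: best value of each objective when it is optimized on its own.
targets = [min(objectives(x)[i] for x in grid) for i in range(2)]
# d_i: range of each objective over the candidates (assumed normalizer).
ranges = [max(objectives(x)[i] for x in grid) - targets[i] for i in range(2)]
weights = [0.5, 0.5]  # equal importance

best_x = min(
    grid,
    key=lambda x: global_criterion(objectives(x), targets, weights, ranges),
)
print("compromise setting:", best_x)
```

With equal weights and symmetric objectives, the compromise setting falls midway between the two individual optima, which is the behaviour one would expect from a distance-based scalarization.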

Results and discussion
This section demonstrates the computational steps of the proposed approach. For this purpose, a numerical example from the literature is considered with some modifications (MONTGOMERY, 2005).
Step 1: A chemical experiment with three controllable variables and two covariates is designed to be analyzed by the proposed method. The outputs are conversion (Y1) and activity (Y2) levels. Humidity (c1) and environment temperature (c2) are considered as probabilistic covariates.
Step 2: A Central Composite Design (CCD) is selected and the experiments are conducted accordingly. Table 3 shows the results of the experiments.
Step 3: PCA is performed on the humidity and temperature covariates. According to the observations, they have the following probability distribution. Since there is a significant linear relationship between the two covariates, it is reasonable to consider a bivariate distribution for them. It may be observed that these two covariates follow normal distributions with the following parameters: (3)
Consider the above distributions as the marginal probability functions of c1 and c2. Therefore, the bivariate normal probability distribution for the covariates can be estimated as follows: (4)
PCA gives the following equations to transform the set of covariates into a set of independent ones (the required calculations are performed in the Minitab statistical package): (5)
Step 4: Understanding the strong effects helps us to fit better surfaces of the response variables. Therefore, Figure 1 is provided to show the effects graphically, and separate RSMs were initially conducted on each response to identify which predictive terms should be included in the estimation. The results showed that the following terms should be considered in constructing the system of equations.
In this case, the problem is analyzed by Iterative Seemingly Unrelated Regression (ISUR) and FIML. The response surfaces estimated by these methods are given in Table 4 (the EViews statistical package was used to estimate the parameters of the system).
Table 4. Estimated equations in the system using the FIML and ISUR methods.

Method
Estimated system
ISUR

(6)
The last constraints calculate the original values of the covariates by inverting the transformation matrix (A) and ensure that the covariates are within the prespecified statistical control limits. The following calculations are required to obtain the probability function of the PCs.
Theorem 1: If C is a vector of p random variables jointly distributed as $N_p(\mu_c, \Sigma_c)$, and A is a $q \times p$ matrix, then PC = AC remains multivariate normal with the following parameters (proofs are available in Rencher and Schaalje (2008)): $PC \sim N_q(A\mu_c, A\Sigma_c A')$.
Model (9) is a nonlinear program due to the first two objective functions. It can be simplified to a quadratic programming model by noting that the mode of each normal distribution occurs at its mean. Therefore, maximizing the probability is equivalent to minimizing the distance from the mean value.
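The equivalence between maximizing a normal density and minimizing a squared distance can be made explicit with a short derivation. The symbols $\mu_j$ and $\sigma_j^2$ for the mean and variance of the jth PC are notation assumed here for illustration.

```latex
% Density of the j-th principal component, pc_j ~ N(\mu_j, \sigma_j^2)
f_j(pc_j) = \frac{1}{\sigma_j \sqrt{2\pi}}
            \exp\!\left( -\frac{(pc_j - \mu_j)^2}{2\sigma_j^2} \right)

% Taking logarithms (a monotone transformation) gives
\ln f_j(pc_j) = -\ln\!\left( \sigma_j \sqrt{2\pi} \right)
                - \frac{(pc_j - \mu_j)^2}{2\sigma_j^2}

% The first term is constant, so
\arg\max_{pc_j} f_j(pc_j) = \arg\min_{pc_j} \, (pc_j - \mu_j)^2
```

Because the squared deviation is quadratic in $pc_j$, replacing each density objective with it turns the nonlinear objectives into quadratic ones.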
With this property of the normal distribution, the final multiobjective quadratic program can be written as: Subject to: the same constraints (11)
Table 5 gives a summary of the optimal solutions obtained by solving the above model for each objective function separately. According to Table 6, the final multi-objective mathematical model using Global Criterion can be constructed by replacing the objective functions of the above multi-objective program as in Equation (6). (12)
In this example, we assign the same degree of importance to all objective functions. Table 6 shows the optimal solution and the related objective values for this example.
The results support the claim that the method which applies PCA to the outputs cannot correctly find the optimization direction. However, applying PCA to resolve co-linearity among the covariates leads to better and more accurate estimations. It is also observed that the most probable values of the covariates lead to more reliable results. The PCA method reaches the target of the first objective due to the large coefficient of the first response in the first PC. It seems that PCA is more useful for correlated predictors than for correlated multiresponse problems. Most existing MRO works used PCA to obtain uncorrelated responses, but they usually disregarded the proper direction of location effects. Moreover, the proposed methodology has the following main features:
-The effects of covariates with known distribution functions can be identified in this approach.
-PCA is used to solve co-linearity issues when there are meaningful dependencies among the covariates.
-Several objective functions and performance indices of a quality engineering problem can be optimized simultaneously by using the GC method.
-The desired direction for optimization of the responses does not change after modeling and optimization.

Conclusion
This study proposes a new hybrid approach to multiresponse optimization in which PCA is applied to handle co-linearity among the covariates and multivariate system regression is used to predict the correlated responses. The study models the multiresponse-multicovariate problem as a simultaneous system of equations and uses the estimated equations to construct an optimization program.
For further studies, considering a mixed set of categorical and numerical responses is suggested. In this work, only the variances of the observed values were considered. Therefore, the variances of the predicted responses can be another subject for future research.

Figure 1. Matrix plot for the experimental data.
where $A'$ is the transpose of matrix A. According to Theorem 1, the distribution function of the PCs is given below. As shown above, the new components have zero covariance, so their probability distributions can be expressed by two individual univariate normal variables: pc1 ~ N(15.3, 6.682) and pc2 ~ N(-0.4, 0.029). Now, the model represented by Equation set (6) can be explicitly formed as:

Table 1. Comparative study of the major works on MRO with correlated data.

Table 2. Characteristics of the major methods of system estimation.

Table 3. Results of designed experiments for numerical example.

Table 5. Trade-off matrix and required parameters of GC method.

Table 6. Optimal results of the numerical example.