A Robust Approach of Multi-sensor Fusion for Fault Diagnosis Using Convolution Neural Network

Multi-sensor measurement is widely employed in rotating machinery to ensure safe operation. The information provided by a single sensor is not comprehensive, whereas multi-sensor signals provide complementary information for characterizing the health condition of a machine. This paper proposes a multi-sensor fusion convolutional neural network (MF-CNN) model. The proposed model adds a 2-D convolution layer before the classical 1-D CNN to automatically extract complementary features from multi-sensor signals and minimize the loss of information. A series of experiments is carried out on a rolling bearing test rig to verify the model. Fusing vibration and sound signals achieves higher classification accuracy than typical machine learning models. In addition, the model is applied to gas turbine anomaly detection, where it shows good robustness and generalization.


I. Introduction
Rotating machinery is prone to various faults when it operates under extreme speed and load for a long time, which directly threatens the safe operation of mechanical systems. Undetected faults may lead to undesirable vibration and noise, reduced equipment service life, huge economic losses, or even catastrophic casualties. A reliable and robust approach to machine diagnosis detects faults in time and reduces maintenance costs. To guarantee the safe operation of large equipment with complex structure, a large number of sensors are mounted to monitor whether the equipment works properly. Multi-sensor data provide more complementary information about the health condition, which helps achieve higher diagnosis accuracy [1].
To extract the valid information and reduce the redundant information from a mass of sensors, fusion approaches have been proposed. Depending on the signal processing stage at which the information is combined, common fusion methods can be classified as data-level fusion, feature-level fusion, and decision-level fusion [2]. Inturi [3] extracted statistical features of vibration and sound signals based on EMD and identified the fault-sensitive features with a decision tree algorithm. Safizadeh et al. [4] calculated indices of the vibration and load signals and then employed principal component analysis (PCA) to reduce the redundancy of the features. Gomes et al. [5] used a recursive feature elimination algorithm to screen time-domain and frequency-domain features of vibration and sound signals, and then established the mapping between the optimal features and the tool states. Tang et al. [6] used a multi-layer selective integration algorithm to simulate the cognitive process of experts and fused the feature information of sound and vibration spectra by integrating submodels. Dewallef et al. [7] integrated a Kalman filter and a Bayesian belief network for fault diagnosis and achieved decision-level fusion of information. Based on fuzzy measure and fuzzy integral theory, Liu et al. [8] proposed a feature-level fusion model and a decision-level fusion model for fault diagnosis. However, the complexity of the method increases as the fusion level deepens. In contrast, data-level fusion combines the information from multiple sensors before any complex feature extraction, so less prior knowledge is required. Common data-level fusion methods include weighted averaging [9][10][11], machine learning (ML), and so on.
ML-based methods have been widely used for multi-sensor fusion in engineering [12]. Guo [13] converted accelerometer signals into several time-frequency feature maps using the continuous wavelet transform (CWT), and the maps were then fused by a convolutional neural network (CNN). Lu et al. [14] employed empirical mode decomposition (EMD) and multi-fractal analysis to fuse signals and construct the feature maps. Simple geometric operations are also used to construct the input map without advanced signal processing. Zhang et al. [15] fused accelerometer signals by splicing their spectra and used a deep Boltzmann machine (DBN) to identify ball screw performance degradation. Azamfar et al. [16] stacked the motor current signals into a matrix as input. Wang et al. [17] extracted features of accelerometer signals and built a composite image as input. These methods use signals of the same physical quantity. In engineering, however, signals of many physical quantities are used, including vibration, sound, air pressure, and flow. A robust and adaptive fusion method is therefore needed.
This paper proposes a robust MF-CNN method for machine fault diagnosis, which performs the fusion with a 2-D convolution layer. The principal contributions of this paper are summarized as follows: (1) An effective multi-sensor fusion method is proposed, which fuses multi-sensor signals to improve the diagnosis accuracy. A series of tests is conducted to validate the method. (2) A robust MF-CNN model is established, which generalizes well and is robust across different industrial applications. Moreover, it performs well for bearing fault diagnosis under strong noise. (3) The proposed model performs fault diagnosis without prior knowledge or advanced data pre-processing methods.
The remainder of the paper is organized as follows. The theoretical background is introduced in Section II, and the MF-CNN approach is introduced in Section III. In Section IV, the performance of the model is validated by several experiments, and its robustness and generalization are validated by two engineering applications. Finally, conclusions are drawn in Section V.

A. Basic components of CNN
CNN is a typical deep feed-forward network model. A basic CNN framework usually includes several convolutional layers, pooling layers, activation layers and fully-connected layers [18].
The convolutional layer is the main component of a CNN and learns features of the inputs. It consists of several convolution filters (kernels), each extracting a different kind of feature. To generate complete feature maps, the weights of each kernel are shared across all spatial locations of the input. The output value is

$$z_{i,j,k}^{l} = {\mathbf{w}_{k}^{l}}^{T} \mathbf{x}_{i,j}^{l} + b_{k}^{l}$$

where the index (i, j, k) denotes position (i, j) of the k-th matrix and l denotes the layer number; w is the weight vector, b is the bias, and x is the input matrix at location (i, j). In particular, for multi-sensor input the convolution operation is a weighted addition of the values of the multi-sensor signals at the same index.
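As a minimal illustration of weight sharing, the following Python sketch slides one kernel over a 1-D input and applies the same weights and bias at every position. The function name `conv1d_valid` and the kernel values are hypothetical choices for illustration; in a trained network the weights are learned.

```python
def conv1d_valid(x, w, b=0.0):
    """Slide one shared kernel w over x (stride 1, no padding).

    Each output is z[i] = sum_m w[m] * x[i + m] + b, so the same
    weights are reused at every position of the input.
    """
    k = len(w)
    return [sum(w[m] * x[i + m] for m in range(k)) + b
            for i in range(len(x) - k + 1)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 0.5]  # hypothetical weights: a two-point moving average
z = conv1d_valid(signal, kernel)
print(z)  # [1.5, 2.5, 3.5, 4.5]
```

The output is shorter than the input by the kernel length minus one because no padding is applied.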
The pooling layer is usually inserted between two convolutional layers. It reduces the resolution of the feature maps for faster convergence. Denoting the pooling function as pool(·), which commonly represents max pooling or average pooling, for each local feature map $a_{i,j,k}^{l}$ we have

$$y_{i,j,k}^{l} = \mathrm{pool}\left(a_{m,n,k}^{l}\right), \quad \forall (m,n) \in \mathcal{R}_{i,j}$$

where y is the calculation result of the pooling layer and $\mathcal{R}_{i,j}$ is the local neighbourhood around location (i, j).
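The effect of pooling on resolution can be seen in a short sketch. It assumes non-overlapping windows on a 1-D feature map; the function name `pool1d` and the window size are illustrative and are not taken from the paper.

```python
def pool1d(a, size, mode="max"):
    """Non-overlapping pooling over windows of `size` samples."""
    if mode == "max":
        reduce = max
    elif mode == "average":
        reduce = lambda window: sum(window) / len(window)
    else:
        raise ValueError("mode must be 'max' or 'average'")
    return [reduce(a[i:i + size]) for i in range(0, len(a) - size + 1, size)]


feature_map = [1.0, 3.0, 2.0, 8.0, 4.0, 6.0]
print(pool1d(feature_map, 2, "max"))      # [3.0, 8.0, 6.0]
print(pool1d(feature_map, 2, "average"))  # [2.0, 5.0, 5.0]
```

With a window of two samples, the feature map is reduced to half its original length.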
ReLU is one of the most widely used activation functions; it sets the negative part of its input to zero. The ReLU function can be written as

$$a = \max(z, 0)$$

where z is the input of the activation function. ReLU allows the network to model nonlinear problems, is faster to compute than other activation functions, and readily yields sparse representations.
After several layers have been stacked for feature extraction, a fully-connected layer may follow for final reasoning and classification. The last layer of a CNN is the output layer. The Softmax function is usually used for classification tasks:

$$S_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

where $S_i$ is the probability of the i-th class, $z_i$ is the corresponding input to the Softmax function, and j runs over all classes.
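A minimal Softmax sketch is given below. Subtracting the maximum score before exponentiation is a common numerical-stability measure added here as an assumption; it does not change the result.

```python
import math


def softmax(z):
    """Map raw class scores to probabilities that sum to one."""
    m = max(z)  # shift by the maximum to avoid overflow in exp()
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(predicted_class)  # 0
```

The class with the largest score receives the largest probability, so the predicted label is unaffected by the normalization.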
It is important to choose an appropriate loss function for a specific task. During training, the optimum parameters are obtained by minimizing the loss of the network. Denoting the N outputs of the CNN as $o^{(n)}$ and the corresponding target labels as $y^{(n)}$, where n indexes the n-th output, the loss can be calculated by

$$L = \frac{1}{N}\sum_{n=1}^{N} \ell\left(\theta; y^{(n)}, o^{(n)}\right)$$

where θ represents the network parameters to be optimized and ℓ is the loss of a single sample.

B. Optimization
The ability of trained deep CNNs is highly dependent on the amount of training data. Data augmentation [19] is a straightforward strategy to alleviate data scarcity; it transforms the available data in a simple geometric way without changing their features. For a 1-D time series, shifting is the operation commonly used: a fixed window is shifted across the data with partial overlap.
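The overlapping-window augmentation can be sketched as follows. The window length and step are illustrative values, not the settings used in the experiments.

```python
def sliding_windows(signal, length, step):
    """Cut fixed-length samples from a 1-D record; step < length gives overlap."""
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, step)]


record = list(range(10))  # stand-in for a measured time series
overlapping = sliding_windows(record, length=4, step=2)
disjoint = sliding_windows(record, length=4, step=4)
print(len(overlapping), len(disjoint))  # 4 2
print(overlapping[1])                   # [2, 3, 4, 5]
```

In this toy case a 50% overlap yields twice as many samples from the same record as non-overlapping windows.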
The backpropagation algorithm uses gradient descent to update the parameters. Standard gradient descent is conducted by

$$\theta_{t+1} = \theta_{t} - \eta \nabla_{\theta} E[L(\theta_{t})]$$

where $E[L(\theta)]$ is the expectation of the loss value over the batch of training data, and η is the learning rate. Nesterov momentum [20] is a gradient descent method that takes the historical gradient into consideration and applies an anticipatory (look-ahead) update to prevent overshooting. Nesterov momentum can be described as

$$v_{t+1} = \gamma v_{t} - \eta \nabla_{\theta} L(\theta_{t} + \gamma v_{t}), \qquad \theta_{t+1} = \theta_{t} + v_{t+1}$$

where $v_{t}$ is the current velocity vector, and γ is the momentum term.
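The two update rules can be compared on a toy problem. The sketch below minimizes the one-dimensional loss 0.5·θ², whose gradient is θ; the learning rate, momentum value, and step count are arbitrary assumptions chosen only to make the behaviour visible.

```python
def grad(theta):
    """Gradient of the toy loss L(theta) = 0.5 * theta**2."""
    return theta


def gradient_descent(theta, lr, steps):
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta


def nesterov(theta, lr, momentum, steps):
    v = 0.0  # velocity accumulates past gradients
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point theta + momentum * v.
        v = momentum * v - lr * grad(theta + momentum * v)
        theta = theta + v
    return theta


plain = gradient_descent(1.0, lr=0.01, steps=100)
accelerated = nesterov(1.0, lr=0.01, momentum=0.9, steps=100)
print(abs(accelerated) < abs(plain))  # True
```

On this toy loss and with these settings, the momentum variant ends closer to the minimum at zero after the same number of steps; this is not a general guarantee for other losses or hyperparameters.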
As the network depth increases, the distribution of the feature maps differs greatly from that of the raw data, which may result in an inaccurate network. A batch normalization (BN) layer linearly transforms each batch of data to have zero mean and unit variance to alleviate this phenomenon.

Multi-sensor signals are a combination of multiple one-dimensional signals, which can be arranged into a two-dimensional array. One dimension represents time, and the other dimension is the number of channels. The first approach that comes to mind is to use a 2-D CNN directly for classification. Theoretically, a 1-D CNN is a special form of a 2-D CNN. However, multi-sensor signals have physical meaning in the time dimension but no physical meaning in the channel dimension. The direct use of a 2-D CNN may attach spurious meaning to the channel dimension, and the order in which the multi-sensor signals are arranged in the matrix will also influence the diagnosis results. To avoid this problem, the method adopted in this paper uses a convolution operation to fuse the signals of all channels into a 1-D signal.
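The fusion idea can be sketched with a kernel whose height equals the number of channels, so that it slides along the time axis only and returns a single 1-D signal. The kernel size and the hand-picked weights below are assumptions for illustration; in the MF-CNN the weights are learned during training.

```python
def fuse_channels(signals, kernel, bias=0.0):
    """Collapse C sensor channels into one 1-D signal.

    signals: C rows of T samples; kernel: C rows of k weights.
    The kernel spans every channel, so it moves along time only.
    """
    n_ch, k = len(kernel), len(kernel[0])
    n_t = len(signals[0])
    return [
        sum(kernel[c][m] * signals[c][t + m]
            for c in range(n_ch) for m in range(k)) + bias
        for t in range(n_t - k + 1)
    ]


vibration = [1.0, 2.0, 3.0, 4.0]
sound = [10.0, 20.0, 30.0, 40.0]
kernel = [[1.0, 0.0],  # hypothetical weights applied to the vibration row
          [0.0, 0.5]]  # hypothetical weights applied to the sound row
fused = fuse_channels([vibration, sound], kernel)
print(fused)  # [11.0, 17.0, 23.0]

# Reordering the rows together with their weights gives the same output.
print(fuse_channels([sound, vibration], kernel[::-1]) == fused)  # True
```

Because the kernel never slides across the channel axis, no local neighbourhood between adjacent rows is imposed; each channel is simply paired with its own weights.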

III. Proposed Multi-sensor Fusion Method Based on CNN
In this study, the MF-CNN method is proposed for machine fault diagnosis. The structure of the MF-CNN model is shown in Fig. 2 and Table 1. It should be noted that the proposed model fuses multi-sensor signals automatically at the data level, so that the loss of information is minimized.

A. Experiment on Rolling Bearing Test Rig
Rolling element bearings are important parts of rotating machinery systems, and their health condition directly affects whether the machine can work properly. In this work, an experiment on rolling bearing fault diagnosis is conducted for method validation. Fig. 3(a) shows the test rig used in the experiment. The rotor is driven by a motor (0.735 kW, three-phase), and a data acquisition system is used to collect the data. The speed range of the motor is 0 rpm to 6000 rpm. The test bearing (ER-12K deep groove ball bearing from MB) is mounted on the drive end. Figs. 3(b)-(e) show the four health conditions of the bearing tested in this work.

Vibration and sound signals are widely used in bearing fault diagnosis. Accelerometers mounted on the faulty part capture the vibration signals with a higher signal-to-noise ratio (SNR). Sound signals are collected by microphones over a wider frequency bandwidth (20-20000 Hz) than vibration signals, and are often used in industrial fault diagnosis [21]. Because the two signals are homologous, fusing them effectively makes full use of their complementary characteristic information for bearing fault diagnosis. A mono-axial accelerometer (DYTRAN 3035B) is mounted on the top of the pedestal, adjacent to the test bearing, to acquire vibration signals. The sound signal is measured by a microphone (AWA14425) installed 20 cm away from the test bearing. For this experiment, data are collected at different rotating speeds with a sampling rate of 48000 Hz. Each test is repeated 3 times for statistical robustness.
In this study, the linear trend is removed from the vibration and sound data: the result of a linear least-squares fit to the data is subtracted from the data. In addition, the data are standardized. As a linear transformation, standardization does not lose information, and it helps accelerate gradient descent. A comparison of the data acquired under the four health conditions in the time domain and the frequency domain is given in Figs. 4 and 5, respectively. It is difficult to judge the health conditions intuitively. The fault frequencies could be extracted from the frequency spectrum, but prior knowledge is required for diagnosis. MF-CNN is expected to achieve better results using the raw data in the frequency domain, without other advanced data processing or prior knowledge.
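The two preprocessing steps can be sketched in plain Python as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption; the paper does not state which estimator was used.

```python
import statistics


def detrend_linear(x):
    """Subtract the least-squares straight line fitted to x."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    slope = (sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]


def standardize(x):
    """Scale x to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x)
    return [(v - mu) / sigma for v in x]


# Toy record: a linear drift plus an alternating component.
raw = [0.5 * t + (1.0 if t % 2 else -1.0) for t in range(8)]
clean = standardize(detrend_linear(raw))
print(round(abs(statistics.fmean(clean)), 6), round(statistics.pstdev(clean), 6))  # 0.0 1.0
```

Both steps are affine transformations of the signal, so the shape of the waveform is preserved while the drift and scale differences between sensors are removed. The sketch does not handle a constant signal, for which the standard deviation is zero.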
In this section, the MF-CNN model is tested with the experimental data. After preprocessing, signals collected at different working speeds are assembled into an input dataset. For each condition of the bearing, data at different working speeds are mixed together to build the training dataset, in order to verify the performance of the model under non-stationary conditions. A detailed description of the dataset is given in Table 2. Part of the samples is used to train the model, and the rest are used as testing samples. For each experiment, 20 runs are conducted to reduce the effect of chance, and the average and standard deviation over these 20 runs are calculated. The MF-CNN model is compared with classical CNN models that take a single signal as input, and the results are shown in Fig. 6. In addition, models with different proportions of training data and different sample lengths are tested. The MF-CNN model achieves better diagnostic results than the typical 1-D CNN model using a single signal, which indicates that the MF-CNN method is effective for multi-sensor fusion. When the proportion of training data varies from 30% to 75%, the model retains good diagnostic ability. As the sample length increases, the diagnostic accuracy increases slightly. However, a longer sample also increases the training time and the GPU memory load. According to the results shown in Fig. 6, 4800 is chosen as the sample length for the subsequent diagnosis.

Classical ML methods, such as k-Nearest Neighbours (kNN), Support Vector Machine (SVM), and Artificial Neural Network (ANN), are used for comparison. The dataset described in Table 2 is converted to a format suitable for these methods. The hyperparameters of each ML method have been carefully optimized and are shown in Table 3. A comparison of the results of the different methods with different input data is shown in Fig. 7.
The MF-CNN method performs better than the classical ML methods, which indicates that it is more effective for bearing fault diagnosis. The results also indicate that SVM and the classical CNN model perform better than the other ML methods. It should be noted that the accuracy of the SVM model using both vibration and sound signals is lower than that using the sound signal alone. This suggests that redundant information in the signals may hinder SVM in the classification task, so the classical SVM is not well suited to multi-sensor signal fusion and fault diagnosis.
Additionally, the signals measured in engineering practice are more complex. Signals acquired from different parts of the mechanical equipment are coupled together, which brings challenges to fault diagnosis. Therefore, the diagnosis algorithm needs to be able to classify signals with strong noise. In this study, white Gaussian noise with different signal-to-noise ratios (SNR) is added to the raw signals using the awgn function in MATLAB. SNRs of 3 dB and 6 dB are two commonly used values and are selected for the test. An SNR of 3 dB means that the power of the signal is approximately twice that of the noise, and 6 dB means that the power of the signal is approximately four times that of the noise.
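The noise injection can be reproduced in outline without MATLAB. The Python sketch below scales the noise to the measured signal power, which is an assumption; it is not a reimplementation of awgn, and the paper does not state which awgn options were used. The 50 Hz test tone is a stand-in for a measured signal.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that P_signal / P_noise = 10**(snr_db / 10)."""
    p_signal = sum(v * v for v in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy copy."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


rng = random.Random(0)  # fixed seed for repeatability
tone = [math.sin(2 * math.pi * 50 * n / 48000) for n in range(48000)]
noisy = add_awgn(tone, snr_db=3, rng=rng)
snr_est = measured_snr_db(tone, noisy)  # close to 3 dB, up to sampling error
power_ratio = 10 ** (3 / 10)            # about 1.995, i.e. roughly 2
```

The exact power ratios are 10^0.3 ≈ 1.995 for 3 dB and 10^0.6 ≈ 3.981 for 6 dB, which is why "twice" and "four times" are approximations.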
The testing results of the different classification methods with noise are shown in Fig. 8, which indicates that the proposed MF-CNN method diagnoses bearing faults better under strong noise. MF-CNN and SVM are more robust to noise, with diagnostic accuracy higher than 98%. kNN is sensitive to noise, and its accuracy drops by about 10%. However, it should be noted that when the input contains strong noise, the MF-CNN model may fall into a local optimum or even fail to converge during training. In this experiment, runs with gross errors are identified and discarded. In practical engineering, a trained model with high accuracy will be selected, so the model is still of practical engineering significance.

B. Industrial Application on Gas Turbine Anomaly Detection
Experimental validation on a gas turbine is conducted in this section. Gas turbine technology is considered a key step toward a low-carbon society [22]. To meet the fluctuations of the future energy market, frequent starts and fast loading pose challenges to gas turbines [2]. Effective diagnostic methods are essential for the sustainable management of gas turbine plants.
The layout of the monitored sensors for gas turbine anomaly detection is shown in Fig. 9(a). The pressure signals of the compressor inlet and the turbine exhaust, the vibration signals of four bearings, and the fuel flow signal of the combustor are monitored. Their detailed information is listed in Table 4. All data points collected are peak-to-peak values over 1-minute intervals.
During a major overhaul, different cracks were found on the blades and the seals of the compressor after inspection, as shown in Fig. 9(b). The data provided have been classified as normal and abnormal by the operators, and Fig. 10 displays example signals under the different health conditions [23].
The dataset is established in a similar way to the experiment in Section IV(A). MF-CNN and other classical ML methods are tested for comparison, and the results are shown in Table 5. In this experiment, the MF-CNN and kNN methods perform well. Although the accuracy of the kNN method, 95.29%, is higher than that of MF-CNN, the proposed MF-CNN method shows better generalization and stability when the results of the former experiment are also taken into account.

V. Conclusions
This paper proposes an effective multi-sensor fusion method for fault diagnosis based on a CNN framework. Signals collected by different sensors are stacked row by row to form a matrix, and the signals are combined through a convolution operation. In the experiments, some classical ML methods achieve lower accuracy when using multi-sensor signals. In contrast, the proposed MF-CNN model makes better use of the information in multi-sensor signals and fuses them in an appropriate way, finally reaching a higher fault diagnosis accuracy.
Moreover, the proposed MF-CNN model shows good performance and robustness across different working scenarios, which is verified by a series of experiments. In the two experiments, covering a laboratory condition and an industrial application, the MF-CNN model performs well and achieves accuracies of 99.96% and 92.06%, respectively.