Rolling Bearing Fault Diagnosis Based On Convolutional Capsule Network

Fault diagnosis technology is widely applied and is an important part of ensuring the safe operation of mechanical equipment. To address the frequent faults of rolling bearings, this paper designs a rolling bearing fault diagnosis method based on a Convolutional Capsule Network (CCN). Specifically, the original vibration signal is converted into a two-dimensional time-frequency image using the continuous wavelet transform (CWT), features are extracted from this image by the convolutional layers at the front end of the network, and the extracted features are fed into the capsule network. The capsule network converts the extracted features into vector neurons, and the dynamic routing algorithm is used to transfer features and output the fault diagnosis results. The method is compared with other traditional deep learning models on two different datasets to verify its fault diagnosis capability. The results show that, compared with the other models, the convolutional capsule network has good diagnostic capability under different working conditions, even in the presence of noise and insufficient samples. This method contributes to the safe and reliable


Introduction
With the development of intelligent manufacturing, mechanical equipment plays an increasingly important role, and rolling bearings are indispensable parts of most mechanical equipment [1]. Rolling bearings are prone to damage because they work in complex environments with variable loads and speeds. The failure of rolling bearings not only affects the operation of equipment, but also leads to serious safety issues and significant economic losses. Therefore, improving the accuracy and efficiency of fault diagnosis is very important.
Methods for processing and analyzing fault signals have been developed and widely used, ranging from traditional time-domain waveform analysis to time-frequency analysis methods such as wavelet analysis [2], the Wigner-Ville technique [3], and Hilbert demodulation [4]. The increasing number of mechatronic products in the market highlights the need for automated fault diagnosis methods. Traditional fault diagnosis methods rely heavily on empirical knowledge and require multiple indicators to reach a diagnosis, making it challenging to meet the demands of the rapidly growing market [5].
With the development of artificial intelligence, machine learning methods such as artificial neural networks [6], Bayesian classifiers [7], and support vector machines [8] have gradually emerged and been applied to extract features for diagnosing bearing failures. Although bearing fault features are reliable indicators of machinery health, extracting these features usually requires complex mathematical techniques. Furthermore, the feature extraction methods used for different types of faults may vary. Therefore, manual extraction of fault features is highly dependent on the knowledge and expertise of experts [9]. As the number of bearing monitoring points and the data volume grow, traditional diagnostic methods cannot meet the demands of big data analysis [10].
Deep learning has achieved excellent results in various fields and has gained increasing attention in fault diagnosis [11]. In particular, the rise of convolutional neural networks [12], a widely used deep learning method, has gradually made fault diagnosis more intelligent and automated. Huang et al. [13] proposed an end-to-end architecture and a rolling bearing fault diagnosis model based on a convolutional neural attention module-convolutional neural network, which adaptively extracts fault features and does not rely on manual processing of complex signals. Guo et al. [14] constructed an improved convolutional generative adversarial network that improves the accuracy of bearing fault diagnosis under complex operating conditions by combining the data generation capability of generative adversarial networks with the feature extraction capability of improved deep convolutional networks. Xie et al. [15] proposed a hybrid model based on a convolutional neural network (CNN) and individual classifiers to diagnose bearing faults. Pan et al. combined convolutional neural networks with long short-term memory networks to achieve good results in bearing fault diagnosis [16]. Wang  The experimental results show that the features extracted by this method are more comprehensive and can significantly improve fault diagnosis accuracy [20]. Zhi et al. proposed an intelligent fault diagnosis method based on convolutional neural networks to solve the problem of imbalanced bearing data [21]. The above studies have applied CNNs to various fields of engineering practice and achieved excellent results. However, training deep neural networks such as CNNs relies on many samples, and the models suffer from overfitting when the training data are insufficient. Moreover, CNNs have difficulty extracting features from signals contaminated by noise.
In traditional neural networks, each neuron transmits features as a scalar and does not carry spatial position features, resulting in weak fault diagnosis capabilities. The capsule network, first proposed by Sabour et al. [22] in 2017, solved this problem. Specifically, each neuron in the capsule network is a vector, not a scalar. This enables the capsule network to extract more comprehensive detailed features from the input data while reducing the loss of spatial feature information. To address the problem that traditional deep learning models cannot effectively extract spatial feature information when detecting fault signals, which decreases their fault diagnosis ability, this paper designs a fault diagnosis method using convolutional capsule networks, which combines convolutional neural networks with capsule networks capable of extracting more comprehensive features to perform complete extraction of fault features. The innovations and contributions of this paper are summarized as follows:
(1) This article designs a fault diagnosis method based on CCN, which avoids the drawbacks of manually extracting features and relying on expert experience. It can adaptively learn features and provides a foundation for implementing intelligent fault diagnosis.
(2) This article transforms the original vibration signal into a time-frequency representation using the continuous wavelet transform, which can fully express the amplitude characteristics and frequency components of non-stationary signals, making it easier for the network to learn features.
(3) This article replaces the feature extraction layer of the capsule network with a combination of convolutional and pooling layers, which can extract deeper features and reduce the number of parameters. The backend of CCN still uses the capsule network to vectorize the features and mine their spatial information.
The rest of this paper is organized as follows: Section 2 describes the theory, and the methodology is introduced in Section 3; Section 4 uses experimental datasets to verify the effectiveness of the developed methodology; conclusions are drawn in Section 5.

Convolutional neural networks
The convolutional neural network (CNN) was proposed by LeCun in 1998 [23] and is one of the representative algorithms of deep learning. It is a class of feed-forward neural networks with a deep structure that includes convolutional computation. CNNs have achieved remarkable results in several fields.
In the field of fault diagnosis, CNNs are also gaining more and more attention. Typically, a CNN contains a convolutional layer, a pooling layer, and a fully connected layer, and its basic framework is shown in Figure 1.
The convolution operation can be expressed as:

$$y = \sum_{i} \sum_{j} w_{i,j}\, x_{i,j} + b \tag{1}$$

In Equation (1), $y$ represents the output feature, $w_{i,j}$ represents the value of the convolution kernel element in the $i$-th row and $j$-th column, $x_{i,j}$ represents the value of the input element in the $i$-th row and $j$-th column, and $b$ is the bias.

The fully connected layer acts as a classifier in a convolutional neural network, transforming all feature matrices into one-dimensional feature vectors. The fully connected layer is generally at the very end of the CNN structure and is responsible for the final output of the model.
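As a minimal sketch of the operations described above, the following code applies one convolution kernel to a small input and then flattens the resulting feature map as a fully connected layer would expect. The single channel, stride of 1, absence of padding, and the function names `conv2d_valid` and `flatten` are assumptions made for illustration; the text does not specify them, and this is not the implementation used in the paper.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Each output value is the sum of kernel element times input element over the
    patch, plus the bias, as in Equation (1). Like most CNN frameworks, this is
    a cross-correlation: the kernel is not flipped.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total + bias)
        output.append(row)
    return output


def flatten(feature_map):
    """Turn a 2-D feature map into the 1-D vector fed to a fully connected layer."""
    return [value for row in feature_map for value in row]


# Toy input: a 3x3 "image" and a 2x2 kernel (values chosen only for illustration).
toy_image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
toy_kernel = [[1, 0], [0, 1]]
features = flatten(conv2d_valid(toy_image, toy_kernel, bias=1.0))
```

A 3 × 3 input and a 2 × 2 kernel yield a 2 × 2 feature map, so the flattened vector has four entries.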

Capsule networks
Capsule networks transmit features through vector neurons, which capture the relative positional relationships between features well and avoid the loss of positional feature information.
The computational process of the capsule network can be divided into three steps. In the first step, the prediction vector $\hat{u}_{j|i}$ is obtained by multiplying the neuron $u_i$ by the weight matrix $W_{ij}$:

$$\hat{u}_{j|i} = W_{ij}\, u_i \tag{2}$$

where $u_i$ is the $i$-th neuron, $W_{ij}$ is the weight matrix, and $\hat{u}_{j|i}$ is the $j$-th vector predicted from the $i$-th input feature.

In the second step, the total output vector $s_j$ is obtained by weighting the prediction vectors $\hat{u}_{j|i}$ with the coupling coefficients $c_{ij}$ and summing them:

$$s_j = \sum_{i} c_{ij}\, \hat{u}_{j|i} \tag{3}$$

where $c_{ij}$ is the coupling coefficient between the $i$-th vector in the primary capsule layer and the $j$-th vector in the digit capsule layer.

In the third step, the output vector $v_j$ is calculated by a nonlinear transformation of the total output vector $s_j$ of the $j$-th capsule. The nonlinear transformation (squashing) function is shown in Equation (4):

$$v_j = \frac{\lVert s_j \rVert^{2}}{1 + \lVert s_j \rVert^{2}} \cdot \frac{s_j}{\lVert s_j \rVert} \tag{4}$$

The coupling coefficient $c_{ij}$ is obtained by a dynamic routing operation, the purpose of which is to allow the input neurons to be selectively transmitted to the next layer of neurons according to the features they carry. It is calculated as in Equation (5) and Equation (6):

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})} \tag{5}$$

$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j \tag{6}$$

Figure 4 shows the operation process of the dynamic routing algorithm. The bias coefficients $b_{ij}$ are initialized to 0, and the coupling coefficients $c_{ij}$ are calculated using Equation (5) to derive the output vector $v_j$. The bias coefficients $b_{ij}$ are then updated using Equation (6), which gives new values of $c_{ij}$ and $s_j$ and therefore a new output vector $v_j$. This update is repeated for each iteration of the dynamic routing algorithm.
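The routing procedure described above can be sketched in plain Python as follows. This is an illustrative reading of the steps in the text and of the routing scheme of Sabour et al. [22], not the authors' implementation: the prediction vectors are passed in directly as a nested list `u_hat[i][j]` (input capsule `i`, output capsule `j`), the coupling coefficients are taken as a softmax over the routing logits, and all function names and toy values are assumptions.

```python
import math


def squash(s):
    """Keep the direction of vector s and map its length into the range [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iterations=3):
    """Route prediction vectors u_hat[i][j] to output capsules; return (v, c)."""
    num_in, num_out = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * num_out for _ in range(num_in)]  # routing logits start at zero
    v, c = [], []
    for _ in range(num_iterations):
        # Coupling coefficients: each input capsule distributes itself over outputs.
        c = [softmax(b[i]) for i in range(num_in)]
        # Weighted sum of the prediction vectors, followed by the squash function.
        v = []
        for j in range(num_out):
            s_j = [
                sum(c[i][j] * u_hat[i][j][d] for i in range(num_in))
                for d in range(dim)
            ]
            v.append(squash(s_j))
        # Agreement update: increase the logit where prediction and output align.
        for i in range(num_in):
            for j in range(num_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c


# Toy case: both input capsules agree on output 0 and disagree on output 1.
toy_u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[1.0, 0.0], [0.0, -0.1]],
]
outputs, couplings = dynamic_routing(toy_u_hat, num_iterations=3)
```

In the toy case the two inputs agree on output capsule 0, so routing shifts their coupling toward it, while the conflicting predictions for output capsule 1 cancel out. The length of each output vector stays below 1, which is what allows it to be read as a class probability.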

Continuous wavelet transform
Continuous wavelet transform is a signal processing method that is gaining popularity in the field of fault diagnosis [24]. The continuous wavelet transform can be implemented by the following operation [25]:

$$W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \tag{7}$$

In Equation (7), $a$ is the scale parameter, $b$ is the translation parameter, $x(t)$ is the original time-domain signal, $\psi$ is the wavelet function, and $\psi^{*}$ is the complex conjugate of $\psi$.
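To make the transform concrete, the sketch below evaluates a discretized continuous wavelet transform directly, with the integral over time replaced by a sum over samples. It assumes a complex Morlet wavelet of the form $\psi(t) = (\pi B)^{-1/2} e^{-t^2/B} e^{2\pi i C t}$ with bandwidth $B = 3$ and center frequency $C = 3$, which is one common reading of the name "cmor3-3"; the function names, the scales, and the test signal are illustrative assumptions, and a practical pipeline would normally rely on an optimized library rather than this direct sum.

```python
import cmath
import math


def complex_morlet(t, bandwidth=3.0, center_freq=3.0):
    """Complex Morlet wavelet psi(t): Gaussian envelope times a complex sinusoid."""
    envelope = math.exp(-t * t / bandwidth) / math.sqrt(math.pi * bandwidth)
    return envelope * cmath.exp(2j * math.pi * center_freq * t)


def cwt_magnitude(signal, scales, shifts):
    """Return |W(a, b)| with one row per scale a and one column per shift b.

    Scales and shifts are expressed in samples. Each entry is the discrete
    form of (1 / sqrt(a)) * sum over n of x[n] * conj(psi((n - b) / a)).
    """
    image = []
    for a in scales:
        row = []
        for b in shifts:
            total = sum(
                x * complex_morlet((n - b) / a).conjugate()
                for n, x in enumerate(signal)
            )
            row.append(abs(total) / math.sqrt(a))
        image.append(row)
    return image


# Toy signal: a cosine at 0.1 cycles per sample, analysed at three scales.
# With center frequency 3, scale a responds most strongly near 3 / a cycles
# per sample, so scale 30 matches this signal.
toy_signal = [math.cos(2 * math.pi * 0.1 * n) for n in range(200)]
toy_image = cwt_magnitude(toy_signal, scales=[15, 30, 60], shifts=[100])
```

Stacking the rows for many scales and shifts gives the two-dimensional time-frequency image that serves as the network input.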

Constructing the model loss function
During training, the weight parameters of the model are updated continuously by the backpropagation algorithm, which requires a loss function that measures the distance between the model's output and the true value. In this paper, the expression of the loss function used is

Experiments and analysis of results
Because data on rolling bearings are difficult to collect and the variety of collected data is limited, this article selected the Case Western Reserve University Bearing Dataset [26] and the Paderborn University Bearing Dataset [27] for the experiments. Table 1 shows the model parameters of CCN, where the input time-frequency map size is 32 × 32.

Model parameter settings
CCN uses two convolutional layers to extract features and one pooling layer to reduce the number of parameters. The pooling layer uses a small receptive field, mainly to reduce the parameters without losing too many features. The two convolutional layers perform scale transformation and feature extraction on the data, and the extracted features are sent to the capsule module to construct the initial capsules. The output of the digit capsule layer is passed through the squashing function, and the final output is ten probability values corresponding to the ten fault types.

Introduction to the comparison methods
To evaluate the performance of the proposed method, this paper verifies the feature extraction capability of the convolutional capsule network by comparing it with other deep learning models.
The compared models are all traditional convolutional neural networks, which are introduced as follows:
1. LeNet-5 was proposed by LeCun in 1998 to solve the handwritten digit recognition problem and is considered one of the seminal works in convolutional neural networks [23]. This network was one of the first neural networks to be widely used for digital image recognition and one of the milestones in deep learning.
2. VGG16 is a classical convolutional neural network architecture [28]. VGG was developed to increase the depth of convolutional neural networks to improve the model performance.
3. ResNet18 is the model proposed by He et al. in 2015 [29]. The innovation of the residual structure is to increase the depth of the convolutional neural network and to make the convergence of the convolutional neural network faster. It also allows the convolutional neural network to have significantly fewer parameters at deeper layers than previous deep convolutional neural networks.

Introduction and data analysis
As shown in Figure 6, the test bench consists of a motor on the far left, a torque transducer in the middle, a dynamometer on the right, and control electronics. The bearing under test is a motor support bearing, an SKF 6205 deep groove ball bearing. The three operating conditions of the data set are presented in Table 2. Data A is collected at a bearing speed of 1772 rpm and a load of 1 HP. Data B is collected at a bearing speed of 1750 rpm and a load of 2 HP. Data C is collected at a bearing speed of 1730 rpm and a load of 3 HP.

As shown in Table 3, each data set contains nine fault types and one normal state, for a total of 10 classes. A damage degree of 0.007 inches indicates mild damage, 0.014 inches moderate damage, and 0.021 inches severe damage. Each sample is labeled using a ten-dimensional one-hot vector, in which only one of the ten entries has a value of 1 and the rest are 0. The index of the entry with a value of 1 indicates the class.

In deep learning, because of the powerful fitting ability of artificial neural networks, too few samples in the training set can lead to overfitting on the training set and a decrease in accuracy on the test set. To avoid overfitting, many samples are usually required as the training set. Therefore, this paper uses overlapping sampling to construct the dataset. Sampling starts from the beginning of the original vibration signal; each sample contains 1024 data points, and the sampling window is then shifted forward by 400 data points before the next sample is taken. This is repeated until the end of the original data is reached.
The original vibration waveform is plotted in Figure 7. The segmented signal is then processed with the continuous wavelet transform (CWT). The Cmor3-3 wavelet is selected as the CWT wavelet basis function. Figure 8 shows the corresponding time-frequency image after compression. The size of the image is 32 × 32.
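The overlapping sampling and one-hot labeling described here can be sketched as follows. The window length of 1024 and the shift of 400 points come from the text; the function names and the decision to discard a trailing partial window are assumptions made for illustration.

```python
def overlapping_segments(signal, window=1024, stride=400):
    """Cut `signal` into windows of `window` points whose starts are `stride` apart.

    A trailing remainder shorter than `window` is discarded (an assumption;
    the text does not say how the end of the record is handled).
    """
    last_start = len(signal) - window
    return [signal[start:start + window] for start in range(0, last_start + 1, stride)]


def one_hot(label, num_classes=10):
    """Return a vector with a 1 at index `label` and 0 elsewhere."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector


# Toy record of 2000 points standing in for a vibration signal.
toy_record = list(range(2000))
toy_segments = overlapping_segments(toy_record)
toy_label = one_hot(3)
```

For a record of N points this yields floor((N - 1024) / 400) + 1 samples, compared with floor(N / 1024) for non-overlapping windows, which is why overlapping sampling enlarges the training set.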

Results analysis
This section comprehensively analyzes and discusses the convolutional capsule network model designed in this paper. Specifically, firstly, we conduct experiments on the model under different working conditions and prove that the model has good generalization and robustness. Secondly, we conduct experiments on the model under different signal-to-noise ratios to verify the noise immunity of the designed model. Finally, we conduct experiments with insufficient training data to verify the powerful feature extraction capability of the convolutional capsule network designed in this paper.

(1) Fault diagnosis under different working conditions
In practical application scenarios, rolling bearings typically operate at different speeds and loads [30][31][32]. Therefore, it is of great practical engineering importance to evaluate the fault diagnosis capability of the model under different working conditions [33][34][35][36]. To verify the fault diagnosis performance of the model under different working conditions, data from three different working conditions are selected for testing. In each experiment, the training and test sets come from the same working condition, and the three working conditions are discussed separately. The accuracy curves of the deep learning models at a load of 1 HP (data A) are shown in Figure 9. Figure 9 shows that the convolutional capsule network quickly reaches a smooth convergence state and that its diagnostic accuracy is higher than that of the other models, indicating that the proposed method has stronger feature extraction capability than the other deep learning models. In this paper, the processed time-frequency map is used as input to verify the effectiveness of the proposed method by diagnosing its fault classes. To reduce the influence of random factors and verify the stability of the proposed method, the experiments with the proposed method and the other deep learning models are repeated five times under each of the three operating conditions. To quantitatively compare the diagnostic accuracy of the four methods, the accuracy of each test and the average accuracy are listed in Table 4. As shown in Table 4, the average test accuracy of the CCN was 100% under the three different working conditions. Compared with VGG16, ResNet18, and LeNet-5, it improved by 1.53%, 1.83%,

(2) Fault diagnosis under noise conditions
Because rolling bearings usually operate in complex environments, they are inevitably affected by noise interference during actual equipment operation. Therefore, it is of great practical engineering importance to evaluate the noise immunity of the model in a noisy environment. To verify the fault diagnosis performance of the model in a noisy environment, additive white Gaussian noise with different signal-to-noise ratios (SNR) is added to the test data set. The SNR is an important index for evaluating the amount of noise contained in a signal. It is calculated as

$$\mathrm{SNR} = 10 \log_{10}\!\left(\frac{P_{signal}}{P_{noise}}\right)$$

where $P_{signal}$ is the power of the original vibration signal, $P_{noise}$ is the power of the noise signal, and SNR is the signal-to-noise ratio.
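A minimal sketch of how noise at a target SNR might be added to a signal is given below. It assumes the usual decibel definition, SNR = 10 log10(P_signal / P_noise), with power taken as the mean squared amplitude; the function names, the seeded generator, and the toy sine signal are illustrative and are not taken from the paper.

```python
import math
import random


def signal_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_gaussian_noise(signal, snr_db, rng=None):
    """Add zero-mean white Gaussian noise so the result has roughly the target SNR."""
    rng = rng or random.Random()
    # Rearranging the SNR definition gives the required noise power.
    noise_power = signal_power(signal) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Recover the SNR in dB from a clean signal and its noisy version."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(noise))


# Toy signal: a sine wave standing in for a vibration segment.
toy_clean = [math.sin(2 * math.pi * 0.01 * n) for n in range(20000)]
toy_noisy = add_gaussian_noise(toy_clean, snr_db=10, rng=random.Random(0))
toy_snr = measured_snr_db(toy_clean, toy_noisy)
```

Because the noise is random, the measured SNR of a finite segment only approximates the target value, and the approximation tightens as the segment gets longer.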
In the noise-resistance experiments, Gaussian white noise with different signal-to-noise ratios is added to data set A. The diagnosis results of the different algorithms in different noise environments are shown in Figure 12. The diagnostic performance of LeNet-5 in the noisy environment is significantly lower than that of the other three networks, as shown in Figure 13. At SNR=10, the recognition accuracy of VGG16, ResNet18, and LeNet-5 declines severely, to only 85.21%, 79.96%, and 64.28%, respectively. Compared with the other network models, CCN only replaces the fully connected layer with the capsule layer, but its noise immunity is improved significantly. This also shows that using vector neurons can extract more detailed information, enabling the network to maintain a high recognition rate even for noise-polluted signals.
To better reflect the fault diagnosis performance of the CCN in the noisy environment, Figure 13 shows the classification results of each fault for the different models. As can be seen from the figure, the accuracy of LeNet-5 for each class in the noisy environment is significantly lower than that of the other three networks. At SNR=20, the classification performance of VGG16, ResNet18, and CCN is relatively stable. However, at SNR=10, the classification accuracies of VGG16 and ResNet18 start to decline severely, while CCN can still accurately classify seven types of faults. This further indicates that CCN has stronger feature extraction ability and better noise immunity than the other deep learning models.

(3) Fault diagnosis under insufficient samples
In practical engineering, because rolling bearing data are difficult to collect, sufficient fault data are often unavailable. Therefore, verifying the model's fault diagnosis performance with insufficient samples has important practical significance. To this end, different numbers of training samples from data A were selected for testing. The numbers of training samples in each category are 15, 30, 45, and 60, respectively, and the number of samples in each test set is 150. As shown in Figure 14, the diagnostic performance of LeNet-5 with insufficient samples is significantly lower than that of the other three networks. The results show that CCN has stronger feature extraction performance, enabling it to maintain high fault diagnosis ability even with insufficient samples.

Figure 14. Fault diagnosis accuracy of each model with different numbers of samples.

Introduction and data analysis
The data for the Paderborn University bearing data set Alpha was collected from the modular test stand shown in Figure 15, where (1) is the motor, (2) is the torque measurement shaft, (3) is the rolling bearing test module, (4) is the flywheel, and (5) is the load motor. The working conditions of the constructed dataset are shown in Table 5. Data D is collected at a bearing speed of 1500 rpm, a torque of 0.1 Nm, and a radial load of 1000 N. Data E is collected at a bearing speed of 1500 rpm, a torque of 0.7 Nm, and a radial load of 400 N. Data F is collected at a bearing speed of 1500 rpm, a torque of 0.7 Nm, and a radial load of 1000 N.

As shown in Table 6, each data set contains five types of faults and one normal state, for a total of six classes. N represents the normal state, IF represents the inner ring fault, and OF represents the outer ring fault, where a damage degree of 1 indicates mild damage, 2 moderate damage, and 3 severe damage. Each sample is labeled using a six-dimensional one-hot vector, in which only one of the six entries has a value of 1 and the rest are 0. The index of the entry with a value of 1 indicates the class.

The overlapping sampling method is used to construct the data set: starting from the beginning of the original vibration signal, 1024 data points are acquired each time, and the sampling window is then shifted forward by 500 data points before the next acquisition, until 300 samples have been collected from the original data. The rest of the experimental parameters are consistent with those used for the data set of Case 1.

Results analysis
To verify the generalization and robustness of the model, we also analyzed the fault diagnosis capability of the method under different operating conditions, noisy conditions, and insufficient samples in Case 2 and compared it with other deep learning models.

(1) Fault diagnosis under different working conditions
The accuracy of the various models in the fault diagnosis experiments under different operating conditions is shown in Table 7. The accuracy of the various models under data set D is shown in Figure 16. Figure 18 shows the confusion matrix obtained for the different models using test set D.

(2) Fault diagnosis under noise conditions
Since Case 2 has six classes, fault diagnosis is less difficult than in Case 1, which has ten classes, so a smaller signal-to-noise ratio is chosen. The accuracy of the various models under different noise conditions is shown in Figure 19.

Analysis of experimental results: The above experimental results confirm the conclusions obtained in Case 1. Firstly, in the experiments under different working conditions, the accuracy of the proposed method reaches 100%, which shows that the method has good generalization and robustness. Secondly, in the experiments with noise and with insufficient training samples, the method designed in this paper performs better than the other traditional deep learning models. This again shows that the method has a certain anti-noise ability and strong feature extraction ability when samples are insufficient.

Impact of Iteration Times on Dynamic Routing Algorithms
The dynamic routing algorithm is the core algorithm of capsule networks, used to calculate and update the similarity weight coefficients between capsules. The dynamic routing algorithm is equivalent to a fully connected mapping, in which each path requires a fully connected mapping of all dimensions of the upper and lower capsules, resulting in a vast number of parameters. Too many iterations of the dynamic routing algorithm can lead to excessive training parameters, while too few iterations can lead to incomplete mapping and insufficient diagnostic ability. Therefore, it is important to determine the number of dynamic routing iterations at which CCN should stop. This section selects the data with SNR=10 in data A for validation, and the results are shown in Table 8. The first column of Table 8 shows the number of iterations of the dynamic routing algorithm, the second column shows the diagnostic accuracy, and the third column shows the execution time of each training round. The table shows that the accuracy is highest when the number of iterations is 3, with an execution time in the middle of the range. Therefore, the number of dynamic routing iterations used in this article is 3.

Conclusion
This article designs a rolling bearing fault diagnosis method based on a convolutional capsule network, and the conclusions are summarized as follows:
(1) Compared with other traditional machine learning models, the method proposed in this paper can adaptively extract fault features, avoiding the drawbacks of manual feature extraction and reliance on expert experience, and providing a foundation for implementing intelligent fault diagnosis.
(5) To verify the applicability and generalization of the method proposed in this article, data from six operating conditions from two datasets were used for validation. The results indicate that the fault diagnosis ability is stable under various working conditions, and the diagnostic accuracy is higher than that of the other comparison methods.
Although capsule networks have good diagnostic performance under noisy conditions and with insufficient samples, they are still difficult to train because of their large number of training parameters. In future work, the capsule network model will be improved to address the high equipment requirements caused by its large number of parameters.

Acknowledgements
This work was partially supported by the Science and Technology Planning Project of Inner Mongolia of China under contract number 2021GG0346.