U-shaped Vision Transformer and Its Application in Gear Pitting Measurement
Keywords:Vision transformer; residual connection; dilation rate; information interaction; pitting measurement
Although convolutional neural networks (CNNs) have become the mainstream segmentation model, the locality of convolution makes them cannot well learn global and long-range semantic information. To further improve the performance of segmentation models, we propose u-shaped vision Transformer (UsViT), a model based on Transformer and convolution. Specifically, residual Transformer blocks are designed in the encoder of UsViT, which take advantages of residual network and Transformer backbone at the same time. What’s more, transpositions in each Transformer layer achieve the information interaction between spatial locations and feature channels, enhancing the capability of feature learning. In the decoder, for enhancing receptive filed, different dilation rates are introduced to each convolutional layer. In addition, residual connections are applied to make the information propagation smoother when training the model. We first verify the superiority of UsViT on Automatic Portrait Matting public dataset, which achieves 90.43% Acc、95.56% DSC and 94.66% IoU with relatively fewer parameters. Finally, UsViT is applied to gear pitting measurement in gear contact fatigue test, and the comparative results indicate that UsViT can improve the accuracy of pitting detection.