Temporal Behavior Recognition Technology of Thermal Protection Tile Gluing Process Based on SimA3D Model
Citations
GUO Chengda, LI Shuanggao, HOU Guoyi, et al. Temporal behavior recognition technology of thermal protection tile gluing process based on SimA3D model[J]. Aeronautical Manufacturing Technology, 2026, 69(5): 25020153.
College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Abstract
The gluing quality of thermal protection tiles on hypersonic vehicles directly affects thermal insulation performance and flight safety. The current gluing process relies predominantly on manual operations that strictly follow established procedures. However, the dynamic complexity and strict time-sequencing of these operations lead to frequent operation-sequence errors and component mis-assemblies, necessitating intelligent temporal behavior recognition and monitoring methods. To address these challenges, this study first defines the temporal behavioral characteristics of the tile gluing process. We then construct the SimA3D model for temporal behavior recognition by integrating the SimAM parameter-free attention mechanism into the C3D network architecture. A cosine-annealing dynamic learning-rate strategy is introduced together with the adaptive AdamW optimizer to enhance convergence stability. Furthermore, a triple collaborative data augmentation strategy is proposed to expand sample diversity and input-data complexity, effectively alleviating overfitting in small-sample temporal behavior recognition scenarios. Experimental results demonstrate that the SimA3D model achieves 98.32% recognition accuracy on gluing-process behaviors, an improvement of 19.9 percentage points over the baseline C3D network.
Traditional thermal protection tile assembly is performed mainly by hand [1], and the gluing process is particularly critical: it must strictly follow the prescribed sequence of operations, as shown in Fig.1. Taking adhesive preparation and adhesive application as examples, adhesive preparation requires the mixing ratio and stirring sequence to be controlled in order and, because a certain adhesive fluidity is required, the viscosity must be judged visually; adhesive application must strictly follow the coating order from the bonding surface to the isolation pad, with the angle and direction of the applied force adjusted to the tile's curved surface so that the finished surface is flat and free of bulges. The process is therefore dynamically complex, and these strictly time-ordered operations constitute the typical temporal behaviors of the gluing process. Since an aircraft surface carries up to tens of thousands of thermal protection tiles, the gluing workload is large, and the temporal behaviors of the gluing process place very high demands on operators' skill, experience and concentration; fatigue or deviations from the procedure easily cause non-standard behaviors such as operation-sequence errors and component mix-ups, which in turn affect assembly quality, efficiency and safety.
Fig.1 Thermal protection tile gluing process
Traditional industry has focused on product quality monitoring, tracing operator problems only after the fact through quality inspection; this cannot prevent assembly workers' mistakes in real time and struggles to meet the needs of modern assembly manufacturing [2]. With breakthroughs in computer technology, behavior recognition has become one of the research hotspots in computing, making intelligent monitoring of manual assembly processes feasible and offering an effective way to avoid defects caused by human error.
Traditional behavior recognition relies on handcrafted features such as the histogram of oriented gradients (HOG), the gray-level co-occurrence matrix (GLCM) and speeded-up robust features (SURF) [3]; these suffer from complex preprocessing, low speed and limited stability, and struggle to capture dynamic temporal details. Carreira et al. [4] proposed the inflated 3D model (I3D), which inflates an existing high-performing 2D model into 3D, avoiding the large effort of building a 3D network from scratch; its accuracy surpassed handcrafted-feature methods for the first time and spurred deep-learning research in behavior recognition. Human behavior recognition algorithms based on two-stream CNNs and 3D convolutional neural networks have since become mainstream [5]. Two-stream CNN models are slow because they must extract optical flow, whereas 3D convolution kernels extend the temporal dimension (frame sequences) and extract spatiotemporal features directly, as shown in Fig.2, matching the need of behavior recognition to model continuous actions. In 2015, Tran et al. [6] proposed the C3D model, which stacks pure 3D convolution kernels and captures behavior patterns layer by layer through multiple 3D convolution and pooling operations; trained on the UCF101 dataset, it achieved a recognition rate of 85.2%. When applied to the temporal behavior recognition task of the gluing process, however, its test accuracy is only 78.42%.
Fig.2 Schematic of 3D convolution kernel operation
At present, research on behavior recognition in the industrial domain remains relatively scarce. Jones et al. [7] addressed activity recognition during assembly by fusing kinematic modeling with multimodal sensing, achieving fine-grained recognition of assembly actions with high accuracy and strong robustness. Chen et al. [8] proposed a two-stream deep-learning method for recognizing assembly workers' operations, aiming at accurate counting of assembly operations and identification of action categories. Behaviors in the thermal protection tile gluing process, however, still depend on operators' self-discipline and inspectors' reminders; intelligent monitoring technology here lags behind other industrial fields and lacks in-depth study.
As shown in Fig.4, the C3D network consists of symmetrically stacked convolutional layers, pooling layers and fully connected layers. The numbers of filters in the convolutional layers are 64, 128, 256, 512 and 512 in turn, and all convolution kernels follow the optimal design rule proposed by Tran et al. [6]: 3×3×3 spatiotemporal kernels with stride and padding of 1×1×1, so that the spatiotemporal dimensions of input and output remain identical. Feature-map size reduction is performed by 3D max-pooling layers; the first pooling kernel is 1×2×2, while the remaining pooling layers use kernels and strides of 2×2×2. This design avoids excessive early downsampling of the temporal dimension while fitting the standard 16-frame input length. The input tensor has dimensions 3×16×112×112, denoted C×L×H×W, where C is the number of channels (3 for RGB images), L is the temporal length (number of frames), and H and W are the frame height and width. Training uses the stochastic gradient descent (SGD) optimizer with the ReLU activation function, and Dropout (P=0.5) is introduced to suppress overfitting [9].
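Based on the layer description above, a minimal PyTorch sketch of this C3D backbone might look as follows. The fully connected sizes (4096) follow the original C3D paper [6] and the flattened dimension is derived from the 16×112×112 input; `num_classes` is a placeholder for the number of gluing-process behavior classes.

```python
import torch
import torch.nn as nn

class C3D(nn.Module):
    """Sketch of the C3D backbone described above: 3x3x3 kernels with
    stride/padding 1x1x1, filter counts 64-128-256-512-512, and 3D max
    pooling (first layer 1x2x2 to avoid early temporal downsampling)."""
    def __init__(self, num_classes=10):
        super().__init__()
        def conv(cin, cout):
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, stride=1, padding=1),
                nn.ReLU(inplace=True))
        self.features = nn.Sequential(
            conv(3, 64),    nn.MaxPool3d((1, 2, 2), (1, 2, 2)),
            conv(64, 128),  nn.MaxPool3d(2, 2),
            conv(128, 256), conv(256, 256), nn.MaxPool3d(2, 2),
            conv(256, 512), conv(512, 512), nn.MaxPool3d(2, 2),
            conv(512, 512), conv(512, 512), nn.MaxPool3d(2, 2))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # 512 channels x (1 x 3 x 3) after five pooling stages
            nn.Linear(512 * 1 * 3 * 3, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes))

    def forward(self, x):  # x: (N, C, L, H, W) = (N, 3, 16, 112, 112)
        return self.classifier(self.features(x))
```

A 16-frame RGB clip of shape (N, 3, 16, 112, 112) thus maps to an (N, num_classes) score vector.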
Fig.4 C3D model architecture
1.3 SimA3D model construction
1.3.1 SimAM attention mechanism
The temporal behaviors of the gluing process have two characteristics. (1) High dynamics: key operations are densely distributed along the time axis, and the fixed temporal sampling of a conventional 3D convolutional network (C3D) easily loses key-frame information. (2) Local saliency: small actions at specific moments have a decisive effect on overall assembly quality, so the model's sensitivity to local spatiotemporal features must be enhanced. The network therefore needs to focus precisely on key frames while suppressing redundant temporal noise. Limited by uniform pooling and fixed convolution kernels, the standard C3D network cannot dynamically adjust its spatiotemporal feature weights. To this end, this paper introduces the SimAM parameter-free attention mechanism [10], which improves the modeling of highly dynamic temporal behaviors through lightweight feature recalibration.
Compared with channel or spatial attention mechanisms such as SE [11], ECA [12] and CBAM [13], SimAM is a 3D-weight attention mechanism that combines channel and spatial information: without adding any network parameters, it generates an adaptive 3D weight for every neuron of the temporal feature map. Its structure is shown in Fig.5.
Fig.5 SimAM 3D attention module
The SimAM attention mechanism is grounded in neuroscience theory [14]: the relative importance of a neuron is quantified by a neuron energy function, and the lower a neuron's energy, the more it differs from its neighboring neurons and the more important it is. Following [10], the energy function of each neuron is

$$e_t(w_t, b_t, \mathbf{y}, x_i) = \frac{1}{M-1}\sum_{i=1}^{M-1}\left(-1-(w_t x_i + b_t)\right)^2 + \left(1-(w_t t + b_t)\right)^2 + \lambda w_t^2$$

where $t$ and $x_i$ are the target neuron and the other neurons in the same channel, $M$ is the number of neurons on that channel, $w_t$ and $b_t$ are the weight and bias of the linear transform, and $\lambda$ is a regularization coefficient. Minimizing this energy yields the closed-form minimal energy

$$e_t^{*} = \frac{4(\hat{\sigma}^2 + \lambda)}{(t-\hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda}$$

where $\hat{\mu}$ and $\hat{\sigma}^2$ are the mean and variance of the neurons in the channel; each neuron's attention weight is then obtained as $\mathrm{sigmoid}(1/e_t^{*})$.
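A minimal PyTorch sketch of a SimAM module extended to 3D feature maps, following the formulation in [10] (the regularization coefficient `e_lambda = 1e-4` is the default from that paper; the extension of the per-channel statistics from H×W to T×H×W is our reading of how the mechanism applies to C3D features):

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention on 3D (video) feature maps: for each
    neuron, 1/e_t* is computed from per-channel statistics and the feature
    map is reweighted by sigmoid(1/e_t*). No learnable parameters."""
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularization coefficient lambda

    def forward(self, x):
        # x: (N, C, T, H, W); statistics over the T*H*W neurons per channel
        n = x[0, 0].numel() - 1
        d = (x - x.mean(dim=(2, 3, 4), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3, 4), keepdim=True) / n   # per-channel variance
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5  # proportional to 1/e_t*
        return x * torch.sigmoid(e_inv)              # adaptive 3D reweighting
```

Because the module has no parameters, it can be dropped after any C3D convolution stage without changing the parameter count.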
In the temporal behavior recognition task of the gluing process, the C3D model is prone to overfitting because of the high dynamics and repetitiveness of the temporal data. High dynamics means the available temporal information is limited, which amplifies fluctuations in parameter updates; unnormalized data makes the input distribution of each layer unstable, forcing a lower learning rate and careful parameter initialization and significantly slowing convergence. Meanwhile, small displacements in temporal behaviors and background interference are easily amplified as noise by unnormalized features, and input-distribution shifts across different assembly scenes further degrade generalization. This paper therefore applies batch normalization (BN) [15] to regularize the network.
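As a sketch of this regularization (the exact placement of the BN layers in SimA3D is an assumption here), a `BatchNorm3d` layer can be inserted between each 3D convolution and its activation:

```python
import torch
import torch.nn as nn

def conv_bn_block(cin, cout):
    """A 3D conv block with batch normalization between the convolution
    and the ReLU, stabilizing each layer's input distribution."""
    return nn.Sequential(
        nn.Conv3d(cin, cout, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm3d(cout),   # normalizes over (N, T, H, W) per channel
        nn.ReLU(inplace=True))
```

Replacing the plain conv blocks of C3D with such BN-equipped blocks leaves the spatiotemporal dimensions unchanged.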
The original C3D network uses the stochastic gradient descent (SGD) optimizer [16], which updates parameters from gradients of randomly drawn single samples or mini-batches; this strategy introduces gradient noise and causes parameter updates to deviate from the globally optimal trajectory. Disturbances in industrial scenes, such as sudden illumination changes and mechanical vibration, easily disrupt the temporal correlation and spatial consistency of the input data and further aggravate the gradient bias. This paper therefore adopts the AdamW algorithm, an improvement on the classical Adam optimizer.
Adam (adaptive moment estimation) [17] is an adaptive optimization algorithm that combines the advantages of AdaGrad and RMSProp. Its update rule computes first- and second-moment estimates of the gradient and uses them to adjust the learning rate of each parameter:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \frac{\eta\,\hat{m}_t}{\sqrt{\hat{v}_t}+\varepsilon}$$
AdamW (Adam with weight decay) [18] decouples weight decay from the gradient update on the basis of Adam. Let θt denote the parameters, gt the gradient, η the learning rate, λ1 the weight-decay coefficient, β1 and β2 the momentum coefficients, and t the time step; its update step is
$$\theta_t = \theta_{t-1} - \eta\left(\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\varepsilon} + \lambda_1\theta_{t-1}\right) \tag{14}$$

where $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected first- and second-moment estimates, $\varepsilon$ is a small constant for numerical stability, and $\lambda_1$ is the independent weight-decay coefficient.
Weight decay [19] is a classical regularization technique that limits the magnitude of model parameters by adding a squared-norm penalty on the weights to the loss function, reducing the risk of overfitting. Unlike the Adam optimizer, AdamW decouples weight decay from the parameter update, so the regularization strength is independent of the learning rate and overfitting is suppressed more effectively. This improvement lets AdamW better control model complexity, improve generalization, and increase robustness to noise in industrial scenes.
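In PyTorch, the optimizer configuration described above can be sketched as follows; the weight-decay coefficient and cosine-annealing period mirror the experimental settings quoted later (weight_decay=5e-4, Tmax=20), while the model and base learning rate are placeholders:

```python
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Linear(10, 2)  # stand-in for the SimA3D network
# AdamW: weight decay is applied directly to the parameters (Eq. (14)),
# decoupled from the adaptive gradient step.
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=5e-4)
# Cosine annealing decays the learning rate along a half cosine over T_max epochs.
scheduler = CosineAnnealingLR(optimizer, T_max=20)

for epoch in range(3):
    # ... per-batch forward/backward passes and optimizer.step() go here ...
    optimizer.step()     # in practice preceded by loss.backward()
    scheduler.step()     # one scheduler step per epoch
```

Because the decay term $\lambda_1\theta_{t-1}$ bypasses the moment estimates, changing `lr` does not alter the effective regularization strength, unlike L2 regularization under plain Adam.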
To improve dataset quality, this paper incorporates the Mixup data augmentation strategy [21]. Zhang et al. [22] proved that Mixup helps models improve robustness and generalization and reduces the risk of overfitting. The Mixup strategy is expressed as
$$\tilde{x} = \lambda x_i + (1-\lambda)x_j \tag{17}$$

$$\tilde{y} = \lambda y_i + (1-\lambda)y_j \tag{18}$$
where (xi, yi) and (xj, yj) are any two samples from the training data, x is the sample feature and y the sample label. From the literature [23], λ∈[0,1] is a probability value drawn from a Beta distribution with parameter α. The pair ($\tilde{x}$, $\tilde{y}$) is the new sample generated from (xi, yi) and (xj, yj) with λ ~ Beta(α, α); it fuses the characteristics of the two training samples and can be added to the sample set as a candidate for model optimization. Practice and experiments show that the choice of α strongly affects training results, since Mixup is essentially a linear interpolation between two images. The setting of α determines how two behavior sequences are blended: smaller values of α usually work better, while larger values leave network performance unchanged or even degrade it, because excessive mixing destroys the spatial image features of the original behavior sequences. The parameter is therefore set to α ≤ 0.4. The effect of Mixup augmentation (α ≤ 0.4) is shown in Fig.7.
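A minimal NumPy sketch of Eqs. (17) and (18), with λ drawn from Beta(α, α) and α defaulting to the recommended upper bound 0.4 (the function name and signature are illustrative):

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, alpha=0.4, rng=None):
    """Linearly interpolate two samples and their (one-hot) labels with
    a mixing coefficient lambda ~ Beta(alpha, alpha)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # lambda in [0, 1]
    x = lam * x_i + (1.0 - lam) * x_j   # Eq. (17)
    y = lam * y_i + (1.0 - lam) * y_j   # Eq. (18)
    return x, y
```

For video clips, x_i and x_j would be whole frame tensors; a small α keeps λ near 0 or 1 most of the time, so each mixed clip stays close to one of its parents and the spatial structure of the behavior sequence is largely preserved.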
To objectively verify the superiority of the SimA3D model and its optimization method, I3D [25] and SlowFast [26] were selected as baseline models. All models were trained on the GPTBRD-E dataset with the same Mixup augmentation strategy (α=0.4) and the AdamW optimizer (weight_decay=5e-4), combined with the cosine-annealing dynamic learning-rate strategy (Tmax=20). The comparison results under the same experimental environment are shown in Fig.12.
[1]
GUO Chaobang, LI Wenjie. Structural materials and thermal protection system of hypersonic vehicle[J]. Aerodynamic Missiles Journal, 2010(4): 88-94.
[2]
WANG Tiannuo. Research on assembling operation monitoring based on deep learning[D]. Qingdao: Qingdao University of Technology, 2019.
[3]
QIAO Qi. Research on action recognition and operation normative discrimination method in manual assembly process[D]. Xi'an: Xi'an University of Technology, 2023.
[4]
CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the kinetics dataset[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017: 4724-4733.
[5]
ZOU Xinlei. Research and application of action recognition based on 3D convolutional neural networks[D]. Chengdu: University of Electronic Science and Technology of China, 2022.
[6]
TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//2015 IEEE International Conference on Computer Vision (ICCV). Santiago: IEEE, 2015: 4489-4497.
[7]
JONES J D, CORTESA C, SHELTON A, et al. Fine-grained activity recognition for assembly videos[J]. IEEE Robotics and Automation Letters, 2021, 6(2): 3728-3735.
[8]
CHEN C J, WANG T N, LI D N, et al. Repetitive assembly action recognition based on object detection and pose estimation[J]. Journal of Manufacturing Systems, 2020, 55: 325-333.
[9]
XI Zhihong, FENG Yu. A human behavior recognition algorithm based on improved C3D network[J]. Applied Science and Technology, 2021, 48(5): 47-53.
[10]
YANG L X, ZHANG R Y, LI L D, et al. SimAM: A simple, parameter-free attention module for convolutional neural networks[C]//International Conference on Machine Learning. New York: PMLR, 2021.
[11]
HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[12]
WANG Q L, WU B G, ZHU P F, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA: IEEE, 2020: 11531-11539.
[13]
WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Computer Vision-ECCV 2018. Cham: Springer, 2018: 3-19.
[14]
BARBEY A K. Network neuroscience theory of human intelligence[J]. Trends in Cognitive Sciences, 2018, 22(1): 8-20.
[15]
IOFFE S, SZEGEDY C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on International Conference on Machine Learning-Volume 37. New York: ACM, 2015: 448-456.
[16]
RUDER S. An overview of gradient descent optimization algorithms[EB/OL]. [2025-04-16]. https://arxiv.org/abs/1609.04747.
[17]
KINGMA D P, BA J. Adam: A method for stochastic optimization[EB/OL]. [2025-04-16]. https://arxiv.org/abs/1412.6980.
[18]
LASHKARI M, GHEIBI A. Lipschitzness effect of a loss function on generalization performance of deep neural networks trained by Adam and AdamW optimizers[EB/OL]. 2023. https://arxiv.org/abs/2303.16464.
[19]
PRASAD R, UDEME A U, MISRA S, et al. Identification and classification of transportation disaster tweets using improved bidirectional encoder representations from transformers[J]. International Journal of Information Management Data Insights, 2023, 3(1): 100154.
[20]
CAZENAVE T, SENTUC J, VIDEAU M. Cosine annealing, mixnet and swish activation for computer Go[M]//Advances in Computer Games. Cham: Springer International Publishing, 2022: 53-60.
[21]
ZHANG H Y, CISSE M, DAUPHIN Y N, et al. Mixup: Beyond empirical risk minimization[EB/OL]. [2025-04-16]. https://arxiv.org/abs/1710.09412.
[22]
ZHANG L J, DENG Z, KAWAGUCHI K, et al. How does mixup help with robustness and generalization?[EB/OL]. [2025-04-16]. https://arxiv.org/abs/2010.04819.
[23]
GREEN S B. How many subjects does it take to do a regression analysis?[J]. Multivariate Behavioral Research, 1991, 26(3): 499-510.
[25]
CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the kinetics dataset[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 6299-6308.
[26]
FEICHTENHOFER C, FAN H, MALIK J, et al. SlowFast networks for video recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 6202-6211.