HRNet

Deep Learning / Paper

Created: 2020-02-18 23:59

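Ranking of network architectures on the ADE20K dataset

Image processing works at different granularities, and each granularity suits a different kind of problem.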

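HRNet maintains high-resolution representations by connecting high-to-low resolution convolutions in parallel, and strengthens them by repeatedly performing multi-scale fusion across the parallel convolutions. The approach has been shown to be effective for pixel-level, region-level, and image-level classification.
HRNet Main Body network structure (four stages)
(1) The high-to-low resolution subnetworks are connected in parallel rather than in series, as most existing solutions do. The network therefore maintains high resolution throughout instead of recovering it through a low-to-high process, so the predicted heatmaps are potentially more spatially precise. (parallel high-to-low resolution subnetworks)
(2) Most existing fusion schemes aggregate low-level and high-level representations. HRNet instead performs repeated multi-scale fusion: it boosts the high-resolution representations with low-resolution representations of the same depth and similar level, and vice versa, so that the high-resolution representations are also rich enough for pose estimation. As a result, the predicted heatmaps are potentially more accurate. (multi-resolution subnetworks (multi-scale fusion))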
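The parallel layout in (1) can be written down as simple bookkeeping. The snippet below is a minimal sketch, assuming the four-stage configuration from the HRNet papers: stage n runs n branches in parallel, the first branch works at 1/4 of the input resolution, and each added branch halves the resolution again. The function name `branch_strides` is hypothetical and does not come from the HRNet code base.

```python
def branch_strides(num_stages=4, base_stride=4):
    """Return, for each stage, the downsampling factor of every parallel branch.

    Assumption: stage n keeps all branches of stage n-1 and adds one branch
    at half the resolution of the previous lowest-resolution branch.
    """
    return [
        [base_stride * 2 ** k for k in range(stage)]
        for stage in range(1, num_stages + 1)
    ]


for stage, strides in enumerate(branch_strides(), start=1):
    print(f"stage {stage}: branches at 1/{', 1/'.join(str(s) for s in strides)} resolution")
```

The first entry of every stage stays at stride 4, which is the point of the design: the high-resolution branch is never dropped and never has to be reconstructed.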

Stem

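# stem net: two stride-2 3x3 convolutions reduce the input to 1/4 resolution.
# Fragment from the model's __init__; BatchNorm2d, BN_MOMENTUM and Bottleneck
# are assumed to be defined elsewhere in the repository.
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False)
self.bn1 = BatchNorm2d(64, momentum=BN_MOMENTUM)
self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False)
self.bn2 = BatchNorm2d(64, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=False)
# general form: self.layer1 = self._make_layer(block, 64, num_channels, num_blocks)
# first stage: a stack of 4 Bottleneck blocks at 1/4 resolution
self.layer1 = self._make_layer(Bottleneck, 64, 4)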
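To see why the stem leaves the feature map at 1/4 of the input size, the standard convolution output-size formula can be applied twice. This is a small sketch using only the kernel size, stride, and padding from the stem above; the helper names `conv_out_size` and `stem_out_size` are made up for illustration.

```python
def conv_out_size(size, kernel_size=3, stride=2, padding=1):
    """Spatial output size of a convolution (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def stem_out_size(size):
    """Apply the two stride-2 3x3 convolutions of the stem in sequence."""
    return conv_out_size(conv_out_size(size))


for side in (224, 256, 512):
    print(side, "->", stem_out_size(side))
```

A 256-pixel side becomes 128 after conv1 and 64 after conv2, so every later stage starts from a 1/4-resolution map.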

HRNet Main Body (four stages)

How features are fused:
HRNet multi-resolution fusion unit
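The fusion unit can be sketched in PyTorch as follows. This is an illustrative simplification, not the official HRNet module: the class name `FusionUnit` is hypothetical, batch normalization is omitted, and bilinear upsampling is an assumption. The idea it shows is that every output branch sums contributions from all input branches: same-resolution inputs pass through unchanged, lower-resolution inputs get a 1x1 convolution and are upsampled, and higher-resolution inputs go through one stride-2 3x3 convolution per halving.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionUnit(nn.Module):
    """Simplified multi-resolution fusion (sketch, no batch norm)."""

    def __init__(self, channels):
        super().__init__()
        self.paths = nn.ModuleList()
        for i, out_ch in enumerate(channels):        # i: output branch
            row = nn.ModuleList()
            for j, in_ch in enumerate(channels):     # j: input branch
                if j == i:
                    row.append(nn.Identity())
                elif j > i:
                    # lower resolution: match channels here, upsample in forward()
                    row.append(nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False))
                else:
                    # higher resolution: one stride-2 conv per factor of 2
                    steps, ch = [], in_ch
                    for k in range(i - j):
                        nxt = out_ch if k == i - j - 1 else ch
                        steps.append(nn.Conv2d(ch, nxt, 3, stride=2, padding=1, bias=False))
                        ch = nxt
                    row.append(nn.Sequential(*steps))
            self.paths.append(row)

    def forward(self, xs):
        outs = []
        for i, row in enumerate(self.paths):
            size = xs[i].shape[-2:]
            total = 0
            for j, path in enumerate(row):
                y = path(xs[j])
                if j > i:
                    y = F.interpolate(y, size=size, mode="bilinear", align_corners=False)
                total = total + y
            outs.append(torch.relu(total))
        return outs


xs = [torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16)]
outs = FusionUnit([32, 64, 128])(xs)
print([tuple(o.shape) for o in outs])
```

Each output keeps the shape of its own branch, so the unit can be repeated after every group of parallel convolutions.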

The 32 in HRNetW32 is the channel width, i.e. the C in the table: the number of channels of the highest-resolution branch.
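As a quick illustration of what the width C implies, the sketch below lists the per-branch channel counts, assuming the convention from the HRNet papers that the four branches use C, 2C, 4C, and 8C channels. The helper name `branch_widths` is hypothetical.

```python
def branch_widths(c, num_branches=4):
    """Channels per branch, from highest to lowest resolution (C, 2C, 4C, 8C)."""
    return [c * 2 ** k for k in range(num_branches)]


for name, c in (("HRNet-W18", 18), ("HRNet-W32", 32), ("HRNet-W48", 48)):
    print(name, branch_widths(c))
```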

Different heads are used for different scenarios

a. v1 b. v2 c. v2p
v1 Human Pose Estimation
The output is the representation from the high-resolution stream only. The other three representations are ignored.

We regress the heatmaps simply from the high-resolution representations output by the last exchange unit, which empirically works well.

This means that only a subset of output channels from the high-resolution convolutions is exploited and other subsets from low-resolution convolutions are lost.
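A v1-style head can be sketched in a few lines. The example below is a hedged illustration rather than the released code: the class name `V1Head` is hypothetical, and the 1x1 convolution and the 17 keypoints are assumptions chosen for the demo. It only shows that the three lower-resolution outputs are never touched.

```python
import torch
import torch.nn as nn


class V1Head(nn.Module):
    """Predict one heatmap per keypoint from the high-resolution branch only."""

    def __init__(self, high_res_channels, num_keypoints):
        super().__init__()
        # assumption: a single 1x1 convolution maps features to heatmaps
        self.to_heatmaps = nn.Conv2d(high_res_channels, num_keypoints, kernel_size=1)

    def forward(self, xs):
        # xs[1:] (the lower-resolution branches) are ignored on purpose
        return self.to_heatmaps(xs[0])


features = [
    torch.randn(1, 32, 64, 64),
    torch.randn(1, 64, 32, 32),
    torch.randn(1, 128, 16, 16),
    torch.randn(1, 256, 8, 8),
]
heatmaps = V1Head(32, 17)(features)
print(tuple(heatmaps.shape))
```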

v2 Panoptic segmentation = semantic segmentation + instance segmentation
We rescale the low-resolution representations through bilinear upsampling without changing the number of channels to the high resolution, and concatenate the four representations, followed by a 1 × 1 convolution to mix the four representations.
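The v2 head described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `V2Head` and the output width of 256 are made up for the demo, and `align_corners=False` is a choice the quoted text does not specify. With widths C, 2C, 4C, 8C the concatenated tensor has 15C channels before the 1x1 convolution mixes them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class V2Head(nn.Module):
    """Upsample all branches to the highest resolution, concatenate, mix with 1x1 conv."""

    def __init__(self, channels, out_channels):
        super().__init__()
        self.mix = nn.Conv2d(sum(channels), out_channels, kernel_size=1)

    def forward(self, xs):
        size = xs[0].shape[-2:]
        # bilinear upsampling changes the spatial size, not the channel count
        ups = [xs[0]] + [
            F.interpolate(x, size=size, mode="bilinear", align_corners=False)
            for x in xs[1:]
        ]
        fused = torch.cat(ups, dim=1)   # 15C channels for widths C, 2C, 4C, 8C
        return self.mix(fused)


widths = [32, 64, 128, 256]
branches = [torch.randn(1, c, 64 // 2 ** k, 64 // 2 ** k) for k, c in enumerate(widths)]
mixed = V2Head(widths, out_channels=256)(branches)
print(tuple(mixed.shape))
```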

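One research direction in semantic segmentation is object context (Object Context): the information around a pixel is used to decide that pixel's class.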

v2p Object Detection
We construct multi-level representations by downsampling the high-resolution representation output from HRNetV2 to multiple levels.
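The multi-level construction can be sketched as repeated pooling of one high-resolution map. The snippet below is an assumption-laden illustration: the function name `multi_level` is hypothetical, and average pooling is used as the downsampling operator, which the quoted sentence does not state. It produces a feature pyramid of the kind a detection framework expects.

```python
import torch
import torch.nn.functional as F


def multi_level(x, num_levels=4):
    """Downsample one high-resolution map to several levels (strides 1, 2, 4, 8)."""
    levels = [x]
    for i in range(1, num_levels):
        # assumption: average pooling with window and stride 2**i
        levels.append(F.avg_pool2d(x, kernel_size=2 ** i, stride=2 ** i))
    return levels


v2_output = torch.randn(1, 256, 64, 64)   # stands in for the HRNetV2 representation
pyramid = multi_level(v2_output)
print([tuple(p.shape) for p in pyramid])
```

All levels keep the same channel count; only the spatial size changes from level to level.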

Object Contextual Representation

References:
• Deep High-Resolution Representation Learning for Visual Recognition
• Deep High-Resolution Representation Learning for Human Pose Estimation
• High-Resolution Representations for Labeling Pixels and Regions
• https://github.com/HRNet
• https://github.com/rbgirshick/yacs
• https://github.com/openseg-group/openseg.pytorch

Comments and corrections are welcome.