The CAM implementation pipeline (PyTorch)


I previously wrote up a simplified version of the visualization process, which did not take the relationships between channels into account. This post walks through the full CAM pipeline.

Contents

      • Flow chart
      • Algorithm overview
      • An example
      • Code walkthrough
        • 1. Import packages and load the class labels
        • 2. Read and preprocess the image
        • 3. Load a pretrained model
        • 4. Capture the feature maps
        • 5. Get the weights
        • 6. Define the CAM computation
        • 7. Generate the images

Flow chart

Algorithm overview

  1. Feed the image to be visualized through the network and determine its predicted class
  2. Grab the output feature map of the last convolutional layer
  3. Use the predicted class to look up that class's weights, weight each channel of the feature map accordingly, and sum the channels into a single-channel map

An example

Suppose an input image passes through the network and is classified as class 500 (out of 1000 classes), and the captured feature map has shape (1, 512, 13, 13). Suppose also that the classification head consists of a 1 x 1 convolution (which counts as part of the classifier here, not as the last convolutional layer) followed by global average pooling. Then the 1000 classes come with 1000 sets of weights, i.e. there are 1000 different ways to weight the feature map. Each set of weights attends to different regions, which is why we need to know the image's class first: once we know it is class 500, we simply take the 500th set of weights and apply it to the feature map.
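To make the arithmetic concrete, here is a minimal NumPy sketch of that weighted channel sum, with random arrays standing in for the real weights and feature map (only the shapes match the example above):

import numpy as np

# Stand-ins for the real tensors: 1000 per-class weight vectors over
# 512 channels, and the (1, 512, 13, 13) output of the last conv layer.
weights = np.random.randn(1000, 512)
feature = np.random.randn(1, 512, 13, 13)

class_idx = 500                 # the predicted class
w = weights[class_idx]          # (512,) weight vector for that class

# Weighted sum over channels: (512,) dot (512, 169) -> (169,) -> (13, 13)
cam = w.dot(feature.reshape(512, 13 * 13)).reshape(13, 13)
print(cam.shape)                # (13, 13)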
CAM has one constraint: it relies on the global average pooling operation. If the network instead ends in multiple fully connected layers, CAM no longer applies. In VGG16, for example, the last convolutional layer is followed by three fully connected layers; since the conv output must be flattened before entering them, after three FC layers the connection back to the individual channels is lost, and the per-channel importance weights can no longer be computed. That case calls for the Grad-CAM algorithm instead.

Code walkthrough

First prepare the image, the labels, and the model.
To download the class labels, first install axel:
sudo apt-get install axel
then run the download command:
axel -n 5 https://s3.amazonaws.com/outcome-blog/imagenet/labels.json
Image download:
axel -n 5 http://media.mlive.com/news_impact/photo/9933031-large.jpg
Model downloads:
squeezenet1_1: axel -n 5 https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth
resnet18: axel -n 5 https://download.pytorch.org/models/resnet18-5c106cde.pth
densenet161: axel -n 5 https://download.pytorch.org/models/densenet161-8d451a50.pth

1. Import packages and load the class labels

from PIL import Image
import torch
from torchvision import models, transforms
from torch.autograd import Variable
from torch.nn import functional as F
import numpy as np
import cv2
import json

# Load the ImageNet class labels
json_path = './cam/labels.json'
with open(json_path, 'r') as load_f:
    load_json = json.load(load_f)
classes = {int(key): value for (key, value)
           in load_json.items()}

2. Read and preprocess the image

# Read an image from one of the ImageNet classes
img_path = './cam/9933031-large.jpg'
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)

# Image preprocessing
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize
])

img_pil = Image.open(img_path)
img_tensor = preprocess(img_pil)
img_variable = Variable(img_tensor.unsqueeze(0))  # add a batch dimension; Variable is a no-op wrapper in PyTorch >= 0.4

3. Load a pretrained model

# Load the pretrained model
model_id = 1
if model_id == 1:
    net = models.squeezenet1_1(pretrained=False)
    pthfile = r'./pretrained/squeezenet1_1-f364aa15.pth'
    net.load_state_dict(torch.load(pthfile))
    finalconv_name = 'features'  # name of the last convolutional module
elif model_id == 2:
    net = models.resnet18(pretrained=False)
    finalconv_name = 'layer4'
elif model_id == 3:
    net = models.densenet161(pretrained=False)
    finalconv_name = 'features'
net.eval()	# evaluation mode: disables dropout, fixes batchnorm statistics
print(net)

I only downloaded squeezenet1_1; to use either of the other two models, adapt the loading code in the same way.
The printed model:

SqueezeNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (3): Fire(
      (squeeze): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (6): Fire(
      (squeeze): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (7): Fire(
      (squeeze): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (8): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (9): Fire(
      (squeeze): Conv2d(256, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (10): Fire(
      (squeeze): Conv2d(384, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (11): Fire(
      (squeeze): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (12): Fire(
      (squeeze): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AdaptiveAvgPool2d(output_size=(1, 1))
  )
)

You can see that feature extraction happens in (features) and the classification head lives in (classifier).

4. Capture the feature maps

features_blobs = []     # will hold the captured feature maps

def hook_feature(module, input, output):
    features_blobs.append(output.data.cpu().numpy())

# capture the output of the 'features' module
net._modules.get(finalconv_name).register_forward_hook(hook_feature)

register_forward_hook lets you grab the output of an intermediate layer; see the PyTorch documentation for details.
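One side note, as a small sketch rather than part of the original script: register_forward_hook returns a handle, so you can detach the hook once you have captured what you need; otherwise every later forward pass keeps appending to features_blobs.

handle = net._modules.get(finalconv_name).register_forward_hook(hook_feature)
# ... run the forward pass and read features_blobs ...
handle.remove()   # stop capturing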

5. Get the weights

# Get the weights
params = list(net.parameters())
print(len(params))		# 52
weight_softmax = np.squeeze(params[-2].data.numpy())	# shape: (1000, 512)

params holds all of the model's parameters; how do we index the ones we need? Look back at the printed model: pooling and dropout layers carry no parameters, so counting the weight and bias tensors of every convolution from top to bottom gives 52 parameter tensors in total. The weights that map the features module to the class scores are the parameters of (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1)) in classifier. Since the global average pooling after it has no parameters, that convolution contributes the last two entries of params: its bias at index -1 and its weight at index -2, which is the one we want.
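If counting positions feels fragile, a by-name lookup works too; a small sketch, assuming torchvision's SqueezeNet layout shown in the printout above (the classifier conv is submodule 'classifier.1'):

# Fetch the same weight by parameter name instead of by position.
named = dict(net.named_parameters())
w = named['classifier.1.weight']                # shape: (1000, 512, 1, 1)
weight_softmax = np.squeeze(w.data.numpy())     # shape: (1000, 512)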

logit = net(img_variable)				# forward pass; this also triggers the hook
print(logit.shape)						# torch.Size([1, 1000])
print(params[-2].data.numpy().shape)	# 1000 sets of weights: (1000, 512, 1, 1)
print(features_blobs[0].shape)			# feature map size: (1, 512, 13, 13)

# There are 1000 class scores; sort them and keep the sorted indices
h_x = F.softmax(logit, dim=1).data.squeeze()	
print(h_x.shape)						# torch.Size([1000])
probs, idx = h_x.sort(0, True)
probs = probs.numpy()					# probabilities, sorted descending
idx = idx.numpy()						# class indices, highest probability first

# Look at the names and probabilities of the top-5 classes
for i in range(0, 5):
    print('{:.3f} -> {}'.format(probs[i], classes[idx[i]]))
'''
0.678 -> mountain bike, all-terrain bike, off-roader
0.088 -> bicycle-built-for-two, tandem bicycle, tandem
0.042 -> unicycle, monocycle
0.038 -> horse cart, horse-cart
0.019 -> lakeside, lakeshore
'''

6. Define the CAM computation

# Define the function that computes the CAM
def returnCAM(feature_conv, weight_softmax, class_idx):
    # upsample the class activation map to 256 x 256
    size_upsample = (256, 256)
    bz, nc, h, w = feature_conv.shape
    output_cam = []
    # Apply the weights to the conv features: weight_softmax has shape (1000, 512)
    # and feature_conv has shape (1, 512, 13, 13).
    # weight_softmax[class_idx] selects a single class's weights, shape (1, 512);
    # feature_conv.reshape((nc, h * w)) flattens the feature map to (512, 169)
    cam = weight_softmax[class_idx].dot(feature_conv.reshape((nc, h * w)))
    print(cam.shape)		# the matrix product weights each channel; output shape (1, 169)
    cam = cam.reshape(h, w) # back to a single 2D map
    # normalize all elements of the map to 0-1
    cam_img = (cam - cam.min()) / (cam.max() - cam.min())  
    # then rescale to 0-255
    cam_img = np.uint8(255 * cam_img)
    output_cam.append(cv2.resize(cam_img, size_upsample))
    return output_cam

7. Generate the images

# Generate the class activation map for the highest-probability class
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[0]])
# blend the class activation map with the original image
img = cv2.imread(img_path)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.7
cv2.imwrite('CAM0.jpg', result)

The role of cv2.applyColorMap was covered in the previous post, so I won't repeat it here.


# Generate the class activation map for the fifth-ranked class
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[4]])
# blend the class activation map with the original image
img = cv2.imread(img_path)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.7
cv2.imwrite('CAM1.jpg', result)


The difference between the two is obvious at a glance.

References:
https://blog.csdn.net/qq_36825778/article/details/104193642
https://blog.csdn.net/u014264373/article/details/85415921

 