In an earlier post I wrote up a simplified version of this visualization; that simplified version did not take the relationships between channels into account. This post walks through the CAM (Class Activation Mapping) pipeline.
Contents
- Flowchart
- Algorithm idea
- An example
- Code walkthrough
- 1. Import the packages and read the class labels
- 2. Read and preprocess the image
- 3. Load the pretrained model
- 4. Capture the feature maps
- 5. Get the weights
- 6. Define the CAM computation function
- 7. Generate the images
Flowchart
Algorithm idea
- Feed the image you want to visualize into the network and predict its class
- Grab the output feature maps of the last convolutional layer
- Using the predicted class, look up that class's weights, weight each channel of the feature maps accordingly, and sum them into a single-channel map (see the formula just below)
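Written as a formula, this is the CAM definition from Zhou et al.'s original paper, where $f_k(x, y)$ is the $k$-th channel of the last conv layer's output and $w_k^c$ is the classifier weight tying channel $k$ to class $c$:

$$M_c(x, y) = \sum_k w_k^c \, f_k(x, y)$$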
An example
Suppose an input image goes through the network and is classified as class 500 (out of 1000 classes), and the captured feature maps have shape (1, 512, 13, 13). Assume the classification head consists of a 1 x 1 convolution (which counts as part of the classifier here, not as the last convolutional layer) followed by global average pooling. The 1000 classes then come with 1000 sets of weights, i.e. 1000 different ways to weight the feature maps. Each set of weights attends to different features, which is why we need to know which class the image belongs to: once we know it is class 500, we simply take the 500th set of weights and apply it to the feature maps.
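To make the shapes concrete before the full walkthrough, here is a minimal numpy sketch with random data standing in for the real weights and feature maps (all names are mine, purely illustrative):

import numpy as np

weights = np.random.randn(1000, 512)        # one 512-dim weight vector per class
features = np.random.randn(1, 512, 13, 13)  # the last conv layer's output

class_id = 500                               # the predicted class
w = weights[class_id]                        # (512,)
cam = w.dot(features.reshape(512, 13 * 13))  # weighted sum over channels -> (169,)
cam = cam.reshape(13, 13)                    # single-channel 13 x 13 activation map
print(cam.shape)                             # (13, 13)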
CAM has one constraint: it relies on the global average pooling operation. If the network ends in multiple fully connected layers, CAM no longer applies. Take VGG16: the last convolutional layer is followed by three fully connected layers, and since the conv output must be flattened before entering them, after three fully connected layers the correspondence between weights and feature-map channels is lost, so there is no way to read off per-channel importances. In that case you need the Grad-CAM algorithm instead.
Code walkthrough
First, prepare the image, the labels, and the model.
To download the class labels:
First install axel:
sudo apt-get install axel
Then run the download command:
axel -n 5 https://s3.amazonaws.com/outcome-blog/imagenet/labels.json
Image download:
axel -n 5 http://media.mlive.com/news_impact/photo/9933031-large.jpg
Model downloads:
squeezenet1_1: axel -n 5 https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth
resnet18:axel -n 5 https://download.pytorch.org/models/resnet18-5c106cde.pth
densenet161: axel -n 5 https://download.pytorch.org/models/densenet161-8d451a50.pth
1. Import the packages and read the class labels
from PIL import Image
import torch
from torchvision import models, transforms
from torch.autograd import Variable
from torch.nn import functional as F
import numpy as np
import cv2
import json
# Read the ImageNet class labels
json_path = './cam/labels.json'
with open(json_path, 'r') as load_f:
    load_json = json.load(load_f)
classes = {int(key): value for (key, value) in load_json.items()}
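As a quick sanity check that the labels loaded correctly (the expected output assumes the standard ImageNet label file linked above; yours may differ):

print(len(classes))  # 1000 ImageNet classes
print(classes[0])    # first class name, e.g. 'tench, Tinca tinca'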
2. Read and preprocess the image
# Read one image from the ImageNet dataset
img_path = './cam/9933031-large.jpg'
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)
# Image preprocessing
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize
])
img_pil = Image.open(img_path)
img_tensor = preprocess(img_pil)
img_variable = Variable(img_tensor.unsqueeze(0))
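A side note: Variable is a legacy API. Since PyTorch 0.4, tensors carry autograd state themselves, so the last line can simply be written as:

img_variable = img_tensor.unsqueeze(0)  # the Variable wrapper is no longer needed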
3. Load the pretrained model
# Load a pretrained model
model_id = 1
if model_id == 1:
    net = models.squeezenet1_1(pretrained=False)
    pthfile = r'./pretrained/squeezenet1_1-f364aa15.pth'
    net.load_state_dict(torch.load(pthfile))
    finalconv_name = 'features'  # name of the last convolutional module
elif model_id == 2:
    net = models.resnet18(pretrained=False)
    finalconv_name = 'layer4'
elif model_id == 3:
    net = models.densenet161(pretrained=False)
    finalconv_name = 'features'
net.eval()  # switch to evaluation mode
print(net)
I only downloaded squeezenet1_1; if you want to use the other two models, adapt the loading code in the same way, as sketched below.
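For instance, a minimal sketch for resnet18, assuming its weights were downloaded into ./pretrained/ just like the SqueezeNet ones (the path is my assumption):

net = models.resnet18(pretrained=False)
net.load_state_dict(torch.load(r'./pretrained/resnet18-5c106cde.pth'))
finalconv_name = 'layer4'  # the last conv block of ResNet-18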
Printing the model gives:
SqueezeNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (3): Fire(
      (squeeze): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (6): Fire(
      (squeeze): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (7): Fire(
      (squeeze): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (8): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (9): Fire(
      (squeeze): Conv2d(256, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (10): Fire(
      (squeeze): Conv2d(384, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (11): Fire(
      (squeeze): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (12): Fire(
      (squeeze): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AdaptiveAvgPool2d(output_size=(1, 1))
  )
)
You can see that feature extraction happens in (features) and the classification head lives in (classifier).
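You can also index into classifier directly to confirm the shape of the 1 x 1 convolution's weight (a quick check, not part of the original walkthrough):

print(net.classifier[1].weight.shape)  # torch.Size([1000, 512, 1, 1])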
4. Capture the feature maps
features_blobs = []  # will hold the captured feature maps
def hook_feature(module, input, output):
    features_blobs.append(output.data.cpu().numpy())
# Capture the output of the 'features' module
net._modules.get(finalconv_name).register_forward_hook(hook_feature)
register_forward_hook lets you capture the output of an intermediate layer; see the PyTorch documentation for details.
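One thing to keep in mind: the hook fires on every forward pass, so features_blobs grows by one entry per call to net(...). This script runs a single forward pass and reads features_blobs[0].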
5. Get the weights
# Get the weights
params = list(net.parameters())
print(len(params))  # 52
weight_softmax = np.squeeze(params[-2].data.numpy())  # shape: (1000, 512)
params holds all of the model's parameter tensors, so how do we index the ones we need? Look back at the printed model: pooling and dropout layers carry no parameters, while each Conv2d contributes two tensors (a weight and a bias). Counting the convolutions (25 in features plus 1 in classifier, i.e. 26 in total) gives the 52 parameter tensors printed above. The weights we want connect the features module to the classifier module, i.e. the parameters of (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1)) inside classifier. The global average pooling after it has no parameters, so the last two entries of params are that convolution's weight and bias, which puts the weight at index -2 (the bias is at -1).
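If you would rather verify the indexing than count layers by hand, named_parameters makes it explicit (the exact names below match torchvision's SqueezeNet; treat them as an assumption for other models):

names = [name for name, _ in net.named_parameters()]
print(names[-2:])  # ['classifier.1.weight', 'classifier.1.bias']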
logit = net(img_variable)  # forward the image through the network
print(logit.shape)  # torch.Size([1, 1000])
print(params[-2].data.numpy().shape)  # 1000 sets of weights: (1000, 512, 1, 1)
print(features_blobs[0].shape)  # feature map size: (1, 512, 13, 13)
# There are 1000 class scores: sort them and keep the sorted indices
h_x = F.softmax(logit, dim=1).data.squeeze()
print(h_x.shape)  # torch.Size([1000])
probs, idx = h_x.sort(0, True)
probs = probs.numpy()  # probabilities, sorted in descending order
idx = idx.numpy()  # class indices; higher probability means earlier position
# Print the names and probabilities of the top-5 classes
for i in range(0, 5):
    print('{:.3f} -> {}'.format(probs[i], classes[idx[i]]))
'''
0.678 -> mountain bike, all-terrain bike, off-roader
0.088 -> bicycle-built-for-two, tandem bicycle, tandem
0.042 -> unicycle, monocycle
0.038 -> horse cart, horse-cart
0.019 -> lakeside, lakeshore
'''
6. Define the CAM computation function
# Define the function that computes the CAM
def returnCAM(feature_conv, weight_softmax, class_idx):
    # Upsample the class activation map to 256 x 256
    size_upsample = (256, 256)
    bz, nc, h, w = feature_conv.shape
    output_cam = []
    # Apply the class weights to the feature maps:
    # weight_softmax has shape (1000, 512)
    # feature_conv has shape (1, 512, 13, 13)
    # weight_softmax[class_idx] selects a single class, so its shape is (1, 512)
    # feature_conv.reshape((nc, h * w)) has shape (512, 169)
    cam = weight_softmax[class_idx].dot(feature_conv.reshape((nc, h * w)))
    print(cam.shape)  # the matrix product weights every channel; result shape (1, 169)
    cam = cam.reshape(h, w)  # a single feature map
    # Normalize all elements of the map to 0-1
    cam_img = (cam - cam.min()) / (cam.max() - cam.min())
    # Then rescale to 0-255
    cam_img = np.uint8(255 * cam_img)
    output_cam.append(cv2.resize(cam_img, size_upsample))
    return output_cam
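As written, returnCAM assumes class_idx holds exactly one index (cam.reshape(h, w) would fail otherwise). If you wanted CAMs for several classes in one call, a small variation of the same function could loop instead (my own sketch, not from the original post):

def returnCAMs(feature_conv, weight_softmax, class_idxs):
    # One 256 x 256 CAM per requested class index
    size_upsample = (256, 256)
    bz, nc, h, w = feature_conv.shape
    flat = feature_conv.reshape((nc, h * w))  # (512, 169)
    output_cam = []
    for ci in class_idxs:
        cam = weight_softmax[ci].dot(flat).reshape(h, w)  # weighted channel sum
        cam = (cam - cam.min()) / (cam.max() - cam.min())  # normalize to 0-1
        output_cam.append(cv2.resize(np.uint8(255 * cam), size_upsample))
    return output_cam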
7. Generate the images
# Generate the class activation map for the most probable class
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[0]])
# Blend the class activation map with the original image
img = cv2.imread(img_path)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.7
cv2.imwrite('CAM0.jpg', result)
The role of cv2.applyColorMap was covered in the previous post, so it is not repeated here.
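Incidentally, the manual blend above can also be written with cv2.addWeighted, which handles the saturating cast back to uint8 for you:

result = cv2.addWeighted(heatmap, 0.3, img, 0.7, 0)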
# Generate the class activation map for the fifth most probable class
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[4]])
# Blend the class activation map with the original image
img = cv2.imread(img_path)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.7
cv2.imwrite('CAM1.jpg', result)
The difference between the two maps is obvious at a glance.
References:
https://blog.csdn.net/qq_36825778/article/details/104193642
https://blog.csdn.net/u014264373/article/details/85415921