SiamRPN代码讲解，推理测试讲解

您所在的位置：网站首页 › 汉宁窗参数 › SiamRPN代码讲解，推理测试讲解

SiamRPN代码讲解，推理测试讲解

2024-06-28 13:59:58| 来源: 网络整理| 查看: 265

siamRPN论文：High Performance Visual Tracking with Siamese Region Proposal Network

gitHub代码：https://github.com/HonglinChu/SiamTrackers/tree/master/SiamRPN/SiamRPN

论文模型架构：在这里插入图片描述

在此文章中将以代码+注释的形式详解推理过程，即test.py中的代码。后续有空将会详解训练过程即train.py的代码。

siamrpn推理的大致过程：

代码阅读大致顺序：

如果读过前面本人写的Siamfc相关文章： SiamFC代码讲解，训练过程讲解 SiamFC代码讲解，推理测试讲解

那么阅读本篇文章会比较轻松。（因为希望你了解本人的书写流程） Siamrpn与Siamfc相比，就是多了一个rpn网络，导致会多一个相关后处理的操作。后处理的相关操作比较复杂，希望本文能讲解清楚。

如果你不了解什么是rpn网络，那么本人推荐先看此篇文章：（博主：懒人元）RPN 解析

代码流程

./bin/my_test.py 首先看到Line27，进入到

SiamRPNTracker类的init函数

./siamrpn/tracker.py中代码详解（注释）过程简述：

超参初始化创建siamfc架构；模型加载；初始transformer初始所有候选区域框初始proposed_box_num个 window（由hanning窗生成，用于最后加权求和） ExperimentOTB类：

回到./bin/my_test.py 看到Line33 通过ExperimentOTB类获取数据集，与siamFC类似： siamFC中相关代码详解（注释）这里不做赘述

experiments/otb.py下run函数

回到./bin/my_test.py 看到Line56 （关键代码）进入到./experiments/otb.py中run函数代码详解（注释）过程简述：

通过for循环遍历dataset 并执行如下操作：

创建输出结果.txt（如果不存在的话）进行推理（最重要）结果保存

当前函数中看到tracker.track（experiments/otb.py—Line55）这是最关键的一步，点进去，进入到./siamrpn/racker.py下的track函数

siamrpn/tracker.py下track函数

ps:需要注意不要进入到siamFC.py下track函数代码详解（注释）过程简述：

获得第一个框（目标框的参数）如果是第一帧，进行推理初始化（init函数）如果是非第一帧，进行正常推理（update函数）结果框的显示 siamrpn/tracker.py下init函数：

代码详解（注释）过程简述：

所有初始box 的坐标转换（left,up,w,h）–>(cx,cy,w,h)初始化裁剪图片的中心点坐标 (self.pos)初始化大小（后续会随着过程改变），原始图片大小(self.target_sz;;self.origin_target_sz)初始化exemplar_img初始化score_filters_z, reg_filters_z(self.model.track_init函数)并固定，用于后续互相关

如果非第一帧，跳转进入到update函数（siamfc.py --Line301）：

siamrpn/tracker.py下update函数：：

代码详解（注释）过程简述：

获得detection_img经过模型，得到pred_score, pred_regression(dx,dy,dw,dh)后处理 3.1 分类后处理 3.2 框偏移后处理获取惩罚因子并更新前景概率获得最优前景概率下的id(index)再次修正框值，得到最终值修改下一帧的中心偏差和图片大小偏差返回Box值，用于后续可显示化

————————————————————————————————————

代码解析： TrackerSiamRPN类init函数 ./siamrpn/tracker.py----Line18 class SiamRPNTracker: def __init__(self, model_path,cfg=None,is_deterministic=False): self.name='SiamRPN' if cfg: config.update(cfg) self.model = SiamRPNNet() self.is_deterministic = is_deterministic checkpoint = torch.load(model_path) if 'model' in checkpoint.keys(): self.model.load_state_dict(torch.load(model_path)['model'])# else: self.model.load_state_dict(torch.load(model_path)) self.model = self.model.cuda() self.model.eval() self.transforms = transforms.Compose([ToTensor()]) self.anchors = generate_anchors(config.total_stride, ##8 config.anchor_base_size, ##8 config.anchor_scales, ##np.array([8,]) config.anchor_ratios, ##np.array([0.33, 0.5, 1, 2, 3]) config.valid_scope) # ##19 self.window = np.tile(np.outer(np.hanning(config.score_size), np.hanning(config.score_size))[None, :], [config.anchor_num, 1, 1]).flatten() ##生成5个window, 由汉宁窗生成,汉宁窗是使用加权余弦形成的。 ##self.window的值：越靠近中心，值越大 ##用于最后加权求和 """ np.outer() 的doc说明 Compute the outer product of two vectors. Given two vectors, a = [a0, a1, ..., aM] and b = [b0, b1, ..., bN], the outer product [1] is: [[a0*b0 a0*b1 ... a0*bN ] [a1*b0 . [ ... . [aM*b0 aM*bN ]] """

对应代码补充： config.update（参数初始化） SiamRPNNet类（详细模型搭建代码） generate_anchor函数

run函数 otb.py下run函数（otb.py---Line38） def run(self, tracker, visualize=False): print('Running tracker %s on %s...' % ( tracker.name, type(self.dataset).__name__)) # loop over the complete dataset for s, (img_files, anno) in enumerate(self.dataset): ## img_files是list ，保存的是dataset中某一子文件夹中的所有图片路径 ##anno 是list ，保存的是dataset中某一子文件夹中的groundtruth路径 seq_name = self.dataset.seq_names[s] ##取第s批数据 print('--Sequence %d/%d: %s' % (s + 1, len(self.dataset), seq_name)) # skip if results exist record_file = os.path.join( self.result_dir, tracker.name, '%s.txt' % seq_name) if os.path.exists(record_file): print(' Found results, skipping', seq_name) continue # tracking loop boxes, times = tracker.track( ##最重要部分 img_files, anno[0, :], visualize=visualize) assert len(boxes) == len(anno) """ img_files: 保存的是一个文件夹下所有图片的路径 anno[0, :]: 第一张图片的annotation值；目标框的annnotation值（因为siamfc始终实以第一帧图片作为目标框） visualize: 结果的可视化 """ # record results self._record(record_file, boxes, times) track函数 siamrpn/tarcker.py下track函数（.siamrpn/tracker.py---Line245） def track(self, img_files, box, visualize=False): ##box: 第一帧，在siamfc中是要跟踪的物体，且后续不会发生变化 frame_num = len(img_files) ##总帧数 boxes = np.zeros((frame_num, 4)) ##准备预测所有框的参数 boxes[0] = box ##获得第一个框（目标框的参数） times = np.zeros(frame_num) ## 时间，用于后续计算fps for f, img_file in enumerate(img_files): img = ops.read_image(img_file) ##img_file 这里传入的是一个img路径 begin = time.time() if f == 0: ##第一帧 self.init(img, box) ##初始化了很多参数，并固定feature_z 并作为后续的卷积核 else:##不过不是第一帧 boxes[f, :] = self.update(img) ##重要函数，实际推理过程 times[f] = time.time() - begin##耗时 if visualize: ops.show_image(img, boxes[f, :]) return boxes, times init函数（第一帧） siamrpn/tarcker.py下init函数（.siamrpn/tracker.py---Line72） def init(self, frame, bbox): """ initialize siamrpn tracker Args: frame: an RGB image bbox: one-based bounding box [left_x, up_y, width, height] """ self.bbox = np.array([bbox[0]-1 + (bbox[2]-1) / 2 , bbox[1]-1 + (bbox[3]-1) / 2 , bbox[2], bbox[3]]) #cx,cy,w,h self.pos = np.array([bbox[0]-1 + (bbox[2]-1) / 2 , bbox[1]-1 + (bbox[3]-1) / 2]) # center x, center y, zero based ## center_pos ##self.pos=self.bbox[:2,...] self.target_sz = np.array([bbox[2], bbox[3]]) # width, height ## ps,后续会更新 self.origin_target_sz = np.array([bbox[2], bbox[3]])#原始w,h 后续不会更新 # get exemplar img self.img_mean = np.mean(frame, axis=(0, 1)) ###边界填充像素值 ## exemplar_img, scale_z, _ = get_exemplar_image(frame, self.bbox,config.exemplar_size, config.context_amount, self.img_mean) ## scale_z ==127 / crop_size 未用到 # get exemplar feature exemplar_img = self.transforms(exemplar_img)[None, :, :, :]#在测试阶段，转换成tensor类型就可以了 self.model.track_init(exemplar_img.cuda()) ##初始score_filters_z, reg_filters_z

对应代码补充： self.model.track_init()

update函数（非第一帧） siamrpn/tarcker.py下update函数（.siamrpn/tracker.py---Line97） def update(self, frame): """track object based on the previous frame Args: frame: an RGB image Returns: bbox: tuple of 1-based bounding box(xmin, ymin, xmax, ymax) """ instance_img_np, _, _, scale_x = get_instance_image(frame, self.bbox, config.exemplar_size, config.instance_size, config.context_amount, self.img_mean) ##scale_x == 271/detection_frame_crop_size 值靠近1 instance_img = self.transforms(instance_img_np)[None, :, :, :] ##model_inference_ pred_score, pred_regression = self.model.track(instance_img.cuda())##互操作；；后续会有补充代码说明 # pred_score=1,2x5,19,19 ; pre_regression=1,4x5,19,19 ## 前后背景概率；；；；；；；预测框的值：偏移中心点坐标和预测wh ######后处理 ### 1. 分类后处理 pred_conf = pred_score.reshape(-1, 2, config.anchor_num * config.score_size * config.score_size).permute(0,2,1) ##1,2*5,19,19---->1,2,5*19*19--->1，5*19*19,2 # 前后背景所有框个数： 5*19*19;;;分类问题 score_pred = F.softmax(pred_conf, dim=2)[0, :, 1].cpu().detach().numpy() # 计算预测分类得分 ##score_pred.shape==(1, 2*5*19*19,1) ↑ 只取"物体"的概率，这里，默认第一个值是”背景"的概率，第二个值是”物体“的概率 ### 2.框偏移后处理 pred_offset = pred_regression.reshape(-1, 4,config.anchor_num * config.score_size * config.score_size).permute(0,2,1) ##1,4*5,19,19---->1,4,5*19*19--->1，5*19*19,4 # 回归问题，，true_box==proposed_box+offset proposed_box是初始固定的，这里回归问题是offset，也就是模型要学习的东西 delta = pred_offset[0].cpu().detach().numpy() #使用detach()函数来切断一些分支的反向传播;返回一个新的Variable，从当前计算图中分离下来的，但是仍指向原变量的存放位置,不同之处只是requires_grad为false，得到的这个Variable永远不需要计算其梯度，不具有grad。 #即使之后重新将它的requires_grad置为true,它也不会具有梯度grad #这样我们就会继续使用这个新的Variable进行计算，后面当我们进行反向传播时，到该调用detach()的Variable就会停止，不能再继续向前进行传播 box_pred = box_transform_inv(self.anchors, delta) #通过 anchors 和 offset 来获得预测的box ；；；后续会有补充代码说明 ###self.anchors: 初始的anchors == x,y,w,h 其中x,y 是相对于图中心的坐标偏移（经过dx,dy计算后）；；w,h为候选预测框的大小(经过dw,dh计算后) ###e.g. self.anchors[0]==-72,-72 ,104,32 ##因此，这里计算的box_pred x,y也是相对于图中心的坐标偏移 """ 下面如不好理解可以跳过所以true_box_x = F( center_x + box_pred[0] ) ##其中F 代表某种变换，可能是惩罚因子 ;;其中center_x ==上一时刻的box[0] true_box_y = F( center_y + box_pred[1] ) true_box_w = G( w + box_pred[2] ) ##其中G 代表某种变换,可能与F相同也可能不同;;功能可能是加权求和 true_box_h = G( y + box_pred[3] ) """ def change(r): return np.maximum(r, 1. / r) #np.maximum(x,y): x 和 y 逐位进行比较选择最大值 def sz(w, h): pad = (w + h) * 0.5 ##（很像模板图像扩充)） sz2 = (w + pad) * (h + pad) return np.sqrt(sz2) def sz_wh(wh): pad = (wh[0] + wh[1]) * 0.5 sz2 = (wh[0] + pad) * (wh[1] + pad)## (模板图像扩充) return np.sqrt(sz2) ####获得惩罚因子，对应论文中 4.3 公式(13) s_c = change(sz(box_pred[:, 2], box_pred[:, 3]) / (sz_wh(self.target_sz * scale_x))) # scale penalty """ sz_result=sz(box_pred[:, 2], box_pred[:, 3]) sz_wh_result=sz_wh(self.target_sz * scale_x) ##target_sz.shape==(1,2) scale_x近似 127/257 change(sz_result/sz_wh_result) 只有当box_pred[:,2:]===self.target_sz的时候为1 ，其他情况下肯定大于1 """ r_c = change((self.target_sz[0] / self.target_sz[1]) / (box_pred[:, 2] / box_pred[:, 3])) # ratio penalty ##获取宽高比率的最大值 penalty = np.exp(-(r_c * s_c - 1.) * config.penalty_k)#尺度惩罚和比例惩罚获得最终惩罚因子 ###penalty_k =0.22 ####获得惩罚因子，对应论文中 4.3 公式(13) pscore = penalty * score_pred#对每一个anchors的分类预测×惩罚因子 pscore = pscore * (1 - config.window_influence) + self.window * config.window_influence #再乘以余弦窗，加重图片中间是物体的概率，削减图片边缘是物体的概率 """ pscore：两个权重求和 self.window汉宁窗大小19*19，越靠近中心点值越大 window_influence = 0.40 增加中心点附近的的像素值为物体的概率 """ ##选择最优框 best_pscore_id = np.argmax(pscore) #得到最大的得分对应的下标 target = box_pred[best_pscore_id, :] / scale_x #最优物体概率对应id的box进行scale_x（==271/detection_frame_crop_size）缩放 lr = penalty[best_pscore_id] * score_pred[best_pscore_id] * config.lr_box#预测框的学习率 ##CONFIG.LR_BOX==0.30 ???推理预测的时候为什么还要涉及到lr???（只影响后续更新 box_w,box_h） ###原因是------用于后续加权修正W,H ####如下步骤是获得真正的box (true_box) ##首先要知道,这里的target的值===x,y,w,h 其中x,y 是相对于图中心的坐标偏移（经过dx,dy计算后）；；w,h为候选预测框的大小(经过dw,dh计算后) res_x = np.clip(target[0] + self.pos[0], 0, frame.shape[1])#w=frame.shape[1] ##等价于np.clip(x+offset_x , 0 , W) 防止cx 越界 ##np.clip(a,a_min,a_max)函数该函数的作用是将数组a中的所有数限定到范围a_min和a_max中。 """ res: result 最终结果 target== cx,cy,w,h self.pos== center_x,center_y """ res_y = np.clip(target[1] + self.pos[1], 0, frame.shape[0])#h=frame.shape[0] ###宽高加权修正， lr_box的作用 res_w = np.clip(self.target_sz[0] * (1 - lr) + target[2] * lr, config.min_scale * self.origin_target_sz[0], config.max_scale * self.origin_target_sz[0]) """ config.min_scale ==0.1 config.max_scale==10 target_sz初始值为 origin_target_sz,后续会进行更新一般情况下： res_w= self.target_sz[0] * (1 - lr) + target[2] * lr ；；权重求和 """ res_h = np.clip(self.target_sz[1] * (1 - lr) + target[3] * lr, config.min_scale * self.origin_target_sz[1], config.max_scale * self.origin_target_sz[1]) ###更新信息 self.pos = np.array([res_x, res_y]) #更新之后的坐标 self.target_sz = np.array([res_w, res_h]) #更新框大小 bbox = np.array([res_x, res_y, res_w, res_h])##得到最终预测的box值 self.bbox = ( #cx, cy, w, h np.clip(bbox[0], 0, frame.shape[1]).astype(np.float64), ##np.clip 在这里是防止越界 np.clip(bbox[1], 0, frame.shape[0]).astype(np.float64), np.clip(bbox[2], 10, frame.shape[1]).astype(np.float64), np.clip(bbox[3], 10, frame.shape[0]).astype(np.float64)) #这个用来画图使用 bbox=np.array([# left,up, W, H self.pos[0] + 1 - (self.target_sz[0]-1) / 2, self.pos[1] + 1 - (self.target_sz[1]-1) / 2, self.target_sz[0], self.target_sz[1]]) #return self.bbox, score_pred[best_pscore_id] """ 可以看到没有修改 feature_z 和 score_filters_z, reg_filters_z 和siamfc一致，只选取第一帧作为模板图(exemplar_img) """ return bbox

论文中，惩罚因子的计算公式：在这里插入图片描述

对应代码补充： self.model.track box_transform_inv函数

————————————————————————————————————

代码补充说明 Config类功能：超参数的定义 class Config: # dataset related exemplar_size = 127 # exemplar frame_size instance_size = 271 #默认271 # detection frame_size；；与论文不同，论文大小是255 context_amount = 0.5 # context amount 图像模板扩充大小 sample_type = 'uniform' #-------- train_epoch_size = 1000 # val_epoch_size = 100 # out_feature = 19 #对应论文，互相关后的尺寸大小；；与论文不同，这里是19，因为detection_frame_size的不同 #max_inter = 80# eps = 0.01 # #-------- # training related #... # training related total_stride = 8 ## training related valid_scope = int((instance_size - exemplar_size) / total_stride + 1) ## (271-127)/8+1 ==18+1==19 等于最后互相关后的尺寸大小 anchor_scales = np.array([8,]) ##Backbone会进行8倍下采样 anchor_ratios = np.array([0.33, 0.5, 1, 2, 3]) ##5个anchor的尺度大小 anchor_num = len(anchor_scales) * len(anchor_ratios)##==5 论文中对应K的数字 anchor_base_size = 8 ##Backbone会进行8倍下采样 pos_threshold = 0.6 neg_threshold = 0.3 num_pos = 16 num_neg = 48 lamb = 5 # 原始 cls:res = 1:5 save_interval = 1 show_interval = 100#100 show_topK = 3 pretrained_model = './models/alexnet.pth' # tracking related gray_ratio = 0.25 blur_ratio = 0.15 score_size = int((instance_size - exemplar_size) / total_stride + 1) ##### (271-127)/8+1 ==18+1==19 等于最后互相关后的尺寸大小 penalty_k =0.22 #0.22 #0.22 #0.16 window_influence = 0.40#0.40 #0.40 #0.40 会创建汉宁窗相关窗口，进行后续的加权求和 lr_box =0.30 #0.30 #0.30 min_scale = 0.1 ##预测框的最小情况比率 max_scale = 10 ##预测框的最大情况比率 def update(self, cfg): ##更新超参数 for k, v in cfg.items(): setattr(self, k, v) self.score_size = (self.instance_size - self.exemplar_size) //self.total_stride + 1 # #self.valid_scope = int((self.instance_size - self.exemplar_size) / self.total_stride / 2)#anchor的范围 self.valid_scope= self.score_size ## (271-127)/8+1 ==18+1==19 config = Config()

补充说明： Config类的超参数不用刻意去记住，等看到代码中存在config.XXX的时候，再去查看。这能进一步去理解代码的含义。

SiamRPNNet类（详细模型搭建代码） init函数+update函数 class SiamRPNNet(nn.Module): def __init__(self, ): super(SiamRPNNet, self).__init__() ##exemplar尺寸变化情况 detection尺寸变化情况 ## conv_O=(I-K+2P)/S+1 maxpool_O=(I-k)/s+1 exemplar_size = 127 instance_size = 271 self.featureExtract = nn.Sequential( ## 127 271 nn.Conv2d(3, 96, 11, stride=2), # stride=2 (127-11)/2+1==59 (271-11)/2+1==131 nn.BatchNorm2d(96), # nn.MaxPool2d(3, stride=2), # stride=2 默认valid模式， ##(59-3)/2+1==29 （131-3）/2 +1==65 nn.ReLU(inplace=True), nn.Conv2d(96, 256, 5), ##25 61 nn.BatchNorm2d(256), nn.MaxPool2d(3, stride=2), # stride=2 ##12 ##30 nn.ReLU(inplace=True), nn.Conv2d(256, 384, 3), ##10 ##28 nn.BatchNorm2d(384), # nn.ReLU(inplace=True), nn.Conv2d(384, 384, 3), ##8 ##26 nn.BatchNorm2d(384), nn.ReLU(inplace=True), nn.Conv2d(384, 256, 3), ##6 ## 24 nn.BatchNorm2d(256), # ) self.anchor_num = config.anchor_num #每一个位置有5个anchor self.input_size = config.instance_size ##config.instance_size==271 self.score_displacement = int((self.input_size - config.exemplar_size) / config.total_stride) ##(271-127)/8 ==18 self.conv_cls1 = nn.Conv2d(256, 256 * 2 * self.anchor_num, kernel_size=3, stride=1, padding=0) self.conv_r1 = nn.Conv2d(256, 256 * 4 * self.anchor_num, kernel_size=3, stride=1, padding=0) self.conv_cls2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=0) self.conv_r2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=0) self.regress_adjust = nn.Conv2d(4 * self.anchor_num, 4 * self.anchor_num, 1)# ## 1*卷积用于线性回归后的一个1*1conv操作 def forward(self, template, detection): N = template.size(0) # N=32 batch==32 template_feature = self.featureExtract(template)#[N,256,6,6] ###exemplar_frame--->exemplar_feature detection_feature = self.featureExtract(detection)#[N,256,24,24] ##detection_frame--->detection_feature ##self.anchor_num==5 kernel_score = self.conv_cls1(template_feature).view(N, 2 * self.anchor_num, 256, 4, 4) #N,2*5,256,4,4 kernel_regression = self.conv_r1(template_feature).view(N, 4 * self.anchor_num, 256, 4, 4) #N,4*5,256,4,4 conv_score = self.conv_cls2(detection_feature) #N,256,22,22#对齐操作 conv_regression = self.conv_r2(detection_feature)#N,256,22,22 ##组卷积类别分支互相关操作 conv_scores = conv_score.reshape(1, -1, self.score_displacement + 4, self.score_displacement + 4)#1,Nx256,22,22 score_filters = kernel_score.reshape(-1, 256, 4, 4) # Nx10,256,4,4 pred_score = F.conv2d(conv_scores, score_filters, groups=N).reshape(N, 10, self.score_displacement + 1,self.score_displacement + 1)##groups=N result.shape==1,32*10,19,19 ##组卷积线性回归分支互相关操作 conv_reg = conv_regression.reshape(1, -1, self.score_displacement + 4, self.score_displacement + 4)##N,256,22,22-->1,N*256,22,22 reg_filters = kernel_regression.reshape(-1, 256, 4, 4)##N,4*5,256,4,4--->N*20,256,4,4 pred_regression = self.regress_adjust(F.conv2d(conv_reg, reg_filters, groups=N).reshape(N, 20, self.score_displacement + 1,self.score_displacement + 1)) ### result.shape==1,32*20,19,19 return pred_score, pred_regression

##补充说明：

卷积输出conv_O=(I-K+2P)/S+1池化输出maxpool_O==I-K+2P)/S+1 （PS：池化一般P=0） self.conv_cls1，self.conv_r1;;self.conv_cls2，self.conv_r2; 对应论文中如下： self.conv_cls1 = nn.Conv2d(256, 256 * 2 * self.anchor_num, kernel_size=3, stride=1, padding=0) self.conv_r1 = nn.Conv2d(256, 256 * 4 * self.anchor_num, kernel_size=3, stride=1, padding=0) self.conv_cls2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=0) self.conv_r2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=0)

在这里插入图片描述

需要重点关注 exemplar_frame和detection_frame经过backbone的尺度变化。需要注意，论文中detection_frame_size初始值为255，而代码中为271。这就导致了论文中模型架构图的一些尺寸发生变化，紫框对应的是255尺寸大小后的结果，与271尺寸大小后的结果相差[（271-255）/8==]2。固互相关后的尺寸大小为19而不是17。 组卷积。口诀：被卷积的（w,h比较大的）的通道为B×C，卷积核(小的wh)的通道数为C。group=N, 组卷积后结果为：1, N×10,19,19。列如本文中的分类分支： detecition_cls_feature.shape=N,256,22,22 ，则需要reshape为 -1, N*C,22,22 (C=256) [被卷积的（w,h比较大的）的通道为B*C] exemplar_cls_feature.shape=N,2*5,256,4,4，则需要reshape为 -1, C,4,4 (C=256) [卷积核(小的wh)的通道数为C] generate_anchor函数 ./utils.py---Line33 功能：初始创建19*19尺寸下的所有anchor def generate_anchors(total_stride, base_size, scales, ratios, score_size): """ ##total_stride=8 ##base_size=8 ##scales=numpy.array([8.]) ##rations=numpy.array([0.33,0.5,1,2,3]) ##score_size=19 """ anchor_num = len(ratios) * len(scales) ##默认为5 anchor = np.zeros((anchor_num, 4), dtype=np.float32) ## size = base_size * base_size ##64， 19*19下一个像素点相对于原图的大小；也即标准锚框大小 count = 0 for ratio in ratios: # ws = int(np.sqrt(size * 1.0 / ratio)) ws = int(np.sqrt(size / ratio)) ##根据ration 生成不同的毛框大小 hs = int(ws * ratio) for scale in scales: wws = ws * scale hhs = hs * scale anchor[count, 0] = 0 anchor[count, 1] = 0 anchor[count, 2] = wws ##初始w anchor[count, 3] = hhs ##初始h count += 1 anchor= np.tile(anchor, score_size * score_size)# [5,4]----->[5,4 *score_size*score_size]==[5,4*19*19] #tile铺平复制 anchor = anchor.reshape((-1, 4))#[5,4*19*19]--->[5*19*19,4] # ##总共有 5*19*19个框 ori = - (score_size // 2) * total_stride ##-19//2 *8 ===-9*8=-72 # the left displacement xx, yy = np.meshgrid([ori + total_stride * dx for dx in range(score_size)], [ori + total_stride * dy for dy in range(score_size)]) ##获得原始尺寸大小下相对于中心点的 x,y坐标 xx, yy = np.tile(xx.flatten(), (anchor_num, 1)).flatten(), np.tile(yy.flatten(), (anchor_num, 1)).flatten() ## #tile铺平复制 """ xx[:20] >>> [-72 -64 -56 -48 -40 -32 -24 -16 -8 0 8 16 24 32 40 48 56 64 72 -72] yy[:20] >>> [-72 -72 -72 -72 -72 -72 -72 -72 -72 -72 -72 -72 -72 -72 -72 -72 -72 -72 -72 -64] """ anchor[:, 0], anchor[:, 1] = xx.astype(np.float32), yy.astype(np.float32) return anchor

##补充说明：

生成的anchor，数据形如： anchor[0]== -72，-72，104，104。一定要注意！一定要注意！一定要注意！这里的前面两个值代表的是：原始尺寸大小下相对于中心点的 x,y坐标例如这里的anchor[0]，应该是第一个像素点的第一个anchor框，即图片左上角的anchor框。 -72,-72则是相对于图像中间的偏移。这里一定要好好理解，后面后处理的时候会用到。

涉及到np.tile函数和np.meshgrid函数，读者可以上网自行查阅，本文只做小demo演示 np.tile函数：功能，铺平复制

demo: import numpy as np a=np.arange(0,20) a=a.reshape(5,4) print(a) print(a.shape) a=np.tile(a,17*17) print(a[0,:20]) print(a[1,:20]) print(a.shape) ---------------------- >>> [[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11] [12 13 14 15] [16 17 18 19]] >>> (5, 4) >>> [0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3] >>> [4 5 6 7 4 5 6 7 4 5 6 7 4 5 6 7 4 5 6 7] >>> (5, 1156)

np.meshgrid函数：功能：生成网格数据

demo: import numpy as np x = np.linspace(1, 4, 4) y = np.linspace(6, 7, 2) X, Y = np.meshgrid(x, y) print(x) print(y) print(X) print(Y) ---------------------- >>> [1. 2. 3. 4.] >>> [6. 7.] >>> [[1. 2. 3. 4.] [1. 2. 3. 4.]] >>> [[6. 6. 6. 6.] [7. 7. 7. 7.]] track_init函数 ./siamrpn/network.py --Line79 (跟踪推理第一帧的时候会调用到) 功能：初始化 score_filters_z和reg_filters_z def track_init(self, template): ##初始 score_filters_z, reg_filters_z N = template.size(0) ##N==1 template_feature = self.featureExtract(template)# 输出 [1, 256, 6, 6] # kernel_score=1,2x5,256,4,4 kernel_regression=1,4x5, 256,4,4 kernel_score = self.conv_cls1(template_feature).view(N, 2 * self.anchor_num, 256, 4, 4) ##后续分类回归的2k个box kernel_regression = self.conv_r1(template_feature).view(N, 4 * self.anchor_num, 256, 4, 4)###regression self.score_filters = kernel_score.reshape(-1, 256, 4, 4) # 2x5, 256, 4, 4 self.reg_filters = kernel_regression.reshape(-1, 256, 4, 4)# 4x5, 256, 4, 4 ./network.py下track函数功能：前面步骤通过track_init得到了self.score_filters和self.reg_filters。这里通过传入detection_frame，经过Backbone，可以得到conv_scores,conv_reg，进而得到互相关后的两个结果。对应论文公式:

在这里插入图片描述

def track(self, detection): N = detection.size(0) ##batch推理的时候，默认N==1 detection_feature = self.featureExtract(detection)##推理：feature_x.shape 1，256，24，24 conv_score = self.conv_cls2(detection_feature) ##不清楚可以看network.py下的__init__函数 1，256，22，22 conv_regression = self.conv_r2(detection_feature)##不清楚可以看network.py下的__init__函数 1，256，22，22 conv_scores = conv_score.reshape(1, -1, self.score_displacement + 4, self.score_displacement + 4) #self.score_displacement==18, conv_score的shape变化： 1，256，22，22---> 1，256，22，22 conv_reg = conv_regression.reshape(1, -1, self.score_displacement + 4, self.score_displacement + 4) pred_score = F.conv2d(conv_scores, self.score_filters, groups=N).reshape(N, 10, self.score_displacement + 1,self.score_displacement + 1)##组卷积 pred_regression = self.regress_adjust(F.conv2d(conv_reg, self.reg_filters, groups=N).reshape(N, 20, self.score_displacement + 1,self.score_displacement + 1)) return pred_score, pred_regression ## shape : 1,10,19,19 ||| 1,20,19,19 box_transform_inv函数

功能：通过给定anchor框和offset(dx,dy,dw,dh)计算预测框的值对应论文公式：

在这里插入图片描述

def box_transform_inv(anchors, offset): anchor_xctr = anchors[:, :1] #cx anchor_yctr = anchors[:, 1:2]#cy anchor_w = anchors[:, 2:3] #w anchor_h = anchors[:, 3:] #h offset_x, offset_y, offset_w, offset_h = offset[:, :1], offset[:, 1:2], offset[:, 2:3], offset[:, 3:], box_cx = anchor_xctr+ anchor_w * offset_x box_cy = anchor_yctr+ anchor_h * offset_y box_w = anchor_w * np.exp(offset_w) box_h = anchor_h * np.exp(offset_h) box = np.hstack([box_cx, box_cy, box_w, box_h]) #水平方向堆叠 np.vstack竖直方向堆叠 return box

这里只讲解box_cx,box_cy的计算公式，box_w和box_h类似。首先看论文中的相关公式：在这里插入图片描述

如下图，我们知道了cx1,cy1,w1,h1(蓝色框)，这是anchor_box初始创建的。后来通过模型可以得到reg_delta。如何计算得到cx2,cy2呢？

在这里插入图片描述

首先，可以知道 cx1+dx=cx2 ;;cy1+dy=cy2; 而 dx=reg_delta[0]*w1;;dy=reg_delta[2]*h1 所以 cx2=cx1+reg_delta[0]*w1;;cy2=cy1+reg_delta[1]*h1

注意！！这里得到的box_cx,box_cy,box_w,box_h，其中box_cx,box_cy还是相对于中心图片的偏移值，和anchors的前两个值的意义相同。

回顾 siamrpn推理的大致过程：

欢迎指正

因为本文主要是本人用来做的笔记，顺便进行知识巩固。如果本文对你有所帮助，那么本博客的目的就已经超额完成了。本人英语水平、阅读论文能力、读写代码能力较为有限。有错误，恳请大佬指正，感谢。

欢迎交流邮箱：[email protected]

【本文地址】

公司简介

联系我们

今日新闻

点击排行

实验室常用的仪器、试剂和: 说到实验室常用到的东西，主要就分为仪器、试剂和耗

不用再找了，全球10大实验: 01、赛默飞世尔科技（热电）Thermo Fisher Scientif

三代水柜的量产巅峰T-72坦: 作者：寞寒最近，西边闹腾挺大，本来小寞以为忙完这

通风柜跟实验室通风系统有: 说到通风柜跟实验室通风，不少人都纠结二者到底是不

集消毒杀菌、烘干收纳为一: 厨房是家里细菌较多的地方，潮湿的环境、没有完全密

实验室设备之全钢实验台如: 全钢实验台是实验室家具中较为重要的家具之一，很多

图片新闻

实验室药品柜的特性有哪些: 实验室药品柜是实验室家具的重要组成部分之一，主要

小学科学实验中有哪些教学: 计算机计算器一般打孔器打气筒仪器车显微镜

实验室各种仪器原理动图讲: 1.紫外分光光谱UV分析原理：吸收紫外光能量，引起分

高中化学常见仪器及实验装: 1、可加热仪器：2、计量仪器：（1）仪器A的名称：量

微生物操作主要设备和器具: 今天盘点一下微生物操作主要设备和器具，别嫌我啰嗦

浅谈通风柜使用基本常识: 　众所周知，通风柜功能中最主要的就是排气功能。在

SiamRPN代码讲解，推理测试讲解

SiamRPN代码讲解，推理测试讲解

今日新闻

点击排行

推荐新闻

图片新闻

专题文章