PyTorch Tutorial 14.4: Anchor Boxes

2098244 2023-06-05 | PDF | 0.40 MB | Free

Description

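Object detection algorithms usually sample a large number of regions in the input image, determine whether these regions contain objects of interest, and adjust the region boundaries so as to predict the ground-truth bounding boxes of the objects more accurately. Different models may adopt different region sampling schemes. Here we introduce one such method: it generates multiple bounding boxes with varying scales and aspect ratios centered on each pixel. These bounding boxes are called anchor boxes. We will design an object detection model based on anchor boxes in Section 14.7.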

First, let us reduce the printing precision to get more concise output.

# PyTorch version
%matplotlib inline
import torch
from d2l import torch as d2l

torch.set_printoptions(2)  # Print values with 2 digits of precision

# MXNet version
%matplotlib inline
from mxnet import gluon, image, np, npx
from d2l import mxnet as d2l

np.set_printoptions(2)  # Print values with 2 digits of precision
npx.set_np()

						 

14.4.1. Generating Multiple Anchor Boxes

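Suppose that the input image has height $h$ and width $w$. We generate anchor boxes with different shapes centered on each pixel of the image. Let the scale be $s \in (0, 1]$ and the aspect ratio (ratio of width to height) be $r > 0$. Then the width and height of the anchor box are $hs\sqrt{r}$ and $hs/\sqrt{r}$, respectively. Note that once the center position is given, an anchor box with known width and height is fully determined.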
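As a quick check of the width and height formula above, the following minimal sketch evaluates it for one box. It takes the formula as $hs\sqrt{r}$ and $hs/\sqrt{r}$, which matches what `multibox_prior` computes; the helper name `anchor_shape` and the sample values are illustrative assumptions, not part of the original tutorial.

```python
import math


def anchor_shape(h, s, r):
    """Return (width, height) in pixels of one anchor box.

    h: image height in pixels, s: scale in (0, 1], r: width/height ratio > 0.
    Hypothetical helper, written only to illustrate the formula.
    """
    if not 0 < s <= 1:
        raise ValueError("scale s must lie in (0, 1]")
    if r <= 0:
        raise ValueError("aspect ratio r must be positive")
    root = math.sqrt(r)
    return h * s * root, h * s / root


width, height = anchor_shape(h=100, s=0.5, r=4.0)
print(width, height)  # 100.0 25.0
```

The ratio `width / height` equals `r`, and the area equals `(h * s) ** 2` regardless of `r`, so the scale controls the box area while the aspect ratio only changes its shape.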

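To generate multiple anchor boxes with different shapes, let us set a series of scales $s_1, \ldots, s_n$ and a series of aspect ratios $r_1, \ldots, r_m$. When all combinations of these scales and aspect ratios are used with each pixel as the center, the input image will have a total of $whnm$ anchor boxes. Although these anchor boxes may cover all the ground-truth bounding boxes, the computational complexity easily becomes too high. In practice, we consider only those combinations that contain $s_1$ or $r_1$: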

$$(s_1, r_1), (s_1, r_2), \ldots, (s_1, r_m), (s_2, r_1), (s_3, r_1), \ldots, (s_n, r_1). \tag{14.4.1}$$

That is, the number of anchor boxes centered on the same pixel is $n + m - 1$. For the entire input image, we generate a total of $wh(n + m - 1)$ anchor boxes.
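The counting argument can be sketched in a few lines of plain Python. The helper `anchor_combinations` and the sample sizes, ratios, and image dimensions below are hypothetical values chosen for illustration; the pairs are listed in the order of (14.4.1).

```python
def anchor_combinations(sizes, ratios):
    """List the (s, r) pairs of (14.4.1): s1 with every ratio, then s2..sn with r1."""
    first_size = [(sizes[0], r) for r in ratios]
    first_ratio = [(s, ratios[0]) for s in sizes[1:]]
    return first_size + first_ratio


sizes = [0.75, 0.5, 0.25]   # n = 3 scales (illustrative)
ratios = [1, 2, 0.5]        # m = 3 aspect ratios (illustrative)
combos = anchor_combinations(sizes, ratios)
per_pixel = len(combos)     # n + m - 1

w, h = 4, 3                 # a tiny hypothetical image
total_reduced = w * h * per_pixel               # w * h * (n + m - 1)
total_full = w * h * len(sizes) * len(ratios)   # w * h * n * m

print(combos)      # [(0.75, 1), (0.75, 2), (0.75, 0.5), (0.5, 1), (0.25, 1)]
print(per_pixel)   # 5
print(total_reduced, total_full)  # 60 108
```

Even for this tiny image the reduced scheme produces 60 boxes instead of 108, and the gap widens as more scales and ratios are added, since the count grows with $n + m$ rather than $nm$.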

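The above method of generating anchor boxes is implemented in the following `multibox_prior` function. We specify the input image, a list of scales, and a list of aspect ratios, and the function returns all the anchor boxes.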

# PyTorch version
#@save
def multibox_prior(data, sizes, ratios):
  """Generate anchor boxes with different shapes centered on each pixel."""
  in_height, in_width = data.shape[-2:]
  device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)
  boxes_per_pixel = (num_sizes + num_ratios - 1)
  size_tensor = torch.tensor(sizes, device=device)
  ratio_tensor = torch.tensor(ratios, device=device)
  # Offsets are required to move the anchor to the center of a pixel. Since
  # a pixel has height=1 and width=1, we choose to offset our centers by 0.5
  offset_h, offset_w = 0.5, 0.5
  steps_h = 1.0 / in_height # Scaled steps in y axis
  steps_w = 1.0 / in_width # Scaled steps in x axis

  # Generate all center points for the anchor boxes
  center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
  center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
  shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')
  shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)

  # Generate `boxes_per_pixel` number of heights and widths that are later
  # used to create anchor box corner coordinates (xmin, ymin, xmax, ymax)
  w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),
          sizes[0] * torch.sqrt(ratio_tensor[1:])))\
          * in_height / in_width # Handle rectangular inputs
  h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),
          sizes[0] / torch.sqrt(ratio_tensor[1:])))
  # Divide by 2 to get half height and half width
  anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
                    in_height * in_width, 1) / 2

  # Each center point will have `boxes_per_pixel` number of anchor boxes, so
  # generate a grid of all anchor box centers with `boxes_per_pixel` repeats
  out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
        dim=1).repeat_interleave(boxes_per_pixel, dim=0)
  output = out_grid + anchor_manipulations
  return output.unsqueeze(0)
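To make the tensor arithmetic in `multibox_prior` easier to follow, here is a plain-Python sketch that computes the same normalized corner coordinates for a single pixel. The helper `anchors_at_pixel` is a hypothetical illustration, not part of the `d2l` library; it assumes the same ordering as the function above (every size paired with the first ratio, then the first size paired with the remaining ratios).

```python
import math


def anchors_at_pixel(row, col, in_height, in_width, sizes, ratios):
    """Return normalized (xmin, ymin, xmax, ymax) anchors centered on one pixel."""
    # Pixel center: shift by 0.5, then scale to the [0, 1] range
    center_y = (row + 0.5) / in_height
    center_x = (col + 0.5) / in_width
    # All sizes with ratios[0], then sizes[0] with the remaining ratios
    pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
    boxes = []
    for s, r in pairs:
        # Width as a fraction of image width; the in_height / in_width
        # factor handles rectangular inputs
        w = s * math.sqrt(r) * in_height / in_width
        # Height as a fraction of image height
        h = s / math.sqrt(r)
        boxes.append((center_x - w / 2, center_y - h / 2,
                      center_x + w / 2, center_y + h / 2))
    return boxes


boxes = anchors_at_pixel(row=0, col=0, in_height=2, in_width=2,
                         sizes=[0.5], ratios=[1.0])
print(boxes)  # [(0.0, 0.0, 0.5, 0.5)]
```

For a 2x2 image, the top-left pixel has its center at (0.25, 0.25), so a square anchor of scale 0.5 spans exactly the top-left quarter of the image. Boxes centered near the border can extend outside the [0, 1] range, since no clipping is applied here.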

							 

# MXNet version
#@save
def multibox_prior(data, sizes, ratios):
  """Generate anchor boxes with different shapes centered on each pixel."""
  in_height, in_width = data.shape[-2:]
  device, num_sizes, num_ratios = data.ctx, len(sizes), len(ratios)
  boxes_per_pixel = (num_sizes + num_ratios - 1)
  size_tensor = np.array(sizes, ctx=device)
  ratio_tensor = np.array(ratios, ctx=device)
  # Offsets are required to move the anchor to the center of a pixel. Since
  # a pixel has height=1 and width=1, we choose to offset our centers by 0.5
  offset_h, offset_w = 0.5, 0.5
  steps_h = 1.0 / in_height # Scaled steps in y-axis
  steps_w = 1.0 / in_width # Scaled steps in x-axis

  # Generate all center points for the anchor boxes
  center_h = (np.arange(in_height, ctx=device) + offset_h) * steps_h
  center_w = (np.arange(in_width, ctx=device) + offset_w) * steps_w
  shift_x, shift_y = np.meshgrid(center_w, center_h)
  shift_x, shift_y = shift_x.reshape(-1), shift_y.reshape(-1)

  # Generate `boxes_per_pixel` number of heights and widths that are later
  # used to create anchor box corner coordinates (xmin, ymin, xmax, ymax)
  w = np.concatenate((size_tensor * np.sqrt(ratio_tensor[0]),
            sizes[0] * np.sqrt(ratio_tensor[1:]))) \
            *