CNN Output Size Calculator

Trace CNN feature map dimensions through conv and pooling layers

One of the most common friction points when building convolutional neural networks is keeping track of feature map dimensions as data flows through conv, pooling, and transposed conv layers. A single wrong padding or stride setting causes shape mismatches that crash training. This calculator traces the spatial dimensions through each layer so you can verify your architecture before writing a single line of model code.

How to Use This Calculator

  1. Input dimensions — enter your input height and width (e.g. 224 × 224 for ImageNet).
  2. Add layers — click "Add layer" and choose Conv2d, MaxPool2d, or Transposed Conv2d. Set kernel size, stride, padding, and dilation for each.
  3. Review the output — the table shows input and output dimensions for each layer, plus a cumulative receptive field estimate.

The Output Size Formula

For Conv2d and MaxPool2d:

output = floor((input + 2 × padding - dilation × (kernel - 1) - 1) / stride + 1)

For ConvTranspose2d (upsampling):

output = (input - 1) × stride - 2 × padding + dilation × (kernel - 1) + 1
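Both formulas are easy to check in a few lines. The sketch below is a minimal, per-dimension implementation of the two equations above (it assumes ConvTranspose2d's `output_padding` is 0, which these formulas also assume):

```python
import math

def conv2d_out(size, kernel, stride=1, padding=1, dilation=1):
    """Output size along one spatial dimension for Conv2d / MaxPool2d."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

def conv_transpose2d_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output size along one spatial dimension for ConvTranspose2d
    (output_padding assumed to be 0)."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + 1

print(conv2d_out(224, kernel=3, stride=1, padding=1))   # 224 (a "same" conv)
print(conv_transpose2d_out(112, kernel=2, stride=2))    # 224 (exact doubling)
```

Apply the appropriate function once for height and once for width when the two differ.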

Common Architectures for Reference

VGG-16 first block: Input 224×224. Two 3×3 conv layers (padding=1, stride=1) keep the size at 224×224. A 2×2 max pool (stride=2) reduces it to 112×112.

ResNet stem: a 7×7 conv with stride=2 and padding=3 reduces 224×224 to 112×112. A 3×3 max pool with stride=2 and padding=1 reduces it further to 56×56.

U-Net decoder: Transposed convolutions with stride=2 double spatial dimensions at each scale level.
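The VGG and ResNet numbers above can be verified directly from the Conv2d formula. A quick sketch (the padding values follow torchvision's reference implementations):

```python
import math

def out_size(size, kernel, stride=1, padding=0, dilation=1):
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

# VGG-16 first block: two 3x3 convs (padding=1), then a 2x2 max pool (stride=2)
s = 224
s = out_size(s, 3, stride=1, padding=1)   # 224
s = out_size(s, 3, stride=1, padding=1)   # 224
s = out_size(s, 2, stride=2)              # 112

# ResNet stem: 7x7 conv (stride=2, padding=3), then 3x3 max pool (stride=2, padding=1)
r = out_size(224, 7, stride=2, padding=3) # 112
r = out_size(r, 3, stride=2, padding=1)   # 56

print(s, r)  # 112 56
```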

Tips for Architecture Design

  • Use padding = kernel_size // 2 on odd kernels to maintain spatial size (same padding).
  • Pooling with stride=2 halves the spatial dimensions each time — plan the number of downsampling stages based on your minimum feature map size.
  • For segmentation and detection heads, ensure the feature map at the prediction layer matches your output stride requirements.
  • Dilated convolutions increase the receptive field without reducing spatial resolution — useful for dense prediction tasks.
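The second tip can be turned into a small planning helper. This is a sketch under the assumption that each downsampling stage exactly halves the size; the `min_size=7` default is an arbitrary illustrative choice:

```python
def max_downsampling_stages(size, min_size=7):
    """How many stride-2 halvings fit before the feature map
    drops below min_size (assumes each stage exactly halves)."""
    stages = 0
    while size // 2 >= min_size:
        size //= 2
        stages += 1
    return stages

print(max_downsampling_stages(224))  # 5: 224 -> 112 -> 56 -> 28 -> 14 -> 7
```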

Frequently Asked Questions

Why do I get a shape mismatch error in PyTorch?
Shape mismatches usually occur when a layer receives input dimensions it was not designed for. Use this calculator to trace dimensions through your architecture and identify which layer causes the size to drop to 0 or produce unexpected shapes.
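As an illustration of how sizes can silently collapse (a toy stack, not any particular model): repeatedly applying an unpadded 5×5 conv to a 32×32 input shrinks the size by 4 per layer, reaching 0 at layer eight.

```python
# Each unpadded 5x5 conv (stride=1) shrinks the size by 4; by the
# eighth layer the output size is 0 and the next layer would raise
# a shape error.
size = 32
sizes = []
for _ in range(8):
    size = (size + 2 * 0 - (5 - 1) - 1) // 1 + 1
    sizes.append(size)
print(sizes)  # [28, 24, 20, 16, 12, 8, 4, 0]
```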
What does "same padding" mean?
Same padding means padding is chosen so the output size equals the input size (assuming stride=1). For an odd kernel of size k, same padding is (k-1)/2; even kernels cannot preserve the size exactly with symmetric padding. For example, a 3×3 conv with padding=1 produces the same spatial size as its input.
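Plugging same padding into the Conv2d formula confirms the size is preserved, e.g. for a 5×5 kernel:

```python
k = 5
pad = (k - 1) // 2  # same padding for an odd kernel
out = (224 + 2 * pad - (k - 1) - 1) // 1 + 1  # stride=1
print(out)  # 224
```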
How does dilation affect output size?
Dilation expands the kernel by inserting gaps between kernel elements, effectively increasing the kernel size to dilation × (kernel-1) + 1. A 3×3 kernel with dilation=2 behaves like a 5×5 kernel for output size computation, but still has only 9 parameters.
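A short sketch of the equivalence stated above: a 3×3 kernel with dilation=2 produces the same output size as a plain 5×5 kernel.

```python
def effective_kernel(kernel, dilation):
    """Size of the input region a dilated kernel spans."""
    return dilation * (kernel - 1) + 1

def out_size(size, kernel, stride=1, padding=0, dilation=1):
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

print(effective_kernel(3, 2))       # 5
print(out_size(64, 3, dilation=2))  # 60, same as a plain 5x5 kernel:
print(out_size(64, 5))              # 60
```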
What is the receptive field?
The receptive field is the region of the input image that a single output neuron "sees". It grows with each layer. Deeper layers have larger receptive fields. This calculator shows a cumulative estimate of the receptive field size after each layer.
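One common way to compute such a cumulative estimate is the recurrence rf ← rf + (kernel − 1) × jump, where jump is the product of all strides so far. The sketch below implements that recurrence for kernel/stride pairs (ignoring dilation and padding; this is a standard estimate, not necessarily the exact method used by the tool):

```python
def receptive_field(layers):
    """layers: list of (kernel, stride) pairs.
    Returns the receptive field after each layer."""
    rf, jump = 1, 1
    result = []
    for kernel, stride in layers:
        rf = rf + (kernel - 1) * jump
        jump = jump * stride
        result.append(rf)
    return result

# Two 3x3 convs followed by a 2x2 stride-2 pool (VGG-style block)
print(receptive_field([(3, 1), (3, 1), (2, 2)]))  # [3, 5, 6]
```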
What is ConvTranspose2d used for?
ConvTranspose2d (also called fractionally strided convolution or, loosely, deconvolution) performs upsampling: it increases spatial dimensions. It is used in the decoder paths of U-Nets, GAN generators, and segmentation networks to recover the original resolution.
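Two configurations that appear frequently in decoder designs both double the size exactly; a sketch using the ConvTranspose2d formula (output_padding assumed 0):

```python
def convtranspose_out(size, kernel, stride=1, padding=0):
    return (size - 1) * stride - 2 * padding + (kernel - 1) + 1

# Two common "exact doubling" configurations:
print(convtranspose_out(56, kernel=2, stride=2))             # 112
print(convtranspose_out(56, kernel=4, stride=2, padding=1))  # 112
```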
Does this calculator handle channels or batch size?
No. Conv and pool operations do not change spatial height and width based on channel count — channels are determined by the number of filters, which this tool does not track. Only spatial dimensions (H × W) are computed.