CNN Output Size Calculator
Trace CNN feature map dimensions through conv and pooling layers
One of the most common friction points when building convolutional neural networks is keeping track of feature map dimensions as data flows through conv, pooling, and transposed conv layers. A single wrong padding or stride setting causes shape mismatches that crash training. This calculator traces the spatial dimensions through each layer so you can verify your architecture before writing a single line of model code.
How to Use This Calculator
- Input dimensions — enter your input height and width (e.g. 224 × 224 for ImageNet).
- Add layers — click "Add layer" and choose Conv2d, MaxPool2d, or Transposed Conv2d. Set kernel size, stride, padding, and dilation for each.
- Review the output — the table shows input and output dimensions for each layer, plus a cumulative receptive field estimate.
The Output Size Formula
For Conv2d and MaxPool2d (applied independently to height and width):
output = floor((input + 2 × padding - dilation × (kernel - 1) - 1) / stride + 1)
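This formula is straightforward to express as a small helper. A minimal sketch (the function name `conv_out` is our own; the formula matches PyTorch's Conv2d/MaxPool2d shape equation):

```python
import math

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output size for Conv2d or MaxPool2d along one spatial dimension."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

# A 3x3 conv with stride=1, padding=1 preserves a 224x224 input
print(conv_out(224, kernel=3, stride=1, padding=1))  # 224
```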
For ConvTranspose2d (upsampling):
output = (input - 1) × stride - 2 × padding + dilation × (kernel - 1) + 1
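The transposed-conv formula can be sketched the same way (the name `deconv_out` is our own; the optional `output_padding` term is PyTorch's ConvTranspose2d parameter, which defaults to 0 and is simply added to the result):

```python
def deconv_out(size, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    """Output size for ConvTranspose2d along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1

# A 2x2 transposed conv with stride=2 doubles spatial size, as in a U-Net decoder
print(deconv_out(56, kernel=2, stride=2))  # 112
```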
Common Architectures for Reference
VGG-16 first block: Input 224×224. Two 3×3 conv layers (padding=1, stride=1) keep the size at 224×224. A 2×2 max pool (stride=2) reduces it to 112×112.
ResNet stem: 7×7 conv with stride=2 and padding=3 reduces 224×224 to 112×112. A 3×3 max pool with stride=2 and padding=1 reduces it further to 56×56.
U-Net decoder: Transposed convolutions with stride=2 double spatial dimensions at each scale level.
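The reference numbers above can be checked by chaining the output-size formula, one layer at a time. A sketch, assuming the standard torchvision padding values (1 for the VGG 3×3 convs, 3 for the ResNet 7×7 conv, 1 for its 3×3 pool):

```python
import math

def out(size, kernel, stride, padding):
    """Conv2d/MaxPool2d output size along one spatial dimension (dilation=1)."""
    return math.floor((size + 2 * padding - (kernel - 1) - 1) / stride + 1)

# VGG-16 first block: two 3x3 convs keep 224, then 2x2 pool halves it
s = out(224, kernel=3, stride=1, padding=1)  # 224
s = out(s, kernel=3, stride=1, padding=1)    # 224
s = out(s, kernel=2, stride=2, padding=0)    # 112
print(s)  # 112

# ResNet stem: 7x7 conv stride 2, then 3x3 max pool stride 2
s = out(224, kernel=7, stride=2, padding=3)  # 112
s = out(s, kernel=3, stride=2, padding=1)    # 56
print(s)  # 56
```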
Tips for Architecture Design
- Use padding = kernel_size // 2 on odd kernels to maintain spatial size (same padding).
- Pooling with stride=2 halves the spatial dimensions each time — plan the number of downsampling stages based on your minimum feature map size.
- For segmentation and detection heads, ensure the feature map at the prediction layer matches your output stride requirements.
- Dilated convolutions increase the receptive field without reducing spatial resolution — useful for dense prediction tasks.
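The first and last tips can be verified numerically: padding = kernel_size // 2 preserves spatial size for any odd kernel at stride 1, and a dilated kernel widens its effective window without shrinking the output if padding is adjusted to match (the helper name `out` is our own):

```python
import math

def out(size, kernel, stride=1, padding=0, dilation=1):
    """Conv2d/MaxPool2d output size along one spatial dimension."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

# "Same" padding: kernel // 2 keeps a 64x64 input at 64x64 for odd kernels
for k in (3, 5, 7):
    assert out(64, kernel=k, padding=k // 2) == 64

# A 3x3 conv with dilation=2 covers a 5x5 window; padding=2 keeps resolution
print(out(64, kernel=3, padding=2, dilation=2))  # 64
```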