Softmax Calculator
Convert model logits to a probability distribution instantly
T < 1 sharper · T = 1 default · T > 1 flatter
Softmax is the activation function used in the final layer of classification neural networks to convert raw scores (logits) into a probability distribution. All output probabilities sum to 1, with each value between 0 and 1. This calculator lets you paste logits, optionally set class labels, and inspect the resulting probability distribution along with entropy and the argmax class.
How to Use This Calculator
- Logits – paste raw model output scores, comma-separated. These can be any real numbers.
- Temperature – a scaling factor applied before softmax. Temperature < 1 sharpens the distribution (more confident); temperature > 1 flattens it (more uniform). Default is 1.
- Labels – optionally enter class names, one per line or comma-separated, to label the output.
The Softmax Formula
For logits z = [z₁, z₂, ..., z_K]:
softmax(zᵢ) = exp(zᵢ / T) / ∑ⱼ exp(zⱼ / T)
Where T is the temperature. This calculator uses the numerically stable variant – the max logit is subtracted before exponentiation to prevent overflow.
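The formula and its stable variant fit in a few lines; a minimal sketch, assuming NumPy (the `softmax` helper and sample logits are illustrative):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax: scale by 1/T, subtract the max, exponentiate, normalize."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()            # shift so the largest exponent is 0 (prevents overflow)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

probs = softmax([2.0, 1.0, 0.1])   # probabilities sum to 1; largest logit wins
```

Because softmax is shift-invariant, subtracting the max changes nothing mathematically but keeps exp() within floating-point range even for logits in the hundreds.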
Temperature Scaling Explained
Temperature scaling is a post-hoc calibration technique. A neural network trained with cross-entropy loss tends to produce overconfident predictions. Dividing logits by a temperature T > 1 (learned on a validation set) produces better-calibrated probabilities without changing the argmax.
- T < 1: Sharpens the distribution – the highest logit dominates even more.
- T = 1: Standard softmax – no scaling.
- T > 1: Flattens the distribution – makes it more uniform, more uncertain.
- T → ∞: Approaches a uniform distribution (maximum entropy).
- T → 0: Approaches a one-hot distribution (argmax only).
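These limiting regimes are easy to check numerically; a small self-contained sketch, assuming NumPy (the helper name `softmax_t` and the sample logits are illustrative):

```python
import numpy as np

def softmax_t(logits, t):
    """Numerically stable softmax with temperature t."""
    z = np.asarray(logits, dtype=float) / t
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

logits = [3.0, 1.0, 0.2]
for t in (0.01, 1.0, 1000.0):
    # near-one-hot at t = 0.01, standard at t = 1, near-uniform at t = 1000
    print(t, np.round(softmax_t(logits, t), 3))
```

Note that the argmax class is the same at every temperature; only the spread of probability mass changes.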
Entropy as a Confidence Measure
Shannon entropy H = -∑ᵢ pᵢ log(pᵢ) measures the uncertainty of the distribution. Minimum entropy (0 nats) means the model is completely certain. Maximum entropy is log(K) nats for K classes. High entropy often indicates the model is confused between classes and the prediction should be treated with caution.
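Entropy is straightforward to compute from the probability vector; a minimal sketch, assuming NumPy (the `entropy` helper is illustrative; zero probabilities are dropped so that 0 · log 0 is treated as 0):

```python
import numpy as np

def entropy(probs):
    """Shannon entropy in nats."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                     # 0 * log(0) is taken to be 0
    return float(-(p * np.log(p)).sum())

K = 4
uniform = np.full(K, 1.0 / K)            # maximally uncertain: H = log(K)
onehot = np.array([1.0, 0.0, 0.0, 0.0])  # completely certain: H = 0
```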
Where Softmax Is Used
Multi-class classification: ResNet, ViT, BERT classifiers all end with a linear layer followed by softmax (or its implicit equivalent in cross-entropy loss).
Attention mechanisms: The attention weights in Transformers are computed as softmax(QKᵀ / √dₖ) V. Temperature scaling corresponds to the 1/√dₖ factor.
Reinforcement learning: Policy networks in actor-critic methods output action probabilities via a softmax over logits; similarly, softmax (Boltzmann) exploration converts Q-values into action probabilities.
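The attention use above can be sketched with plain NumPy; a minimal single-head example (the function name, shapes, and random inputs are all illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Computes softmax(Q Kᵀ / √d_k) V row-wise, using a stable softmax."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # 1/√d_k acts as a temperature
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, 8)) for n in (2, 5, 5))
out, w = scaled_dot_product_attention(Q, K, V)     # each row of w sums to 1
```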