
Softmax Calculator

Convert model logits to a probability distribution instantly

T < 1 sharper · T = 1 default · T > 1 flatter


Softmax is the activation function used in the final layer of classification neural networks to convert raw scores (logits) into a probability distribution. All output probabilities sum to 1, with each value between 0 and 1. This calculator lets you paste logits, optionally set class labels, and inspect the resulting probability distribution along with entropy and the argmax class.

How to Use This Calculator

  1. Logits – paste raw model output scores, comma-separated. These can be any real numbers.
  2. Temperature – a scaling factor applied to the logits before softmax. Temperature < 1 sharpens the distribution (more confident); temperature > 1 flattens it (more uniform). Default is 1.
  3. Labels โ€” optionally enter class names, one per line or comma-separated, to label the output.

The Softmax Formula

For logits z = [z_1, z_2, ..., z_K]:

softmax(z_i) = exp(z_i / T) / Σ_j exp(z_j / T)

Where T is the temperature. This calculator uses the numerically stable variant: the max logit is subtracted before exponentiation to prevent overflow.
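The stable computation can be sketched in a few lines of NumPy (the `softmax` helper name here is illustrative, not part of the calculator):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()            # subtract the max logit to avoid overflow
    e = np.exp(z)
    return e / e.sum()         # normalize so the outputs sum to 1

probs = softmax([2.0, 1.0, 0.1])   # ~[0.659, 0.242, 0.099]
```

The shift by `z.max()` leaves the result unchanged (the constant cancels in the ratio) but keeps every exponent at or below 0.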

Temperature Scaling Explained

Temperature scaling is a post-hoc calibration technique. A neural network trained with cross-entropy loss tends to produce overconfident predictions. Dividing logits by a temperature T > 1 (learned on a validation set) produces better-calibrated probabilities without changing the argmax.

  • T < 1: Sharpens the distribution – the highest logit dominates even more.
  • T = 1: Standard softmax – no scaling.
  • T > 1: Flattens the distribution – makes it more uniform, more uncertain.
  • T → ∞: Approaches a uniform distribution (maximum entropy).
  • T → 0: Approaches a one-hot distribution (argmax only).
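These regimes are easy to see by rerunning the same logits at several temperatures (a sketch assuming NumPy; the `softmax` helper is hypothetical):

```python
import numpy as np

def softmax(logits, T=1.0):
    # stable softmax with temperature, as described above
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

logits = [3.0, 1.0, 0.2]
sharp = softmax(logits, T=0.5)   # top class dominates more
base  = softmax(logits, T=1.0)   # standard softmax
flat  = softmax(logits, T=5.0)   # closer to uniform
```

Note that the argmax is the same at every temperature; only the spread of probability mass changes.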

Entropy as a Confidence Measure

Shannon entropy H = -Σ_i p_i log(p_i) measures the uncertainty of the distribution. Minimum entropy (0 nats) means the model is completely certain. Maximum entropy is log(K) nats for K classes. High entropy often indicates the model is confused between classes and the prediction should be treated with caution.
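A minimal sketch of this measure (assuming NumPy; the 0 · log 0 = 0 convention is handled by dropping zero entries):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with 0*log(0) treated as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                      # drop zeros: lim p->0 of p*log(p) is 0
    return float(-(nz * np.log(nz)).sum())

K = 4
uniform = np.full(K, 1 / K)            # maximum entropy: log(K) nats
onehot = np.array([1.0, 0.0, 0.0, 0.0])  # minimum entropy: 0 nats
```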

Where Softmax Is Used

Multi-class classification: ResNet, ViT, BERT classifiers all end with a linear layer followed by softmax (or its implicit equivalent in cross-entropy loss).

Attention mechanisms: The attention weights in Transformers are computed as softmax(QK^T / √d_k) V. The 1/√d_k scaling acts as a fixed temperature T = √d_k, keeping dot-product scores in a range where the softmax does not saturate.
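A single-head sketch of scaled dot-product attention (assuming NumPy, 2-D arrays, and no masking or batching):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # sqrt(d_k) plays the role of T
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax per row
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                                # convex combination of value rows
```

Because each row of weights is a probability distribution, every output row is a convex combination of the rows of V.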

Reinforcement learning: Policy networks in actor-critic methods output action probabilities via a softmax over policy logits; Boltzmann exploration likewise applies a temperature-scaled softmax over Q-values.

Frequently Asked Questions

What are logits?
Logits are the raw, unnormalized outputs of the final linear layer in a classification network. They can be any real number. Softmax converts them to probabilities that sum to 1. The term comes from the logit function (inverse of sigmoid), though in deep learning it more broadly refers to pre-activation scores.
Why does probability always sum to 1?
Softmax divides each exponentiated logit by the total sum of all exponentiated logits. This normalization guarantees that all outputs are positive and sum to exactly 1, making them valid class probabilities.
What is temperature in softmax?
Temperature T divides all logits before applying softmax. T < 1 makes the distribution sharper (more peaked) – the model becomes more confident. T > 1 flattens the distribution – the model becomes more uncertain. Temperature is used for model calibration and to control diversity in text generation.
What is the numerically stable softmax?
Directly computing exp(z) can overflow for large logits. The stable version subtracts the maximum logit first: exp(z - max(z)). This does not change the output because the extra factor cancels in the numerator and denominator, but it keeps the exponent in a safe numerical range.
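The failure mode is easy to reproduce (assuming NumPy; floating-point warnings are suppressed so the overflow can be observed directly):

```python
import numpy as np

logits = np.array([1000.0, 999.0])

# Naive softmax: exp(1000) overflows to inf, and inf/inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.exp(logits).sum()

# Stable softmax: shift by the max logit first, so exponents are <= 0.
shifted = logits - logits.max()
stable = np.exp(shifted) / np.exp(shifted).sum()
```

The stable result equals the exact value softmax([1000, 999]) = [1/(1+e⁻¹), e⁻¹/(1+e⁻¹)], while the naive version produces only NaNs.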
What is the difference between softmax and sigmoid?
Sigmoid is used for binary classification or multi-label classification – each output is independent and the probabilities do not sum to 1. Softmax is used for mutually exclusive multi-class classification – exactly one class is correct and the outputs sum to 1.
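The contrast shows up immediately on the same logits (a NumPy sketch):

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5])

# Sigmoid: each class scored independently (multi-label); no shared normalization.
sig = 1 / (1 + np.exp(-logits))

# Softmax: one shared normalization (multi-class); outputs sum to 1.
e = np.exp(logits - logits.max())
soft = e / e.sum()
```

The sigmoid outputs can sum to anything between 0 and 3 here; the softmax outputs always sum to exactly 1.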
What does high entropy mean for my model?
High entropy means the model distributes probability fairly evenly across classes – it is uncertain. This can indicate the input is genuinely ambiguous, or that the model is poorly calibrated or has not seen similar examples during training. Low entropy means the model is confident about one class.