Missing Values Analyzer

Find missing data by cell, row, or column in any CSV

CSV data (paste with header row)

Missing tokens (comma-separated; empty string = blank cell)

Analysis mode

Frequently Asked Questions

What counts as a missing value?

By default: empty cells, "NA", "N/A", "null", "none", "undefined", and "NaN" (all case-insensitive). You can add custom tokens in the missing tokens field for values specific to your dataset, such as "?", "-", or "999".
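
As a rough illustration of the matching rule described above, the check might look like the following Python sketch. The function name `is_missing`, the `extra_tokens` parameter, and the whitespace trimming are assumptions made for the example, not the tool's actual implementation.

```python
# Default tokens from the answer above, stored lowercase for
# case-insensitive comparison. The empty string stands for a blank cell.
DEFAULT_MISSING_TOKENS = {"", "na", "n/a", "null", "none", "undefined", "nan"}


def is_missing(value, extra_tokens=()):
    """Return True if the cell text matches a missing token.

    Assumption: surrounding whitespace is ignored and custom tokens are
    also compared case-insensitively.
    """
    tokens = DEFAULT_MISSING_TOKENS | {t.strip().lower() for t in extra_tokens}
    return value.strip().lower() in tokens


print(is_missing("N/A"))                        # default token
print(is_missing("999"))                        # not missing by default
print(is_missing("999", extra_tokens=["999"]))  # missing once added as a custom token
```

Adding a numeric sentinel such as "999" only makes sense when that value cannot occur as real data in the dataset.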

What is the difference between cell, row, and column analysis modes?

Cell mode counts every missing cell in the grid. Row mode counts rows with at least one missing value. Column mode counts missing values per column and computes per-column missing rates. Each mode answers a different question about your data's completeness.
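
The three modes can be sketched in a few lines of Python. This is a minimal illustration, assuming a rectangular CSV (every row has as many fields as the header) and the default token list; the function name `analyze` and the sample data are hypothetical.

```python
import csv
import io

MISSING = {"", "na", "n/a", "null", "none", "undefined", "nan"}


def analyze(csv_text):
    """Return cell count, row count, per-column counts, and per-column rates."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # One boolean per cell: True where the cell matches a missing token.
    flags = [[cell.strip().lower() in MISSING for cell in row] for row in data]

    cell_count = sum(sum(row) for row in flags)          # cell mode
    row_count = sum(1 for row in flags if any(row))      # row mode
    col_counts = {name: sum(row[i] for row in flags)     # column mode
                  for i, name in enumerate(header)}
    col_rates = {name: (count / len(data) if data else 0.0)
                 for name, count in col_counts.items()}
    return cell_count, row_count, col_counts, col_rates


sample = "id,age,city\n1,34,Oslo\n2,NA,\n3,,Lima\n4,41,null\n"
cells, rows_with_missing, counts, rates = analyze(sample)
print(cells, rows_with_missing, counts, rates)
```

In the sample, four cells are missing but only three rows are affected, which is why the cell and row counts can differ.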

What missing rate is considered "too high"?

A common rule of thumb: under 5% missing is generally safe for most imputation methods, and 5–20% requires careful imputation. Above roughly 40–50% missing in a column, dropping the column entirely is often warranted, because imputed values would dominate it. Context matters: in some domains, 30% missingness is acceptable with the right model.

Should I impute before or after splitting into train/test sets?

After splitting. Fit your imputation model on the training set only (computing means, medians, or regression weights from training data), then apply it to validation and test sets. Imputing on the full dataset before splitting leaks test set statistics into training.
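
A minimal sketch of that order of operations, using mean imputation on a single numeric column. The helper names and the toy values are illustrative only, and `None` stands in for a missing cell.

```python
from statistics import mean


def fit_mean_imputer(train_values):
    """Compute the fill value from observed training values only."""
    observed = [v for v in train_values if v is not None]
    return mean(observed)


def apply_imputer(values, fill_value):
    """Replace missing entries with a fill value that was fitted elsewhere."""
    return [fill_value if v is None else v for v in values]


# Toy data, already split into train and test.
train = [10.0, None, 14.0, 12.0]
test = [None, 100.0]

fill = fit_mean_imputer(train)             # uses training data only
train_imputed = apply_imputer(train, fill)
test_imputed = apply_imputer(test, fill)   # test set reuses the training mean
print(fill, train_imputed, test_imputed)
```

Had the mean been computed on the combined data, the test value 100.0 would have pulled the fill value upward, which is exactly the leakage described above.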

What is listwise deletion?

Listwise deletion removes any row containing at least one missing value. It is the simplest approach, but it can discard a large fraction of the data and introduce bias if the data are not missing completely at random (MCAR).
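
For illustration, listwise deletion over already-parsed rows could be sketched as follows. The function names and sample rows are hypothetical, and the default token list is assumed.

```python
MISSING = {"", "na", "n/a", "null", "none", "undefined", "nan"}


def is_missing(cell):
    return cell.strip().lower() in MISSING


def listwise_delete(rows):
    """Keep only rows in which no cell is missing."""
    return [row for row in rows if not any(is_missing(c) for c in row)]


rows = [
    ["1", "34", "Oslo"],
    ["2", "NA", "Rome"],
    ["3", "29", ""],
    ["4", "41", "Lima"],
]
kept = listwise_delete(rows)
dropped_fraction = 1 - len(kept) / len(rows)
print(kept, dropped_fraction)
```

Only two cells are missing here, yet half the rows are discarded, which shows how quickly listwise deletion can shrink a dataset.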

How can I tell if my data is MCAR, MAR, or MNAR?

A simple check: create a binary missingness indicator (1 = missing) for each column and test whether it correlates with the other observed columns. If it does, the data is not MCAR and may be MAR (missing at random, where missingness depends on observed values). MNAR (missing not at random) is harder to detect without domain knowledge: ask whether the reason a value is missing is related to the value itself.
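
The indicator check can be sketched with a plain Pearson correlation. The column names and values below are made-up toy data, and the correlation is only a screening signal, not a formal test.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns None if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return None  # correlation is undefined for a constant column
    return cov / (sd_x * sd_y)


# Toy columns: income is missing (None) mostly for older respondents.
income = [52.0, None, 48.0, None, 61.0, None]
age = [30, 62, 28, 70, 35, 66]

# Binary missingness indicator for income: 1 = missing, 0 = observed.
income_missing = [1 if v is None else 0 for v in income]

r = pearson(income_missing, age)
print(r)
```

A correlation far from zero, as in this toy data, suggests that missingness in one column depends on another observed column, which is inconsistent with MCAR.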

Does the tool handle CSVs with custom delimiters?

The tool expects standard comma-delimited CSV with a header row. Tab-delimited or semicolon-delimited files should be converted to comma-delimited CSV first.
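
One way to do that conversion before pasting is Python's standard `csv` module. This is a small sketch; the function name `to_comma_csv` is hypothetical.

```python
import csv
import io


def to_comma_csv(text, delimiter):
    """Re-write delimited text as comma-delimited CSV.

    Fields that contain commas are quoted by csv.writer, so a value such
    as "2,5" in a semicolon-delimited file stays in one cell.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


print(to_comma_csv("name;score\nAda;2,5\n", ";"))
print(to_comma_csv("name\tscore\nAda\t\n", "\t"))
```

Simply replacing every semicolon or tab with a comma would break any field that already contains a comma, which is why the sketch goes through a CSV reader and writer.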

Is my data safe to paste here?

Yes. All processing runs locally in your browser. No data is transmitted to any server, so the tool is safe for confidential or sensitive datasets.