Missing Values Analyzer

Find missing data by cell, row, or column in any CSV

Missing data is one of the most common problems in real-world datasets. Before handling missing values through imputation or exclusion, you need to know which columns or rows are affected, how severely, and what pattern the missingness follows. This tool performs that analysis instantly on any CSV dataset, entirely in your browser.

How to Use This Tool

  1. Paste your CSV — the first row is treated as the header.
  2. Missing tokens — enter a comma-separated list of values that represent missingness in your data. Defaults: blank, NA, N/A, null, none, undefined, NaN (matched case-insensitively). Add custom values like "?" or "-".
  3. Analysis mode — choose how to report missing data:
    • Cell: counts every individual missing cell across the entire dataset.
    • Row: counts and lists rows where at least one value is missing.
    • Column: counts missing values per column with rates and identifies the most affected columns.
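
To make step 2 concrete, here is a minimal Python sketch of how a token list might be applied to parsed cells. It is not the tool's own code: the names DEFAULT_TOKENS, build_token_set, and is_missing are hypothetical, and trimming whitespace before comparison is an assumption.

```python
import csv
import io

# Assumed defaults, following the token list documented on this page.
# The empty string stands for a blank cell.
DEFAULT_TOKENS = {"", "na", "n/a", "null", "none", "undefined", "nan"}

def build_token_set(extra=""):
    """Merge the defaults with a comma-separated string of custom tokens."""
    custom = {t.strip().lower() for t in extra.split(",") if t.strip()}
    return DEFAULT_TOKENS | custom

def is_missing(value, tokens):
    """True when the trimmed, lower-cased cell matches a missing token."""
    return value.strip().lower() in tokens

def parse_csv(text):
    """Return (header, rows); the first row is treated as the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)

sample = "id,age,city\n1,34,Oslo\n2,NA,?\n3,,Lima\n"
header, rows = parse_csv(sample)
tokens = build_token_set("?, -")

# One boolean per cell: True where the cell counts as missing.
flags = [[is_missing(cell, tokens) for cell in row] for row in rows]
print(flags)
```

Lower-casing both sides keeps the match case-insensitive, so "na", "NA", and "Na" are all treated the same way.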

Understanding the Analysis Modes

Cell mode gives you the total missing cell count and overall missing rate — useful for a single-number summary of data completeness.

Row mode identifies incomplete records. It is useful when you plan to use listwise deletion (removing rows with any missing value) and need to know how many rows you would lose.

Column mode shows per-column missing rates, helping you decide which columns are candidates for imputation vs. removal.
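
The three modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the analyze function and its output layout are hypothetical, short rows are padded with blanks, and extra cells beyond the header width are ignored.

```python
import csv
import io

# Assumed default tokens (lower-cased); "" stands for a blank cell.
MISSING = {"", "na", "n/a", "null", "none", "undefined", "nan"}

def rate(part, whole):
    """Fraction part/whole, or 0.0 when there is nothing to divide by."""
    return part / whole if whole else 0.0

def analyze(text):
    """Return cell, row, and column summaries for comma-delimited CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    width = len(header)
    flags = []
    for row in reader:
        # Assumption: pad short rows with blanks, ignore extra cells.
        cells = (row + [""] * width)[:width]
        flags.append([cell.strip().lower() in MISSING for cell in cells])

    n_rows = len(flags)
    missing_cells = sum(sum(row) for row in flags)
    # 1-based data row numbers (the header is not counted).
    incomplete = [i + 1 for i, row in enumerate(flags) if any(row)]
    per_column = {
        name: sum(row[j] for row in flags) for j, name in enumerate(header)
    }
    return {
        "cell": {"missing": missing_cells,
                 "rate": rate(missing_cells, n_rows * width)},
        "row": {"incomplete": incomplete,
                "rate": rate(len(incomplete), n_rows)},
        "column": {name: {"missing": count, "rate": rate(count, n_rows)}
                   for name, count in per_column.items()},
    }

report = analyze("id,age,city\n1,34,Oslo\n2,NA,\n3,,Lima\n4,51,Rome\n")
print(report["cell"])    # overall missing cell count and rate
print(report["row"])     # rows you would lose to listwise deletion
print(report["column"])  # per-column counts and rates
```

On this sample, 3 of 12 cells are missing (25%), but 2 of 4 rows are incomplete (50%). A low cell rate can still cost many rows under listwise deletion.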

Types of Missingness

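MCAR (Missing Completely At Random): The probability that a value is missing is unrelated to any data, observed or unobserved, so missingness shows no pattern across rows or columns. Listwise deletion does not bias estimates under MCAR, but it reduces sample size.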

MAR (Missing At Random): Missingness depends on other observed variables but not on the missing value itself. Imputation based on other columns is appropriate.

MNAR (Missing Not At Random): The probability of a value being missing depends on the missing value itself (e.g., high earners skip income questions). This requires careful modeling and cannot be fixed with simple imputation.
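
A small simulation can make the three mechanisms easier to tell apart. The sketch below uses entirely synthetic, hypothetical data (an always-observed age column and an income column that rises with age) and made-up drop probabilities. It only shows how the mean of the remaining income values shifts under each mechanism.

```python
import random
from statistics import mean

random.seed(0)
n = 20000

# Synthetic data: age is always observed, income rises with age.
ages = [random.uniform(20, 65) for _ in range(n)]
incomes = [1000 * a + random.gauss(0, 5000) for a in ages]

def observed_mean(drop_prob):
    """Mean income after dropping each value with probability drop_prob(age, income)."""
    kept = [inc for a, inc in zip(ages, incomes)
            if random.random() >= drop_prob(a, inc)]
    return mean(kept)

true_mean = mean(incomes)

# MCAR: every value has the same chance of being missing.
mcar = observed_mean(lambda a, inc: 0.3)
# MAR: missingness depends on age, which is observed.
mar = observed_mean(lambda a, inc: 0.6 if a > 45 else 0.1)
# MNAR: missingness depends on the income value itself.
mnar = observed_mean(lambda a, inc: 0.6 if inc > 45000 else 0.1)

print(f"true {true_mean:.0f}  MCAR {mcar:.0f}  MAR {mar:.0f}  MNAR {mnar:.0f}")
```

Under MCAR the observed mean stays close to the true mean. Under MAR and MNAR it drifts downward, because higher incomes are dropped more often. The difference is that the MAR bias can be corrected using the observed age column, while the MNAR bias cannot be recovered from observed data alone.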

Common Strategies for Handling Missing Data

  • Remove rows — if few rows have missing values and the data is MCAR.
  • Remove columns — if a column has >40–50% missing values and imputation would introduce too much noise.
  • Mean/median/mode imputation — fill with the column mean or median (numeric) or mode (categorical). Simple, but it shrinks the column's variance and can bias estimates when the data is not MCAR.
  • Model-based imputation — predict missing values using a regression model trained on observed columns. Usually more accurate than mean imputation when the data is MAR.
  • Indicator variables — add a binary column flagging whether a value was imputed, letting the model learn the missingness pattern.
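
As a rough illustration of three of these strategies, the sketch below drops columns above a missing-rate threshold and fills a numeric column with its median while recording an indicator. The function names are hypothetical, None marks a missing entry, and the 0.5 threshold is just the upper end of the rule of thumb above.

```python
from statistics import median

def columns_to_drop(missing_rates, threshold=0.5):
    """Names of columns whose missing rate exceeds the threshold."""
    return [name for name, rate in missing_rates.items() if rate > threshold]

def impute_median_with_indicator(values):
    """Fill None entries with the median of the observed values.

    Returns (imputed_values, indicator), where indicator is 1 for every
    position that was filled in. Raises statistics.StatisticsError if the
    column has no observed values at all.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator

dropped = columns_to_drop({"age": 0.4, "notes": 0.8})

ages = [34.0, None, 29.0, 51.0, None]
imputed, was_missing = impute_median_with_indicator(ages)
print(dropped)      # ['notes']
print(imputed)      # [34.0, 34.0, 29.0, 51.0, 34.0]
print(was_missing)  # [0, 1, 0, 0, 1]
```

Keeping the indicator alongside the imputed column preserves the information that a value was originally absent, which a downstream model can use.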

Frequently Asked Questions

What counts as a missing value?
By default: empty cells, "NA", "N/A", "null", "none", "undefined", and "NaN" (all case-insensitive). You can add custom tokens in the missing tokens field for values specific to your dataset, such as "?", "-", or "999".
What is the difference between cell, row, and column analysis modes?
Cell mode counts every missing cell in the grid. Row mode counts rows with at least one missing value. Column mode counts missing values per column and computes per-column missing rates. Each answers a different question about your data completeness.
What missing rate is considered "too high"?
A common rule of thumb: under 5% missing is generally safe for most imputation methods; 5–20% requires careful imputation; over 40–50% in a column often warrants dropping the column entirely, as imputed values would dominate. Context matters — in some domains, 30% missingness is acceptable with the right model.
Should I impute before or after splitting into train/test sets?
After splitting. Fit your imputation model on the training set only (computing means, medians, or regression weights from training data), then apply it to validation and test sets. Imputing on the full dataset before splitting leaks test set statistics into training.
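
A minimal sketch of that fit-then-apply order, using a hypothetical MeanImputer class written for illustration (None marks a missing entry):

```python
from statistics import mean

class MeanImputer:
    """Learns a fill value from one dataset and applies it to others."""

    def fit(self, values):
        # Only the values passed here influence the fill value.
        observed = [v for v in values if v is not None]
        self.fill_ = mean(observed)
        return self

    def transform(self, values):
        return [self.fill_ if v is None else v for v in values]

train = [10.0, None, 20.0, 30.0]
test = [None, 100.0]

imputer = MeanImputer().fit(train)       # statistics come from train only
train_filled = imputer.transform(train)
test_filled = imputer.transform(test)    # test reuses the training mean

print(train_filled)  # [10.0, 20.0, 20.0, 30.0]
print(test_filled)   # [20.0, 100.0]
```

The missing test value is filled with 20.0, the training mean. Had the mean been computed on train and test together, the test value 100.0 would have shifted it and leaked into training.
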
What is listwise deletion?
Listwise deletion removes any row containing at least one missing value. It is the simplest approach but can discard large fractions of data and introduce bias if missingness is not MCAR.
How can I tell if my data is MCAR, MAR, or MNAR?
A simple check: create a binary missingness indicator (1 = missing) for each column and test whether it correlates with other observed columns. If it does, the data is not MCAR and may be MAR. MNAR cannot be confirmed from the observed data alone; it takes domain knowledge, so ask whether the reason a value is missing is related to the value itself.
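
The indicator check might look like the sketch below, which computes a plain Pearson correlation between a missingness indicator and an observed column. The data is made up and the pearson helper is hypothetical. A real analysis would also need a significance test and far more than eight rows.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(vx * vy)

# Hypothetical columns: age is fully observed, income has gaps (None).
ages = [22, 25, 31, 38, 44, 52, 58, 63]
income = [21000, 26000, None, 40000, 43000, None, None, None]

# 1 where income is missing, 0 where it is observed.
indicator = [1 if v is None else 0 for v in income]
r = pearson(indicator, ages)
print(round(r, 2))
```

A clearly non-zero correlation, as in this example where income is missing mostly for older respondents, suggests the missingness is related to age and therefore not MCAR.
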
Does the tool handle CSVs with custom delimiters?
The tool expects standard comma-delimited CSV with a header row. Convert tab-delimited or semicolon-delimited files to comma-delimited CSV first.
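
One way to do that conversion is with Python's standard csv module, as in the sketch below. The to_comma_csv function is a hypothetical helper, and you need to know the source delimiter in advance.

```python
import csv
import io

def to_comma_csv(text, delimiter):
    """Re-write delimited text as comma-delimited CSV.

    Fields that contain commas are quoted by the writer, so values such
    as "3,5" (a decimal comma) are not split into two cells.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(reader)
    return out.getvalue()

semicolon_text = "id;name;score\n1;Ana;3,5\n2;Ben;\n"
converted = to_comma_csv(semicolon_text, delimiter=";")
print(converted)
```

A plain find-and-replace of ";" with "," would break the row containing "3,5", which is why the sketch goes through a CSV reader and writer instead. For tab-delimited input, pass delimiter="\t".
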
Is my data safe to paste here?
Yes. All processing runs locally in your browser. No data is transmitted to any server, so the tool is safe for confidential or sensitive datasets.