Utility

CSV Summary Profiler

Instantly profile any CSV with column stats and missing value counts


The CSV Summary Profiler generates an instant data profile of any CSV dataset — right in your browser. It counts rows, detects column types, calculates summary statistics for numeric columns (mean, median, min, max, standard deviation), and reports missing value counts. This is exploratory data analysis (EDA) in seconds, with no Python, no R, and no upload required.

How to Use This Tool

  1. Paste your CSV — the first row must be a header row with column names.
  2. Null tokens — customize what counts as a missing value. Defaults (matched case-insensitively) are: blank, NA, N/A, null, none, undefined, and NaN. Add custom tokens separated by commas.
  3. Sample size — optionally limit profiling to the first N rows (useful for very large files). Set to 0 to profile all rows.
  4. Review the profile report: row/column counts, per-column type, count, missing count, and numeric statistics.
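The steps above can be sketched in a few lines of JavaScript. This is an illustrative outline, not the tool's actual code, and it assumes a simple CSV (comma-separated, no quoted fields, first row is the header):

```javascript
// Minimal profiling sketch (illustrative, not the tool's implementation).
const NULL_TOKENS = new Set(["", "na", "n/a", "null", "none", "undefined", "nan"]);

function profileCsv(text, sampleSize = 0) {
  const lines = text.trim().split(/\r?\n/);
  const header = lines[0].split(",");               // step 1: header row
  let rows = lines.slice(1).map(line => line.split(","));
  if (sampleSize > 0) rows = rows.slice(0, sampleSize); // step 3: first N rows
  return header.map((name, i) => {
    const raw = rows.map(r => (r[i] ?? "").trim());
    const missing = raw.filter(v => NULL_TOKENS.has(v.toLowerCase())).length;
    const values = raw.filter(v => !NULL_TOKENS.has(v.toLowerCase()));
    // step 2/4: a column is numeric only if every non-missing value parses
    const numeric = values.length > 0 && values.every(v => Number.isFinite(Number(v)));
    return { name, count: values.length, missing, type: numeric ? "numeric" : "text" };
  });
}
```

A real parser would also handle quoted fields and embedded commas; the function name and return shape here are assumptions for illustration.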

What the Profile Report Shows

For each column, the profiler reports:

  • Type — numeric (all values parse as numbers) or text.
  • Count — number of non-missing values.
  • Missing — count of missing values (blank or null tokens).
  • For numeric columns: min, max, mean, median, and standard deviation.
  • For text columns: unique value count and the most frequent value.
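For the numeric columns, the reported statistics can be computed as below. This is a sketch under one assumption worth flagging: it uses the population standard deviation (dividing by n); a sample standard deviation would divide by n − 1 instead, and the source does not say which the tool uses:

```javascript
// Summary statistics for one numeric column (illustrative sketch).
function numericStats(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((s, v) => s + v, 0) / n;
  const mid = Math.floor(n / 2);
  // Median: middle value, or the average of the two middle values.
  const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Population standard deviation (divides by n).
  const std = Math.sqrt(sorted.reduce((s, v) => s + (v - mean) ** 2, 0) / n);
  return { min: sorted[0], max: sorted[n - 1], mean, median, std };
}
```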

Why Profile Your Data First?

Before any analysis or model training, profiling reveals:

  • Columns with high missing rates that need imputation or removal.
  • Unexpected data types (e.g., a numeric column that looks like text due to stray characters).
  • Outliers suggested by extreme min/max values.
  • Duplicate-heavy columns where cardinality is surprisingly low.
  • Scale imbalances between features that require normalization.

Real-World Examples

Before regression modeling: Profile your feature matrix to find missing values, confirm all inputs are numeric, and check for columns where min equals max (zero variance — no predictive information).
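The zero-variance check mentioned here is trivial to express (a hypothetical helper, shown for illustration):

```javascript
// A column where min equals max is constant: zero variance, no signal.
function isConstant(nums) {
  return Math.min(...nums) === Math.max(...nums);
}
```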

Data quality audit: A CSV exported from a legacy system may have inconsistent null representations ("N/A", "—", "?"). Add custom null tokens to catch all of them.

Quick summary report: Share the profiler output in a data team meeting to align on the dataset's structure before deeper analysis.

Frequently Asked Questions

What is a data profile?
A data profile is a statistical summary of a dataset's columns: their types, value counts, missing rates, and distributional statistics (min, max, mean, median, std for numeric columns; top values and cardinality for text columns). It is the recommended first step in any data analysis workflow.
What null tokens are supported by default?
The default null tokens are: empty string, "NA", "N/A", "null", "none", "undefined", and "NaN" (case-insensitive). You can add custom tokens in the null tokens field — useful for values like "?", "-", or "missing".
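A sketch of how such a check might work (the function name and token list layout are illustrative, not the tool's internals):

```javascript
// Missing-value check against a configurable, case-insensitive token set.
const DEFAULT_NULLS = ["", "na", "n/a", "null", "none", "undefined", "nan"];

function isMissing(value, customTokens = []) {
  const tokens = new Set([...DEFAULT_NULLS, ...customTokens.map(t => t.toLowerCase())]);
  return tokens.has(value.trim().toLowerCase());
}
```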
How does the tool detect numeric vs. text columns?
A column is classified as numeric if every non-null value in it parses as a JavaScript number. If even one non-null value cannot be parsed as a number, the column is classified as text.
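That rule can be sketched as follows. Note the empty-string filter: JavaScript's `Number("")` is `0`, so blanks must be excluded before parsing (the function name is illustrative):

```javascript
// Numeric only if every non-missing value parses as a finite number.
function detectType(values) {
  const nonMissing = values.filter(v => v.trim() !== "");
  if (nonMissing.length === 0) return "text"; // no evidence either way
  return nonMissing.every(v => Number.isFinite(Number(v.trim()))) ? "numeric" : "text";
}
```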
What is cardinality in the context of a text column?
Cardinality is the number of distinct values in a column. A "country" column with 50 unique values has cardinality 50. High-cardinality text columns (thousands of unique values) may be IDs rather than categories.
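In JavaScript, cardinality falls out of a `Set` directly; the ID heuristic below, including its threshold, is an illustrative assumption rather than the tool's rule:

```javascript
// Cardinality: count of distinct values.
function cardinality(values) {
  return new Set(values).size;
}

// Heuristic: a column that is nearly all-unique is likely an identifier.
function looksLikeId(values) {
  return cardinality(values) / values.length > 0.95; // threshold is illustrative
}
```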
Can I use this on large files?
The tool processes data in your browser. For files up to ~10MB (a few hundred thousand rows), it runs in seconds. For larger files, use the sample size limit to profile the first N rows as a representative sample.
Is my data uploaded anywhere?
No. All profiling runs entirely in your browser. Your CSV data is never sent to a server. This makes the tool safe for proprietary or sensitive datasets.
What does a high missing rate indicate?
More than 20–30% missing values in a column often means the column was optional in the data collection process or contains sparsely populated information. Consider: imputing the missing values, dropping the column, or using a model that handles missing data natively.
Why does mean ≠ median for numeric columns?
When mean and median diverge significantly, the distribution is skewed. A mean much higher than the median indicates a right-skewed distribution (a few very high values pulling the mean up). This is common in revenue, income, and count data.
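A small worked example of the effect, using made-up revenue figures:

```javascript
// Right-skew demo: one large value pulls the mean well above the median.
const revenue = [10, 12, 11, 9, 13, 500]; // one outlier
const mean = revenue.reduce((s, v) => s + v, 0) / revenue.length;
const sorted = [...revenue].sort((a, b) => a - b);
const median = (sorted[2] + sorted[3]) / 2; // even count: average middle two
console.log({ mean, median }); // mean = 92.5, median = 11.5
```

Without the outlier the mean would be 11, right next to the median; the single extreme value is enough to make them diverge by almost an order of magnitude.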