Utility

CSV Summary Profiler

Instantly profile any CSV with column stats and missing value counts


The CSV Summary Profiler generates an instant data profile of any CSV dataset — right in your browser. It counts rows, detects column types, calculates summary statistics for numeric columns (mean, median, min, max, standard deviation), and reports missing value counts. This is exploratory data analysis (EDA) in seconds, with no Python, no R, and no upload required.

How to Use This Tool

  1. Paste your CSV — the first row must be a header row with column names.
  2. Null tokens — customize what counts as a missing value. Defaults (matched case-insensitively) are: blank, NA, N/A, null, none, undefined, and NaN. Add custom tokens separated by commas.
  3. Sample size — optionally limit profiling to the first N rows (useful for very large files). Set to 0 to profile all rows.
  4. Review the profile report: row/column counts, per-column type, count, missing count, and numeric statistics.
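The steps above can be sketched in a few lines of JavaScript. This is an illustrative outline, not the tool's actual code, and it assumes a simple CSV (comma-separated, no quoted fields, first row is the header):

```javascript
// Minimal profiling sketch (illustrative, not the tool's implementation).
const NULL_TOKENS = new Set(["", "na", "n/a", "null", "none", "undefined", "nan"]);

function profileCsv(text, sampleSize = 0) {
  const lines = text.trim().split(/\r?\n/);
  const header = lines[0].split(",");               // step 1: header row
  let rows = lines.slice(1).map(line => line.split(","));
  if (sampleSize > 0) rows = rows.slice(0, sampleSize); // step 3: first N rows
  return header.map((name, i) => {
    const raw = rows.map(r => (r[i] ?? "").trim());
    const missing = raw.filter(v => NULL_TOKENS.has(v.toLowerCase())).length;
    const values = raw.filter(v => !NULL_TOKENS.has(v.toLowerCase()));
    // step 2/4: a column is numeric only if every non-missing value parses
    const numeric = values.length > 0 && values.every(v => Number.isFinite(Number(v)));
    return { name, count: values.length, missing, type: numeric ? "numeric" : "text" };
  });
}
```

A real parser would also handle quoted fields and embedded commas; the function name and return shape here are assumptions for illustration.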

What the Profile Report Shows

For each column, the profiler reports:

  • Type — numeric (all values parse as numbers) or text.
  • Count — number of non-missing values.
  • Missing — count of missing values (blank or null tokens).
  • For numeric columns: min, max, mean, median, and standard deviation.
  • For text columns: unique value count and the most frequent value.
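For the numeric columns, the reported statistics can be computed as below. This is a sketch under one assumption worth flagging: it uses the population standard deviation (dividing by n); a sample standard deviation would divide by n − 1 instead, and the source does not say which the tool uses:

```javascript
// Summary statistics for one numeric column (illustrative sketch).
function numericStats(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((s, v) => s + v, 0) / n;
  const mid = Math.floor(n / 2);
  // Median: middle value, or the average of the two middle values.
  const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Population standard deviation (divides by n).
  const std = Math.sqrt(sorted.reduce((s, v) => s + (v - mean) ** 2, 0) / n);
  return { min: sorted[0], max: sorted[n - 1], mean, median, std };
}
```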

Why Profile Your Data First?

Before any analysis or model training, profiling reveals:

  • Columns with high missing rates that need imputation or removal.
  • Unexpected data types (e.g., a numeric column that looks like text due to stray characters).
  • Outliers suggested by extreme min/max values.
  • Duplicate-heavy columns where cardinality is surprisingly low.
  • Scale imbalances between features that require normalization.

Real-World Examples

Before regression modeling: Profile your feature matrix to find missing values, confirm all inputs are numeric, and check for columns where min equals max (zero variance — no predictive information).
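The zero-variance check mentioned here is trivial to express (a hypothetical helper, shown for illustration):

```javascript
// A column where min equals max is constant: zero variance, no signal.
function isConstant(nums) {
  return Math.min(...nums) === Math.max(...nums);
}
```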

Data quality audit: A CSV exported from a legacy system may have inconsistent null representations ("N/A", "—", "?"). Add custom null tokens to catch all of them.

Quick summary report: Share the profiler output in a data team meeting to align on the dataset's structure before deeper analysis.

Frequently Asked Questions

What is a data profile?
A data profile is a statistical summary of a dataset's columns: their types, value counts, missing rates, and distributional statistics (min, max, mean, median, std for numeric columns; top values and cardinality for text columns). It is the recommended first step in any data analysis workflow.
What null tokens are supported by default?
The default null tokens are: empty string, "NA", "N/A", "null", "none", "undefined", and "NaN" (case-insensitive). You can add custom tokens in the null tokens field — useful for values like "?", "-", or "missing".
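A sketch of how such a check might work (the function name and token list layout are illustrative, not the tool's internals):

```javascript
// Missing-value check against a configurable, case-insensitive token set.
const DEFAULT_NULLS = ["", "na", "n/a", "null", "none", "undefined", "nan"];

function isMissing(value, customTokens = []) {
  const tokens = new Set([...DEFAULT_NULLS, ...customTokens.map(t => t.toLowerCase())]);
  return tokens.has(value.trim().toLowerCase());
}
```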
How does the tool detect numeric vs. text columns?
A column is classified as numeric if every non-null value in it parses as a JavaScript number. If even one non-null value cannot be parsed as a number, the column is classified as text.
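That rule can be sketched as follows. Note the empty-string filter: JavaScript's `Number("")` is `0`, so blanks must be excluded before parsing (the function name is illustrative):

```javascript
// Numeric only if every non-missing value parses as a finite number.
function detectType(values) {
  const nonMissing = values.filter(v => v.trim() !== "");
  if (nonMissing.length === 0) return "text"; // no evidence either way
  return nonMissing.every(v => Number.isFinite(Number(v.trim()))) ? "numeric" : "text";
}
```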
What is cardinality in the context of a text column?
Cardinality is the number of distinct values in a column. A "country" column with 50 unique values has cardinality 50. High-cardinality text columns (thousands of unique values) may be IDs rather than categories.
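In JavaScript, cardinality falls out of a `Set` directly; the ID heuristic below, including its threshold, is an illustrative assumption rather than the tool's rule:

```javascript
// Cardinality: count of distinct values.
function cardinality(values) {
  return new Set(values).size;
}

// Heuristic: a column that is nearly all-unique is likely an identifier.
function looksLikeId(values) {
  return cardinality(values) / values.length > 0.95; // threshold is illustrative
}
```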
Can I use this on large files?
The tool processes data in your browser. For files up to ~10MB (a few hundred thousand rows), it runs in seconds. For larger files, use the sample size limit to profile the first N rows as a representative sample.
Is my data uploaded anywhere?
No. All profiling runs entirely in your browser. Your CSV data is never sent to a server. This makes the tool safe for proprietary or sensitive datasets.
What does a high missing rate indicate?
More than 20–30% missing values in a column often means the column was optional in the data collection process or contains sparsely populated information. Consider: imputing the missing values, dropping the column, or using a model that handles missing data natively.
Why does mean ≠ median for numeric columns?
When mean and median diverge significantly, the distribution is skewed. A mean much higher than the median indicates a right-skewed distribution (a few very high values pulling the mean up). This is common in revenue, income, and count data.
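A small worked example of the effect, using made-up revenue figures:

```javascript
// Right-skew demo: one large value pulls the mean well above the median.
const revenue = [10, 12, 11, 9, 13, 500]; // one outlier
const mean = revenue.reduce((s, v) => s + v, 0) / revenue.length;
const sorted = [...revenue].sort((a, b) => a - b);
const median = (sorted[2] + sorted[3]) / 2; // even count: average middle two
console.log({ mean, median }); // mean = 92.5, median = 11.5
```

Without the outlier the mean would be 11, right next to the median; the single extreme value is enough to make them diverge by almost an order of magnitude.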