Question 1

How do I open a Parquet file without Python, Spark, or pandas?

Accepted Answer

Open the file in a Parquet viewer that runs in your browser. Drag a .parquet file into viewparquet and it loads in the page with DuckDB-WASM, with no Python, pandas, Spark, or other install required. You can browse rows, read the schema, and run SQL right away.

Command-line alternatives also exist (parquet-tools, the pq CLI, or the DuckDB CLI), but they require an install and a terminal. A browser viewer is the fastest path for a quick look at an unfamiliar file.

Question 2

How can I view a Parquet file online?

Accepted Answer

Use a browser-based Parquet viewer such as viewparquet. You drop the file into the page and it is parsed locally in your browser — the data is never uploaded to a server, so even online viewing stays private.

This is useful when you receive a .parquet file and just want to confirm its contents, column names, and row count before pulling it into a heavier tool.

Question 3

How do I open a very large Parquet file?

Accepted Answer

Use a tool that reads Parquet column-by-column and paginates instead of loading every row into memory. viewparquet streams results and pages through them, so file size is limited mainly by your browser's available memory rather than a fixed cap.

Because Parquet is columnar, you rarely need the whole file — query only the columns and rows you care about with SQL (for example `SELECT a, b FROM file LIMIT 1000`) to keep memory low on large datasets.

Question 4

How do I preview just the first few rows of a Parquet file?

Accepted Answer

Run a `LIMIT` query such as `SELECT * FROM read_parquet('file.parquet') LIMIT 10`. In viewparquet the grid already paginates, so opening a file shows the first page of rows immediately without scanning the whole dataset.

Question 5

How do I run SQL on a Parquet file?

Accepted Answer

Query it with DuckDB, which can read Parquet directly without a separate import step. In viewparquet the loaded file is exposed as a table, so you can write `SELECT … FROM <table> WHERE …` in the SQL editor and get results in the grid. Full DuckDB SQL is supported, including joins, aggregates, and window functions.

With the DuckDB syntax you can also reference a file by path, e.g. `SELECT * FROM read_parquet('data.parquet')`, and combine multiple files with globs like `read_parquet('data/*.parquet')`.

Question 6

Can I query a Parquet file without loading it into a database first?

Accepted Answer

Yes. Engines like DuckDB query Parquet in place, reading only the columns and row groups a query needs thanks to the file footer metadata. There is no ETL or table-creation step — you point a SQL query at the file and it scans on demand.

Question 7

How do I count the rows in a Parquet file quickly?

Accepted Answer

Run `SELECT COUNT(*) FROM read_parquet('file.parquet')`. Parquet stores the row count per row group in its footer, so a count returns almost instantly without scanning the actual data pages.

Question 8

How do I see the schema and column names of a Parquet file?

Accepted Answer

Use `DESCRIBE SELECT * FROM read_parquet('file.parquet')` in DuckDB, or open the file in viewparquet and check the schema panel. Parquet is self-describing — column names, types, and nullability are stored in the file footer, so the schema is available without scanning the data.

Command-line equivalents include `parquet-tools inspect file.parquet`, `pq schema file.parquet`, and PyArrow’s `pyarrow.parquet.read_schema('file.parquet')`.

Question 9

How do I read nested, list, or struct columns in Parquet?

Accepted Answer

Parquet supports nested types (structs, lists, and maps) natively, and DuckDB can query into them with dot and bracket notation — for example `SELECT col.field`, `col[1]`, or `UNNEST(list_col)`. In viewparquet nested columns render in the grid and can be expanded or flattened with SQL.

Use `UNNEST` to explode a list column into rows, and dotted paths to project a single field out of a struct so you can filter or aggregate on it.

Question 10

Why do Parquet timestamps or decimals look wrong in some tools?

Accepted Answer

Parquet stores logical types (timestamp, decimal, date) on top of physical types (INT64, BYTE_ARRAY). When a reader ignores the logical type it shows the raw physical value — for example a timestamp as a large integer. Use a reader that honors logical types, such as DuckDB or Arrow, to display the correct value.

Timestamp precision (milliseconds, microseconds, or nanoseconds) and the flag that says whether values are adjusted to UTC are also stored as logical-type annotations, which is why the same column can look different across pandas, Spark, and Athena.
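
To make the mismatch concrete, here is a minimal Python sketch of what a reader that honors logical types does with the raw physical value. The sample integers and helper names are hypothetical, and the sketch assumes the timestamp is UTC-adjusted; a real reader takes the unit and the decimal scale from the file footer rather than from arguments.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Hypothetical raw INT64 value, as shown by a reader that ignores logical types.
RAW_TIMESTAMP = 1_700_000_000_000_000

# Ticks per second for each Parquet timestamp unit.
TICKS_PER_SECOND = {"MILLIS": 1_000, "MICROS": 1_000_000, "NANOS": 1_000_000_000}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_timestamp(raw: int, unit: str) -> datetime:
    """Interpret a raw integer as a UTC timestamp in the given unit."""
    ticks_per_second = TICKS_PER_SECOND[unit]
    seconds, ticks = divmod(raw, ticks_per_second)
    # datetime stores microseconds, so nanosecond values are truncated here.
    micros = ticks * 1_000_000 // ticks_per_second
    return EPOCH + timedelta(seconds=seconds, microseconds=micros)


def decode_decimal(unscaled: int, scale: int) -> Decimal:
    """Apply a DECIMAL scale to the stored unscaled integer."""
    return Decimal(unscaled).scaleb(-scale)


# Same integer, two different unit annotations, two very different dates.
as_micros = decode_timestamp(RAW_TIMESTAMP, "MICROS")
as_nanos = decode_timestamp(RAW_TIMESTAMP, "NANOS")
print(as_micros.isoformat(), as_nanos.isoformat())

# A DECIMAL with scale 2 stores 1234.56 as the integer 123456.
print(decode_decimal(123456, 2))
```

Reading the value with the wrong unit lands in January 1970 instead of November 2023, which is the kind of discrepancy that shows up when two tools disagree about a column's annotation.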

Question 11

How do I inspect Parquet metadata like row groups and compression?

Accepted Answer

Parquet keeps file-level and row-group metadata in its footer: number of rows, number of row groups, per-column compression codec, encodings, and min/max statistics. Open the metadata panel in viewparquet, or use CLI tools — `parquet-tools inspect`, `pq inspect`, or `pyarrow.parquet.read_metadata` — to read it without scanning the data.

Row-group statistics (min, max, null count) are what let query engines skip data; inspecting them helps you understand why a query is fast or slow.
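
As a rough illustration of how min/max statistics allow skipping, the Python sketch below keeps only the row groups whose value range can overlap a filter. The `RowGroupStats` class, the function name, and the numbers are hypothetical; real engines read these statistics from the footer and apply more elaborate rules (null counts, other predicate types).

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Min/max statistics for one column in one row group (made-up values)."""
    index: int
    min_value: int
    max_value: int


def groups_to_read(stats, low, high):
    """Return indexes of row groups whose [min, max] overlaps [low, high]."""
    return [s.index for s in stats if s.max_value >= low and s.min_value <= high]


stats = [
    RowGroupStats(0, 1, 100),
    RowGroupStats(1, 101, 200),
    RowGroupStats(2, 201, 300),
]

# A filter such as WHERE id BETWEEN 150 AND 180 can only match row group 1.
print(groups_to_read(stats, 150, 180))
```

If the column is not sorted, every row group tends to span a wide range and few groups can be skipped, which is one reason the same filter can be fast on one file and slow on another.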

Question 12

My Parquet file won’t open — what are the common causes?

Accepted Answer

The most common causes are: the file is truncated or still being written (the footer at the end is missing), it is actually a folder of part-files rather than a single file, it uses a compression codec your reader lacks (e.g. LZ4, ZSTD, Brotli), or it is not really Parquet. Confirm the file is complete and try a reader with broad codec support like DuckDB or Arrow.

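A Parquet file starts and ends with the 4-byte magic number "PAR1". The footer metadata sits at the end of the file, followed by its 4-byte length and the closing magic; if a write was interrupted, that tail is missing and readers report a corrupt or invalid file.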
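
A quick way to tell these cases apart is to check the file's framing by hand. The Python sketch below assumes the standard unencrypted layout ("PAR1" as the first and last four bytes, with a 4-byte little-endian footer length just before the closing magic). The function name and messages are illustrative, and it does not parse the footer itself.

```python
import struct
from pathlib import Path

MAGIC = b"PAR1"


def check_parquet_framing(path):
    """Return a short diagnosis based only on magic bytes and footer length."""
    size = Path(path).stat().st_size
    # Smallest possible file: leading magic + footer length + trailing magic.
    if size < 12:
        return "too small to be Parquet"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, 2)  # last 8 bytes: 4-byte footer length, then the magic
        tail = f.read(8)
    if head != MAGIC:
        return "missing leading magic: probably not Parquet"
    if tail[4:] != MAGIC:
        return "missing trailing magic: truncated or still being written"
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        return "footer length exceeds file size: corrupt"
    return "framing looks valid"
```

Calling `check_parquet_framing("file.parquet")` on a file that passes this check but still fails to open points toward the other causes, such as an unsupported compression codec.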

Spark and Hive often write a directory such as `data.parquet/` containing many `part-*.parquet` files. Point your tool at an individual part file, or use a glob like `data.parquet/*.parquet` to read them all.
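
If you are not sure whether a path is a single file or a directory of part files, a small helper can resolve it before you hand it to a reader. This is a minimal Python sketch; the function name is hypothetical and it only looks one level deep, so partitioned layouts with nested subdirectories would need a recursive glob.

```python
from pathlib import Path


def parquet_inputs(path):
    """Return the Parquet files to read for a path that is a file or a directory."""
    p = Path(path)
    if p.is_dir():
        # Directory output: keep the .parquet part files and skip
        # marker files such as _SUCCESS.
        return sorted(f for f in p.glob("*.parquet") if f.is_file())
    return [p]
```

For a Spark-style output directory this returns the sorted part files; for an ordinary file it returns the path unchanged in a one-element list.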

Question 13

What is a Parquet row group and what size should it be?

Accepted Answer

A row group is a horizontal slice of the table stored together, and it is the unit query engines read and skip. A common target is 128 MB–512 MB (or roughly 100k–1M rows) per row group, balancing read parallelism against the per-row-group metadata overhead of having too many tiny groups.

Too many small files or tiny row groups (the "small files problem") hurt performance because engines spend more time on metadata than data. Compacting them into larger files helps.
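
The size target above translates into a simple back-of-the-envelope calculation. The Python sketch below is only an estimate under stated assumptions: the function name, the 256 MB default (picked from inside the range above), and the example dataset are all hypothetical, and real writers size row groups from encoded data, not from the totals used here.

```python
import math


def row_group_plan(total_bytes, total_rows, target_bytes=256 * 1024**2):
    """Estimate row-group count and rows per group for a target group size."""
    groups = max(1, math.ceil(total_bytes / target_bytes))
    rows_per_group = math.ceil(total_rows / groups)
    return groups, rows_per_group


# Hypothetical dataset: 10 GiB of data holding 20 million rows.
groups, rows_per_group = row_group_plan(10 * 1024**3, 20_000_000)
print(groups, rows_per_group)
```

If the estimate gives thousands of groups with only a few thousand rows each, the row groups are likely too small and compaction is worth considering.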

Question 14

How do I convert a CSV (or JSON) file to Parquet?

Accepted Answer

With DuckDB it is a single statement: `COPY (SELECT * FROM read_csv_auto('data.csv')) TO 'data.parquet' (FORMAT PARQUET)`. In viewparquet you can load a CSV, TSV, JSON, or JSON Lines file and export the result as Parquet directly from the browser.

CLI tools such as `pq convert data.csv -o data.parquet` and `parquet-tools import` do the same conversion from a terminal.

Question 15

How do I export SQL query results as a Parquet or CSV file?

Accepted Answer

Run your query, then export the result set. In viewparquet the results grid can be exported to Parquet or CSV. With the DuckDB CLI, wrap the query in `COPY (…) TO 'out.parquet' (FORMAT PARQUET)` or `(FORMAT CSV, HEADER)`.

Question 16

Parquet vs CSV: when should I use which?

Accepted Answer

Use Parquet for analytics and storage of large or wide datasets, and CSV for small, human-readable, interchange data. Parquet is columnar, compressed, and self-describing (it keeps types and statistics), so it is far smaller and faster to query; CSV is plain text with no types and must be fully scanned.
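
The "no types" point is easy to see with a few lines of Python. In this small sketch (the two-row CSV is made up), every field read from CSV is a string, so a numeric comparison goes wrong until the reader casts the column itself. A Parquet file carries the column type, so that step is unnecessary.

```python
import csv
import io

# Hypothetical two-row CSV export.
raw = "id,amount\n1,9\n2,10\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Every field is a string, so this comparison is lexicographic: "9" > "10".
as_text = max(rows, key=lambda r: r["amount"])["amount"]

# Casting restores the intended numeric order.
as_number = max(rows, key=lambda r: int(r["amount"]))["amount"]

print(as_text, as_number)
```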

A practical workflow: keep raw exports as CSV/JSON, convert to Parquet for repeated querying, and inspect either format the same way in a viewer before trusting it.

Question 17

How do I read a Parquet file from S3 or cloud storage?

Accepted Answer

In viewparquet, click Open from S3 / URL in the viewer. Paste a public HTTPS or presigned URL, a public s3:// path, or connect a private bucket with in-browser access keys (AWS S3, Cloudflare R2, GCS HMAC, or MinIO). DuckDB streams byte ranges directly from the bucket — credentials stay in your browser.

For CLI workflows, DuckDB's `httpfs` extension also supports `read_parquet('s3://bucket/key.parquet')` after configuring secrets in a local session.

Question 18

How can I analyze a sensitive Parquet file without uploading it anywhere?

Accepted Answer

Use a viewer that processes the file locally in your browser. viewparquet runs entirely client-side with DuckDB-WASM, so opening, querying, and exporting all happen on your device and the data never leaves the browser. You can confirm this in the Network panel of your browser's developer tools, which shows no uploads of the file.

Question 19

How do I spot-check a Parquet training dataset before a run?

Accepted Answer

Open the shard in a viewer and check the things that break training: schema and column types, null counts, row count, and the shape of embedding or token columns. viewparquet lets you eyeball rows and run SQL (distincts, null checks, length of array columns) on each Parquet shard locally before kicking off a fine-tune or eval.
