The viewparquet blog

Deep dives into modern data engineering, GeoParquet, DuckDB, and columnar analytics.

Parquet

Privacy

Featured

How to Open Parquet from S3 Privately in Your Browser

Query Parquet files on AWS S3, Cloudflare R2, GCS, or MinIO without uploading them to a third-party viewer. viewparquet streams data straight from the bucket with DuckDB-WASM, and your credentials stay in the browser.

June 24, 2026

9 min read

Parquet

Datasets

Open Data

Featured

Free Public Parquet Datasets You Can Download Right Now (Every Link Verified)

A curated list of 11 public Parquet files you can actually download: flights, NYC taxi trips, Hugging Face NLP sets, and GeoParquet samples. Every link was tested and validated with DuckDB, and each entry lists its size, row count, schema, and example SQL.

June 12, 2026

10 min read

Parquet

DuckDB

Performance

Featured

Open Big Parquet Files Without a Worry: Why Your Browser Can Now Handle Gigabytes

Large Parquet files used to crash in-browser viewers with out-of-memory errors. Here's the engineering change that lets viewparquet open multi-gigabyte files instantly (zero-copy streaming instead of materializing whole tables in memory) and what it means for your workflow.

June 12, 2026

8 min read

LLM

Parquet

Featured

Parquet for AI: Inspecting Embeddings, Tokens, and LLM Training Data

Why Parquet is the default format for fine-tuning and RAG datasets — and how to spot-check embedding columns, tokenized shards, and schema issues before a training run.

May 15, 2026

11 min read

Data Pipeline

Parquet

Featured

The Modern AI Data Pipeline: From Ingestion to Fine-Tuning with Parquet

A practical walkthrough of ETL → feature store → eval sets for ML teams — with schema drift checks and in-browser spot-checks at every stage.

April 28, 2026

12 min read

Apache Parquet

Data Engineering

Scalability

Building Scalable Data Pipelines with Parquet: Lessons from Industry Leaders

Discover how industry leaders leverage Apache Parquet to build scalable, cost-effective data platforms, with real-world examples and best practices for modern data engineering.

February 5, 2024

13 min read

OpenStreetMap

OSM

Data Pipeline

Modernizing OpenStreetMap Data Handling with GeoParquet

Transform complex OpenStreetMap PBF data into analysis-ready GeoParquet format, making the world's largest collaborative geospatial dataset accessible to standard data tools and workflows.

February 1, 2024

10 min read

MapLibre GL

Web Maps

Visualization

From GeoParquet to Web Maps: Visualizing Data with MapLibre GL

Explore how to build high-performance web maps by using GeoParquet for backend processing and MapLibre GL for frontend visualization to create responsive, scalable mapping applications.

January 25, 2024

11 min read

DuckDB

GeoParquet

SQL

Featured

Accelerating GIS Analytics with DuckDB and GeoParquet

Learn how DuckDB's in-process analytical engine paired with GeoParquet creates a powerful, serverless stack for high-performance geospatial analytics without traditional database overhead.

January 20, 2024

14 min read

GeoParquet

Apache Parquet

Geospatial

Featured

GeoParquet 101: Unlocking Geospatial Big Data with Apache Parquet

Discover how GeoParquet transforms geospatial data storage by combining Apache Parquet's columnar efficiency with standardized geospatial metadata.

January 15, 2024

12 min read
