The viewparquet blog
Deep dives into modern data engineering, GeoParquet, DuckDB, and columnar analytics.
Parquet for AI: Inspecting Embeddings, Tokens, and LLM Training Data
Why Parquet is the default format for fine-tuning and RAG datasets — and how to spot-check embedding columns, tokenized shards, and schema issues before a training run.
The Modern AI Data Pipeline: From Ingestion to Fine-Tuning with Parquet
A practical walkthrough of ETL → feature store → eval sets for ML teams — with schema drift checks and in-browser spot-checks at every stage.
Building Scalable Data Pipelines with Parquet: Lessons from Industry Leaders
Discover how industry leaders leverage Apache Parquet to build scalable, cost-effective data platforms, with real-world examples and best practices for modern data engineering.
Modernizing OpenStreetMap Data Handling with GeoParquet
Transform complex OpenStreetMap PBF data into analysis-ready GeoParquet format, making the world's largest collaborative geospatial dataset accessible to standard data tools and workflows.
From GeoParquet to Web Maps: Visualizing Data with MapLibre GL
Explore how to build high-performance web maps by leveraging GeoParquet for backend processing and MapLibre GL for frontend visualization, creating responsive and scalable mapping applications.
Accelerating GIS Analytics with DuckDB and GeoParquet
Learn how DuckDB's in-process analytical engine paired with GeoParquet creates a powerful, serverless stack for high-performance geospatial analytics without traditional database overhead.
GeoParquet 101: Unlocking Geospatial Big Data with Apache Parquet
Discover how GeoParquet stores geospatial data by combining Apache Parquet's columnar efficiency with standardized geospatial metadata.