
    The viewparquet blog

    Deep dives into modern data engineering, GeoParquet, DuckDB, and columnar analytics.


    Why Parquet is the default format for fine-tuning and RAG datasets — and how to spot-check embedding columns, tokenized shards, and schema issues before a training run.

    May 15, 2026
    11 min read

    A practical walkthrough of ETL → feature store → eval sets for ML teams — with schema drift checks and in-browser spot-checks at every stage.

    April 28, 2026
    12 min read

    Discover how industry leaders leverage Apache Parquet to build scalable, cost-effective data platforms, with real-world examples and best practices for modern data engineering.

    February 5, 2024
    13 min read

    Transform complex OpenStreetMap PBF data into analysis-ready GeoParquet format, making the world's largest collaborative geospatial dataset accessible to standard data tools and workflows.

    February 1, 2024
    10 min read

    Explore how to build high-performance web maps by leveraging GeoParquet for backend processing and MapLibre GL for frontend visualization, creating responsive and scalable mapping applications.

    January 25, 2024
    11 min read

    Learn how DuckDB's in-process analytical engine paired with GeoParquet creates a powerful, serverless stack for high-performance geospatial analytics without traditional database overhead.

    January 20, 2024
    14 min read
    GeoParquet
    Apache Parquet
    Geospatial
    Featured

    GeoParquet 101: Unlocking Geospatial Big Data with Apache Parquet

    Learn how GeoParquet combines Apache Parquet's columnar efficiency with standardized geospatial metadata for storing large geospatial datasets efficiently.

    January 15, 2024
    12 min read