delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
See what the GitHub community is most excited about today.
Apache Spark - A unified analytics engine for large-scale data processing
♞ lichess.org: the forever free, adless and open source chess server ♞
Scala 2 compiler and standard library. For bugs, see scala/bug
CMAK is a tool for managing Apache Kafka clusters
Spark: The Definitive Guide's Code Repository
ZIO — A type-safe, composable library for async and concurrent programming in Scala
Scala language server with rich IDE features
A Scala API for Apache Beam and Google Cloud Dataflow.
The Scala 3 compiler, also known as Dotty.
Play Framework
Redshift data source for Apache Spark
Modern Load Testing as Code
Apache Spark Connector for SQL Server and Azure SQL
Simple and Distributed Machine Learning
Yet another JSON library for Scala
An SBT plugin for displaying a welcome message and commonly used tasks.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
Code formatter for Scala
The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
Open, Modular, Deep Learning Accelerator
Spark RAPIDS plugin - accelerate Apache Spark with GPUs