Trending repositories for language Scala
♞ lichess.org: the forever free, adless and open source chess server ♞
Apache Spark - A unified analytics engine for large-scale data processing
Declarative text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Scala 2 compiler and standard library. Scala 2 bugs at https://github.com/scala/bug; Scala 3 at https://github.com/scala/scala3
Mill is a fast JVM build tool that supports Java, Scala, Kotlin and many other languages. 3-6x faster than Maven or Gradle for common workflows, Mill aims to make your project’s build process performa...
Hybrid search engine, combining best features of text and semantic search worlds
A Git platform powered by Scala with easy installation, high extensibility & GitHub API compatibility
Open-source code analysis platform for C/C++/Java/Binary/JavaScript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
Declarative text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
An eDSL framework based on Scala and MLIR, focusing on hardware design.
Compare binary sizes of canonical Hello World in 18 different languages
Hybrid search engine, combining best features of text and semantic search worlds
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
A language with lexical effect handlers and lightweight effect polymorphism
Cortex: a Powerful Observable Analysis and Active Response Engine
♞ lichess.org: the forever free, adless and open source chess server ♞
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Compiler for the Vale programming language - http://vale.dev/
♞ lichess.org: the forever free, adless and open source chess server ♞
Apache Spark - A unified analytics engine for large-scale data processing
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
TheHive: a Scalable, Open Source and Free Security Incident Response Platform
Open-source code analysis platform for C/C++/Java/Binary/JavaScript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
Hybrid search engine, combining best features of text and semantic search worlds
Mill is a fast JVM build tool that supports Java, Scala, Kotlin and many other languages. 3-6x faster than Maven or Gradle for common workflows, Mill aims to make your project’s build process performa...
Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
Cortex: a Powerful Observable Analysis and Active Response Engine
Declarative text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
Scala 2 compiler and standard library. Scala 2 bugs at https://github.com/scala/bug; Scala 3 at https://github.com/scala/scala3
A platform to build and run apps that are elastic, agile, and resilient. SDK, libraries, and hosted environments.
[VLDB 2025] Source code for T-Assess: An Efficient Data Quality Assessment System Tailored for Trajectory Data
Declarative text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
Multi-agent reinforcement learning for satellite scheduling; a multi-agent reinforcement learning satellite scheduling experiment based on STK11
An eDSL framework based on Scala and MLIR, focusing on hardware design.
Spark Structured Streaming Kinesis Data Streams connector supports both GetRecords and SubscribeToShard (Enhanced Fan-Out, EFO)
Hybrid search engine, combining best features of text and semantic search worlds
PDF DataSource for Apache Spark; allows reading PDF files directly into a DataFrame and running OCR on them
Besom - a Pulumi SDK for Scala. Also, incidentally, a broom made of twigs tied round a stick. Brooms and besoms are used for protection, to ward off evil spirits, and cleansing of ritual spaces.
The VerCors verification toolset for verifying parallel and concurrent software
Apache Spark - A unified analytics engine for large-scale data processing
♞ lichess.org: the forever free, adless and open source chess server ♞
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
Open-source code analysis platform for C/C++/Java/Binary/JavaScript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
TheHive: a Scalable, Open Source and Free Security Incident Response Platform
Hybrid search engine, combining best features of text and semantic search worlds
Mill is a fast JVM build tool that supports Java, Scala, Kotlin and many other languages. 3-6x faster than Maven or Gradle for common workflows, Mill aims to make your project’s build process performa...
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more
[VLDB 2025] Source code for T-Assess: An Efficient Data Quality Assessment System Tailored for Trajectory Data
GiGL is an open-source library for training and inference of Graph Neural Networks at very large (billion) scale.
Declarative text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
Table utils, one-line CSV imports - a table is an Iterator (or iterable) of a `Named Tuple` or `Product`
Multi-agent reinforcement learning for satellite scheduling; a multi-agent reinforcement learning satellite scheduling experiment based on STK11
PDF DataSource for Apache Spark; allows reading PDF files directly into a DataFrame and running OCR on them
Hybrid search engine, combining best features of text and semantic search worlds
An eDSL framework based on Scala and MLIR, focusing on hardware design.
Spark Structured Streaming Kinesis Data Streams connector supports both GetRecords and SubscribeToShard (Enhanced Fan-Out, EFO)
Cats Actors framework for building reactive apps. Cats Actors uses a conceptual actor model as a higher-level abstraction for concurrency.
S3HyperSync is a high-performance, memory-efficient, and cost-effective tool for synchronizing files between S3-compatible storage services.
PDF DataSource for Apache Spark; allows reading PDF files directly into a DataFrame and running OCR on them
GiGL is an open-source library for training and inference of Graph Neural Networks at very large (billion) scale.
Multi-agent reinforcement learning for satellite scheduling; a multi-agent reinforcement learning satellite scheduling experiment based on STK11
An eDSL framework based on Scala and MLIR, focusing on hardware design.
Apache Spark - A unified analytics engine for large-scale data processing
♞ lichess.org: the forever free, adless and open source chess server ♞
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
Open-source code analysis platform for C/C++/Java/Binary/JavaScript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
Mill is a fast JVM build tool that supports Java, Scala, Kotlin and many other languages. 3-6x faster than Maven or Gradle for common workflows, Mill aims to make your project’s build process performa...
TheHive: a Scalable, Open Source and Free Security Incident Response Platform
Hybrid search engine, combining best features of text and semantic search worlds
An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Build highly concurrent, distributed, and resilient message-driven applications using Java/Scala
GPGPU processor supporting the RISC-V V extension, developed with Chisel HDL
S3HyperSync is a high-performance, memory-efficient, and cost-effective tool for synchronizing files between S3-compatible storage services.
Hybrid search engine, combining best features of text and semantic search worlds
GiGL is an open-source library for training and inference of Graph Neural Networks at very large (billion) scale.
Experimental Scala 3 library that automatically derives instances of the smithy4s abstractions from Scala constructs.
Reference applications for funding, operating, and incentivizing the use of a decentralized, public Canton synchronizer. Includes the Amulet reference application for creating native payment utilities...
The Lightning Catalog is an open-source data catalog designed for preparing data at any scale in ad-hoc analytics, data virtualization, data warehousing, lake houses, and ML projects.
Spark accelerator framework; it enables secondary indices for remote data stores.
A repository that implements Tywaves: enabling a type-based waveform debugging for Chisel and Tydi-Chisel. Mapping from Chisel level code to values dumped by simulators is now possible thanks to Tywav...
This repository goes over how to handle massive variety in data engineering