GitHub - apache/datafusion-java: Java bindings for Apache DataFusion
Java bindings for Apache DataFusion. Queries run in native Rust and results return to the JVM as Apache Arrow batches via the Arrow C Data Interface.
Early development: the API will change between releases. Bug reports and contributions welcome.
Install
Released to Maven Central. The JAR bundles the native library for Linux and macOS on x86_64 and aarch64. Windows users need to build from source.
Maven:
    <dependency>
      <groupId>org.apache.datafusion</groupId>
      <artifactId>datafusion-java</artifactId>
      <version>0.1.0</version>
    </dependency>
Gradle:
    implementation("org.apache.datafusion:datafusion-java:0.1.0")

Arrow needs --add-opens=java.base/java.nio=ALL-UNNAMED on the JVM command line. See the installation guide for details and for building from source.
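One way to confirm the flag took effect is to ask the JDK at startup whether `java.nio` is open to classpath code. The sketch below uses only the standard `java.lang.Module` API. The class name `AddOpensCheck` and its method are illustrative and not part of datafusion-java, and it assumes the application runs from the classpath (the unnamed module).

```java
import java.nio.ByteBuffer;

// Illustrative helper, not part of datafusion-java.
class AddOpensCheck {

    /**
     * Returns true if package java.nio in module java.base is open to the
     * unnamed module of the system class loader (i.e. to classpath code).
     * Assumption: the application is launched from the classpath.
     */
    public static boolean javaNioIsOpenToClasspath() {
        Module javaBase = ByteBuffer.class.getModule();
        Module unnamed = ClassLoader.getSystemClassLoader().getUnnamedModule();
        return javaBase.isOpen("java.nio", unnamed);
    }

    public static void main(String[] args) {
        if (javaNioIsOpenToClasspath()) {
            System.out.println("java.nio is open to classpath code");
        } else {
            System.out.println("java.nio is not open; start the JVM with "
                + "--add-opens=java.base/java.nio=ALL-UNNAMED");
        }
    }
}
```

Running this check before creating any allocator can turn a reflective-access failure deep inside Arrow into a clear startup message.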
Quickstart
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.ipc.ArrowReader;
    import org.apache.datafusion.DataFrame;
    import org.apache.datafusion.SessionContext;

    try (var allocator = new RootAllocator();
         var ctx = new SessionContext()) {
      ctx.registerParquet("orders", "/path/to/orders.parquet");

      try (DataFrame df = ctx.sql(
               "SELECT o_orderpriority, COUNT(*) AS n "
                   + "FROM orders GROUP BY o_orderpriority");
           ArrowReader reader = df.collect(allocator)) {
        while (reader.loadNextBatch()) {
          var batch = reader.getVectorSchemaRoot();
          // ...
        }
      }
    }
SessionContext and DataFrame are AutoCloseable and not thread-safe.
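Because these objects are AutoCloseable, the order in which try-with-resources releases them matters: Java closes resources in the reverse order of their declaration. The sketch below mirrors the nesting of the quickstart with a hypothetical stand-in class, `Tracked`, which is not a datafusion-java type. It only records close calls, so it runs without the library.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: Tracked is a hypothetical stand-in, not a datafusion-java type.
class CloseOrderDemo {

    /** Records its own close() call in a shared log. */
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    /** Mirrors the quickstart's nesting and returns the observed event order. */
    public static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Tracked allocator = new Tracked("allocator", log);
             Tracked ctx = new Tracked("ctx", log)) {
            try (Tracked df = new Tracked("df", log);
                 Tracked reader = new Tracked("reader", log)) {
                log.add("read batches");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Prints: read batches, close reader, close df, close ctx, close allocator
        run().forEach(System.out::println);
    }
}
```

With the quickstart's layout, the reader is released first and the allocator last, so the allocator outlives everything that was allocated from it.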
Documentation
The full documentation lives under docs/source/
and is built with Sphinx (see docs/README.md for the
build steps):
- User guide — installation, the DataFrame and SQL APIs, Parquet ingestion.
- Contributor guide — build, test, code style, and how to bump the DataFusion version.
Requirements
JDK 17+. Building from source: see
docs/source/contributor-guide/development.md.
Contributing
Open an issue to discuss non-trivial changes before sending a PR. See the contributor guide.
License
Apache License 2.0. See LICENSE.txt and NOTICE.txt.