Roadmap — CSharpDB

Planned direction for CSharpDB, organized by timeframe and priority. Reflects the state of the project as of v3.8.0.

Need the full source guide? The original long-form markdown version is preserved as Roadmap Source Reference.

Near-Term (Completed)

Recently completed improvements to query performance, storage behavior, provider/tooling compatibility, maintenance workflows, and developer ergonomics.

No-reflection, trim-safe typed collection API via CSharpDB.Generators with GetGeneratedCollectionAsync<T>(), GeneratedCollection<T>, generated field metadata, binary direct payloads for supported shapes, and NativeAOT-friendly model registration.

Separated collection write probes from the read-side B-tree routing cache, reused traversal scratch state during insert/replace, and buffered catalog mutation bookkeeping inside explicit transactions.

Restored the covered composite-index lookup optimization for queries that can be answered entirely from the index without touching the base table.

Configurable durable commit batch window to coalesce WAL fsync calls across concurrent transactions for higher write throughput.
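The commit batch window above is a form of group commit. The following Python sketch models only the batching decision. It assumes a simple rule: a batch stays open for a fixed window after its first commit arrives. The `plan_group_commits` function is hypothetical and is not CSharpDB's API or its actual scheduling policy.

```python
from typing import List


def plan_group_commits(arrivals_ms: List[float], window_ms: float) -> List[List[float]]:
    """Group commit arrival times (ms) into batches that share one WAL fsync.

    Hypothetical model: a batch opens when a commit arrives with no batch open,
    and any commit arriving within window_ms of that first commit joins it.
    """
    batches: List[List[float]] = []
    for t in sorted(arrivals_ms):
        if batches and t <= batches[-1][0] + window_ms:
            batches[-1].append(t)  # rides along on the pending flush
        else:
            batches.append([t])  # starts a new batch, i.e. one more fsync
    return batches


if __name__ == "__main__":
    arrivals = [0.0, 0.2, 0.5, 3.0, 3.1, 9.0]
    for window in (0.0, 1.0):
        batches = plan_group_commits(arrivals, window)
        print(f"window={window} ms -> {len(batches)} fsyncs for {len(arrivals)} commits")
```

A wider window means fewer fsyncs. The cost is that an individual commit may wait up to the window length before it becomes durable, which is the latency-for-throughput trade the option makes.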

Deduplicate SELECT output with DISTINCT. Multi-column indexes for broader query coverage.

Use indexes for <, >, <=, >=, and BETWEEN predicates, not just equality lookups; ordered range-scan pushdown currently covers single-column INTEGER indexes.
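The range-lookup item above replaces a full scan with a seek into an ordered index. The following Python sketch shows the idea on a sorted in-memory key array instead of a B+tree. `range_scan` and its parameters are hypothetical names; the comments show how each SQL operator maps to a bound.

```python
import bisect
from typing import List, Optional


def range_scan(
    keys: List[int],
    rowids: List[int],
    lo: Optional[int] = None,
    hi: Optional[int] = None,
    lo_inclusive: bool = True,
    hi_inclusive: bool = True,
) -> List[int]:
    """Return row ids whose index key lies between lo and hi (either bound optional).

    keys must be sorted ascending; rowids[i] is the row that owns keys[i].
    x BETWEEN a AND b -> lo=a, hi=b (both inclusive)
    x > a             -> lo=a, lo_inclusive=False
    x <= b            -> hi=b
    """
    if lo is None:
        start = 0
    elif lo_inclusive:
        start = bisect.bisect_left(keys, lo)   # first key >= lo
    else:
        start = bisect.bisect_right(keys, lo)  # first key > lo
    if hi is None:
        end = len(keys)
    elif hi_inclusive:
        end = bisect.bisect_right(keys, hi)    # one past the last key <= hi
    else:
        end = bisect.bisect_left(keys, hi)     # one past the last key < hi
    return rowids[start:end]
```

Both seeks are logarithmic in the number of keys. After them, the scan touches only the matching slice rather than every row.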

Cache parsed ASTs and query plans to avoid re-parsing identical SQL statements.
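The caching item above can be illustrated with a bounded LRU cache keyed by the exact SQL text. This Python sketch rests on several assumptions: the `PlanCache` class, its capacity, and its eviction policy are all hypothetical. A real plan cache would typically also need to invalidate entries after schema changes.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

Plan = TypeVar("Plan")


class PlanCache(Generic[Plan]):
    """Bounded LRU cache mapping exact SQL text to a compiled plan."""

    def __init__(self, capacity: int, compile_sql: Callable[[str], Plan]) -> None:
        self._capacity = capacity
        self._compile_sql = compile_sql  # parse + plan: the expensive step
        self._entries: "OrderedDict[str, Plan]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, sql: str) -> Plan:
        if sql in self._entries:
            self._entries.move_to_end(sql)  # mark as most recently used
            self.hits += 1
            return self._entries[sql]
        self.misses += 1
        plan = self._compile_sql(sql)
        self._entries[sql] = plan
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return plan
```

Keying on the exact text keeps lookups cheap. The drawback is that statements differing only in whitespace or literal values miss the cache, which is one reason parameterized SQL tends to cache better.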

Open a database fully in memory, load from disk, and save committed snapshots back to disk.

Nested scalar, array-element, nested array-object, Guid, temporal, and ordered text path indexes.

Rebalance underflowed pages on delete, borrowing from or merging with sibling pages and collapsing interior nodes, to reclaim space.
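The delete-rebalancing item above rests on a borrow-or-merge decision. The following Python sketch covers only that decision at the leaf level. It models leaves as sorted lists and uses a hypothetical `min_keys` occupancy threshold. This roadmap does not describe CSharpDB's actual page layout or fill rules.

```python
from typing import List, Optional, Tuple


def rebalance_leaf(
    underflowed: List[int], right_sibling: List[int], min_keys: int
) -> Tuple[List[int], Optional[List[int]]]:
    """Repair an underflowed leaf using its right sibling.

    Returns (left, right). If right is None the leaves were merged, and the
    parent must drop one separator key, which can underflow the parent in
    turn and propagate the rebalance upward.
    """
    if len(right_sibling) > min_keys:
        # Borrow: the sibling can spare its smallest key. The parent's
        # separator must then become the sibling's new first key.
        return underflowed + right_sibling[:1], right_sibling[1:]
    # Merge: the sibling is at minimum occupancy, so combine the two leaves.
    return underflowed + right_sibling, None
```

When merges propagate up to a root that is left with a single child, that child becomes the new root. This is the general B+tree way a tree loses a level.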

Maintenance report, REINDEX, VACUUM/compact, fragmentation analysis, and database size report.

CSharpDB.Daemon host with full gRPC coverage for SQL, schema, procedures, collections, and maintenance.

Incremental/sliced auto-checkpointing to move work off the triggering commit path.

Lazy-resident durable storage with on-demand page loading and a gRPC-tunable file cache.

ANALYZE command with persisted row counts, column NDV/min/max, and initial stats-guided index selection.

BackupAsync / RestoreAsync as first-class operations across direct, HTTP, gRPC, CLI, and Admin.

Native .csdbtable snapshots with fast Admin Import / Export, download or server-path destinations, CREATE EXTERNAL TABLE, sys.external_tables, read-only scans/joins, and embedded primary-key lookup indexes.

Validate/apply maintenance workflow that rewrites existing child tables with persisted FK metadata across direct, HTTP, gRPC, CLI, and Admin.

Visual banded-report designer with grouping, sorting, expressions, aggregate functions, page settings, and printable preview.

Mid-Term (In Progress)

SQL feature parity, provider/tooling compatibility, and ecosystem expansion.

Done for the trusted in-process model: host-registered C# scalar functions, common SQL/Admin built-ins, trusted commands, Admin Forms/Reports/pipeline hooks, declarative form action sequences, and local Admin Forms C# code modules. Untrusted sandboxed UDF execution is intentionally out of scope.

Opt-in writable external table registrations over mutable .csdbx files, backed by CSharpDB B+tree storage and limited to INSERT, UPDATE, and DELETE in v1 while .csdbtable archives remain read-only.

ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), LAG() for analytical queries.
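Window functions are still in progress. The following Python sketch only pins down the standard SQL semantics of the listed functions, computed over one partition that is already sorted by its ORDER BY key. The helper names are hypothetical and do not describe CSharpDB's planned implementation.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rank_columns(ordered: Sequence[T]) -> List[Tuple[int, int, int]]:
    """(ROW_NUMBER, RANK, DENSE_RANK) for each row of an ORDER BY-sorted partition."""
    result: List[Tuple[int, int, int]] = []
    rank = dense_rank = 0
    for row_number, value in enumerate(ordered, start=1):
        if row_number == 1 or value != ordered[row_number - 2]:
            rank = row_number  # RANK skips numbers after a tie
            dense_rank += 1    # DENSE_RANK never leaves gaps
        result.append((row_number, rank, dense_rank))
    return result


def lag(ordered: Sequence[T], offset: int = 1, default: Optional[T] = None) -> List[Optional[T]]:
    """LAG(value, offset, default): the value offset rows earlier, else default."""
    return [ordered[i - offset] if i - offset >= 0 else default for i in range(len(ordered))]


def lead(ordered: Sequence[T], offset: int = 1, default: Optional[T] = None) -> List[Optional[T]]:
    """LEAD(value, offset, default): the value offset rows later, else default."""
    return [ordered[i + offset] if i + offset < len(ordered) else default for i in range(len(ordered))]
```

For the ordered values 10, 20, 20, 30, the ranking columns come out as (1, 1, 1), (2, 2, 2), (3, 2, 2), (4, 4, 3). The tie shows how RANK jumps to 4 while DENSE_RANK continues at 3.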

Default expressions in column definitions and arbitrary expression-based constraints per column or table.

v1 support for single-column, column-level REFERENCES with optional ON DELETE CASCADE, plus metadata/tooling surfaces.

CSharpDB.Daemon now hosts the existing REST/HTTP /api surface and gRPC from one long-running process backed by the same warm daemon-hosted client. Standalone CSharpDB.Api remains supported for REST-only hosting.

Opt-in API-key mode protects REST /api/* and daemon gRPC calls with constant-time key comparison while keeping default no-auth behavior for compatibility.
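The API-key item above relies on constant-time key comparison. Here is a minimal Python sketch of that check using the standard library's `hmac.compare_digest`. Two parts are assumptions rather than CSharpDB's configuration model: the function name, and the convention that `None` means no key is configured.

```python
import hmac
from typing import Optional


def is_authorized(presented_key: Optional[str], configured_key: Optional[str]) -> bool:
    """Decide whether a request may proceed under an opt-in API-key mode."""
    if configured_key is None:
        return True  # no key configured: keep the default no-auth behavior
    if presented_key is None:
        return False  # key mode is on but the caller sent no key
    # compare_digest does not stop at the first mismatching byte, so response
    # timing does not reveal how much of a guessed key was correct.
    return hmac.compare_digest(presented_key.encode("utf-8"), configured_key.encode("utf-8"))
```

A plain `==` on strings can return as soon as the first character differs. That timing difference is the leak a constant-time comparison avoids.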

Authorization, protected admin endpoint scopes, JWT/RBAC options, and TLS/mTLS deployment helpers for remote HTTP and gRPC access.

CSharpDB.Daemon can be packaged as a persistent background service across systemd, Windows Service, and launchd.

Self-contained daemon archives and install scripts ship for Windows, Linux, and macOS; dotnet tool, Docker, Homebrew, and winget distribution remain future work.

DbConnection.GetSchema() now exposes standard metadata collections for tooling and ORM schema discovery.

BINARY, NOCASE, NOCASE_AI, and ICU:<locale> collation now work across SQL and collection indexes; dedicated ordered SQL text index optimization remains future work.
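The collation item above can be approximated with sort keys. This hypothetical Python sketch approximates the named collations in three ways:

- `BINARY` as ordinal code-point order.
- `NOCASE` as Unicode case folding.
- `NOCASE_AI` as case folding plus removal of combining accents after NFD decomposition.

CSharpDB's exact casing and accent rules may differ. `ICU:<locale>` is omitted because it needs an ICU binding.

```python
import unicodedata


def collation_key(text: str, collation: str) -> str:
    """Return a string whose ordinal order approximates the named collation."""
    if collation == "BINARY":
        return text  # ordinal: code-point order, case- and accent-sensitive
    if collation == "NOCASE":
        return text.casefold()  # e.g. "Straße" and "STRASSE" compare equal
    if collation == "NOCASE_AI":
        decomposed = unicodedata.normalize("NFD", text.casefold())
        # drop combining marks such as U+0301 so "é" collates like "e"
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    raise ValueError(f"collation not covered by this sketch: {collation}")
```

An index can store these keys, or compare through them, so that equality and ordering follow the chosen collation instead of raw ordinal text.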

Scalar subqueries, IN/EXISTS subqueries (including correlated forms), and UNION, INTERSECT, and EXCEPT across SELECT results.
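Set operations remove duplicate rows, whereas UNION ALL (listed as not yet implemented in Current Limitations) keeps every row. The following Python sketch shows those result semantics over rows modeled as tuples. The function names are hypothetical. SQL does not guarantee output order without ORDER BY; the sketch keeps first-seen order only for readability.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Row = Tuple[Hashable, ...]


def _distinct(rows: Iterable[Row]) -> List[Row]:
    """Remove duplicate rows, keeping the first occurrence of each."""
    seen: Set[Row] = set()
    out: List[Row] = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def union(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    return _distinct([*left, *right])  # UNION removes duplicates


def union_all(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    return [*left, *right]  # UNION ALL keeps every row, duplicates included


def intersect(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    right_rows = set(right)
    return _distinct(row for row in left if row in right_rows)


def except_(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    right_rows = set(right)
    return _distinct(row for row in left if row not in right_rows)
```

Python treats `None == None` as true here. That matches how SQL set operations treat NULLs as not distinct from each other when removing duplicates.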

Admin query builder with source canvas, join editing, design grid, SQL preview, and saved layouts.

Long-Term (Future)

Advanced features and fundamental architecture enhancements, including long-range items that have since shipped.

Inverted index support with tokenization, stemming, and relevance ranking.

Current phase is complete: opt-in generated models provide GetGeneratedCollectionAsync<T>, generated descriptors/index bindings, binary direct payloads for supported shapes, JSON fallback for unsupported shapes, and trim/NativeAOT smoke coverage.

Streamline NuGet/analyzer packaging, templates, onboarding docs, and project setup for the opt-in generated collection path.

Expand generator support beyond the current scalar, scalar collection, nested scalar, and nested collection-scalar shapes.

Internal row-batch transport serves as the batch-first SQL execution foundation across batch-capable result boundaries, scans, joins, and generic aggregates.

Follow writable .csdbx storage with broader external-table indexes, planner costing, and multi-column lookup/range support beyond the current archive primary-key point-lookup path.

Deep engine/page compression remains planned; application-level payload compression is available as a sample/SDK pattern without changing the storage format.

Encrypt database and WAL files with passphrase-based key management and explicit plaintext/encrypted migration/export paths; implementation must meet the database-encryption plan entry criteria before shipping.

Current phase is complete: ANALYZE-driven stats-guided costing uses internal histograms, heavy hitters, composite-prefix summaries, skew-aware estimates, correlation-aware filters/joins, non-unique lookup costing, hash build-side choice, and bounded DP join reordering.

Current phase is complete: opt-in adaptive join execution can switch eligible index nested-loop joins to hash joins and flip inner hash build sides at safe pre-emission boundaries.

Stable SQL-first diagnostics expose sys.planner_histograms, sys.planner_heavy_hitters, sys.planner_index_prefix_stats, and EXPLAIN ESTIMATE FOR <query>.

Current phase is complete: WAL frame-chunk writes, chunked checkpoint page copies, shared snapshot/export batching, reusable B-tree copy utilities, and the close-out audit cover the main storage and maintenance write paths.

Advisory planner-stat persistence can stay deferred without weakening committed-row durability, and sys.table_stats.row_count_is_exact makes exact versus estimated row-count semantics explicit.

Opt-in UseDurableCommitBatchWindow(...) batches durable WAL flushes across contending in-process transactions. It is an expert knob to enable only after measuring your workload, not default behavior.

Explicit WriteTransaction conflict-detected retry flow, shared auto-commit non-insert isolation, and opt-in ConcurrentWriteTransactions for shared implicit inserts.

Opt-in concurrent write transactions now reserve shared row-id ranges and rebase hot right-edge insert pages against pending WAL images for improved insert fan-in.

Route API/daemon requests across multiple warm CSharpDB database files so independent tenants or shard keys can use separate WAL and commit paths, with v1 focused on single-shard writes and point reads.
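The multi-database routing item above needs a stable mapping from a tenant or shard key to one database file. This Python sketch assumes hash-modulo placement. The file names and the `route_shard` helper are hypothetical, and the roadmap does not specify which placement scheme CSharpDB will use.

```python
import hashlib
from typing import Sequence


def route_shard(shard_key: str, database_files: Sequence[str]) -> str:
    """Map a tenant/shard key to one database file, stably across processes."""
    if not database_files:
        raise ValueError("at least one database file is required")
    # Python's built-in hash() is randomized per process for str, so use a
    # digest to get the same answer on every host and after every restart.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return database_files[int.from_bytes(digest[:8], "big") % len(database_files)]


if __name__ == "__main__":
    files = ["tenants-0.db", "tenants-1.db", "tenants-2.db"]  # hypothetical names
    for tenant in ("acme", "globex", "initech"):
        print(tenant, "->", route_shard(tenant, files))
```

Plain modulo placement remaps most keys whenever the number of files changes. Consistent hashing or a directory table avoids that, at the cost of more bookkeeping.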

Retained commit-log change feeds and reactive query subscriptions for read replicas, live Admin views, and event-driven applications.

Current Limitations

Known simplifications in the current implementation:

Area	Limitation
Functions and automation	CSharpDB's UDF/command model is trusted and in-process by design. Current supported surfaces include host-registered scalar functions, common built-ins, trusted commands, form/report/pipeline hooks, declarative action sequences, and local Admin Forms C# modules; untrusted sandboxed execution is intentionally out of scope
Query	Scalar/IN/EXISTS subqueries are supported, including correlated cases in WHERE, non-aggregate projection, and UPDATE/DELETE expressions; correlated subqueries are not yet supported in JOIN ON, GROUP BY, HAVING, ORDER BY, or aggregate projections
Query	UNION, INTERSECT, and EXCEPT are supported; UNION ALL is not implemented yet
Query	No window functions
Schema	No SQL DEFAULT column values or CHECK constraints yet. Foreign keys are currently v1 only: single-column, column-level REFERENCES with optional ON DELETE CASCADE; table-level/composite/deferred foreign keys and ON UPDATE actions are not implemented
Indexes	Equality lookups support current INTEGER/TEXT indexes, but ordered range-scan pushdown is still limited to single-column INTEGER index paths
RowId	Legacy table schemas without persisted high-water metadata may pay a one-time key scan on first insert
Collections	`FindByIndexAsync` supports declared field-equality lookups; `FindByPathAsync` and `FindByPathRangeAsync` support path-based queries on indexed paths; `FindAsync` remains a full scan for unindexed predicates. Generated collections require registered descriptors for existing collection indexes; unsupported generated model shapes warn and use the source-generated JSON fallback instead of binary direct payloads
External Tables	Native `.csdbtable` archives can be registered and queried as read-only external tables. Writable external tables are planned as an opt-in `.csdbx` format; current archives remain read-only, and broader external indexes, range seeks, and deeper planner costing remain planned
Networking	`CSharpDB.Daemon` now hosts both REST and gRPC from one process; named pipes remain reserved but are not implemented end to end today
Security	Remote REST and daemon gRPC support opt-in API-key authentication, defaulting to `None` for compatibility. JWT, RBAC, mTLS helpers, TLS-specific configuration, and at-rest encryption are not implemented
Admin Forms	The Forms designer/runtime supports the core generated-form and data-entry path plus trusted command-backed automation, including lifecycle events, command buttons, selected-control events, conditional UI rules, domain formula helpers, declarative action sequences, and local C# code modules. It still needs Access-parity work for responsive runtime rendering, complete inferred validation, richer form modes, additional events, advanced filtering/sorting, report/query/import/export actions, macro loops/on-error/temp vars, and broader controls
Admin Reports	The Reports designer/runtime supports the core banded preview path plus trusted command-backed preview lifecycle events, but still needs Access-parity work for bounded saved-query previews, full report output/export, parameters, richer grouping and totals semantics, conditional formatting, subreports, and broader controls
Text / Multilingual	Text is stored as UTF-8 and supports all Unicode languages; default semantics remain ordinal, but opt-in `BINARY`, `NOCASE`, `NOCASE_AI`, and `ICU:<locale>` collation are implemented for SQL and collection indexes. Dedicated ordered SQL text index optimization remains planned
Concurrency	The physical WAL commit path is still serialized at the storage boundary. Initial multi-writer support has shipped, but observed gains depend on the conflict shape and on whether shared auto-commit INSERT is left on the default serialized path
Storage	No page-level compression; the compression SDK sample stores compressed payloads as ordinary application-managed `BLOB` values
Storage	No at-rest encryption for database/WAL files; on-disk storage is plaintext only
Storage	Memory-mapped reads are opt-in and currently apply only to clean main-file pages; WAL-backed reads still rely on the WAL/cache path
Storage	By default, durable auto-commit single-row writes still pay a physical WAL flush per commit; opt-in `UseDurableCommitBatchWindow(...)` can trade some commit latency for higher throughput
Query	Phase-2 cost-based planning is in place: `ANALYZE`, `sys.table_stats`, `sys.column_stats`, public planner-stat diagnostics, histogram/heavy-hitter/prefix estimates, and bounded small-chain join reordering now feed join/access-path costing. Opt-in adaptive join re-optimization can react to stale-stat or parameter-sensitive join cardinality misses, while broader runtime actuals, `EXPLAIN ANALYZE`, and full mid-plan reordering remain future work
Query	Internal row-batch transport is now the default scan-heavy execution foundation across batch-capable scans, joins, aggregates, and result boundaries; remaining work is broader kernel specialization and optional SIMD-style tuning rather than missing core batch coverage

Completed Milestones

Major features already implemented and shipped:

✓ Single-file database with 4 KB page-oriented storage

✓ B+tree-backed tables and secondary indexes

✓ Write-Ahead Log with crash recovery and auto-checkpoint

✓ Concurrent snapshot-isolated readers via WAL-based MVCC

✓ Full SQL pipeline: tokenizer, parser, planner, operators

✓ JOINs (INNER, LEFT, RIGHT, CROSS), aggregates, GROUP BY, HAVING, CTEs

✓ UNION, INTERSECT, EXCEPT set operations

✓ Scalar/IN/EXISTS subqueries (incl. correlated) in filters, projections, and UPDATE/DELETE

✓ Scalar TEXT(expr) for filter-friendly text coercion

✓ Composite (multi-column) indexes

✓ Ordered integer index range scans in the fast lookup path

✓ ANALYZE with persisted table/column stats and stale-aware refresh

✓ Phase-2 cost-based query planning: statistics-guided access paths, join method/reordering, histogram/cardinality estimation

✓ Public planner diagnostics with EXPLAIN ESTIMATE and sys.planner_* catalogs

✓ Opt-in adaptive join re-optimization for eligible stale-stat and parameter-sensitive joins

✓ SELECT DISTINCT and DISTINCT aggregates

✓ SQL statement and SELECT plan caching

✓ First-class IDENTITY / AUTOINCREMENT support for INTEGER PRIMARY KEY columns

✓ Persisted table NextRowId high-water mark with compatibility fallback

✓ Batch-first SQL row-batch execution across scans, joins, aggregates, and result boundaries

✓ Views and triggers (BEFORE/AFTER on INSERT/UPDATE/DELETE)

✓ Foreign key constraints: single-column REFERENCES with optional ON DELETE CASCADE

✓ Older-database foreign-key retrofit migration across direct, HTTP, gRPC, CLI, and Admin

✓ ADO.NET provider with connection pooling and GetSchema metadata collections

✓ In-memory database mode with explicit load/save APIs

✓ Shared/private in-memory ADO.NET connections with named shared-memory hosts

✓ Document Collection API with typed Put/Get/Delete/Scan/Find

✓ Collection secondary field indexes via EnsureIndexAsync / FindByIndexAsync

✓ Binary direct-payload collection storage with direct hydration and field/path extraction

✓ Collection path indexes: nested scalar, array-element, nested array-object, Guid, temporal, ordered text

✓ Collection path query APIs: FindByPathAsync and FindByPathRangeAsync

✓ Source-generated typed collection fast path with trim-safe NativeAOT-friendly access

✓ Full-text search with tokenization, stemming, and relevance ranking

✓ Hybrid storage mode with lazy-resident durable storage and a gRPC-tunable file cache

✓ Client-wide BackupAsync / RestoreAsync across direct, HTTP, gRPC, CLI, and Admin

✓ Native .csdbtable table archives with Admin Import / Export and read-only external table registration

✓ ReplaceAsync for index stores

✓ Maintenance report, REINDEX, and VACUUM flows across client, CLI, API, and Admin UI

✓ Dedicated gRPC daemon host

✓ Remote host consolidation in CSharpDB.Daemon, with REST /api and gRPC sharing one warm hosted database client

✓ Opt-in API-key protection for REST /api/* and daemon gRPC calls

✓ Daemon service packaging with self-contained archives and service install assets

✓ Storage tuning presets, bounded WAL read caching, memory-mapped reads, and sliced background checkpointing

✓ SQL executor/read-path fast paths for compact projections, broader join/index coverage, and correlated subquery filters

✓ REST API with 34+ endpoints and OpenAPI/Scalar documentation

✓ Blazor Server admin dashboard with Forms and Reports designers

✓ Trusted C# callbacks, commands, Admin automation hooks, and local Admin Forms C# code modules

✓ Interactive CLI with meta-commands and file execution

✓ Package-driven ETL pipelines with validation, dry-run, execute/resume, and Admin visual designer

✓ VS Code extension with schema explorer

✓ MCP server for AI assistant integration

✓ NativeAOT C library for cross-language FFI

✓ B+tree delete rebalancing with underflow handling

✓ Reusable snapshot reader sessions for higher concurrent-read throughput

✓ Comprehensive benchmark suite (micro, macro, stress, scaling, in-memory, shared-memory)

✓ Collection write-path performance recovery with separated read/write B-tree routing

✓ Covered composite-index fast-path optimization

✓ Durable-write commit batching for higher concurrent write throughput