Add Brotli Compression to CoreFX
System.IO.Compression.Brotli
Introduction
Brotli is a general-purpose lossless compression algorithm that compresses data
using a combination of a modern variant of the LZ77 algorithm, Huffman coding,
and second-order context modeling, with a compression ratio comparable to the best
currently available general-purpose compression methods. Its speed is similar
to that of deflate, but it offers denser compression.
The specification of the Brotli Compressed Data Format is defined in RFC 7932.
Brotli encoding is supported by most web browsers, major web servers, and some CDNs (Content Delivery Networks).
BrotliStream
Proposed API
The API surface area for BrotliStream is identical to that of DeflateStream but with added bufferSize constructors.
    public partial class BrotliStream : System.IO.Stream
    {
        public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionLevel compressionLevel);
        public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionLevel compressionLevel, bool leaveOpen);
        public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionLevel compressionLevel, bool leaveOpen, int bufferSize);
        public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionMode mode);
        public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionMode mode, bool leaveOpen);
        public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionMode mode, bool leaveOpen, int bufferSize);
        public System.IO.Stream BaseStream { get; }
        public override bool CanRead { get; }
        public override bool CanSeek { get; }
        public override bool CanWrite { get; }
        public override long Length { get; }
        public override long Position { get; set; }
        protected override void Dispose(bool disposing);
        public override void Flush();
        public override IAsyncResult BeginRead(byte[] buffer, int offset, int count, AsyncCallback asyncCallback, object asyncState);
        public override int EndRead(IAsyncResult asyncResult);
        public override int Read(byte[] array, int offset, int count);
        public override System.Threading.Tasks.Task<int> ReadAsync(byte[] array, int offset, int count, System.Threading.CancellationToken cancellationToken);
        public override long Seek(long offset, System.IO.SeekOrigin origin);
        public override void SetLength(long value);
        public override IAsyncResult BeginWrite(byte[] array, int offset, int count, AsyncCallback asyncCallback, object asyncState);
        public override void EndWrite(IAsyncResult asyncResult);
        public override void Write(byte[] array, int offset, int count);
        public override System.Threading.Tasks.Task WriteAsync(byte[] array, int offset, int count, System.Threading.CancellationToken cancellationToken);
    }
Example Usage
BrotliStream behaves the same as DeflateStream and GZipStream, so existing DeflateStream/GZipStream code can be converted to BrotliStream with minimal changes.
    public static Stream Compress_Stream(Stream inputStream)
    {
        var outputStream = new MemoryStream();
        var compressor = new BrotliStream(outputStream, CompressionMode.Compress, true);
        inputStream.CopyTo(compressor);
        compressor.Dispose();
        return outputStream;
    }

    public static Stream Decompress_Stream(Stream inputStream)
    {
        var outputStream = new MemoryStream();
        var decompressor = new BrotliStream(inputStream, CompressionMode.Decompress, true);
        decompressor.CopyTo(outputStream);
        decompressor.Dispose();
        return outputStream;
    }
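A minimal round-trip sketch of the same pattern, working on byte arrays instead of returning streams. The class and method names (`BrotliStreamRoundTrip`, `Compress`, `Decompress`) are hypothetical; only the `BrotliStream(Stream, CompressionMode, bool)` and `BrotliStream(Stream, CompressionMode)` constructors from the proposed surface are used. Returning arrays sidesteps one detail of the helpers above: they hand back a MemoryStream positioned at its end, so a caller would have to rewind it before reading.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class BrotliStreamRoundTrip
{
    // Compresses a byte array into a new Brotli-encoded byte array.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps 'output' readable after the compressor is disposed.
            using (var compressor = new BrotliStream(output, CompressionMode.Compress, true))
            {
                compressor.Write(data, 0, data.Length);
            } // Disposing the compressor writes the end of the Brotli stream.

            return output.ToArray();
        }
    }

    // Decompresses a Brotli-encoded byte array back into the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var decompressor = new BrotliStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            decompressor.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Swapping `BrotliStream` for `DeflateStream` or `GZipStream` in this sketch should be the only change needed to target those formats, which is the conversion story the proposal is aiming for.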
BrotliEncoder & BrotliDecoder
Proposed API
The goal of the streamless implementation is to provide a non-allocating, performant Brotli implementation that does not depend on Streams. It contains simple Compress/Decompress operations that return an OperationStatus enum indicating the outcome of the operation, as well as static single-pass operations (CompressData/DecompressFully in the listing below; alternatives such as CompressFully are weighed under Naming) that compress or decompress in one call without the need for a BrotliEncoder/BrotliDecoder instance.
    public struct BrotliDecoder : System.IDisposable
    {
        public System.Buffers.OperationStatus Decompress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
        public static bool DecompressFully(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten) { bytesWritten = default(int); throw null; }
        public void Dispose() { }
    }

    public struct BrotliEncoder : System.IDisposable
    {
        public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
        public static bool CompressData(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten) { bytesWritten = default(int); throw null; }
        public static bool CompressData(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten, int quality, int window) { bytesWritten = default(int); throw null; }
        public System.Buffers.OperationStatus CompressFinal(System.Span<byte> destination, out int bytesWritten) { bytesWritten = default(int); throw null; }
        public void Dispose() { }
        public static int GetMaximumCompressedSize(int inputSize) { throw null; }
        public void SetQuality(int quality) { }
        public void SetWindow(int window) { }
    }
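To make the single-pass path concrete, here is a hedged sketch of sizing the destination from the worst-case bound and compressing in one call. Because the proposed names were still open, it is written against the members as they later shipped in `System.IO.Compression` (.NET Core 2.1 and newer): `BrotliEncoder.GetMaxCompressedLength`, `BrotliEncoder.TryCompress`, and `BrotliDecoder.TryDecompress`, which I am assuming correspond to the proposed `GetMaximumCompressedSize`, `CompressData`, and `DecompressFully`. The wrapper class and the quality/window values are illustrative only.

```csharp
using System;
using System.IO.Compression;

public static class BrotliSinglePass
{
    // Compresses 'source' in one call, using the worst-case bound to size the buffer.
    public static byte[] Compress(ReadOnlySpan<byte> source)
    {
        byte[] buffer = new byte[BrotliEncoder.GetMaxCompressedLength(source.Length)];

        // quality 4 and window 22 are arbitrary illustrative settings.
        if (!BrotliEncoder.TryCompress(source, buffer, out int written, 4, 22))
            throw new InvalidOperationException("Destination too small or compression failed.");

        return buffer.AsSpan(0, written).ToArray();
    }

    // Decompresses in one call; the caller must already know the original length.
    public static byte[] Decompress(ReadOnlySpan<byte> compressed, int originalLength)
    {
        byte[] buffer = new byte[originalLength];

        if (!BrotliDecoder.TryDecompress(compressed, buffer, out int written))
            throw new InvalidOperationException("Destination too small or input invalid.");

        return buffer.AsSpan(0, written).ToArray();
    }
}
```

The boolean return only says whether the whole operation fit; a caller that cannot bound the decompressed size up front would need the instance-based `Decompress` and its `OperationStatus` instead.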
Design Questions
Should we allow setting the Quality/Window via Set_ functions or make them constructor parameters? They must be set before encoding begins either way.
BrotliEncoder SetQuality/SetWindow vs constructor overloads:
    public struct BrotliEncoder : System.IDisposable
    {
        ...
        public void SetQuality(int quality) { }
        public void SetWindow(int window) { }
    }

    public struct BrotliEncoder : System.IDisposable
    {
        public BrotliEncoder() {}
        public BrotliEncoder(int quality, int window) {}
        ...
    }
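One way to weigh the two shapes is to look at where the "must be set before encoding" rule ends up being enforced. The sketch below uses two hypothetical stand-in types (`SetterStyleEncoder` and `ConstructorStyleEncoder`, neither part of the proposal) with a placeholder `Compress`. The validated ranges (quality 0 to 11, window 10 to 24) are the limits of the Brotli encoder itself.

```csharp
using System;

// Hypothetical stand-in: settings are mutable, so ordering must be checked at run time.
public struct SetterStyleEncoder
{
    private int _quality;
    private bool _started;

    public int Quality => _quality;

    public void SetQuality(int quality)
    {
        if (_started)
            throw new InvalidOperationException("Quality must be set before encoding starts.");
        if (quality < 0 || quality > 11)
            throw new ArgumentOutOfRangeException(nameof(quality));
        _quality = quality;
    }

    // Placeholder for real work; it only records that encoding has begun.
    public void Compress() { _started = true; }
}

// Hypothetical stand-in: settings are fixed at construction, so no ordering check is needed.
public readonly struct ConstructorStyleEncoder
{
    public int Quality { get; }
    public int Window { get; }

    public ConstructorStyleEncoder(int quality, int window)
    {
        if (quality < 0 || quality > 11)
            throw new ArgumentOutOfRangeException(nameof(quality));
        if (window < 10 || window > 24)
            throw new ArgumentOutOfRangeException(nameof(window));
        Quality = quality;
        Window = window;
    }
}
```

With setters, a late call is only caught when it happens; with constructor parameters the late call cannot be written at all, at the cost of an extra overload.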
Flush vs Finalize
Should there be an option for intermediate flushes or only for finalize? The main use case for an intermediate Flush is retrieving the output produced so far when you aren't yet done supplying input to the compressor.
    // Allow Intermediate Flushes
    public partial struct BrotliEncoder : System.IDisposable
    {
        ...
        public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
        public System.Buffers.OperationStatus CompressFinal(System.Span<byte> destination, out int bytesWritten, bool isFinished = true) { bytesWritten = default(int); throw null; }
        ...
    }

    // Disallow Intermediate Flushes
    public partial struct BrotliEncoder : System.IDisposable
    {
        ...
        public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
        public System.Buffers.OperationStatus CompressFinal(System.Span<byte> destination, out int bytesWritten) { bytesWritten = default(int); throw null; }
        ...
    }
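The intermediate-flush use case can be sketched as follows. Since the proposed method shapes were still under discussion, this uses the `BrotliEncoder` members as they later shipped in .NET Core 2.1 (`Compress(..., isFinalBlock)`, `Flush`, the `(quality, window)` constructor, and `GetMaxCompressedLength`) rather than the proposed `CompressFinal`. The class name, the quality/window values, and the buffer-sizing rule (sum of per-chunk bounds plus some slack for the flush) are my own assumptions.

```csharp
using System;
using System.Buffers;
using System.IO.Compression;
using System.Text;

public static class BrotliFlushSketch
{
    // Compresses two chunks, flushing after the first one.
    // 'bytesAfterFlush' reports how much output existed before 'second' was supplied.
    public static byte[] CompressWithFlush(byte[] first, byte[] second, out int bytesAfterFlush)
    {
        // Assumption: per-chunk worst-case bounds plus slack comfortably cover the flush overhead.
        byte[] output = new byte[BrotliEncoder.GetMaxCompressedLength(first.Length)
                                 + BrotliEncoder.GetMaxCompressedLength(second.Length) + 64];

        var encoder = new BrotliEncoder(5, 22); // illustrative quality and window
        try
        {
            int total = 0;

            // Feed the first chunk; the encoder may buffer it and emit nothing yet.
            Check(encoder.Compress(first, output, out _, out int written, isFinalBlock: false));
            total += written;

            // Intermediate flush: force out everything consumed so far.
            Check(encoder.Flush(output.AsSpan(total), out written));
            total += written;
            bytesAfterFlush = total;

            // Feed the last chunk and finish the stream.
            Check(encoder.Compress(second, output.AsSpan(total), out _, out written, isFinalBlock: true));
            total += written;

            return output.AsSpan(0, total).ToArray();
        }
        finally
        {
            encoder.Dispose();
        }
    }

    private static void Check(OperationStatus status)
    {
        if (status != OperationStatus.Done)
            throw new InvalidOperationException("Unexpected status: " + status);
    }
}
```

Without the flush, a consumer of `output` might see no bytes at all until the stream is finalized, which is the situation the "Allow Intermediate Flushes" option is meant to address.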
Allow input to Flush/Finalize?
I prefer the simpler Flush/Finalize that don't take input, but the underlying native call accepts input if we decide that's more usable. If we go that route, we could potentially condense the API down to a single function.
    // Do not allow input to Finalize/Flush
    public partial struct BrotliEncoder : System.IDisposable
    {
        ...
        public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
        public System.Buffers.OperationStatus CompressFinal(System.Span<byte> destination, out int bytesWritten) { bytesWritten = default(int); throw null; }
        ...
    }

    // Allow input to Finalize/Flush
    public partial struct BrotliEncoder : System.IDisposable
    {
        ...
        public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
        public System.Buffers.OperationStatus CompressFinal(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
        ...
    }

    // Allow finalization in the Compress method.
    public partial struct BrotliEncoder : System.IDisposable
    {
        ...
        public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten, bool isFinished = false) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
        ...
    }
Naming
    // Static single-pass compress/decompress
    BrotliEncoder.TryCompress(...) vs BrotliEncoder.TryCompressData(...) vs BrotliEncoder.CompressFully(...) vs BrotliEncoder.CompressSingle vs BrotliEncoder.CompressData

    // Iterative compress/decompress
    BrotliEncoderInstance.Compress vs BrotliEncoderInstance.CompressSegment
Example Usage
    public interface IOutput
    {
        Span<byte> Buffer { get; }
        void Commit(int bytes);
        void Resize(int minimumSize);
    }

    // This code is very naive, but it does illustrate a pipe scenario
    public static void Compress_WithState(ReadOnlyMemory<byte>[] inputs, IOutput output)
    {
        BrotliEncoder encoder = default;
        for (int i = 0; i < inputs.Length; i++)
        {
            var input = inputs[i].Span;
            while (!input.IsEmpty)
            {
                var buffer = output.Buffer;
                encoder.Compress(input, buffer, out int bytesConsumed, out int written);
                output.Commit(written);
                input = input.Slice(bytesConsumed);
            }
        }
        // Finalize the stream (isFinished is the optional parameter from the
        // "Allow Intermediate Flushes" variant above) and commit the trailing bytes.
        encoder.CompressFinal(output.Buffer, out int bytesWritten, isFinished: true);
        output.Commit(bytesWritten);
        encoder.Dispose();
    }

    public static void Decompress_WithState(ReadOnlyMemory<byte>[] inputs, IOutput output)
    {
        BrotliDecoder decoder = default;
        for (int i = 0; i < inputs.Length; i++)
        {
            var input = inputs[i].Span;
            // IsFinished() is not in the listing above; it stands for "the decoder has seen the end of the stream".
            while (!decoder.IsFinished() && !input.IsEmpty)
            {
                var buffer = output.Buffer;
                decoder.Decompress(input, buffer, out int bytesConsumed, out int written);
                output.Commit(written);
                input = input.Slice(bytesConsumed);
            }
        }
        decoder.Dispose();
    }

    // CompressFully is one of the candidate names for the static single-pass
    // method (CompressData in the listing above; see Naming).
    public static void Compress_WithoutState(ReadOnlySpan<byte> input, Span<byte> output)
    {
        BrotliEncoder.CompressFully(input, output, out int bytesWritten);
    }

    public static void Decompress_WithoutState(ReadOnlySpan<byte> input, Span<byte> output)
    {
        BrotliDecoder.DecompressFully(input, output, out int bytesWritten);
    }
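The stateful examples above ignore the returned `OperationStatus`. As a hedged illustration of how a caller might act on it, here is a decode loop that drains a fixed scratch buffer until the decoder reports `Done`. It is written against `BrotliDecoder.Decompress` as it later shipped in .NET Core 2.1, whose signature matches the proposed one; the class name, the 4096-byte scratch size, and the use of a `MemoryStream` as the sink are assumptions made for the sketch.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.Compression;

public static class BrotliDecodeLoop
{
    // Decompresses a complete Brotli payload without knowing the output size up front.
    public static byte[] DecompressAll(ReadOnlySpan<byte> compressed)
    {
        var decoder = new BrotliDecoder();
        byte[] scratch = new byte[4096];
        try
        {
            using (var result = new MemoryStream())
            {
                while (true)
                {
                    OperationStatus status = decoder.Decompress(
                        compressed, scratch, out int consumed, out int written);

                    // Keep whatever was produced and advance past the consumed input.
                    result.Write(scratch, 0, written);
                    compressed = compressed.Slice(consumed);

                    switch (status)
                    {
                        case OperationStatus.Done:
                            return result.ToArray();
                        case OperationStatus.DestinationTooSmall:
                            continue; // scratch has been drained; call again for more output
                        case OperationStatus.NeedMoreData:
                            throw new InvalidDataException("Compressed input is truncated.");
                        default:
                            throw new InvalidDataException("Compressed input is not valid Brotli data.");
                    }
                }
            }
        }
        finally
        {
            decoder.Dispose();
        }
    }
}
```

In a real pipe scenario `NeedMoreData` would mean "wait for the next input segment" rather than an error; it is treated as truncation here only because the sketch receives the whole payload at once.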
Implementation
The implementation will be based on the C code provided by Google, which will be added to our existing native compression libraries: clrcompression (Windows) and System.IO.Compression.Native (Unix). In CoreFX we'll have a managed wrapper that P/Invokes into the native Brotli implementation and provides the above API around it, the same as we do for zlib. See dotnet/corefxlab#1673 for a discussion of the pros and cons of a fully managed implementation and my justification for using the native approach (at least for now). Performance testing will come later, with the implementation PR.
This proposal is an evolution of the CoreFXLab implementation of Brotli.
This is a component of https://github.com/dotnet/corefx/issues/24826
PTAL: @joshfree @KrzysztofCwalina @GrabYourPitchforks @ViktorHofer @stephentoub @terrajobst @ahsonkhan @JeremyKuhne