DiffLine.rawContent() returns string instead of Buffer, causing non-UTF-8 encoding corruption
Description
Thanks to the nodegit maintainers for this excellent library!
This issue was debugged with the assistance of Cursor and Opus 4.5.
Current Behavior
DiffLine.rawContent() returns a JavaScript string, but the underlying libgit2 field git_diff_line.content is a raw byte pointer (const char *) that is not NUL-terminated and may hold content in a non-UTF-8 encoding (e.g., GBK, GB18030).
The current implementation in lib/diff_line.js:
var _rawContent = DiffLine.prototype.content; // Save original native method

DiffLine.prototype.content = function() {
  // ...
  this._cache.content = Buffer.from(this.rawContent())
    .slice(0, this.contentLen())
    .toString("utf8");
  return this._cache.content;
};

DiffLine.prototype.rawContent = function() {
  return _rawContent.call(this); // Calls native binding
};
The problem is that _rawContent (the native binding) already converts const char * to a JavaScript string, presumably using v8::String::NewFromUtf8() or similar, which assumes UTF-8 encoding.
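To illustrate why the Buffer.from(...) step cannot recover the original bytes, here is a minimal standalone sketch. It does not call nodegit; the toString("utf8") call is only a stand-in for the conversion the native binding is presumed to perform, and the sample bytes (GBK for "中文") are chosen for illustration.

```javascript
// Bytes for the two characters "中文" in GBK. This sequence is not valid UTF-8.
const originalBytes = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);

// Stand-in (assumption) for a binding that decodes const char * as UTF-8:
// every invalid byte sequence becomes U+FFFD, the replacement character.
const asString = originalBytes.toString("utf8");

// What lib/diff_line.js does next: re-encode the string as UTF-8.
const roundTripped = Buffer.from(asString);

console.log(asString.includes("\uFFFD"));        // true
console.log(roundTripped.equals(originalBytes)); // false
```

Once the replacement characters are in the string, the original byte values are gone, so no later re-encoding on the JavaScript side can restore them.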
Expected Behavior
rawContent() should return a Buffer containing the original bytes, allowing users to detect and decode the encoding themselves:
DiffLine.prototype.rawContent = function() {
  // Return Buffer instead of string
  return _rawContent.call(this); // Should return Buffer
};

DiffLine.prototype.content = function() {
  // ... existing implementation
  return this.rawContent()
    .slice(0, this.contentLen())
    .toString("utf8");
};
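If rawContent() returned a Buffer, caller-side decoding could look roughly like the sketch below. decodeLineBytes is a hypothetical helper, not part of nodegit. It assumes a Node.js build with full ICU, which TextDecoder needs for legacy encodings such as "gbk". Trying strict UTF-8 first is only a simple heuristic, not real encoding detection.

```javascript
// Hypothetical helper: decode raw diff-line bytes, trying strict UTF-8 first
// and falling back to a caller-supplied legacy encoding.
function decodeLineBytes(bytes, fallbackEncoding) {
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of
    // silently inserting U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (err) {
    // Not valid UTF-8: decode with the fallback (requires ICU support).
    return new TextDecoder(fallbackEncoding).decode(bytes);
  }
}

// GBK bytes for "中文"; with the proposed change these would come from
// line.rawContent().slice(0, line.contentLen()).
const gbkBytes = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);
console.log(decodeLineBytes(gbkBytes, "gbk"));
```

A real application would probably pick the fallback encoding from repository configuration or a detection library rather than hard-coding it.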