Override bash executable, defaulting to Git for Windows git bash over WSL bash#1791
Override bash executable, defaulting to Git for Windows git bash over WSL bash#1791emanspeaks wants to merge 8 commits into
Conversation
Byron
left a comment
There was a problem hiding this comment.
Thanks a lot for tackling this and for explaining the reasoning so very well!
@EliahKagan already prepared me for this case and I hope he can take a look at this PR as well to see if it fits into the big picture.
Sorry, something went wrong.
I will try to look at this in detail today or tomorrow. |
Sorry, something went wrong.
There was a problem hiding this comment.
Thank you for bearing with my delay in reviewing this, and I apologize for any inconvenience. I strongly agree with the idea here, and I think this pull request will be suitable for merging, after some changes.
As you say, running hook scripts in WSL is not a good default, and it often fails or causes incorrect behavior. I think the problems with the way GitPython currently does it are severe enough to justify changing the default in a bugfix release. In my opinion, using WSL automatically even when a Git Bash shell is available is a bug, is usually not what users intend, and should be changed even though doing so may break some existing code that relies on the current behavior. (In #1745 and #1777 I had expressed hope that this could somehow be fixed in a way that does not break backward compatibility at all, for anyone, but I realize that is not realistic.) The ability to restore the old behavior by setting GIT_PYTHON_BASH_EXECUTABLE to bash.exe should be clearly documented.
The case for using the Git for Windows bash.exe when available is actually even stronger than it seems. Unlike on other systems, path search on Windows differs substantially depending on whether a shell is used. That is, running a program in cmd.exe resolves commands differently from a direct call to CreateProcessW. One effect is that Popen (and subprocess.run, etc.) on Windows without shell=True searches a few directories prior to those listed in the PATH environment variable. One such directory is System32, even if it is not in PATH or no matter what is listed before it. So even when a "native" bash.exe is in a PATH directory listed before System32, the WSL bash.exe in System32 is usually what Popen runs. (I say "usually" as there are other subtleties.) Furthermore, the WSL bash.exe often exists in System32 even if no user has intentionally set up WSL and even if no WSL distributions are installed that would let it succeed.
Using the Git for Windows bash.exe whenever it is available would solve those problems, as well as the ones you described, and the further problem that using the WSL bash can hurt performance as it may need to start a WSL system. This is also much closer to the behavior that users are likely to expect, because Git for Windows itself runs hooks with a #!/bin/sh or #!/bin/bash shebang (and some others) with its bash.exe.
I also agree with much of the strategy this takes. However, there are some changes that should be made, to avoid introducing new untrusted search path vulnerabilities, similar to CVE-2023-40590 or CVE-2024-22190, that result in arbitrary code execution; to cover more of the common ways Git for Windows may be installed and run, while also avoiding using a bash.exe associated with a separate git from the one GitPython will use; and to ensure no code that uses GitPython without using hooks can break as a result of changes made here. That is all covered among my comments below. There are also two considerations not covered in review comments:
A fix for CVE-2024-22190 (#1792) was recently merged. That was after this pull request was initially opened, so this will have to be updated to get that bugfix, and to help prevent that security bug from being inadvertently reintroduced. This is also needed to resolve conflicts, which fortunately are very minor, and to get the changes from #1803 without which subsequent CI runs would fail (#1802). Although the guidelines don't express a preference on how PRs should be updated, based on #1659 (comment) and how minor the conflicts are, I recommend rebasing (maybe @Byron can speak to this more definitively). Updating this PR won't automatically fix new untrusted search path vulnerabilities the current code in this PR would add, but I've left review comments on the specific affected code about how that can be done.
The tests should be modified to ensure these changes work as expected. I am unsure if any new tests are needed. The main thing I'd suggest doing is to remove all WSL-related xfail mark decorators from test functions in test/test_index.py, and remove the "Setup WSL" step from the pythonpackage.yml workflow. Currently, the WSL bash.exe is being used on CI, even with these changes. This is because, as written, the changes here make assumptions Git for Windows does not always satisfy, including as it is used on the Windows runners for GitHub Actions. As commented below, that can be fixed. I suggest removing marks and the "Setup WSL" step, so all hook tests fail on CI due to the WSL bash.exe having no distribution to run bash in (see 9717b8d), before fixing the bug in how the Git for Windows bash.exe is found, and verifying that this makes the tests pass.
Sorry, something went wrong.
|
Thanks for all the comments! Been busy with the day job the last week or so, but planning to try to address these things this weekend so we can get another round of reviews going and get it merged. |
Sorry, something went wrong.
|
I believe I have addressed all the comments. I removed the xfails from the unit tests, and had assertion errors in |
Sorry, something went wrong.
There was a problem hiding this comment.
Thanks for the revisions! I agree that this addresses everything from the first review, and I think this is very close to being ready to merge.
I think the the only tricky remaining issue is what to do in test_uses_shell_not_from_cwd. I believe that one of the reasonable approaches is to leave it as it is, possibly removing the commented-out code altogether, and possibly adding a comment about future directions. But I'm also aware of two other reasonable approaches. I've left a review comment there that covers all three, and that also tries to answer the question of what that test is testing. Further information and context on what that test is checking for can be found in the "When hook scripts are run" subsection of CVE-2024-22190.
Aside from that, I recommend deleting the code that this PR currently comments out. You may have been intending to do that anyway, but I've posted review comments where applicable for it. It should not be necessary to install a WSL distribution on CI anymore, due to the improvements you've made in this PR. Furthermore, at least outside of tests related to security or the ability to signal a failed hook run, I don't think there is any need to cover WSL in the tests anymore (thus none of the WSL-related xfail decorations should be needed). If it turns out some such code will be needed again, it can be found in the commit history and restored as needed, with whatever modifications end up being required for it.
I've also noticed and included review comments about a few other small things.
Sorry, something went wrong.
+ Test a successful refresh with a relative path, which will be safer to do once the refresh tests restore changed state. This is incomplete, because while it is probably not necessary while running the test suite to preserve an old False value of git.GIT_OK (most tests can't work if that happens anyway), the value of Git.GIT_PYTHON_GIT_EXECUTABLE is not the only other global state that effects the behavior of subsequently run tests and that may be changed as a result of the refresh tests. 1. After the git.refresh function calls git.cmd.Git.refresh, it calls git.remote.FetchInfo.refresh, which rebinds the git.remote.FetchInfo._flag_map attribute. 2. Future changes to git.cmd.Git.refresh may mutate other state in the Git class, and ideally the coupling would be loose enough that the refresh tests wouldn't have to be updated for that if the behavior being tested does not change. 3. Future changes to git.refresh may perform other refreshing actions, and ideally it would be easy (and obvious) what has to be done to patch it back. In particular, it will likely call another Git method that mutates class-wide state due to gitpython-developers#1791, and for such state that is also of the Git class, ideally no further changes would have to be made to the code that restores state after the refresh tests. If we assume git.refresh is working at least in the case that it is called with no arguments, then the cleanup can just be a call to git.refresh(). Otherwise, sufficiently general cleanup may be more complicated.
This comment was marked as outdated.
This comment was marked as outdated.
|
Nudge |
Sorry, something went wrong.
|
I believe this PR is waiting for the author, there is a lot of pending review comments and conflicts by now. |
Sorry, something went wrong.
Rather than hard-coding `bash` on all systems as the fallback interpreter when a fixture script cannot be run directly, this falls back in an operating system specific manner: - Except on Windows, always fall back to `bash`, as before. - On Windows, run `git --exec-path` to find the `git-core` directory. Then check if a `bash.exe` exists at the expected location relative to that. In Git for Windows installations, this will usually work. If so, use that path (with `..` components resolved away). - On Windows, if a specific `bash.exe` is not found in that way, then fall back to using the relative path `bash.exe`. This is to preserve the ability to run `bash` on Windows systems where it may have worked before even without `bash.exe` in an expected location provided by a Git for Windows installation. (The distinction between `bash` and `bash.exe` is only slightly significant: we check for the existence of the interpreter without initially running it, and that check requires the full filename. It is called `bash.exe` elsewhere for consistency both with the checked-for executable and for consistencey with how we run most other programs on Windows, e.g., the `git` vs. `git.exe`.) This fixes GitoxideLabs#1359. That bug is not currently observed on CI, but this change is verified to fix it on a local test system where it previously always occurred when running the test suite from PowerShell in an unmodified environment. The fix applies both with `GIX_TEST_IGNORE_ARCHIVES` unset, in which case there are now no failures, and with `GIX_TEST_IGNORE_ARCHIVES=1`, in which case the failures are now limited to the 15 cases tracked in GitoxideLabs#1358. Previously, fixture scripts had been run on Windows with whatever `bash` was found in a `PATH` search, which had two problems: - On most Windows systems, even if no WSL distribution is installed and even if WSL itself is not set up, the `System32` directory contains a `bash.exe` program associated with WSL. This program attempts to use WSL to run `bash` in an installed distribution. The `wsl.exe` program also provides this functionality and is favored for this purpose, but the `bash.exe` program is still present and is likely to remain for many years for compatibility. Even when this `bash` is usable, it is not suited for running most shell scripts meant to operate on the native Windows system. In particular, it is not suitable for running our fixture scripts, which need to use the native `git` to prepare fixtures to be used natively, among other requirements that would not be satisfied with WSL (except when the tests are actually running in WSL). Since some fixtures are `.gitignore`d because creating them on the test system (rather than another system) is part of the test, this has caused breakage in most Windows environments unless `PATH` is modified -- either explicitly or by testing in an MSYS2 environment, such as the Git Bash environment -- whether or not `GIX_TEST_IGNORE_ARCHIVES` is set. This was the cause of GitoxideLabs#1359. - Although using a Git Bash environment or otherwise adjusting the path *currently* works, the reasons it works are subtle and rely on non-guaranteed behavior of `std::process::Command` path search that may change without warning. On Windows, processes are created by calling the `CreateProcessW` API function. `CreateProcessW` is capable of performing a `PATH` search, but this `PATH` search is not secure in most uses, since it includes the current directory (and searches it before `PATH` directories) unless `NoDefaultCurrentDirectoryInExePath` is set in the caller's environment. While it is the most relevant to security, the CWD is not the only location `CreateProcessW` searches before searching `PATH` directories (and regardless of where, if anywhere, they may also appear in `PATH`). Another such location is the `System32` directory. This is to say that, even when another directory with `bash.exe` precedes `System32` in `PATH`, an executable search will still find the WSL-associated `bash.exe` in `System32` unless it deviates from the algorithm `CreateProcessW` uses. To avoid including the CWD in the search, `std::process::Command` performs its own path search, then passes the resolved path to `CreateProcessW`. The path search it performs is currently almost the same the algorithm `CreateProcessW` uses, other than not automatically including the CWD. But there are some other subtle differences. One such difference is that, when the `Command` instance is configured to create a modified child environment (for example, by `env` calls), the `PATH` for the child is searched early on. This precedes a search of the `System32` directory. It is done even if none of the customizations of the child environment modify its `PATH`. This behavior is not guaranteed, and it may change at any time. It is also the behavior we rely on inadvertently every time we run `bash` on Windows with a `std::process::Command` instance constructed by passing `bash` or `bash.exe` as the `program` argument: it so happens that we are also customizing the child environment, and due to implementation details in the Rust standard library, this manages to find a non-WSL `bash` when the tests are run in Git Bash, in GitHub Actions jobs, and in some other cases. If in the future this is not done, or narrowed to be done only when `PATH` is one of the environment variables customized for the child process, then putting the directory with the desired `bash.exe` earlier than the `System32` directory in `PATH` will no longer prevent `std::proces::Command` from finding the `bash.exe` in `System32` as `CreateProcessW` would and using it. Then it would be nontrivial to run the test suite on Windows. For references and other details, see GitoxideLabs#1359 and comments including: GitoxideLabs#1359 (comment) On the approach of finding the Git for Windows `bash.exe` relative to the `git-core` directory, see the GitPython pull request gitpython-developers/GitPython#1791. Two possible future enhancements are *not* included in this commit: 1. This only modifies how test fixture scripts are run. It only affects the behavior of `gix-testtools`, and not of any other gitoxide crates such as `gix-command`. This is because: - While gitoxide uses information from `git` to find out where it is installed, mainly so we know where to find installation level configuration, we cannot in assume that `git` is present at all. Unlike GitPython, gitoxide is usable without `git`. - We know our test fixture scripts are all (at least currently) `bash` scripts, and this seems likely for other software that currently uses this functionality of `gix-testtools`. But scripts that are run as hooks, or as custom commands, or filters, etc., are often written in other languages, such as Perl. (The fallback here does not examine leading `#!` lines.) - Although a `bash.exe` located at the usual place relative to (but outside of) the `git-core` directory is usually suitable, there may be scenarios where running an executable found this way is not safe. Limiting it to `gix-testtools` pending further research may help mitigate this risk. 2. As in other runs of `git` by `gix-testools`, this calls `git.exe`, letting `std::process::Command` do an executable search, but not trying any additional locations where Git is known sometimes to be installed. This does not find `git.exe` in as many situations as `gix_path::env::exe_invocation` does. The reasons for not (or not quite yet) including that change are: - It would add `gix-path` as a dependency of `gix-testtools`. - Finding `git` in a `std::process::Command` path search is an established (though not promised) approach in `gix-testtools`, including to run `git --exec-path` (to find `git-daemon`). - It is not immediately obvious that `exe_invocation` behavior is semantically correct for `gix-testtools`, though it most likely is reasonable. The main issue is that, in many cases where `git` itself runs scripts, it prepends the path to the `git-core` directory to the `PATH` environment variable for the script. This directory has a `git` (or `git.exe`) executable in it, so scripts run an equivalent `git` associated with the same installation. In contrast, when we run test fixture scripts with a `bash.exe` associated with a Git for Windows installation, we do not customize its path. Since top-level scripts written to use `git` but not to be used *by* `git` are usually written without the expectation of such an environment, prepending this will not necessarily be an improvement.
Rather than hard-coding `bash` on all systems as the fallback interpreter when a fixture script cannot be run directly, this falls back in an operating system specific manner: - Except on Windows, always fall back to `bash`, as before. - On Windows, run `git --exec-path` to find the `git-core` directory. Then check if a `bash.exe` exists at the expected location relative to that. In Git for Windows installations, this will usually work. If so, use that path (with `..` components resolved away). - On Windows, if a specific `bash.exe` is not found in that way, then fall back to using the relative path `bash.exe`. This is to preserve the ability to run `bash` on Windows systems where it may have worked before even without `bash.exe` in an expected location provided by a Git for Windows installation. (The distinction between `bash` and `bash.exe` is only slightly significant: we check for the existence of the interpreter without initially running it, and that check requires the full filename. It is called `bash.exe` elsewhere for consistency both with the checked-for executable and for consistencey with how we run most other programs on Windows, e.g., the `git` vs. `git.exe`.) This fixes GitoxideLabs#1359. That bug is not currently observed on CI, but this change is verified to fix it on a local test system where it previously always occurred when running the test suite from PowerShell in an unmodified environment. The fix applies both with `GIX_TEST_IGNORE_ARCHIVES` unset, in which case there are now no failures, and with `GIX_TEST_IGNORE_ARCHIVES=1`, in which case the failures are now limited to the 15 cases tracked in GitoxideLabs#1358. Previously, fixture scripts had been run on Windows with whatever `bash` was found in a `PATH` search, which had two problems: - On most Windows systems, even if no WSL distribution is installed and even if WSL itself is not set up, the `System32` directory contains a `bash.exe` program associated with WSL. This program attempts to use WSL to run `bash` in an installed distribution. The `wsl.exe` program also provides this functionality and is favored for this purpose, but the `bash.exe` program is still present and is likely to remain for many years for compatibility. Even when this `bash` is usable, it is not suited for running most shell scripts meant to operate on the native Windows system. In particular, it is not suitable for running our fixture scripts, which need to use the native `git` to prepare fixtures to be used natively, among other requirements that would not be satisfied with WSL (except when the tests are actually running in WSL). Since some fixtures are `.gitignore`d because creating them on the test system (rather than another system) is part of the test, this has caused breakage in most Windows environments unless `PATH` is modified -- either explicitly or by testing in an MSYS2 environment, such as the Git Bash environment -- whether or not `GIX_TEST_IGNORE_ARCHIVES` is set. This was the cause of GitoxideLabs#1359. - Although using a Git Bash environment or otherwise adjusting the path *currently* works, the reasons it works are subtle and rely on non-guaranteed behavior of `std::process::Command` path search that may change without warning. On Windows, processes are created by calling the `CreateProcessW` API function. `CreateProcessW` is capable of performing a `PATH` search, but this `PATH` search is not secure in most uses, since it includes the current directory (and searches it before `PATH` directories) unless `NoDefaultCurrentDirectoryInExePath` is set in the caller's environment. While it is the most relevant to security, the CWD is not the only location `CreateProcessW` searches before searching `PATH` directories (and regardless of where, if anywhere, they may also appear in `PATH`). Another such location is the `System32` directory. This is to say that, even when another directory with `bash.exe` precedes `System32` in `PATH`, an executable search will still find the WSL-associated `bash.exe` in `System32` unless it deviates from the algorithm `CreateProcessW` uses. To avoid including the CWD in the search, `std::process::Command` performs its own path search, then passes the resolved path to `CreateProcessW`. The path search it performs is currently almost the same the algorithm `CreateProcessW` uses, other than not automatically including the CWD. But there are some other subtle differences. One such difference is that, when the `Command` instance is configured to create a modified child environment (for example, by `env` calls), the `PATH` for the child is searched early on. This precedes a search of the `System32` directory. It is done even if none of the customizations of the child environment modify its `PATH`. This behavior is not guaranteed, and it may change at any time. It is also the behavior we rely on inadvertently every time we run `bash` on Windows with a `std::process::Command` instance constructed by passing `bash` or `bash.exe` as the `program` argument: it so happens that we are also customizing the child environment, and due to implementation details in the Rust standard library, this manages to find a non-WSL `bash` when the tests are run in Git Bash, in GitHub Actions jobs, and in some other cases. If in the future this is not done, or narrowed to be done only when `PATH` is one of the environment variables customized for the child process, then putting the directory with the desired `bash.exe` earlier than the `System32` directory in `PATH` will no longer prevent `std::proces::Command` from finding the `bash.exe` in `System32` as `CreateProcessW` would and using it. Then it would be nontrivial to run the test suite on Windows. For references and other details, see GitoxideLabs#1359 and comments including: GitoxideLabs#1359 (comment) On the approach of finding the Git for Windows `bash.exe` relative to the `git-core` directory, see the GitPython pull request gitpython-developers/GitPython#1791, its comments, and the implementation of the approach by @emanspeaks: https://github.com/gitpython-developers/GitPython/blob/f065d1fba422a528a133719350e027f1241273df/git/cmd.py#L398-L403 Two possible future enhancements are *not* included in this commit: 1. This only modifies how test fixture scripts are run. It only affects the behavior of `gix-testtools`, and not of any other gitoxide crates such as `gix-command`. This is because: - While gitoxide uses information from `git` to find out where it is installed, mainly so we know where to find installation level configuration, we cannot in assume that `git` is present at all. Unlike GitPython, gitoxide is usable without `git`. - We know our test fixture scripts are all (at least currently) `bash` scripts, and this seems likely for other software that currently uses this functionality of `gix-testtools`. But scripts that are run as hooks, or as custom commands, or filters, etc., are often written in other languages, such as Perl. (The fallback here does not examine leading `#!` lines.) - Although a `bash.exe` located at the usual place relative to (but outside of) the `git-core` directory is usually suitable, there may be scenarios where running an executable found this way is not safe. Limiting it to `gix-testtools` pending further research may help mitigate this risk. 2. As in other runs of `git` by `gix-testools`, this calls `git.exe`, letting `std::process::Command` do an executable search, but not trying any additional locations where Git is known sometimes to be installed. This does not find `git.exe` in as many situations as `gix_path::env::exe_invocation` does. The reasons for not (or not quite yet) including that change are: - It would add `gix-path` as a dependency of `gix-testtools`. - Finding `git` in a `std::process::Command` path search is an established (though not promised) approach in `gix-testtools`, including to run `git --exec-path` (to find `git-daemon`). - It is not immediately obvious that `exe_invocation` behavior is semantically correct for `gix-testtools`, though it most likely is reasonable. The main issue is that, in many cases where `git` itself runs scripts, it prepends the path to the `git-core` directory to the `PATH` environment variable for the script. This directory has a `git` (or `git.exe`) executable in it, so scripts run an equivalent `git` associated with the same installation. In contrast, when we run test fixture scripts with a `bash.exe` associated with a Git for Windows installation, we do not customize its path. Since top-level scripts written to use `git` but not to be used *by* `git` are usually written without the expectation of such an environment, prepending this will not necessarily be an improvement.
|
Closing due to age and conflicts. Please feel free to restart this with a new PR. |
Sorry, something went wrong.
Provides a mechanism to set a path to bash to use on Windows for executing commit hook commands. If none is set, the default behavior should be to prefer Git Bash from Git for Windows.
If a user is running in Windows, they probably are using Git for Windows, which includes Git Bash and should be associated with the configured Git command set in
refresh(). Search first for Git Bash relative to the configured Git command, and, if it exists, use it. Otherwise, fail back to the hardcoded default of"bash.exe".The original "default" bash without a path assumes that bash.exe is on the PATH. If a user chose not to make MSYS2 tools available on the PATH on Git for Windows install, another method must be used to find that bash if it should be preferred over one found on the PATH. Newer versions of Windows include a bash.exe in System32 that is then found by default, but this bash.exe is intended for use with WSL. If a user intended to invoke git in WSL, presumably this package would have been installed there and not registered as "Windows" and reduces the ambiguity. In most cases where running the python in "native" Windows, then, it is undesirable to invoke the WSL Bash that is not associated with the Git for Windows install.
Furthermore, direct access to bash.exe in System32 is reportedly deprecated in lieu of commands that explicitly invoke WSL.
This addresses issues where git hooks are intended to run assuming the "native" Windows environment as seen by git.exe rather than inside the git sandbox of WSL, which is likely configured independently of the Windows Git as well. A noteworthy example are repos with Git LFS, where Git LFS may be installed in Windows but not in WSL, causing commit hooks for LFS to fail despite being installed alongside the configured Git command in this package.