fsmonitor/darwin: fix hangs for submodules #1802

@KojiNakamaru KojiNakamaru commented Sep 28, 2024

cc: Ramsay Jones [email protected]

Member

@dscho dscho left a comment

It's awesome to see someone working on FSMonitor fixes.

By the way, you can ignore the osx-* failures: They are happening due to a known issue with libcurl v8.10.0 and will go away once the macos-13 runners are updated to include v8.10.1 instead (I expected this to be happening on Sep 23rd or soon thereafter, but it seems that there are delays).

Comment on lines 1319 to 1322
	buf = state.path_gitdir_watch.buf;
	len = state.path_gitdir_watch.len;
	if (len >= 2 && buf[len - 2] == '/' && buf[len - 1] == '.')
		strbuf_setlen(&state.path_gitdir_watch, len - 2);
Member

How about using strbuf_strip_suffix() instead?

strbuf_strip_suffix(&state.path_gitdir_watch, "/.");

Also, did you want to add tests?

Author

Thank you! I'll update the commit.
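
For readers more at home in shell than in C, the effect of stripping a trailing "/." can be sketched as below. This is only an illustration: the helper name `strip_dot_suffix` and the sample paths are hypothetical, and the actual fix operates on the strbuf in C as discussed in the review comment.

```shell
# Hypothetical helper: drop one trailing "/." from a path, if present.
# It mirrors the intent of strbuf_strip_suffix(&buf, "/.") in the C code.
strip_dot_suffix () {
	# ${1%/.} removes the shortest suffix matching the literal "/."
	printf '%s\n' "${1%/.}"
}

strip_dot_suffix "/repo/.git/modules/sub/."
strip_dot_suffix "/repo/.git"
```

A path without the suffix passes through unchanged, which is why the single call can replace the explicit length and character checks.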

@KojiNakamaru KojiNakamaru force-pushed the fix/fsmonitor-darwin-hangs-for-submodules branch from 142c518 to 1889cbb Compare September 28, 2024 15:41

KojiNakamaru commented Sep 29, 2024

/preview

gitgitgadget bot commented Sep 29, 2024

Preview email sent as [email protected]

gitgitgadget bot commented Sep 29, 2024

Preview email sent as [email protected]

@KojiNakamaru
Author

/submit

gitgitgadget bot commented Sep 29, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v1

To fetch this version to local tag pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v1

gitgitgadget bot commented Sep 30, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Koji Nakamaru via GitGitGadget" <[email protected]> writes:

> From: Koji Nakamaru <[email protected]>
>
> fsmonitor_classify_path_absolute() expects state->path_gitdir_watch.buf
> has no trailing '/' or '.' For a submodule, fsmonitor_run_daemon() sets
> the value with trailing "/." (as repo_get_git_dir(the_repository) on
> Darwin returns ".") so that fsmonitor_classify_path_absolute() returns
> IS_OUTSIDE_CONE.
>
> In this case, fsevent_callback() doesn't update cookie_list so that
> fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is
> not invoked.
>
> As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond
> that with_lock__mark_cookies_seen() should unlock, the whole daemon
> hangs.

The above very nicely describes the cause, the mechanism that leads
to the end-user observable effect, and the (bad) effect the bug has.

I wish everybody wrote their proposed commit messages like this ;-)

> Remove trailing "/." from state->path_gitdir_watch.buf for submodules
> and add a corresponding test in t7527-builtin-fsmonitor.sh.
>
> Helped-by: Johannes Schindelin <[email protected]>
> Signed-off-by: Koji Nakamaru <[email protected]>
> ---
>     fsmonitor/darwin: fix hangs for submodules

> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index 730f3c7f810..7acd074a97f 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -82,6 +82,28 @@ have_t2_data_event () {
>  	grep -e '"event":"data".*"category":"'"$c"'".*"key":"'"$k"'"'
>  }
>  
> +start_git_in_background () {
> +	git "$@" &
> +	git_pid=$!
> +	nr_tries_left=10
> +	while true
> +	do
> +		if test $nr_tries_left -eq 0
> +		then
> +			kill $git_pid
> +			exit 1
> +		fi
> +		sleep 1
> +		nr_tries_left=$(($nr_tries_left - 1))
> +	done > /dev/null 2>&1 &

So, the command is allowed to run for 10 seconds and then a signal
is sent to the process (by the way, we do not write the SP between
">" and "/dev/null").

> +	watchdog_pid=$!
> +	wait $git_pid

And the process to ensure the command gets killed in 10 seconds is
called the "watchdog".  We let the command run for completion (and
we'd be happy if it did without watchdog needing to forcibly kill
it).

Which means that even after the test finishes normally (e.g., the
command completes without getting killed by the watchdog, because it
is on a fast box and finishes in 0.5 second), we have leftover
watchdog process hanging around for 10 seconds, which might interfere
with the removal of the $TRASH_DIRECTORY at the end of the test.

There is a helper function to kill both (below), which probably is
used to avoid it.  Let's keep reading.

> +}
> +
> +stop_git_and_watchdog () {
> +	kill $git_pid $watchdog_pid
> +}

This sends a signal and lets the processes die, without waiting to
make sure they have indeed died; only at that point can we safely remove
the $TRASH_DIRECTORY on filesystems that refuse to remove a directory
when a process still has it as its current working directory.

Shouldn't it loop, like

	for pid in $git_pid $watchdog_pid
	do
		while kill -0 $pid
		do
			kill $pid
		done
	done

or something?  Or is there a mechanism already to ensure that we
return after they get killed that I am failing to find?

>  test_expect_success 'explicit daemon start and stop' '
>  	test_when_finished "stop_daemon_delete_repo test_explicit" &&
>  
> @@ -907,6 +929,23 @@ test_expect_success "submodule absorbgitdirs implicitly starts daemon" '
>  	test_subcommand git fsmonitor--daemon start <super-sub.trace
>  '
>  
> +test_expect_success "submodule implicitly starts daemon by pull" '
> +	test_atexit "stop_git_and_watchdog" &&

Hmph, this is _atexit and not _when_finished because...?

> +	test_when_finished "rm -rf cloned; \
> +			    rm -rf super; \
> +			    rm -rf sub" &&

Makes me wonder why it is not written like so:

	test_when_finished "rm -rf cloned super sub" &&

which is short enough to still fit on a line.  Is there something I
am missing that these directories must be removed separately and in
this order?

> +	create_super super &&
> +	create_sub sub &&
> +
> +	git -C super submodule add ../sub ./dir_1/dir_2/sub &&
> +	git -C super commit -m "add sub" &&
> +	git clone --recurse-submodules super cloned &&
> +
> +	git -C cloned/dir_1/dir_2/sub config core.fsmonitor true &&
> +	start_git_in_background -C cloned pull --recurse-submodules
> +'

Other than that, very nicely done.

Thanks.

>  # On a case-insensitive file system, confirm that the daemon
>  # notices when the .git directory is moved/renamed/deleted
>  # regardless of how it is spelled in the FS event.
>
> base-commit: 3857aae53f3633b7de63ad640737c657387ae0c6
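
The concern about a watchdog outliving the command it guards can be illustrated with a minimal sketch. It is not the code from the patch: `run_with_watchdog`, `cmd_pid`, and the seconds argument are hypothetical names, and the sketch assumes the command is a direct child of the calling shell so that `wait` can reap it.

```shell
# Hypothetical sketch: run a command, kill it after a time limit,
# and stop the watchdog as soon as the command finishes on its own.
run_with_watchdog () {
	limit=$1
	shift
	"$@" &
	cmd_pid=$!
	(
		n=$limit
		while test "$n" -gt 0
		do
			sleep 1
			n=$((n - 1))
		done
		# Time is up: terminate the guarded command.
		kill "$cmd_pid"
	) >/dev/null 2>&1 &
	watchdog_pid=$!

	status=0
	wait "$cmd_pid" || status=$?

	# Do not leave the watchdog behind once the command is done.
	kill "$watchdog_pid" 2>/dev/null || :
	wait "$watchdog_pid" 2>/dev/null || :
	return $status
}

# Example (not executed here): run_with_watchdog 10 some-command arg...
```

Stopping and reaping the watchdog inside the helper means no timer process is left running when the command exits quickly, which is the situation the review asks about.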

gitgitgadget bot commented Oct 1, 2024

On the Git mailing list, Koji Nakamaru wrote (reply to this):

Thank you very much for carefully checking the patch and suggesting
better ways. I'll later revise it and submit a new one.

> > +}
> > +
> > +stop_git_and_watchdog () {
> > +     kill $git_pid $watchdog_pid
> > +}
>
> This sends a signal and lets the processes die, without waiting to
> make sure they have indeed died; only at that point can we safely remove
> the $TRASH_DIRECTORY on filesystems that refuse to remove a directory
> when a process still has it as its current working directory.
>
> Shouldn't it loop, like
>
>         for pid in $git_pid $watchdog_pid
>         do
>                 while kill -0 $pid
>                 do
>                         kill $pid
>                 done
>         done
>
> or something?  Or is there a mechanism already to ensure that we
> return after they get killed that I am failing to find?

I agree that we have to wait for pids. I also realized that we should
run git in another process group and kill the group for killing all git
child processes. I'll fix the code.

> >  test_expect_success 'explicit daemon start and stop' '
> >       test_when_finished "stop_daemon_delete_repo test_explicit" &&
> >
> > @@ -907,6 +929,23 @@ test_expect_success "submodule absorbgitdirs implicitly starts daemon" '
> >       test_subcommand git fsmonitor--daemon start <super-sub.trace
> >  '
> >
> > +test_expect_success "submodule implicitly starts daemon by pull" '
> > +     test_atexit "stop_git_and_watchdog" &&
>
> Hmph, this is _atexit and not _when_finished because...?

This is because README describes _atexit to run unconditionally to clean
up before the test script exits, e.g. to stop (kill) a daemon. More
appropriately, we should kill git before "rm -rf cloned super sub" in
_when_finished and kill watchdog in _atexit. I'll adjust the code.

> > +     test_when_finished "rm -rf cloned; \
> > +                         rm -rf super; \
> > +                         rm -rf sub" &&
>
> Makes me wonder why it is not written like so:
>
>         test_when_finished "rm -rf cloned super sub" &&
>
> which is short enough to still fit on a line.  Is there something I
> am missing that these directories must be removed separately and in
> this order?

There is no special reason, I simply followed the style used in
t7527-builtin-fsmonitor.sh. I'll fix this part.


Koji Nakamaru
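
The "signal, then wait until the process is really gone" idea agreed on above can be sketched as follows. The helper name `stop_and_wait` is hypothetical and this is not the code from the later patch iterations; it relies on `kill -0`, which sends no signal and only reports whether the pid can still be signalled.

```shell
# Hypothetical helper: keep signalling a process until it no longer exists.
stop_and_wait () {
	pid=$1
	# "kill -0" succeeds only while the process is still present.
	while kill -0 "$pid" 2>/dev/null
	do
		kill "$pid" 2>/dev/null || :
		sleep 1
	done
}

# Example: start a long-running child, then make sure it is gone.
sleep 30 &
victim=$!
stop_and_wait "$victim"
```

For a direct child of the calling shell, `kill "$pid"` followed by `wait "$pid"` achieves the same effect without polling; the loop form also covers processes the shell cannot `wait` for.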

@KojiNakamaru KojiNakamaru force-pushed the fix/fsmonitor-darwin-hangs-for-submodules branch from 1889cbb to decf684 Compare October 1, 2024 04:21
@KojiNakamaru
Author

/submit

gitgitgadget bot commented Oct 1, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v2

To fetch this version to local tag pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v2

gitgitgadget bot commented Oct 1, 2024

This patch series was integrated into seen via git@f8f9e71.

@gitgitgadget gitgitgadget bot added the seen label Oct 1, 2024

gitgitgadget bot commented Oct 1, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Koji Nakamaru via GitGitGadget" <[email protected]> writes:

> From: Koji Nakamaru <[email protected]>
>
> fsmonitor_classify_path_absolute() expects state->path_gitdir_watch.buf
> has no trailing '/' or '.' For a submodule, fsmonitor_run_daemon() sets
> the value with trailing "/." (as repo_get_git_dir(the_repository) on
> Darwin returns ".") so that fsmonitor_classify_path_absolute() returns
> IS_OUTSIDE_CONE.
>
> In this case, fsevent_callback() doesn't update cookie_list so that
> fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is
> not invoked.
>
> As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond
> that with_lock__mark_cookies_seen() should unlock, the whole daemon
> hangs.
>
> Remove trailing "/." from state->path_gitdir_watch.buf for submodules
> and add a corresponding test in t7527-builtin-fsmonitor.sh.
>
> Suggested-by: Johannes Schindelin <[email protected]>
> Suggested-by: Junio C Hamano <[email protected]>

In none of the changes described above, I have any input to deserve
such credit, though.

> +start_git_in_background () {
> +	git "$@" &
> +	git_pid=$!
> +	git_pgid=$(ps -o pgid= -p $git_pid)
> +	nr_tries_left=10
> +	while true
> +	do
> +		if test $nr_tries_left -eq 0
> +		then
> +			kill -- -$git_pgid
> +			exit 1
> +		fi
> +		sleep 1
> +		nr_tries_left=$(($nr_tries_left - 1))
> +	done >/dev/null 2>&1 &
> +	watchdog_pid=$!
> +	wait $git_pid
> +}
> +
> +stop_git () {
> +	while kill -0 -- -$git_pgid
> +	do
> +		kill -- -$git_pgid
> +		sleep 1
> +	done
> +}

On the "git" side you use process group because you expect that
"git" would spawn subprocesses and you want to catch all of them,
...

> +stop_watchdog () {
> +	while kill -0 $watchdog_pid
> +	do
> +		kill $watchdog_pid
> +		sleep 1
> +	done
> +}

... but "watchdog" you know is a single process, so you'd only need
a single process id, is that the idea?

What is the motivation behind the change in this iteration to use
process group?  Was it observed that leftover processes hang around
if we killed only the $git_pid, or something?

> +test_expect_success "submodule implicitly starts daemon by pull" '
> +	test_atexit "stop_watchdog" &&
> +	test_when_finished "stop_git && rm -rf cloned super sub" &&

If stop_git ever returns with non-zero status, "rm -rf" will be
skipped, which I am not sure is a good idea.

The whole test_when_finished would fail in such a case, so you would
notice the problem right away, which is a plus, though.

> +	create_super super &&
> +	create_sub sub &&
> +
> +	git -C super submodule add ../sub ./dir_1/dir_2/sub &&
> +	git -C super commit -m "add sub" &&
> +	git clone --recurse-submodules super cloned &&
> +
> +	git -C cloned/dir_1/dir_2/sub config core.fsmonitor true &&
> +	set -m &&

I have to wonder how portable (and necessary) this is.

POSIX says it shall be supported if the implementation supports the
User Portability Utilities option.  It also says that it was added
to apply only to the UPE because it applies primarily to interactive
use, not shell script applications.  And our test scripts are of
course not interactive.

Thanks.

gitgitgadget bot commented Oct 1, 2024

On the Git mailing list, Koji Nakamaru wrote (reply to this):

On Tue, Oct 1, 2024 at 8:57 PM Junio C Hamano <[email protected]> wrote:

> > fsmonitor_classify_path_absolute() expects state->path_gitdir_watch.buf
> > has no trailing '/' or '.' For a submodule, fsmonitor_run_daemon() sets
> > the value with trailing "/." (as repo_get_git_dir(the_repository) on
> > Darwin returns ".") so that fsmonitor_classify_path_absolute() returns
> > IS_OUTSIDE_CONE.
> >
> > In this case, fsevent_callback() doesn't update cookie_list so that
> > fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is
> > not invoked.
> >
> > As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond
> > that with_lock__mark_cookies_seen() should unlock, the whole daemon
> > hangs.
> >
> > Remove trailing "/." from state->path_gitdir_watch.buf for submodules
> > and add a corresponding test in t7527-builtin-fsmonitor.sh.
> >
> > Suggested-by: Johannes Schindelin <[email protected]>
> > Suggested-by: Junio C Hamano <[email protected]>
>
> In none of the changes described above, I have any input to deserve
> such credit, though.

Your points are very helpful :)

> > +start_git_in_background () {
> > + git "$@" &
> > + git_pid=$!
> > + git_pgid=$(ps -o pgid= -p $git_pid)
> > + nr_tries_left=10
> > + while true
> > + do
> > + if test $nr_tries_left -eq 0
> > + then
> > + kill -- -$git_pgid
> > + exit 1
> > + fi
> > + sleep 1
> > + nr_tries_left=$(($nr_tries_left - 1))
> > + done >/dev/null 2>&1 &
> > + watchdog_pid=$!
> > + wait $git_pid
> > +}
> > +
> > +stop_git () {
> > + while kill -0 -- -$git_pgid
> > + do
> > + kill -- -$git_pgid
> > + sleep 1
> > + done
> > +}
>
> On the "git" side you use process group because you expect that
> "git" would spawn subprocesses and you want to catch all of them,
> ...
>
> > +stop_watchdog () {
> > + while kill -0 $watchdog_pid
> > + do
> > + kill $watchdog_pid
> > + sleep 1
> > + done
> > +}
>
> ... but "watchdog" you know is a single process, so you'd only need
> a single process id, is that the idea?

Yes, that is the idea.

> What is the motivation behind the change in this iteration to use
> process group?  Was it observed that leftover processes hang around
> if we killed only the $git_pid, or something?

Yes, if the issue occurs, three processes remain:

  git fetch --update-head-ok --recurse-submodules=on

  git fetch --no-prune --no-prune-tags --update-head-ok
    --recurse-submodules --recurse-submodules-default yes
    --submodule-prefix=dir_1/dir_2/sub/

  git fsmonitor--daemon run --detach --ipc-threads=8

If there is no issue, only the fsmonitor process remains.

> > +test_expect_success "submodule implicitly starts daemon by pull" '
> > + test_atexit "stop_watchdog" &&
> > + test_when_finished "stop_git && rm -rf cloned super sub" &&
>
> If stop_git ever returns with non-zero status, "rm -rf" will be
> skipped, which I am not sure is a good idea.
>
> The whole test_when_finished would fail in such a case, so you would
> notice the problem right away, which is a plus, though.

t/README explains that test_when_finished and test_atexit differ in how
they interact with the "--immediate" option. As git and its subprocesses
are the test target, I moved stop_git to its current place. This might,
however, be confusing when someone later reads this test. Should we
simply put stop_git and stop_watchdog in test_atexit?

> > + create_super super &&
> > + create_sub sub &&
> > +
> > + git -C super submodule add ../sub ./dir_1/dir_2/sub &&
> > + git -C super commit -m "add sub" &&
> > + git clone --recurse-submodules super cloned &&
> > +
> > + git -C cloned/dir_1/dir_2/sub config core.fsmonitor true &&
> > + set -m &&
>
> I have to wonder how portable (and necessary) this is.
>
> POSIX says it shall be supported if the implementation supports the
> User Portability Utilities option.  It also says that it was added
> to apply only to the UPE because it applies primarily to interactive
> use, not shell script applications.  And our test scripts are of
> course not interactive.

How about the following modification? It still utilizes $git_pgid to
filter processes, but avoids "set -m".

  diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
  index 2dd1ca1a7b..23d9a7c953 100755
  --- a/t/t7527-builtin-fsmonitor.sh
  +++ b/t/t7527-builtin-fsmonitor.sh
  @@ -916,7 +916,7 @@ start_git_in_background () {
          do
                  if test $nr_tries_left -eq 0
                  then
  -                       kill -- -$git_pgid
  +                       kill $git_pid
                          exit 1
                  fi
                  sleep 1
  @@ -927,10 +927,13 @@ start_git_in_background () {
   }

   stop_git () {
  -       while kill -0 -- -$git_pgid
  +       for p in $(ps -o pgid=,pid=,comm= | grep "^$git_pgid .*git" | sed 's/^[0-9][0-9]* \([0-9][0-9]*\) .*/\1/')
          do
  -               kill -- -$git_pgid
  -               sleep 1
  +               while kill -0 $p
  +               do
  +                       kill $p
  +                       sleep 1
  +               done
          done
   }

  @@ -954,7 +957,6 @@ test_expect_success "submodule implicitly starts daemon by pull" '
          git clone --recurse-submodules super cloned &&

          git -C cloned/dir_1/dir_2/sub config core.fsmonitor true &&
  -       set -m &&
          start_git_in_background -C cloned pull --recurse-submodules
   '


Koji Nakamaru

gitgitgadget bot commented Oct 1, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Koji Nakamaru <[email protected]> writes:

>> > +test_expect_success "submodule implicitly starts daemon by pull" '
>> > + test_atexit "stop_watchdog" &&
>> > + test_when_finished "stop_git && rm -rf cloned super sub" &&
>>
>> If stop_git ever returns with non-zero status, "rm -rf" will be
>> skipped, which I am not sure is a good idea.
>>
>> The whole test_when_finished would fail in such a case, so you would
>> notice the problem right away, which is a plus, though.
>
> t/README discusses that test_when_finished and test_atexit differ about
> the "--immediate" option. As git and its subprocesses are the test
> target, I moved stop_git to the current place. This might be however
> confusing when someone later reads this test. Should we simply put
> stop_git and stop_watchdog in test_atexit?

That is not what I meant.

I was merely questioning the &&-chaining that stops "rm -fr" from
running if stop_git ever fails (in your earlier iteration you had
multiple "rm -fr" ;-chained, not &&-chained---not using && is often
more appropriate in a when_finished handler).

>> > + set -m &&
>>
>> I have to wonder how portable (and necessary) this is.
>>
>> POSIX says it shall be supported if the implementation supports the
>> User Portability Utilities option.  It also says that it was added
>> to apply only to the UPE because it applies primarily to interactive
>> use, not shell script applications.  And our test scripts are of
>> course not interactive.
>
> How about the following modification? It still utilizes $git_pgid to
> filter processes, but avoids "set -m".

Nah, your original reads much better, and the code is grabbing and
using the process group information anyway (and my question about
"-m" was more about "should we be relying on process group features
in this test to kill them all?").

I am OK with the idea that we can assume, at least among the
platforms that support fsmonitor, that sending a signal to a process
group would cause the signal delivered to the member processes just
as we expect.

Thanks.
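
The difference between &&-chaining and ;-sequencing in a cleanup handler can be seen in a small standalone sketch. The names `failing_stop`, `cleanup_chained`, and `cleanup_sequenced` are hypothetical stand-ins, and the sketch uses plain shell functions rather than the test_when_finished machinery itself.

```shell
# Hypothetical stand-in for a stop helper that fails.
failing_stop () {
	return 1
}

# &&-chained: rm is skipped when failing_stop returns non-zero.
cleanup_chained () {
	failing_stop && rm -rf "$1"
}

# ;-sequenced: rm runs regardless of how failing_stop exited.
cleanup_sequenced () {
	failing_stop; rm -rf "$1"
}

scratch=$(mktemp -d)
mkdir "$scratch/a" "$scratch/b"

chained_status=0
cleanup_chained "$scratch/a" || chained_status=$?
sequenced_status=0
cleanup_sequenced "$scratch/b" || sequenced_status=$?

if test -d "$scratch/a"; then chained_left=yes; else chained_left=no; fi
if test -d "$scratch/b"; then sequenced_left=yes; else sequenced_left=no; fi
echo "&&: status=$chained_status, directory left behind: $chained_left"
echo ";:  status=$sequenced_status, directory left behind: $sequenced_left"

rm -rf "$scratch"
```

The chained form reports the failure but leaves the directory in place, while the sequenced form always removes it and returns the status of the last command. That is the trade-off being weighed in the review.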

gitgitgadget bot commented Oct 1, 2024

On the Git mailing list, Koji Nakamaru wrote (reply to this):

On Wed, Oct 2, 2024 at 3:04 AM Junio C Hamano <[email protected]> wrote:
>>> > +test_expect_success "submodule implicitly starts daemon by pull" '
>>> > + test_atexit "stop_watchdog" &&
>>> > + test_when_finished "stop_git && rm -rf cloned super sub" &&
>>>
>>> If stop_git ever returns with non-zero status, "rm -rf" will be
>>> skipped, which I am not sure is a good idea.
>>>
>>> The whole test_when_finished would fail in such a case, so you would
>>> notice the problem right away, which is a plus, though.
>>
>> t/README discusses that test_when_finished and test_atexit differ about
>> the "--immediate" option. As git and its subprocesses are the test
>> target, I moved stop_git to the current place. This might be however
>> confusing when someone later reads this test. Should we simply put
>> stop_git and stop_watchdog in test_atexit?
>
> That is not what I meant.
>
> I was merely questioning the &&-chaining that stops "rm -fr" from
> running if stop_git ever fails (in your earlier iteration you had
> multiple "rm -fr" ;-chained, not &&-chained---not using && is often
> more appropriate in a when_finished handler).

I see. I'll fix this part.

>>> > + set -m &&
>>>
>>> I have to wonder how portable (and necessary) this is.
>>>
>>> POSIX says it shall be supported if the implementation supports the
>>> User Portability Utilities option.  It also says that it was added
>>> to apply only to the UPE because it applies primarily to interactive
>>> use, not shell script applications.  And our test scripts are of
>>> course not interactive.
>>
>> How about the following modification? It still utilizes $git_pgid to
>> filter processes, but avoids "set -m".
>
> Nah, your original reads much better, and the code is grabbing and
> using the process group information anyway (and my question about
> "-m" was more about "should we be relying on process group features
> in this test to kill them all?").
>
> I am OK with the idea that we can assume, at least among the
> platforms that support fsmonitor, that sending a signal to a process
> group would cause the signal delivered to the member processes just
> as we expect.

Thank you for the clarification and the support.

Koji Nakamaru

@KojiNakamaru KojiNakamaru force-pushed the fix/fsmonitor-darwin-hangs-for-submodules branch from decf684 to aabc1c2 Compare October 1, 2024 18:47
@KojiNakamaru
Author

/submit

gitgitgadget bot commented Oct 1, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v3

To fetch this version to local tag pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v3

gitgitgadget bot commented Oct 1, 2024

This patch series was integrated into seen via git@923a789.

gitgitgadget bot commented Oct 2, 2024

On the Git mailing list, Koji Nakamaru wrote (reply to this):

Although I submitted [PATCH v3], it was incomplete about the following:

On Wed, Oct 2, 2024 at 3:04 AM Junio C Hamano <[email protected]> wrote:
> I am OK with the idea that we can assume, at least among the
> platforms that support fsmonitor, that sending a signal to a process
> group would cause the signal delivered to the member processes just
> as we expect.

On Windows, there is no process group so the test cannot run
correctly. As the hangs corrected by the patch occur only on Darwin, I
would like to skip MINGW in the test. Is that okay?

Koji Nakamaru

gitgitgadget bot commented Oct 2, 2024

This branch is now known as kn/osx-fsmonitor-with-submodules-fix.

gitgitgadget bot commented Oct 2, 2024

This patch series was integrated into seen via git@99031f8.

gitgitgadget bot commented Oct 2, 2024

There was a status update in the "Cooking" section about the branch kn/osx-fsmonitor-with-submodules-fix on the Git mailing list:

macOS with fsmonitor daemon can hang forever when a submodule is
involved, which has been corrected.

Expecting a reroll.
cf. <CAOTNsDygwBkNdFX4K_ixL5kP-AvDxWWVXkvfkmV4Kh04ozdYkA@mail.gmail.com>
source: <[email protected]>

gitgitgadget bot commented Oct 2, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Koji Nakamaru <[email protected]> writes:

> On windows, there is no process group so the test cannot run
> correctly. As hangs corrected with the patch occur only for darwin, I
> would like to skip MINGW in the test. Is it okay?

Surely.  But can we do so without spelling MINGW or WINDOWS out?

That is, if your test requires process group features available, can
we come up with a lazy prerequisite and use that to decide if we
skip the test?

Earlier in the discussion, you said which processes are left behind if we did so
on systems with process groups, but I wonder what happens when we
throw a signal at the top-level "git" process on Windows, and if it
behaves better, perhaps we can implement stop_git differently where
process groups are not available, instead of skipping the tests
altogether?

Thanks.

gitgitgadget bot commented Oct 3, 2024

On the Git mailing list, Koji Nakamaru wrote (reply to this):

On Thu, Oct 3, 2024 at 3:14 AM Junio C Hamano <[email protected]> wrote:
>> On windows, there is no process group so the test cannot run
>> correctly. As hangs corrected with the patch occur only for darwin, I
>> would like to skip MINGW in the test. Is it okay?
>
> Surely.  But can we do so without spelling MINGW or WINDOWS out?
>
> That is, if your test requires process group features available, can
> we come up with a lazy prerequisite and use that to decide if we
> skip the test?

I tweaked fsm-listen-win32.c to cause hangs and tested on Windows. I'm
sorry that simply saying "there is no process group" was not quite
correct. Each mingw process has a process group and its win32
subprocesses can be killed by "kill -- -$pgid".

For example, when a hang occurs, the following processes remain.

      PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
    56782   40923   56782       9484  pty0     1052296 16:23:22 /mingw64/bin/git  # mingw git process
    78100       0       0      12564  ?              0 16:23:23 C:\git-sdk-64\mingw64\libexec\git-core\git.exe  # win32 process
    86108       0       0      20572  ?              0 16:23:23 C:\git-sdk-64\mingw64\libexec\git-core\git.exe  # win32 subprocess
    73328       0       0       7792  ?              0 16:23:23 C:\git-sdk-64\mingw64\libexec\git-core\git.exe  # win32 fsmonitor

> Earlier in the discussion, you said who are left behind if we did so
> on systems with process groups, but I wonder what happens when we
> throw a signal at the top-level "git" process on Windows, and if it
> behaves better, perhaps we can implement stop_git differently where
> process groups are not available, instead of skipping the tests
> altogether?

If we do "kill 56782" or "kill -- -56782" for the above example, most of
the processes are terminated except the win32 fsmonitor. This is because
the win32 fsmonitor is detached by FreeConsole().

I also tried "git fsmonitor--daemon stop". It was able to communicate
with the win32 fsmonitor and the internal status of the win32 fsmonitor
changed, but the win32 daemon didn't terminate.

Because it's getting complicated, how about the following:

* specify MINGW
* note in the commit log:
  The test is disabled for MINGW because hangs treated with this patch
  occur only for Darwin and there is no simple way to terminate the
  win32 fsmonitor daemon that hangs.

Koji Nakamaru

fsmonitor_classify_path_absolute() expects that state->path_gitdir_watch.buf
has no trailing '/' or '.'. For a submodule, fsmonitor_run_daemon() sets
the value with trailing "/." (as repo_get_git_dir(the_repository) on
Darwin returns ".") so that fsmonitor_classify_path_absolute() returns
IS_OUTSIDE_CONE.

In this case, fsevent_callback() doesn't update cookie_list so that
fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is
not invoked.

As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond
that with_lock__mark_cookies_seen() should unlock, the whole daemon
hangs.

Remove trailing "/." from state->path_gitdir_watch.buf for submodules
and add a corresponding test in t7527-builtin-fsmonitor.sh. The test is
disabled for MINGW because hangs treated with this patch occur only for
Darwin and there is no simple way to terminate the win32 fsmonitor
daemon that hangs.

Suggested-by: Johannes Schindelin <[email protected]>
Suggested-by: Junio C Hamano <[email protected]>
Signed-off-by: Koji Nakamaru <[email protected]>

gitgitgadget bot commented Oct 3, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Koji Nakamaru <[email protected]> writes:

> Because it's getting complicated, how about the following:
>
> * specify MINGW
> * note in the commit log:
>   The test is disabled for MINGW because hangs treated with this patch
>   occur only for Darwin and there is no simple way to terminate the
>   win32 fsmonitor daemon that hangs.

Sounds good to me.

Thanks.

gitgitgadget bot commented Oct 3, 2024

This patch series was integrated into seen via git@951056a.

@KojiNakamaru KojiNakamaru force-pushed the fix/fsmonitor-darwin-hangs-for-submodules branch from aabc1c2 to 7b7224e Compare October 3, 2024 23:02
@KojiNakamaru
Author

/submit

gitgitgadget bot commented Oct 4, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v4

To fetch this version to local tag pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v4

gitgitgadget bot commented Oct 4, 2024

On the Git mailing list, Koji Nakamaru wrote (reply to this):

On Fri, Oct 4, 2024 at 2:44 AM Junio C Hamano <[email protected]> wrote:
> > Because it's getting complicated, how about the following:
> >
> > * specify MINGW
> > * note in the commit log:
> >   The test is disabled for MINGW because hangs treated with this patch
> >   occur only for Darwin and there is no simple way to terminate the
> >   win32 fsmonitor daemon that hangs.
>
> Sounds good to me.

Thank you. I've submitted [PATCH v4].

Koji Nakamaru

gitgitgadget bot commented Oct 4, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Koji Nakamaru via GitGitGadget" <[email protected]> writes:

> ... The test is
> disabled for MINGW because hangs treated with this patch occur only for
> Darwin and there is no simple way to terminate the win32 fsmonitor
> daemon that hangs.
> ...
>      @@ t/t7527-builtin-fsmonitor.sh: test_expect_success "submodule absorbgitdirs impli
>       +	done
>       +}
>       +
>      -+test_expect_success "submodule implicitly starts daemon by pull" '
>      ++test_expect_success !MINGW "submodule implicitly starts daemon by pull" '
>       +	test_atexit "stop_watchdog" &&
>       +	test_when_finished "stop_git; rm -rf cloned super sub" &&
>       +

Let me update !MINGW to !WINDOWS while queuing.

gitgitgadget bot commented Oct 4, 2024

This patch series was integrated into seen via git@41ed0ab.

gitgitgadget bot commented Oct 4, 2024

On the Git mailing list, Ramsay Jones wrote (reply to this):

On 04/10/2024 18:44, Junio C Hamano wrote:
> "Koji Nakamaru via GitGitGadget" <[email protected]> writes:
> 
>> ... The test is
>> disabled for MINGW because hangs treated with this patch occur only for
>> Darwin and there is no simple way to terminate the win32 fsmonitor
>> daemon that hangs.
>> ...
>>      @@ t/t7527-builtin-fsmonitor.sh: test_expect_success "submodule absorbgitdirs impli
>>       +	done
>>       +}
>>       +
>>      -+test_expect_success "submodule implicitly starts daemon by pull" '
>>      ++test_expect_success !MINGW "submodule implicitly starts daemon by pull" '
>>       +	test_atexit "stop_watchdog" &&
>>       +	test_when_finished "stop_git; rm -rf cloned super sub" &&
>>       +
> 
> Let me update !MINGW to !WINDOWS while queuing.
> 

While this won't hurt, this test file is skipped on cygwin:

[23:19:33] t7527-builtin-fsmonitor.sh ......................... skipped: fsmonitor--daemon is not supported on this platform

(my eternal TODO list has an 'fsmonitor on cygwin?' item ...)

Thanks.

ATB,
Ramsay Jones


gitgitgadget bot commented Oct 4, 2024

User Ramsay Jones <[email protected]> has been added to the cc: list.

gitgitgadget bot commented Oct 4, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ramsay Jones <[email protected]> writes:

> While this won't hurt, this test file is skipped on cygwin:
>
> [23:19:33] t7527-builtin-fsmonitor.sh ......................... skipped: fsmonitor--daemon is not supported on this platform
>
> (my eternal TODO list has an 'fsmonitor on cygwin?' item ...)

Thanks, then let's not bother.

gitgitgadget bot commented Oct 4, 2024

This patch series was integrated into seen via git@e8b19c8.

gitgitgadget bot commented Oct 4, 2024

This patch series was integrated into next via git@5a9a877.

@gitgitgadget gitgitgadget bot added the next label Oct 4, 2024

gitgitgadget bot commented Oct 5, 2024

This patch series was integrated into seen via git@2ab53b5.

gitgitgadget bot commented Oct 5, 2024

This patch series was integrated into master via git@2ab53b5.

gitgitgadget bot commented Oct 5, 2024

This patch series was integrated into next via git@2ab53b5.

@gitgitgadget gitgitgadget bot added the master label Oct 5, 2024
@gitgitgadget gitgitgadget bot closed this Oct 5, 2024

gitgitgadget bot commented Oct 5, 2024

Closed via 2ab53b5.

gitgitgadget bot commented Oct 5, 2024

On the Git mailing list, Koji Nakamaru wrote (reply to this):

On Sat, Oct 5, 2024 at 2:44 AM Junio C Hamano <[email protected]> wrote:
> > ... The test is
> > disabled for MINGW because hangs treated with this patch occur only for
> > Darwin and there is no simple way to terminate the win32 fsmonitor
> > daemon that hangs.
> > ...
> >      @@ t/t7527-builtin-fsmonitor.sh: test_expect_success "submodule absorbgitdirs impli
> >       +       done
> >       +}
> >       +
> >      -+test_expect_success "submodule implicitly starts daemon by pull" '
> >      ++test_expect_success !MINGW "submodule implicitly starts daemon by pull" '
> >       +       test_atexit "stop_watchdog" &&
> >       +       test_when_finished "stop_git; rm -rf cloned super sub" &&
> >       +
>
> Let me update !MINGW to !WINDOWS while queuing.

I see, thank you.

Koji Nakamaru
