Discussion:
HgWeb.cgi Hanging During Push
Jensen, Aaron
2012-04-19 19:05:57 UTC
Permalink
We're running Mercurial 2.1.1 under IIS using CGI on Windows 2008. We have two server-side hooks (written in PowerShell) that run on pretxnchangegroup. One checks for case collisions if a file was added in any of the incoming changesets, the other checks for merge direction when pushing merges (we have some internal rules about what branches can be merged with other branches). These hooks can take up to a couple minutes for pushes with a lot of changesets. Our repository has over 70k files, and is 17+ GB in size.

We're noticing that if developer #2 pushes while developer #1 is pushing (his python.exe CGI process has locked the repo and our hooks are running), as expected, developer #2's CGI process sits and waits for developer #1's push to finish. However, once developer #1's push succeeds, developer #2's CGI process doesn't detect that the repo is available/unlocked, and never locks the repo or runs any hooks. It just hangs, using no CPU and never growing in memory.

I would expect developer #2 to get a "waiting for lock" message, but the last message Mercurial outputs is "searching for changes". Hitting CTRL+C doesn't stop the push. Developer #2 has to kill hg.exe, or I have to log into our Mercurial server and kill developer #2's CGI process. No repository corruption occurs on either the client or the server.

How can I go about debugging this problem? Does it look familiar to anyone?

<:> Aaron Jensen
Matt Mackall
2012-04-19 21:34:54 UTC
Permalink
Post by Jensen, Aaron
We're running Mercurial 2.1.1 under IIS using CGI on Windows 2008. We
have two server-side hooks (written in PowerShell) that run on
pretxnchangegroup. One checks for case collisions if a file was added
in any of the incoming changesets, the other checks for merge
direction when pushing merges (we have some internal rules about what
branches can be merged with other branches). These hooks can take up
to a couple minutes for pushes with a lot of changesets. Our
repository has over 70k files, and is 17+ GB in size.
We're noticing that if developer #2 pushes while developer #1 is
pushing (his python.exe CGI process has locked the repo and our hooks
are running), as expected, developer #2's CGI process sits and waits
for developer #1's push to finish. However, once developer #1's push
succeeds, developer #2's CGI process doesn't detect that the repo is
available/unlocked, and never locks the repo or runs any hooks. It
just hangs, using no CPU and never growing in memory.
We would actually expect the second push to fail. The push carries the
set of heads as they were before the push started spooling, and if this
doesn't match the set on the server afterwards, we assume a race occurred.
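
Roughly, the idea is this (a simplified sketch, not the actual wireproto.py code): the client sends along the heads it saw when it started the push, and the server re-checks them against the repository's current heads before applying the bundle.

def check_heads(repo, their_heads):
    # illustrative sketch only -- not the real implementation
    current = repo.heads()
    # 'force' skips the check entirely
    return their_heads == ['force'] or their_heads == current

If the check fails, the server answers with an 'unsynced changes' push error instead of applying the bundle.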
Post by Jensen, Aaron
I would expect developer #2 to get a "waiting for lock" message, but
the last message Mercurial outputs is
"searching for changes". Hitting CTRL+C doesn't stop the push.
Developer #2 has to kill hg.exe, or I have to log into our Mercurial
server and kill developer #2's CGI process. No repository corruption
occurs on either the client or the server.
The ctrl-c thing is odd. Probably some quirk of how "signal handling"
interacts with blocked sockets on Windows. Windows launches a separate
thread to handle the signal, so handling in the main thread may be
deferred until the read()-that-never-finishes finishes. Or something.

It's actually good to know that this is happening: it might explain why
some people are force-killing Mercurial on Windows and thereby
interrupting transactions.
Post by Jensen, Aaron
How can I go about debugging this problem? Does it look familiar to anyone?
You probably want to add some instrumentation in mercurial/wireproto.py
around here:

http://www.selenic.com/hg/file/09dd707b522a/mercurial/wireproto.py#l574

Sprinkle around some lines like:

sys.stderr.write("%d: at step X\n" % os.getpid())

I'd also recommend trying to create a simpler/faster test case, possibly
by making a hook that just sleeps for 10 seconds.
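
For example, a tiny helper along those lines (hypothetical, just to show the pattern; the explicit flush means buffered output isn't lost if the process stops making progress):

import os
import sys

def trace(step):
    # tag each message with the CGI process id so interleaved pushes can be
    # told apart, and flush immediately in case the process later hangs
    sys.stderr.write("%d: at step %s\n" % (os.getpid(), step))
    sys.stderr.flush()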
--
Mathematics is the supreme nostalgia of our time.
Jensen, Aaron
2012-04-25 22:53:12 UTC
Permalink
I'd also recommend trying to create a simpler/faster test case, possibly by making a hook that just sleeps for 10 seconds.
I did this and was able to reproduce the hang. I've filed an issue on the bug tracker: http://mercurial.selenic.com/bts/issue3401, including steps to reproduce.
You probably want to add some instrumentation in mercurial/wireproto.py....
sys.stderr.write("%d: at step X\n" % os.getpid())
I'll try this next and add anything I find to the issue tracker. Where would I expect to see the output?
Matt Mackall
2012-04-25 23:14:27 UTC
Permalink
Post by Jensen, Aaron
I'd also recommend trying to create a simpler/faster test case, possibly by making a hook that just sleeps for 10 seconds.
I did this and was able to reproduce the hang. I've filed an issue on the bug tracker: http://mercurial.selenic.com/bts/issue3401, including steps to reproduce.
You probably want to add some instrumentation in mercurial/wireproto.py....
sys.stderr.write("%d: at step X\n" % os.getpid())
I'll try this next and add anything I find to the issue tracker. Where would I expect to see the output?
With Apache, it'll end up in an error.log (not to be confused with
access.log). In IIS, somewhere in the system event log.
--
Mathematics is the supreme nostalgia of our time.
Jensen, Aaron
2012-04-25 23:57:13 UTC
Permalink
With Apache, it'll end up in an error.log (not to be confused with access.log). In IIS, somewhere in the system event log.
I couldn't find anything in the system event log, so I hacked hgweb.cgi to redirect stderr to a file:

import os
import sys

# send this CGI process's stderr to its own log file, named by process id
errlog = "C:/inetpub/logs/httperr.%d.log" % os.getpid()
sys.stderr = open(errlog, "w")
sys.stderr.write("Writing to standard error.\n")
sys.stderr.flush()

I had to add the sys.stderr.flush() because the hanging process never reaches a point where its buffered stderr output would otherwise get written out.

It looks like it's hanging while waiting for the repo lock to release (at approximately line 589 of wireproto.py):

finally:
    lock.release() # <- here
return pushres(r)
Matt Mackall
2012-04-26 19:08:38 UTC
Permalink
Post by Jensen, Aaron
With Apache, it'll end up in an error.log (not to be confused with access.log). In IIS, somewhere in the system event log.
import os
errlog = "C:/inetpub/logs/httperr.%d.log" % os.getpid()
sys.stderr = open(errlog, "w")
sys.stderr.write("Writing to standard error.\n")
sys.stderr.flush()
I had to add the sys.stderr.flush() because the process that hangs never gets to a point where stderr is written out.
lock.release() # <- here
return pushres(r)
It's probably running your changegroup or incoming hooks, which are run
in the lock release callback.
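
As a rough illustration of that pattern (simplified, not the real lock.py): the hook invocations are queued on the lock and only fire once the last holder releases it, which is why they show up "inside" lock.release().

# illustrative stand-in for Mercurial's lock class, not the real code
class lock(object):
    def __init__(self, releasefn=None):
        self.held = 1
        self.releasefn = releasefn   # cleanup to run when the lock is dropped
        self.postrelease = []        # queued callbacks, e.g. changegroup/incoming hooks

    def release(self):
        self.held -= 1
        if self.held == 0:
            if self.releasefn:
                self.releasefn()
            # ... the on-disk lock file is removed here ...
            for callback in self.postrelease:
                callback()           # hooks run only after the repo is unlocked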
--
Mathematics is the supreme nostalgia of our time.
Jensen, Aaron
2012-04-26 20:05:52 UTC
Permalink
It's probably running your changegroup or incoming hooks, which are run in the lock release callback.
No callbacks get run by the hanging CGI process. I watch it in Process Explorer, and it never spawns any sub-processes to run the hooks. The first process does. I can see its sub-processes. The second process never runs anything. Just sits there.
Matt Mackall
2012-04-26 23:19:12 UTC
Permalink
Post by Jensen, Aaron
It's probably running your changegroup or incoming hooks, which are run in the lock release callback.
No callbacks get run by the hanging CGI process. I watch it in
Process Explorer, and it never spawns any sub-processes to run the
hooks. The first process does. I can see its sub-processes. The
second process never runs anything. Just sits there.
Ok, how about you instrument lock.release() and see where it's blocking.
--
Mathematics is the supreme nostalgia of our time.
Jensen, Aaron
2012-04-27 19:01:30 UTC
Permalink
Post by Matt Mackall
Ok, how about you instrument lock.release() and see where it's blocking.
See below for my instrumentation changes [1] and the sequence of calls by push #1 (which succeeds) and push #2 (which hangs) [2].

It looks like unbundle is *not* hanging. I didn’t notice before, but unbundle is returning with an unsynced changes error. I think the hang is happening somewhere else in the code.

It looks like after unbundle, lock.release() gets called one more time. Where in the code should I go next?

<:> Aaron


[1] Here are my instrumentation changes:

wireproto.py:

    try:
        proto.getfile(fp)
        sys.stderr.write("%d: at step 1\n" % os.getpid()); sys.stderr.flush()
        lock = repo.lock()
        sys.stderr.write("%d: at step 2\n" % os.getpid()); sys.stderr.flush()
        try:
            if not check_heads():
                sys.stderr.write("%d: at step 3\n" % os.getpid()); sys.stderr.flush()
                # someone else committed/pushed/unbundled while we
                # were transferring data
                return pusherr('unsynced changes')

            # push can proceed
            sys.stderr.write("%d: at step 4\n" % os.getpid()); sys.stderr.flush()
            fp.seek(0)
            sys.stderr.write("%d: at step 5\n" % os.getpid()); sys.stderr.flush()
            gen = changegroupmod.readbundle(fp, None)

            try:
                sys.stderr.write("%d: at step 6\n" % os.getpid()); sys.stderr.flush()
                r = repo.addchangegroup(gen, 'serve', proto._client())
            except util.Abort, inst:
                sys.stderr.write("abort: %s\n" % inst); sys.stderr.flush()
        finally:
            sys.stderr.write("%d: at step 7\n" % os.getpid()); sys.stderr.flush()
            lock.release()
        sys.stderr.write("%d: at step 8\n" % os.getpid()); sys.stderr.flush()
        return pushres(r)

    finally:
        sys.stderr.write("%d: at step 9\n" % os.getpid()); sys.stderr.flush()
        fp.close()
        sys.stderr.write("%d: at step 10\n" % os.getpid()); sys.stderr.flush()
        os.unlink(tempname)
        sys.stderr.write("%d: at step 11\n" % os.getpid()); sys.stderr.flush()

lock.py:
    try:
        proto.getfile(fp)
        sys.stderr.write("%d: at step 1\n" % os.getpid()); sys.stderr.flush()
        lock = repo.lock()
        sys.stderr.write("%d: at step 2\n" % os.getpid()); sys.stderr.flush()
        try:
            if not check_heads():
                sys.stderr.write("%d: at step 3\n" % os.getpid()); sys.stderr.flush()
                # someone else committed/pushed/unbundled while we
                # were transferring data
                return pusherr('unsynced changes')

            # push can proceed
            sys.stderr.write("%d: at step 4\n" % os.getpid()); sys.stderr.flush()
            fp.seek(0)
            sys.stderr.write("%d: at step 5\n" % os.getpid()); sys.stderr.flush()
            gen = changegroupmod.readbundle(fp, None)

            try:
                sys.stderr.write("%d: at step 6\n" % os.getpid()); sys.stderr.flush()
                r = repo.addchangegroup(gen, 'serve', proto._client())
            except util.Abort, inst:
                sys.stderr.write("abort: %s\n" % inst); sys.stderr.flush()
        finally:
            sys.stderr.write("%d: at step 7\n" % os.getpid()); sys.stderr.flush()
            lock.release()
        sys.stderr.write("%d: at step 8\n" % os.getpid()); sys.stderr.flush()
        return pushres(r)

    finally:
        sys.stderr.write("%d: at step 9\n" % os.getpid()); sys.stderr.flush()
        fp.close()
        sys.stderr.write("%d: at step 10\n" % os.getpid()); sys.stderr.flush()
        os.unlink(tempname)
        sys.stderr.write("%d: at step 11\n" % os.getpid()); sys.stderr.flush()


[2]
Call sequence, push #1:
13304: at step 1
13304: at step 2
13304: at step 4
13304: at step 5
13304: at step 6
13304: at step 7
13304: at step 101
13304: at step 103
13304: at step 104
13304: at step 105
13304: at step 106
13304: at step 108
13304: at step 109
13304: at step 8
13304: at step 9
13304: at step 10
13304: at step 11
13304: at step 101
13304: at step 109

Call sequence, push #2 (which hangs):
7668: at step 1
7668: at step 101
7668: at step 109
7668: at step 2
7668: at step 3
7668: at step 7
7668: at step 101
7668: at step 103
7668: at step 104
7668: at step 105
7668: at step 106
7668: at step 109
7668: at step 9
7668: at step 10
7668: at step 11
7668: at step 101
7668: at ste
Matt Mackall
2012-04-27 19:31:32 UTC
Permalink
Post by Jensen, Aaron
Post by Matt Mackall
Ok, how about you instrument lock.release() and see where it's blocking.
See below for my instrumentation changes [1] and the sequence of calls by push #1 (which succeeds) and push #2 (which hangs) [2].
It looks like unbundle is *not* hanging. I didn’t notice before, but unbundle is returning with an unsynced changes error. I think the hang is happening somewhere else in the code.
It looks like after unbundle, lock.release() gets called one more time. Where in the code should I go next?
Hmm, the numbers in your code and in your trace don't agree (there are
no '1xx' steps). But it seems that everything is working on the server
side with the possible exception of connection teardown.

It's probably time to instrument the client (around wireproto.py:292)
and/or try to attack it with Wireshark.

FYI, I've got about 90% confidence that it's something peculiar to your
setup aside from your hook, as we've got lots of IIS users with hooks
out there and enough devs to run into this problem if it were generic.

..unless of course, you're using an in-process Python hook, in which
case that's probably to blame.
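
For clarity, the difference looks like this in the hgrc (hook names and paths here are just placeholders):

[hooks]
# external hook: runs as a separate child process
pretxnchangegroup.external = powershell.exe -File C:\hooks\check.ps1
# in-process hook: runs inside the server's own Python interpreter
pretxnchangegroup.inprocess = python:hooklib.check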
--
Mathematics is the supreme nostalgia of our time.
Jensen, Aaron
2012-04-27 20:04:18 UTC
Permalink
Hmm, the numbers in your code and in your trace don't agree (there are no '1xx' steps). But it seems that everything is working on the server side
with the possible exception of connection teardown.
Woops. That's because I copy/pasted the wrong code. The 100s were from the lock class, but I copied the wireproto code a second time.
It's probably time to instrument the client (around wireproto.py:292) and/or try to attack it with Wireshark.
FYI, I've got about 90% confidence that it's something peculiar to your setup aside from your hook, as we've got lots of IIS users with hooks out there and enough devs to run into this problem if it were generic.
This problem started when we upgraded our server to 2.1.1 a couple months ago. We also asked developers to upgrade, but I don't know how many have. We run as standard a setup as we can, on both the server and the client. What kind of setup issues would cause something like this? I've got no local hooks running.
We'll probably have to go the Wireshark route (downloading now), unless there are instructions for getting Mercurial running from source on
Windows.
I've run a Wireshark capture. I'll send it directly to you and not the mailing list. Hopefully, you can make sense of it faster than me...
..unless of course, you're using an in-process Python hook, in which case that's probably to blame.
We used to have an in-process hook, but it was moved to PowerShell right around the time this problem started happening. But that hook isn't running as part of this.
Matt Mackall
2012-04-27 20:22:36 UTC
Permalink
Post by Jensen, Aaron
Hmm, the numbers in your code and in your trace don't agree (there are no '1xx' steps). But it seems that everything is working on the server side
with the possible exception of connection teardown.
Woops. That's because I copy/pasted the wrong code. The 100s were from the lock class, but I copied the wireproto code a second time.
It's probably time to instrument the client (around wireproto.py:292) and/or try to attack it with Wireshark.
FYI, I've got about 90% confidence that it's something peculiar to your setup aside from your hook, as we've got lots of IIS users with hooks out there and enough devs to run into this problem if it were generic.
This problem started when we upgraded our server to 2.1.1 a couple months ago. We also asked developers to upgrade, but I don't know how many have. We run as standard a setup as we can, on both the server and the client. What kind of setup issues would cause something like this? I've got no local hooks running.
We'll probably have to go the Wireshark route (downloading now), unless there are instructions for getting Mercurial running from source on
Windows.
http://mercurial.selenic.com/wiki/HackableMercurial makes it easy.
Post by Jensen, Aaron
I've run a Wireshark capture. I'll send it directly to you and not the mailing list. Hopefully, you can make sense of it faster than me...
..unless of course, you're using an in-process Python hook, in which case that's probably to blame.
We used to have an in-process hook, but it was moved to PowerShell right around the time this problem started happening. But that hook isn't running as part of this.
The trick is to click on an HTTP packet and do "follow TCP stream", which
shows the server sending back the following..

Content-Type: application/mercurial-0.1
Server: Microsoft-IIS/7.0
9980: at step 1
9980: at step 2
9980: at step 4
9980: at step 5
9980: at step 6
9980: at step 7
9980: at step 101
9980: at step 103
9980: at step 104
9980: at step 105
9980: at step 106
9980: at step 108
9980: at step 109
9980: at step 8
9980: at step 9
9980: at step 10
9980: at step 11
9980: at step 101
9980: at step 109

Ok, that looks familiar.. and shouldn't be there.

Date: Fri, 27 Apr 2012 19:57:41 GMT
Connection: close
Content-Length: 102

1
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files

..and that seems to be shell output getting intermingled into the
protocol stream. I'm pretty sure that's not supposed to be there either.
Do you have ui.verbose enabled on the server?
--
Mathematics is the supreme nostalgia of our time.
Jensen, Aaron
2012-04-27 21:23:17 UTC
Permalink
..and that seems to be shell output getting intermingled into the protocol stream. I'm pretty sure that's not supposed to be there either.
I've reverted all my changes on the server (I'm now running Mercurial 2.1.1 with Python 2.6). The step %d statements are gone, but that "1" is still there (I'm assuming that's the part of the output you're concerned with):

HTTP/1.1 200 Script output follows
Content-Type: application/mercurial-0.1
Server: Microsoft-IIS/7.0
Date: Fri, 27 Apr 2012 21:07:46 GMT
Connection: close
Content-Length: 102

1
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files

I updated my hook to call a sleep.exe program instead of using PowerShell (cmd.exe and powershell.exe interact... poorly):

[hooks]
pretxnchangegroup.sleep = sleep.exe 10

The 1 still shows *and* the second push still hangs. I disabled my hook and the 1 still shows up.
Do you have ui.verbose enabled on the server?
No:

$ hg showconfig ui
ui.editor=notepad
ui.ssh="TortoisePlink.exe" -ssh -2
ui.username=Aaron Jensen <ajense
Matt Mackall
2012-04-27 22:08:25 UTC
Permalink
Post by Jensen, Aaron
..and that seems to be shell output getting intermingled into the protocol stream. I'm pretty sure that's not supposed to be there either.
I've reverted all my changes on the server (I'm now running Mercurial
2.1.1 with Python 2.6). The step %d statements are gone, but that
"1" is still there (I'm assuming that's the part of the output you're
Nope, the 1 is fine. It's the result code. It's the -rest- that I was
worried about. But I guess I've just forgotten how that's supposed to
work.
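
For reference, the client treats that body roughly like this (a simplified sketch from memory, not the exact 2.1.1 code): the first line is the integer result of the unbundle, and everything after it is server output that gets echoed back with a "remote:" prefix.

# illustrative sketch of the client-side handling
def handle_unbundle_response(body, ui):
    # body looks like "1\nadding changesets\nadding manifests\n..."
    ret, output = body.split('\n', 1)
    for line in output.splitlines():
        ui.status("remote: %s\n" % line)
    return int(ret)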
Post by Jensen, Aaron
The 1 still shows *and* the second push still hangs. I disabled my hook and the 1 still shows up.
These are traces of the first push or the second push? The second push
is the interesting one.
--
Mathematics is the supreme nostalgia of our time.
Jensen, Aaron
2012-04-27 22:31:59 UTC
Permalink
These are traces of the first push or the second push? The second push is the interesting one.
The Wireshark
Jensen, Aaron
2012-05-07 19:21:44 UTC
Permalink
This issue is still causing us pain. Has anyone tried to reproduce this on their own system(s)? I don't know of any setup differences that would cause this. On the client, we run the stock version of Mercurial 2.1.1 that ships with TortoiseHg [1]. On the server, we run HgWeb 2.1.1 installed from the stock Mercurial/Python installer downloaded from the Mercurial website [2].

I re-ran the pushes with the debug option enabled [3][4].

What else can I do to help this get resolved?

<:> Aaron


[1] Here's the output from showconfig on the client. I've removed the diff-patterns, merge-patterns, and merge-tools sections.
auth.****.prefix= *****
auth.****.username=*******
auth.****.password=********
bundle.mainreporoot=F:\Build\PushHang
extensions.fetch=
extensions.rebase=
extensions.hgext.extdiff=
extensions.purge=
extensions.mq=
extensions.transplant=
extensions.shelve=F:\Build\hgshelve\hgshelve.py
extensions.progress=
extensions.share=
paths.default=http://*****/pushhang
tortoisehg.vdiff=bcomp
ui.editor=notepad
ui.ssh="TortoisePlink.exe" -ssh -2
ui.username=Aaron Jensen <***@webmd.net>
ui.ignore=C:\Users\*******\hg.ignore
ui.merge=bcomp
web.cacerts=C:\Program Files\TortoiseHg\hgrc.d\cacert.pem

[2] Here's the output from showconfig on the server. The paths point to the repos out of which we run HgWeb. I've removed the diff-patterns, merge-patterns, and merge-tools sections.
bundle.mainreporoot=D:\Build\CM
paths.default=https://*****/cm
ui.editor=notepad
ui.ssh="TortoisePlink.exe" -ssh -2
web.cacerts=C:\Program Files\TortoiseHg\hgrc.d\cacert.pem

[3] Debug output from the first push:
pushing to http://*****/pushhang
using http://*****/pushhang
sending capabilities command
query 1; heads
sending batch command
searching for changes
all remote heads known locally
sending branchmap command
1 changesets found
list of changesets:
916b2bb2cbc411b80d07e30c3addfdf915b77edd
bundling: 1/1 changesets (100.00%)
bundling: 1/1 manifests (100.00%)
bundling: add.txt 1/1 files (100.00%)
sending unbundle command
sending 298 bytes
sending: 0 kb
sending: 0 kb
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
sending listkeys command
sending pushkey command
sending 0 bytes
checking for updated bookmarks
sending listkeys command

[4] Debug output from the second push, which hangs:
pushing to http://*****/pushhang
using http://*****/pushhang
sending capabilities command
query 1; heads
sending batch command
searching for changes
all remote heads known locally
sending branchmap command
18 changesets found
list of changesets:
7cfe7e1b01fb9d7820a244e7b314f3a597144040
f9cd96f08ae17049f2f9ce182e26dfd941c30a8f
a79a5070323afcd8f87dc0080e4ac2f988e71245
1f0105cb8d7a8908a17b07b4040bb1f31146cf92
63daecc17f4e25d191056fd29381505ce0524564
e5ab64942dc20ea02bad8802f66a34384d97fa2a
3487a266f6474869359a014abf014169105da100
b0bf3af1a5991bb91491b4520a6e2787f934d701
8af36277dfb1a1813ba0a3ccb6904db3559b0023
9c73303d8171e8e2d37d592064c374a13014000d
8617b9fce1dae584feff6b88a34f80d3bbd7e035
153b9fa54c57cee178116c0b2ac793c26ae516a5
59eef20fad56be1d44eb5b0444ebbf4bb5dc2f96
0365fe32ed631d89bb7915e352d35a8aa9888ea9
173832c5957438dd3289c2ae645dab6fb9d5ea96
94a55dd9031932a55bf1539bc718129bddfd83e3
7386a1642444e834a2f5d9afb0d504210e7a616b
8890b5bc4237b2997fcbe869a055d6c06df32af9
bundling: 1/18 changesets (5.56%)
bundling: 2/18 changesets (11.11%)
bundling: 3/18 changesets (16.67%)
bundling: 4/18 changesets (22.22%)
bundling: 5/18 changesets (27.78%)
bundling: 6/18 changesets (33.33%)
bundling: 7/18 changesets (38.89%)
bundling: 8/18 changesets (44.44%)
bundling: 9/18 changesets (50.00%)
bundling: 10/18 changesets (55.56%)
bundling: 11/18 changesets (61.11%)
bundling: 12/18 changesets (66.67%)
bundling: 13/18 changesets (72.22%)
bundling: 14/18 changesets (77.78%)
bundling: 15/18 changesets (83.33%)
bundling: 16/18 changesets (88.89%)
bundling: 17/18 changesets (94.44%)
bundling: 18/18 changesets (100.00%)
bundling: 1/18 manifests (5.56%)
bundling: 2/18 manifests (11.11%)
bundling: 3/18 manifests (16.67%)
bundling: 4/18 manifests (22.22%)
bundling: 5/18 manifests (27.78%)
bundling: 6/18 manifests (33.33%)
bundling: 7/18 manifests (38.89%)
bundling: 8/18 manifests (44.44%)
bundling: 9/18 manifests (50.00%)
bundling: 10/18 manifests (55.56%)
bundling: 11/18 manifests (61.11%)
bundling: 12/18 manifests (66.67%)
bundling: 13/18 manifests (72.22%)
bundling: 14/18 manifests (77.78%)
bundling: 15/18 manifests (83.33%)
bundling: 16/18 manifests (88.89%)
bundling: 17/18 manifests (94.44%)
bundling: 18/18 manifests (100.00%)
bundling: .hgtags 1/2 files (50.00%)
bundling: .hgtags 1/2 files (50.00%)
bundling: b.txt 2/2 files (100.00%)
sending unbundle command
sending 3269 bytes
sending: 3/6 kb (50.00%)
sending: 3/6 kb (50.00%)
Matt Mackall
2012-05-07 21:30:44 UTC
Permalink
Post by Jensen, Aaron
This issue is still causing us pain. Has anyone tried to reproduce
this on their own system(s)? I don't know of any setup differences
that would cause this. On the client, we run the stock version of
Mercurial 2.1.1 that ships with Tortoise HG [1]. On the server, we
run HgWeb 2.1.1 installed from the stock Mercurial/Python installer
downloaded from the Mercurial website [2].
Have you tried running it without IIS? Perhaps with just "hg serve".
Not only will this tell us if it's something specific to your IIS
config, it'll also probably be easier to debug. See if you can create
some working combination, then evolve it one step at a time to your
broken config to locate the broken piece.

Beyond that, I really don't know what to suggest. IIS + hgweb + wsgi +
multiple pushes + hooks is known to work and there are far too many
minutiae in such a setup for us to effectively debug remotely.
--
Mathematics is the supreme nostalgia of our time.
Jensen, Aaron
2012-05-07 23:28:38 UTC
Permalink
Post by Matt Mackall
Have you tried running it without IIS? Perhaps with just "hg serve".
The problem does not happen when I serve the repository with hg serve. I was running hg serve out of the repository's directory, i.e. it was only serving one repo. When going through hg serve, the second push fails with a "push creates new head" abort message [1].

For some reason, push #2 doesn't see the heads added by the in-progress push #1. This seemed like a caching issue to me. I disabled all of IIS's output caching, but push #2 still hangs. I'm not sure what else to try. :-/

There is something strange going on between IIS and Python/Mercurial. We never had any reports of hanging pushes before we upgraded to 2.1.1. As part of the upgrade, we uninstalled and re-installed the Mercurial/Python package. We also changed our IIS configuration, but I have reverted back to what we had before the upgrade, and the problem still occurs.

<:> Aaron


[1] Here is the debug output from the second push when pushing to hg serve:
pushing to http://*****:8000/
using http://*****:8000/
sending capabilities command
query 1; heads
sending batch command
searching for changes
taking quick initial sample
searching: 2 queries
query 2; still undecided: 41, sample size is: 41
sending known command
2 total queries
sending branchmap command
new remote heads on branch 'default'
new remote head 58bf642e05d4
abort: push creates new remote head 58bf642e05d4!
(you should pu
Matt Mackall
2012-05-08 12:50:32 UTC
Permalink
Post by Jensen, Aaron
Post by Matt Mackall
Have you tried running it without IIS? Perhaps with just "hg serve".
The problem does not happen when I serve the repository with hg
serve. I was running hg serve out of the repository's directory,
i.e. it was only serving one repo. When going through hg serve, the
second push fails with a "push creates new head" abort message [1].
That's expected, right? So everything here is as expected and now
we've narrowed it down to something in how IIS works, yes?
Post by Jensen, Aaron
For some reason, push #2 doesn't see the heads added by the
in-progress push #1. This seemed like a caching issue to me. I
disabled all of IIS's output caching, but push #2 still hangs. I'm
not sure what else to try. :-/
Seems like a threading + caching issue to me. Does your IIS setup
serve multiple repos from one config? If so, it should be creating a
fresh repo object for every request. That's suboptimal from a
performance standpoint, but should rule out a lot of potential cache issues.
Post by Jensen, Aaron
There is something strange going on between IIS and
Python/Mercurial. We never had any reports of hanging pushes before
we upgraded to 2.1.1. As part of the upgrade, we uninstalled and
re-installed the Mercurial/Python package. We also changed our IIS
configuration, but I have reverted back to what we had before the
upgrade, and the problem still occurs.
Oh! If there's a known-good version, you should probably try
bisecting the range with 'hg bisect'. That should locate the problem
quite quickly. This is probably easiest if you can get HackableMercurial
to run under IIS (I think you've already managed that?).
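
In a clone of the Mercurial source repository, the workflow would be roughly (revision names here are illustrative; use whatever versions you know to be good and bad):

$ hg bisect --reset
$ hg bisect --good 1.9
$ hg bisect --bad 2.1.1
(deploy that revision under IIS, try the double push, then mark it with
 "hg bisect --good" or "hg bisect --bad" and repeat)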

Also note that unless you're planning to push around draft changesets,
there's probably no compelling reason to use 2.1 on your server over.
--
Mathematics is the supreme nostalgia of our time.
Angel Ezquerra
2012-05-08 14:01:09 UTC
Permalink
Post by Matt Mackall
Post by Jensen, Aaron
Post by Matt Mackall
Have you tried running it without IIS? Perhaps with just "hg serve".
The problem does not happen when I serve the repository with hg
serve.  I was running hg serve out of the repository's directory,
i.e. it was only serving one repo.  When going through hg serve, the
second push fails with a "push creates new head" abort message [1].
That's expected, right? So everything here is as expected and now
we've narrowed it down to something in how IIS works, yes?
Post by Jensen, Aaron
For some reason, push #2 doesn't see the heads added by the
in-progress push #1.  This seemed like a caching issue to me.  I
disabled all of IIS's output caching, but push #2 still hangs.  I'm
not sure what else to try. :-/
Seems like a threading + caching issue to me. Does your IIS setup
serve multiple repos from one config? If so, it should be creating a
fresh repo object for every request. That's suboptimal from a
performance standpoint, but should rule out a lot of potential cache issues.
Matt, can you explain what you mean here? What is the alternative to
having a single server serving multiple repos from one config?

We are serving around a hundred repos through a single
apache+WSGI+hgweb server. There is a single hgweb.config file that
sets a single "path" that points to the root of the folder containing
these repos (actually, it contains several folders, each containing
several repos, subrepos, etc).

Is that setup suboptimal?
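
For concreteness, it's essentially something like this (the real path replaced with a placeholder):

[paths]
/ = /srv/hg/repos/**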
Post by Matt Mackall
Post by Jensen, Aaron
There is something strange going on between IIS and
Python/Mercurial.  We never had any reports of hanging pushes before
we upgraded to 2.1.1.  As part of the upgrade, we uninstalled and
re-installed the Mercurial/Python package.  We also changed our IIS
configuration, but I have reverted back to what we had before the
upgrade, and the problem still occurs.
Oh! If there's a known-good version, you should probably try
bisecting the range with 'hg bisect'. That should locate the problem
quite quickly. This is probably easiest if you can get HackableMercurial
to run under IIS (I think you've already managed that?).
Also note that unless you're planning to push around draft changesets,
there's probably no compelling reason to use 2.1 on your server over.
what do you mean when you say that there is no compelling reason to
use 2.1? Are you suggesting them to upgrade to 2.2 or to stay on 2.0?

We are still using 1.9 and we have not updated in a while because the
official windows installers do not work well with hgweb on windows
(issue #2582: http://mercurial.selenic.com/bts/issue2582).

Cheers,

Angel
Matt Mackall
2012-05-08 16:29:00 UTC
Permalink
Post by Angel Ezquerra
Post by Matt Mackall
Post by Jensen, Aaron
Post by Matt Mackall
Have you tried running it without IIS? Perhaps with just "hg serve".
The problem does not happen when I serve the repository with hg
serve. I was running hg serve out of the repository's directory,
i.e. it was only serving one repo. When going through hg serve, the
second push fails with a "push creates new head" abort message [1].
That's expected, right? So everything here is as expected and now
we've narrowed it down to something in how IIS works, yes?
Post by Jensen, Aaron
For some reason, push #2 doesn't see the heads added by the
in-progress push #1. This seemed like a caching issue to me. I
disabled all of IIS's output caching, but push #2 still hangs. I'm
not sure what else to try. :-/
Seems like a threading + caching issue to me. Does your IIS setup
serve multiple repos from one config? If so, it should be creating a
fresh repo object for every request. That's suboptimal from a
performance standpoint, but should rule out a lot of potential cache issues.
Matt, can you explain what you mean here? What is the alternative to
having a single server serving multiple repos from one config?
This is what used to be the difference between hgweb/hgwebdir.
Post by Angel Ezquerra
We are serving around a hundred repos through a single
apache+WSGI+hgweb server. There is a single hgweb.config file that
sets a single "path" that points to the root of the folder containing
these repos (actually, it contains several folders, each containing
several repos, subrepos, etc).
Is that setup suboptimal?
No, the code is suboptimal. It will recreate repo objects for each
request rather than try to cache the most popular N.
Post by Angel Ezquerra
Post by Matt Mackall
Post by Jensen, Aaron
There is something strange going on between IIS and
Python/Mercurial. We never had any reports of hanging pushes before
we upgraded to 2.1.1. As part of the upgrade, we uninstalled and
re-installed the Mercurial/Python package. We also changed our IIS
configuration, but I have reverted back to what we had before the
upgrade, and the problem still occurs.
Oh! If there's a known-good version, you should probably try
bisecting the range with 'hg bisect'. That should locate the problem
quite quickly. This is probably easiest if you can get HackableMercurial
to run under IIS (I think you've already managed that?).
Also note that unless you're planning to push around draft changesets,
there's probably no compelling reason to use 2.1 on your server over.
what do you mean when you say that there is no compelling reason to
use 2.1? Are you suggesting them to upgrade to 2.2 or to stay on 2.0?
There are no server-side features in 2.1 of any significance. If you use
1.9 instead of 2.1 on your server, it's unlikely anyone will know the
difference. 2.2.1, however, should be noticeably faster.
Post by Angel Ezquerra
We are still using 1.9 and we have not updated in a while because the
official windows installers do not work well with hgweb on windows
(issue #2582: http://mercurial.selenic.com/bts/issue2582).
It's really too bad no one knows how the Windows toolchain works?
--
Mathematics is the supreme nostalgia of our time.
Adrian Buehlmann
2012-05-08 16:46:11 UTC
Permalink
Post by Matt Mackall
Post by Angel Ezquerra
We are still using 1.9 and we have not updated in a while because the
official windows installers do not work well with hgweb on windows
(issue #2582: http://mercurial.selenic.com/bts/issue2582).
It's really too bad no one knows how the Windows toolchain works?
jberezanski has posted a workaround on that BTS entry. Did Angel read it?

As I understand it, there is no such thing as an "official installer"
for server side setups on Windows.
Angel Ezquerra
2012-05-09 10:17:40 UTC
Permalink
Post by Adrian Buehlmann
Post by Matt Mackall
Post by Angel Ezquerra
We are still using 1.9 and we have not updated in a while because the
official windows installers do not work well with hgweb on windows
(issue #2582: http://mercurial.selenic.com/bts/issue2582).
It's really too bad no one knows how the Windows toolchain works?
jberezanski has posted a workaround on that BTS entry. Did Angel read it?
Yes, I read the workaround but I don't really know what to do with it.
I've also seen Adrian's comments in there. It seems that it is a
problem with python itself? Does it mean that there is nothing that
can be done on mercurial's end?

When I originally configured our server I used Patrick Mezard's custom
mercurial builds. Maybe he can chime in and give his opinion on all of
this?
Post by Adrian Buehlmann
As I understand it, there is no such thing as an "official installer"
for server side setups on Windows.
Maybe they are not fully official, but the source installers on the
bitbucket tortoisehg "winbuild" repo
(https://bitbucket.org/tortoisehg/thg-winbuild/downloads) seem pretty
official to me. You can get to them from the main tortoisehg web page
in just two clicks...

IMHO this is a serious issue. There is currently no easy way to set up
a Mercurial web server on Windows (unless you use hg serve, which is
not recommended).

Cheers,

Angel
Adrian Buehlmann
2012-05-09 11:15:46 UTC
Permalink
Post by Angel Ezquerra
Post by Adrian Buehlmann
Post by Matt Mackall
Post by Angel Ezquerra
We are still using 1.9 and we have not updated in a while because the
official windows installers do not work well with hgweb on windows
(issue #2582: http://mercurial.selenic.com/bts/issue2582).
It's really too bad no one knows how the Windows toolchain works?
jberezanski has posted a workaround on that BTS entry. Did Angel read it?
Yes, I read the workaround but I don't really know what to do with it.
Follow it?
Angel Ezquerra
2012-05-09 11:46:57 UTC
Permalink
Post by Adrian Buehlmann
Post by Angel Ezquerra
Post by Adrian Buehlmann
Post by Matt Mackall
Post by Angel Ezquerra
We are still using 1.9 and we have not updated in a while because the
official windows installers do not work well with hgweb on windows
(issue #2582: http://mercurial.selenic.com/bts/issue2582).
It's really too bad no one knows how the Windows toolchain works?
jberezanski has posted a workaround on that BTS entry. Did Angel read it?
Yes, I read the workaround but I don't really know what to do with it.
Follow it?
What I meant is that I don't know how to use that advice to ensure
that the mercurial builds on thg-winbuild work out of the box. I
discussed this briefly with Steve, and if I understood him correctly he
thinks that he'd need to build Python in order to apply this fix...

Do we expect users to go through the steps in that issue comment in
order to be able to run a windows-based mercurial web server? I don't
think most Windows users, even administrators, are used to building
the software packages they use... And if we do, this means that the
windows section of the HgWebDirStepByStep wiki is incomplete and
should point to that issue and that comment.

Angel
Scott Palmer
2012-05-09 11:52:55 UTC
Permalink
Post by Angel Ezquerra
Post by Adrian Buehlmann
Post by Angel Ezquerra
Post by Adrian Buehlmann
Post by Matt Mackall
Post by Angel Ezquerra
We are still using 1.9 and we have not updated in a while because the
official windows installers do not work well with hgweb on windows
(issue #2582: http://mercurial.selenic.com/bts/issue2582).
It's really too bad no one knows how the Windows toolchain works?
jberezanski has posted a workaround on that BTS entry. Did Angel read it?
Yes, I read the workaround but I don't really know what to do with it.
Follow it?
What I meant is that I don't know how to use that advice to ensure
that the mercurial builds on thg-winbuild work out of the box. I
discussed this briefly with Steve, and if I understood him correctly he
thinks that he'd need to build Python in order to apply this fix...
You just need to run the mt program after the regular build to fix the broken DLLs that python produces. There is no need to fix python itself.
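
Something along these lines, run against the freshly built binaries (the file names here are just placeholders):

mt.exe -manifest example.dll.manifest -outputresource:example.dll;#2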
Post by Angel Ezquerra
Do we expect users to go through the steps in that issue comment in
order to be able to run a windows-based mercurial web server?
No, run them as part of the build process like anything else.
Post by Angel Ezquerra
I don't
think most Windows users, even administrators, are used to building
the software packages they use...
Right, that's an oddity of the unix world, where end users are expected to be developers and system administrators or else be doomed to run the ancient software that has been blessed by the distribution.

Scott

Dennis Brakhane
2012-04-28 19:43:38 UTC
Permalink
    $ hg showconfig ui
    ui.editor=notepad
    ui.ssh="TortoisePlink.exe" -ssh -2
Are you sure you're calling it with the same account that the webserver
does? That the server runs the process under your username seems
strange.
Jensen, Aaron
2012-05-02 20:16:17 UTC
Permalink
Are you sure you're calling it with the same account that the webserver does? That the server runs the process under your username seems strange.
I ran it as me to show the global configuration. I re-ran it as the user running HgWeb, and this is the output (I've removed diff-patterns and merge-tools):

bundle.mainreporoot=D:\Build\CM
paths.default=https://pdxhg.webmdhealth.net/cm
ui.editor=notepad
ui.ssh="TortoisePlink.exe" -ssh -2
web.cacerts=C:\Program Files\TortoiseHg\hgrc.d\ca