'crafty' took ages to kill - 2006/12/16 20:17 Seems appropriate in the circumstances.
'crafty' is a well-known chess program which can run on UNIX machines. I have just spent quite a while trying to 'kill' an unwanted 'crafty' process running on my UNIX workstation, which I noticed had eaten up some 61 hours of CPU time. Despite
% kill pid
then when that failed
% kill -9 pid
a couple of times, it would not die. Finally, after about 5 kill -9's, the process died. Has anyone seen this before? I'm running a Sun Ultra 80 workstation, Solaris 9, 4 x 450 MHz CPUs, 4 GB RAM, crafty 19.7 built for multi-threaded operation.
What can stop a process responding to SIGKILL? Since there were two copies of this process running (one intentional, one not intentional), each configured to use 4 CPUs, the load average was about 8, but that should not be excessive for a quad-processor machine. The machine did not appear to be under any strain, and interactive performance was fine, so I'm a bit puzzled why this should happen.
I've seen similar things before on a Sun, and once a reboot was required. I'm just not quite sure how it can occur.
better place for a response.. ---------
The wise man always throws himself on the side of his assailants. It is more his interest than it is theirs to find his weak point.
re:'crafty' took ages to kill - 2006/12/16 20:30 Nothing can stop it responding to SIGKILL once the signal has been delivered. A SIGKILL may well be queued for the process (unblockable and unignorable as SIGKILL is) but cannot be delivered because, in Solaris: a) the process might be doing I/O, b) the process might be being traced.
There are probably other reasons why the expected behavior hasn't happened yet.
And then there are bugs in the OS. Yes, even in Solaris.. ---------
The idealist is incorrigible. If he is turned out of his heaven, he makes an ideal out of his hell. - Friedrich Wilhelm Nietzsche, 1844 - 1900
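The "unblockable and unignorable" point above can be illustrated with a small Python sketch. It is only an illustration on a generic POSIX system, not Solaris-specific, and the helper names `can_ignore` and `kill_child_with_sigkill` are made up for this example. It shows that a process is not allowed to change the disposition of SIGKILL, and that a child which ignores SIGTERM still dies from SIGKILL once the signal is delivered.

```python
import signal
import subprocess
import sys


def can_ignore(signum):
    """Return True if this process is allowed to ignore the given signal."""
    try:
        previous = signal.signal(signum, signal.SIG_IGN)
    except (OSError, ValueError):
        # The OS refuses to change the disposition (this is the SIGKILL case).
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the old disposition back
    return True


def kill_child_with_sigkill():
    """Start a child that ignores SIGTERM, send it SIGKILL, return its exit status."""
    child_code = (
        "import signal, time;"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN);"
        "time.sleep(60)"
    )
    child = subprocess.Popen([sys.executable, "-c", child_code])
    child.send_signal(signal.SIGKILL)
    # On POSIX, a negative return code means "terminated by that signal".
    return child.wait(timeout=10)


if __name__ == "__main__":
    print("SIGKILL can be ignored:", can_ignore(signal.SIGKILL))
    print("child exit status:", kill_child_with_sigkill())
```

None of this explains a delay in delivery: a process that is blocked inside the kernel will not act on the signal until it gets back to a point where signals are handled.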
re:'crafty' took ages to kill - 2006/12/16 21:25 Then perhaps it is something which happens when tearing down a large address space. (Probably lots of crosscalls and such). ---------
I think 'no comment' is a splendid expression. I am using it again and again.
re:'crafty' took ages to kill - 2006/12/16 22:17 -XCPU is "softer" than -9 (SIGKILL). It just reports "CPU time exceeded", but the process can still run a bit longer. ---------
Eternity's a terrible thought. I mean, where's it all going to end?
re:'crafty' took ages to kill - 2006/12/16 23:05 If a kill -9 doesn't zap a process I usually try kill -XCPU, and that will many times do the trick. ---------
The price good men pay for indifference to public affairs is to be ruled by evil men.
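The "try one signal, then another" routine described in the posts above could be scripted roughly as follows. This is a minimal sketch, assuming a child process started from Python; the function name `stop_with_escalation`, the default signal order and the grace period are choices made for this example, not anything crafty or Solaris prescribes.

```python
import signal
import subprocess
import sys


def stop_with_escalation(proc,
                         signals=(signal.SIGTERM, signal.SIGXCPU, signal.SIGKILL),
                         grace=5.0):
    """Send each signal in turn, waiting up to `grace` seconds after each one.

    Returns the last signal sent before the process exited, or None if the
    process is still alive after the final signal.
    """
    for sig in signals:
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            continue  # still running, escalate to the next signal
        return sig
    return None


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("stopped after:", stop_with_escalation(child))
```

If the function returns None, the process has not yet acted on SIGKILL either, which is the situation this thread is about; sending more signals is unlikely to help until whatever is holding the process in the kernel finishes.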
re:'crafty' took ages to kill - 2006/12/16 23:34 I do have: limit coredumpsize 0M in my .cshrc file.
coreadm shows:
sparrow /export/home/davek % coreadm
global core file pattern:
init core file pattern: core
global core dumps: disabled
per-process core dumps: enabled
global setid core dumps: disabled
per-process setid core dumps: disabled
global core dump logging: disabled
(I have never used coreadm, so I guess they're the system defaults).
I have not seen any coredumps produced, from either crafty or any other application.
Dr. David Kirkby.
email at: http://atlc.sourceforge.net/contact.html. ---------
The wise man always throws himself on the side of his assailants. It is more his interest than it is theirs to find his weak point.
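For comparison, the csh `limit coredumpsize 0` setting mentioned above is a per-process resource limit. A program can apply the same kind of limit to itself through the standard `RLIMIT_CORE` limit. The sketch below uses Python's `resource` module on a generic POSIX system; the function name `disable_core_dumps` is made up for this example, and it is not a replacement for coreadm.

```python
import resource


def disable_core_dumps():
    """Set the soft core-file size limit of this process to zero.

    The hard limit is left unchanged so the soft limit can be raised again
    later. Returns the (soft, hard) pair in effect afterwards.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print("core limit (soft, hard):", disable_core_dumps())
```

The limit is inherited by processes started afterwards from the same process, which is why putting the csh equivalent in a .cshrc file affects programs launched from that shell.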
re:'crafty' took ages to kill - 2006/12/16 23:47 That line in /etc/system has no effect; and it never had any effect either.
I know that it has been documented in some documents, even some originating from Sun; but it never was a Solaris tunable.
The reason why the tunable gives no error either is fairly simple: as long as the module "sys" isn't loaded, the kernel doesn't know that there's no "sys:coredumpsize" tunable. Since there's no module "sys", the error is never detected. (There never was a Solaris module called "sys" either; and I've checked all source code back to Solaris 2.0, so I *know* that this is a statement of fact.)
I suppose that computers have progressed to the point that they're sufficiently magic to warrant prayer-like command sequences and configuration options, but I digress.
In general you will need to use "coreadm" to limit core dumps or redirect them, on sufficiently recent Solaris versions. (Solaris 7 with kernel patch rev 106541-06 (sparc) or 106542-06 (intel) and later, or later Solaris releases.)
Coredumps could still be an issue considering the above. ---------
I think 'no comment' is a splendid expression. I am using it again and again.
re:'crafty' took ages to kill - 2006/12/17 00:21 Most likely you had a large hash table setting. The original kill command probably caused Crafty to crash, which will then write a .core file. With a big hash, egtb cache, egtb decompression indices, you can get a .core file that will choke a large mule...
Otherwise no idea, as a process can _never_ ignore kill -9...
I don't see this on linux so I am not sure, but the most common problem is the huge core file that can take forever to write, particularly if you are using NFS for your directory. ---------
Eternity's a terrible thought. I mean, where's it all going to end?
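To get a feel for why a large core file can look like a hung process, here is a back-of-the-envelope calculation in Python. Both numbers in the usage line are hypothetical values picked for illustration (a 500 MiB core file and an NFS write rate of 5 MiB/s); neither was measured on the machine discussed in this thread.

```python
def core_write_seconds(core_bytes, bytes_per_second):
    """Rough time to write a core file of the given size at a steady write rate."""
    if bytes_per_second <= 0:
        raise ValueError("write rate must be positive")
    return core_bytes / bytes_per_second


if __name__ == "__main__":
    MIB = 2 ** 20
    # Hypothetical figures: 500 MiB of mapped memory, 5 MiB/s sustained over NFS.
    print(core_write_seconds(500 * MIB, 5 * MIB), "seconds")
```

With those assumed figures the write alone takes well over a minute, during which the process still shows up in ps.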
re:'crafty' took ages to kill - 2006/12/17 00:26 Yes, I did have large hash table settings - probably 500 MB or more for hash & hashp. However, /etc/system contains the line:
set sys:coredumpsize 0
that should prevent coredumps being produced - I do that for security reasons.
I do not normally see this on Solaris, but on this occasion I did. It's a long time since I've seen this behaviour under Solaris (> 1 year), but I must admit I have seen it before.
Am I right in assuming crafty does not use any form of kernel locking on Solaris, but just pthreads for SMP support?
Dr. David Kirkby. Email address at: http://atlc.sourceforge.net/contact.html. ---------
The wise man always throws himself on the side of his assailants. It is more his interest than it is theirs to find his weak point.
re:'crafty' took ages to kill - 2006/12/17 01:09 The only thing that usually does this is the process being hung in the kernel, e.g. it's stuck trying to access a device that's not responding.
Usually it's a device like a disk or tape drive, although years ago SunOS 4.x had a bug where if the user used Control-S to suspend terminal output then the process would be unkillable until they resumed with Control-Q. ---------
It belongs to human nature to hate those you have injured.
re:'crafty' took ages to kill - 2006/12/17 01:55 Yes. It does use the mutex pthread_lock() stuff of course, for critical sections in the SMP code. No idea what else may cause a slow termination. ---------
Eternity's a terrible thought. I mean, where's it all going to end?
re:'crafty' took ages to kill - 2006/12/17 02:57 Sorry -- not chess oriented, but perhaps of use for the other group:
It cannot do you any good: coredumpsize isn't a valid kernel variable, and there has not been any kernel module called sys since at least Solaris 2. So it won't do anything. (Yes, there are Sun documents that claim it should be used, but they are wrong.)