parallel fraud - Robert Hyatt's creative bookkeeping - 2006/08/19 23:20

We all know how many failures parallel programs developed by scientists have been in past years. This year's DIEP show at the Teras (1024 processors) was no exception to that. I had 3 days of preparation time to get onto the machine (and up to 5 days before the tournament I wasn't sure whether I would get system time *anyway*). However, sponsors want to hear how well your thing did. This was on a 1024 processor machine (maximum allocation 512 processors within 1 partition of shared memory) from which you get 60, with memory bandwidth 2 times slower than local RAM, and let's not even *start* to discuss the latency, otherwise you will never start to fear DIEP using that machine, despite it being a great machine when compared to others.

I'm working hard now to get a DIEP DTS NUMA version ready. DTS it is, because it is dynamic splitting wherever it wants to. Over a month of fulltime work has been done now. Tests on a dual K7 as well as on dual supercomputer processors have been very positive, up to 32 processors. Nevertheless I worried about how to report on it. So I checked out the article from Robert Hyatt again.

Already in 1999, when I had implemented a PC-DTS version, I wondered why I never got near the speeds of Bob when I was not forward pruning other than nullmove. With the 1999 world champs version I had great speedups, but I could explain them all by the forward pruning I was using at the time. Never did I get close, even on a dual Xeon or quad Xeon, to the speeds reported by Bob for his DTS version described in 1997. I concluded that it had to do with a number of things, encouraged by Bob's statements. In 99 Bob explained that splitting was very cheap at the Cray.
He copied a block with all data of 64KB from processor 0 to P1 within 1 clock at the Cray. I didn't know much about Crays or supercomputers at the time, except that they were out of my budget, so I believed it. However, I have a good memory for certain numbers, so I have remembered his statement very well. In 2002 Bob explained the Cray could copy 16 bytes each clock. A BIG contradiction to his 1999 statement. No one here will wonder about that, because regarding Deep Blue we have already seen hundreds of contradicting statements from Bob. Anyway, that makes splitting at the Cray of course very expensive, considering Bob copied 64KB of data for each split. Crafty is no exception here.

I never believed the 2.0 speedup in his table at page 16 for 2 processors, because if I do a similar test I sometimes also get > 2.0, but usually less. Singular extensions hurt DIEP's speedup incredibly, but even today I cannot get, within a few minutes, to the speedup Bob achieved in his 1997 article.

In 1999 I wondered why his speedup was so good. When I asked, Bob concluded he split in a smarter way. I then asked how he split in Cray Blitz, because what Bob is doing in Crafty is too horrible for DIEP to get a speedup much above 1.5 anyway. The answer was: "do some statistical analysis yourself on game trees to find a way to split well it can't be hard, i could do it too in cray blitz but my source code is gone. No one has it anymore". So you can feel my surprise when he suddenly had data of Crafty versus Cray Blitz after 1999, which Bob quotes till today in CCC to prove how good his thing was. Anyway, I can analyze games as an FM, so I already knew a bit about how good this Cray Blitz was. I never paid much attention to the lies of Bob here.
I thought he was doing this in order to save himself the time of digging up old source code.

Now, after a month of fulltime work on DIEP at the supercomputer, and having it working great on a dual (with very little overhead) but still with a bad speedup, I started worrying about my speedup and the future article to write about it. A possible explanation for the bad speedup of today's software, when compared to Bob's thing in 1993 (written up in 1997), is perhaps nullmove. Bob still denies this despite a lot of statistical data at loads of positions (150 positions tried in total), with CRAFTY even. Bob doesn't find those results significant. He also says that not a single one of MY tests is valid because I have a stupid PC with 2 processors and bad RAM; a dual would hurt Crafty's performance too much. This because I also concluded that the speedup Crafty gets here is between 1.01 and 1.6 and not 1.7.

Data suggests that Crafty's speedup at his own quad is about 2.8, where he claims 3.1. Then Bob publicly referred back to his 1997 thesis to say that the test method wasn't good. Because to get that 2.8 we used cleared hashtables, and in his thesis he cheats a little by not clearing the tables at all. To simulate a game playing environment that's ok of course.

However, there is a small problem with his article. The search times and speedup numbers are complete fraud. If I divide the times of 1 cpu by the speedup Bob claims he has, I get perfect numbers nearly. Here is the result for the first 10 positions, based upon Bob's article of March 1997 in ICCA issue #1 that year; the tables with the results are on page 16.

When DIEP searches at a position it is always a weird number. If I claim a speedup of 1.8 then it is usually 1.7653 or 1.7920 or 1.8402 and so on. Anyway, not with Bob.
Bob knows nothing of statistical analysis of data (I must claim innocence here too, but I am at least not STUPID like Bob here):

pos   2        4        8        16
1     2.0000   3.40     6.50     9.09
2     2.00     3.60     6.50     10.39
3     2.0000   3.70     7.01     13.69
4     2.0000   3.90     6.61     11.09
5     2.0000   3.6000   6.51     8.98876
6     2.0000   3.70     6.40     9.50000
7     1.90     3.60     6.91     10.096
8     2.000    3.700    7.00     10.6985
9     2.0000   3.60     6.20     9.894975 = 9.90
10    2.000    3.80     7.300    13.000000000000000

This clearly PROVES that he has cheated completely about all search times from 1 processor to 8 processors. Of course, now that I am running at supercomputers myself, I know what the problem is. I only needed a 30 minute look a month ago to see what the problem is in Crafty, and most likely that was the problem in Cray Blitz too. The problem is that Crafty copies 44KB of data or so (Cray Blitz 64KB), and while doing that it is using spm_lock. That's too costly with more than 2 cpu's. As usual this shows he completely lied about his speedups. All times from 1-8 cpu's are complete fraud.

There is however also evidence that he didn't compare the same versions. The Cray Blitz node counts are also weird. The more processors you use, the more overhead you have, obviously. Please don't get mad at me for calculating it in the following simple but very convincing way. I will do it only for his first node counts at 1..16 cpu's. The formula is:

(nodes / speedup at i cpu's) * speedup at i+1 cpu's

1 to 2 cpu's: we don't need the math. If you need exactly 2 times less time to get there, but you need more nodes at more cpu's (where you need expensive splits), then that's already weird of course, though not impossible.

2 to 4 cpu's: 3.4 * (89052012 / 2.0) = 151388420.4 nodes. Bob needed: 105.025.123, which in itself is possible. That is like 40% extra overhead for 4 processors which 2 do not have. This is very well possible.
4 to 8 cpu's: 6.5 * 105025123 nodes / 3.4 = 200.783.323. Bob needed: 109 MLN nodes. That means at 8 cpu's the overhead is already rapidly approaching 100%. This is very well possible. The more cpu's, the bigger the overhead.

8 to 16 cpu's: 9.1 * (109467495 / 6.5) = 153254493. Bob needed: 155.514.410.

My dear fellow programmers. This is impossible. Where is the overhead? The factor of at least 100% overhead? More likely a factor 3 overhead. The only explanation I can come up with is that the node counts from 2..8 processors were created by a different version of Cray Blitz than the 16 processor version. For the single cpu version we already know the number of nodes has to be weird, because it is using a smaller hashtable (see page 4.1 in the article, second line there after 'testing methodology').

We talk about mass fraud here. Of course this article is 5 years old and I do not know whether he created the table in 1993. How am I going to tell my sponsor that my speedup won't be the same as that from the 1997 article? To whom do I compare, Zugzwang? It 'only' had on paper a 50% speedup out of 512 processors. Of course that is also something which is not realistic. However, Feldmann documented most of the things he did in order to cripple Zugzwang to get a better speedup. A well known trick is to kick out nullmove and only use normal alfabeta instead of PVS or other forms of search. Even Deep Blue did that in the past.

But what do you guys think of this alternative bookkeeping from Bob? ---------
There are two ways of spreading light; to be the candle or the mirror that reflects it.
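For anyone who wants to redo the node-count extrapolation from the opening post, here is a minimal sketch. It assumes the formula exactly as posted, (nodes / speedup at the smaller cpu count) * speedup at the next cpu count, and uses only the position 1 node counts and speedups quoted in that post. The function and variable names are made up for illustration and come from neither Crafty nor Cray Blitz.

```python
# Sketch of the extrapolation quoted in the opening post; names are illustrative only.

def expected_nodes(nodes_prev, speedup_prev, speedup_next):
    """Scale a node count by the ratio of two speedups, as the posted formula does."""
    return nodes_prev / speedup_prev * speedup_next

# (step, nodes at fewer cpus, speedup at fewer cpus, speedup at more cpus,
#  node count the post quotes as reported at more cpus)
STEPS = [
    ("2 -> 4", 89052012, 2.0, 3.4, 105025123),
    ("4 -> 8", 105025123, 3.4, 6.5, 109467495),
    ("8 -> 16", 109467495, 6.5, 9.1, 155514410),
]

results = {}
for step, nodes, s_prev, s_next, reported in STEPS:
    predicted = expected_nodes(nodes, s_prev, s_next)
    # Store the prediction and how it compares with the quoted node count.
    results[step] = (predicted, predicted / reported)
    print(f"{step}: predicted {predicted:,.1f}, reported {reported:,}, "
          f"ratio {predicted / reported:.2f}")
```

With the quoted inputs the predicted/reported ratio comes out at roughly 1.44 and 1.83 for the first two steps and just under 1 for the 8 to 16 step, where the prediction evaluates to about 153.25 million nodes.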
re:parallel fraud - Robert Hyatt's creative bookkeeping - 2006/08/19 23:44

in computerchess. The next table shows that clearly:

pos   2        4        8        16
1     2.0000   3.40     6.50     9.09
2     2.00     3.60     6.50     10.39
3     2.0000   3.70     7.01     13.69
4     2.0000   3.90     6.61     11.09
5     2.0000   3.6000   6.51     8.98876
6     2.0000   3.70     6.40     9.50000
7     1.90     3.60     6.91     10.096
8     2.000    3.700    7.00     10.6985
9     2.0000   3.60     6.20     9.8994975 = 9.90
10    2.000    3.80     7.300    13.000000000000000

There is a chance smaller than 1/10^30 that such numbers happen 'by accident'. That's 0.0000000000000000000000000000001, with about 30 zeros before the 1. In short, statistical analysis very clearly shows his fraud. I hope you realize that in court statistical analysis is a legal method to prove you are right. It clearly proves here that his numbers are a big fraud and a setup. ---------
There are two ways of spreading light; to be the candle or the mirror that reflects it.
re:parallel fraud - Robert Hyatt's creative bookkeeping - 2006/08/20 00:11

I've first hand experience of programming, and Cray training, on their YMP vector processor architecture. In our case, whilst running even modest loops over the vector processor architecture, the Cray could dynamically reassign spare vector capacity as processors became free from other tasks, fast enough to justify the set up cost. That always amazed me. To do that, Crays obviously could punt large amounts of information between processors in short order! Cray compilers had some fairly sophisticated techniques in this area.

I forget the details, but memory had a job keeping up with processor performance, so certain memory accessing tasks would take more than one vector clock cycle, and for loop optimisation the compiler would effectively aim to optimise code so as to avoid hitting the same memory location too frequently. Similarly, loops were farmed out between processors in chunks, so one might do 0-31, another 32-63, of a loop to 64.

without a diagram of all the vector and scalar registers. The C90 (maybe slightly later than Bob's work) had a number of different vector registers that could be chained together. This chaining allowed you to effectively compute simple arithmetic results involving 4, or 5, 64 bit words every clock cycle (once the set up cost was paid; for most of our stuff the set up cost justified vector processing for loops of 3 iterations or larger), which is substantially more than 16 bytes being copied around every clock cycle, on every processor.

The Crays were a peculiar architecture; everything was very elegantly arranged to ensure that at each step you had just enough resource to avoid bottlenecks. This level of elegant design was a pleasure to see when you were actually worrying about optimisation.
Never had the feeling since that my hardware was so carefully engineered.

from a Chess program? What really counts is results - Bob's program beat the world (no doubt in large part down to the best hardware), but then why does Schumacher get to drive the Ferraris?

Crays gave on real world numerical processing tasks, although they tended to be a little erratic depending on loop sizes as the number of processors varied. Chess presumably scales well on powers of 2 in processors for other reasons.

I'd expect chess to scale substantially worse than the number crunching I was involved with, but without a copy of the various reports of Bob's you refer to, I, and I suspect most people in the group, will think you are ranting again, and certainly can't use the numbers you quote without more context to say if your analysis is right or wrong.

I would have thought a less confrontational approach might get you more information out of Dr Hyatt. ---------
We may not be able to get certainty, but we can get probability, and half a loaf is better than no bread. - Clive Staples Lewis, 1898 - 1963
re:parallel fraud - Robert Hyatt's creative bookkeeping - 2006/08/20 00:27

transformed into a speedup rounded to 1 decimal place, and then transformed back into seconds, whereas the 16 CPU data would appear to be raw data.

I'm not sure I'd accuse someone of fraud on this basis, although it might be worth querying if it was more important data, like mortality figures in a drug trial.

I'm not sure what you think Bob would hope to achieve if it was fraud. I don't suppose Cray were really that interested in the parallelism inherent in computer chess searches. I expect they wanted to win an event like the WCCC and thus promote their technology in a favourable light. ---------
We may not be able to get certainty, but we can get probability, and half a loaf is better than no bread. - Clive Staples Lewis, 1898 - 1963
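The rounding round trip suggested in the 00:27 reply can be sketched as follows. This is only an illustration of the idea (speedup rounded to 1 decimal place, then converted back into seconds); the two times are hypothetical values chosen for the example, not figures from the article, and the function name is made up.

```python
# Sketch of the rounding round trip described in the reply; the times are hypothetical.

def round_trip(t1, tn, decimals=1):
    """Round the speedup t1/tn, then rebuild the n-cpu time from the rounded value."""
    speedup = round(t1 / tn, decimals)
    rebuilt_tn = t1 / speedup
    return speedup, rebuilt_tn

t1 = 2830.0      # hypothetical 1-cpu search time in seconds
raw_tn = 833.0   # hypothetical measured 4-cpu search time in seconds

speedup, rebuilt_tn = round_trip(t1, raw_tn)
print(f"raw ratio:               {t1 / raw_tn:.4f}")
print(f"rounded speedup:         {speedup}")
print(f"rebuilt 4-cpu time:      {rebuilt_tn:.2f} s")
print(f"ratio from rebuilt time: {t1 / rebuilt_tn:.4f}")
```

Dividing the 1-cpu time by a time rebuilt this way gives back the rounded speedup, whereas the raw measurement gives an unrounded ratio.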
re:parallel fraud - Robert Hyatt's creative bookkeeping - 2006/08/20 00:50

written for my dissertation analysis so that it could "eat" log files & extract the important stuff. However, I wrote it in 1986-1987 & I _really_ don't remember what it did internally. I might have done something ugly since it was a 32 bit program, and using floating point might have caused some quantization errors, as could using integer math as well. But I really don't remember the "insides" of that and can't comment as to how it might have possibly modified the node counts or times in converting them from characters to internal to output...

The program (as well as all the old Crafty versions) was lost somewhere in the 1996-97-98 time frame due to a disk failure on my machine, only to discover that all our backup tapes were being written just fine but were absolutely unreadable...

If I had access to the program, I might see if I had done something ugly between reading the data and displaying it as a table. However, the actual speedup results are quite consistent with lots of testing Harry and I did on the C90... Speedups between 10 and 12 were the norm, with some around 16 and some lower at around 8. 11.0 seems to be a good average number. IE Crafty seems to hit around 3.1 on my quad (Vincent claims this is impossible, but I have run tests for him that continue to show this). Cray Blitz was significantly better than that with 4 processors, but not enough better to make 11.0 out of touch for Crafty today on the right set of test positions...

And then note that the DTS paper was written well after the last appearance of Cray Blitz. Our last ACM event was 1994, the last ACM tournament held.
However, I had already started to work on the paper using a 1993 game as the "test positions" and wrote the paper long after Cray Blitz had been officially "retired" from competition.

I thought that I might one day write DTS-2, but when I did a recursive search version of Crafty, that pretty well eliminated the DTS approach from the get-go. I chose to do something simpler that still works pretty well, IMHO.

I'm not sure I would want to publish the results of the speedups now, of course, unless I get a third party to run the test results for me. ---------
A painter is a man who paints what he sells. An artist, however, is a man that sells what he paints.
re:parallel fraud - Robert Hyatt's creative bookkeeping - 2006/08/20 00:54

no problem ---------
We may not be able to get certainty, but we can get probability, and half a loaf is better than no bread. - Clive Staples Lewis, 1898 - 1963
re:parallel fraud - Robert Hyatt's creative bookkeeping - 2006/08/20 01:10

Once upon a time, I even let Vincent use the quad in my office. Pascutto is using 1 of the quad 550's to develop "deep sjeng" at the moment, so access is possible. Of course "some" probably shouldn't be using them again. ---------
A painter is a man who paints what he sells. An artist, however, is a man that sells what he paints.
re:parallel fraud - Robert Hyatt's creative bookkeeping - 2006/08/20 01:31

long. ---------
We may not be able to get certainty, but we can get probability, and half a loaf is better than no bread. - Clive Staples Lewis, 1898 - 1963