Engines lose on time. An Xboard bug?

Discussions about the WinBoard protocol. Here you can also report bugs and request new features.

Moderators: hgm, Andres Valverde

Engines lose on time. An Xboard bug?

Postby Miguel A. Ballicora » 11 Jan 2009, 20:05

I decide to set up a testing procedure in xboard (Linux Ubuntu 7.04) and I found this behavior: If I play a match between two engines with 140 positions (Nunn + Noomen + Silver, B&W = 280 games) at 20 seconds/40 moves, I always find 32-34 games (16-17 positions x 2 black&white) in which the game is decided by one of the engines losing on time. These games are ALWAYS in a row. In a couple of occasions, I found two of these ~30-game streaks in 280-games match.

This is not ok because these engines do not lose on time so easily. I checked gaviota logs, and it looks like gaviota sends a move, and xboard receives it almost a second later. The same with the opposite engine. This happens for several games in a row, and then the behavior disappears. I have not run this with xboard debug yet, I will try to catch it. Any comments? Ever heard of it?

I do not think I have anything running on the background that can cause this.
I am running single matches on a Dual, and this behavior is present in both xboard 4.2.7 and 4.3.14 (could not compile 4.13.15 yet)

I have a work around, which is a ruby script that replays those games, but this is not the best solution.

Regards,
Miguel
User avatar
Miguel A. Ballicora
 
Posts: 160
Joined: 03 Aug 2005, 02:24
Location: Chicago, IL, USA

Re: Engines lose on time. An Xboard bug?

Postby Eric Mullins » 11 Jan 2009, 22:35

Miguel A. Ballicora wrote:These games are ALWAYS in a row.
and
Miguel A. Ballicora wrote:This happens for several games in a row, and then the behavior disappears.


I know you said you didn't have any background processes causing it. Still, my inclination is towards external factors. The eventual self-correction, and the different versions make me suspect that.
Eric Mullins
 
Posts: 47
Joined: 28 Aug 2008, 04:54
Location: Albuquerque, NM

Re: Engines lose on time. An Xboard bug?

Postby Miguel A. Ballicora » 11 Jan 2009, 23:36

Eric Mullins wrote:
Miguel A. Ballicora wrote:These games are ALWAYS in a row.
and
Miguel A. Ballicora wrote:This happens for several games in a row, and then the behavior disappears.


I know you said you didn't have any background processes causing it. Still, my inclination is towards external factors. The eventual self-correction, and the different versions make me suspect that.


It is possible. Unfortunately, I did not catch this yet while I was watching. I did not save xboard debugs, to make sure it was not a writing to the HD problem. Next step will be to reproduce this and save the debug files. However, if this were an external problem. I would expect that once in a while I would run 280 games without a problem, and that has not happened.

The fact that I get this problem with a number of games close to 32 was very suspicious. Also, this happen about once every session, which is 280 games x ~ 1 min = 280 min = ~16800 seconds, which is to close to the max number of a short int. I think is coincidence. :-) I looked at the xboard sources and the time variables are all long or int in case of ms (which is 32 bits and actually need 16 bits).

Sorry to come with this lousy bug report, first I wanted to ask if anybody have seen this before.

Miguel
User avatar
Miguel A. Ballicora
 
Posts: 160
Joined: 03 Aug 2005, 02:24
Location: Chicago, IL, USA

Re: Engines lose on time. An Xboard bug?

Postby H.G.Muller » 12 Jan 2009, 09:23

Miguel A. Ballicora wrote:I decide to set up a testing procedure in xboard (Linux Ubuntu 7.04) and I found this behavior: If I play a match between two engines with 140 positions (Nunn + Noomen + Silver, B&W = 280 games) at 20 seconds/40 moves, I always find 32-34 games (16-17 positions x 2 black&white) in which the game is decided by one of the engines losing on time. These games are ALWAYS in a row. In a couple of occasions, I found two of these ~30-game streaks in 280-games match.

You would have to be a little more specific. I suppose this is in Xboard matchMode, with the engines being reused (i.e. you have not specified -xreuse or -xreuse2).

The main question is: do the engines support the ping feature? And how exactly does the engine play in the games it forfeits on time: does it play at all, or does it simply refuse to make the first move?

In matches f engines without ping there often are problems with moves of one engine being recieved after a new game has started, which might lead to a deadlock.
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Engines lose on time. An Xboard bug?

Postby Michel » 12 Jan 2009, 10:56

I assume the engines were not pondering....
Michel
 
Posts: 513
Joined: 01 Oct 2008, 12:15

Re: Engines lose on time. An Xboard bug?

Postby H.G.Muller » 12 Jan 2009, 11:15

If this occurs so consistently, best would be to make a debug file, to see what time commands are sent to the engine, and look at the time stamps and engine-reported thinking time at the last move of a game that was forfeited.
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Engines lose on time. An Xboard bug?

Postby Michel » 12 Jan 2009, 11:47

I assume the engines were not pondering....


Actually this might not be relevant since one engine is always thinking.

A long shot: did you try running the engines at lower priority than xboard?

I noticed once on Cygwin that data sent on a pipe may take a long time to arrive
if the receiving end is running at the same priority as the sending end and one of the
processes is busy.

I never noticed this on Linux, and anyway the current situation is not the same one, but it might be worth a try.
Michel
 
Posts: 513
Joined: 01 Oct 2008, 12:15

Re: Engines lose on time. An Xboard bug?

Postby Miguel A. Ballicora » 12 Jan 2009, 18:37

H.G.Muller wrote:If this occurs so consistently, best would be to make a debug file, to see what time commands are sent to the engine, and look at the time stamps and engine-reported thinking time at the last move of a game that was forfeited.


That is exactly what I did last night. I left match running with a script that caught the individual debugfiles of games "lost on time". The problem is that in some games, AnimateMove consistently takes way too long.

Xboard sends "usermove", and the engines receives it sometimes 0.7 seconds later. In this example:

time odds: 1 1
131937 >first : time 83
131937 >first : otim 137
131937 >first : usermove 131937 >first : Bc6
AnimateMove: piece 2 slides from 0,3 to 2,5
132644 <first : ********* Iterative deepening = 0
132644 <first : 1 -608 0 3 Rxb2 37.Kxb2
132644 <first : 1 -582 0 7 f5 37.Rxe6
132644 <first : 1 -236 0 10 g5
132644 <first : 1 -225 0 14 h5
132644 <first : 1 -225 0 37 h5

The engine starts thinking 132644-131937 = 707 ms after "usermove" was sent. Looking at the engine debug, the engine receives the move and starts thinking immediately. Comparing the timestamps, there is a lag between xboard sends "usermove" and the engine receives it. The only thing that is in between is AnimateMove. For some reason, in some cases this take way longer, chewing up engines up to ~0.7 seconds each move, and even worse, not telling the engine about it because "time" is sent before the lag. This happens every move until one of the engines run out of time.

I ran one match without AnimateMove and the problem seemed to disappear.

Conclusion: I should not run matches with AnimateMove but I guess I should have known that. What confused me was that most of the time AnimateMove seems to be ok, but for some reason it may become consistently slower at times.

Generally, I prefer to see the animation because I can follow for a while to spot potential eval problems in the engine. However, this is more dangerous than I thought.

Thanks for the suggestions to everybody,
Miguel
User avatar
Miguel A. Ballicora
 
Posts: 160
Joined: 03 Aug 2005, 02:24
Location: Chicago, IL, USA

Re: Engines lose on time. An Xboard bug?

Postby Eric Mullins » 12 Jan 2009, 19:36

But why is this only happening sometimes? I can appreciate the value of turning off animatemove to clear up the issue. But I have a feeling it is a symptom of something else rather than the core problem.
Eric Mullins
 
Posts: 47
Joined: 28 Aug 2008, 04:54
Location: Albuquerque, NM

Re: Engines lose on time. An Xboard bug?

Postby xinix » 12 Jan 2009, 21:33

Do you have a screensaver running ?

Tony
xinix
 
Posts: 22
Joined: 28 Aug 2008, 21:42

Re: Engines lose on time. An Xboard bug?

Postby Miguel A. Ballicora » 13 Jan 2009, 01:11

xinix wrote:Do you have a screensaver running ?

Tony


Yes, but I have a dual processors running a match with no pondering. I assumed that the extra processor should be able to take care of that.

Do you think that the AnimateMove() call may badly interact with the screen saver?

Miguel
User avatar
Miguel A. Ballicora
 
Posts: 160
Joined: 03 Aug 2005, 02:24
Location: Chicago, IL, USA

Re: Engines lose on time. An Xboard bug?

Postby Eric Mullins » 13 Jan 2009, 03:12

Regarding screensaver possibility: it may actually fit the data. Consider:

The screensaver activates, causing the issue until you wander by some time later and check on the status, thus stopping the screensaver until the next cycle.

It's easy to test, in any event.
Eric Mullins
 
Posts: 47
Joined: 28 Aug 2008, 04:54
Location: Albuquerque, NM

Re: Engines lose on time. An Xboard bug?

Postby Miguel A. Ballicora » 13 Jan 2009, 09:41

Eric Mullins wrote:But why is this only happening sometimes? I can appreciate the value of turning off animatemove to clear up the issue. But I have a feeling it is a symptom of something else rather than the core problem.


I wrote a script in Ruby that calculates the delay from the xboard debug file every time xboard sends a move and gets the first response (generally engines respond quickly after the first ply is analyzed). Once I got that, I plot all the delays in milliseconds from two fragments of matches. Red is with AnimateMove, Blue is without. You can see that animates chews 150 ms each move, but once in a while, a disaster happens. Without animate, only ~10 ms per move or so.

This does not answer what the problem is, but it gives a better picture. Some external thing may interact badly with animate. Maybe it is the screen saver. In any case, animate move gives a delay that is not good for fast games (150 ms each ply means 6 seconds per player/40 moves).

Image

This is the ruby script if anybody wants to run it. In fact, I am going to use is systematically to track possible problems. You can run it, after installing Ruby in your computer
http://www.ruby-lang.org/en/downloads/

Save this code as a text file with the name "debugdelay.rb" and run it from the terminal as

ruby debugdelay.rb nameofthedebugfile > output.txt

"output.txt" can be imported from any spreadsheet. Several debugs can be concatenated into one file if needed before processing it.

Code: Select all

def isfile x
   x=='a'||x=='b'||x=='c'||x=='d'||x=='e'||x=='f'||x=='g'||x=='h'
end

def isrank x
   x=='1'||x=='2'||x=='3'||x=='4'||x=='5'||x=='6'||x=='7'||x=='8'
end

def there_is_a_move s
   arr = s.split(':')
   b = arr[1].strip   
   return b.length == 4 && isfile(b[0..0]) && isrank(b[1..1]) && isfile(b[2..2]) && isrank(b[3..3])
end

def get_ms s
   arr = s.split(' ')
   arr[0].to_i
end


filedebug = ARGV[0]
File.open filedebug, "r" do |f|
   lines = f.readlines
   i = 0
   lines.each do|line|
       if line != nil
         line = line.chomp
         if (line["otim"] != nil)
            a = lines[i+1].chomp
            if (a["usermove"] != nil || there_is_a_move(a))
               b = lines[i+2].chomp
               if (b["AnimateMove"] != nil)
                  b = lines[i+3].chomp      
               end
               m = b.split(' ')[1]
               if (m == '<first' || m == '<second')
                  delay = get_ms(b) - get_ms(line)
                  puts delay
               end
            end
         end
      end
      i += 1
   end
end
User avatar
Miguel A. Ballicora
 
Posts: 160
Joined: 03 Aug 2005, 02:24
Location: Chicago, IL, USA

Re: Engines lose on time. An Xboard bug?

Postby Miguel A. Ballicora » 18 Jan 2009, 00:43

Miguel A. Ballicora wrote:
Eric Mullins wrote:But why is this only happening sometimes? I can appreciate the value of turning off animatemove to clear up the issue. But I have a feeling it is a symptom of something else rather than the core problem.


I wrote a script in Ruby that calculates the delay from the xboard debug file every time xboard sends a move and gets the first response (generally engines respond quickly after the first ply is analyzed). Once I got that, I plot all the delays in milliseconds from two fragments of matches. Red is with AnimateMove, Blue is without. You can see that animates chews 150 ms each move, but once in a while, a disaster happens. Without animate, only ~10 ms per move or so.

This does not answer what the problem is, but it gives a better picture. Some external thing may interact badly with animate. Maybe it is the screen saver. In any case, animate move gives a delay that is not good for fast games (150 ms each ply means 6 seconds per player/40 moves).

Image

This is the ruby script if anybody wants to run it. In fact, I am going to use is systematically to track possible problems. You can run it, after installing Ruby in your computer
http://www.ruby-lang.org/en/downloads/

Save this code as a text file with the name "debugdelay.rb" and run it from the terminal as

ruby debugdelay.rb nameofthedebugfile > output.txt

"output.txt" can be imported from any spreadsheet. Several debugs can be concatenated into one file if needed before processing it.

Code: Select all

def isfile x
   x=='a'||x=='b'||x=='c'||x=='d'||x=='e'||x=='f'||x=='g'||x=='h'
end

def isrank x
   x=='1'||x=='2'||x=='3'||x=='4'||x=='5'||x=='6'||x=='7'||x=='8'
end

def there_is_a_move s
   arr = s.split(':')
   b = arr[1].strip   
   return b.length == 4 && isfile(b[0..0]) && isrank(b[1..1]) && isfile(b[2..2]) && isrank(b[3..3])
end

def get_ms s
   arr = s.split(' ')
   arr[0].to_i
end


filedebug = ARGV[0]
File.open filedebug, "r" do |f|
   lines = f.readlines
   i = 0
   lines.each do|line|
       if line != nil
         line = line.chomp
         if (line["otim"] != nil)
            a = lines[i+1].chomp
            if (a["usermove"] != nil || there_is_a_move(a))
               b = lines[i+2].chomp
               if (b["AnimateMove"] != nil)
                  b = lines[i+3].chomp      
               end
               m = b.split(' ')[1]
               if (m == '<first' || m == '<second')
                  delay = get_ms(b) - get_ms(line)
                  puts delay
               end
            end
         end
      end
      i += 1
   end
end


After running more matches I collected data that I think may be interesting to people running fast matches. With -xanimate and -blind, and scrensaver disabled, the average delay is 8 milliseconds. However, there are few spikes once in while. Only 0.7% of the time the delay is above 40 ms and only 0.24% of the time is above 100 ms. In other words, running matches with an average move speed of 100 ms (for instance, 4s/40moves) will have a delay that may seriously mess it up 1 every 400 moves. However, that should happen exactly at the last move of the time control. If the games go through 2 time controls (~80 moves) there are four chances to mess it up per game. Therefore, we could expect to see a game lost on time every 100 games. That is 99 games correct, one coin toss. If the engines are relatively close in Elo, this noise won't be significant. This could be even less problematic if the engines have a cushion time of at least 100 ms ((5% of the total time) reserved for the last move. If my estimation is correct, and the engines support it, running matches at 4s/40 moves may be ok.

This was tested in 2X AMD 2.4 Ghz, Ubuntu 7.04 using only one core for the matches (1400 games at 20 s/40 moves). At this pace, I had no losses on time. Engines used so far: Gaviota, Beowulf, Rattatechess, Olithink, scidlet, EXchess. Rattatechess seems to have a cushion of 10 seconds. It never loses on time but plays a bit weaker at this pace.

ANOTHER TIP: Close your folders in which xboard is saving the debug or the ones the engines are saving something. Nautilus (the Gnome file managesr) updates the information constantly and chew up a bit of time. Maybe not critical if there is one core available (I do not know), but certainly bad if all the cores are being used.

Miguel
User avatar
Miguel A. Ballicora
 
Posts: 160
Joined: 03 Aug 2005, 02:24
Location: Chicago, IL, USA

Re: Engines lose on time. An Xboard bug?

Postby H.G.Muller » 18 Jan 2009, 08:56

Is this with or without ponder, and on how many CPUs?

I could imagine that running a ponder-on match on a dual would drive up the unpredictability, and perhaps average duration of the delay, because of scheduling delays. After an engine produces a move, it does not relinguish the CPU, but starts pondering. As the other engine was also pondering at that point, it means there is no free CPU available for the GUI, so it might have to wait until a clock interrupt ends the time-slice of one of the engines and switches task. I am not sure how this works in Windows, if a process that is woken up because it was waiting for input and the input becomes available gets a higher priority than processes that were already running. The sanme problem could occur in a ponder-off match on a dual when some background task became active.

In this case it might be helpful to lrun the engines at lower priority than the GUI. The winbaord.exe in my ICSclient.zip package of my website supports the command-line option /niceEngines=10 that could implement this. The GUI, after its iput becomes available, should then immediately receive CPU attention. Even on a WinBoard that does not support this, you could try to raise the GUI priority to "above normal" in the task manager.
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: Engines lose on time. An Xboard bug?

Postby Miguel A. Ballicora » 18 Jan 2009, 10:30

H.G.Muller wrote:Is this with or without ponder, and on how many CPUs?

I could imagine that running a ponder-on match on a dual would drive up the unpredictability, and perhaps average duration of the delay, because of scheduling delays. After an engine produces a move, it does not relinguish the CPU, but starts pondering. As the other engine was also pondering at that point, it means there is no free CPU available for the GUI, so it might have to wait until a clock interrupt ends the time-slice of one of the engines and switches task. I am not sure how this works in Windows, if a process that is woken up because it was waiting for input and the input becomes available gets a higher priority than processes that were already running. The sanme problem could occur in a ponder-off match on a dual when some background task became active.

In this case it might be helpful to lrun the engines at lower priority than the GUI. The winbaord.exe in my ICSclient.zip package of my website supports the command-line option /niceEngines=10 that could implement this. The GUI, after its iput becomes available, should then immediately receive CPU attention. Even on a WinBoard that does not support this, you could try to raise the GUI priority to "above normal" in the task manager.


It was ponder off, on a dual core, one core per engine.

Now things are very well, as I mentioned in the last post, with -xanimate, -blind. I consider 8 ms delay ok. What do you think?

When I had the problems, it must have been something that I did not pay attention to, like the screen saver or folders open that Linux automatically update. Checking the system monitor, nothing else seems to be running.

Anyway, your advice of raising the priority of the GUI is a very good idea. I can easily do that manually. It will become handy whenever I decide to test with ponder on.

Miguel
User avatar
Miguel A. Ballicora
 
Posts: 160
Joined: 03 Aug 2005, 02:24
Location: Chicago, IL, USA

Re: Engines lose on time. An Xboard bug?

Postby Michel » 18 Jan 2009, 12:14

Anyway, your advice of raising the priority of the GUI is a very good idea.


I already suggested this in the beginning of this thread....
Michel
 
Posts: 513
Joined: 01 Oct 2008, 12:15

Re: Engines lose on time. An Xboard bug?

Postby xinix » 18 Jan 2009, 14:08

Miguel A. Ballicora wrote:
xinix wrote:Do you have a screensaver running ?

Tony


Yes, but I have a dual processors running a match with no pondering. I assumed that the extra processor should be able to take care of that.

Do you think that the AnimateMove() call may badly interact with the screen saver?

Miguel


It could. I have seen some bad stuff happening specially with openGL and DirectX screensavers.
Some of them just take 100% cpu on low piority but need some time to "give back the cpu".

Remember that even with a dual cpu and 2 heavy processes running, you can't be sure each process has its own cpu.
Besides that: Gui, 2 engines and screensaver would already be 4.


Tony
xinix
 
Posts: 22
Joined: 28 Aug 2008, 21:42

Re: Engines lose on time. An Xboard bug?

Postby Miguel A. Ballicora » 18 Jan 2009, 22:12

Michel wrote:
Anyway, your advice of raising the priority of the GUI is a very good idea.


I already suggested this in the beginning of this thread....


I know, sorry Michel, I should have been more specific.
My answer was in plural, but english does not make a distinction between "you", "your" for one person or the whole group. I have not tested that yet because I was not doing ponder "on".

I think I may start the the engines with xboard -fcp "nice gaviota" rather than
xboard -fcp gaviota. ("nice" lowers the priority in Linux/Unix).

Things may be different according to the Linux kernel we use:

http://gustavus.edu/+max/os-book/updates/CFS.html

I am using 2.6.20-17, which means that processes switch every 100 ms at the default priority. To really make a difference, I should lower the priorities of the engines a lot, so they will switch every 5-10 ms.

With the latest kernel, things are scheduled in different way.

I should upgrade my OS but I am procrastinating...

Miguel
User avatar
Miguel A. Ballicora
 
Posts: 160
Joined: 03 Aug 2005, 02:24
Location: Chicago, IL, USA

Re: Engines lose on time. An Xboard bug?

Postby ChessKnight » 23 Jan 2009, 07:42

I always find 32-34 games (16-17 positions x 2 black&white) in which the game is decided by one of the engines losing on time. This is not ok because these engines do not lose on time so easily.


I have also faced similar problem that engine losing on time and engine-to-engine match due to call auto-flag feature particularly with Winboard+Polyglot+UCI engine.

By the way very quick question:

Can we check or run the engine in debug (window) mode in Winboard?

Thanks
ChessKnight
 
Posts: 48
Joined: 26 Dec 2008, 06:37


Return to WinBoard development and bugfixing

Who is online

Users browsing this forum: No registered users and 8 guests

cron