Winboard Forum

by **Jaco van Niekerk** » 01 Aug 2006, 20:11

Hello all

I regard myself as new to the chess engine world, but enjoy it very much. I've decided on writing my engine in Java. I know Java well and once the Engine is good enough (giving Crafty a go, for example, he-he-he), I'll port it to C for that extra ooomph!

I have an interesting idea for learning, but I do not know if it has been implemented before. As a game is played, I will evaluate the opponent to judge whether he/she/it is playing better equal or much worse than the engine. (Who wins and does the opponent usually choose good moves, etc).

The series of board positions will then be stored in secondary storage to be used in future games (positions to avoid, etc). With a lot of repetition, the board positions that results in the most wins, will be chosen more often. Furthermore, a single hard-disk lookup per move will hardly require a lot of time.

Probably the first 10 moves or so. A sort of self-adapting, evolving opening book. Is this a stupid idea? Please pull this apart and tell me the drawbacks??

by **Ilari Pihlajisto** » 01 Aug 2006, 22:04

The series of board positions will then be stored in secondary storage to be used in future games (positions to avoid, etc). With a lot of repetition, the board positions that results in the most wins, will be chosen more often. Furthermore, a single hard-disk lookup per move will hardly require a lot of time.

At first this may sound like a good idea, but in the end it would fail in my opinion. If your engine plays against a weaker opponent and keeps winning, it will give a high score to pretty much all of its moves. And if the opponent is a lot stronger, your engine will give a bad score to every move.

To me it's very hard to reliably decide whether a move is good or bad based on wins/losses statistics, so I just generate my opening book from a big set of grandmaster games. The moves are then given a score based on the number of times they were played.

I'm also planning of implementing a simple learning book to prevent my engine from losing the same way twice. I haven't yet figured out how exactly I'm going to do that, but prof. Hyatt's publication on book learning has given me a lot of direction: http://www.cis.uab.edu/info/faculty/hyatt/learning.html

by **Jaco van Niekerk** » 02 Aug 2006, 08:50

Thanks for your input. I do not only use the fact that I win/lost against my opponent. I was thinking of the following:

Each time my opponent makes a move, I will check whether that move is in the top 5 moves that was identified in the previous search. I'll give 5 points for the best move, 4 for the next and so forth. If very bad moved are made, negative scoring is used. At the end I find an average to gauge the relative strength of my opponent.

If the engine looses against the opponent, he/she/it is probably better than the engine, so I use my OPPONENT'S moves to strengthen my opening book, not mine (the engine's) moves.

If I win, I strengthen the book, based on my moves and only if the opponent average score is high enough.

Win/Loose only will definitely weeken the opening book for the exact reasons you mentioned.

by **mjlef** » 02 Aug 2006, 14:23

Ther are several "learning" systems in use by various programs. One pretty safe method is just to save the results of each search in a "persistant has table". This table can be used in future games during the search to see that a given move sequence leads to a worse position than expected. Basically it extends the line of search along the path of past games. This means the program will see the effect of a bad move choice sooner, and avoid playing it by trying something else. The code is simple...just load that hash table from disk, during game play, store root positions, depths and scores in it, and save it to disk when you close the program. It is a very effective way to keep other programs or opponents from following the lines of a past game and winning the same way a second time.

Mark

by **Alvaro Begue** » 02 Aug 2006, 17:27

Mark's method has a problem, at least as described. Imagine your program makes a bad move and it only realizes how bad things are three moves later. Now we play again the same game. The program uses the stored scores, but the bad score is never accessed (well, until it is too late), since the hash lookup will match one of its predecessors and we will not get to explore farther into the branch.

One solution I thought of a long time ago (and never implemented) is an analysis mode where a game is "learned". The engine will start by the end of the game, thinking about each position for some time, then storing the score in the persistent hash table in disk. When we move to the previous position, the positions that came later in the game have already been searched and are used in the search. This way the bad score will propagate back until the engine finds an alternative move that would have kept it out of trouble.

I think this variation of what Mark suggested will behave much better.

by **mjlef** » 02 Aug 2006, 18:48

Alvaro Begue wrote:Mark's method has a problem, at least as described. Imagine your program makes a bad move and it only realizes how bad things are three moves later. Now we play again the same game. The program uses the stored scores, but the bad score is never accessed (well, until it is too late), since the hash lookup will match one of its predecessors and we will not get to explore farther into the branch.

One solution I thought of a long time ago (and never implemented) is an analysis mode where a game is "learned". The engine will start by the end of the game, thinking about each position for some time, then storing the score in the persistent hash table in disk. When we move to the previous position, the positions that came later in the game have already been searched and are used in the search. This way the bad score will propagate back until the engine finds an alternative move that would have kept it out of trouble.

I think this variation of what Mark suggested will behave much better.

Actually, that is not a probelm. During the search, the program will find both the move played, but also the moves deeper in the game with the lower scores. These low score moves will get propogated back in the search tree and change the best move picked. It actually does work...I had it in my program for a while. And others use it too. Very little work---you can even use the existing hash table and just preload these positions in it before a search.

Mark

by **Alvaro Begue** » 02 Aug 2006, 18:59

mjlef wrote:Actually, that is not a probelm. During the search, the program will find both the move played, but also the moves deeper in the game with the lower scores. These low score moves will get propogated back in the search tree and change the best move picked. It actually does work...I had it in my program for a while. And others use it too. Very little work---you can even use the existing hash table and just preload these positions in it before a search.

Mark

Maybe there is something I don't understand, then. Let's see if we can agree on what would happen in a more detailed example.

For simplicity of the argument, let's say we always search to a fixed depth of 12 (granted, the problem is not as systematic in a more realistic setting). Your move number 20 was bad, but you still had a good score for it, which gets stored in the persistent hash. The scores found when searching moves 21 and 22 are also ignorant of the bad move, but the score found when searching move 23 is disastrous.

Now we are playing the game again and we get to the same position on move 20, and now we have the opportunity to do the right thing and avoid the bad move. Whenever the search tries the bad move, it will find the hash entry recorded when searching move 21. Because the depth of the stored search result is larger than the current depth, its score is used and the search of this branch is stopped. So we will probably repeat the bad move.

What do you think would happen differently than I described?

by **mjlef** » 03 Aug 2006, 07:34

Alvaro Begue wrote:
mjlef wrote:Actually, that is not a probelm. During the search, the program will find both the move played, but also the moves deeper in the game with the lower scores. These low score moves will get propogated back in the search tree and change the best move picked. It actually does work...I had it in my program for a while. And others use it too. Very little work---you can even use the existing hash table and just preload these positions in it before a search.

Mark

Maybe there is something I don't understand, then. Let's see if we can agree on what would happen in a more detailed example.

For simplicity of the argument, let's say we always search to a fixed depth of 12 (granted, the problem is not as systematic in a more realistic setting). Your move number 20 was bad, but you still had a good score for it, which gets stored in the persistent hash. The scores found when searching moves 21 and 22 are also ignorant of the bad move, but the score found when searching move 23 is disastrous.

Now we are playing the game again and we get to the same position on move 20, and now we have the opportunity to do the right thing and avoid the bad move. Whenever the search tries the bad move, it will find the hash entry recorded when searching move 21. Because the depth of the stored search result is larger than the current depth, its score is used and the search of this branch is stopped. So we will probably repeat the bad move.

What do you think would happen differently than I described?

The search would go like this:

at a given depth itteration, the old move would be found with acceptable depth, and the score returned immediately. So in a fraction of a second, you would get the same result and depth as the originial move 20. But then the itterative searchj would increment the depth (lets say from depth 9 to 10). The persistant has move does not have enough depth to be accepted, so the program has to search. During that search it reaches the move played in old 21, and it would have enough depth, probably, so that branch does not ahve to be searched. But this speeds up the whole search process (remember, the old mobve 20 too say a minute to search, but using the persistant hash, that takes a fraction of a seocnd now). Anyway, since the new depth 10 search ends a lot faster since the important lines are found form the persistant has, it will probably get in yet another depth increment. And any of the lines ending in the future moves 21, 22, 23.. will be found immediately, and the move 23 in the PV is refuted.

Will this always find a better move? No, since another line not leading to old move 23 might still not see whatever was found at move 23. But then when a new 22 comes along, it will almost ceratinly see the problem sooner. Bad things tend ti be found 1-2 moves sooner with this method. One move sooner almost always. 2 depends on the nature of the bad line, and how many other misleading lines are left to be discovered.

It is pretty easy to test. Try just saving the hash score and depths for a game, then replaying the game and you will see what happens. A similar idea is opening book expansion. Some program save the hash score and depth for the first few moves after leaving the opening book. This means they will be smarter next game along the same lines. This stuff works well on opponents that keep trying to find a flaw in a program and they play the same lines over and over again. If they win a game, they want to repat it, but this mething will make the program play somethign new.

Mark

by **Alvaro Begue** » 03 Aug 2006, 16:29

That explanation makes sense. Still, revisiting the game backwards seems more robust to me, but it's obviously a lot more hassle.

It also looks like this method would wreak havoc with time control, at least the way I am implementing it right now. You could easily get to depth 14 in 0.01 seconds and then take several minutes to complete depth 15. Under my current implementation of time control I would waste a lot of time on this position and then abort the search without getting anything usefull out of all that time. Do you have a simple way of handling this? I am thinking of aborting the search if I have spent more than some constant (say 0.5) times the average time per move and the current depth is taking more than say 30 times longer than the previous depth.

by **Peter Fendrich** » 17 Aug 2006, 13:49

I once had this idea with Terra to just "steal" the books from the best opponents.
It is much simpler and not at all based on results. The drawback is that it is time consuming. You need a computer free 24 hours 7 days a week.

The idea is to just pick the opponent move and set counter+1 for this move. My program will always select from the own book, with higher probability for the move with highest counter. If there is no move in the book we must compute something. Break the game (resign) when the opponent start to think.

Let's say Fritz, Shredder and Crafty have the best books in the world. (I have no idea if this is true!)

Now, following this scheme I would copy their best lines:
I set the time to 3 second per move.
Lets say we play white.
e4 (computed) e6 (store in book with counter 1)
d3 (computed) d5 (store in book with counter 1)
e5 (computed) Nc6 (after 3 seconds)
Now resign

Next game we play black
e4 (store with counter 1) e6 (picked from book)
d4 (store with counter 1) d5 (computed)
and so on

After a week or so with Fritz, just do the same with Shredder and next with Crafty...

After myriads of games we have a pretty good book collected from the best! In normal games this option is turned off. No learning.

I never implemented this idea. I am not sure how it applies to the clone debate and in the end I thought it was more funny to invent my own learning scheme.

/Peter

Winboard Forum

Idea for opening book

Idea for opening book

Re: Idea for opening book

Re: Idea for opening book

Re: Idea for opening book

Re: Idea for opening book

Re: Idea for opening book

Re: Idea for opening book

Re: Idea for opening book

Re: Idea for opening book

Re: Idea for opening book

Who is online