EDIT: In the text below, I should perhaps have typed fischerandom instead of chess960. I was playing with the header utility, and not following the specification!
Here's some of what I'd written in the posting that the site wouldn't take due to the Unicode characters.
I am on Windows 7.
I ran this ....
Code:
exec C:\\Users\\crystalclear\\chess\\polyglotHeader\\pgheader-1.0.exe -d test.bin > temp.txt
exec C:\\Users\\crystalclear\\chess\\polyglotHeader\\pgheader-1.0.exe -v suicide,chess960 -c "This is a my € ................. comment!\n" test.bin
exec C:\\Users\\crystalclear\\chess\\polyglotHeader\\pgheader-1.0.exe -s test.bin >> temp.txt
and then printed the file temp.txt to look at its contents.
Code:
Variants supported:
suicide
chess960
Comment:
This is a my €.................... comment!
So it seems that the Windows version of the header utility can put UTF-8 characters in a Polyglot header and recover them just fine on my Windows computer.
I tried the same thing earlier using a Windows BAT file and a Windows console. That didn't work. However, I expected that the problem lay (as usual) with the Microsoft software, so I used an alternative shell to launch the Polyglot header program and recover its output.
I know the comment looks like garbage, but I wanted to test a fairly random selection of Unicode characters, to see if I could put them in the opening book and get them back out. I think the Polyglot opening books will be fairly transparent to the whole thing, and it's only a question of how the bytes are interpreted and displayed. With the tree structure of the header, there is no reason that a later version of the specification couldn't have both an ASCII-only comment and a Unicode comment for programs that can handle it.
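To illustrate the "how the bytes are interpreted" point, here is a minimal Python sketch. It takes the three bytes my editor reports for the € symbol and decodes them two ways. The choice of cp1252 is only an assumption standing in for "some legacy single-byte code page"; I don't know which code page the Windows console was actually using.

```python
# The three bytes my editor reports for the euro sign.
euro_bytes = bytes([0xE2, 0x82, 0xAC])

# Interpreted as UTF-8, the three bytes form one character.
as_utf8 = euro_bytes.decode("utf-8")

# Interpreted as cp1252 (assumed here purely as an example of a legacy
# single-byte code page), each byte becomes its own character.
as_cp1252 = euro_bytes.decode("cp1252")

print(repr(as_utf8), len(as_utf8))
print(repr(as_cp1252), len(as_cp1252))
```

The bytes stored in the book are identical in both cases; only the decoding step differs.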
We could compromise at the moment by saying that books should initially use ASCII only; that programs may refuse to display comments with non-ASCII characters; and that the preferred interpretation of characters above 127 is UTF-8, with whatever byte order Michel happens to have.
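A rough sketch of how a program might apply that compromise is below. The function name `comment_for_display`, the `allow_utf8` switch, and the convention of returning `None` to mean "refuse to display" are all hypothetical and not part of any specification.

```python
def comment_for_display(raw: bytes, allow_utf8: bool = True):
    """Return the text to show for a header comment, or None to refuse.

    Hypothetical helper: the name and the None-means-refuse convention
    are illustrative only.
    """
    # Pure ASCII comments are always acceptable.
    if all(b < 128 for b in raw):
        return raw.decode("ascii")
    # A program is allowed to refuse anything non-ASCII.
    if not allow_utf8:
        return None
    # Otherwise the preferred interpretation of bytes above 127 is UTF-8.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(comment_for_display(b"plain comment"))
print(comment_for_display(b"my \xe2\x82\xac comment", allow_utf8=False))
```

A program that cannot render Unicode would call it with `allow_utf8=False` and simply skip the comment when it gets `None` back.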
For UTF-8 I think byte order needs to be specified, so the fact that Michel and I can each display our Unicode characters correctly doesn't necessarily mean that he would display things correctly if I emailed him my opening book. Integers in Polyglot opening books are big-endian, I think, while most computers are little-endian. Confusion is avoided because everything is well specified and the Polyglot software reads a byte at a time, which makes it independent of machine architecture. If we specify UTF-8, I guess we need to specify a byte order.
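As a sketch of the "read a byte at a time" idea, the Python below decodes one 16-byte row copied from the hexdump in this post (offset 0xF0). The field layout (8-byte key, 2-byte move, 2-byte weight, 4-byte learn) is my assumption about the usual Polyglot entry format, not something established in this post.

```python
import struct

# One 16-byte row copied from the hexdump (offset 0xF0).
row = bytes.fromhex("0000968B7FCB1868" "0E39" "0005" "00000000")

# ">" forces big-endian decoding regardless of the host CPU.
# Assumed layout: 8-byte key, 2-byte move, 2-byte weight, 4-byte learn.
key, move, weight, learn = struct.unpack(">QHHI", row)

# The same key assembled one byte at a time, most significant byte first.
key_bytewise = 0
for b in row[:8]:
    key_bytewise = (key_bytewise << 8) | b

print(hex(key), hex(move), weight, learn)
print(key == key_bytewise)
```

Both approaches give the same values on any machine, because neither depends on the native integer layout.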
My editor gives the byte codes for the € symbol as
E2 82 AC
Code:
00000000: 00 00 00 00 00 00 00 00 - 40 50 47 40 0A 31 2E 30 | @PG@ 1.0|
00000010: 00 00 00 00 00 00 00 00 - 0A 33 0A 32 0A 73 75 69 | 3 2 sui|
00000020: 00 00 00 00 00 00 00 00 - 63 69 64 65 0A 63 68 65 | cide che|
00000030: 00 00 00 00 00 00 00 00 - 73 73 39 36 30 0A 54 68 | ss960 Th|
00000040: 00 00 00 00 00 00 00 00 - 69 73 20 69 73 20 61 20 | is is a |
00000050: 00 00 00 00 00 00 00 00 - 6D 79 20 E2 82 AC 20 E8 | my |
00000060: 00 00 00 00 00 00 00 00 - BF B7 EB 99 97 F0 9D 94 | |
00000070: 00 00 00 00 00 00 00 00 - 89 EF B7 B5 EF B8 97 EF | |
00000080: 00 00 00 00 00 00 00 00 - AD BA ED 9C 98 EC A3 BC | |
00000090: 00 00 00 00 00 00 00 00 - EA 99 AC EA 97 A4 EA 97 | |
000000a0: 00 00 00 00 00 00 00 00 - B6 EA 97 BA EA 97 BC EA | |
000000b0: 00 00 00 00 00 00 00 00 - 94 9A EA 94 99 EA 91 84 | |
000000c0: 00 00 00 00 00 00 00 00 - EA 92 93 EA 92 96 EA 91 | |
000000d0: 00 00 00 00 00 00 00 00 - 98 EA 8D 94 20 63 6F 6D | com|
000000e0: 00 00 00 00 00 00 00 00 - 6D 65 6E 74 21 0A 00 00 | ment! |
000000f0: 00 00 96 8B 7F CB 18 68 - 0E 39 00 05 00 00 00 00 | h 9 |
00000100: 00 00 DA 48 99 75 03 D0 - 0F 74 00 75 00 00 00 00 | H u t u |
00000110: 00 00 DA 48 99 75 03 D0 - 0E AC 00 0E 00 00 00 00 | H u |
00000120: 00 01 A6 F6 D7 E6 3F 5F - 02 93 00 06 00 00 00 00 | ?_ |
00000130: 00 01 BA 75 2B FB 0B 80 - 02 10 00 07 00 00 00 00 | u+ |
00000140: 00 01 BF 34 2C BC 43 E4 - 0F AD 00 0D 00 00 00 00 | 4, C |
00000150;
and they are visible in a hexdump of the opening book in that order too. Previous versions of the editor have allowed UTF-8 text files to be saved as little or big-endian, with or without a byte order marker. Now the option is "unicode transformation format" (whatever that means) and you cannot choose the byte order, although the byte order marker is still optional.
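On the byte-order worry above, a quick check in Python may help. As far as I can tell, UTF-8 is defined as a sequence of bytes and so has only one possible order, whereas UTF-16 has little- and big-endian forms. My guess (an assumption) is that the older editor options referred to UTF-16. The optional mark at the start of a UTF-8 file is a fixed three-byte signature rather than an indicator of byte order.

```python
def hexbytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


euro = "\u20ac"  # the euro sign

# UTF-8 has a single byte sequence for this character.
utf8 = euro.encode("utf-8")

# UTF-16 comes in two byte orders, and they differ.
utf16_le = euro.encode("utf-16-le")
utf16_be = euro.encode("utf-16-be")

# "utf-8-sig" prepends the optional three-byte signature (EF BB BF).
utf8_with_mark = euro.encode("utf-8-sig")

print("UTF-8:         ", hexbytes(utf8))
print("UTF-16 LE / BE:", hexbytes(utf16_le), "/", hexbytes(utf16_be))
print("UTF-8 + mark:  ", hexbytes(utf8_with_mark))
```

If that is right, the E2 82 AC sequence in the hexdump should read the same on Michel's machine as on mine, provided both programs decode it as UTF-8.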