This indeed poses a danger that the games are not really independent. I have not had time to look at the games at all, so I couldn't say whether there are duplicates, or many games that follow the same path until after the game is already decided. In normal Chess I guard against this by playing from the Nunn positions, but there is not yet any 10x8 equivalent of those.
Now I must say that against Joker80 the danger is not as large as against many other engines, as Joker80 randomizes all its moves to a certain extent. The main bias would thus be a preference for the first few opening moves, leading to a position that is strategically unbalanced in favor of one of the engines. So I agree that the current results cannot really be interpreted as a scientific measurement of the strength difference between Joker80 and Smirf. To be credible, such a measurement would have to be done by an independent person anyway, so that is not the purpose now.
One way to avoid the problem (which I use in my piece-value measurement project) is to start from shuffled opening arrays. (So play caparandom, really, but as Joker80 is not a shuffle-Chess engine I have to limit the opening arrays to those that have King and Rooks in the usual place.) On 10x8 this still gives 216 possible shuffles if you restrict the Bishops to be on opposite colors and have one B and one N on either side of the King. They can be played with both colors, so that gives 432 games that are guaranteed to be different.
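For anyone who wants to check that count, here is a minimal Python sketch that simply enumerates the arrays. It assumes the Capablanca layout with files a..j numbered 0..9, Rooks on a and j and the King on f; the names LEFT, RIGHT, is_allowed and shuffled_arrays are made up for this illustration and have nothing to do with how Joker80 sets up positions.

```python
from itertools import permutations

# Assumed layout (not stated in the post): files a..j are indices 0..9,
# Rooks fixed on a and j, King fixed on f (index 5) as in Capablanca Chess.
# A = Archbishop, C = Chancellor.
LEFT = (1, 2, 3, 4)   # free back-rank squares on the Queen side of the King
RIGHT = (6, 7, 8)     # free back-rank squares on the other side
FREE = LEFT + RIGHT
PIECES = "BBNNQAC"    # the seven pieces to distribute over the free squares


def is_allowed(array):
    """Check the two restrictions: B and N split over both sides, Bishops on opposite colors."""
    bishops = [i for i in FREE if array[i] == "B"]
    knights = [i for i in FREE if array[i] == "N"]
    # Exactly one of each pair on the left means the other one is on the right.
    one_each_side = all(
        sum(i in LEFT for i in squares) == 1 for squares in (bishops, knights)
    )
    # On a single rank the square color alternates with the file index,
    # so opposite colors means the two file indices have different parity.
    opposite_colors = (bishops[0] + bishops[1]) % 2 == 1
    return one_each_side and opposite_colors


def shuffled_arrays():
    """Return all distinct allowed back-rank arrays as 10-character strings."""
    found = set()  # a set removes duplicates caused by the identical B's and N's
    for perm in permutations(PIECES):
        array = ["R", None, None, None, None, "K", None, None, None, "R"]
        for square, piece in zip(FREE, perm):
            array[square] = piece
        if is_allowed(array):
            found.add("".join(array))
    return sorted(found)


arrays = shuffled_arrays()
print(len(arrays))       # number of distinct opening arrays
print(2 * len(arrays))   # each array played with both colors
```

Under these assumptions the enumeration reproduces the 216 arrays (432 games) mentioned above, and the normal Capablanca array RNABQKBCNR is among them. Putting the King on the e-file instead only mirrors the board, so it would give the same number.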
Another procedure could be to always start from the same opening position, but play games with reversed colors after a variable, small number of moves. E.g. play a game from scratch, and then take the position after 6 moves (which presumably is still pretty even), and use that as a starting point for the next game with the colors reversed. That way you would eliminate any bias that could have developed in the first 6 moves, and these six moves would only serve to create some variety in the starting positions.