Viewed from a broader perspective, evaluation falls into the class of 'convergence-accelerating techniques'. A program that searches all the way up to checkmate (e.g. tablebases) would not need evaluation; it just counts DTM. Evaluation is an (attempted) approximate way to predict what further search would uncover.
(Convergence-accelerating techniques are often used in numerical analysis. For instance, to calculate the infinite series 1-1/2+1/3-1/4+1/5-1/6+..., one can make a computer simply do the addition by brute force (the chess equivalent would be fixed-depth minimax), adding terms (increasing depth) until the result is stable to the desired precision. A mathematician can tell you that the eventual result equals ln(2)=0.693147, but the computer gets there by brute force only after a million terms, and not even a supercomputer could get you to 12 digits this way. One can try to be less dumb with the additions, and not stop abruptly (which makes the result alternately too high and too low) but take the average of two consecutive terminations, which amounts to adding the last term not fully, but only by half (the equivalent of QS, where you don't consider all moves in the last ply of the search, but only a fraction). The number of good digits then grows much faster. (Try it in your favourite spreadsheet!) You can push this approach further by, e.g., including the last two terms with weights 0.75 and 0.25, respectively (a smarter and deeper 'QS'). On the other hand, a mathematician (without a logarithm table at hand) would realize that you can guess the remaining part of the sum, e.g. by approximating it as an integral: everything beyond the last term 1/N you did take into account explicitly adds up to ~0.5/(N+0.5+0.25/N), with the same sign as the first omitted term. Correcting every partial sum with this simple formula (the 'evaluation') produces a result that is correct to 6 digits already after 20 terms, rather than a million! The correction is only applicable, though, to the abrupt termination of the series. If you took the last term into account only by half, you would need another correction. So you cannot speak of a 'better correction' in general; the correction has to be tailored to the way you terminate the brute-force calculation.
A comparison is only meaningful between corrections in their own context.
This example might not appeal at all to those who are not numerical analysts...)
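The series experiment above is easy to try directly. A minimal sketch (function names are my own; the correction formula is the one from the example, applied with the sign of the first omitted term):

```python
import math

def partial_sum(n):
    # Brute-force partial sum of 1 - 1/2 + 1/3 - ... up to the term 1/n
    # (the analogue of fixed-depth minimax).
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

def corrected_sum(n):
    # Add the 'evaluation': an estimate of the neglected tail.
    # The tail carries the sign of the first omitted term, (-1)^n / (n+1).
    tail = 0.5 / (n + 0.5 + 0.25 / n)
    return partial_sum(n) + (-1) ** n * tail

# Brute force after 20 terms is still off in the 2nd decimal;
# the corrected sum is already good to about 6 digits.
print(abs(partial_sum(20) - math.log(2)))    # ~2.4e-2
print(abs(corrected_sum(20) - math.log(2)))  # ~4e-7
```

Swapping the termination scheme (e.g. half-weighting the last term) would indeed require a different correction, as the text says.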
Thus, like any convergence accelerator, the evaluation is highly dependent on the conditions under which you terminate your search. For instance, a search algorithm without quiescence search would need a very good static exchange evaluator. In a program that exhaustively searches all captures in QS, like TSCP, an SEE would be completely useless: it would always contribute zero at evaluation time, since eval is only called when there is nothing left to (profitably) capture. If the SEE is not perfect, it might actually hurt: in QS nodes where all captures are tried and found to be no good (i.e. to lose material), we fall back on the evaluation, and if the SEE erroneously thinks that there is a good capture (because of subtle tactics exceeding its capabilities), the evaluation will be totally wrong.
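The way QS decides between "keep capturing" and "trust the eval" can be sketched abstractly. Below is a toy negamax quiescence search over hypothetical nodes (the Node class and all scores are invented for illustration, not anyone's actual engine code); it shows that the static eval only decides the score when no capture improves on it:

```python
# Toy quiescence search. A node carries its static eval (from the
# side to move's point of view) and the positions reachable by capture.
class Node:
    def __init__(self, static_eval, captures=()):
        self.static_eval = static_eval
        self.captures = captures

def qsearch(node, alpha, beta):
    # 'Stand pat': take the static eval as a lower bound, on the
    # assumption that declining all captures is at least this good.
    score = node.static_eval
    if score >= beta:
        return beta
    alpha = max(alpha, score)
    for child in node.captures:
        score = -qsearch(child, -beta, -alpha)  # negamax recursion
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha

# A winning capture (opponent ends up -100) overrides the static eval of 0...
print(qsearch(Node(0, [Node(-100)]), -10000, 10000))  # 100

# ...but when the only capture loses material, we fall back on the eval,
# which is exactly where a wrong SEE term in the eval would poison the score.
print(qsearch(Node(0, [Node(+100)]), -10000, 10000))  # 0
```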
Another example, which I actually encountered in my minimalistic chess code posted in another thread, is end-game evaluation. Before I included a hash table, it could only do 12 plies in the KQK end-game, which in general is not enough to see the mate. Without any help from the evaluation the program would then run with one of its pieces to a corner and keep sulking there, never finding the mate, and hardly winning any games. A simple evaluation that was helpful draws the king through the center of the board by means of piece-square tables, but even that would not bring the mate within the horizon, certainly not in KRK. It needed an evaluation rule that rewarded the winning king for being 'close' to the losing king, e.g. in the same 4x4 corner of the board. This very rough ad-hoc measure of the DTM makes it always find the mate. But now that I have implemented the hash table, even the KRK mate is found easily in the search (the hash table is large enough to build the full tablebase), and the rule only hurts. Evaluation points for taking the 'opposition' or pushing passed pawns forward are good for searches that are not able to bring the queening within their horizon (e.g. through extensions for moving passed pawns). For programs that do search deep enough, these evaluation terms are just noise that might put them off track: pushing the pawn too fast in the KPK end-game is the classic beginner's mistake that gives away the win.
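The 'close to the losing king' rule can be as crude as checking for a shared 4x4 corner, possibly refined by king distance. A hypothetical sketch of such a term (squares as (file, rank) pairs; the bonus values are made up, not the ones from my program):

```python
def quadrant(square):
    # Which 4x4 corner of the board a square lies in: (0,0) .. (1,1).
    file, rank = square
    return (file // 4, rank // 4)

def king_closeness_bonus(winning_king, losing_king):
    # Crude ad-hoc stand-in for DTM: reward the winning king
    # for cornering the losing king.
    bonus = 0
    if quadrant(winning_king) == quadrant(losing_king):
        bonus += 20
    # Chebyshev (king-move) distance refines the term a little.
    d = max(abs(winning_king[0] - losing_king[0]),
            abs(winning_king[1] - losing_king[1]))
    return bonus + 4 * (7 - d)

print(king_closeness_bonus((1, 1), (0, 0)))  # adjacent, same corner: 44
print(king_closeness_bonus((4, 4), (0, 0)))  # far apart: 12
```

Such a term is only worth having when the mate lies beyond the horizon; with a search deep enough to see the mate, it becomes the kind of noise described above.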
So the more I think of it, the more important I believe it is that the evaluation is tuned to the way the search terminates. Other aspects of the search, such as extensions and move sorting, probably interact with the evaluation much less. (With the caveat that when internal iterative deepening is used to sort the moves on the score from the previous iteration, the efficiency of the alpha-beta might be strongly dependent on the evaluation. But if you go to fixed depth irrespective of time or node count, you have implicitly corrected for that.)