If I get five bars of reception, can I simply assume that my measurements are, say, 95% accurate?
No, but you may assume that there's a high probability that the results are correct.
I see that four bars reduce the accuracy. By roughly what percentage does the tolerance change?
I think it's better to say that four bars indicate an increased probability that the results are not correct. That sentence says it all.
Because of highly non-linear dependencies, there's no formula for calculating measurement tolerance from the signal quality rating (which is a mix of noise voltage and AM). For example, the noise voltage may be affected by more than five influences, and each influence has a different impact on the measurements. There's no way to detect all of these influences separately and reliably. We're talking about FM, which is a non-linear modulation. The noise voltage is also affected by the peak deviation, but the peak deviation is in turn affected by the noise voltage. That problem cannot be solved when the reception quality is insufficient. (In the P275, however, some correction algorithms assume a fixed peak deviation of 75 kHz, so a peak deviation of that value can be said to be measured with the best accuracy.)
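To make one direction of that coupling concrete, here is a minimal numerical sketch in Python with NumPy. It is an illustration under stated assumptions, not how the P275 works internally. It synthesizes an FM carrier with 75 kHz peak deviation, modulated by a 1 kHz tone at an assumed 1 MHz complex-baseband sample rate, adds white noise at several carrier-to-noise ratios, and reads the peak deviation with a naive, unfiltered phase-difference demodulator. The names `fm_baseband`, `add_noise` and `naive_peak_deviation` are hypothetical.

```python
import numpy as np

# Assumed simulation parameters (not taken from any instrument).
FS = 1_000_000.0   # complex-baseband sample rate, Hz
F_TONE = 1_000.0   # single modulating tone, Hz
DEV = 75_000.0     # nominal FM peak deviation, Hz


def fm_baseband(duration_s: float = 0.01) -> np.ndarray:
    """Unit-amplitude FM carrier (complex baseband) modulated by one tone."""
    t = np.arange(int(FS * duration_s)) / FS
    # Phase is 2*pi times the integral of the instantaneous frequency DEV*sin(2*pi*F_TONE*t).
    phase = -(DEV / F_TONE) * np.cos(2 * np.pi * F_TONE * t)
    return np.exp(1j * phase)


def add_noise(x: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add complex white Gaussian noise; carrier power is 1, so noise power = 10^(-snr/10)."""
    noise_power = 10 ** (-snr_db / 10)
    sigma = np.sqrt(noise_power / 2)  # split equally between I and Q
    noise = sigma * (rng.standard_normal(x.size) + 1j * rng.standard_normal(x.size))
    return x + noise


def naive_peak_deviation(x: np.ndarray) -> float:
    """Peak |instantaneous frequency| from a phase-difference demodulator, no filtering."""
    inst_freq = np.angle(x[1:] * np.conj(x[:-1])) * FS / (2 * np.pi)
    return float(np.max(np.abs(inst_freq)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = fm_baseband()
    print(f"no noise: {naive_peak_deviation(clean) / 1e3:.1f} kHz")
    for snr_db in (40, 30, 20):
        noisy = add_noise(clean, snr_db, rng)
        print(f"{snr_db} dB C/N: {naive_peak_deviation(noisy) / 1e3:.1f} kHz")
```

Without noise the reading lands at about 75 kHz; as the carrier-to-noise ratio drops, noise peaks ride on top of the modulation and the "measured" deviation climbs. A real instrument would filter the demodulated signal, so the exact numbers here mean little; the point is only the direction of the bias.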
For reference measurements, you should simply follow the general rules and official recommendations (a 60 dBuV signal level, a directional antenna, no interference sources). Otherwise, depending on the purpose, it is preferable to verify the measurement from another location.
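The three conditions above can be written as a simple checklist. This is a minimal Python sketch; the `MeasurementSetup` type and `reference_problems` function are hypothetical, and treating 60 dBuV as a minimum input level is an assumption about how the recommendation is meant.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the recommended 60 dBuV is treated as a minimum RF input level.
MIN_LEVEL_DBUV = 60.0


@dataclass
class MeasurementSetup:
    """Hypothetical description of one measurement location."""
    level_dbuv: float           # RF input level at the receiver
    directional_antenna: bool
    interference_sources: bool  # e.g. an extra-strong signal nearby


def reference_problems(setup: MeasurementSetup) -> List[str]:
    """Return the unmet reference-measurement conditions (empty list = all met)."""
    problems = []
    if setup.level_dbuv < MIN_LEVEL_DBUV:
        problems.append(f"level {setup.level_dbuv:.0f} dBuV is below {MIN_LEVEL_DBUV:.0f} dBuV")
    if not setup.directional_antenna:
        problems.append("no directional antenna")
    if setup.interference_sources:
        problems.append("interference sources present")
    return problems


if __name__ == "__main__":
    setup = MeasurementSetup(level_dbuv=54.0, directional_antenna=True, interference_sources=False)
    issues = reference_problems(setup)
    if issues:
        print("Not a reference measurement, verify from another location:", "; ".join(issues))
    else:
        print("Reference conditions met")
```

Returning a list of reasons rather than a single yes/no makes it obvious which condition to fix before deciding whether a second location is needed.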
I was measuring a station 50 km away: just a 1 kW station, but with a 4-bay antenna at almost 500 m AMSL. Five bars it was!
That's entirely OK. Distance is not a problem as long as you use a directional antenna to suppress multipath and there's no extra-strong signal in your neighborhood.