Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to maximize a translation quality metric, using held-out test sentences and their corresponding reference translations. However, obtaining reference translations is expensive. In our earlier work (Madnani et al., 2007), we introduced a new full-sentence paraphrase technique, based on English-to-English decoding with an MT system, and demonstrated that the resulting paraphrases can be used to cut the number of human reference translations needed in half. In this paper, we take the idea a step further, asking how far it is possible to get with just a single good reference translation for each item in the development set. Our analysis suggests that it is necessary to invest in four or more human translations in order to significantly improve on a single translation augmented by monolingual paraphrases.
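As a rough illustration of the log-linear formulation mentioned above, the sketch below scores each translation hypothesis as a weighted sum of its feature values and picks the highest-scoring one. The feature names (`lm`, `tm`, `word_penalty`), the feature values, and the weights are all hypothetical and are not taken from the paper; tuning would amount to searching for the weights that maximize a quality metric against the reference translations.

```python
from typing import Dict, List

def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def best_hypothesis(hypotheses: List[dict], weights: Dict[str, float]) -> dict:
    """Return the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: loglinear_score(h["features"], weights))

# Hypothetical hypotheses with made-up feature values (log-probabilities and a length penalty).
hypotheses = [
    {"text": "the house is small",
     "features": {"lm": -4.0, "tm": -2.0, "word_penalty": -4.0}},
    {"text": "the home is little",
     "features": {"lm": -6.0, "tm": -1.5, "word_penalty": -4.0}},
]

# Hypothetical weights; in practice these are what tuning adjusts.
weights = {"lm": 0.5, "tm": 1.0, "word_penalty": 0.1}

best = best_hypothesis(hypotheses, weights)
print(best["text"])
```

Changing the weights changes which hypothesis wins, which is why the choice and number of reference translations used during tuning matters.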