Croatian is a pitch-accent language in which the tone contour realized on the stressed syllable carries lexical information. In some cases, therefore, a different lexical accent gives a word a different meaning. Because accents are not usually marked in written texts, such words are ambiguous, and the ambiguity can be resolved by determining the appropriate accent. There are also cases in which various basic and derived forms of words have different meanings, different morphosyntactic descriptions (MSDs), and possibly different accents. Words that have the same written form but different meanings are called homograms. To resolve the ambiguity of homograms, we created a lexicon of homograms comprising all Croatian nouns of different gender that have the same written forms (when accents are not marked) but different meanings, MSDs, and possibly different accents. The lexicon consists of 19,366 entries and 3,460 unique homograms. Each entry comprises the homogram (unaccented word), the accented word, the corresponding MSD, and the accented lemma. The lexicon enables us to identify and disambiguate homograms within a corpus efficiently and accurately. We also evaluated and analyzed the performance of machine translation (MT) systems for the Croatian-English language pair, with special emphasis on homogram translation. We confirmed that disambiguating homograms can improve the performance of MT systems by avoiding major translation mistakes caused by assigning the wrong meaning to a homogram. © 2018, International Association of Computer Science and Information Technology.
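
The four-field entry described above (homogram, accented word, MSD, accented lemma) could be represented and queried roughly as follows. This is only a minimal sketch, not the authors' implementation: the names `Entry`, `build_index`, and `candidates` are hypothetical, the two toy entries are illustrative and not taken from the published lexicon, and the MSD tags are assumed to follow a MULTEXT-East-style notation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    """One lexicon row, mirroring the four fields named in the abstract."""
    homogram: str        # unaccented written form
    accented_word: str   # the same form with the accent marked
    msd: str             # morphosyntactic description tag (assumed format)
    accented_lemma: str  # lemma with the accent marked


def build_index(entries: Iterable[Entry]) -> Dict[str, List[Entry]]:
    """Group entries by lower-cased unaccented form for direct lookup."""
    index: Dict[str, List[Entry]] = defaultdict(list)
    for entry in entries:
        index[entry.homogram.lower()].append(entry)
    return dict(index)


def candidates(index: Dict[str, List[Entry]], token: str,
               msd: Optional[str] = None) -> List[Entry]:
    """Return all readings of a token, optionally narrowed by an MSD tag."""
    found = index.get(token.lower(), [])
    if msd is None:
        return list(found)
    return [entry for entry in found if entry.msd == msd]


# Toy data for illustration only (hypothetical, not from the real lexicon).
TOY_ENTRIES = [
    Entry("grad", "grȃd", "Ncmsn", "grȃd"),  # 'city'
    Entry("grad", "grȁd", "Ncmsn", "grȁd"),  # 'hail'
]
INDEX = build_index(TOY_ENTRIES)
```

A token found in the index is a potential homogram; when a tagger supplies an MSD for it, passing that tag to `candidates` narrows the readings. In this toy pair both readings share a tag, so the MSD alone would not separate them and some further contextual signal would be needed.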