Ilya Tikhomirov’s scientific contributions


Publications (3)


Figures: Dual encoder architecture; BLEU scores of popular MT services; Statistics of the essays dataset; Samples of reused text from the Essay2 dataset; Recall for each obfuscation type

Cross-Lingual Plagiarism Detection Method
  • Chapter
  • Full-text available

January 2022 · 101 Reads · 1 Citation

Communications in Computer and Information Science

Ilya Tikhomirov

In this paper, we describe a method for cross-lingual plagiarism detection for a distant language pair (Russian-English). All documents in a reference collection are split into fragments of fixed size. These fragments are indexed in a special inverted index, which maps words to a bit array. Each bit in the bit array shows whether the i-th sentence contains this word. This index is used for the retrieval of candidate fragments. We employ the bit arrays stored in the index to assess the lexical similarity of query and candidate sentences. Before retrieval, the top keywords of a query document are mapped from one language to the other with the help of cross-lingual word embeddings. We also train a language-agnostic sentence encoder that helps in comparing sentence pairs that have few or no words in common. The combined similarity score of sentence pairs is used by a text alignment algorithm, which tries to find blocks of contiguous and similar sentence pairs. We introduce a dataset for the evaluation of this task: an automatically translated version of Paraplag (a monolingual dataset for plagiarism detection). The proposed method shows good performance on our dataset in terms of F1. We also evaluate the method on another publicly available dataset, on which our method outperforms previously reported results.

Keywords: Cross-lingual plagiarism detection; Cross-lingual word embeddings; Cross-lingual sentence embeddings
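
The bit-array inverted index mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_bit_index` and `lexical_overlap`, the whitespace tokenizer, and the use of a Python integer as the bit array are all assumptions made for illustration.

```python
from collections import defaultdict


def build_bit_index(sentences):
    """Map each word to an int used as a bit array.

    Bit i is set when sentence i of the fragment contains the word.
    Tokenization by lowercased whitespace split is an assumption.
    """
    index = defaultdict(int)
    for i, sentence in enumerate(sentences):
        for word in set(sentence.lower().split()):
            index[word] |= 1 << i
    return dict(index)


def lexical_overlap(index, query_words, num_sentences):
    """For each sentence, count how many distinct query words it contains."""
    counts = [0] * num_sentences
    for word in set(query_words):
        bits = index.get(word, 0)
        for i in range(num_sentences):
            if (bits >> i) & 1:
                counts[i] += 1
    return counts


# A hypothetical three-sentence fragment.
fragment = [
    "the index maps words to bit arrays",
    "each bit marks one sentence",
    "candidate fragments are retrieved from the index",
]
index = build_bit_index(fragment)
overlap = lexical_overlap(index, ["index", "bit"], len(fragment))
```

Here `overlap` holds one count per sentence, so a single index lookup per query word is enough to score every sentence in the fragment for shared vocabulary.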


Data Driven Detection of Technological Trajectories

July 2021 · 14 Reads · 1 Citation

Communications in Computer and Information Science

The paper presents a text mining approach to identifying and analyzing technological trajectories. The main problem addressed is the selection of documents related to a particular technology. These documents are needed to detect the trajectory of that technology. The approach includes a new keyword and keyphrase detection method, a word2vec embedding-based method for finding similar documents, and a fuzzy logic-based methodology for revealing technology dynamics. The USPTO patent database was used for the experiments. The database contains more than 4.7 million documents from 1996 to 2020. Self-driving car technology was chosen as an example. The results of the experiment show that the developed methods are useful for effectively searching for and analyzing information about given technologies.
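
As a rough illustration of embedding-based similar-document search, the sketch below averages word vectors per document and ranks documents by cosine similarity to a query. The abstract does not say how document vectors are formed, so the averaging step is an assumption, and `TOY_VECTORS`, `document_vector`, `rank_similar`, and the sample patents are hypothetical stand-ins for trained word2vec embeddings and real patent texts.

```python
import math

# Toy 3-dimensional vectors; hypothetical values, not trained word2vec output.
TOY_VECTORS = {
    "autonomous": [0.9, 0.1, 0.0],
    "vehicle": [0.8, 0.2, 0.1],
    "lidar": [0.7, 0.3, 0.0],
    "battery": [0.1, 0.9, 0.2],
    "polymer": [0.0, 0.2, 0.9],
}


def document_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_similar(query_tokens, documents, vectors):
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    query = document_vector(query_tokens, vectors)
    if query is None:
        return []
    scored = []
    for doc_id, tokens in documents.items():
        doc = document_vector(tokens, vectors)
        if doc is not None:
            scored.append((doc_id, cosine(query, doc)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


patents = {
    "patent_a": ["autonomous", "vehicle", "lidar"],
    "patent_b": ["battery", "polymer"],
}
ranking = rank_similar(["autonomous", "vehicle"], patents, TOY_VECTORS)
```

With these toy values, the document about autonomous vehicles ranks above the unrelated one, which is the behavior a seed-query search over a patent collection would rely on.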