I am currently a postdoctoral researcher at INRIA Paris in the MiMove team. My research focuses on machine learning and particularly lies at the intersection of data stream mining algorithms and summarization techniques (e.g., sketches and dimension reduction). I obtained my Ph.D. degree in Computer Science from Télécom Paris - Institut Polytechnique de Paris in the Data, Intelligence and Graphs (DIG) group.
I am using a sketch technique to reduce the memory usage of a standard classifier (naive Bayes) on data streams. The sketch is a table of counters (a hash table), which means the stored counts can be over-estimated because of collisions. In the first step (the learning phase), the data are stored in the sketch table. In the test phase, i.e., when classifying data, the classifier reads the counts back from the sketch table, so it works with estimated counts rather than the true counts that standard naive Bayes uses. Is it normal for the new classifier (naive Bayes combined with the sketch) to obtain a better accuracy than the standard one, even though it relies on approximations?
In which cases could this happen? I am getting a slightly better accuracy (not a big difference), but it still seems weird...
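For concreteness, the setup described above can be sketched roughly as follows in Python. This is only an illustration under assumptions: it uses a Count-Min-style table (several hash rows, with the minimum taken at query time) as one common way to get counts that can only be over-estimated, and the class names, the `width`/`depth` parameters, the md5-based hashing, and the Laplace smoothing for binary attributes are all hypothetical choices, not necessarily the exact implementation in question.

```python
import hashlib
import math


class CountMinSketch:
    """Count-Min-style table (assumed): estimates never fall below the true count."""

    def __init__(self, width=64, depth=3):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Deterministic per-row hash; md5 is an illustrative choice only.
        digest = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += 1

    def estimate(self, key):
        # Collisions can only inflate a cell, so the minimum is the tightest bound.
        return min(
            self.table[row][self._index(key, row)] for row in range(self.depth)
        )


class SketchNaiveBayes:
    """Naive Bayes whose (attribute, value, class) counts live in the sketch."""

    def __init__(self, width=64, depth=3):
        self.sketch = CountMinSketch(width, depth)
        self.class_counts = {}  # class counts are kept exactly

    def learn(self, x, y):
        self.class_counts[y] = self.class_counts.get(y, 0) + 1
        for i, v in enumerate(x):
            self.sketch.add((i, v, y))

    def predict(self, x):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / total)
            for i, v in enumerate(x):
                # Estimated count, possibly inflated by collisions.
                # The +1 / +2 smoothing assumes binary attributes.
                est = self.sketch.estimate((i, v, y))
                score += math.log((est + 1) / (n_y + 2))
            if score > best_score:
                best, best_score = y, score
        return best
```

With a small `width`, collisions inflate the counts of rare or unseen (attribute, value, class) combinations, which acts a little like extra smoothing; comparing runs with a large and a small `width` on the same stream is one way to check whether that effect is behind the accuracy difference.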
Thank you in advance :)