Many computer science teachers are very concerned about students cheating in their courses. Surveys report that almost three-quarters of high school students admit to cheating within the past year. John Barrie, founder of the plagiarism-detecting Web site Turnitin.com, says that about a third of the papers submitted to the site have significant levels of plagiarism. Many people say that the Internet has made cheating easier to commit and harder to detect, and they wonder if the moral fabric of our youth is fraying. In the trenchant analysis below, Bill Murray approaches the teaching-learning system as a game in which students, teachers, and others play various roles. He wonders whether the game itself encourages cheating, and suggests that teachers could restructure the game so that cheating is less rewarding and less likely. --Peter J. Denning
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
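The generative model above admits several inference schemes; the abstract describes variational EM, but the same posterior quantities (per-document topic mixtures and per-topic word distributions) can be illustrated with a much simpler collapsed Gibbs sampler. The sketch below is an assumption-laden toy, not the paper's algorithm: it substitutes Gibbs sampling for variational inference, fixes symmetric priors `alpha` and `beta`, and represents each document as a list of integer word IDs.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iter=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    Illustrative only: the paper itself uses variational EM, not Gibbs
    sampling. `docs` is a list of documents, each a list of word IDs.
    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = np.random.default_rng(seed)
    # Random initial topic assignment for every token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    ndk = np.zeros((len(docs), n_topics))   # doc-topic counts
    nkw = np.zeros((n_topics, vocab_size))  # topic-word counts
    nk = np.zeros(n_topics)                 # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Conditional distribution over topics for this token.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) \
                    / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    # Posterior mean estimates of the mixtures the abstract refers to.
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi
```

Each row of `theta` is the "explicit representation of a document" mentioned in the abstract: a point estimate of that document's mixture over topics.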
This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore, they are fully automatic, eliminating the need for manual parameter tuning.
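In this setting, documents become high-dimensional, sparse term-count vectors and the SVM learns a separating hyperplane over them. As a minimal sketch of that idea, the snippet below trains a linear SVM with Pegasos-style stochastic subgradient descent on toy bag-of-words vectors; the solver choice and the tiny dataset are this sketch's assumptions, not the paper's experimental setup.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, n_iter=1000, seed=0):
    """Pegasos-style stochastic subgradient training of a linear SVM.

    Illustrative stand-in for a full SVM solver. X: (n, d) feature
    matrix (e.g. bag-of-words counts); y: labels in {+1, -1}.
    Minimizes the regularized hinge loss and returns the weight vector.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iter + 1):
        i = rng.integers(n)           # pick one random training example
        eta = 1.0 / (lam * t)         # decaying step size
        if y[i] * X[i].dot(w) < 1:    # example violates the margin
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                         # only shrink (regularization term)
            w = (1 - eta * lam) * w
    return w

def predict(w, X):
    """Classify by the sign of the decision function."""
    return np.where(X.dot(w) >= 0, 1, -1)
```

For example, with four toy "documents" over a four-word vocabulary (two about one theme, two about another), the learned hyperplane separates the classes by putting opposite-signed weights on the words that distinguish them.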