Roger Bilisoly

  • Ph.D. in statistics
  • Professor at Central Connecticut State University

Publications: 28
Reads: 8,955
Citations: 92
Introduction
I'm interested in applying both machine learning and natural language processing to analyzing one or more texts, sometimes called text analytics. I prefer programming in Python and R.
Current institution
Central Connecticut State University
Current position
  • Professor
Additional affiliations
August 2001 - May 2004
Sandia National Laboratories
Position
  • Senior Member of Technical Staff
Education
September 1990 - June 1998
The Ohio State University
Field of study
  • Statistics
September 1986 - June 1988
Drexel University
Field of study
  • Physics
August 1984 - May 1986
Purdue University West Lafayette
Field of study
  • Mathematics

Publications (28)
Presentation
Full-text available
This contains slides that introduce how to use Python to analyze texts, which is called 'text analytics' or 'text mining.'
Presentation
Full-text available
This contains slides that introduce how to use Python to analyze texts, which is called 'text analytics' or 'text mining.'
Presentation
Full-text available
This contains slides that introduce how to use Python to analyze texts, which is called 'text analytics' or 'text mining.' Regular expressions are emphasized here.
Presentation
Full-text available
This contains slides that introduce how to use Python to analyze texts, which is called 'text analytics' or 'text mining.' Regular expressions are emphasized here.
Presentation
Full-text available
This contains slides that introduce how to use Python to analyze texts, which is called 'text analytics' or 'text mining.'
Article
Full-text available
Although squaring integers is deterministic, squares modulo a prime, $p$, appear to be random. First, because they are all generated by the multiplicative linear congruential equation, $x_{i+1} = g^2 x_i \mod p$, where $x_0 = 1$ and $g$ is any primitive root of $p$, a pseudorandom number heuristic suggests that they are, in fact, unpredictable. Mor...
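As a minimal sketch of the recurrence quoted in this abstract, the following Python code iterates $x_{i+1} = g^2 x_i \mod p$ from $x_0 = 1$ and compares the result with the squares computed directly. The prime `p = 23`, the brute-force helper `primitive_root`, and the function name `squares_via_lcg` are illustrative assumptions and are not taken from the paper.

```python
def primitive_root(p):
    """Return the smallest primitive root of an odd prime p (brute force)."""
    for g in range(2, p):
        # g is a primitive root if its powers hit all p - 1 nonzero residues.
        seen = set()
        x = 1
        for _ in range(p - 1):
            x = (x * g) % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def squares_via_lcg(p, g):
    """Iterate x_{i+1} = g^2 * x_i mod p from x_0 = 1 for (p - 1) / 2 steps."""
    multiplier = (g * g) % p
    x = 1
    sequence = []
    for _ in range((p - 1) // 2):
        sequence.append(x)
        x = (x * multiplier) % p
    return sequence


p = 23  # assumed example prime
g = primitive_root(p)
lcg_squares = squares_via_lcg(p, g)
direct_squares = sorted({(k * k) % p for k in range(1, p)})

print("generator order:", lcg_squares)
print("same set as k^2 mod p:", sorted(lcg_squares) == direct_squares)
```

The generator visits each nonzero square exactly once, but in an order that looks scrambled compared with the sorted list.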
Article
Full-text available
Walter Skeat published his critical edition of William Langland's 14th century alliterative poem, Piers Plowman, in 1886. In preparation for this he located forty-five manuscripts, and to compare dialects, he published excerpts from each of these. This paper does three statistical analyses using these excerpts, each of which mimics a task he did in...
Conference Paper
Full-text available
In preparation of his edition of the 14th century alliterative poem Piers Plowman, the 19th century philologist, Walter Skeat, was able to find forty-five manuscripts. These were used in two different ways. First, he studied these with respect to their dialects, which led to his identification of three versions of the poem, denoted as texts A, B, a...
Article
Full-text available
Interest in the mathematical structure of poetry dates back to at least the 19th century: after retiring from his mathematics position, J. J. Sylvester wrote a book on prosody called $\textit{The Laws of Verse}$. Today there is interest in the computer analysis of poems, and this paper discusses how a statistical approach can be applied to this tas...
Article
Full-text available
Although textbook publishers offer course management systems, they do so to promote brand loyalty, and while an open source tool such as WeBWorK is promising, it requires administrative and IT buy-in. So, supported in part by a College Access Challenge Grant from the Department of Education, we collaborated with other instructors to create online ho...
Article
Full-text available
Researchers have developed ways to generalize the mean and variance to situations in which a data metric is available. We apply the tools developed in Pennec (2006) to categorical data, and show the generality of this approach by considering two quite different applications. First, spelling variability in Middle English is quantified. Second, varia...
Article
Full-text available
Markov chains are an important example for a course on stochastic processes because simple board games can be used to illustrate the fundamental concepts. For example, a looping board game (like Monopoly) consists of all recurrent states, and a game where players win by reaching a final square (like Chutes and Ladders) consists of all transient sta...
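To illustrate the transient-state case mentioned in this abstract, here is a small Python sketch of a race-to-the-end game. The rules are a made-up toy (move 1 or 2 squares with probability 1/2 each, with any overshoot stopping on the final square) and are not the games analyzed in the article. The function name `expected_turns` is likewise hypothetical.

```python
from fractions import Fraction


def expected_turns(last_square):
    """Expected number of turns to reach last_square from each square.

    Assumed toy rules: each turn moves 1 or 2 squares with probability 1/2,
    and any overshoot stops on the final square.
    """
    half = Fraction(1, 2)
    e = [Fraction(0)] * (last_square + 1)  # e[last_square] = 0: game is over
    # Work backwards: e[i] = 1 + (1/2) e[i+1] + (1/2) e[min(i+2, last_square)]
    for i in range(last_square - 1, -1, -1):
        e[i] = 1 + half * e[i + 1] + half * e[min(i + 2, last_square)]
    return e


turns = expected_turns(3)
print(turns[0])  # 9/4 expected turns from the starting square
```

Because every non-final square is left for good after finitely many visits, the expected game length is finite, which is the behavior that distinguishes transient states from the recurrent states of a looping board.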
Article
Full-text available
Statistics pedagogy values using a variety of examples. Thanks to text resources on the Web, and since statistical packages have the ability to analyze string data, it is now easy to use language-based examples in a statistics class. Three such examples are discussed here. First, many types of wordplay (e.g., crosswords and hangman) involve finding...
Article
In many clinical trials and epidemiological studies, comparing the mean count response of an exposed group to a control group is often of interest. This type of data is often over-dispersed with respect to Poisson variation, and previous studies usually compared groups using confidence intervals (CIs) of the difference between the two means. Howeve...
Article
Extra-dispersion (overdispersion or underdispersion) is a common phenomenon in practice when the variance of count data differs from that of a Poisson model. This can arise when the data come from different subpopulations or when the assumption of independence is violated. This paper develops a procedure for testing the equality of the means of sev...
Conference Paper
Full-text available
In this article, we discuss the modeling of count data occurring in biological applications. We then derive asymptotic procedures for the construction of confidence limits for the over-dispersion parameter of count data when there is no likelihood available. We also obtain closed-form asymptotic variance formulae for the estimator of the over-disper...
Book
Introduction; Scalars, Interpolation, and Context in Perl; Arrays and Context in Perl; Word Lengths in Poe's “The Tell-Tale Heart”; Arrays and Functions; Hashes; Two Text Applications; Complex Data Structures; References; First Transition; Problems
Article
92.45 Anasquares: Square anagrams of squares - Volume 92 Issue 524 - Roger Bilisoly
Article
Provides readers with the methods, algorithms, and means to perform text mining tasks. This book is devoted to the fundamentals of text mining using Perl, an open-source programming tool that is freely available via the Internet (www.perl.org). It covers mining ideas from several perspectives--statistics, data mining, linguistics, and information r...
Conference Paper
Full-text available
Edgar Allan Poe wrote seventy short stories in his lifetime, and literary critics have categorized these stories in many ways, e.g., by genres such as horror, detective or proto-science fiction. This paper discusses how a computer can group stories by using families of words related by a theme, e.g., words denoting colors. This approach combines tw...
Chapter
Full-text available
The effect of variable demands at short time scales on the transport of a solute through a water distribution network has not previously been studied. We simulate flow and transport in a small water distribution network using EPANET to explore the effect of variable demand on solute transport across a range of hydraulic time step scales from 1 minu...
Article
Full-text available
Previous work on sample design has been focused on constructing designs for samples taken at point locations. Significantly less work has been done on sample design for data collected along transects. A review of approaches to point and transect sampling design shows that transects can be considered as a sequential set of point samples. Any two sam...
Thesis
Full-text available
I will reconstruct the ocean currents for a region in the northeast Pacific based on a combination of (i) pre-existing knowledge of the average properties of the currents in this region; (ii) information obtained from floating instrument platforms that freely move with the currents; and (iii) the equations of fluid motion. The reconstruction will be t...
Article
Soil chemical field data typically do not satisfy the required statistical assumptions, and this renders statistical tests based on normal theory either invalid or not particularly powerful. The objective of this study was to compare the t-test and two nonparametric tests (Wilcoxon signed rank and the Sign test) for a theoretical data set and 3 yr...
Article
Full-text available
We address the issue of how to make decisions about the degree of smoothness demanded of a flexible contour used to model the boundary of a 2D object. We demonstrate the use of a Bayesian approach to set the strength of the smoothness prior for a tomographic reconstruction problem. The Akaike Information Criterion is used to determine whether to al...
Article
Full-text available
As demonstrated by the anthrax attack through the United States mail, people infected by the biological agent itself will give the first indication of a bioterror attack. Thus, a distributed information system that can rapidly and efficiently gather and analyze public health data would aid epidemiologists in detecting and characterizing emerging di...
Article
Full-text available
This note is inspired by Numbo-Carrean, which was introduced in Ross Eckler's Word Recreations [1] in the chapter called "Ten Logotopian Lingos." This lingo uses words with the following property: when each letter is replaced by its letter rank (or alphabetic position number), the resulting number is a perfect square. That is, a is replaced by 1, b...
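The property described in this abstract can be checked with a few lines of Python. This sketch assumes that the letter ranks are concatenated as decimal digits (so "be" becomes 25), which is one reading of the truncated description. The function names `letter_rank_number` and `is_numbo_carrean` are hypothetical.

```python
import math


def letter_rank_number(word):
    """Concatenate the alphabetic ranks of the letters (a=1, ..., z=26)."""
    digits = "".join(str(ord(ch) - ord("a") + 1) for ch in word.lower())
    return int(digits)


def is_numbo_carrean(word):
    """True if the concatenated letter ranks form a perfect square."""
    n = letter_rank_number(word)
    root = math.isqrt(n)
    return root * root == n


for word in ["be", "add", "cat"]:
    print(word, letter_rank_number(word), is_numbo_carrean(word))
# be 25 True
# add 144 True
# cat 3120 False
```

Only alphabetic input is handled here; filtering a real word list would need an extra check such as `word.isalpha()` before calling the function.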
