
Duncan Temple Lang
- PhD
- Professor at University of California, Davis
About
60 Publications
15,643 Reads
2,217 Citations
Publications (60)
The Journal of Statistics and Data Science Education special issue on “Computing in the Statistics and Data Science Curriculum” features a set of papers that provide a mosaic of curricular innovations and approaches that embrace computing. As we reviewed the papers, we felt that this collection would benefit from the perspective of the authors of th...
The Explorations in Statistics Research workshop is a one-week NSF-funded
summer program that introduces undergraduate students to current research
problems in applied statistics. The goal of the workshop is to expose students
to exciting, modern applied statistical research and practice, with the
ultimate aim of interesting them in seeking more tr...
We describe NIMBLE, a system for programming statistical algorithms within R
for general model structures. NIMBLE is designed to meet three challenges:
flexible model specification, a language for programming algorithms that can
use different models, and a balance between high-level programmability and
execution efficiency. For model specification,...
Web services are an important development in making data available to clients in a programmatic manner. Rather than displaying results for humans to view on the Web, we can make requests to a Web service in order to get the resulting data directly and then consume it in our applications immediately. An increasingly common architecture for Web servi...
In this chapter, we focus on general, all-purpose infrastructure we can use in R for accessing networks and the Web. The RCurl package provides both high-level and intermediate-level functionality within R that allows us to make rich and flexible requests to a wide variety of servers, including Web servers and applications that speak diff...
In Chapter 5, we saw how to scrape data from HTML pages. In this chapter, we focus on a variation of this where we get the Web page containing the data we want by submitting an HTML form. Rather than using a Web browser, we submit the form from R, providing inputs to parameterize the request from data in R. We use functionality in the RCurl package...
In this chapter, we explore the possibilities for data exchange offered by the Office Open XML (OOXML) standard. Many office suites have adopted OOXML for their spreadsheet, word-processing, and presentation tools. We demonstrate the kinds of functionality that can be built using the tools in the XML package to interface with XML-based spre...
In this chapter, we explore approaches to parsing XML content within R and extracting content from the various types of elements in the XML document. The primary approach is to parse an XML document into a hierarchical tree object. We show how the tree representation of an XML document (described in Chapter 2) can be treated as a list in R, which m...
This chapter looks at another mechanism used to access Web services—SOAP—which is similar to XML-RPC and to some extent REST. SOAP is more structured and general than XML-RPC and REST. It is also ostensibly more complex and more difficult to understand because of all of the details about how to format and send requests as XML documents. However, mu...
Some REST APIs require authentication to access data and services, and many are starting to require use of a more general mechanism named OAuth. This avoids logins and passwords and allows secure three-party interactions between the owner of the data, the application accessing it, and the host of the data and Web service. In this chapter, we descri...
In this chapter, we compare different approaches to parsing XML and HTML documents and extracting data from these documents into R. We illustrate these with comprehensive, real-world examples that demonstrate XPath and R functions for processing XML documents. We also introduce event-driven parsing where we use a collection of R functions to respond...
This chapter explores ways of thinking about documents that depart from the traditional write–format–publish cycle. Instead, we think about documents in some of the same ways we think about software. We want to be able to programmatically verify that a document is “correct,” i.e., unit testing for documents. We want modular...
In this chapter, we explore approaches to creating XML content within R. The primary approach is to create nodes and trees that are identical in nature to those returned by the XML parser via the xmlParse() and htmlParse() functions. Rather than generating an entire document in one step, we use functions to build individual nodes, add child nodes a...
In this chapter, we focus on XPath, a domain-specific language that we can use from within R (amongst others) to query sets of nodes in an XML tree by patterns within nodes. XPath is quite simple but very powerful. Similar to a file hierarchy, it allows us to identify nodes of interest by specifying paths through the tree, based on node names, node...
This chapter aims to give a reasonably comprehensive definition and motivation for the various aspects of the generic XML language and also to illustrate these aspects with some existing XML dialects or vocabularies. We describe elements, attributes, child elements, and the hierarchical structure of XML. We talk about “well-formedness” of an XML do...
This chapter discusses the functionality to read, process, and use XML schema within R. The primary goal is to be able to programmatically generate R classes and code to work with XML documents. Recall from Section 2.7 that an XML schema describes the structure, content and data types for XML documents in a particular XML vocabulary, grammar or for...
In this chapter, we explore two innovative tools for interactive visualization: Google Earth and Google Maps. Google Earth renders Keyhole Markup Language (KML) documents for viewing on a virtual earth browser, and similarly, Google Maps displays KML-formatted data on two-dimensional maps in a Web browser. KML is a grammar of XML for marking up spa...
In this chapter, we look at one of the early approaches used for Web services: XML-RPC. This is a form of remote procedure call, i.e., invoking functions that reside in a different process or machine. This is basically calling a “foreign” function. Like REST, this uses HTTP to communicate the requests and responses. However, XML-RPC uses XML to rep...
This chapter explores a powerful two-dimensional graphics format named SVG (scalable vector graphics), which is becoming widely used to represent object-based, vector (nonraster) graphics both on the Web and elsewhere. In addition to displaying scalable plots, the XML-based SVG format provides functionality for interactivity, animation, and a numbe...
In this chapter, we describe the JSON format and then the fromJSON() and toJSON() functions to both read and create JSON content. Because JSON is so simple and there are few supporting technologies for JSON, there are not many details that we need to examine before being able to work with JSON effectively. As a result, we spend a significant part o...
I describe an approach to compiling common idioms in R code directly to
native machine code and illustrate it with several examples. Not only can this
yield significant performance gains, but it allows us to use new approaches to
computing in R. Importantly, the compilation requires no changes to R itself,
but is done entirely via R packages. This...
Background/Question/Methods
Bayesian analysis in ecology has exploded in popularity, largely facilitated by the availability of the BUGS language for declaring models. WinBUGS, OpenBUGS and JAGS each implement BUGS and provide automated MCMC algorithms that can lead to long run times for complicated models. However, MCMC research has led to many i...
The goal of this chapter is to provide a rapid introduction to a few high-level functions available to R users for parsing XML and JSON content. In many cases, these functions (readHTMLTable(), xmlToList(), xmlToDataFrame(), and fromJSON()) are all that you will need to read XML- or JSON-formatted data directly into an R list or dataframe. One o...
Data Formats: XML and JSON; Web Technologies: Getting Data from the Web; General XML Application Areas; Bibliography; General Index; R Function and Parameter Index; R Package Index; R Class Index; Colophon.
The TreeBASE portal is an important and rapidly growing repository of phylogenetic data. The R statistical environment has also become a primary tool for applied phylogenetic analyses across a range of questions, from comparative evolution to community ecology to conservation planning.
We have developed treebase, an open‐source software package (f...
This article introduces a package that provides interactive and programmatic access to the FishBase repository. This package allows interaction with data on over 30 000 fish species in the rich statistical computing environment, R. This direct, scriptable interface to FishBase data enables better discovery and integration essential for large-scale...
GGobi is an open source visualization program for exploring
high-dimensional data. It provides highly dynamic and interactive
graphics such as tours, as well as familiar graphics such as the
scatterplot, barchart and parallel coordinates plots. Plots are
interactive and linked with brushing and identification.
Graphical user interfaces (GUIs) are growing in popularity as a complement or alternative to the traditional command line interfaces to R. RGtk2 is an R package for creating GUIs in R. The package provides programmatic access to GTK+ 2.0, an open-source GUI toolkit written in C. To construct a GUI, the R programmer calls RGtk2 functions that map to...
The nature of statistics is changing significantly with many opportunities to broaden the discipline and its impact on science and policy. To realize this potential, our curricula and educational culture must change. While there are opportunities for significant change in many dimensions, we focus more narrowly on computing and call for computing c...
Dynamic documents that combine text and code, which is evaluated to dynamically create content when the document is “rendered,”
for example, Sweave, are a large step forward in reproducible data analysis and computation. However, to capture the research
process, we need richer paradigms and infrastructure. The process includes all the investigation...
This paper describes an R package that allows one to read and generate a description of C and C++ source code elements. These
descriptions are meta-data about that C/C++ code and can be used for several different purposes. The most obvious application
is to programmatically generate bindings/wrappers which are R functions and C routines that allow...
The R computing environment has become an important part of the statistical community and fostered the development of over
a thousand add-on packages, many representing state-of-the-art research in statistical methodology. Although it is relatively
easy to develop functionality on top of the system, it is very difficult for developers to directly e...
Recently, there has been a lot of discussion about what a statistics curriculum should contain, and which elements are important for different types of students. For the most part, attention has been understandably focused on the introductory statistics course. This course serves thousands of students who take only one statistics course. In the U...
We present the cacher and CodeDepends packages for R, which provide tools for (1) caching and analyzing the code for statistical
analyses and (2) distributing these analyses to others in an efficient manner over the Web. The cacher package takes objects
created by evaluating R expressions and stores them in key-value databases. These databases of c...
The application of cutting-edge statistical methodology is limited by the capabilities of the systems in which it is implemented. In particular, the limitations of R mean that applications developed there do not scale to the larger problems of interest in practice. We identify some of the limitations of the computational model of the R language tha...
Significant efforts have been made to overhaul the "introductory" statistics courses by placing greater emphasis on statistical thinking and literacy and less on rules, methods and procedures. We advocate broadening and increasing this effort to "all levels" of students and, importantly, using topical, interesting, substantive problems that come fr...
For various reasons, it is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, etc. with the documents that describe and rely on them. This integration allows readers to both verify and adapt the statements in the documents. Authors can easily reproduce them in the fut...
The Web is clearly an important source of data for statisticians and is emerging as a vital component in distributed computing via Web services. HTTP is the primary mechanism underlying the Web and data transfer. As such, it is important for programming languages to have tools for HTTP requests and other protocols. We describe the RCurl package t...
Graphs have long been of interest in telecommunications and social network analysis, and they are now receiving increasing attention from statisticians working in other areas, particularly in biostatistics. Most of the visualization software available for working with graphs has come from outside statistics and has not included the kind of interact...
This paper defines the metrics to characterize the performance of ad hoc networks based on timescales for information flow, power consumption and interference. The statistical distribution of timescales has not been previously considered. Yet, it is important for understanding the feasibility of communicating over such networks, for comparing diffe...
GGobi is a direct descendant of a data visualization system called XGobi that has been around since the early 1990s. GGobi's new features include multiple plotting windows, a color lookup table manager, and an XML (Extensible Markup Language) file format for data. Perhaps the biggest advance is that GGobi can be easily extended, either by being...
This paper describes a decade's worth of evolution of integrating software to support exploratory spatial data analysis (ESDA) where there are multiple measured attributes. The multivariate graphics tools we use are XGobi, and more recently, GGobi. The paper is divided into two parts. In the first part, we review early experiments in software linki...
Interfaces to other languages such as C and Fortran have provided S users access to a vast collection of existing software. Adding general interfaces to other languages allows us to access the software developed for those languages.
We describe a mechanism for generating information about C code. This can be used by environments such as S, Java, Perl, Python, etc. to dynamically provide access to arbitrary C code from those languages. This is similar to the concepts of reflectance in Java and other languages. Rather than providing this meta-information directly in the language...
Statistical environments such as S, R, XLisp-Stat, and others have had a dramatic effect on the way we, statistics practitioners, think about data and statistical methodology. However, the possibilities and challenges introduced by recent technological developments and the general ways we use computing conflict with the computational model of these...
Statistical computing is part of a more general process, which can be called computing with data. Besides traditional statistical analysis, this involves acquiring, organizing, and visualizing data, often in large, structured datasets organized in database management systems and used for purposes beyond analysis. An important challenge for statisti...
GGobi is a direct descendant of XGobi, with multiple plotting windows, a color lookup table manager, an XML (Extensible Markup Language) file format for data, and other changes. Perhaps the biggest change is that GGobi can be embedded in other software and controlled using an API (Application Programming Interface). This design has been developed and...
Karen Kafadar is the 1998 Chair of the Statistical Computing Section. In her column, she considers the Section Charter and the opportunities it represents.
Web services are most effective on statically typed objects exposed in a well-developed infrastructure. This document summarizes our approach to exposing R objects and functionality in a Java class hierarchy of statically typed methods. The approach is to use R's formal (S4) class system to strongly type R functions using TypeInfo. We then convert...
We present an approach that allows methods and data from certain languages to be transparently made available as CORBA servers. Also, methods in remote servers can be transparently invoked by local calls in these languages. The approach relies on the language supporting self-description of an object, or reflectance, which provides access to the de...
The Slcc package provides an S language interface to a C code parser. It allows a user to get S level descriptions of certain elements of C code. This essentially provides a source-level reflectance mechanism for C within S. This can be used, for example, to automatically • create export tables of routines to be accessed from S (Python, etc.), • ge...