Fred Douglis

Peraton Labs · Cybersecurity

PhD

About

159
Publications
24,035
Reads
6,569
Citations
Citations since 2016
15 Research Items
1188 Citations
[Chart: citations per year, 2016 to 2022; vertical axis 0 to 150]
Additional affiliations
June 2009 - present
EMC Corporation
Position
  • Consultant Software Engineer
April 2002 - February 2009
IBM
Position
  • Research Staff Member
September 1996 - January 1997
Princeton University
Position
  • Professor


Publications (159)
Article
Full-text available
The DARPA FastNICs program targets orders of magnitude improvement in applications such as deep learning training by making radical improvements to network performance: while raw bandwidth has grown dramatically, the fundamental roadblock to application performance has been in delivering that data to the application. FLEET provides a primarily off-...
Article
Full-text available
In this article, we put forward the substantial challenges in cyber resilience in the domain of autonomous systems and outline foundational solutions to address these challenges. These solutions fall into two broad themes: resilience-by-design and resilience-by-reaction. We use several application drivers from autonomous systems --- some in the nea...
Article
The articles in this special section focus on distributed ledger technologies (DLT). DLT, of which blockchain is a popular example, are increasingly becoming a popular means to maintain transactional integrity and achieve consensus among competing parties in many modern distributed data exchanges. Indeed, a Gartner survey estimates that by 2020, DL...
Article
Enterprise KV stores are often not well suited for HPC applications, and thus cumbersome end-to-end KV design customization is required to meet the needs of modern HPC applications. To this end, in this article we present bespoKV, an adaptive, extensible, and scale-out KV store framework. bespoKV decouples the KV store design into the control...
Preprint
A set of about 80 researchers, practitioners, and federal agency program managers participated in the NSF-sponsored Grand Challenges in Resilience Workshop held on Purdue campus on March 19-21, 2019. The workshop was divided into three themes: resilience in cyber, cyber-physical, and socio-technical systems. About 30 attendees in all participated i...
Article
Full-text available
The articles in this special section focus on microservices and containers. These services, which allow an application to be composed of many independently operating and scalable components, have become a common service paradigm. The ability to construct an application by provisioning these interoperating components has various advantages, including the...
Article
Full-text available
Classic caching algorithms leverage recency, access count, and/or other properties of cached blocks at per-block granularity. However, for media such as flash which have performance and wear penalties for small overwrites, implementing cache policies at a larger granularity is beneficial. Recent research has focused on buffering small blocks and wr...
Conference Paper
Full-text available
Most storage systems that write in a log-structured manner need a mechanism for garbage collection (GC), reclaiming and consolidating space by identifying unused areas on disk. In a deduplicating storage system, GC is complicated by the possibility of numerous references to the same underlying data. We describe two variants of garbage collection in...
Article
Disk-based backup systems use data deduplication to replace redundant data chunks with references. These systems initially supported workloads from their tape-based predecessors, but changes to hardware and applications have forced them to adapt in interesting and challenging ways.
Conference Paper
Deduplication is nearly ubiquitous in backup environments, common for data distribution, and increasingly important for wide-area networking. Each of these three domains handles deduplication in a separate manner, but integrating them into an end-to-end deduplication paradigm would enable efficiencies and simplifications that will improve performan...
Article
Full-text available
Data deduplication, an efficient approach to data reduction, has gained increasing attention and popularity in large-scale storage systems due to the explosive growth of digital data. It eliminates redundant data at the file or subfile level and identifies duplicate content by its cryptographically secure hash signature (i.e., collision-resistant f...
Article
Discusses the current state of the journal, reports on current and future areas of exploration and research, and presents new editors.
Conference Paper
Full-text available
NAND-based solid-state (flash) drives are known for providing better performance than magnetic disk drives, but they have limits on endurance, the number of times data can be erased and overwritten. Furthermore, the unit of erasure can be many times larger than the basic unit of I/O; this leads to complexity with respect to consolidating live data...
Article
This special issue of Internet Computing surveys topics and challenges surrounding cloud storage. These include software-defined object storage in OpenStack Swift, shared services for file synchronization, convergent dispersal for secure cloud storage with high storage efficiency, quality of service for tiered storage in the cloud, and an applicati...
Conference Paper
Full-text available
Classic caching algorithms leverage recency, access count, and/or other properties of cached blocks at per-block granularity. However, for media such as flash which have performance and wear penalties for small overwrites, implementing cache policies at a larger granularity is beneficial. Recent research has focused on buffering small blocks and wr...
Article
Modern storage systems orchestrate a group of disks to achieve their performance and reliability goals. Even though such systems are designed to withstand the failure of individual disks, failure of multiple disks poses a unique set of challenges. We empirically investigate disk failure data from a large number of production systems, specifically f...
Conference Paper
Full-text available
Deduplication is widely used to improve space efficiency in storage systems. While much attention has been paid to making the process of deduplication fast and scalable, the effectiveness of deduplication can vary dramatically depending on the data stored. We show that many file formats suffer from a fundamental design property that is incompatible...
Patent
Full-text available
Methods, systems, and products generate web pages using elidable links to additional content. When a link is selected in a web page, elision is used to automatically remove a URL and its associated content from the web page, thus reducing previously visited material. When a user selects an elision-enabled link, the link is not displayed during subs...
Conference Paper
Full-text available
Modern storage systems orchestrate a group of disks to achieve their performance and reliability goals. Even though such systems are designed to withstand the failure of individual disks, failure of multiple disks poses a unique set of challenges. We empirically investigate disk failure data from a large number of production systems, specifically f...
Patent
A computer-implemented method for indexing content stored in a cache memory device is disclosed. The method starts by generating, in response to receiving a first request for caching a first file extent associated with a first file in a cache memory device, a first fingerprint based on content of the first file extent. Then the method continues wi...
Patent
A method for storing data in a data storage system by partitioning the data into a plurality of data chunks and generating representative data for each of the plurality of chunks by applying a predetermined algorithm to each chunk of the plurality of chunks. Subsequently, the representative data is compared and sorted. Representative data for base...
Patent
Full-text available
A mechanism is provided that aggregates data in a way that permits data to be deleted efficiently, while minimizing the overhead necessary to support bulk deletion of data. A request is received for automatic deletion of segments in a container and a waterline is determined for the container. A determination is made if at least one segment in the c...
Patent
Full-text available
A cooperative data stream processing system is provided that utilizes a plurality of independent, autonomous and possibly heterogeneous sites in a cooperative arrangement to process user-defined job requests over dynamic, continuous streams of data. A method is provided to organize the distributed sites into a plurality of virtual organizations tha...
Patent
Techniques for detecting unwanted data are described herein. In one embodiment, a request is received for storing a data object in a storage system from a client over a network, where the request includes first representative data representing the data object without including actual content of the data object. It is detected whether the data objec...
Article
Full-text available
This issue of Internet Computing surveys issues surrounding Web-scale datacenters, particularly in the areas of cloud provisioning as well as networking optimization and configuration. They include workload isolation, recovery from transient server availability, network configuration, virtual networking, and content distribution.
Conference Paper
Full-text available
The term sequential I/O is widely used in systems research with the intuitive understanding that it means consecutive access. From a survey of the literature, though, this intuitive understanding has translated into numerous, inconsistent definitions. Since sequential I/O is such a fundamental concept in systems research, we believe that a sequenti...
Patent
Techniques for searching data in a storage system are described herein. In one embodiment, in response to a request for searching target data in a storage system, first representative data for the target data being searched are generated by applying a predetermined algorithm to at least a portion of the target data. The first representative data ar...
Patent
Techniques for replicating data chunks in a storage system are described herein. In one embodiment, in response to a request for replicating data chunks of a source storage system having a first average chunk size to a target storage system having a second average chunk size, a new chunk size is determined based on metadata of the data chunks in vi...
Patent
Full-text available
A cooperative data stream processing system is provided that utilizes a plurality of independent, autonomous and potentially heterogeneous sites in a cooperative arrangement to process user-defined inquiries over dynamic, continuous streams of data. The system derives jobs from the inquiries and these jobs are executed on the various distributed si...
Patent
A storage system includes a plurality of data vats, and a processor including an optimizing unit that optimizes a value of data stored in the storage system. The optimizing unit optimizes the value by computing and implementing an optimal decision for allocating new data to a first data vat of the plurality of data vats, moving existing data from a...
Patent
The method and apparatus collect file recipes from deduplicated data storage systems; each file recipe consists of a list of fingerprints of data chunks of a file. Detailed meta-data for each unique data chunk is also collected. In an offline process, research and analysis can be performed on either the meta-data itself or on a reconstruction of a f...
Patent
Techniques are disclosed for optimizing schedules used in implementing plans for performing tasks in data processing systems. For example, an automated method of negotiating for resources in a data processing system, wherein the data processing system comprises multiple sites, comprises a negotiation management component of a computer system at a g...
Conference Paper
Full-text available
We propose Migratory Compression (MC), a coarse-grained data transformation, to improve the effectiveness of traditional compressors in modern storage systems. In MC, similar data chunks are re-located together, to improve compression factors. After decompression, migrated chunks return to their previous locations. We evaluate the compression effec...
Patent
Techniques for evaluating deduplication effectiveness of data chunks in a storage system are described herein. In one embodiment, metadata of first data chunks associated with a deduplicated storage system is examined, where the first data chunks have been partitioned according to a first chunk size. A second chunk size is calculated based on the e...
Patent
Full-text available
Backup requirements of a new client and one or more existing clients stored in a first set of first storage units are determined. Data of the one or more existing clients is then migrated from the first set of storage units to a first storage unit that is selected from a second set of storage units based on a cost calculated using a cost function b...
Patent
A request for allocating a storage unit of a storage system is received to back up data of one or more clients. The storage system includes multiple storage units and each storage unit storing data that is deduplicated within each storage unit. In response to the request, one or more of the storage units are selected based on an amount of deduplica...
Patent
Full-text available
A system and method for composing a stream servicing environment which considers all stakeholders includes identifying service component requirements needed for processing a data stream, and determining available service elements for processing the stream. Feasible service environments are constructed based upon the available service elements and t...
Patent
Full-text available
A cooperative data stream processing system utilizes a plurality of independent, autonomous and heterogeneous sites in a cooperative arrangement to process user-defined job requests over dynamic, continuous streams of data. A distributed plan is created that identifies the processing elements that constitute a job that is derived from user-defined in...
Patent
Full-text available
An information processing system comprises first and second levels of a storage hierarchy, wherein accessing information in the first level consumes more energy than accessing information in the second level; and a processor for writing information to the second level of storage based on energy-conserving criteria. The energy-conserving criteria co...
Article
Full-text available
With virtualization, a resource's consumers are provided with a virtual rather than physical version of that resource. This layer of indirection has helped address myriad problems, including efficiency, security, high availability, elasticity, fault containment, mobility, and scalability. This special issue of IEEE Internet Computing surveys some e...
Conference Paper
Full-text available
Data-protection class workloads, including backup and long-term retention of data, have seen a strong industry shift from tape-based platforms to disk-based systems. But the latter are traditionally designed to serve as primary storage and there has been little published analysis of the characteristics of backup workloads as they relate to the desi...
Conference Paper
Full-text available
When backing up a large number of computer systems to many different storage devices, an administrator has to balance the workload to ensure the successful completion of all backups within a particular period of time. When these devices were magnetic tapes, this assignment was trivial: find an idle tape drive, write what fits on a tape, and replace...
Conference Paper
Full-text available
As data have been growing rapidly in data centers, deduplication storage systems continuously face challenges in providing the corresponding throughputs and capacities necessary to move backup data within backup and recovery window times. One approach is to build a cluster deduplication storage system with multiple deduplication storage system node...
Article
Full-text available
Fred Douglis writes his farewell column as editor in chief. He discusses how social networking sites can add to the problem of information overload.
Article
Full-text available
Editor in Chief Fred Douglis discusses the pros and cons of removing a social network presence.
Article
Full-text available
Editor in Chief Fred Douglis takes us through a witty, yet informative, summary of smart phones and their applications, pricing, and privacy issues.
Article
Full-text available
Editor in Chief Fred Douglis briefly discusses two presentations given at IC's recent editorial board meeting. He then describes situations in which too much data is a good thing, and when it's a bad thing.
Article
Full-text available
Editor in Chief Fred Douglis discusses the online media dilemma of free versus paid content and services.
Article
Full-text available
The paper discusses the Internet as a social network. Networking is naturally a central focus of the Internet computing space. Initially, this term referred to connecting computers together; now it refers just as often to how people are interconnected - that is, "social networks." The paper discusses information overload in the context of three socia...
Article
Full-text available
Some banks now offer an added level of security, requiring a temporary passcode obtained via SMS on a mobile phone or a SecurID dongle to log in. There's even the possibility to use that bank as a springboard to access other accounts without providing the password. In theory, this might offer enough security to let a remote traveler do remote banki...
Article
Full-text available
Twitter feeds range from truly useful to banal and currently represent one of the fastest growing social networking technologies. EIC Fred Douglis looks at whether Twitter's success will continue, or whether it will ultimately be overtaken by a better, more selective technology.
Conference Paper
IT services need an automatic and flexible ability to react to dynamic changes in their environment. Managing change effectively and reducing the negative effects of day-to-day operations has become one of the most important tasks in IT service management, which requires hiring highly skilled IT professionals with correspondingly high labor costs. Th...
Article
Full-text available
Cloud computing's premise is to lower computing costs by providing computational resources in a shared infrastructure. This could be a godsend to smaller organizations, but interoperability and security challenges still exist for this emerging technology.
Article
Full-text available
Email security is a significant concern, so why don't more people encrypt their email? EIC Fred Douglis takes a stab at using a new privacy guard with Gmail, and describes his experience.
Article
Full-text available
A few weeks ago, a new Google Labs feature for their Gmail system made some news. They called it "mail goggles" and the tag line was "Stop sending mail you later regret". The announcement said that one of their engineers wanted to keep from sending pleas to his ex-girlfriend when he wasn't thinking straight, so he came up with a simple test. I chec...
Article
Full-text available
Data stream management is a great topic, and is the subject of this issue's special theme. EIC Fred Douglis reflects on some observations he's made in the course of his work in the field.
Article
When software ideas are ahead of their time, two problems can arise. Either better hardware is needed for the software to be feasible, or a new technology isn't adoptable because there's no current need for it. EIC Fred Douglis looks at examples of both problems in this installment.
Article
Full-text available
In this issue, EIC Fred Douglis discusses IC's scope and content, and queries readers on what areas they'd like to see the magazine cover.
Conference Paper
I provide several lessons learned from running a number of conference program committees over the past decade, as well as some additional thoughts on conference organization and the reviewing process. Topics include how to deal with poor or absent reviewers, ...
Article
Full-text available
The author relates stories about security and identity issues and how they relate to the Internet.
Article
Full-text available
Many will agree that email is badly broken when it comes to protecting users from spam, especially spoofing. With the growing prevalence of social networking sites, the increasing visibility into end users’ social connections will make it easy for scammers to figure out how to better forge return addresses, among other problems. Those who create so...
Conference Paper
Full-text available
I provide several lessons learned from running a number of conference program committees over the past decade, as well as some additional thoughts on conference organization and the reviewing process. Topics include how to deal with poor or absent reviewers, inbreeding among PC members, starting a new conference, and several other issues. 1. Backgr...
Conference Paper
In cooperating systems such as grids (4) and collaborative streaming analysis (2), autonomous sites can establish "agreements" to arrange access to remote resources for a period of time (1). The determination of which resources to reserve to accomplish a task need not be known a priori, because there exist multiple plans for accomplishing the sam...
Article
Full-text available
In Part 1, the author reflected on the trouble IC had when encountering a submission that was substantially similar to another submission received elsewhere - a definite no-no. The author described some tools for detecting self-plagiarism and considered a possible way to get authors to make submitted works available alongside those already publishe...
Conference Paper
There are currently a number of streaming data analysis systems in research or commercial operation. These systems are generally large-scale distributed systems, but each system operates in isolation, under the control of one administrative authority. We are developing middleware that permits autonomous or semi-autonomous streaming analysis systems...
Conference Paper
Full-text available
There are currently a number of streaming data analysis systems in research or commercial operation. These systems are generally large-scale distributed systems, but each system operates in isolation, under the control of one administrative authority. We are developing middleware that permits autonomous or semi-autonomous streaming analysis systems...
Article
Not all authors appreciate the rules for overlapping manuscript submissions, but we might be able to improve the process for the future.
Chapter
Full-text available
Mobile computers such as notebooks, subnotebooks, and palmtops require low weight, low power consumption, and good interactive performance. These requirements impose many challenges on architectures and operating systems. This chapter investigates three alternative storage devices for mobile computers: magnetic hard disks, flash memory disk emulato...
Article
Full-text available
When is advertising good, and when is it bribery? This installment of All Systems Go explores both the good and the bad in online advertising practices.
Article
Full-text available
When computer scientist Jim Gray disappeared in his sailboat in January, several in the community began trying to locate him through computing technology.
Conference Paper
We present a failure recovery framework for System S, a large-scale stream data analysis environment. It is intended to support multiple sites, which have their own local administration and goals. However, it is beneficial for these sites to cooperate with each other, especially in the presence of various failures. Our ultimate goal is to support a...
Article
Full-text available
From spam to bad security, Internet pet peeves are common. EIC Fred Douglis shares some of his, and invites readers to share some of their own - as well as any solutions they've come up with.
Article
Full-text available
In his inaugural column as editor-in-chief, the author discusses changes to the Web. The author looks back on how the magazine has evolved along with the Internet. The author picked the column heading, All Systems Go, to highlight one of these evolutionary steps.
Article
In his inaugural column, EIC Fred Douglis discusses changes to the Web throughout the past decade.
Conference Paper
Full-text available
We consider storage in an extremely large-scale distributed computer system designed for stream processing applications. In such systems, both incoming data and intermediate results may need to be stored to enable analyses at unknown future times. The quantity of data of potential use would dominate even the largest storage system. Thus, a mechanis...
Article
Full-text available
System S is a large-scale distributed streaming data analysis environment, designed to handle extreme data rates. Multiple System S sites can cooperate to further improve the scale, breadth and depth of data analysis. We describe three autonomic features in the operation of such a cooperative stream processing environment: interoperation models...