Ryan Wong's research while affiliated with the University of Chicago and other places

Publications (2)

Conference Paper
We introduce Xtract, an automated and scalable system for bulk metadata extraction from large, distributed research data repositories. Xtract orchestrates the application of metadata extractors to groups of files, determining which extractors to apply to each file and, for each extractor and file, where to execute. A hybrid computing model, built o...
Conference Paper
The use and reuse of scientific data is ultimately dependent on the ability to understand what those data represent, how they were captured, and how they can be used. In many ways, data are only as useful as the metadata available to describe them. Unfortunately, due to growing data volumes, large and distributed collaborations, and a desire to sto...

Citations

... Several frameworks have been implemented on top of funcX to create workflows for different scientific use cases. For instance, Xtract [71] uses funcX to enable workflow compositions for distributed bulk metadata extraction. Globus Automate [78] uses funcX to run arbitrary computations as part of automated and event-based workflows; it uses funcX's APIs to automatically monitor the status of a funcX function and trigger the next step when it completes. ...
... To the best of our knowledge, no prior system prioritizes extractors based on the expected value of metadata. While this work strictly focuses on designing an FTI-based extractor scheduler for our system Xtract, prior work illuminates the system design [18,3] and extractor library [19]. Table 1: Taxonomy of metadata extraction systems. ...