RDF (Resource Description Framework) is seeing rapidly increasing
adoption, for example in the context of the Linked Open Data (LOD)
movement and in diverse life sciences data publishing and integration
projects. This paper discusses how we have adapted OpenLink Virtuoso, a
general-purpose RDBMS, for this new type of workload. We discuss
adapting Virtuoso's relational engine for native RDF support with
dedicated data types, bitmap indexing and SQL optimizer techniques. We
further discuss scaling out by running on a cluster of commodity
servers, each with local memory and disk. We look at how this impacts
query planning and execution and how we achieve high parallel
utilization of multiple CPU cores on multiple servers. We present
comparisons with other RDF storage models as well as other approaches to
scaling out on server clusters. We present conclusions and metrics as
well as a number of use cases, from DBpedia to bioinformatics and
collaborative web applications.