Building efficient NGS analysis pipelines with ViennaNGS

Abstract

ViennaNGS is a Perl distribution for building efficient NGS data analysis pipelines, integrating high-level routines and wrapper functions for common NGS processing tasks. While ViennaNGS is not an established pipeline per se, it provides tools and functionality for the development of custom NGS pipelines in Perl. ViennaNGS comes with a set of utility scripts that serve as reference implementations for most library functions and can readily be applied to specific tasks or integrated as-is into custom pipelines.

ViennaNGS covers a broad range of NGS data processing tasks, including functionality for extracting and converting features from common NGS file formats, computation and evaluation of read mapping statistics, quantification and normalization of read count data, identification and characterization of splice junctions from RNA-seq data, parsing and condensing sequence motif data, automated construction of Assembly and Track Hubs for the UCSC genome browser, and wrapper routines for a set of commonly used NGS command line tools.

We will organize a 3-hour workshop for prospective ViennaNGS users, covering the following topics:
- ViennaNGS introduction
- Abstract representation of feature annotation data within ViennaNGS
- Building a custom NGS pipeline step by step
- Mapping statistics
- Extraction and manipulation of mapped sequencing data
- Computation of normalized coverage profiles
- Automatic UCSC genome browser visualization
Building efficient NGS analysis pipelines
with ViennaNGS
Michael T. Wolfinger
Jörg Fallmann
Florian Eggenhofer
Fabian Amman
30th TBI Winter Seminar
Bled, Slovenia
19 February 2015
ViennaNGS Executive Summary
ViennaNGS is a Perl distribution for building efficient next-generation
sequencing (NGS) data analysis pipelines, integrating high-level
routines and wrapper functions for common NGS processing tasks.
Project started in Summer 2014
Not an established pipeline per se, it provides tools and functionality
for the development of custom NGS pipelines in Perl
Provides modular and reusable code for NGS processing
ViennaNGS Components
ViennaNGS implements thematically related functionality in different Perl
modules and classes under the Bio namespace, partly building on BioPerl
and the Moose object framework.
ViennaNGS Module Overview 1/3
Bio::ViennaNGS::AnnoC
Lightweight interface for conversion of sequence annotation data
Bio::ViennaNGS::Bam
High-level manipulation of BAM files
Bio::ViennaNGS::BamStat
Moose-based class for collecting mapping statistics
Bio::ViennaNGS::BamStatSummary
Interface for processing BamStat objects across multiple BAM files
Bio::ViennaNGS::Util
Wrapper routines for common third-party NGS utilities and auxiliary functions
ViennaNGS Module Overview 2/3
Bio::ViennaNGS::Expression
Compute normalized expression based on read counts
Bio::ViennaNGS::Fasta
Moose wrapper for Bio::DB::Fasta
Bio::ViennaNGS::Bed
Convenience class for handling genomic interval data in BED format
Bio::ViennaNGS::SpliceJunc
Identification and characterization of splice junctions
Bio::ViennaNGS::UCSC
Automatic generation of UCSC Assembly and Track Hubs
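To make "normalized expression based on read counts" concrete, here is a minimal sketch of TPM (transcripts per million), one widely used normalization. It is written in plain Perl and does not call Bio::ViennaNGS::Expression; the gene names, counts, lengths, and the `tpm` helper are all hypothetical and serve only to illustrate the arithmetic.

```perl
use strict;
use warnings;

# Hypothetical input: raw read counts and feature lengths (nt) per gene
my %count  = (geneA => 100,  geneB => 300,  geneC => 600);
my %length = (geneA => 1000, geneB => 1500, geneC => 6000);

# TPM: divide each count by the feature length, then rescale the
# resulting rates so that they sum to one million.
sub tpm {
    my ($count, $length) = @_;
    my %rate = map { $_ => $count->{$_} / $length->{$_} } keys %$count;
    my $total = 0;
    $total += $_ for values %rate;
    my %tpm = map { $_ => 1e6 * $rate{$_} / $total } keys %rate;
    return \%tpm;
}

my $tpm = tpm(\%count, \%length);
printf "%s\t%.1f\n", $_, $tpm->{$_} for sort keys %$tpm;
```

In this toy data set geneC has the most reads, but geneB has the highest read density per nucleotide and therefore receives about half of the total TPM.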
ViennaNGS Module Overview 3/3
Bio::ViennaNGS::MinimalFeature
Base class for handling genomic interval data
Bio::ViennaNGS::Feature
Interface for simple genomic intervals representing BED6 entries
Bio::ViennaNGS::ExtFeature
Extends BED6 elements
Bio::ViennaNGS::FeatureChain
Bundles individual Feature objects
Bio::ViennaNGS::FeatureLine
Abstract representation of transcripts, pools FeatureChain objects
Moose In 30 Seconds
A postmodern object system for Perl 5 that makes object-oriented
programming easier, more consistent, and less tedious

# Point.pm: a 2D point with read-write integer coordinates
package Point;
use Moose;
has 'x' => (is => 'rw', isa => 'Int');
has 'y' => (is => 'rw', isa => 'Int');
sub clear {
    my $self = shift;
    $self->x(0);
    $self->y(0);
}
1;

# Point3D.pm: inherits x, y and clear() from Point and adds z
package Point3D;
use Moose;
extends 'Point';
has 'z' => (is => 'rw', isa => 'Int');
# Method modifier: runs after the inherited clear() and resets z as well
after 'clear' => sub {
    my $self = shift;
    $self->z(0);
};
1;

# Usage script
use Point;
use Point3D;
my $pt2D = Point->new(x => 2,  # x:2
                      y => 4,  # y:4
                     );
$pt2D->clear();                # x:0 y:0
my $pt3D = Point3D->new(x => 10,  # x:10
                        y => 20,  # y:20
                        z => 30   # z:30
                       );
$pt3D->clear;                  # x:0 y:0 z:0
The BED Annotation Format
[Figure: UCSC genome browser view of transcript AT1G04350.1, assembly araThaTAIR10, window chr1:1,165,129-1,166,810 (1,682 bp), with the corresponding BED12 line shown below]
chr1 1165164 1166768 AT1G04350.1 0 + 1165295 1166538 0 3 637,322,485, 0,708,1119,
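The BED12 line above can be decoded with a few lines of core Perl. The following is a minimal sketch, not ViennaNGS code: the field names follow the UCSC BED specification, while the helper names `parse_bed12` and `block_intervals` and the hash layout are hypothetical. It converts the relative block description (blockSizes, blockStarts) into absolute, 0-based half-open intervals.

```perl
use strict;
use warnings;

# Split one BED12 line into named fields (UCSC field names).
sub parse_bed12 {
    my ($line) = @_;
    my @names = qw(chrom chromStart chromEnd name score strand
                   thickStart thickEnd itemRgb blockCount
                   blockSizes blockStarts);
    my %bed;
    @bed{@names} = split ' ', $line;
    return \%bed;
}

# Turn block sizes and relative block starts into absolute [start, end) pairs.
sub block_intervals {
    my ($bed) = @_;
    my @sizes  = split /,/, $bed->{blockSizes};   # trailing comma is dropped
    my @starts = split /,/, $bed->{blockStarts};
    my @intervals;
    for my $i (0 .. $bed->{blockCount} - 1) {
        my $start = $bed->{chromStart} + $starts[$i];
        push @intervals, [ $start, $start + $sizes[$i] ];
    }
    return @intervals;
}

my $bed = parse_bed12(
    'chr1 1165164 1166768 AT1G04350.1 0 + 1165295 1166538 0 3 637,322,485, 0,708,1119,'
);
printf "%s\t%d\t%d\n", $bed->{chrom}, @$_ for block_intervals($bed);
```

The first block starts at chromStart (1165164) and the last block ends at chromEnd (1166768), as required for a well-formed BED12 entry.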
Generic Feature Annotation 1/2
Bio::ViennaNGS::MinimalFeature
  has 'chromosome' => (isa => 'Str')
  has 'start' => (isa => 'Int')
  has 'end' => (isa => 'Int')
  has 'strand' => (isa => 'PlusOrMinus') # +/-/.
Bio::ViennaNGS::Feature
  extends Bio::ViennaNGS::MinimalFeature
  has 'name' => (isa => 'Str')
  has 'score' => (isa => 'Value')
Bio::ViennaNGS::ExtFeature
  extends Bio::ViennaNGS::Feature
  has 'extension' => (isa => 'Str')
Generic Feature Annotation 2/2
Bio::ViennaNGS::FeatureChain
  has 'type' => (isa => 'Str')
  has 'chain' => (isa => 'ArrayRef')
Bio::ViennaNGS::FeatureLine
  extends Bio::ViennaNGS::MinimalFeature
  has 'id' => (isa => 'Str')
  has 'fc' => (isa => 'HashRef')
ViennaNGS Interval Classes
Feature extends MinimalFeature by two attributes, thereby representing
a BED6 entry
FeatureChain bundles Feature elements, creating individual annotation
chains for e.g. exons, introns, UTRs etc.
FeatureLine combines a set of individual FeatureChain objects, thereby
providing a convenient means of representing transcripts
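The relationships between these classes might be sketched in Moose as follows. This is an illustration under stated assumptions, not the ViennaNGS source: the `My::` package names are hypothetical stand-ins, the attribute names and types are taken from the slides, and the access mode (`is => 'ro'`), the `required` flags, the defaults, and the definition of the strand type are assumptions. The exon coordinates are those encoded by the BED12 example line.

```perl
use strict;
use warnings;

package My::MinimalFeature;
use Moose;
use Moose::Util::TypeConstraints;

# Assumed definition of the strand type: '+', '-' or '.'
subtype 'My::PlusOrMinus', as 'Str', where { /^[+\-.]$/ };

has 'chromosome' => (is => 'ro', isa => 'Str', required => 1);
has 'start'      => (is => 'ro', isa => 'Int', required => 1);
has 'end'        => (is => 'ro', isa => 'Int', required => 1);
has 'strand'     => (is => 'ro', isa => 'My::PlusOrMinus', default => '.');

package My::Feature;                # adds name and score: a BED6 entry
use Moose;
extends 'My::MinimalFeature';
has 'name'  => (is => 'ro', isa => 'Str',   required => 1);
has 'score' => (is => 'ro', isa => 'Value', default  => 0);

package My::FeatureChain;           # bundles Feature objects of one type
use Moose;
has 'type'  => (is => 'ro', isa => 'Str',      required => 1);
has 'chain' => (is => 'ro', isa => 'ArrayRef', default  => sub { [] });

package My::FeatureLine;            # a transcript: FeatureChains keyed by type
use Moose;
extends 'My::MinimalFeature';
has 'id' => (is => 'ro', isa => 'Str',     required => 1);
has 'fc' => (is => 'ro', isa => 'HashRef', default  => sub { {} });

package main;

# Build one Feature per exon of the example transcript
my @exons;
for my $iv ([1165164, 1165801], [1165872, 1166194], [1166283, 1166768]) {
    push @exons, My::Feature->new(
        chromosome => 'chr1', start => $iv->[0], end => $iv->[1],
        strand     => '+',    name  => 'AT1G04350.1',
    );
}

my $chain      = My::FeatureChain->new(type => 'exon', chain => \@exons);
my $transcript = My::FeatureLine->new(
    id    => 'AT1G04350.1', chromosome => 'chr1',
    start => 1165164,       end        => 1166768, strand => '+',
    fc    => { exon => $chain },
);

printf "%s has %d exons\n",
    $transcript->id, scalar @{ $transcript->fc->{exon}->chain };
```

Keying the `fc` hash by chain type is one possible layout; it lets exon, intron, and UTR chains of the same transcript sit side by side in a single FeatureLine object.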
ViennaNGS Documentation and Tutorials
ViennaNGS comes with extensive documentation based on Perl’s
POD system, thereby providing a single documentation base
ViennaNGS::Tutorial guides prospective users through the
development of basic NGS analysis pipelines
The tutorial is split into different chapters, each covering a common
use case in NGS analysis and describing a possible solution step by
step
ViennaNGS Utilities
ViennaNGS comes with a collection of complementary executable Perl scripts for
accomplishing routine tasks often required in NGS data processing.
These CLI utilities serve as reference implementations of the library routines and
can readily be used for atomic tasks in NGS data processing.
ViennaNGS Availability
The ViennaNGS Perl distribution is available from GitHub & CPAN
https://github.com/mtw/Bio-ViennaNGS
http://search.cpan.org/dist/Bio-ViennaNGS
“ViennaNGS: A toolbox for building efficient next-generation sequencing analysis pipelines”
M.T. Wolfinger, J. Fallmann, F. Eggenhofer, F. Amman
bioRxiv preprint DOI:10.1101/013011