ABSTRACT: The measurement of change in biological systems through protein quantification is a central theme in modern biosciences and medicine. Label-free MS-based methods have greatly increased the ease and throughput in performing this task. Spectral counting is one such method that uses detected MS2 peptide fragmentation ions as a measure of the protein amount. The method is straightforward to use and has gained widespread interest. Additionally, reports on new statistical methods for analysing spectral count data appear at regular intervals, but a systematic evaluation of these is rarely seen. In this work, we studied how similar the results are from different spectral count data analysis methods, given the same biological input data. For this, we chose the algorithms Beta Binomial, PLGEM, QSpec and PepC to analyse three biological datasets of varying complexity. For analysing the capability of the methods to detect differences in protein abundance, we also performed controlled experiments by spiking a mixture of 48 human proteins in varying concentrations into a yeast protein digest, to mimic biological fold changes. In general, the agreement of the analysis methods was not particularly good on the proteome-wide scale, as considerable differences were found between the different algorithms. However, we observed good agreement between the methods for the proteins with the largest abundance changes, indicating that for a smaller fraction of the proteome changes are measurable, and the methods may be used as valuable tools in the discovery-validation pipeline when applying a cross-validation approach as described here. Performance ranking of the algorithms using samples of known composition showed PLGEM to be superior, followed by Beta Binomial, PepC and QSpec. Similarly, the normalized versions of the same method, when available, generally outperformed the standard ones.
Statistical detection of protein abundance differences was strongly influenced by the number of spectra acquired for the protein, and correspondingly its molecular mass.
Journal of Proteome Research 03/2014; · 5.06 Impact Factor
ABSTRACT: With automatic plant identification methods, the amount of herbicides used in agriculture can be reduced when herbicides are sprayed only on weeds. In the present study, leaves of oat (Avena sativa) and dandelion (Taraxacum officinale, TAROF) were arranged so that there was overlap between the species, imaged with a pulse amplitude modulation fluorescence camera and photographed with a digital color camera. The fluorescence induction curves from each pixel were parameterized to obtain a set of features and from color photographs, texture features were calculated. A support vector algorithm that also performed feature selection was used for pattern recognition of both data sets. Fluorescence-based identification worked well with oat leaves, producing 92.2 % of correctly identified pixels, whereas the texture-based method often mis-identified the central vein of a TAROF leaf as oat, identifying correctly only 66.5 % of oat pixels. With TAROF that shows a clear dicot-type texture, the texture method was slightly better (96.4 % correctly identified pixels) than the fluorescence method (94.6 %). In fluorescence-based identification, the accuracy varied between entire TAROF leaves, probably reflecting the genetic variability of TAROF. The results suggest that the accuracy of identification could be improved by combining two identification methods.
ABSTRACT: In printed circuit board (PCB) manufacturing, revolver head gantry machines are nowadays popular because of their flexibility, accuracy and high speed. The operation control of this kind of machine includes, for example, selecting the nozzles into the revolver, assigning the component reels to feeder slots, and determining the pick-up and placement sequences of the components. We consider in this article the feeder assignment and the pick-up and placement sequencing problems. Unlike some previous literature on these problems, we suppose that each component can be picked up only by a nozzle of a certain type. For the feeder assignment problem a new heuristic is given and tested against four existing algorithms. The proposed heuristic considers the types of the r nearest neighbors of each component on the PCB and assigns component feeders close to each other according to the closeness of the component types on the PCB. The experimental tests are performed using two data sets based on realistic PCBs. The new heuristic outperformed the previous methods with 3.4% faster placement times. For determining the pick-up and placement sequences, four rule-based algorithms are introduced and their performances evaluated. The best two of these construct the sequences greedily around the placement position of a starting component, which is selected in turns as the component nearest to each corner of the PCB.
Computers & Operations Research 11/2013; 40(11):2611–2624. · 1.91 Impact Factor
ABSTRACT: Graph drawing is an integral part of many systems biology studies, enabling visual exploration and mining of large-scale biological networks. While a number of layout algorithms are available in popular network analysis platforms, such as Cytoscape, it remains poorly understood how well their solutions reflect the underlying biological processes that give rise to the network connectivity structure. Moreover, visualizations obtained using conventional layout algorithms, such as those based on the force-directed drawing approach, may become uninformative when applied to larger networks with dense or clustered connectivity structure.
We implemented a modified layout plug-in, named Multilevel Layout, which applies the conventional layout algorithms within a multilevel optimization framework to better capture the hierarchical modularity of many biological networks. Using a wide variety of real life biological networks, we carried out a systematic evaluation of the method in comparison with other layout algorithms in Cytoscape.
The multilevel approach provided both biologically relevant and visually pleasant layout solutions in most network types, hence complementing the layout options available in Cytoscape. In particular, it could improve drawing of large-scale networks of yeast genetic interactions and human physical interactions. In more general terms, the biological evaluation framework developed here enables one to assess the layout solutions from any existing or future graph drawing algorithm as well as to optimize their performance for a given network type or structure.
By making use of the multilevel modular organization when visualizing biological networks, together with the biological evaluation of the layout solutions, one can generate convenient visualizations for many network biology applications.
ABSTRACT: Lysinuric protein intolerance (LPI) is an autosomal recessive disorder caused by mutations in cationic amino acid transporter gene SLC7A7. Although all Finnish patients share the same homozygous mutation, their clinical manifestations vary greatly. The symptoms range from failure to thrive, protein aversion, anemia and hyperammonaemia, to immunological abnormalities, nephropathy and pulmonary alveolar proteinosis. To unravel the molecular mechanisms behind those symptoms not explained directly by the primary mutation, gene expression profiles of LPI patients were studied using genome-wide microarray technology. As a result, we discovered 926 differentially-expressed genes, including cationic and neutral amino acid transporters. The functional annotation analysis revealed a significant accumulation of such biological processes as inflammatory response, immune system processes and apoptosis. We conclude that changes in the expression of genes other than SLC7A7 may be linked to the various symptoms of LPI, indicating a complex interplay between amino acid transporters and various cellular processes.
Molecular Genetics and Metabolism 12/2011; 105(3):408-15. · 2.83 Impact Factor
ABSTRACT: The Printed Circuit Board (PCB) assembly business is a fast-paced field of industry where the manufacturers must quickly adapt their production to meet the customer requirements. The goal of the production scheduling is to prioritize jobs and optimize the usage of the production lines by allocating the jobs optimally to different lines. For the efficient usage of the production lines, the workload balancing between the machines must be done properly. Balancing is usually a challenging operation and consists of multiple calculations that solve the optimal placement time for each machine for different groups of components. In the present paper, the problem is modeled and solved using Mixed Integer Nonlinear Programming (MINLP) techniques. A pseudoconvex objective function for optimizing the production planning is presented. Different convexification techniques of non-linear functions are presented. The convexified model guarantees, in theory, that the global optimal solution will be found. A set of test problems is solved using the CPLEX software. The presented techniques can easily be applied in the design of new industrial systems and to improve the performance of already existing ones.
Proceedings of the 2011 American conference on applied mathematics and the 5th WSEAS international conference on Computer engineering and applications; 01/2011
ABSTRACT: We present a versatile user-friendly software tool, PolyAlign, for the alignment of multiple LC-MS signal maps with the option of manual landmark setting or automated alignment. One of the spectral images is selected as a reference map, and after manually setting the landmarks, the program warps the images using either polynomial or Hermite transformation. The software provides an option for automated landmark finding. The software includes a very fast zoom-in function synchronized between the images, which facilitates detecting correspondences between the adjacent images. Such an interactive visual process enables the analyst to decide when the alignment is satisfactory and to correct known irregularities. We demonstrate that the software provides significant improvements in the alignment of LC-MALDI data, with 10-15 landmark pairs, and it is also applicable to correcting electrospray LC-MS data. The results with practical data show substantial improvement in peak alignment compared to MZmine, which was among the best analysis packages in a recent assessment. The PolyAlign software is freely available and easily accessible as an integrated component of the popular MZmine software, and also as a simpler stand-alone Perl implementation to preview data and apply landmark directed polynomial transformation.
International journal of proteomics. 01/2011; 2011:450290.
ABSTRACT: With a great variation of products, and small product lot sizes, PCB assembling machines must be reconfigured frequently, and their configuration must account for multiple product types. The tradeoff between reconfiguring between product types, or using a single (albeit locally less efficient) configuration for all product types, depends on product lot sizes, and of course, on the cost of machine reconfiguration. In this paper we consider PCB assembly machines of the radial type, which are used in manufacturing robust electronics devices. In this machine type, the components are brought to the assembly point by means of a single component tape. The component tape is constructed on-line by a separate feeder unit (the sequencer), composed of a set of slots storing component reels of various types. Insertion of certain component types (slow components) causes a delay in the movement of the component tape. We study the problem of assigning component reels to the sequencer in such a way that the delay caused by the tape construction is minimized for multiple PCB types. We assume that all the necessary components fit in the sequencer and therefore, its reconfiguration between PCB types can be avoided. We also give an integer programming formulation for the problem, and present a heuristic optimization algorithm to reduce the component insertion time caused by slow components.
ABSTRACT: The paper describes a parallel implementation of a cell-based algorithm for determining directional distances from points to polygons. Such distances have been applied, e.g., in coastal research. While the parallelization of the algorithm is in principle straightforward, the limitations of GPU devices lead to challenges in obtaining good performance. Our simple parallel GPU implementation achieves 8-fold speedup compared to a CPU implementation, yet maximal possible speedup is not achieved.
Proceedings of the 12th International Conference on Computer Systems and Technologies, CompSysTech 2011, Vienna, Austria, June 16-17, 2011; 01/2011
ABSTRACT: Water quality monitoring in topographically fragmented archipelago coasts calls for a dense observational network. However, visiting multiple sites and analyzing the samples requires a significant amount of work, leading to considerable economic cost. It is of interest to determine an efficient set of sites, which still offers adequate information on the water quality with a sufficient spatial accuracy. A method for optimizing an existing observational network is proposed. The method is demonstrated by applying it to an observational network in the Archipelago Sea, South West Finland. The network is pruned with the requirement that the observations of the removed sites can be estimated using those of the remaining sites. Suboptimal heuristics are used in pruning to keep the computational time acceptable. Some observations are not available and need to be estimated (imputed) before the pruning. For the network in the Archipelago Sea, the results of the pruning are somewhat sensitive to differences in imputed datasets and heuristics used for site selection.
ABSTRACT: This paper studies the combined task of determining a favorable machine configuration and line balancing (MCLB) for an assembly line where a single type of printed circuit board is assembled by a set of interconnected, reconfigurable machine modules. The MCLB problem has been solved previously by heuristic methods. In the present work, we give a mathematical formulation for it and transform the model into a linear integer programming model that can be solved using a standard solver for problems of moderate size. The model determines the best machine configuration and allocation of components to the machine modules with the objective of minimizing the cycle time. Because the solutions found in this way are globally optimal, they can be used to evaluate the efficiency of previous heuristics designed for the MCLB problem. In our experiments, an evolutionary algorithm gave near optimal results.
Keywords: Printed circuit board assembly; Reconfigurable machine modules; Line balancing; Integer programming; Mixed integer linear programming
International Journal of Advanced Manufacturing Technology 01/2011; 54(1):349-360. · 1.78 Impact Factor
ABSTRACT: Tandem mass spectrometry-based proteomics experiments produce large amounts of raw data, and different database search engines are needed to reliably identify all the proteins from this data. Here, we present Compid, an easy-to-use software tool that can be used to integrate and compare protein identification results from two search engines, Mascot and Paragon. Additionally, Compid enables extraction of information from large Mascot result files that cannot be opened via the Web interface and calculation of general statistical information about peptide and protein identifications in a data set. To demonstrate the usefulness of this tool, we used Compid to compare Mascot and Paragon database search results for a mitochondrial proteome sample of human keratinocytes. The reports generated by Compid can be exported and opened as Excel documents or as text files using configurable delimiters, allowing the analysis and further processing of Compid output with a multitude of programs. Compid is freely available and can be downloaded from http://users.utu.fi/lanatr/compid. It is released under an open source license (GPL), enabling modification of the source code. Its modular architecture allows for creation of supplementary software components, e.g. to enable support for additional input formats and report categories.
Journal of Proteome Research 10/2010; 9(12):6795-800. · 5.06 Impact Factor
ABSTRACT: Rapid nondestructive screening of mutants is a common step in many research projects in plant biology. Here we report the development of a method that uses kinetic imaging of chlorophyll fluorescence to detect phenotypes that differ from wild-type plants. The method uses multiple fluorescence features simultaneously in order to catch different types of photosynthesis-related mutants with a single assay. The Mahalanobis distance was used to evaluate the degree of similarity in fluorescence features between the wild-type and test plants, and plants differing strongly from the wild-type were classified as mutants. The method was tested on a collection of photosynthesis-related mutants of Arabidopsis thaliana. The plants were evaluated from images in which the color of each pixel depended on the Mahalanobis distance of the fluorescence features. Two parameters of the color-coding procedure were used to adjust the trade-off between detection of true mutants and erratic classification of wild-type plants as mutants. We found that a large percentage of photosynthesis-related mutants can be detected with this method. Scripts for the free statistics software R are provided to facilitate the practical application of the method.
Photosynthesis Research 09/2010; 105(3):273-83. · 3.15 Impact Factor
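The classification idea in the abstract above (measure how far a test plant's fluorescence features lie from the wild-type distribution, then flag large distances) can be sketched as follows. This is a minimal illustration in plain Python, not the R scripts the authors provide; the feature values, the two-feature layout and the `THRESHOLD` cut-off are hypothetical, and the sketch assumes the wild-type covariance matrix is non-singular.

```python
from statistics import mean


def mean_and_covariance(rows):
    """Return the feature means and sample covariance matrix of the rows."""
    n, d = len(rows), len(rows[0])
    mu = [mean(row[j] for row in rows) for j in range(d)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in rows) / (n - 1)
            for j in range(d)]
           for i in range(d)]
    return mu, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and non-singular (otherwise a division by zero occurs).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(features, mu, cov):
    """Mahalanobis distance sqrt((x - mu)^T cov^-1 (x - mu))."""
    diff = [f - m for f, m in zip(features, mu)]
    weighted = solve(cov, diff)  # cov^-1 (x - mu) without forming the inverse
    return sum(d * w for d, w in zip(diff, weighted)) ** 0.5


def is_mutant(features, mu, cov, threshold):
    """Flag a plant whose features lie far from the wild-type distribution."""
    return mahalanobis(features, mu, cov) > threshold


# Hypothetical wild-type measurements: (feature_1, feature_2) per plant.
wild_type = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mu, cov = mean_and_covariance(wild_type)
THRESHOLD = 1.5  # illustrative cut-off, not a value from the paper

print(is_mutant((1.2, 0.9), mu, cov, THRESHOLD))  # close to the wild-type mean
print(is_mutant((3.0, 1.0), mu, cov, THRESHOLD))  # far from the wild-type mean
```

Raising the threshold trades missed mutants for fewer wild-type plants misclassified as mutants, which is the trade-off the abstract describes.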
ABSTRACT: The study focuses on p120catenin, a regulator of cell adhesion, which has previously been described in many malignancies and suggested to have a role in invasion and metastatic behaviour. In this study, we investigate the role of altered immunoexpression of p120catenin isoforms in the prognosis of invasive breast cancer (n = 351).
We used cDNA microarrays to screen differences in gene expression in invasive breast cancer in general, and between local and metastasized disease in particular. On this basis, we performed p120catenin immunohistochemistry in order to confirm the prognostic value of p120catenin isoforms on tissue microarrays comprising 341 patients from the era of mammographic screening, directed to modern surgical and oncological treatments, and followed up for a maximum of 20 years.
In cDNA microarray analysis, p120catenin was found to be down-regulated along with E-cadherin and alpha-catenin. In addition, p120catenin distinguished metastasized breast cancer from local disease. Immunohistochemistry confirmed the value of p120catenin as an independent prognosticator of breast cancer survival. In our results, p120catenin was associated with a 3.7-fold risk of breast cancer death in multivariate Cox's regression analyses adjusted for the established prognosticators of breast cancer (p = 0.039). In particular, the long isoform of p120catenin predicted metastatic disease (p = 0.029).
The present paper is the first report on p120catenin in invasive breast cancer based on a well-characterized patient material with long-term follow-up. We observed altered expression of p120catenin isoforms in invasive breast cancer and, in our material, the decrease in p120 immunoexpression was significantly associated with poor outcome of disease.
Journal of Cancer Research and Clinical Oncology 02/2010; 136(9):1377-87. · 2.91 Impact Factor
ABSTRACT: Several production planning tasks in the printed circuit board (PCB) assembly industry involve the estimation of the component placement times for different PCB types and placement machines. This kind of task may be, for example, the scheduling of jobs or line balancing for single or multiple jobs. The simplest approach to time estimation is to let the production time be a linear function of the number of components to be placed. To achieve more accurate results, the model should include more parameters (e.g. the number of different component types, the number of different component shapes, the dimensions of the PCBs, etc.). In this study we train multilayer neural networks to approximate the assembly times of two different types of assembly machines based on several parameter combinations. It turns out that conventional learning methods are prone to overfitting when the number of hidden units of the network is large in relation to the number of training cases. To avoid this and complicated training and testing, we use Bayesian regularisation to achieve efficient learning and good accuracy automatically.
International Journal of Production Research 01/2010; · 1.46 Impact Factor
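The baseline that the abstract above calls the simplest approach, production time as a linear function of the number of components, can be sketched with an ordinary least-squares fit. This illustrates only that baseline, not the neural network or Bayesian regularisation used in the study; the component counts and times below are made-up numbers chosen for illustration.

```python
def fit_line(counts, times):
    """Least-squares fit of time = intercept + slope * count."""
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_time(count, intercept, slope):
    """Estimated placement time for a PCB with the given component count."""
    return intercept + slope * count


# Hypothetical training data: components per PCB and measured time in seconds.
component_counts = [100, 200, 300, 400]
placement_times = [35.0, 65.0, 95.0, 125.0]

intercept, slope = fit_line(component_counts, placement_times)
print(predict_time(250, intercept, slope))
```

A richer model would add further inputs such as the number of component types or the PCB dimensions, as the abstract suggests.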
ABSTRACT: Assembly of electronic components to Printed Circuit Boards (PCB) is a complicated manufacturing process and therefore its control is usually divided into several subtasks which are handled separately. We consider the combined task of determining a machine configuration and line balancing for a single assembly line of interconnected, reconfigurable machine modules and one PCB type in production. The modules can be tailored to the needs of each PCB type by suitable assignments for placement heads, nozzles and feeders. Out of these, the component-to-machine assignment appears to be most difficult and we propose five different solution methods for it: brute force, random, greedy, local search and genetic algorithm. The genetic algorithm outperformed the other methods in practical tests.
ABSTRACT: Caspases are a family of proteases that have central functions in programmed cell death (apoptosis) and inflammation. Caspases mediate their effects through aspartate-specific cleavage of their target proteins, and at present almost 400 caspase substrates are known. There are several methods developed to predict caspase cleavage sites from individual proteins, but currently none of them can be used to predict caspase cleavage sites from multiple proteins or entire proteomes, or to use several classifiers in combination. The possibility to create a database from predicted caspase cleavage products for the whole genome could significantly aid in identifying novel caspase targets from tandem mass spectrometry based proteomic experiments.
Three different pattern recognition classifiers were developed for predicting caspase cleavage sites from protein sequences. Evaluation of the classifiers with quality measures indicated that all three classifiers performed well in predicting caspase cleavage sites, and when combining different classifiers the accuracy increased further. A new tool, Pripper, was developed to utilize the classifiers and predict the caspase cut sites from an arbitrary number of input sequences. A database was constructed with the developed tool, and it was used to identify caspase target proteins from tandem mass spectrometry data from two different proteomic experiments. Both known and novel caspase cleavage products were identified using the database, demonstrating the usefulness of the tool. Pripper is not restricted to predicting only caspase cut sites: it can also scan protein sequences for any given motif(s) and predict cut sites once a suitable cut site prediction model for any other protease has been developed. Pripper is freely available and can be downloaded from http://users.utu.fi/mijopi/Pripper.
We have developed Pripper, a tool for reading an arbitrary number of proteins in FASTA format, predicting their caspase cleavage sites and outputting the cleaved sequences to a new FASTA format sequence file. We show that Pripper is a valuable tool in identifying novel caspase target proteins from modern proteomics experiments.
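The workflow described above (read proteins in FASTA format, locate cut sites, write the cleaved sequences as new FASTA records) can be sketched in a few lines. This is not Pripper's code and does not use its trained classifiers: it substitutes a plain regular-expression motif scan, cutting immediately after each motif match. The motif `DEVD` (a commonly cited caspase-3 recognition sequence), the function names and the header suffix format are all assumptions made for illustration.

```python
import re


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def cut_sites(sequence, motif):
    """Return the index just after every (possibly overlapping) motif match."""
    # The lookahead lets overlapping matches be found; group 1 is the motif.
    pattern = re.compile("(?=(" + motif + "))")
    return [match.end(1) for match in pattern.finditer(sequence)]


def cleaved_fasta(text, motif):
    """Return FASTA text with the N- and C-terminal fragment of each cut."""
    lines = []
    for header, sequence in read_fasta(text):
        for site in cut_sites(sequence, motif):
            fragments = (("N", sequence[:site]), ("C", sequence[site:]))
            for label, fragment in fragments:
                if fragment:  # skip the empty fragment of a terminal cut
                    lines.append(f">{header}|cut_after_{site}|{label}")
                    lines.append(fragment)
    return "\n".join(lines) + ("\n" if lines else "")


# Hypothetical input: one short made-up protein containing the motif once.
example = ">P1\nMKDEVD\nGAL\n"
print(cleaved_fasta(example, "DEVD"), end="")
```

A database built this way can then be searched with a standard proteomics search engine, which is how the abstract describes the predicted cleavage products being used.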
ABSTRACT: Gantry type component placement machines are widely used in the electronics industry due to their accuracy, speed and flexibility. In order to make optimal plans of their usage, one has to know the manufacturing time of a given component placement work before the actual processing. This presupposes the use of a reliable machine simulator. In the present article, discrete-event simulation is applied to predict component placement times of gantry type machines. The generic simulator model is based on a detailed analysis of the time factors by means of observing and measuring the times of operations performed by the machine. As a result of this, very high accuracy is achieved; for a particular machine type (Universal Instruments GC-60D), the operation times predicted by the simulator are within ±1 % of the observed operation times.