ABSTRACT: Automatic identification of plant species is needed in precision agriculture in order to collect species information and guide sprayers of agrochemicals. Identification methods based on spectroscopic properties, leaf forms and chlorophyll fluorescence have been developed. Leaf overlap is a major difficulty and most of the proposed methods only operate on isolated leaves. The present study focused on the leaf overlap problem by analysing colour photographs of a mixed cultivation of oat (Avena sativa) and a dicot weed (dandelion, Taraxacum officinale, TAROF). Leaves of the two species appeared to have very similar colours and therefore species identification was based on the different textures of monocot and dicot leaves. An automatic classifier, based on the RankRLS learning algorithm, was developed in the study and trained with manually labelled parts of the photographs. We adopted a strategy in which the misclassification of oat pixels as TAROF was avoided at the expense of classifying most TAROF pixels as oat. This strategy is appropriate when the aim of the automatic identification is to guide a herbicide sprayer. In photograph-wise cross-validation, the misclassification of oat as TAROF was negligible and considerably smaller than the expected amount of misclassifications, indicating that leaf texture is useful for identification of plant species in this very demanding case.
Full-text · Article · Oct 2015 · Computers and Electronics in Agriculture
ABSTRACT: Expression of proteins can be quantified in high throughput using different types of mass spectrometers. In recent years, label-free methods have emerged for determining protein abundance. Although the expression is initially measured at the peptide level, a common approach is to combine the peptide-level measurements into protein-level values before differential expression analysis. However, simple combination is prone to inconsistencies between peptides and may lose valuable information. To this end, we introduce here a method to detect differentially expressed proteins by combining peptide-level expression change statistics. Using controlled spike-in experiments, we show that the approach of averaging peptide-level expression changes yields more accurate lists of differentially expressed proteins than the conventional protein-level approach. This is particularly true when there are only a few replicate samples or the differences between the sample groups are small. The proposed technique is implemented in the Bioconductor package PECA and it can be downloaded from http://www.bioconductor.org.
No preview · Article · Sep 2015 · Journal of Proteome Research
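The peptide-level idea described in the abstract above can be illustrated with a minimal sketch. It assumes log2 intensities, uses a plain difference of group means as the peptide-level change, and combines peptides with the median. These choices, and all names such as `protein_level_change`, are illustrative assumptions and not the PECA implementation.

```python
from collections import defaultdict
from statistics import mean, median


def peptide_log_change(control, treated):
    """Log2 change of one peptide: difference of the group means."""
    return mean(treated) - mean(control)


def protein_level_change(peptides):
    """Combine peptide-level changes into one value per protein.

    `peptides` is a list of (protein_id, control_values, treated_values),
    where the values are log2 intensities of the replicate samples.
    """
    changes = defaultdict(list)
    for protein_id, control, treated in peptides:
        changes[protein_id].append(peptide_log_change(control, treated))
    # Assumption: the median is used as an outlier-tolerant combination rule.
    return {protein: median(values) for protein, values in changes.items()}


# Hypothetical measurements, not data from the study.
example = [
    ("P1", [10.0, 10.2], [11.0, 11.2]),
    ("P1", [8.0, 8.0], [9.5, 9.5]),
    ("P1", [12.0, 12.0], [12.0, 12.0]),
    ("P2", [9.0, 9.0], [9.0, 9.0]),
]
result = protein_level_change(example)
```

Here the change is computed for each peptide first and only then summarised per protein, in contrast to the conventional order of summarising intensities per protein and then computing a single change.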
ABSTRACT: Production planning and control of the printed circuit board (PCB) assembly includes several decisions dealing with, for example, grouping of PCB jobs, allocation of PCB batches to machine lines, sequencing of batches and load balancing of lines. The production time of a PCB job for a given placement machine is a key factor in this context and it must be quickly and accurately estimated, possibly millions of times in a single planning task, to avoid erroneous decisions. The commonly used nominal tact time-based estimators are very rough and the machine simulators too slow. Therefore, the purpose of this study is to give better machine-specific estimators that avoid the construction of the actual machine control program. Two new estimators are proposed for gantry machines, one based on the information given by the manufacturer about the operations of the placement head, and the other on the regularised least-squares regression method trained with a set of PCB placement jobs. In practical evaluation with 95 PCB jobs, the mean absolute percentage errors of the first and second methods are 3.75% and 6.52%, respectively, while that of the tact time-based approach is more than 17%. This indicates a great potential of the proposed methods as production time estimators.
No preview · Article · Apr 2014 · International Journal of Computer Integrated Manufacturing
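The error measure quoted in the abstract above, the mean absolute percentage error, can be sketched as follows. The function name and the sample production times are hypothetical and are not taken from the study.

```python
def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent: mean of |actual - predicted| / |actual|, times 100."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical production times in seconds (not data from the study).
measured = [100.0, 200.0, 400.0]
estimated = [95.0, 210.0, 400.0]
mape = mean_absolute_percentage_error(measured, estimated)
```

For these made-up values the relative errors are 5 %, 5 % and 0 %, so the result is about 3.33 %.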
ABSTRACT: The measurement of change in biological systems through protein quantification is a central theme in modern biosciences and medicine. Label-free MS-based methods have greatly increased the ease and throughput in performing this task. Spectral counting is one such method that uses detected MS2 peptide fragmentation ions as a measure of the protein amount. The method is straightforward to use and has gained widespread interest. Additionally, reports on new statistical methods for analysing spectral count data appear at regular intervals, but a systematic evaluation of these is rarely seen. In this work, we studied how similar the results are from different spectral count data analysis methods, given the same biological input data. For this, we chose the algorithms Beta Binomial, PLGEM, QSpec and PepC to analyse three biological datasets of varying complexity. For analysing the capability of the methods to detect differences in protein abundance, we also performed controlled experiments by spiking a mixture of 48 human proteins in varying concentrations into a yeast protein digest, to mimic biological fold changes. In general, the agreement of the analysis methods was not particularly good on the proteome-wide scale, as considerable differences were found between the different algorithms. However, we observed good agreement between the methods for the proteins with the largest abundance changes, indicating that for a smaller fraction of the proteome changes are measurable, and the methods may be used as valuable tools in the discovery-validation pipeline when applying a cross-validation approach as described here. Performance ranking of the algorithms using samples of known composition showed PLGEM to be superior, followed by Beta Binomial, PepC and QSpec. Similarly, the normalized versions of the same method, when available, generally outperformed the standard ones.
Statistical detection of protein abundance differences was strongly influenced by the number of spectra acquired for the protein, and correspondingly its molecular mass.
Full-text · Article · Mar 2014 · Journal of Proteome Research
ABSTRACT: With automatic plant identification methods, the amount of herbicides used in agriculture can be reduced when herbicides are sprayed only on weeds. In the present study, leaves of oat (Avena sativa) and dandelion (Taraxacum officinale, TAROF) were arranged so that there was overlap between the species, imaged with a pulse amplitude modulation fluorescence camera and photographed with a digital color camera. The fluorescence induction curves from each pixel were parameterized to obtain a set of features and from color photographs, texture features were calculated. A support vector algorithm that also performed feature selection was used for pattern recognition of both data sets. Fluorescence-based identification worked well with oat leaves, producing 92.2 % of correctly identified pixels, whereas the texture-based method often mis-identified the central vein of a TAROF leaf as oat, identifying correctly only 66.5 % of oat pixels. With TAROF that shows a clear dicot-type texture, the texture method was slightly better (96.4 % correctly identified pixels) than the fluorescence method (94.6 %). In fluorescence-based identification, the accuracy varied between entire TAROF leaves, probably reflecting the genetic variability of TAROF. The results suggest that the accuracy of identification could be improved by combining two identification methods.
No preview · Article · Dec 2013 · Precision Agriculture
ABSTRACT: In printed circuit board (PCB) manufacturing, revolver head gantry machines are nowadays popular because of their flexibility, accuracy and high speed. The operation control of this kind of machine includes, for example, selecting the nozzles into the revolver, assigning the component reels to feeder slots, and determining the pick-up and placement sequences of the components. We consider in this article the feeder assignment and the pick-up and placement sequencing problems. Unlike some previous literature on these problems, we suppose that each component can be picked up only by a nozzle of a certain type. For the feeder assignment problem a new heuristic is given and tested against four existing algorithms. The proposed heuristic considers the types of the r nearest neighbors of each component on the PCB and assigns component feeders close to each other according to the closeness of the component types on the PCB. The experimental tests are performed using two data sets based on realistic PCBs. The new heuristic outperformed the previous methods with 3.4% faster placement times. For determining the pick-up and placement sequences four rule-based algorithms are introduced and their performances evaluated. The best two of these construct the sequences greedily around the placement position of a starting component, which is selected in turns as the component nearest to each corner of the PCB.
No preview · Article · Nov 2013 · Computers & Operations Research
ABSTRACT: The present work studies the operation control of so-called collect-and-place component placement machines. These kinds of machines are suited for the flexible manufacturing of various printed circuit board products. These machines operate in cycles where a set of components is first collected from the component feeders to the vacuum nozzles of the component placement head. The head then moves on the circuit board and places the components to their appropriate locations. Different component types require the use of different nozzle types, but the placement head has only a limited capacity for nozzles. Hence, the ability to change nozzles every now and then allows the manipulation of a great variety of component types with the same machine. This is accomplished by storing a larger selection of nozzles in a separate nozzle magazine from where the nozzle collection of the placement head can be updated. The cost of changing the nozzle setup is, however, relatively large compared to the time costs of other operations in the placement cycle. What complicates things more is that the nozzle change cost is affected by the organization of nozzles in the magazine, too. The aim of this work is to determine the contents of the nozzle magazine in such a way that the change operation times are as small as possible. We develop two heuristics (a genetic algorithm and a swarm optimization algorithm) for this purpose and evaluate their performance on sample problems. Both heuristic approaches are capable of processing realistic production problems; in particular, the genetic algorithm finds near-optimal results for small problem instances and clearly outperforms our other approaches for larger problems.
No preview · Article · Sep 2013 · International Journal of Advanced Manufacturing Technology
ABSTRACT: Graph drawing is an integral part of many systems biology studies, enabling visual exploration and mining of large-scale biological networks. While a number of layout algorithms are available in popular network analysis platforms, such as Cytoscape, it remains poorly understood how well their solutions reflect the underlying biological processes that give rise to the network connectivity structure. Moreover, visualizations obtained using conventional layout algorithms, such as those based on the force-directed drawing approach, may become uninformative when applied to larger networks with dense or clustered connectivity structure.
We implemented a modified layout plug-in, named Multilevel Layout, which applies the conventional layout algorithms within a multilevel optimization framework to better capture the hierarchical modularity of many biological networks. Using a wide variety of real life biological networks, we carried out a systematic evaluation of the method in comparison with other layout algorithms in Cytoscape.
The multilevel approach provided both biologically relevant and visually pleasant layout solutions in most network types, hence complementing the layout options available in Cytoscape. In particular, it could improve drawing of large-scale networks of yeast genetic interactions and human physical interactions. In more general terms, the biological evaluation framework developed here enables one to assess the layout solutions from any existing or future graph drawing algorithm as well as to optimize their performance for a given network type or structure.
By making use of the multilevel modular organization when visualizing biological networks, together with the biological evaluation of the layout solutions, one can generate convenient visualizations for many network biology applications.
ABSTRACT: Lysinuric protein intolerance (LPI) is an autosomal recessive disorder caused by mutations in cationic amino acid transporter gene SLC7A7. Although all Finnish patients share the same homozygous mutation, their clinical manifestations vary greatly. The symptoms range from failure to thrive, protein aversion, anemia and hyperammonaemia, to immunological abnormalities, nephropathy and pulmonary alveolar proteinosis. To unravel the molecular mechanisms behind those symptoms not explained directly by the primary mutation, gene expression profiles of LPI patients were studied using genome-wide microarray technology. As a result, we discovered 926 differentially-expressed genes, including cationic and neutral amino acid transporters. The functional annotation analysis revealed a significant accumulation of such biological processes as inflammatory response, immune system processes and apoptosis. We conclude that changes in the expression of genes other than SLC7A7 may be linked to the various symptoms of LPI, indicating a complex interplay between amino acid transporters and various cellular processes.
No preview · Article · Dec 2011 · Molecular Genetics and Metabolism
ABSTRACT: With a great variation of products, and small product lot sizes, PCB assembling machines must be reconfigured frequently, and their configuration must account for multiple product types. The tradeoff between reconfiguring between product types, or using a single (albeit locally less efficient) configuration for all product types, depends on product lot sizes, and of course, on the cost of machine reconfiguration. In this paper we consider PCB assembly machines of the radial type, which are used in manufacturing robust electronic devices. In this machine type, the components are brought to the assembly point by means of a single component tape. The component tape is constructed on-line by a separate feeder unit (which is the sequencer), composed of a set of slots storing component reels of various types. Insertion of certain component types (slow components) causes a delay in the movement of the component tape. We study the problem of assigning component reels to the sequencer in such a way that the delay caused by the tape construction is minimized for multiple PCB types. We assume that all the necessary components fit in the sequencer and therefore, its reconfiguration between PCB types can be avoided. We also give an integer programming formulation for the problem, and present a heuristic optimization algorithm to reduce the component insertion time caused by slow components.
No preview · Article · Nov 2011 · Computers & Industrial Engineering
ABSTRACT: Water quality monitoring in topographically fragmented archipelago coasts calls for a dense observational network. However, visiting multiple sites and analyzing the samples requires a significant amount of work, leading to considerable economic cost. It is of interest to determine an efficient set of sites, which still offers adequate information on the water quality with a sufficient spatial accuracy. A method for optimizing an existing observational network is proposed. The method is demonstrated by applying it to an observational network in the Archipelago Sea, South West Finland. The network is pruned with the requirement that the observations of the removed sites can be estimated using those of the remaining sites. Suboptimal heuristics are used in pruning to keep the computational time acceptable. Some observations are not available and need to be estimated (imputed) before the pruning. For the network in the Archipelago Sea, the results of the pruning are somewhat sensitive to differences in imputed datasets and heuristics used for site selection.
ABSTRACT: Clustering of a data set can be done by the well-known Pairwise Nearest Neighbor (PNN) algorithm. The algorithm is conceptually very simple and gives high quality solutions. A drawback of the method is the relatively large running time of the original (exact) implementation. Recently, an efficient version of the exact PNN algorithm has been introduced in the literature. In this paper we give a faster implementation of this algorithm. The idea is to postpone the updating of the nearest neighbor information in order to reduce the number of cluster distance calculations. Correctness of the algorithm follows from the monotonicity of the cluster distances. Practical tests show that the new organization of the algorithm decreases the running time of PNN by about 35 per cent.
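For orientation, the plain exact PNN procedure that the abstract above builds on can be sketched as follows. This is the straightforward variant that recomputes all pairwise merge costs in every round. It does not include the postponed nearest-neighbor updates that the paper proposes, and the function names are illustrative assumptions.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def merge_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in total squared error caused by merging two clusters."""
    weight = size_a * size_b / (size_a + size_b)
    return weight * squared_distance(centroid_a, centroid_b)


def pnn(points, k):
    """Merge the cheapest pair of clusters until only k clusters remain."""
    # Each cluster is (size, centroid, member points); start from singletons.
    clusters = [(1, tuple(p), [tuple(p)]) for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i][0], clusters[i][1],
                                  clusters[j][0], clusters[j][1])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        size_a, cen_a, mem_a = clusters[i]
        size_b, cen_b, mem_b = clusters[j]
        size = size_a + size_b
        centroid = tuple((size_a * a + size_b * b) / size
                         for a, b in zip(cen_a, cen_b))
        # Delete j first (j > i) so that index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append((size, centroid, mem_a + mem_b))
    return clusters


# Hypothetical two-dimensional data set.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (5.0, 50.0)]
clusters = pnn(data, 3)
```

Every round of this sketch evaluates all cluster pairs, which is the source of the large running time mentioned in the abstract and the reason why keeping nearest-neighbor information between rounds pays off.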
ABSTRACT: We present a versatile user-friendly software tool, PolyAlign, for the alignment of multiple LC-MS signal maps with the option of manual landmark setting or automated alignment. One of the spectral images is selected as a reference map, and after manually setting the landmarks, the program warps the images using either polynomial or Hermite transformation. The software provides an option for automated landmark finding. The software includes a very fast zoom-in function synchronized between the images, which facilitates detecting correspondences between the adjacent images. Such an interactive visual process enables the analyst to decide when the alignment is satisfactory and to correct known irregularities. We demonstrate that the software provides significant improvements in the alignment of LC-MALDI data, with 10-15 landmark pairs, and it is also applicable to correcting electrospray LC-MS data. The results with practical data show substantial improvement in peak alignment compared to MZmine, which was among the best analysis packages in a recent assessment. The PolyAlign software is freely available and easily accessible as an integrated component of the popular MZmine software, and also as a simpler stand-alone Perl implementation to preview data and apply landmark directed polynomial transformation.
ABSTRACT: This paper studies the combined task of determining a favorable machine configuration and line balancing (MCLB) for an assembly line where a single type of printed circuit board is assembled by a set of interconnected, reconfigurable machine modules. The MCLB problem has been solved previously by heuristic methods. In the present work, we give a mathematical formulation for it and transform the model into a linear integer programming model that can be solved using a standard solver for problems of moderate size. The model determines the best machine configuration and allocation of components to the machine modules with the objective of minimizing the cycle time. Because the solutions found in this way are globally optimal, they can be used to evaluate the efficiency of previous heuristics designed for the MCLB problem. In our experiments, an evolutionary algorithm gave near optimal results.
Keywords: Printed circuit board assembly; Reconfigurable machine modules; Line balancing; Integer programming; Mixed integer linear programming
Full-text · Article · Apr 2011 · International Journal of Advanced Manufacturing Technology
ABSTRACT: The Printed Circuit Board (PCB) assembly business is a fast-paced field of industry where the manufacturers must quickly adapt their production to meet the customer requirements. The goal of the production scheduling is to prioritize jobs and optimize the usage of the production lines by allocating the jobs optimally to different lines. For the efficient usage of the production lines the workload balancing between the machines must be done properly. Balancing is usually a challenging operation and doing it consists of multiple calculations that solve the optimal placement time for each machine for different groups of components. In the present paper, the problem is modeled and solved using Mixed Integer Nonlinear Programming (MINLP) techniques. A pseudoconvex objective function for optimizing the production planning is presented. Different convexification techniques of non-linear functions are presented. The convexified model guarantees, in theory, that the global optimal solution will be found. A set of test problems is solved using the CPLEX software. The presented techniques can easily be applied in the design of new industrial systems and to improve the performance of already existing ones.
ABSTRACT: The paper describes a parallel implementation of a cell-based algorithm for determining directional distances from points to polygons. Such distances have been applied, e.g., in coastal research. While the parallelization of the algorithm is in principle straightforward, the limitations of GPU devices lead to challenges in obtaining good performance. Our simple parallel GPU implementation achieves an 8-fold speedup compared to a CPU implementation, yet the maximal possible speedup is not achieved.
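As a point of reference for the quantity discussed in the abstract above, a directional distance can be sketched with a brute-force loop over the polygon edges. The sketch assumes that the directional distance means the distance from a point along a given direction to the first polygon edge that is hit. It is sequential, it is not the cell-based algorithm of the paper, and the function name and test polygon are hypothetical.

```python
import math


def directional_distance(point, angle, polygon):
    """Distance from `point` along direction `angle` (radians) to the
    nearest polygon edge, or math.inf if the ray hits no edge."""
    px, py = point
    dx, dy = math.cos(angle), math.sin(angle)
    best = math.inf
    n = len(polygon)
    for k in range(n):
        ax, ay = polygon[k]
        bx, by = polygon[(k + 1) % n]
        ex, ey = bx - ax, by - ay
        denom = dx * ey - dy * ex          # cross product of ray and edge
        if abs(denom) < 1e-12:
            continue                       # ray is parallel to this edge
        wx, wy = ax - px, ay - py
        t = (wx * ey - wy * ex) / denom    # distance along the ray
        s = (wx * dy - wy * dx) / denom    # relative position on the edge
        if t >= 0.0 and 0.0 <= s <= 1.0:
            best = min(best, t)
    return best


# Hypothetical square polygon with corners (0, 0) and (4, 4).
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
east = directional_distance((1.0, 2.0), 0.0, square)
west = directional_distance((1.0, 2.0), math.pi, square)
```

Each query point and direction is handled independently in such a loop, which is consistent with the abstract's remark that the parallelization is straightforward in principle.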
ABSTRACT: Tandem mass spectrometry-based proteomics experiments produce large amounts of raw data, and different database search engines are needed to reliably identify all the proteins from this data. Here, we present Compid, an easy-to-use software tool that can be used to integrate and compare protein identification results from two search engines, Mascot and Paragon. Additionally, Compid enables extraction of information from large Mascot result files that cannot be opened via the Web interface and calculation of general statistical information about peptide and protein identifications in a data set. To demonstrate the usefulness of this tool, we used Compid to compare Mascot and Paragon database search results for a mitochondrial proteome sample of human keratinocytes. The reports generated by Compid can be exported and opened as Excel documents or as text files using configurable delimiters, allowing the analysis and further processing of Compid output with a multitude of programs. Compid is freely available and can be downloaded from http://users.utu.fi/lanatr/compid. It is released under an open source license (GPL), enabling modification of the source code. Its modular architecture allows for creation of supplementary software components, e.g. to enable support for additional input formats and report categories.
No preview · Article · Oct 2010 · Journal of Proteome Research