Chinese Journal of Computers

Print ISSN: 0254-4164
Face recognition in videos has been a hot topic in computer vision and biometrics for many years. Compared to traditional face analysis, video-based face recognition benefits from more abundant information, which can improve accuracy and robustness, but it also suffers from large scale variations, low-quality facial images, illumination changes, pose variations and occlusions. According to their applications, we divide the existing video-based face recognition approaches into two categories, video-image based methods and video-video based methods, which are surveyed and analyzed in this paper.
High availability in peer-to-peer DHTs requires data redundancy. This paper takes user download behavior into account to evaluate redundancy schemes in data storage and sharing systems. Furthermore, we propose a hybrid redundancy scheme of replication and erasure coding. Experimental results show that, when average node availability is higher than 48%, the replication scheme saves more bandwidth than the erasure coding scheme, although it requires more storage space. Our hybrid scheme saves more maintenance bandwidth with an acceptable redundancy factor.
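The trade-off between the two redundancy schemes can be illustrated with the standard availability formulas. The sketch below is not the paper's evaluation model: it assumes nodes are online independently with the same probability `a`, and it ignores download behavior and maintenance bandwidth. The function names are hypothetical.

```python
from math import comb


def replication_availability(a: float, replicas: int) -> float:
    """Probability that at least one of `replicas` full copies is online.

    Assumes each node is online independently with probability `a`.
    """
    return 1.0 - (1.0 - a) ** replicas


def erasure_availability(a: float, n: int, m: int) -> float:
    """Probability that at least `m` of `n` coded fragments are online.

    Any `m` fragments are assumed sufficient to rebuild the object.
    """
    return sum(comb(n, i) * a**i * (1.0 - a) ** (n - i) for i in range(m, n + 1))


def redundancy_factor(n: int, m: int) -> float:
    """Storage overhead: n/m for an (n, m) erasure code, n for n replicas (m = 1)."""
    return n / m
```

Replication is the special case `m = 1`, so `erasure_availability(a, r, 1)` equals `replication_availability(a, r)`. Comparing the two functions at the same redundancy factor shows how the advantage of each scheme depends on node availability.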
Dynamic Voltage Scaling (DVS), which adjusts the clock speed and supply voltage dynamically, is an effective technique for reducing the energy consumption of embedded real-time systems. However, most existing DVS algorithms focus on reducing the energy consumption of the CPU only, ignoring their negative impact on task scheduling and system-wide energy consumption. In this paper, we address one such side effect, an increase in task preemptions due to DVS. We present an energy-efficient fixed-priority with preemption threshold (EE-FPPT) scheduling algorithm to solve this problem. First, we propose an appropriate schedulability analysis, based on response time analysis, for supporting energy-efficient FPPT scheduling in hard real-time systems. Second, we prove that a task set achieves minimal energy consumption under Maximal Preemption Threshold Assignment (MPTA).
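As background for the schedulability analysis mentioned above, the following sketch shows the classic response time analysis for fully preemptive fixed-priority scheduling. It is not the paper's FPPT analysis, which must additionally account for preemption thresholds and blocking. It assumes integer execution times and periods, deadlines equal to periods, and tasks listed from highest to lowest priority.

```python
def response_time(tasks, i):
    """Worst-case response time of tasks[i], or None if it misses its deadline.

    tasks: list of (C, T) pairs (execution time, period), highest priority first.
    Iterates R = C_i + sum(ceil(R / T_j) * C_j) over higher-priority tasks j
    until a fixed point is reached or the deadline (= period) is exceeded.
    """
    c_i, t_i = tasks[i]
    r = c_i
    while True:
        # -(-r // t_j) is integer ceiling division
        interference = sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
        r_next = c_i + interference
        if r_next > t_i:
            return None  # deadline miss
        if r_next == r:
            return r     # fixed point: worst-case response time
        r = r_next


def is_schedulable(tasks):
    """True if every task meets its deadline under this analysis."""
    return all(response_time(tasks, i) is not None for i in range(len(tasks)))
```

For the task set `[(1, 4), (2, 6), (3, 12)]` the iteration for the lowest-priority task converges to a response time of 10, which is within its period of 12.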
For parallel applications running on high-performance clusters, performance requirements are usually met without paying much attention to energy consumption. In this paper, we develop a new scheduling algorithm called Energy efficient Scheduling Algorithm based on DVS and Dynamic threshold (ESADD), which combines a dynamic threshold-based task duplication strategy with the dynamic voltage scaling (DVS) technique. It first exploits an optimal threshold to obtain the optimal task grouping and leverages a task duplication algorithm to reduce schedule length and communication energy, and then schedules the applications on DVS-enabled processors that can scale down their voltages to reduce computation energy whenever tasks have slack time due to task dependencies. Using simulations, we show that our algorithm not only maintains good performance but also substantially improves energy efficiency for parallel applications.
Data replication introduces well-known consistency issues. This paper puts forward the question about data dependence in data consistency, which embodies pseudo-conflict updates and update dependency. According to that, an optimistic data consistency method is proposed. In our method, data object is partitioned into data blocks by fixed size, as the basic unit of data management. Updates are compressed by Bloom filter technique and propagated in double-path. Negotiation algorithms detect and reconcile update conflicts, and dynamic data management algorithms accommodate dynamic data processing. The results of the performance evaluation show that our method is an efficient method to achieve consistency, good dynamic property, and strong robustness.
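The abstract above does not say how updates are encoded in the Bloom filter, so the following is only a generic sketch of the data structure. It assumes each updated data block is identified by a byte string. The class name, the parameters and the SHA-256 based hashing are illustrative choices, not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Compact set summary with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

A replica could add the identifier of every block it has updated and send only the filter to its peers. A peer that finds one of its own updated block identifiers in the filter has a possible conflict to negotiate, while a miss is definite.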
By studying the dynamic value density and urgency of a task, a preemptive scheduling strategy based on dynamic priority assignment is proposed. In the strategy, two parameters p and q are used to adjust the weights with which the value density and urgency of a task affect its priority, and a parameter beta is used to avoid possible system thrashing. Finally, simulations show that our algorithm is superior to analogous algorithms, such as EDF, HVF and HVDF, in terms of the gained value of the system, deadline miss ratio and number of preemptions.
Translocation is a prevalent rearrangement event in the evolution of multi-chromosomal species which exchanges ends between two chromosomes. A translocation is reciprocal if none of the exchanged ends is empty; otherwise, it is non-reciprocal. Given two signed multi-chromosomal genomes A and B, the problem of sorting by translocations is to find a shortest sequence of translocations transforming A into B. Several polynomial algorithms have been presented, all of them allowing only reciprocal translocations. Thus they can only be applied to a pair of genomes having the same set of chromosome ends. Such a restriction can be removed if non-reciprocal translocations are also allowed. In this paper, for the first time, we study the problem of sorting by generalized translocations, which allows both reciprocal and non-reciprocal translocations. We present an exact formula for computing the generalized translocation distance, which leads to a polynomial algorithm for this problem.
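To make the operation concrete, the sketch below applies a single translocation to two chromosomes represented as lists of signed integers (genes). It covers only the form that swaps the tails of the two chromosomes and leaves out the variant that also reverses segments. The function names and the choice to treat a translocation as reciprocal only when all four segments are non-empty are assumptions for illustration. The sketch does not compute the translocation distance.

```python
def translocate(x, y, i, j):
    """Exchange the tails x[i:] and y[j:] of two chromosomes.

    x, y: lists of signed integers (genes with orientation).
    Returns the two resulting chromosomes.
    """
    return x[:i] + y[j:], y[:j] + x[i:]


def is_reciprocal(x, y, i, j):
    """True if both cut points lie strictly inside their chromosomes,
    so that none of the four segments involved is empty."""
    return 0 < i < len(x) and 0 < j < len(y)
```

For example, cutting `[1, 2, 3]` after gene 1 and `[4, 5]` after gene 4 yields `[1, 5]` and `[4, 2, 3]`. Cutting the first chromosome at its very end instead moves gene 5 over without anything coming back, which is a non-reciprocal translocation.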
Process expression is one of the most useful tools to describe the process semantics of a Petri net. However, it is usually not easy to obtain the process expression of a Petri net directly. In this research, a construction method is proposed to obtain the process expressions of all kinds of Petri nets based on decomposition. With this decomposition method, a Petri net is decomposed into a set of S-Nets. The process properties of the decomposition nets are easy to analyze since they are well-formed and their process expressions are easier to obtain. With the process expressions of the decomposition nets, an algorithm to obtain the process expression of the original Petri net is presented, which is expressed by the synchronization shuffle operation between processes.
The development of the Internet and the popularity of social sites provide a large-scale experimental platform for studying the statistical properties and structural evolution of social networks. This paper builds social networks mainly from the DBLP and Facebook datasets. We classify these networks using role-to-role connectivity profiles and find that they belong to the stringy-periphery class. We confirm that they exhibit properties such as a scale-free distribution, the densification law and a shrinking diameter. We discover that there is a small core with high connectivity in social networks, and observe that many middle-scale communities are composed of stars. We study the evolution of community structure based on an event framework and reveal that a community merge depends largely on the clustering coefficient of the graph composed of the nodes that directly connect the communities, while a community split is related to the community's own clustering coefficient.
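Since the findings above are stated in terms of the clustering coefficient, a short sketch of the standard definition may help. It assumes an undirected graph stored as a dictionary mapping each node to the set of its neighbors. The abstract does not say which variant the paper uses, so the local and average forms shown here are an assumption.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Local clustering coefficient of node v.

    adj: dict mapping node -> set of neighbor nodes (undirected graph).
    Returns the fraction of pairs of v's neighbors that are themselves linked.
    """
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)
```

A triangle has coefficient 1.0 at every node, while the center of a star has coefficient 0.0, which is why star-shaped communities lower the average.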
Block ciphers are at the core of cryptography, providing data encryption, authentication and key management in information security. The security of block ciphers is an important issue in cryptanalysis. Based on the principle of differential cryptanalysis, this paper introduces a new cryptanalytic technique for block ciphers: the asymmetric impossible boomerang attack. The technique uses an asymmetric impossible boomerang distinguisher to eliminate wrong key material and leave the right key candidate. Using key schedule considerations, differential table lookups and data reuse, the authors apply the asymmetric impossible boomerang attack to AES-128. It is shown that the attack on 7-round AES-128 requires a data complexity of about 2^105.18 chosen plaintexts, a time complexity of about 2^115.2 encryptions and a memory complexity of about 2^106.78 AES blocks. The presented result is better than any previously published cryptanalytic result on AES-128 in terms of the number of attacked rounds, the data complexity and the time complexity.
In this work, a novel coding characteristics prediction scheme is presented to improve R-D modeling by exploiting spatio-temporal correlations. Two different approaches to the problem of optimum bit allocation on a macroblock-by-macroblock basis are developed, one based on a modified MPEG-4 Q2 rate model and the other on a linear rate model. Extensive experiments show that the linear scheme is slightly more accurate than the quadratic one, while they achieve similar coding performance. It is also shown that both schemes significantly outperform JVT G012, the current standardized RC scheme.
The rate distortion optimization (RDO) technique employed in H.264 improves coding performance significantly, but it introduces a very high computational load because, under full search, every macroblock needs to calculate R-D costs for all the prediction modes. In this paper, a fast intra-prediction mode decision algorithm based on orientation gradients is proposed to reduce R-D cost calculation. In the proposed algorithm, the gradients between the reference pixels and the current block pixels are calculated in all the prediction directions to detect texture direction and edge strength, and the prediction modes with larger gradients are then pruned in advance to accelerate encoding. Furthermore, utilizing the spatial correlation of the prediction mode, an improved algorithm is presented which adopts the most probable mode (MPM) as the default candidate mode instead of the direct current (DC) mode. Experimental results for 10 standard sequences demonstrate that, compared with JM18.0 full search, both the baseline and the improved algorithms applied to luma component intra prediction save more than 50% of the encoding time with a negligible loss in coding performance. At the same peak signal-to-noise ratio (PSNR) as JM full search, the baseline algorithm has an average bit rate increase of 2.361% and the improved algorithm an average bit rate increase of 1.47%.
Conventional volume data classifications use the statistical information of volume data, e.g. the multi-dimensional histogram, to create a design space to allow for interactive selection and exploration. To obtain a satisfactory result, adjusting parameters may be a time-consuming and laborious process because of the indirect workspace (multi-dimensional histograms) and the lack of adequate visual hints about the underlying dataset. This paper presents an intelligent data classification interface by leveraging the newest volume classification and data analysis techniques. The main steps include: (1) an over-segmentation of the dataset in a user-specified high-dimensional feature space; (2) a proximity-preserving 2D embedding of the centroids of the computed classes by means of the multidimensional scaling analysis; (3) intuitive user exploration, classification and visualization based on the two-dimensional embedding. The approach achieves real-time performance, and has been verified by several experiments.
This paper proposes a deterministic construction of an encoding from a finite field F_q to a C_34 curve using a cube-root finding method, with time complexity O(log^3 q). Based on the deterministic encoding, we construct a hash function from plaintext to C_34 curves. The new method is more than 30% faster than Icart's hash function from Crypto 2009 over the same prime field. Moreover, we provide a new function indifferentiable from a random oracle based on our deterministic encoding.
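The cube-root step that such deterministic encodings rely on can be sketched as follows. The abstract does not state the conditions on q, so this example assumes a prime q with q ≡ 2 (mod 3), in which case cubing is a bijection on F_q and the cube root is a single modular exponentiation. With schoolbook multiplication that exponentiation costs O(log^3 q) bit operations. The function name is hypothetical, and the sketch does not implement the encoding to the curve itself.

```python
def cube_root_mod(x: int, q: int) -> int:
    """Return the unique cube root of x in F_q, assuming q is prime and q % 3 == 2.

    Since 3 * (2q - 1) / 3 = 2(q - 1) + 1, raising to the power (2q - 1) / 3
    inverts cubing by Fermat's little theorem.
    """
    if q % 3 != 2:
        raise ValueError("this sketch requires q congruent to 2 modulo 3")
    return pow(x, (2 * q - 1) // 3, q)
```

For q = 11 the exponent is 7, and `cube_root_mod(5, 11)` returns 3, since 3^3 = 27 ≡ 5 (mod 11).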
Present 3D vertical stacking technology is not thermally scalable and cannot stack enough layers to maximize chip performance because of excessive heat. This paper proposes a novel thermally scalable 3D parallel-heat-sinking (PHS) stacking methodology which stacks all layers parallel to the heat-sinking path. All layers have the same strip shape, with the short dimension parallel to and the long dimension perpendicular to the heat-sinking path. Therefore, instead of thermal through-silicon vias (TSVs), each regular silicon substrate provides an independent, shorter and highly effective heat-conduction path for its attached device layer because of silicon's good heat conductance. As a result, the peak substrate temperature of 3D PHS stacking chips does not increase as more layers are stacked. This paper further proposes an analytical model to compute the peak substrate temperature of 3D PHS stacking chips and to show the thermal scalability of the methodology. Experiments on 3D integration for future on-chip thousand-core parallel computing lead to the conclusion that the 3D PHS methodology has the advantages of thermal scalability, freedom from thermal TSVs, and high yield.
With the rapid development of computer technology, image-based 3D building reconstruction has become a hot topic in the fields of computer graphics and computer stereo vision. Because the backgrounds of building images are typically complicated, and the sequence is very long and disorderly, existing 3D reconstruction algorithms will take a lot of time to obtain the 3D model, and possibly get poor results in the local areas. This paper proposes a novel 3D building facade reconstruction algorithm based on image matching and point cloud fusing to address those problems. Firstly, we find the best matching images in the additional new image set. Secondly, we get the 2D projection points of the 3D point cloud on the same image. The 3D points corresponding to the same 2D projection points are collected for 3D point cloud fusing. Thirdly, based on the derived 3D corresponding points, we compute the best alignment transformation between the point sets such that they can match with each other with minimum error. Finally, we merge the two point clouds with the computed best transformation to get the complete building facade model. Experiments show that our method can take much less time to reconstruct the building facade, and improve the precision as well.
This paper presents a novel method for representing 3D models. A new complete orthogonal function system over nested triangulated domains, called the V-system, is applied to decompose a 3D model into spectra. Owing to some special properties of the V-system, separate 3D models can be represented in a unified way. The orthogonal expressions of the 3D models make it possible to study 3D models by spectrum analysis.
A three-dimensional grid whose voxels are rhombic dodecahedra is called an FCC (face-centered cubic) grid, which is one of the three-dimensional equivalents of the two-dimensional hexagonal grid. 3D line generation is an important and fundamental algorithm in applications for 3D graphics and images. An integer line generation algorithm on the FCC grid is presented in this paper. First, a line generation algorithm is derived on the 2D hexagonal grid based on the adjunct rhomb space, and it is then extended to the 3D FCC grid. The method is a modification of the 3D cubic Bresenham algorithm based on an adjunct parallelepiped space with the same center and basis vectors. The 3D FCC line is generated by employing the one-to-one correspondence between the parallelogram cells of the parallelepiped space and the voxels of the FCC space. The procedure is characterized by a simple discriminator, with up to 3 voxels being processed in one step, and is implemented in integer form, so the accumulation of rounding errors is eliminated completely.
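For reference, the 3D cubic Bresenham algorithm that the method above modifies can be sketched as follows. This is the ordinary cubic-grid version, not the FCC variant proposed in the paper, and it omits the adjunct parallelepiped space. It shows the integer-only discriminator idea: each non-driving axis keeps an integer error term, so no rounding error accumulates.

```python
def bresenham_3d(start, end):
    """Integer line between two voxels of a cubic grid (both endpoints included)."""
    pos = list(start)
    delta = [abs(e - s) for s, e in zip(start, end)]
    step = [1 if e > s else -1 for s, e in zip(start, end)]
    drive = delta.index(max(delta))            # axis with the longest extent
    others = [a for a in range(3) if a != drive]
    # One integer error term (discriminator) per non-driving axis.
    err = {a: 2 * delta[a] - delta[drive] for a in others}
    points = [tuple(pos)]
    for _ in range(delta[drive]):
        pos[drive] += step[drive]
        for a in others:
            if err[a] >= 0:
                pos[a] += step[a]
                err[a] -= 2 * delta[drive]
            err[a] += 2 * delta[a]
        points.append(tuple(pos))
    return points
```

The function takes exactly one step along the driving axis per iteration, so a line from (0, 0, 0) to (4, 2, 0) contains five voxels and ends exactly on the target voxel.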
A realistic 3D virtual head using multiple inputs for human-machine interfaces is proposed. It has the following characteristics: (1) it is driven by video, text and speech, and thus can interact with humans through more interfaces; (2) an online appearance model is used to track 3D facial motion from video in the framework of particle filtering, and multiple measurements are fused to reduce the influence of lighting and person dependence on the construction of the online appearance model; (3) a 3D facial animation algorithm combining a parameterized model and a muscular model greatly reduces computational effort while maintaining high realism; (4) the computational effort of the visual co-articulation model is reduced by a tri-phone model without sacrificing any performance. The performance and perceptual evaluations show its suitability for human-machine interfaces.
This paper proposes a novel multi-pose 3D facial landmark localization method based on multi-model information. In the presented method, the affine-invariant Affine-SIFT is applied to the 2D face texture image for feature point detection, and the detected points are then mapped onto the corresponding 3D face model. For the 3D facial surface, the maximum change of local neighbor curvature and iterative constraint optimization are combined to complete the facial landmark localization. The proposed method does not need to estimate or define the pose of the face model or the format of the 3D face mesh, and is therefore more suitable for practical applications. Experimental results on the FRGC2.0 and NPU3D face databases show that the proposed method is robust to face pose changes and has higher localization accuracy than existing methods.
Heterogeneous wireless network integration, typically 3G and WLAN integration, is an inevitable trend. Security is one of the major challenges that heterogeneous wireless network integration faces. How to integrate the vastly different security architectures used in each access network and unify user management urgently needs to be solved. To achieve the security integration of 3G and WAPI-based WLAN, a USIM-based certificate distribution protocol is proposed. Two security integration schemes, loosely coupled and tightly coupled, are presented, which unify the user management of the 3G security architecture and WAPI, and realize WAPI-based network access for 3G subscribers as well as identity privacy protection. The entity authentication and anonymity of the certificate distribution protocol are analyzed in the CK model, and the results show that the protocol is provably secure.
Long-distance IEEE 802.11 wireless mesh networks (LDmesh) have emerged in recent years because of their high bandwidth, low cost and large coverage. They can be applied in rural regions or sparsely populated areas to provide high-bandwidth Internet access. In LDmesh networks, links are point-to-point and can be tens to hundreds of kilometers long. To achieve the long range, the wireless nodes are usually equipped with high-power (e.g. 400 mW) 802.11 wireless cards and high-gain (e.g. 32 dBi) directional antennas. The links of one node cannot transmit and receive data simultaneously because of inter-link interference. The traditional CSMA/CA MAC protocol performs poorly in such a network because of the long propagation delay, ACK timeouts and inter-link interference. Therefore new TDMA-type MAC protocols, such as 2P MAC, WiLDNet and JazzyMac, have been proposed to solve the above issues. The upper-layer protocols, such as routing and network management, thus face great challenges and call for new designs. In this paper, we introduce the fundamental concepts of LDmesh networks, survey the research activities in recent years, and discuss in depth the challenges in terms of link performance, MAC protocols, routing protocols and network management. Future work is also discussed.
In all-channel jamming, a traditional single-channel jammer needs relatively time-consuming channel switching. In this paper, an IEEE 802.11g all-channel jammer (ARJ) based on adjacent channel interference (ACI), especially non-overlapping channel interference, is proposed. Compared with a single-channel jammer, ARJ has the advantages of realizing all-channel jamming on a fixed single channel, realizing an adjustable jamming radius by adjusting the transmission power of the jamming signal, and realizing energy-efficient jamming by adjusting the jamming frequency. Through theoretical analysis using the Markov chain model of DCF under the channel bit error rate (BER) and through simulation experiments, it is found that ARJ can jam all channels of IEEE 802.11g. Moreover, ARJ is also implemented and verified in real experiments.
Querying and processing data streams are widely used in many applications, such as financial systems. For example, in bank card transaction systems, there exist abnormal trading records caused by terminal multiplexers. In general, such records often contain many abnormal objects that occur frequently with a highly fluctuating rate. However, existing work on frequent item mining cannot be used to handle this issue directly, since only the item's frequency, not its fluctuating rate, is considered. In this paper, we first define the query formally, and then propose several novel solutions to handle this issue. Moreover, we extend our work to the sliding-window model to meet the requirements of evolving streams. Theoretical analysis and experimental results show that our methods have low space and time complexity.
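As an illustration of the existing frequent-item mining that the abstract contrasts itself with, here is a sketch of the well-known Misra-Gries summary. It is not one of the paper's proposed solutions: it tracks only approximate frequency in bounded space and has no notion of fluctuating rate, which is exactly the limitation noted above.

```python
def misra_gries(stream, k):
    """Frequent-item summary using at most k - 1 counters.

    Every item occurring more than len(stream) / k times is guaranteed to be
    a key of the returned dict; the stored counts are underestimates.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

With `k = 2` the summary keeps a single counter and reduces to the classic majority-vote scheme, so in the stream `a b a c a d a e a` only `a` survives.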
Hierarchical diagnosis is an important method for reducing the complexity of model-based diagnosis. Constructing the hierarchical model of the system to be diagnosed is a significant step in hierarchical diagnosis. Previous methods have been based on particular abstraction models; they have neither related the generic notion of abstraction to the hierarchical process in model-based diagnosis nor formally analyzed methods for automatically building the hierarchy of the system. The KRA (Knowledge Reformulation Abstraction) model provides a general framework from a constructive point of view, which has the potential to unite the former abstraction methods. The authors construct the system model within the KRA model framework. This model provides a dependent operating process based on the basic framework Rg of the system. The process avoids producing repeated basic types and thus not only reduces space complexity but also allows the abstraction operators to be reused more frequently, improving the performance of the whole abstraction operation. In this paper the authors describe the hierarchical abstraction process in model-based diagnosis and provide a general algorithm to automatically construct the hierarchical representation of the system to be diagnosed. The algorithm builds the operator database in two ways, dynamic and static, whose advantages and disadvantages are also analyzed. In addition, the authors discuss the validity and complexity of the proposed algorithm.
Web services on the Internet are autonomous and dynamic in nature, so the availability of a single service cannot always be guaranteed. Clustering resources of the same category, abstracting them to a higher level, and dynamically routing to a suitable resource at runtime is a common approach to improving availability. In this paper the authors use a business service model to cluster services with similar functionalities, and propose an approach combining request splitting and dynamic switching at runtime to improve service availability. The authors also give the corresponding implementation of the approach. Although the proposed approach overcomes the drawback of traditional replication approaches, namely the possible unavailability of a single service, it introduces extra costs because of the services' availability statuses and their different capabilities of covering the business functionality. The authors therefore propose a service selection algorithm, called SelectIndex, based on services' capability of covering users' requests, together with an availability calculation method combining real-time and historical availability, to improve the business service's efficiency and reduce its execution cost. Finally the authors evaluate the work through a case study and a simulation experiment.
As the Internet evolves, it is very hard for researchers to test and deploy new network protocols in production networks; SDN based on OpenFlow tries to provide a feasible way. OpenFlow implements flow forwarding based on multiple tables in multiple pipelines, but it does not describe the ability to compute and store within networks, and so cannot support content-centric networks. To enhance the scalability of SDN, a general abstraction of the forwarding layer, Label Cast, is proposed, which maps network addresses to fixed-length labels and characterizes forwarding operations and processing functions with Label and Cast tables. The forwarding layer performs lookups based on fixed-length labels and schedules services, including light-semantics instructions for general forwarding operations, and network protocol semantics or status-related processing functions arranged by the Cast table, which can be extended to support the complex processing of new networks. Label Cast supports end-to-end forwarding based on flows and non-end-to-end forwarding in content-centric or information-centric network architectures, and thus supplies a general abstraction layer for SDN.
Positive approximation is an effective approach to characterizing the structure of a target concept in information systems. To overcome the high time consumption of existing feature selection algorithms on incomplete decision tables, this paper provides a general accelerated algorithm based on positive approximation. The modified algorithm both preserves the rank of attributes and reduces time consumption by reducing the scale of the data, which effectively accelerates feature selection in incomplete decision tables. Experimental analyses verify the validity and efficiency of the accelerated algorithm. It is worth pointing out that the time savings of the modified algorithms grow as the data set becomes larger.
This paper presents a graphics hardware accelerated continuous collision detection algorithm for deformable objects. It can detect all inter-object and intra-object collisions in real time for complex scenes consisting of deformable objects. The algorithm executes collision detection tasks fully in parallel by mapping them with streaming decomposition. With a parallel streaming registration algorithm, variable-length data structures are efficiently supported on graphics hardware. The algorithm has been implemented on an AMD graphics hardware platform with OpenCL. A series of benchmarks with different characteristics is used for testing. Compared with an optimized single-threaded implementation on an Intel CPU, the algorithm achieves speedups of 9.2x to 11.4x.
In many computing domains, hardware accelerators can improve throughput and lower power consumption compared with executing functionally equivalent software on general-purpose microprocessor cores. While hardware accelerators are often stateless, network processing exemplifies the need for stateful hardware acceleration. The packet-oriented streaming nature of current networks enables data to be processed as soon as packets arrive rather than when the data of the whole network flow is available. Because many flows are concurrent, an accelerator must maintain and switch contexts between the many states of the various accelerated streams embodied in the flows, which increases the overhead associated with acceleration. This paper proposes to dynamically reorder the requests of different accelerated streams in a hybrid on-chip/memory-based request queue to reduce the associated overhead. A simulation-based performance study shows the effectiveness of the proposed mechanism for different popular stateful accelerators. The experimental results show that, compared with the traditional FIFO-order design, the approach can significantly reduce the average response time, improving throughput by up to 26.7% and reducing response time by up to 50% for decompression acceleration.
With the continuous scaling of semiconductor process technology, improving performance simply by increasing the frequency is no longer feasible. Parallelism has become the key technology for obtaining energy efficiency in large-scale applications. How to keep improving performance while keeping energy consumption under control has become a key problem. In this paper, a high performance computing weather forecasting application is tuned using a functional unit (FU) array based architecture. Software approaches such as array reforming, loop rescheduling, local buffer pre-fetching and local buffer partitioning, and hardware approaches such as pre-fetching parallelization, are employed to increase data locality and data reuse and thereby accelerate the stencil computation in GRAPES. The simulation results indicate an average IPC of 11.3, which is 2.3x that of the multi-core processors, while consuming only 12% of their energy. This can accordingly reduce communication in the cluster system, resulting in an 11.7x power-efficiency boost.
Towards building a more effective MPI infrastructure for multicore systems, a thread-based MPI program accelerator, called MPIActor, is proposed in this paper. MPIActor is a transparent middleware that assists general MPI libraries. For any single-threaded MPI program, MPIActor can optionally be enabled in the compiling phase. With MPIActor, the MPI processes in each node are mapped to several threads of one process, and intra-node communication is enhanced by taking advantage of the lightweight thread-based mechanism. The authors have designed and implemented the point-to-point communication module. This paper details the mechanism, the communication architecture and key techniques, and evaluates MPIActor with the OSU LATENCY benchmark on a real platform. The experimental results show that introducing MPIActor can achieve 2x performance when transferring 8 KB and 4 MB messages in MVAPICH2 and OpenMPI parallel environments.
Purpose-based query technology is the basis of privacy-aware data access control in relational databases. Most research focuses on how to effectively build a purpose-based privacy-aware data access control mechanism for an independent privacy-preserving database system. However, with the popularity of application integration and data sharing, how to merge the purpose-based access control mechanisms of different applications and systems becomes a key issue. To address the problem, this paper presents a purpose fusion based privacy-aware data access control mechanism for the integration of multiple applications and systems. It analyzes the potential leakage risks of privacy-aware data due to the fusion of multiple purposes, and evaluates the leakage risks of nodes by considering a merged purpose tree as the risk purpose tree. Then, it splits the risk purpose tree into a risk-balanced purpose tree, in which all nodes have a privacy degree of 0, and a set of risking paths containing the nodes with non-zero privacy degrees. A query can therefore be answered by checking the risk-balanced purpose tree and then the risking paths, so that safe query results are obtained with minimized privacy leakage risks. Three sets of experimental results are given in this paper: (1) a query time comparison between different purpose-based models for the same user and query, which shows that the RPPAAC model presented in this paper does not lead to a larger time overhead; (2) validity checking of the RPPAAC model in terms of the disclosure of private data; (3) a comparison of execution time for purpose fusion in different instances.
Different from related works, this paper considers purposes as the carriers of privacy-aware data and the paths of the purpose tree as transmission channels of privacy-aware data. By introducing the public risk and the hidden risk, it evaluates the potential leakage risks of privacy-aware data during query answering, and presents the purpose fusion based privacy-aware data access control mechanism for integrating multiple applications and systems.
Inconsistency conflicts may arise between static separation-of-duty policies and availability policies because the two have opposite focuses. This paper provides a priority-based approach to resolving such conflicts. We propose a method to calculate the priority of a policy from its strictness and its influence on the whole policy set. The concepts of self-satisfied frequency and weighted conflict area are introduced to denote the policy strictness and its influence on the whole policy set, respectively. Based on these two concepts, two algorithms for inconsistency resolution are presented according to different objectives of the policies: a minimum cost algorithm and a lexicographical inference algorithm. The experimental results show that the proposed priority-based conflict resolution approach scales reasonably well when the number of static separation-of-duty and availability policies is not very large.
As peer-to-peer (P2P) file sharing becomes increasingly popular as a new paradigm for file exchange, its security has become an issue of concern, and access control is one of the key technologies of system security. However, the decentralized and dynamic nature of P2P systems means that traditional access control methods cannot be applied directly, and the existing access control methods for P2P file-sharing systems cannot cope well with large numbers of concurrent file-downloading requests or with floods of malicious file-downloading requests. To address these problems, this paper presents an access control mechanism based on the commodity market (ACMCM) for P2P file-sharing systems. The market mechanism distributes file-downloading requests evenly among the file providers. The price of a downloaded file increases exponentially with the number of repeated downloads of the same file, so malicious file downloading is effectively denied. Bandwidth is allocated rationally among concurrent downloads, which keeps the system time spent on file downloading as short as possible. We believe that ACMCM preserves the decentralized structure and dynamic nature of P2P systems. Finally, the distributed protocol of the major processes is presented.
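The exponential pricing rule described above can be illustrated with a minimal sketch. The abstract does not give the actual price function, so the base price, the growth factor of 2, and the names `download_price` and `can_download` are assumptions made only for this illustration.

```python
def download_price(base_price: float, repeat_count: int, growth: float = 2.0) -> float:
    """Price of the next download of a file already fetched `repeat_count` times.

    Hypothetical rule: the price grows exponentially with the number of
    repeated downloads of the same file. `growth` is an assumed parameter.
    """
    if repeat_count < 0:
        raise ValueError("repeat_count must be non-negative")
    return base_price * growth ** repeat_count


def can_download(budget: float, base_price: float, repeat_count: int) -> bool:
    """A request is admitted only if the requester can pay the current price."""
    return budget >= download_price(base_price, repeat_count)
```

Under these assumptions, a peer with a fixed budget quickly becomes unable to pay for repeated requests for the same file, which is the effect the abstract attributes to the market mechanism.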
Rendering of multi-fragment effects can be greatly accelerated on the GPU. However, existing methods always need to read the model data in more than one pass, because of the depth-ordering requirements of fragments and the architectural limitations of the GPU. This has been a bottleneck for rendering efficiency, because the transfer bandwidth from CPU to GPU is limited. Although methods have been proposed that use CUDA and load the data only once, they cannot process large models because of the limited storage on the GPU. This paper proposes a new method to implement single-pass GPU rendering of multi-fragment effects. It first decomposes the 3D model into a set of convex polyhedrons, and then uses the viewpoint to determine the order in which the convex polyhedrons are transmitted one by one to the GPU, which guarantees the correct ordering of fragments. During this process, the new method immediately performs illumination computation and blends the rendering results of the transmitted convex polyhedrons, which greatly reduces the storage requirement. As a result, it can use more shading parameters to improve the rendering effects. Experimental results show that the new method can be faster than existing methods, including those using CUDA, and can conveniently handle large models, even those with high depth complexity.
The integration of heterogeneous access networks is one of the remarkable characteristics of the next generation network, in which users with multi-network interface terminals can independently select an access network (service provider, SP) to obtain the most desired Internet service. In this paper, a user-oriented visualized operational scheme at the user terminal is proposed for heterogeneous network access selection. The scheme applies both a unified quantification model for evaluating the access service of heterogeneous systems and a network selection decision-making algorithm for maximizing the user performance-cost ratio. It also describes the optimal network pricing mechanism, based on different game models among access networks, for practical application scenarios. The profit characteristics of access networks under noncooperative and cooperative game frameworks are compared, and the user performance-cost ratios in different game scenarios are evaluated via extensive numerical simulations.
As more cores are integrated into one die, chip multiprocessors suffer higher on-chip communication latency and linearly increasing directory overhead. A hierarchical cache architecture partitions on-chip caches recursively into multilevel regions. It reduces communication latency by replicating data blocks to the multiple regions that contain the requestor, and alleviates the storage overhead of the directory by using a multilevel directory. Based on the data distribution in the last-level cache, we improve the data placement policy and propose an enhanced hierarchical cache directory (EHCD). EHCD puts an incoming off-chip data block directly into the lowest region that contains the requestor to reduce access latency, which guarantees that only one replica of private data is kept in the last-level cache. EHCD improves the capacity utilization of the last-level cache and scales well. Simulation results on a 16-core CMP show that EHCD reduces execution time by 24% compared with the shared organization, and by 15% compared with the original hierarchical cache design.
In modern storage systems, disk arrays and their cooling systems account for a major portion of total system power consumption. Existing research on energy conservation mainly concentrates on optimizing storage systems for random access-based applications. However, for the widely used storage systems that serve sequential access-based applications, such as video surveillance, continuous data protection (CDP), and virtual tape library (VTL), few schemes have so far been proposed that exploit their inherent characteristics to save more energy while guaranteeing I/O performance. To this end, S-RAID 5, an energy-saving RAID system, is proposed for sequential access-based applications. S-RAID 5 employs a local-parallelism strategy: the entire storage area of the array is partitioned into groups, and a parallel data access scheme is adopted within each group. Data grouping makes it possible to keep only a portion of the disks active while the rest stay in standby, and the intra-group parallelism provides the performance guarantee. With an appropriate caching strategy to filter out the small amount of random accesses, S-RAID 5 can achieve prominent energy savings. In a simulation of 32-channel video surveillance at the D1 resolution standard, measurements of 24-hour power consumption show that the power consumption of S-RAID 5 is 59%, 23%, 21%, and 21% of that of Hibernator, eRAID, PARAID, and GRAID, respectively, while meeting the I/O performance requirement and providing single-disk fault tolerance.
Concurrently transmitting different packets through distinct channels is an effective way to increase the throughput and reduce the delay of ad hoc networks. Compared with other multichannel medium access control (MAC) protocols, Multiple Rendezvous is more flexible, since it requires neither external hardware nor time synchronization. Because broadcasting increases the overhead of handling deadlock, waiting, and matching, and thereby degrades the performance of multichannel ad hoc networks, a Matching Algorithm for Multiple Rendezvous (MAMR) is proposed. Given the broadcast requirements, it converges to a stable state without collision or deadlock, in which only broadcast nodes have a degree greater than 1 and no edge exists between broadcast nodes. It is proved that the algorithm stabilizes in at most 4m moves on a network with m edges. If the set of broadcast nodes is present before the initial state, a maximal matching is reached; if there is no broadcast node, the algorithm is equivalent to Hsu and Huang's self-stabilizing algorithm for maximal matching. Simulation results show that the performance of the MAXimal Matching multichannel (MAXM) and Busy Tone Multichannel (BTMC) protocols is increased by 10% by using MAMR when 5% of transmitted packets require broadcasting.
A black-box secret sharing scheme (BBSSS) differs from an ordinary linear secret sharing scheme over a finite field. It works in exactly the same way over any finite Abelian group, as it only requires black-box access to group operations and to random group elements. In particular, there is no dependence on e.g. the structure of the group or its order. Until now, no efficient BBSSS for non-threshold access structures has been constructed. In this paper, we give an approach to constructing a BBSSS for some non-threshold access structures over an arbitrary Abelian group by making use of the technique of monotone span programs (MSPs). First, we introduce the notion of weak MSPs and give a technique for constructing an MSP from weak MSPs. Then, we construct a pair of coprime weak MSPs by using an MSP over the rational field and an MSP over a finite field. Finally, we construct efficient non-threshold BBSSSs, such as a BBSSS for the disjunctive multi-level access structure. Our new scheme can be applied to construct new secure multi-party computation protocols over rings, linear integer secret sharing schemes, distributed RSA signatures for the disjunctive multi-level access structure, and new zero-knowledge protocols.
Power has been a major issue in processor design for several years. Conventional approaches to this issue, such as DVFS (Dynamic Voltage and Frequency Scaling), have now hit the law of diminishing returns. As multi-core and many-core processors become mainstream, caches account for more and more CPU die area and power. This paper proposes filtering unnecessary way accesses to reduce the dynamic power consumption of caches shared by instructions and data. The methods are the Invalid Filter, which eliminates accesses to cache ways containing invalid blocks; the I/D Filter, which eliminates accesses to cache ways containing blocks whose instruction/data type does not match the access; and the Tag-2 Filter, which eliminates accesses to cache ways containing blocks whose lowest 2 tag bits do not match. Since these methods reduce the activity in the cache, dynamic CPU power can be decreased significantly. We also propose combining the three methods, called the Invalid+I/D+Tag-2 Filter, to achieve better power savings. We have verified the effectiveness and complementarity of the three methods through analysis and experiments. Our evaluations show that, compared to the Invalid+I/D Filter, we obtain an improvement of 19.6%~47.8% (34.3% on average) on a 64KB 4-way set-associative cache and 19.6%~55.2% (39.2% on average) on a 128KB 8-way set-associative cache; compared to the Invalid+Tag-2 Filter, we obtain an improvement of 16.1%~27.7% (16.6% on average) on a 64KB 4-way set-associative cache and 6.9%~44.4% (25.0% on average) on a 128KB 8-way set-associative cache.
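The three filters can be pictured as successive checks that decide which ways of a set still have to be probed. The sketch below is only a software illustration of that idea; the `Way` record, its field names, and the function `ways_to_probe` are hypothetical and are not taken from the paper, which describes a hardware mechanism.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Way:
    """Hypothetical per-way metadata that is cheap to check before a full access."""
    valid: bool
    is_instruction: bool  # True if the block holds instructions, False for data
    tag: int


def ways_to_probe(ways: List[Way], req_tag: int, req_is_instruction: bool) -> List[int]:
    """Return the indices of the ways that survive all three filters."""
    probe = []
    for idx, way in enumerate(ways):
        if not way.valid:
            continue  # Invalid Filter: skip ways holding invalid blocks
        if way.is_instruction != req_is_instruction:
            continue  # I/D Filter: skip ways whose block type mismatches the access
        if (way.tag ^ req_tag) & 0b11:
            continue  # Tag-2 Filter: skip ways whose lowest 2 tag bits mismatch
        probe.append(idx)
    return probe
```

A way that passes all three checks may still miss on the full tag comparison; the filters only remove accesses that are certain to be useless.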
In this paper we explore the capability and flexibility of FPGA solutions for accelerating high-precision scientific computing applications. First, we study the inner product operation, which occurs in almost all scientific and engineering applications, and propose an exact inner product algorithm based on exact long fixed-point operations. Taking IEEE 754-2008 quadruple-precision floating point as an example, we have implemented a fully pipelined Quadruple Precision Multiplication and Accumulation (QPMAC) unit on FPGA devices. We propose a two-level RAM bank scheme to store the exact fixed-point result, and use a carry-save accumulator scheme to minimize the width of the fixed-point adder and simplify the logic of carry resolution. We also introduce a partial summation scheme to enhance the pipeline throughput of MAC operations, which divides the summation into 4 partial operations processed in 4 banks. As a proof of concept, we prototype four QPMAC units on an XC5VLX330 FPGA chip and perform LU decomposition and MGS-QR decomposition. The experimental results show that our FPGA-based implementations achieve 42X-97X better performance, more precise results, and much lower power consumption than a parallel software approach based on OpenMP running on an Intel Core2 Quad Q8200 CPU at 2.33 GHz.
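The idea of an exact inner product, where every product is accumulated without rounding in a long fixed-point register and the result is rounded only once at the end, can be sketched in software. The example below uses double precision and Python's unbounded integers as the accumulator; it is an illustration of the principle, not the paper's quadruple-precision FPGA design, and the names `exact_dot`, `naive_dot`, and `FRAC_BITS` are assumptions.

```python
from fractions import Fraction

# Assumed accumulator scaling: the smallest positive double is 2**-1074, so
# 2 * 1074 fractional bits hold the product of any two finite doubles exactly.
FRAC_BITS = 2 * 1074


def exact_dot(xs, ys):
    """Inner product of two sequences of finite floats with a single final rounding."""
    acc = 0  # unbounded integer acting as a fixed-point value scaled by 2**FRAC_BITS
    for x, y in zip(xs, ys):
        xn, xd = x.as_integer_ratio()  # x == xn / xd exactly, xd is a power of two
        yn, yd = y.as_integer_ratio()
        # xd * yd is a power of two no larger than 2**FRAC_BITS, so the division is exact.
        acc += ((xn * yn) << FRAC_BITS) // (xd * yd)
    return float(Fraction(acc, 1 << FRAC_BITS))  # the only rounding step


def naive_dot(xs, ys):
    """Ordinary floating-point accumulation, which rounds after every operation."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total
```

With `xs = [1e16, 1.0, -1e16]` and `ys = [1.0, 1.0, 1.0]`, the naive loop loses the middle term to rounding, while the exact accumulator keeps it.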
Wireless sensor networks (WSNs) are an important part of the Internet of Things (IoT). Privacy preservation in data aggregation is an effective way to protect the security of data in WSNs, and SMART is one such data aggregation algorithm. This paper presents several optimizing factors that enhance the performance of the SMART scheme in terms of privacy preservation and data aggregation precision. Based on the optimal parameters, it proposes new data aggregation schemes that reduce communication overhead and hence energy consumption, and therefore prolong the lifetime of the network. Simulation results show that, with the optimal parameters, the proposed P-SMART-CLPNT schemes perform well in data aggregation precision, communication overhead, and privacy preservation.
In path-sensitive defect detection, the defect state contains all the data flow information of the current control flow vertex, and the data flow information that is irrelevant to the defect may lower the efficiency. Further, to avoid path explosion during full-path-sensitive analysis, the defect states that meet at control flow confluence nodes may simply be merged. This preliminary state-merging strategy may lead to a loss of accuracy that induces false positives. To address these issues, this paper proposes a new program slicing algorithm based on defect patterns. The slice criteria include the defect feature and the path condition, and the source program is sliced according to the inclusion relation between the CFG data flow information and the slice criteria. The sliced program removes the code that is irrelevant to the defect, yet is fully equivalent to the original program with respect to the defect, which improves the efficiency. To further reduce the false positives of path-sensitive analysis, this paper presents a refined state-merging strategy that diminishes the accuracy loss by selectively merging defect states, with the path condition added as a state attribute. The authors have implemented the technique in DTSGCC (Defect Testing System for GCC), a software defect detection tool for GCC projects on Linux. DTSGCC has been applied to many open-source GCC projects. Experimental results suggest that applying the technique to path-sensitive defect detection improves efficiency while reducing potential false positives.
The Fiedler vector of a graph plays a vital role in many applications, including matrix reordering, graph partitioning, protein analysis, data mining, machine learning, and web search. However, it is usually regarded as very expensive to compute, because it involves the solution of an eigenvalue problem. In this paper, we provide a new scheme to compute the Fiedler vector, based on both the shrink method and the inverse power method. The computation of the Fiedler vector is reduced to that of the eigenvector associated with the minimum eigenvalue of the reduced matrix. Further, a preconditioning method is introduced to reduce the computation cost; any known preconditioning technique for the solution of linear systems can be used in this method. For the graphs associated with some of the sparse matrices downloaded from the UF Sparse Matrix Collection, the new scheme is compared experimentally with existing schemes. In the experiments, diagonal scaling is used as the preconditioner, and the algorithm is implemented with hybrid MPI and OpenMP programming for parallel computing. The results show that the new scheme has advantages in both efficiency and accuracy. The application to graph bisection also shows that the partition quality is better than that of the existing methods in most cases.
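For readers unfamiliar with the inverse power method, the following sketch computes the Fiedler vector of a small connected graph by repeatedly solving a linear system with the graph Laplacian, using conjugate gradients with diagonal scaling as the preconditioner. It does not implement the paper's shrink method: instead of reducing the matrix, it simply keeps the iterate orthogonal to the constant vector. All function names are assumptions for this illustration, and dense Python lists are used only to keep it self-contained.

```python
import math


def laplacian(n, edges):
    """Dense graph Laplacian L = D - A of an undirected graph on n vertices."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def remove_mean(x):
    """Project x onto the subspace orthogonal to the constant vector."""
    mean = sum(x) / len(x)
    return [xi - mean for xi in x]


def normalize(x):
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]


def pcg(A, b, tol=1e-12, max_iter=200):
    """Conjugate gradients with diagonal scaling (Jacobi preconditioner).

    Assumes the graph is connected (no zero diagonal entry) and that b sums
    to zero, so the singular Laplacian system is consistent.
    """
    inv_diag = [1.0 / A[i][i] for i in range(len(b))]
    x = [0.0] * len(b)
    r = list(b)
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def fiedler_vector(n, edges, iters=100):
    """Inverse power iteration restricted to vectors orthogonal to the constant vector."""
    L = laplacian(n, edges)
    x = normalize(remove_mean([float(i) for i in range(n)]))
    for _ in range(iters):
        x = normalize(remove_mean(pcg(L, x)))
    return dot(x, matvec(L, x)), x  # (algebraic connectivity, Fiedler vector)
```

For the path graph `fiedler_vector(4, [(0, 1), (1, 2), (2, 3)])`, the signs of the returned vector split the path into its two halves, which is the basis of spectral bisection.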
The application of traditional software defect prediction models is limited by their low accuracy and applicability. This paper puts forward a software defect prediction model based on ACO-SVM, which takes advantage of the non-linear computing power of the SVM and the parameter optimization power of ACO. First, PCA is used to reduce the dimensions of the software defect metrics and thereby increase the computation speed. Second, ACO is used to find the optimal parameters for the SVM automatically. Finally, software defects are predicted using the SVM with the optimal parameters. The experiments use 10-fold cross-validation. The experimental results indicate that the method has higher prediction precision than traditional software defect prediction models.
Top-cited authors
Zhongzhi Shi
  • Chinese Academy of Sciences
Aoying Zhou
  • Fudan University
Mumu Zhang
  • Shanghai Ocean University
Fang Hou
  • Guangdong University of Finance
Yiyu Yao
  • University of Regina