Mikhail J. Atallah

Purdue University, West Lafayette, Indiana, United States

Publications (273), 81.71 Total impact

  • Mohamed Yakout, Mikhail J. Atallah, Ahmed Elmagarmid
    ABSTRACT: Record linkage is used to associate entities from multiple data sources. For example, two organizations contemplating a merger may want to know how much their customer bases overlap so that they can better assess the benefits of the merger. As another example, a database of people whom regulators have forbidden from a certain activity may need to be compared to a list of people engaged in that activity. The autonomous entities who wish to carry out the record matching computation are often reluctant to fully share their data; they fear losing control over its subsequent dissemination and usage, or they want to ensure privacy because the data is proprietary or confidential, and/or they are cautious simply because privacy laws forbid its disclosure or regulate the form of that disclosure. In such cases, the problem of carrying out the linkage computation without full data exchange has been called private record linkage. Previous private record linkage techniques have made use of a third party. We provide efficient techniques for private record linkage that improve on previous work in that (1) our techniques make no use of a third party, and (2) they achieve much better performance than previous schemes in terms of their execution time while maintaining acceptable quality of output compared to nonprivacy settings. Our protocol consists of two phases. The first phase primarily produces candidate record pairs for matching, by carrying out a very fast (but not accurate) matching between such pairs of records. The second phase is a novel protocol for efficiently computing distances between each candidate pair (without any expensive cryptographic operations such as modular exponentiations). Our experimental evaluation of our approach validates these claims.
    Journal of Data and Information Quality (JDIQ). 08/2012; 3(3).
  • Kai C. Bader, Mikhail J. Atallah, Christian Grothoff
    ABSTRACT: This article presents a new algorithm for finding oligonucleotide signatures that are specific and sensitive for organisms or groups of organisms in large-scale sequence datasets. We assume that the organisms have been organized in a hierarchy, for example, a phylogenetic tree. The resulting signatures, binding sites for primers and probes, match the maximum possible number of organisms in the target group while having at most k matches outside of the target group. The key step in the algorithm is the use of the lowest common ancestor (LCA) to search the organism hierarchy; this allows the combinatorial problem to be solved in almost linear time (empirically observed). The presented algorithm improves performance by several orders of magnitude in terms of both memory consumption and runtime when compared to the best-known previous algorithms while giving identical, exact solutions. This article gives a formal description of the algorithm, discusses details of our concrete, publicly available implementation, and presents the results from our performance evaluation.
    Journal of Experimental Algorithmics 07/2012;
  • Ashish Kundu, Mikhail J. Atallah, Elisa Bertino
    ABSTRACT: Redactable signatures for linear-structured data such as strings have already been studied in the literature. In this paper, we propose a formal security model for leakage-free redactable signatures (LFRS) that is general enough to address authentication of not only trees but also graphs and forests. LFRS schemes have several applications, especially in enabling secure data management in the emerging cloud computing paradigm as well as in healthcare, finance and biological applications. We have also formally defined the notion of secure names. Such secure names facilitate leakage-free verification of ordering between siblings/nodes. The paper also proposes a construction for secure names, and a construction for leakage-free redactable signatures based on the secure naming scheme. The proposed construction computes a linear number of signatures with respect to the size of the data object, and outputs only one signature that is stored, transmitted and used for authentication of any tree, graph and forest.
    01/2012;
  • Ashish Kundu, Mikhail J. Atallah, Elisa Bertino
    ABSTRACT: Leakage-free authentication of trees and graphs has been studied in the literature. Such schemes have several practical applications, especially in the cloud computing area. In this paper, we propose an authentication scheme that computes only one signature (optimal). Our scheme is not only super-efficient in the number of signatures it computes and in its runtime, but also is highly versatile -- it can be applied not only to trees, but also to graphs and forests (disconnected trees and graphs). While achieving such efficiency and versatility, we must also mention that our scheme achieves the desired security -- leakage-free authentication of data objects represented as trees, graphs and forests. This is achieved by another novel scheme that we have proposed in this paper -- a secure naming scheme for nodes of such data structures. Such a scheme assigns "secure names" to nodes such that these secure names can be used to verify the order between the nodes efficiently without leaking information about other nodes. As far as we know, our scheme is the first such scheme in the literature that is optimal in its efficiency, supports two important security concerns -- authenticity and leakage-freeness (privacy preservation/confidentiality), and is versatile in its applicability, as it applies to trees and graphs as well as forests. We have carried out complexity as well as experimental analysis of this scheme that corroborates its performance.
    IACR Cryptology ePrint Archive. 01/2012; 2012:36.
  • Hao Yuan, Mikhail J Atallah
    ABSTRACT: A running max (or min) filter asks for the maximum (or minimum) element within a fixed-length sliding window. The previous best deterministic algorithm (developed by Gil and Kimmel, and refined by Coltuc) can compute the 1D max filter using 1.5+o(1) comparisons per sample in the worst case. The best known algorithm for independent and identically distributed input uses 1.25+o(1) expected comparisons per sample (by Gil and Kimmel). In this work, we show that the number of comparisons can be reduced to 1+o(1) comparisons per sample in the worst case. As a consequence of the new max/min filters, the opening (or closing) filter can also be computed using 1+o(1) comparisons per sample in the worst case, where the previous best work requires 1.5+o(1) comparisons per sample (by Gil and Kimmel); and computing the max and min filters simultaneously can be done in 2+o(1) comparisons per sample in the worst case, where the previous best work (by Lemire) requires 3 comparisons per sample. Our improvements over the previous work are asymptotic, that is, the number of comparisons is reduced only when the window size is large.
    IEEE Transactions on Software Engineering 08/2011; · 2.59 Impact Factor
  • ABSTRACT: This paper develops and tests a privacy-preserving business process that supports the selection of a contract manufacturer by an original equipment manufacturer (OEM), and the determination of whether the OEM or the chosen contract manufacturer will procure each of the components to be used in the manufacture of the OEM's branded product. Our "secure price-masking (SPM)" technology contributes to procurement theory and practice in four significant ways: First, it preserves the privacy of every party's individual component prices. Second, SPM assures that the contract manufacturers will bid their own private purchase cost (i.e., not add a margin to their cost). Third, SPM is not invertible; i.e., none of the participants can "solve" for the private inputs of any other participant based on its own inputs and the outputs provided to it by SPM. Fourth, the posterior distribution of any other participant's private inputs is practically indistinguishable from its prior distribution. We also describe the results of a proof-of-concept implementation.
    Production and Operations Management 03/2011; 20(2). · 1.32 Impact Factor
  • ABSTRACT: In practice, assigning access permissions to users must satisfy a variety of constraints motivated by business and security requirements. Here, we focus on Role-Based Access Control (RBAC) systems, in which access permissions are assigned to roles and roles are then assigned to users. User-role assignment is subject to role-based constraints, such as mutual exclusion constraints, prerequisite constraints, and role-cardinality constraints. Also, whether a user is qualified for a role depends on whether his/her qualification satisfies the role's requirements. In other words, a role can only be assigned to a certain set of qualified users. In this paper, we study fundamental problems related to access control constraints and user-role assignment, such as determining whether there are conflicts in a set of constraints, verifying whether a user-role assignment satisfies all constraints, and generating a valid user-role assignment for a system configuration. Computational complexity results and/or algorithms are given for the problems we consider.
    IEEE Transactions on Dependable and Secure Computing 01/2011; 8:883-897. · 1.06 Impact Factor
  • ABSTRACT: A computer system's security can be compromised in many ways: a denial-of-service attack can make a server inoperable, a worm can destroy a user's private data, or an eavesdropper can reap financial rewards by inserting himself in the communication link between a customer and her bank through a man-in-the-middle (MITM) attack. What all these scenarios have in common is that the adversary is an untrusted entity that attacks a system from the outside; we assume that the computers under attack are operated by benign and trusted users. But if we remove this assumption, if we allow anyone operating a computer system, from system administrators down to ordinary users, to compromise that system's security, we find ourselves in a scenario that has received comparatively little attention: the man-at-the-end (MATE) attack. Methods for protecting against MATE attacks are variously known as anti-tamper techniques, digital asset protection, or, more commonly, software protection.
    IEEE Software 01/2011; 28:24-27. · 1.62 Impact Factor
  • Mikhail J. Atallah, Yinian Qi, Hao Yuan
    ABSTRACT: Skyline computation is widely used in multicriteria decision making. As research in uncertain databases draws increasing attention, skyline queries with uncertain data have also been studied. Some earlier work focused on probabilistic skylines with a given threshold; Atallah and Qi [2009] studied the problem of computing skyline probabilities for all instances of uncertain objects without the use of thresholds, and proposed an algorithm with subquadratic time complexity. In this work, we propose a new algorithm for computing all skyline probabilities that is asymptotically faster: worst-case O(n√n log n) time and O(n) space for 2D data; O(n^{2−1/d} log^{d−1} n) time and O(n log^{d−2} n) space for d-dimensional data. Furthermore, we study the online version of the problem: Given any query point p (unknown until the query time), return the probability that no instance in the given data set dominates p. We propose an algorithm for answering such an online query for d-dimensional data in O(n^{1−1/d} log^{d−1} n) time after preprocessing the data in O(n^{2−1/d} log^{d−1} n) time and space.
    ACM Trans. Database Syst. 01/2011; 36:12.
  • Mikhail J. Atallah, Timothy W. Duket
    ABSTRACT: It has long been known that pattern matching in the Hamming distance metric can be done in O(min(|Σ|, √(m/log m)) n log m) time, where n is the length of the text, m is the length of the pattern, and Σ is the alphabet. The classic algorithm for this is due to Abrahamson and Kosaraju. This paper considers the following generalization, motivated by the situation where the entries in the text and pattern are analog, or distorted by additive noise, or imprecisely given for some other reason: in any alignment of the pattern with the text, two aligned symbols a and b contribute +1 to the similarity score if they differ by no more than a given threshold θ; otherwise they contribute zero. We give an O(min(|Σ|, √(m/log m)) n log m) time algorithm for this more general version of the problem; the classic Hamming distance matching problem is the special case of θ=0.
    Information Processing Letters 01/2011; 111:674-677. · 0.49 Impact Factor
  • Keith B. Frikken, Hao Yuan, Mikhail J. Atallah
    ABSTRACT: In the third-party model for the distribution of data, the data owner provides a third party (referred to as the dealer) with data as well as integrity verification information for that data, in the form of digital signatures that the dealer can use to convince a user of the data’s integrity (the dealer is not trusted with the owner’s signature keys, which is why it receives pre-signed items). The user’s interactions are with the dealer, who is in charge of enforcing access control and confidentiality for the data (i.e., no user should learn more than the outcome of their authorized query). This kind of outsourcing is becoming increasingly important because of its advantageous economics – a dealer who acts as a repository for many owners can achieve economies of scale that are not feasible for the individual owners, and the model allows the data owners to focus on what they do best (the creation and/or acquisition of high-quality data). A problem that arises in the context of outsourced databases (particularly for XML data) is the following: There is a total order Π on n items stored with the dealer, and a user query consists of a pair of items whose relative ordering should be revealed along with a proof that the result is correct. The proof is generated using the dealer’s local data (i.e., without bothering the data owner). The main difficulty is achieving efficient storage and query-processing while achieving the desiderata (that the user should learn nothing other than the answer to their query, and that a misbehaving dealer should not be able to convince a user of a wrong ordering). This paper gives a solution that is provably secure under a new assumption and can efficiently generate a very short proof. Furthermore, this scheme is generalized to partial orders that can be decomposed into d total orders. In this case, a user either learns the ordering of the queried items, or learns that they are incomparable.
    Applied Cryptography and Network Security - 9th International Conference, ACNS 2011, Nerja, Spain, June 7-10, 2011. Proceedings; 01/2011
  • Mikhail J. Atallah
    ABSTRACT: The talk will review recent results and algorithmic challenges for computational geometry problems in the context of uncertain data. This is an active area of investigation in the database community, and we introduce it through the specifics of the maximal elements problem (called the skyline problem in the database community): Rather than being a point, an uncertain object is a set of points called instances, each with an associated probability; instances of the same uncertain object can be geometrically far from each other, and are mutually exclusive (i.e., at most one of them can occur). For this version of the maximal elements problem, the input is a collection of m uncertain objects, whose total number of instances is n, and the problem is to compute for each of these n instances the probability that it is a maximal point, i.e., that it occurs for its own object and is not dominated by any occurring instance from another object.
    08/2010;
  • Yinian Qi, Mikhail J. Atallah
    ABSTRACT: Significant research efforts have recently been dedicated to modeling and querying uncertain data. In this paper, we focus on skyline analysis of uncertain data, modeled as uncertain objects with probability distributions over a set of possible values called instances. Computing the exact skyline probabilities of instances is expensive, and unnecessary when the user is only interested in instances with skyline probabilities over a certain threshold. We propose two filtering schemes for this case: a preliminary scheme that bounds an instance’s skyline probability for filtering, and an elaborate scheme that uses an instance’s bounds to filter other instances based on the dominance relationship. We experimentally demonstrate the effectiveness of our filtering schemes on both real and synthetic data sets and show the efficiency of our schemes compared with other algorithms.
    Database and Expert Systems Applications, 21st International Conference, DEXA 2010, Bilbao, Spain, August 30 - September 3, 2010, Proceedings, Part II; 01/2010
  • ABSTRACT: This paper investigates the possibilities of steganographically embedding information in the "noise" created by automatic translation of natural language documents. Because the inherent redundancy of natural language creates plenty of room for variation in translation, machine translation is ideal for steganographic applications. Also, because there are frequent errors in legitimate automatic text translations, additional errors inserted by an information hiding mechanism are plausibly undetectable and would appear to be part of the normal noise associated with translation. Significantly, it should be extremely difficult for an adversary to determine if inaccuracies in the translation are caused by the use of steganography or by deficiencies of the translation software.
    01/2010;
  • Hao Yuan, Mikhail J. Atallah
    ABSTRACT: Given a d-dimensional array A with N entries, the Range Minimum Query (RMQ) asks for the minimum element within a contiguous subarray of A. The 1D RMQ problem has been studied intensively because of its relevance to the Nearest Common Ancestor problem and its important use in stringology. If constant-time query answering is required, linear time and space preprocessing algorithms were known for the 1D case, but not for the higher dimensional cases. In this paper, we give the first linear-time preprocessing algorithm for arrays with fixed dimension, such that any range minimum query can be answered in constant time.
    Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010; 01/2010
  • Mikhail J. Atallah, Keith B. Frikken
    ABSTRACT: We give improved protocols for the secure and private outsourcing of linear algebra computations, that enable a client to securely outsource expensive algebraic computations (like the multiplication of large matrices) to a remote server, such that the server learns nothing about the customer's private input or the result of the computation, and any attempted corruption of the answer by the server is detected with high probability. The computational work performed at the client is linear in the size of its input and does not require the client to locally carry out any expensive encryptions of such input. The computational burden on the server is proportional to the time complexity of the current practically used algorithms for solving the algebraic problem (e.g., proportional to n^3 for multiplying two n × n matrices). The improvements we give are: (i) whereas the previous work required more than one remote server and assumed they do not collude, our solution works with a single server (but readily accommodates many, for improved performance); (ii) whereas the previous work required a server to carry out expensive cryptographic computations (e.g., homomorphic encryptions), our solution does not make use of any such expensive cryptographic primitives; and (iii) whereas in previous work collusion by the servers against the client revealed to them the client's inputs, our scheme is resistant to such collusion. As in previous work, we maintain the property that the scheme enables the client to detect any attempt by the server(s) at corruption of the answer, even when the attempt is collusive and coordinated among the servers.
    Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, ASIACCS 2010, Beijing, China, April 13-16, 2010; 01/2010
  • ABSTRACT: This paper studies a discrepancy-sensitive approach to dynamic fractional cascading. We provide an efficient data structure for dominated maxima searching in a dynamic set of points in the plane, which in turn leads to an efficient dynamic data structure that can answer queries for nearest neighbors using any Minkowski metric.
    05/2009;
  • M. Yakout, M.J. Atallah, A. Elmagarmid
    ABSTRACT: Record linkage is the computation of the associations among records of multiple databases. It arises in contexts like the integration of such databases, online interactions and negotiations, and many others. The autonomous entities who wish to carry out the record matching computation are often reluctant to fully share their data. In such a framework where the entities are unwilling to share data with each other, the problem of carrying out the linkage computation without full data exchange has been called private record linkage. Previous private record linkage techniques have made use of a third party. We provide efficient techniques for private record linkage that improve on previous work in that (i) they make no use of a third party; (ii) they achieve much better performance than that of previous schemes in terms of execution time and quality of output (i.e., practically without false negatives and with minimal false positives). Our software implementation provides experimental validation of our approach and the above claims.
    Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on; 05/2009
  • Mikhail J. Atallah, Yinian Qi
    ABSTRACT: Skyline computation is widely used in multi-criteria decision making. As research in uncertain databases draws increasing attention, skyline queries with uncertain data have also been studied, e.g. probabilistic skylines. The previous work requires "thresholding" for its efficiency - the efficiency relies on the assumption that points with skyline probabilities below a certain threshold can be ignored. But there are situations where "thresholding" is not desirable - low probability events cannot be ignored when their consequences are significant. In such cases it is necessary to compute skyline probabilities of all data items. We provide the first algorithm for this problem whose worst-case time complexity is sub-quadratic. The techniques we use are interesting in their own right, as they rely on a space partitioning technique combined with using the existing dominance counting algorithm. The effectiveness of our algorithm is experimentally verified.
    Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2009, June 29 - July 1, 2009, Providence, Rhode Island, USA; 01/2009
  • ABSTRACT: This paper investigates the possibilities of steganographically embedding information in the “noise” created by automatic translation of natural language documents. Because the inherent redundancy of natural language creates plenty of room for variation in translation, machine translation is ideal for steganographic applications. Also, because there are frequent errors in legitimate automatic text translations, additional errors inserted by an information hiding mechanism are plausibly undetectable and would appear to be part of the normal noise associated with translation. Significantly, it should be extremely difficult for an adversary to determine if inaccuracies in the translation are caused by the use of steganography or by deficiencies of the translation software.
    Journal of Computer Security. 01/2009; 17:269-303.
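
The running max filter described in the Yuan and Atallah abstract above can be illustrated with a minimal sketch. This uses the widely known monotonic-deque method, not the 1+o(1)-comparison algorithm from that paper; the function name `running_max` and its parameters are assumptions made for this example only.

```python
from collections import deque


def running_max(samples, window):
    """Return the maximum of every length-`window` sliding window of `samples`.

    Illustrative monotonic-deque method (amortized O(1) work per sample);
    it is NOT the comparison-optimal algorithm from the abstract.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    candidates = deque()  # indices whose values strictly decrease front to back
    for i, value in enumerate(samples):
        # Drop candidates that can never be a maximum again.
        while candidates and samples[candidates[-1]] <= value:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it slides out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        # Emit a result once the first full window has been seen.
        if i >= window - 1:
            result.append(samples[candidates[0]])
    return result


print(running_max([1, 3, 2, 5, 4, 1], 3))
```

A running min filter follows by flipping the comparison in the inner loop. Input shorter than the window yields an empty list.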

Publication Stats

5k Citations
81.71 Total Impact Points

Institutions

  • 1983–2012
    • Purdue University
      • Department of Computer Science
      West Lafayette, Indiana, United States
  • 2011
    • The University of Hong Kong
      Hong Kong, Hong Kong
  • 2009
    • Kwangwoon University
      • Department of Business Administration
      Seoul, Seoul, South Korea
  • 2005
    • Stony Brook University
      • Department of Computer Science
      Stony Brook, NY, United States
  • 2002
    • Syracuse University
      • Department of Electrical Engineering and Computer Science
      Syracuse, NY, United States
  • 1995
    • AT&T Labs
      Austin, Texas, United States
  • 1994
    • Utah State University
      • Department of Computer Science
      Logan, UT, United States
  • 1981–1992
    • Johns Hopkins University
      • Department of Computer Science
      • Department of Electrical and Computer Engineering
      Baltimore, MD, United States
  • 1988
    • University of Ottawa
      Ottawa, Ontario, Canada
  • 1984
    • CUNY Graduate Center
      New York City, New York, United States