
Hassan Alrehamy
- PhD
- PhD at University of Babylon
About
7 Publications
26,582 Reads
160 Citations
Introduction
Current institution
Additional affiliations
October 2018 - present
November 2011 - July 2014
July 2014 - July 2018
Education
July 2014 - July 2018
Publications (7)
This paper presents Personal Data Lake, a single point storage facility for storing, analyzing and querying personal data. A data lake stores data regardless of their formats and thus provides an intuitive way to store personal data fragments of any type. Metadata management is a central part of the lake architecture. For structured/semi-structured...
Keyphrases are single- or multi-word phrases that are used to describe the essential content of a document. External knowledge sources such as WordNet are often used in keyphrase extraction methods to obtain relation information about terms and thus improve the results, but the drawback is that a sole knowledge source is often limited. T...
Common Internet users today are inundated with a deluge of diverse data being generated and siloed in a variety of digital services, applications, and a growing body of personal computing devices as we enter the era of the Internet of Things. Alongside potential privacy compromises, users are facing increasing difficulties in managing their data an...
A data integration approach combines data from different sources and builds a unified view for the users. Big data integration is inherently a complex task, and the existing approaches are either potentially limited or invariably rely on manual inputs and interposition from experts or skilled users. SemLinker, an ontology-based data integration sy...
In this paper, we have introduced SemCluster, a clustering-based unsupervised keyphrase extraction method. By integrating an internal ontology (i.e., WordNet) with external knowledge sources, SemCluster identifies and extracts semantically important terms from a given document, clusters the extracted terms, and identifies the most representative phrases...
Keyphrases provide important semantic metadata for organizing and managing free-text documents. As data grow exponentially, there is a pressing demand for automatic and efficient keyphrase extraction methods. We introduce in this paper SemCluster, a clustering-based unsupervised keyphrase extraction method. By integrating an internal ontology (i.e...
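SemCluster's actual implementation is not reproduced here, but a rough sketch of the general idea described in the abstracts above, assuming NLTK's WordNet interface, SciPy, and an already-extracted list of candidate terms (the term list, similarity measure, and distance threshold below are illustrative assumptions only), could look like this:

```python
# Rough, illustrative sketch of the clustering-based idea described above
# (not the authors' SemCluster code): candidate terms are compared via
# WordNet similarity, grouped, and one representative per group is kept.
# Requires the NLTK WordNet corpus to be downloaded.
import numpy as np
from nltk.corpus import wordnet as wn
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def wn_sim(a, b):
    """Best WordNet path similarity across the noun senses of two terms."""
    return max((s1.path_similarity(s2) or 0.0
                for s1 in wn.synsets(a, pos=wn.NOUN)
                for s2 in wn.synsets(b, pos=wn.NOUN)), default=0.0)

def representative_terms(terms, threshold=0.75):
    """Cluster terms by semantic distance and return one representative each."""
    n = len(terms)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = 1.0 - wn_sim(terms[i], terms[j])
    labels = fcluster(linkage(squareform(dist), method="average"),
                      t=threshold, criterion="distance")
    reps = []
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        # representative = the term closest, on average, to its cluster mates
        reps.append(terms[min(members, key=lambda i: dist[i, members].sum())])
    return reps

print(representative_terms(["clustering", "ontology", "lexicon", "taxonomy", "partition"]))
```

Any clustering method could stand in for the hierarchical step here; the point is only that candidate terms are grouped by semantic similarity and one representative per group is retained.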
Questions (6)
I am running multiple evaluation tests on my new work on unsupervised AKE. Can you please list any social media or news datasets that are useful for evaluating AKE systems?
I need the dataset to be annotated with keyphrases by experts, with each annotation based on a controlled vocabulary.
I already have multiple scientific corpora such as SemEval-2010, PubMed, and Inspec, but I am looking for datasets in other domains, as my AKE system is domain-agnostic.
Thank you in advance.
Can you please list any clustering algorithms that don't require specifying the number of desired clusters in advance (e.g., Affinity Propagation)?
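As a minimal illustration of the example named in the question, the scikit-learn sketch below runs Affinity Propagation on synthetic data; the number of clusters is never passed to the model, and the data and parameters are purely illustrative:

```python
# Minimal sketch: Affinity Propagation infers the number of clusters itself.
# The synthetic data and parameters here are purely illustrative.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
# three well-separated 2-D blobs; the number three is never given to the model
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2)) for c in (0.0, 3.0, 6.0)])

model = AffinityPropagation(damping=0.9, random_state=0).fit(X)
print("clusters found:", len(model.cluster_centers_indices_))
```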
For evaluation purposes, I need to evaluate my work using a dataset that contains changelogs describing schema changes during API upgrades.
It would help greatly if the changelogs were collected from multiple well-known APIs.
I have implemented Spectral Clustering in C# after long hours of struggling with the math behind its decomposition.
Now that I have finished compiling and testing my code, I need to verify that my implementation is correct. Beyond 10 data points, checking the results with pen and paper is difficult, so can anyone please point me to an executable program / tool / toolbox that I can use to generate the top-k eigenvectors for a large number of input data points?
I have googled many times but could not find such a resource.
Thank you in advance.
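One possible cross-check, sketched under the assumption that a SciPy/NumPy reference is acceptable and that the input points sit in a CSV file (the file name, gamma, and k below are illustrative, not taken from the question): build the RBF affinity matrix, normalize it symmetrically, and take its top-k eigenvectors, which are exactly the vectors a spectral clustering implementation embeds with.

```python
# Hedged sketch: generate reference eigenvectors for cross-checking a
# hand-rolled spectral clustering implementation. "points.csv", gamma, and
# k are illustrative assumptions, not values from the question.
import numpy as np
from scipy.sparse.linalg import eigsh

X = np.loadtxt("points.csv", delimiter=",")          # n x d matrix of input points
gamma, k = 1.0, 5

# RBF affinity matrix W and its symmetric normalization D^(-1/2) W D^(-1/2)
sq = np.sum(X ** 2, axis=1)
W = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
np.fill_diagonal(W, 0.0)
d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# top-k eigenvectors of S = bottom-k eigenvectors of the normalized Laplacian I - S
vals, vecs = eigsh(S, k=k, which="LA")
np.savetxt("reference_eigvecs.csv", vecs, delimiter=",")  # compare with the C# output
print(vals)
```

Note that eigenvectors are only determined up to sign, and up to rotation within repeated eigenvalues, so it is safer to compare spanned subspaces or final cluster assignments rather than raw columns.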
The question's title is straightforward: can you recommend a decent tool for annotating named entities in a collection of domain-specific texts, for instance medical research papers? I need a large training dataset in order to train my Maximum Entropy sequence model. However, all the tools I have checked online are either inaccurate or very difficult to use.
P.S. The final output classifiers are domain-specific, hence the medical field is only an example; any recommendations of tools in other domains are most appreciated.
Thank you.
Which of the NLP libraries listed below is the most mature and best suited to a large-scale text processing project?
In my project I am designing a full infrastructure for unstructured text processing. There are scenarios where none of the available libraries can be leveraged, so I am implementing my own solution; however, I want to evaluate it against the current state-of-the-art NLP libraries, and I want to pick the most mature ones (a small comparison sketch follows after this question).
- OpenNLP
- Natural Language Toolkit (NLTK)
- Stanford NLP
- Machine Learning for Language Toolkit (MALLET)
- LingPipe
- Freeling
- TreeTagger
Thank you in advance.
Hassan.
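For the evaluation described in this question, one simple way to line a custom pipeline up against a library from the list is to run the same text through both and compare the outputs. The sketch below uses NLTK purely as an example from the list; my_tokenize and my_pos_tag are hypothetical stand-ins for the custom implementation and do not exist in any library.

```python
# Illustrative sketch of a like-for-like comparison against one library from
# the list (NLTK). Requires the 'punkt' and 'averaged_perceptron_tagger'
# NLTK resources. my_tokenize / my_pos_tag are hypothetical placeholders
# for the custom pipeline being evaluated.
import nltk

sentence = "Unstructured text processing requires robust tokenization and tagging."

reference_tokens = nltk.word_tokenize(sentence)   # reference tokenization
reference_tags = nltk.pos_tag(reference_tokens)   # reference POS tags

# candidate_tags = my_pos_tag(my_tokenize(sentence))
# agreement = sum(r == c for r, c in zip(reference_tags, candidate_tags)) / len(reference_tags)
print(reference_tags)
```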