Questions related to Data Warehousing
My goal in this project is to create a data warehouse. I am primarily using online communities and platforms to teach myself the concepts of data warehousing. For example,
I'd like to know the basic concepts for designing such a system in something like Python.
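If Python is the learning vehicle, one way to see the core dimensional-modeling idea is a toy star schema in `sqlite3`. This is only a sketch; the table and column names are illustrative, not a standard.

```python
import sqlite3

# A star schema in miniature: one fact table keyed to dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, amount REAL);
""")
con.execute("INSERT INTO dim_date VALUES (1, 2023, 1)")
con.execute("INSERT INTO dim_product VALUES (10, 'Widget')")
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 10, 10.0), (1, 10, 20.0)])

# A typical analytical query: aggregate the fact, filtered by dimensions.
total, = con.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    WHERE d.year = 2023 AND p.name = 'Widget'
""").fetchone()
print(total)   # 30.0
```

The point is the shape, not the engine: facts (measures) in one narrow table, descriptive context in dimension tables, and analysis done by joining and aggregating.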
Hello, I am interested in data integration approaches. As I have discovered, there are two main approaches: materialized and virtual (mediator-wrapper).
I want to combine both (hence "hybrid") as part of my solution, but I can't find a well-documented process for doing so.
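One common hybrid pattern is a mediator that serves from a materialized store when the copy is fresh enough and falls back to a virtual wrapper over the live source otherwise. A toy sketch, with all names illustrative:

```python
import time

class HybridMediator:
    """Serve from a materialized copy when fresh; otherwise go virtual."""
    def __init__(self, live_fetch, max_age_seconds=3600):
        self.live_fetch = live_fetch        # wrapper around a source system
        self.max_age = max_age_seconds
        self.store = {}                     # key -> (timestamp, value)

    def query(self, key):
        cached = self.store.get(key)
        if cached and time.time() - cached[0] < self.max_age:
            return cached[1]                # materialized path
        value = self.live_fetch(key)        # virtual (mediator-wrapper) path
        self.store[key] = (time.time(), value)
        return value

calls = []
def wrapper(key):
    calls.append(key)                       # record each hit on the live source
    return {"key": key, "rows": 3}

m = HybridMediator(wrapper, max_age_seconds=3600)
m.query("customers")    # goes to the live source
m.query("customers")    # served from the materialized copy
print(len(calls))       # 1
```

In a real design the freshness policy, refresh scheduling, and query rewriting across sources are where most of the work lives; this only shows the routing decision.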
I am a master's student in Software Engineering. I have an interest in data; my question here is, what would be a good, motivating topic for growing my skills and working towards a strong thesis?
Any suggestions would be really helpful for improving my skills in this domain.
I'm looking for a research project topic for my master's degree in data analytics. Below are the areas I'm interested in; please suggest some project ideas related to these subject areas:
- Big Data
- Databases/NoSQL Databases/Data Warehouses
- Cloud Computing
- Data Mining
- Machine Learning/Deep Learning/NLP
I am planning to migrate an enterprise data warehouse to Hadoop. What are the best modeling patterns I can follow for big data platforms? We will be using Hive.
What is a data mining query?
The Data Mining Query Language (DMQL) was proposed by Han, Fu, Wang, et al. for the DBMiner data mining system. DMQL is based on the Structured Query Language (SQL) and is designed to support ad hoc and interactive data mining. It provides commands for specifying primitives and works with both databases and data warehouses. DMQL can be used to define data mining tasks; in particular, we examine how to define data warehouses and data marts in DMQL.
You can read additional descriptions in the link below:
( Please share other resources with us )
Do you think the 'data value conflict' issue can be resolved using data normalization techniques? From my understanding, de-normalization is what practitioners suggest for DW development. However, normalizing a database involves, among other aspects, arranging data into logical groupings such that each part describes a small part of the whole; normalization also means that modifying data in one place suffices, and it minimizes the impact of duplicate data. What do you suggest?
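The "modify in one place" property is the heart of the argument. A toy illustration (names invented for the example): denormalized rows repeat the customer's city, so an update must touch every copy, while the normalized form stores it once and every derived view picks up the change.

```python
# Denormalized: the city is duplicated on every order row,
# so inconsistent copies (a "data value conflict") become possible.
denormalized = [
    {"order": 1, "customer": "ACME", "city": "Oslo"},
    {"order": 2, "customer": "ACME", "city": "Oslo"},
]

# Normalized: one customers table, one orders table.
customers = {"ACME": {"city": "Oslo"}}
orders = [{"order": 1, "customer": "ACME"},
          {"order": 2, "customer": "ACME"}]

customers["ACME"]["city"] = "Bergen"   # one change, applied in one place
view = [{**o, "city": customers[o["customer"]]["city"]} for o in orders]
print(view)    # every order now consistently shows Bergen
```

The DW counter-argument is that star schemas deliberately denormalize for query performance and push consistency enforcement into the ETL layer instead.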
There are three major types of slowly changing dimensions, called SCD Type 1, Type 2, and Type 3. However, another type, known as SCD Type 6 (a combination of Types 1, 2, and 3), also exists. Not many examples appear in the literature. I am looking for some references and your help in this regard.
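For context, the workhorse among these is Type 2: expire the current row and append a new version. A minimal sketch, assuming a simple list-of-dicts dimension table with illustrative column names:

```python
from datetime import date

def scd2_update(dim_rows, key, new_attrs, today):
    """SCD Type 2: expire the current row for `key`, append a new version."""
    for row in dim_rows:
        if row["key"] == key and row["current"]:
            if all(row.get(k) == v for k, v in new_attrs.items()):
                return dim_rows            # attributes unchanged: nothing to do
            row["current"] = False         # close out the old version
            row["end_date"] = today
    dim_rows.append({"key": key, **new_attrs,
                     "start_date": today, "end_date": None, "current": True})
    return dim_rows

dim = [{"key": 42, "city": "Oslo", "start_date": date(2020, 1, 1),
        "end_date": None, "current": True}]
scd2_update(dim, 42, {"city": "Bergen"}, today=date(2021, 6, 1))
# dim now holds two versions: the expired Oslo row and a current Bergen row
```

Type 6 then layers on Type 1 (overwrite a "current value" column on all versions) and Type 3 (keep a "previous value" column) on top of this versioning.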
I am working on my master's thesis for a fairly large company department. It is about creating a data warehouse for the department's process metrics. I need to build my research on the state of the art and the state of the practice in the data warehousing field. However, I can't figure out what counts as state of the art and state of the practice in this area.
Could you please help me clarify what type of information I should categorize as state of the art or state of the practice, and possibly point out resources where I should look for them? My thesis is mainly about the "database" part of a DW/BI system: dimensional modeling, metadata model creation, and ETL process creation.
- a problem within the data warehouse domain
- a suggested method of work for solving this problem
- argumentation for the choice of this method
- a related-research section, built on at least five scientific publications, convincing the reader of the relevance of the proposed work
Please help, DWH geeks... I am completely new to the DWH world :(
It is very important to keep a check on scalability and performance using APM tools when solving big data problems. How important is the role of automated APM (Application Performance Management) in solving big data problems?
As is well known, ETL tools are used to load data into data warehouses, but with OLTP databases they can also be used to integrate systems through data exchange at the database level. Meanwhile, exchanging this data at the application level requires the use of EAI. Which approach is the better choice?
I have a database in Excel. Is there any way I can create a multidimensional data cube from it for further processing? Furthermore, is there any way to perform data warehousing on such (Excel) data?
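One lightweight route is pandas: read the workbook and use a pivot table as a small data cube, with dimensions on the axes and an aggregated measure in the cells. A sketch with made-up column names (with a real file you would start from `pd.read_excel("sales.xlsx")`, which needs the `openpyxl` package):

```python
import pandas as pd

# Illustrative fact data standing in for rows read from Excel.
sales = pd.DataFrame({
    "year":    [2022, 2022, 2023, 2023],
    "region":  ["North", "South", "North", "South"],
    "product": ["A", "A", "B", "B"],
    "amount":  [100, 150, 200, 250],
})

# The pivot table plays the role of a two-dimensional cube slice:
# year x region, with total sales amount as the measure.
cube = sales.pivot_table(index="year", columns="region",
                         values="amount", aggfunc="sum")
print(cube)
```

For genuinely multidimensional analysis at scale, the usual advice is to load the Excel data into a proper database and put an OLAP layer on top; pandas is fine for exploration-sized data.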
I am the first author of a group responsible for updating a Cochrane systematic review, which requires the cooperation of a Chinese co-author. We have been looking for a partner for data extraction and evaluation of some clinical studies, but have not been able to find one for a few months. Do you know someone who could help us?
Big Data and Data Science have continued to gain traction among practitioners and researchers. But the foundation of these concepts involves large volumes and a wide variety of data created at high velocity, so the focus has generally been on larger organisations that generate such data. However, small and medium-sized organisations are also active adopters of ICT. Can Big Data and Data Science benefit small and medium enterprises as well, and how?
I'm looking for further information (beyond what is given in, e.g., GEO publications) on how databases like NCBI-GEO (http://www.ncbi.nlm.nih.gov/geo/) are set up. I'm curious how such a system can be designed to accommodate growing numbers of experiments, samples, and expression data, without needing to redesign the database layout whenever a new type of data becomes available. I would be glad if you could share resources, examples, detailed explanations, or your own experience; anything is welcome ;)
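I can't speak to GEO's actual implementation, but one pattern commonly used for this kind of open-ended growth is entity-attribute-value (EAV) modeling: new data types become new attribute rows rather than new columns, so the schema never changes. A minimal sketch with invented table names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sample_value (
        sample_id INTEGER REFERENCES sample(id),
        attribute TEXT,
        value     TEXT
    );
""")
con.execute("INSERT INTO sample (id, name) VALUES (1, 'GSM0001')")
con.executemany(
    "INSERT INTO sample_value VALUES (1, ?, ?)",
    [("organism", "Homo sapiens"),
     ("platform", "GPL570"),
     ("tissue", "liver")])     # a brand-new attribute: no schema change needed

rows = con.execute(
    "SELECT attribute, value FROM sample_value WHERE sample_id = 1").fetchall()
print(rows)
```

The trade-off is well known: EAV makes the layout endlessly extensible but pushes typing, validation, and query complexity into the application, which is why real systems often mix EAV with conventional tables for the stable core of the data.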
Many projects use metadata; it is the backbone of many data warehouse systems. There is a major drawback, though: the notion of metadata quality is neither agreed upon nor even clearly laid out.
Bruce and Hillmann said 10 years ago 'Like pornography, metadata quality is difficult to define. We know it when we see it, but conveying the full bundle of assumptions and experience that allow us to identify it is a different matter. For this reason, among others, few outside the library community have written about defining metadata quality. Still less has been said about enforcing quality in ways that do not require unacceptable levels of human effort.'
What can we do about it? And if we cannot agree on a definition of metadata quality, is it not a failed concept?
Data warehouses are very expensive in both up-front and ongoing costs and staff resources, and they are inflexible. What other technologies exist to assist with combining data across systems more flexibly than a DW? Is there such a thing as a hub that we can plug systems into, match up key elements, and then just start extracting?
Is it possible to detect ATM card fraud in real time using a data warehouse-based system? I have up to one million transaction records in an Oracle database, and I need to build a fraud detection model based on this dataset. If it is possible, I would appreciate links to related research papers.
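As a starting point before any literature dive: the simplest family of detectors flags transactions that are statistical outliers against an account's own history. A toy z-score rule in pure Python (real systems combine many such signals and score in a streaming pipeline rather than batch):

```python
from statistics import mean, stdev

def flag_suspicious(transactions, z_threshold=3.0):
    """Flag transactions whose amount is an extreme outlier
    relative to the history (a simple z-score rule)."""
    amounts = [t["amount"] for t in transactions]
    mu, sigma = mean(amounts), stdev(amounts)
    return [t for t in transactions
            if sigma and abs(t["amount"] - mu) / sigma > z_threshold]

# Two hundred ordinary transactions plus one injected anomaly.
history = [{"id": i, "amount": 50 + (i % 7)} for i in range(200)]
history.append({"id": 999, "amount": 5000})

suspects = flag_suspicious(history)
print(suspects)    # only the injected 5000-unit transaction is flagged
```

For a million-row Oracle dataset the same idea scales as a SQL window-function query; the "real time" part is the harder question, since a warehouse is typically batch-loaded and fraud scoring needs to happen at transaction time.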
Medical data sets are difficult to compare with other data sets, so classification techniques need to be combined to obtain accurate results.
We are building a large data warehouse on Hadoop. I am responsible for developing all of the compliance and governance policies. There will be over 100 applications interfacing with the warehouse, and tens of thousands of users. I am looking for good best practices regarding entry into the warehouse, access control practices, and SOX applicability for application controls.