Data Warehousing - Science topic

Questions related to Data Warehousing
  • asked a question related to Data Warehousing
Question
3 answers
My goal in this project is to create a data warehouse. I am primarily using online communities and platforms to teach myself the concepts of data warehousing. For example, I'd like to know the basic concepts for designing such a system in something like Python.
Relevant answer
Answer
I recommend the following 7 Data Warehousing Steps:
Step 1: Establish Business Goals.
Step 2: Collect and Analyze Data.
Step 3: Determine the Core Business Processes.
Step 4: Build a Conceptual Data Model.
Step 5: Identify Data Sources and Plan Data Transformations.
Step 6: Define the Tracking Duration.
Step 7: Put the Plan into Action.
Kind Regards
Qamar Ul Islam
  • asked a question related to Data Warehousing
Question
3 answers
Hello, I am interested in data integration approaches. As I discovered, there are two main approaches: materialized and virtual (mediator-wrapper).
I want to combine both (hence hybrid) as part of my solution, but I can't find a well informed process on how to do so.
  • asked a question related to Data Warehousing
Question
3 answers
Can someone guide me in finding a thesis topic related to data warehousing?
Relevant answer
Answer
data mining technique
data leak
data security
data protection and encryption
data power
  • asked a question related to Data Warehousing
Question
18 answers
I am a master's student in Software Engineering with an interest in data. My question is: what would be a good, motivating topic to grow my skills and work towards a spectacular thesis?
Any suggestions would be really helpful for me to improve my skills in this domain.
Relevant answer
Good Answer Dariusz Prokopowicz
  • asked a question related to Data Warehousing
Question
8 answers
Hi,
I'm looking for a research project topic for my master's degree in data analytics. Below are a couple of areas I'm interested in. Please suggest some project ideas related to these subject areas:
  • Big Data
  • Database/NoSQL Database/Data Warehouse
  • Cloud Computing
  • Data Mining
  • Machine Learning/Deep Learning/NLP
Regards,
Richard
Relevant answer
Answer
As far as the current situation is concerned, nothing could be a more apt topic than COVID-19 (SARS-CoV-2). Try using your computational skills and expertise in this field.
  • asked a question related to Data Warehousing
Question
3 answers
I am planning to migrate an enterprise data warehouse to Hadoop. What are the best modeling patterns that I can follow for big data platforms? We will be using Hive.
Relevant answer
Answer
hello,
please see the works of Mohammed El malki in 2015-2016-2017
  • asked a question related to Data Warehousing
Question
5 answers
What is data mining query?
The Data Mining Query Language (DMQL) was proposed by Han, Fu, Wang, et al. for the DBMiner data mining system. The Data Mining Query Language is actually based on the Structured Query Language (SQL). Data Mining Query Languages can be designed to support ad hoc and interactive data mining. This DMQL provides commands for specifying primitives. The DMQL can work with databases and data warehouses as well. DMQL can be used to define data mining tasks. Particularly we examine how to define data warehouses and data marts in DMQL.
You can read additional descriptions in the link below:
( Please share other resources with us )
Relevant answer
Answer
SQL (Structured Query Language) - Oracle 11i - widely used for database programming.
  • asked a question related to Data Warehousing
Question
7 answers
Data warehousing is becoming increasingly vital in the world today
Relevant answer
Answer
Data warehousing and blockchain
  • asked a question related to Data Warehousing
Question
3 answers
Do you think the 'data value conflict' issue can be resolved using data normalization techniques? From my understanding, denormalization is suggested by practitioners for DW development. However, normalizing a database includes, among other aspects, arranging data into logical groupings such that each part describes a small part of the whole. Normalization also implies that modifying data in one place will suffice, and it minimizes the impact of duplicate data. What do you suggest?
Relevant answer
Answer
I want to answer this discussion in a series of four posts. In the first, I will focus on when and why to use denormalization.
When and Why to Use Denormalization
As with almost anything, you must be sure why you want to apply denormalization. You need to also be sure that the profit from using it outweighs any harm. There are a few situations when you definitely should think of denormalization:
  • Maintaining history: Data can change over time, and we need to store the values that were valid when a record was created. What kind of changes do we mean? Well, a person’s first and last name can change; a client can also change their business name or any other data. Task details should contain the values that were actually valid at the moment a task was generated. We wouldn’t be able to recreate past data correctly if this didn’t happen. We could solve this problem by adding a table containing the history of these changes. In that case, a select query returning the task and a valid client name would become more complicated. Maybe an extra table isn’t the best solution.
  • Improving query performance: Some of the queries may use multiple tables to access data that we frequently need. Think of a situation where we’d need to join 10 tables to return the client’s name and the products that were sold to them. Some tables along the path could also contain large amounts of data. In that case, maybe it would be wise to add a client_id attribute directly to the products_sold table.
  • Speeding up reporting: We need certain statistics very frequently. Creating them from live data is quite time-consuming and can affect overall system performance. Let’s say that we want to track client sales over certain years for some or all clients. Generating such reports out of live data would “dig” through almost the whole database and slow it down a lot. And what happens if we use that statistic often?
  • Computing commonly-needed values up front: We want to have some values ready-computed so we don’t have to generate them in real time.
It’s important to point out that you don’t need to use denormalization if there are no performance issues in the application. But if you notice the system is slowing down – or if you’re aware that this could happen – then you should think about applying this technique. Before going with it, though, consider other options, like query optimization and proper indexing. You can also use denormalization if you’re already in production but it is better to solve issues in the development phase. Source: Emil (2016)
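The query-performance case above can be sketched with Python's standard-library sqlite3 module. This is only a minimal illustration: the orders table, the sample rows, and every column name other than client_id and products_sold are assumptions made up for the example, and a real schema would have many more tables along the join path.

```python
import sqlite3

# In-memory database with a hypothetical three-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, client_id INTEGER);
    CREATE TABLE products_sold (
        sale_id INTEGER PRIMARY KEY,
        order_id INTEGER,
        product TEXT,
        client_id INTEGER  -- denormalized copy, filled in further down
    );
    INSERT INTO clients VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1), (11, 2);
    INSERT INTO products_sold (sale_id, order_id, product) VALUES
        (100, 10, 'widget'), (101, 10, 'gear'), (102, 11, 'bolt');
""")

# Normalized access path: products_sold -> orders -> clients.
normalized = conn.execute("""
    SELECT c.name, p.product
    FROM products_sold p
    JOIN orders o ON o.order_id = p.order_id
    JOIN clients c ON c.client_id = o.client_id
    ORDER BY p.sale_id
""").fetchall()

# Denormalize: copy client_id onto each products_sold row once.
conn.execute("""
    UPDATE products_sold
    SET client_id = (SELECT o.client_id FROM orders o
                     WHERE o.order_id = products_sold.order_id)
""")

# The same question now needs one join fewer.
denormalized = conn.execute("""
    SELECT c.name, p.product
    FROM products_sold p
    JOIN clients c ON c.client_id = p.client_id
    ORDER BY p.sale_id
""").fetchall()

print(normalized == denormalized)
```

The trade-off is that the copied client_id must be kept in sync whenever an order is reassigned, which is exactly the update anomaly that normalization is meant to prevent.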
Any thought: Dennis Mazur , Nawroz Abdul-razzak Tahir Jeyris Martínez Gutiérrez Nada QASIM Mohammed
Dr.Hikmat Abdulkarim Almadhkhori
Ahmad Saad Ahmad Al-Dafrawi
Dr R Senthilkumar Lilianna Wojtynek ?
  • asked a question related to Data Warehousing
Question
2 answers
There are three major types of slowly changing dimensions, called SCD 1, SCD 2 and SCD 3. However, another type is known as SCD 6. Not many examples are seen in the literature. I am looking for some references and your help in this regard.
Relevant answer
Answer
Check the attached link.
Hope it will help.
Regards
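For readers without access to the link, here is a minimal sketch of how SCD 6 is commonly described: a hybrid that adds a new row per change (as in SCD 2), keeps a separate "current value" column next to the historical one (as in SCD 3), and overwrites that current column on all rows of the same entity (as in SCD 1). The dimension layout, the column names and the function apply_scd6_change are hypothetical and chosen only for illustration.

```python
from datetime import date


def apply_scd6_change(rows, key, new_value, change_date):
    """Apply one attribute change to a Type 6 dimension held as a list of dicts."""
    for row in rows:
        if row["customer_id"] != key:
            continue
        # Type 1 part: overwrite the "current" column on every version.
        row["current_city"] = new_value
        # Type 2 part: close the version that was active until now.
        if row["is_current"]:
            row["end_date"] = change_date
            row["is_current"] = False
    # Type 2 part: add a new version row. Type 3 part: each row carries
    # the historical and the current value in separate columns.
    rows.append({
        "customer_id": key,
        "historical_city": new_value,
        "current_city": new_value,
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })
    return rows


# Hypothetical dimension with a single customer who then moves.
dim = [{
    "customer_id": 1,
    "historical_city": "Oslo",
    "current_city": "Oslo",
    "start_date": date(2020, 1, 1),
    "end_date": None,
    "is_current": True,
}]
apply_scd6_change(dim, 1, "Bergen", date(2021, 6, 1))

for row in dim:
    print(row)
```

With this layout, facts joined to the old row can be reported either by the city that was valid at the time (historical_city) or by the customer's present city (current_city) without a second lookup.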
  • asked a question related to Data Warehousing
Question
3 answers
I am working on my master's thesis for a department of a fairly large company. It is about creating a data warehouse for the department's process metrics. I need to build my research on the state of the art and the state of the practice in the data warehousing field. However, I can't figure out what should count as the state of the art and the state of the practice in this area.
Could you please help me clarify what type of information I should categorize as state of the art or state of the practice, and possibly point out resources where I should look for it? My thesis is mainly about the "database" part of a DW/BI system, that is, dimensional modeling, metadata model creation and ETL process creation.
Relevant answer
Answer
Hi,
What is missing in your message is the subject of your thesis and its title, from which you can get a first idea about the context and motivation of your work. Is the department in need of a traditional data warehouse, or are there specific needs to be met? Is the company dealing with structured data only? Are there any "data warehouse / big data" integration needs? What is specific about the company's data and needs that cannot be met by existing data warehousing approaches?
I think that your starting point is rather the title and theme of your thesis. Once you have a clearer idea of what your company is looking for in data warehousing, you can draw out the main keywords and then derive your "state of the art" and "state of the practice" from those keywords.
  • asked a question related to Data Warehousing
Question
1 answer
What are the recent technologies for enhancement of the shape and facilities of Data warehouse?
Relevant answer
Answer
Agile BI development products
Data Warehouse appliances
Big data analytics
In-memory data
BI workspaces and dashboards
Collaborative sharing of BI content
Mash-ups
Complex event processing
Mobile BI and data federation
The above are some recent technologies for data warehousing.
  • asked a question related to Data Warehousing
Question
1 answer
- problem within the data warehouse domain
- suggested method for work in order to solve this problem
- argumentation for the choice of this method shall be provided
- a related research section, build on at least five scientific publications, convincing the reader for the relevance of the proposed work
Please help DWH geeks.. I am completely new to DWH world :(
  • asked a question related to Data Warehousing
Question
1 answer
It is very important to keep a check on scalability and performance using APMs to solve big data problems. How important is the role of automated APMs (Application Performance Management) in solving Big Data problems?
Relevant answer
Answer
Support existing and new applications, voluminous data and storage performance
  • asked a question related to Data Warehousing
Question
2 answers
As is known, ETL tools are used to load data into data warehouses, but with OLTP databases they can also be used to integrate systems through data exchange at the database level. Meanwhile, exchanging the same data at the application level requires EAI. Which is the more optimal to use?
Relevant answer
Answer
It depends on the application's requirements. Optimisation almost always means "real-enough time" not "real-time". Integrating by OLTP is very expensive. Reducing time to replicate data even a small amount can vastly reduce costs.
  • asked a question related to Data Warehousing
Question
12 answers
I have a database in Excel. Is there any way I can create a multidimensional data cube from it for further processing? Furthermore, is there any way to perform data warehousing on such (Excel) data?
Relevant answer
Answer
Dear Mr Manish, I suggest you take a look at Pentaho Data Integration (http://community.pentaho.com/projects/data-integration/) and Data Cleaner (http://datacleaner.org/).
Best regards.
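Independently of the tools named above, the idea of a data cube can be sketched in a few lines of standard-library Python. The sketch assumes the worksheet has been saved as CSV; the column names, the sample rows and the function build_cube are hypothetical. It pre-aggregates one measure for every combination of dimensions, which is what a cube roll-up does.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# Hypothetical data standing in for a worksheet exported as CSV.
SHEET = """region,product,year,sales
North,widget,2020,10
North,gear,2020,5
South,widget,2020,7
South,widget,2021,3
"""

DIMENSIONS = ("region", "product", "year")


def build_cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (a full cube roll-up)."""
    cube = defaultdict(float)
    for row in rows:
        value = float(row[measure])
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                # A cell is identified by the (dimension, value) pairs it fixes.
                cell = tuple((d, row[d]) for d in dims)
                cube[cell] += value
    return dict(cube)


rows = list(csv.DictReader(io.StringIO(SHEET)))
cube = build_cube(rows, DIMENSIONS, "sales")

print(cube[()])                                            # grand total
print(cube[(("region", "North"),)])                        # one region
print(cube[(("region", "South"), ("product", "widget"))])  # region and product
```

This brute-force approach grows exponentially with the number of dimensions, so it only suits small spreadsheets; dedicated OLAP or ETL tools handle larger data.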
  • asked a question related to Data Warehousing
Question
10 answers
I am the first author of a group responsible for updating a Cochrane systematic review that needs the cooperation of a Chinese co-author. We are looking for a partner for data extraction and evaluation of some clinical studies, and we have not been able to find one over the past few months. Do you know someone who can help us?
Relevant answer
Answer
If you are working with Cochrane, they provide good solutions and assistance with non-English papers. You may also try Archie.
1. Log into the Archie
2. Go to Search tab
3. Select Advance tab and People
4. From first drop-down menu, select Group Role Assigned
5. From second drop-down menu, select Translator
6. You may or may not select the Cochrane Group and run the search
7. The other way is selecting Role Specification from first menu and search for Chinese
Almost all the people in the search results are volunteer translators, but only some are active, and some may be willing to do volunteer work.
If you ask the editorial base of the group, they might be able to link you to an active Chinese researcher.
Good Luck
  • asked a question related to Data Warehousing
Question
15 answers
Big Data and Data Science have continued to emerge among practitioners and researchers. But the foundation of these concepts involves large volumes and a variety of data created at high velocity. Hence, the focus has generally been on bigger organisations that generate such data. However, small and medium-sized organisations are also active adopters of ICT. Can Big Data and Data Science benefit small and medium enterprises as well, and how?
Relevant answer
Answer
Hi Kayode - I appreciate your question; it appears that there is much debate about 'Big Data' these days, but perhaps comparatively little analysis of its actual uses for Small to Medium Enterprises (SMEs).
I suppose the applicability of 'Big Data' to SMEs might best relate to the business fundamentals: which sets of strategic actions might actually make a business more adaptive, resilient, sustainable, better at serving its customer ecosystems, and ultimately more profitable in the long term.
What role might 'Big Data' play in this context?
Some obvious areas of compatibility might be semantic / sentiment analysis of the Social Cloud data, inferring 'what your customers are thinking about your SME brand' - as well as perhaps many other applications around hyper-localization.
Wouldn't it be nice if SMEs could very quickly and easily understand the emergent opportunities in their hyper-local contexts - what is it that their customers are actually looking / hoping / asking for, etc?
The analysis and utilization of Big Data is already starting to play a key role in this context; while extending these types of capabilities to the SMEs might correspondingly yield significant benefits.
  • asked a question related to Data Warehousing
Question
1 answer
I'm looking for further information (going beyond information given in e.g. GEO publications) on how databases like NCBI-GEO (http://www.ncbi.nlm.nih.gov/geo/) are set up. I'm curious how such a system can be designed in order to allow growing numbers of experiments, samples and expression data - without the need of redesigning a database layout, when a new type of data is available. Would be glad, if you could share resources, examples, detailed explanations or your own experience - anything welcome ;)
Relevant answer
Answer
NCBI-GEO is built on data from real experiments that were performed to measure gene expression in various tissues. One can find gene expression data at the following sites:
  • asked a question related to Data Warehousing
Question
4 answers
I want to analyze a CDR. What are the different methods that can be used for this, such as Hadoop, some data mining tool, etc.?
Relevant answer
Answer
Thank you, Damien Mather, for the suggestion. I think the following two papers will be helpful for those working in this area:
Best-Fit Mobile Recharge Pack Recommendation
Mobile Subscriber Fingerprinting- A BigData Approach
  • asked a question related to Data Warehousing
Question
3 answers
Distributed data warehouses
Relevant answer
Answer
Well, as already mentioned, you don't need a data warehouse to do distributed data mining. Data warehouses are often useful for OLAP processing, and, to an extent, data mining. However, in a number of cases, the data warehouse is seen as a useful processing tool, used to fetch data that is then transformed and utilized by the data mining algorithms. And a poor data warehouse can actually impede the mining activities.
That said, the data warehouse may also be seen as a good tool for preparing and cleaning data (or, at least encoding how to handle the problems).
But, yes, distributed data mining is very possible (and done frequently) without data warehouses, both in the sense of (a) distributing the data and (b) distributing the computational requirements.
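Points (a) and (b) can be illustrated with a small sketch in which each data partition is mined locally and only the partial results are merged. Everything here is an assumption made for illustration: the partitions stand in for data held at different sites, threads stand in for separate machines, and item counting stands in for a real mining algorithm.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list stands in for data held at one site.
PARTITIONS = [
    [["bread", "milk"], ["bread", "eggs"]],
    [["milk", "bread"], ["bread", "milk", "jam"]],
]


def local_counts(transactions):
    """Count item occurrences within one partition (runs where the data lives)."""
    counts = Counter()
    for basket in transactions:
        counts.update(set(basket))
    return counts


def frequent_items(partitions, min_support):
    """Merge per-partition counts and keep items seen at least `min_support` times."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_counts, partitions))
    total = sum(partials, Counter())
    return {item: n for item, n in total.items() if n >= min_support}


result = frequent_items(PARTITIONS, min_support=3)
print(result)
```

No central warehouse is involved: raw transactions never leave their partition, and only the small count tables travel to the place where they are combined.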
  • asked a question related to Data Warehousing
Question
13 answers
Many projects use metadata. They are backbone to many data warehouse systems. There is a major drawback with metadata, though. The notion of metadata quality is neither agreed on nor even clearly laid out.
Bruce and Hillmann said 10 years ago 'Like pornography, metadata quality is difficult to define. We know it when we see it, but conveying the full bundle of assumptions and experience that allow us to identify it is a different matter. For this reason, among others, few outside the library community have written about defining metadata quality. Still less has been said about enforcing quality in ways that do not require unacceptable levels of human effort.'
What can we do about it? And if we could not agree on a definition of metadata quality, would it not be a failed concept?
Relevant answer
Answer
Michael, I'll answer your question from a different perspective to Arjun & would like to separate the issues that arise in your provocative title "Is metadata a failed concept?" from the commentary around "metadata quality".
I have participated for many years in international standards development efforts relating to metadata schemas (in organisations like IEEE LTSC, IMS Global, DCMI & SC36). All these organisations have responded -- & are still responding -- to the consequences of the digital revolution in terms of information management & discovery. Successful standards are usually judged as being "fit for purpose" & rarely can add value beyond that. There's plenty of debate out there as to the usefulness of metadata standards or profiles of them, but there are also plenty of success stories. Another point I'd like to make is that "metadata" is not just a term that describes metadata schemas. There's an enormous amount of metadata that gets used in virtually every web service you can think of -- whether it is in the form of XML, RDF, RSS, etc; "date posted" information; or whether it's just a tag cloud or all the other data that gets collected as "analytics". Any content or data that can somehow be expressed in terms of "who, what, when, & where" while also relating to some other content or data is essentially metadata. So, I don't think "metadata" is a failed concept -- to the contrary, it is, as you indicate, the "backbone of many warehouse systems" as well as what enables so much systems interoperability on the web.
The issue of metadata quality is of a totally different order for me. If you're managing systems or curating content that requires high quality metadata then it's best to make sure you're drawing on expert input. In many other situations the metadata may be of questionable quality or value -- but it's a consequence of how social media works. In time, I'd expect it to improve.
  • asked a question related to Data Warehousing
Question
10 answers
Data warehouses are very expensive in both up-front and ongoing costs and in staff resources, and they are inflexible. What other technologies exist to assist with combining data across systems more flexibly than a DW? Is there such a thing as a hub that we can plug systems into, match up key elements, and then just start extracting?
Relevant answer
Answer
Hi Susan, thanks for your question. Although I am not that familiar with the healthcare domain, I want to give two further comments on how to support data exchange and integration across diverse clinical information systems. First, we can harness the benefits of applying interoperability standards, such as HL7, xDT, which may be based on XML, SGML, or other markup languages. Second, we might enhance the searching and information retrieval capabilities by building and using ontologies. However, similar to establishing a data warehouse application, creating an ontology can be costly and time-consuming. Still, I believe that Semantic Web technologies and the idea of Linked Data are key to efficient integration of data from distributed systems.
  • asked a question related to Data Warehousing
Question
1 answer
Is it possible to detect ATM card fraud in real time using a data warehouse based system? I have a dataset of up to one million transactions in an Oracle database, and I need to prepare a fraud detection model based on it. If it is possible, I would appreciate links to related research papers.
Relevant answer
Answer
Hi Jivan
There are various papers available from a google search. One which I've read and am busy applying to other financial data is by Wiese and Omlin 2011:
"Credit Card Transactions, Fraud Detection, and Machine Learning: Modelling Time with LSTM Recurrent Neural Networks"
  • asked a question related to Data Warehousing
Question
16 answers
Medical data sets are difficult to compare with other data sets, so classification techniques need to be combined to get accurate results.
Relevant answer
Answer
I think it is not a question of whether the data is from the medical field or not, but of the nature of the data. You may like to elaborate on the kind of data you are dealing with -- images, numeric values, qualitative values, etc -- as well as their uncertainty/reliability aspects. The nature of the classification you are interested in will also be useful -- binary vs multi-class, overlapping vs mutually exclusive, etc. Classification is a very well-researched domain -- plenty of techniques and experiences can be found in the literature.
  • asked a question related to Data Warehousing
Question
1 answer
We are building a large data warehouse in Hadoop. I am responsible for developing all of the compliance and governance policies. There will be over 100 applications interfacing with the warehouse, and tens of thousands of users. I am looking for good best practices regarding entry into the warehouse, access control, and SOX applicability for application controls.
Relevant answer
Answer
Dear Ms Niemiec,
Best practices for data governance must be commensurate with the category, purpose and end use of the data warehouse that you are building.
The more clearly defined the purpose, the categorization of information and the resultant value of the data stored, the clearer the set of governance controls to be set up and adopted.
However, here are a few of the key controls that the data warehouse may have to meet:
a. Value, volume, retention, IP (intellectual property), patent-related aspects, etc. are to be assessed for the data that will be stored in the warehouse for final access.
b. You may also check the ISO standards for documentation storage and classification (there are standards on security access) that need to be set up.
c. You may need to make the entire warehouse security / risk compliant in order to provide regulated access to users.
d. Disaster recovery in the event of any calamity needs to be thought of and modeled.
e. Many data warehouses are generally "all electronic", and hence the benefit of fully monitored controls on accessibility can be brought in.
f. You may also have to analyse the value / volume of information that will be stored in the data warehouse, so that any increase in volume and value results either in an expansion of the overall security / access controls or in more stringent enforcement.
Lastly, data is always a by-product of something, and hence you may model a re-creation process for any critical items of information that cannot be stored in a processed form for reasons of security / access (for example, individuals' wealth information).
Hope this helps,
All the best for your endeavor,
SRINIVAASAN