Data Integration - Science topic
Explore the latest questions and answers in Data Integration, and find Data Integration experts.
Questions related to Data Integration
I want to integrate 5 GEO datasets and analyze them together. However, I don't want to include all the samples in every GEO dataset, only those with a specific phenotype along with the control group. Are there any tools available for this, such as R packages, web tools, or Linux-based applications? If so, what are the steps for the analysis?
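The sample-selection step described above can be sketched in a few lines. The sketch below is Python/pandas with made-up matrices (the names `expr1`, `pheno1`, the group labels, and the GSE batch labels are all hypothetical, not from any real GEO series); in practice the R packages GEOquery (download), limma, and sva/ComBat (batch correction) are the usual route, and the `batch` column built here is the kind of covariate a ComBat-style correction step would consume.

```python
import pandas as pd

# Hypothetical expression matrices (genes x samples) and phenotype tables
# for two GEO series; real data would come from e.g. GEOparse or GEOquery.
expr1 = pd.DataFrame({"s1": [1.0, 2.0], "s2": [3.0, 4.0]}, index=["geneA", "geneB"])
expr2 = pd.DataFrame({"s3": [2.5, 1.5], "s4": [0.5, 3.5]}, index=["geneA", "geneB"])
pheno1 = pd.DataFrame({"sample": ["s1", "s2"], "group": ["tumor", "control"]})
pheno2 = pd.DataFrame({"sample": ["s3", "s4"], "group": ["control", "other"]})

def subset_and_label(expr, pheno, keep, batch):
    """Keep only samples whose phenotype is in `keep`; record which series
    (batch) each sample came from."""
    chosen = pheno[pheno["group"].isin(keep)]
    sub = expr[chosen["sample"].tolist()]
    meta = chosen.assign(batch=batch)
    return sub, meta

keep = {"tumor", "control"}
sub1, meta1 = subset_and_label(expr1, pheno1, keep, "GSE_1")
sub2, meta2 = subset_and_label(expr2, pheno2, keep, "GSE_2")

# Merge on shared genes only; the `batch` column is what a later
# batch-correction step (e.g. ComBat in the R package sva) would use.
merged = pd.concat([sub1, sub2], axis=1, join="inner")
metadata = pd.concat([meta1, meta2], ignore_index=True)
print(merged.shape)  # genes x selected samples
```

The key design point is keeping the batch label alongside the phenotype: cross-series effects usually dominate the biology, so any downstream differential analysis should model or correct for `batch`.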
Hello, I am interested in data integration approaches. As I discovered, there are two main approaches: materialized and virtual (mediator-wrapper).
I want to combine both (hence hybrid) as part of my solution, but I can't find a well-documented process for doing so.
I have transcriptomic and proteomics data from prokaryotes. What is the best way to integrate this data? Is there any bioinformatic tool for this data integration?
Is data fusion one stage of data integration? Is data fusion a reduction or a replacement technique?
Please do let me know, thanks.
The BCS in the UK have developed a Blueprint for Cyber Security in Health and Care, and aim to bring together key stakeholders to work together in improving security within the NHS. But are our health and social care systems really fit for the 21st Century, and are they citizen-focused?
If you have an opinion, please consider submitting a paper via the link provided in this question.
Call for Papers
The development of new electronic services within health and social care provides an opportunity for citizen-focused health care, especially in sharing information across traditional organisational boundaries. With the move to enhanced services, there are also increasing risks, including from data breaches and in the hacking of medical devices.
As health care data often contains sensitive data, there are many risks around the trustworthiness of the security infrastructures used within health and social care and in the methods that can be applied to share information across domains. New regulations, too, such as GDPR (General Data Protection Regulation), focus on an integrated security approach for incident response, encryption, and pseudonymisation, and on providing citizens with more control over their data. This will require new approaches to the design of data architectures and the services used within health and social care infrastructures.
This special issue focuses on the latest research within health and social care for cyber security, including the application of new methods to integrate data from each part of the patient journey. It will also focus on the integration of policies and data protection methods, to protect against data breaches, along with allowing data to be used to improve patient safety and in reducing the costs of health care provision.
Potential topics include but are not limited to the following:
- Data breaches and risk models within health and social care
- Detecting and responding to health care data breaches
- New design principles for GDPR requirements within health and social care data
- Models for risk analysis of cyber security threats
- Citizen-focused health and social care systems
- Information sharing and secure architectures within health and social care
- Trust, governance, and consent around interagency approaches
- Data sharing models for interagency approaches
- Creating consensus models for health and social care
- Cloud-based architectures for integrated health and social care
- Cryptography for health and social care data
- Integration of cryptography for trust and governance, including methods for anonymization and secure data processing
- Policy integration for data access
- Anonymization and sanitisation of health care data
- Application of blockchain methods within a health care environment
- Attacks on health care devices
- Vulnerabilities in health care devices
Hello everyone:
Many countries around the world are looking for updated and integrated data and information related to Renewable Energy Sources for solving practical problems in this area. The question is: do you think that building a Knowledge Graph on Renewable Energy Sources would be a suitable research project?
Do you know about any research project related to this topic?
I would like to hear your comments and insight about that.
Best regards.
Dear Researchers,
I would like to know what the technical and practical challenges were with the integration of various country-, regional-, and provider-level health data, and how you overcame them (i.e. from problem to solution).
ALCOA is a compliance standard used in life sciences and pharmaceuticals for maintaining data integrity. I am thinking of using this standard in the IT sector for maintaining data integrity. Can anyone please suggest whether it is used there, or whether there is another standard available for data integrity in the IT sector?
I want to compare it with what I currently have.
I have a list in the following format. I want to sort it in decreasing order according to the length of each sublist.
mylist:
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
+ 6/9453 vertices, named:
[1] VEGFA EPHB2 GRIN2B AP2M1 KCNJ11 ABCC8
[[1]][[2]]
[[1]][[2]][[1]]
+ 4/9453 vertices, named:
[1] VEGFA VTN PRKCA ADCY5
[[1]][[3]]
[[1]][[3]][[1]]
+ 0/9453 vertices, named:
[[1]][[4]]
[[1]][[4]][[1]]
+ 4/9453 vertices, named:
[1] VEGFA KDR GRB2 ADRB1
[[1]][[5]]
[[1]][[5]][[1]]
+ 3/9453 vertices, named:
[1] VEGFA AKT1 AKT2
[[1]][[6]]
[[1]][[6]][[1]]
+ 4/9453 vertices, named:
[1] VEGFA CTGF AP3D1 AP3S2
[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
+ 6/9453 vertices, named:
[1] HHEX EFEMP2 TP53 ARIH2 ENSA ABCC8
[[2]][[2]]
[[2]][[2]][[1]]
+ 5/9453 vertices, named:
[1] HHEX TLE1 POLB PRKCA ADCY5
[[2]][[3]]
[[2]][[3]][[1]]
+ 0/9453 vertices, named:
[[2]][[4]]
[[2]][[4]][[1]]
+ 5/9453 vertices, named:
[1] HHEX TLE1 ATN1 MAGI2 ADRB1
[[2]][[5]]
[[2]][[5]][[1]]
+ 4/9453 vertices, named:
[1] HHEX JUN ESR1 AKT2
[[2]][[6]]
[[2]][[6]][[1]]
+ 6/9453 vertices, named:
[1] HHEX TLE1 CDK1 BUB1 AP3B1 AP3S2
[[3]]
[[3]][[1]]
[[3]][[1]][[1]]
+ 7/9453 vertices, named:
[1] PPP1R3A RPS6KA1 MAPK1 TP53 ARIH2 ENSA ABCC8
[[3]][[2]]
[[3]][[2]][[1]]
+ 4/9453 vertices, named:
[1] PPP1R3A PLN PRKACA ADCY5
[[3]][[3]]
[[3]][[3]][[1]]
+ 0/9453 vertices, named:
[[3]][[4]]
[[3]][[4]][[1]]
+ 4/9453 vertices, named:
[1] PPP1R3A RPS6KA1 GRB2 ADRB1
[[3]][[5]]
[[3]][[5]][[1]]
+ 4/9453 vertices, named:
[1] PPP1R3A RPS6KA1 PDPK1 AKT2
[[3]][[6]]
[[3]][[6]][[1]]
+ 6/9453 vertices, named:
[1] PPP1R3A RPS6KA1 MAPK1 IRS1 AP3S1 AP3S2
where "+ 6/9453" indicates the length of that list. For component 1 there are six lists of different lengths, and I want to sort all the component lists in decreasing order. Zero-length elements should not be considered.
I used one command, but I don't know whether it is the right way to do it:
mylist[sort(order(mylist))]
Error: unexpected ']' in "mylist[sort(order(mylist))]"
Thanks.
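The question is about R/igraph output, but the sorting logic itself is simple; here is a minimal Python sketch of it (the contents of `mylist` below are a made-up stand-in for the vertex sequences shown above). The R analogue would be `comp[order(sapply(comp, length), decreasing = TRUE)]` after dropping zero-length elements with `comp[sapply(comp, length) > 0]`; note that `order()` already returns a permutation, so wrapping it in `sort()` is unnecessary, and the "unexpected ']'" parse error usually points to a stray character introduced while pasting.

```python
# Toy stand-in for the nested structure: each top-level component holds
# several paths (lists of vertex names); lengths vary and some are empty.
mylist = [
    [["VEGFA", "EPHB2", "GRIN2B", "AP2M1", "KCNJ11", "ABCC8"],
     ["VEGFA", "VTN", "PRKCA", "ADCY5"],
     [],
     ["VEGFA", "AKT1", "AKT2"]],
]

def sort_component(component):
    """Drop empty paths, then sort the rest by length, longest first."""
    nonempty = [p for p in component if len(p) > 0]
    return sorted(nonempty, key=len, reverse=True)

sorted_list = [sort_component(c) for c in mylist]
print([len(p) for p in sorted_list[0]])  # [6, 4, 3]
```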
I have a data set where I have collected the following scatter parameters:
FSC-A, SSC-A, SSC-H, and SSC-W.
(I don't have FSC-H and FSC-W in this set.)
Is there a pair that I can use to isolate singlets? Thanks.
Pharmaceutical industry, data integrity
I know about three panel co-integration tests, i.e. the Pedroni, Kao, and Johansen-Fisher tests. If all tests give significant results, which test should be used for interpreting the results?
Which test is more valid?
Please provide a reference too.
When you read scientific research related to climate change it is apparent that many types of disparate data are integrated when simulations of the future are made. How does the educated, social scientist (with no climatology training ) evaluate the quality of the research and how it is integrated to produce different scenarios with broader or more narrow ranges of values? Are many researchers focusing on the replication of research?
How would you rate the state of climate research: nascent, developing, developed in specific areas, mature? What are the biggest current gaps in our knowledge of ocean and atmospheric systems? Are there also gaps in modeling the interactions between systems that contribute to less certainty? Does the climate change community readily admit to those gaps? What are some significant recent anomalies? Have they been accounted for sufficiently?
How do you judge climate scientists that have gradually evolved into advocates?
How many observations will suffice to conduct a panel co-integration test? I mean, how many groups and what time spans are needed for this test?
Hi All,
I am doing some analyses on a population of adolescents using measurements of some clinical variables collected over 3 time points and broken down by 5 groups of puberty stages. I was wondering which strategy of analysis is most appropriate if I add another level of complexity: a variable "year" indicating a repetition of the measurements in the same experimental setting one year later (so each participant's puberty stage has changed by 1-2 units). Another complication is that the sets of participants in the 2 consecutive years are different, with some overlap between the 2 sets but also disjoint subsets.
This is a demo of my data set:
idx id stage_of_p clinical_var time_point year
1 1 1 11 1 2003
2 1 1 10 2 2003
3 1 1 11 3 2003
4 2 2 14 1 2003
5 2 2 13 2 2003
6 2 2 10 3 2003
7 3 3 15 1 2003
8 3 3 13 2 2003
9 3 3 10 3 2003
10 4 4 10 1 2003
11 4 4 11 2 2003
12 4 4 10 3 2003
13 5 5 13 1 2003
14 5 5 15 2 2003
15 5 5 17 3 2003
16 6 1 11 1 2004
17 6 1 12 2 2004
18 6 1 12 3 2004
19 7 2 11 1 2004
20 7 2 14 2 2004
21 7 2 13 3 2004
22 1 1 12 1 2004
23 1 1 11 2 2004
24 1 1 15 3 2004
25 2 2 11 1 2004
26 2 2 12 2 2004
27 2 2 11 3 2004
Thanks,
Maria
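A hedged suggestion for the design above: with partially overlapping participants across years, a mixed-effects model with a random intercept per participant (e.g. `clinical_var ~ time_point * year + stage_of_p` with `groups="id"` in statsmodels' `mixedlm`, or the lme4 equivalent in R) tolerates unbalanced, partially overlapping panels without dropping anyone. The sketch below, on a few made-up rows shaped like the demo (all values invented), just shows how to separate the overlapping ids from the disjoint ones, which is the first sanity check before fitting such a model:

```python
# A few records shaped like the demo data; all id/year values are invented.
records = [
    {"id": 1, "year": 2003}, {"id": 2, "year": 2003}, {"id": 3, "year": 2003},
    {"id": 6, "year": 2004}, {"id": 7, "year": 2004},
    {"id": 1, "year": 2004}, {"id": 2, "year": 2004},
]

# Which participants are seen in both years, and which in only one?
# The partial overlap is exactly what a random intercept per id absorbs.
years_by_id = {}
for rec in records:
    years_by_id.setdefault(rec["id"], set()).add(rec["year"])

both_years = sorted(i for i, ys in years_by_id.items() if len(ys) == 2)
one_year = sorted(i for i, ys in years_by_id.items() if len(ys) == 1)
print(both_years, one_year)  # [1, 2] [3, 6, 7]
```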
There are many papers associated with my research direction in different regions around the world, so I want to synthesize the data from papers across regions. However, it is difficult to synthesize the data because of the different experiments and statistical methods used, and I don't know whether there is a software package or method to solve this problem. I sincerely look forward to your reply.
In schema matching, there are various ambiguities. I would like to know which kinds of differences exist in real scenarios.
When working in MAUD with 2D data that is integrated in say 10 degree slices around the azimuthal direction you get data sets that cover different 2theta ranges. I want to cut several such ranges off before the edge of the detector as it works badly there. Is there any way to actually do this in MAUD?
As far as I can tell, you can only set one cut-off range that applies to all sections and you can also only make excluded regions that apply to all ranges.
I manage a research participant registry. I have just come on board and the data from participant surveys has been being input by volunteers. I have checked the first 100 surveys for errors and have found around a 60% error rate (around 60% of the surveys have at least one entry error). I plan to double enter all of the current surveys at 100%. However, outside of more extensive volunteer training, I am looking for measures to ensure data integrity for the future surveys.
Deconvolution (commonly called integration) of NMR 13C spectra of compost samples seems to be difficult in free software (such as DmFit). Is there another way to do it rapidly, without manually fitting the spectrum to the one given by the instrument, which is a Bruker UltraShield 400?
I am conducting a study evaluating a document database (MongoDB) against a relational database. For both databases, I created a simple application in web2py with only CRUD functions, to evaluate them fairly. However, I cannot find a good way of evaluating the two. I am mostly focused on data integrity, ease of use, cost (in-memory), and database structure.
I need to find a reference on how to measure these 4 criteria. For instance, how can I measure the data integrity of a document database?
Please, could someone show me some good material to study and understand about the methods of data integration?
My project involves the use of microarray and HPLC to generate the data. I'm having trouble finding good information about the new methods employed today.
How do I interpret mass spectra of DNA from raw data? We have conjugated DNA with organic molecules, and after conjugation we want to characterize our product by MALDI mass analysis. We obtained spectra in m/z, but they do not show the actual peak of the product; the observed peak is less than the product mass by about 300 Da.
All of us, with our skills, expertise, and know-how, have the intellectual power to find a solution to whatever problem we face in science.
Instead of doing a simulation, I need a dataset with hundreds or thousands of records distributed over multiple data sources. I need it for data integration purposes where the same entity may have multiple records in different data sources.
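One pragmatic alternative, if no public benchmark fits: generate the dataset yourself with known ground truth. The sketch below (Python; the entity names, source labels, and probabilities are all arbitrary choices) creates a few hundred entities, scatters records across three sources with random dropouts, and perturbs some names so the same entity appears under slightly different strings:

```python
import random

random.seed(0)  # reproducible generation

def perturb(name):
    """Introduce a small typo (adjacent-character swap) so the same
    entity looks slightly different in different sources."""
    if len(name) > 3:
        i = random.randrange(len(name) - 1)
        return name[:i] + name[i + 1] + name[i] + name[i + 2:]
    return name

# Hypothetical ground-truth entities; each may get a record per source.
entities = [f"Person {n:04d}" for n in range(300)]
sources = {"source_A": [], "source_B": [], "source_C": []}
for ent_id, name in enumerate(entities):
    for src in sources:
        if random.random() < 0.8:  # an entity may be missing from a source
            noisy = perturb(name) if random.random() < 0.3 else name
            sources[src].append({"true_id": ent_id, "name": noisy})

total = sum(len(recs) for recs in sources.values())
print(total)  # roughly 0.8 * 3 * 300 records, with known true_id labels
```

Because every record carries its `true_id`, precision and recall of any integration algorithm can be computed exactly. Classic record-linkage benchmarks such as the Cora citation and Restaurant datasets may also fit if real (rather than synthetic) noise is needed.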
I have read a couple of articles which try to sell the idea that an organization should choose between either implementing Hadoop (a powerful tool for unstructured and complex datasets) or implementing a data warehouse (a powerful tool for structured datasets). But my question is: can't they actually go together, since Big Data is about both structured and unstructured data?
This is basically integrating web query interfaces.
I have implemented an algorithm which integrates two geographical datasets. Each record of both datasets must define its geographical coordinates (latitude and longitude) and a label (e.g. the name of the record). I would like to compare my algorithm with existing algorithms in literature.
Can anyone suggest any algorithm which integrates two geographical datasets? Thanks in advance!
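For a baseline to compare against, a simple distance-plus-label-similarity matcher is a common starting point. A minimal sketch, using only the Python standard library (the 0.5 km and 0.7 thresholds and the sample records are arbitrary illustrations):

```python
import math
from difflib import SequenceMatcher

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match(rec_a, rec_b, max_km=0.5, min_sim=0.7):
    """Two records refer to the same place if they are close in space
    AND their labels are similar enough."""
    d = haversine_km(rec_a["lat"], rec_a["lon"], rec_b["lat"], rec_b["lon"])
    sim = SequenceMatcher(None, rec_a["name"].lower(), rec_b["name"].lower()).ratio()
    return d <= max_km and sim >= min_sim

a = {"name": "Eiffel Tower", "lat": 48.8584, "lon": 2.2945}
b = {"name": "Eiffel Tower (Paris)", "lat": 48.8582, "lon": 2.2947}
c = {"name": "Louvre Museum", "lat": 48.8606, "lon": 2.3376}
print(match(a, b), match(a, c))  # True False
```

For published algorithms to compare against, classic Fellegi-Sunter record linkage combined with spatial blocking is a reasonable reference line, and Jaro-Winkler or Levenshtein similarity are common swaps for `SequenceMatcher` on the labels.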
In systems biology, it is often necessary to integrate several layers of information (e.g. genomics, proteomics, and transcriptomics). There is software for some levels, for example transcriptome-to-genome integration. What are the mathematical basics? Does anyone have good introductory references?
I tried searching Pathway Commons, SBML.org, and the Cytoscape App Store, trying PC2Path, web services such as PSICQUICUniversalClient, and several Cytoscape plug-ins. The merging results? Different macro-pathways for the same query, duplicate nodes, new graphs, and often missing connections between existing nodes from curated databases such as Reactome. I think the problem is often due to the different languages used. Any advice?
Also I would like to know if there is any good software for automating the process of metabolic network reconstruction for Mycobacterium smegmatis.
Many top publications present results from "advanced" Pathway analysis tools such as IPA or MetaCore as shown on vendors' website.
While this is a convenient, and pretty, way to summarise a mechanism, I wonder whether it has ever proved essential in the discovery process.
On the one hand, biologists usually know their pathways of interest, which are mainly public knowledge. On the other, no fancy tool is required to perform an enrichment test.
I am therefore curious to find publications where a novel, validated finding was achieved with such a tool that could not have been obtained with a simpler, cheaper approach.
Does anyone know the exact formula(s) for convex objective functions whose solution is the SVD decomposition of a matrix X?
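For reference, two standard results connect such objectives to the SVD; a sketch, stated from the classical Eckart-Young-Mirsky theorem and the nuclear-norm proximal operator:

```latex
% Low-rank approximation: with the SVD X = U \Sigma V^\top,
\[
\min_{B}\ \|X - B\|_F^2 \quad \text{s.t.}\quad \operatorname{rank}(B)\le k
\qquad\Longrightarrow\qquad
B^\star=\sum_{i=1}^{k}\sigma_i\, u_i v_i^{\top}
\]
% (Eckart-Young-Mirsky). The rank constraint makes this problem non-convex.
% The standard convex surrogate replaces rank with the nuclear norm
% \|B\|_* = \sum_i \sigma_i(B):
\[
\min_{B}\ \tfrac{1}{2}\|X - B\|_F^2+\lambda\|B\|_{*},
\qquad
B^\star=U\,\mathcal{S}_\lambda(\Sigma)\,V^{\top},\quad
\mathcal{S}_\lambda(\sigma_i)=\max(\sigma_i-\lambda,\,0),
\]
% i.e. singular-value soft-thresholding: fully convex in B, and solved
% in closed form by shrinking the singular values of X.
```

So the truncated-SVD problem itself is not convex (because of the rank constraint), but its nuclear-norm relaxation is, and its minimiser is still computed via the SVD of X.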