Data Science Journal

Published by Committee on Data for Science and Technology
Online ISSN: 1683-1470
Publications
A typical Spectral Energy Distribution (SED) generated automatically by the NASA/IPAC Extragalactic Database (NED) using data collected by many different authors and instruments. 
Article
Astronomy is one of the most data-intensive of the sciences. Data technology is accelerating the quality and effectiveness of its research, and the rate of astronomical discovery is higher than ever. As a result, many view astronomy as being in a 'Golden Age', and projects such as the Virtual Observatory are amongst the most ambitious data projects in any field of science. But these powerful tools will be impotent unless the data on which they operate are of matching quality. Astronomy, like other fields of science, therefore needs to establish and agree on a set of guiding principles for the management of astronomical data. To focus this process, we are constructing a 'data manifesto', which proposes guidelines to maximise the rate and cost-effectiveness of scientific discovery.
 
Article
We describe the current status of CATS (astrophysical CATalogs Support system), a publicly accessible tool maintained at the Special Astrophysical Observatory of the Russian Academy of Sciences (SAO RAS) (http://cats.sao.ru) that allows one to search hundreds of catalogs of astronomical objects detected across the electromagnetic spectrum. Our emphasis is mainly on catalogs of radio continuum sources observed from 10 MHz to 245 GHz, and secondarily on catalogs of objects such as radio and active stars, X-ray binaries, planetary nebulae, HII regions, supernova remnants, pulsars, nearby and radio galaxies, AGN, and quasars. CATS also includes the catalogs from the largest extragalactic surveys at non-radio wavelengths. In 2008 CATS comprised a total of about 10^9 records from over 400 catalogs in the radio, IR, optical, and X-ray windows, including most source catalogs derived from observations with the Russian radio telescope RATAN-600. CATS offers several search tools through different modes of access, e.g. a web interface and e-mail. Since its creation in 1997 CATS has handled about 10,000 requests. Currently CATS is used by external users about 1500 times per day and has received about 4000 requests for its selection and matching tasks. Comment: 8 pages, no figures; accepted for publication in Data Science Journal, vol. 8 (2009), http://dsj.codataweb.org; presented at Special Session "Astronomical Data and the Virtual Observatory" at the conference "CODATA 21", Kiev, Ukraine, October 5-8, 2008; replaced incorrect reference arXiv:0901.2085 with arXiv:0901.2805
 
Article
Wikipedia is a prime example of today's value production in a collaborative environment. Using this example, we model the emergence, persistence and resolution of severe conflicts during collaboration by coupling opinion formation with article edition in a bounded confidence dynamics. The complex social behaviour involved in article edition is implemented as a minimal model with two basic elements; (i) individuals interact directly to share information and convince each other, and (ii) they edit a common medium to establish their own opinions. Opinions of the editors and that represented by the article are characterised by a scalar variable. When the editorial pool is fixed, three regimes can be distinguished: (a) a stable mainstream article opinion is continuously contested by editors with extremist views and there is slow convergence towards consensus, (b) the article oscillates between editors with extremist views, reaching consensus relatively fast at one of the extremes, and (c) the extremist editors are converted very fast to the mainstream opinion and the article has an erratic evolution. When editors are renewed with a certain rate, a dynamical transition occurs between different kinds of edit wars, which qualitatively reflect the dynamics of conflicts as observed in real Wikipedia data.
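The coupled editor-article dynamic described above can be sketched in a few lines of C++. This toy implementation uses our own illustrative parameterisation (a confidence bound eps, a convergence rate mu, and names such as simulate and Result that are not from the paper) merely to show the bounded-confidence mechanics: opinions within tolerance attract each other, and every editor rewrites the common article toward their own view.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

struct Result { double article; double spread; };

// Toy bounded-confidence dynamics: n editors hold scalar opinions in [0,1]
// and share a common article. All parameters are illustrative assumptions,
// not the paper's exact model or notation.
Result simulate(int n, double eps, double mu, int steps, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<double> x(n);
    for (double& xi : x) xi = u(rng);
    double article = u(rng);
    for (int t = 0; t < steps; ++t) {
        // (i) direct interaction: a random pair converges if close enough
        int i = pick(rng), j = pick(rng);
        if (i != j && std::abs(x[i] - x[j]) < eps) {
            double step = mu * (x[j] - x[i]);
            x[i] += step;
            x[j] -= step;
        }
        // (ii) editing: editor k rewrites the article toward their opinion;
        // reading the article persuades k only if it lies within tolerance
        int k = pick(rng);
        if (std::abs(x[k] - article) < eps)
            x[k] += mu * (article - x[k]);
        article += mu * (x[k] - article);
    }
    auto [mn, mx] = std::minmax_element(x.begin(), x.end());
    return {article, *mx - *mn};
}
```

With a large confidence bound every interaction is attractive, so the pool and the article converge on a consensus; shrinking eps reproduces the contested regimes the abstract distinguishes.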
 
Article
The K index was devised by Bartels et al. (1939) to provide objective monitoring of irregular geomagnetic activity. It was then routinely used to monitor magnetic activity at permanent magnetic observatories as well as at temporary stations. The increasing number of digital, and sometimes unmanned, observatories and the creation of INTERMAGNET put the question of computer production of K at the centre of the debate. Four algorithms were selected during the Vienna meeting (1991) and endorsed by IAGA for the computer production of K indices. We used one of them (the FMI algorithm) to investigate the impact of the geomagnetic data sampling interval on computer-produced K values, by comparing the K values derived at the Port-aux-Francais magnetic observatory for the period 1 January 2009 to 31 May 2010 from magnetic data series with different sampling rates (from 1 second to 1 minute). The impact is investigated on both the 3-hour range values and the K index data series, as a function of activity level, for low and moderate geomagnetic activity.
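The final step of K-index production referred to above, converting a cleaned 3-hour range into the quasi-logarithmic K = 0..9 scale, can be sketched as follows. The lower limits used are the standard ones for a station whose K = 9 threshold is 500 nT (the Niemegk scale); each observatory rescales these limits, and the FMI algorithm's main work, removing the regular daily variation before the range is measured, is not shown.

```cpp
#include <array>

// Map a 3-hour range of the irregular geomagnetic variation (in nT) to the
// quasi-logarithmic K index, 0..9, for a station with K9 lower limit 500 nT.
int kIndex(double rangeNT) {
    // lower limits of K = 1..9 on the standard (Niemegk-type) scale
    const std::array<double, 9> limits = {5, 10, 20, 40, 70, 120, 200, 330, 500};
    int k = 0;
    for (double lim : limits)
        if (rangeNT >= lim) ++k;
    return k;
}
```

Because the scale is quasi-logarithmic, a 1-second versus 1-minute sampling difference in the measured range only changes K when the range sits near one of these thresholds, which is why the paper studies the effect as a function of activity level.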
 
Article
Independent of established data centers, and partly for my own research, since 1989 I have been collecting the tabular data from over 2600 articles concerned with radio sources and extragalactic objects in general. Optical character recognition (OCR) was used to recover tables from 740 papers. Tables from only 41 percent of the 2600 articles are available in the CDS or CATS catalog collections, and only slightly better coverage is estimated for the NED database. This fraction is no better for articles published electronically since 2001. Both object databases (NED, SIMBAD, LEDA) and catalog browsers (VizieR, CATS) need to be consulted to obtain the most complete information on astronomical objects. More human resources at the data centers and better collaboration between authors, referees, editors, publishers, and data centers are required to improve data coverage and accessibility. The current efforts within the Virtual Observatory (VO) project, to provide retrieval and analysis tools for different types of published and archival data stored at various sites, should be balanced by an equal effort to recover and include large amounts of published data not currently available in this way. Comment: 11 pages, 4 figures; accepted for publication in Data Science Journal, vol. 8 (2009), http://dsj.codataweb.org; presented at Special Session "Astronomical Data and the Virtual Observatory" at the conference "CODATA 21", Kiev, Ukraine, October 5-8, 2008
 
Article
The following PDF indicates errata for the original article entitled "Improving the Traditional Information Management in Natural Sciences" by Martin Kühne and Andreas W. Liehr.
 
Collaborative forms of Chinese internationally co-authored papers in the SSCI and A&HCI, 1995-2004
Article
Collaborative research is one of the most noteworthy trends in the development of scientific research, and co-authored papers are among the most important results of this research. With the speed-up of globalization, the wider adoption of computers and advanced communication technologies, and more frequent academic exchanges and co-operation, collaborative research across organizations, regions, and fields has provided greater access for Chinese researchers in the humanities and social sciences. Accordingly, co-authored papers have witnessed considerable growth in number and proportion. The Social Sciences Citation Index (SSCI) and the Arts & Humanities Citation Index (A&HCI), published by the Institute for Scientific Information (USA), enjoy a high reputation worldwide as large-scale, comprehensive retrieval systems for international papers and citations. This article aims to reveal the trends of Chinese collaborative research in the humanities and social sciences from the perspective of bibliometrics and to offer advice for Chinese researchers and managers in these fields, by analyzing Chinese co-authored papers in the humanities and social sciences indexed in the SSCI and A&HCI in the last decade (1995-2004).
 
Article
Based on data for the years 1995 to 2002, this paper establishes a panel data model of the relationship between foreign direct investment (FDI) in China and China's exports, and conducts an empirical analysis on that basis. The selected countries and regions are Hong Kong (China), Taiwan (China), Japan, South Korea, the European Union, and the United States. We find that the relationship between the accumulated FDI (FDI stock) of the different countries and regions in China and Chinese exports to those countries is quite strong.
 
Article
The 20th International CODATA Conference marked the 40th Anniversary of CODATA, and the breadth of the presentations truly reflects how far the importance of scientific and technical (S&T) data has come in that time. CODATA, as the major international organization devoted to S&T data, provides a mechanism for advancing all aspects of data work, including their collection, management, analysis, display, dissemination, and use by sharing across disciplines and across geographic boundaries. Equally important, CODATA addresses economic, political, social, and business issues, including intellectual property rights, the pervasiveness of the internet, the digital divide, national, regional and international data policies, and the impact modern connectivity has on science and society.
 
Article
This paper briefly reviews the activities of the International Council for Science (ICSU) World Data Centers (WDCs) in Japan at a time of great change in the data and information structures of the ICSU - the creation of the World Data System (WDS) in 2009. Seven WDCs are currently operating in Japan: the WDC for Airglow, the WDC for Cosmic Rays, the WDC for Geomagnetism, Kyoto, the WDC for Ionosphere, the WDC for Solar Radio Emission, and the WDC for Space Science Satellites. Although these WDCs are highly active, a long-term support system must be established to ensure the stewardship and provision of quality-assessed data and data services to the international science community.
 
Article
The following PDF indicates errata for the original article entitled "A Data-Showcase System for the Geospace" by A Saito and D Yoshida.
 
Article
The following PDF indicates errata for the original article entitled "Space Weather and Real-Time Monitoring" by S Watari.
 
Article
The following PDF indicates errata for the original article entitled "Fifty Years of HF Doppler Observations" by T. Ogawa and T. Ichinose.
 
Article
The following are errata for the original article entitled "Hypothesis of Piezoelectricity of Inner Core As the Origin of Geomagnetism" by Y. Hayakawa.
 
Article
The following are errata for the original article entitled "Toward Implementation of the Global Earth Observation System of Systems Data Sharing Principles" by Paul F. Uhlir, Robert S. Chen, Joanne Irene Gabrynowicz & Katleen Janssen.
 
CEOP metadata elements for satellite imagery data
UML model of specific metadata elements (an example):
- valueForMissingData::MD_Band – whether data are not captured even though a grid-cell is in the observation area
- outOfObservation::MD_Band – description of a grid-cell when the grid-cell is out of the observation area
- observationAreaRatio::MD_ImageDescription – observation area ratio
- endian::CEOP_Endian – a mixture of endianness across the organizations and systems that generated the satellite geocoded image product
- fromNorth::CEOP_OrderOfDataRecording and fromSouth::CEOP_OrderOfDataRecording – some data products have pixels starting from the north, while others have pixels starting from the south; "from north" and "from south" describe the order of data recording (special attributes for the CEOP application)
- name::CEOP_Format – name of the data transfer format
- version::CEOP_Format – version of the format (date, number, etc.)
- blank::CEOP_Format – blank; special attribute for the CEOP application
Article
This paper reviews the present status and major problems of the existing ISO standards related to imagery metadata. An imagery metadata model is proposed to facilitate the development of imagery metadata on the basis of conformance to these standards and combination with other ISO standards related to imagery. The model presents an integrated metadata structure and content description for any imagery data, supporting data discovery and data integration. Using the application of satellite data integration in CEOP as an example, satellite imagery metadata are developed, and the resulting satellite metadata list is given.
 
Article
The complexity and sophistication of large-scale analytics in science and industry have advanced dramatically in recent years. Analysts are struggling to use complex techniques such as time series analysis and classification algorithms because their familiar, powerful tools are not scalable and cannot effectively use scalable database systems. The 2nd Extremely Large Databases (XLDB) workshop was organized to understand these issues, examine their implications, and brainstorm possible solutions. The design of a new open source science database, SciDB, which emerged from the first workshop in this series, was also debated. This paper is the final report of the discussions and activities at this workshop.
 
Article
The National Institute of Standards and Technology (NIST) is developing a digital library to replace the widely used National Bureau of Standards Handbook of Mathematical Functions published in 1964. The NIST Digital Library of Mathematical Functions (DLMF) will include formulas, methods of computation, references, and links to software for over forty functions. It will be published both in hardcopy format and as a website featuring interactive navigation, a mathematical equation search, 2D graphics, and dynamic interactive 3D visualizations. This paper focuses on the development and accessibility of the 3D visualizations for the digital library. We examine the techniques needed to produce accurate computations of function data, and through a careful evaluation of several prototypes, we address the advantages and disadvantages of using various technologies, including the Virtual Reality Modeling Language (VRML), interactive embedded graphics, and video capture to render and disseminate the visualizations in an environment accessible to users on various platforms.
 
The 3D satellite cloud image is created in Visual C++. The first step is curve interpolation of the DEM data; the second is triangular approximation; and the last is color rendering. Given the DEM, we can also convert it into another 3D model in OpenFlight format with the software MultiGen Creator, so that it can be used as an object in MultiGen Vega.
The following C++ function is a procedure to create the 3D satellite cloud image based on OpenGL. In this function, we stitch together many 3rd-order Bezier patches to create the full image. Suppose the mesh after sampling is an M by N grid, the horizontal and vertical widths of the cells are xStride and yStride respectively, and the numbers of cells are xNum and yNum. The variable order is the order of the patch, and the pointer grid points to the elevation data of the image (Zhang Xiu-shan, 1999).
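The C++ function referenced above is not reproduced in this excerpt. As a hedged illustration of the underlying step, the following self-contained sketch evaluates one bicubic (3rd-order) Bezier patch of elevation data with Bernstein polynomials; the function and variable names are ours, and the OpenGL rendering itself (e.g., via glMap2/glEvalMesh2 evaluators) is omitted.

```cpp
#include <array>
#include <cmath>

// Cubic Bernstein basis polynomial B_i^3(t), i = 0..3
double bernstein3(int i, double t) {
    static const double binom[4] = {1, 3, 3, 1};
    return binom[i] * std::pow(t, i) * std::pow(1 - t, 3 - i);
}

// Evaluate a bicubic Bezier patch at (u, v) in [0,1]^2.
// ctrl[i][j] are the sampled elevations of one 4x4 block of the DEM grid.
double bezierPatch(const std::array<std::array<double, 4>, 4>& ctrl,
                   double u, double v) {
    double z = 0.0;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            z += bernstein3(i, u) * bernstein3(j, v) * ctrl[i][j];
    return z;
}
```

Tiling the M-by-N elevation grid with such patches, with cell sizes xStride and yStride, yields the smooth cloud surface that the color-rendering pass then shades.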
Article
Using satellite cloud images to simulate clouds is one of the new visual simulation technologies in Virtual Reality (VR). Taking the original data of satellite cloud images as the source, this paper describes in detail the technology of 3D satellite cloud imaging through coordinate transformation and projection, creation of a DEM (Digital Elevation Model) of the cloud imagery, and 3D simulation. A Mercator projection was introduced to create the cloud image DEM, solutions for geodetic problems were introduced to calculate distances, and the outer-trajectory science of rockets was introduced to obtain the elevation of clouds. For demonstration, we report on a computer program that simulates the 3D satellite cloud images.
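Of the ingredients listed above, the Mercator step is the simplest to show. Below is a minimal sketch (our own code, using the spherical-Earth approximation with radius R in metres; the paper's DEM construction, geodetic distance solutions, and rocket outer-trajectory elevation estimation are not reproduced):

```cpp
#include <cmath>

struct XY { double x, y; };

// Forward spherical Mercator projection: geodetic longitude/latitude in
// radians to planar map coordinates in metres.
XY mercator(double lonRad, double latRad, double R = 6371000.0) {
    const double pi = std::acos(-1.0);
    return {R * lonRad,
            R * std::log(std::tan(pi / 4.0 + latRad / 2.0))};
}
```

Each cloud-image pixel's geographic position maps to (x, y), and the cloud-top elevation estimated separately supplies the z value of the DEM.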
 
Article
Retrieval systems for 3D objects are required because 3D databases on the web are growing. In this paper, we propose a visual-similarity-based search engine for 3D objects. The system is based on a new representation of 3D objects given by a 3D closed curve that captures all information about the surface of the 3D object. We propose a new 3D descriptor, which is a combination of three signatures of this new representation, and we implement it in our interactive web-based search engine. Our method is compared to some state-of-the-art methods, tested using the Princeton Shape Benchmark as a large database of 3D objects. The experimental results show that the enhanced curve analysis descriptor performs well.
 
DIR histograms of the table and the car
Figure 5 illustrates the DIR of the table and the car from Figure 3. We can see that the two DIRs are obviously different. The DIR of the table indicates there are many low ratios because the table is generally a concave model and many line segments lie outside it. In contrast, the car is rather convex, so a large number of line segments lie inside it, and its DIR is dominated by high ratios. Generally, DIR provides additional shape information beyond that provided by D2. We can use the DIR to filter out dissimilar models that D2 is unable to distinguish. The dissimilarity between 3D models A and B can be measured by a weighted sum of D2 and DIR:
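The weighted-sum measure announced at the end of the passage above is not reproduced in this excerpt. One plausible form (our notation, with a tunable weight, not necessarily the authors' exact definition) is

$$ d(A,B) \;=\; w\, d_{D2}(A,B) \;+\; (1-w)\, d_{DIR}(A,B), \qquad w \in [0,1], $$

where $d_{D2}$ and $d_{DIR}$ denote distances (for example $L_1$ distances) between the D2 histograms and the DIR histograms of models $A$ and $B$.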
A mesh model and the corresponding voxel model
Article
With the development of computer graphics and digitizing technologies, 3D model databases are becoming ubiquitous. This paper presents a method for content-based searching for similar 3D models in databases. To assess the similarity between 3D models, shape feature information must be extracted from the models and compared. We propose a new 3D shape feature extraction algorithm. Experimental results show that the proposed method achieves good retrieval performance with short computation time.
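A concrete example of such a shape feature, though not necessarily this paper's algorithm, is the classic D2 shape distribution discussed alongside DIR above: a normalized histogram of distances between random pairs of surface points. A minimal sketch over a point-cloud approximation of the model:

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

struct P3 { double x, y, z; };

// D2 shape distribution: histogram of distances between `pairs` random
// point pairs, with `bins` bins over [0, maxDist], normalized to sum to 1.
std::vector<double> d2Histogram(const std::vector<P3>& pts, int bins,
                                double maxDist, int pairs, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<size_t> pick(0, pts.size() - 1);
    std::vector<double> h(bins, 0.0);
    for (int s = 0; s < pairs; ++s) {
        const P3& a = pts[pick(rng)];
        const P3& b = pts[pick(rng)];
        double d = std::sqrt((a.x - b.x) * (a.x - b.x) +
                             (a.y - b.y) * (a.y - b.y) +
                             (a.z - b.z) * (a.z - b.z));
        int bin = std::min(bins - 1, static_cast<int>(d / maxDist * bins));
        h[bin] += 1.0 / pairs;  // normalize so the histogram sums to 1
    }
    return h;
}
```

Because the histogram is invariant to rotation and translation and cheap to compute, signatures of this family are a common baseline for the "short computation time" retrieval the abstract claims.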
 
Geology model of a complicated cavity based on RC data 
3D modeling method (Wu, 2003)
Transparent model of a complicated cavity based on data 
3D transparent geology model based on MT data 
Article
3D modeling and visualization of geology volumes is very important for accurately interpreting and locating subsurface geology volumes in mining exploration and deep prospecting. However, it faces a lack of information because the target area is usually unexplored and short of geological data. This paper presents our experience in applying a 3D model of geology volume based on geophysics. This work has researched and developed a 3D visualization system based on an object-oriented approach and modular programming, using the C++ language and the Microsoft .NET platform. The system first builds a database of high-resistivity-method and MT (magnetotelluric) data, then constructs its model from irregular tetrahedra, and finally builds the 3D geological model itself.
 
3D modelling of a part of Varna city
A monument, 3D symbols, and buildings in a 3D map of a small historical city in Bulgaria, created by Dimitar Rashev
Environmental pollution – 3D maps can be used to illustrate the distribution of different kinds of pollutants and to simulate global warming and noise distributions.
3D map "a street in Vienna", created by ICG, TU Graz, with 3D symbols created by T. Bandrova
A near view in a 3D map of the small historical city of Koprivshtitsa in Bulgaria, created by Dimitar Rashev
DTM (Digital Terrain Model) of the small tourist city of Koprivshtitsa by Dimitar Rashev
Article
"From Paper to Virtual Map" is an innovative technology for creating 3D (three-dimensional) maps. The technology is proposed as a very cheap and easy way to create 3D maps; a powerful graphic station is not necessary for this aim. This is very important for countries such as Bulgaria, where it is not easy to obtain expensive computer equipment. This technology, proposed by the author, was developed from a novel application: a 3D cartographic symbol system. The 3D city maps created consist of 3D geometry, topographic information, photo-realistic texturing, and 3D symbols, which contain quantitative and qualitative information about the objects. Animated presentations are also available according to users' needs.
 
Network structure of a material thesaurus subtree which begins from the term “alloy” 
Entries in a metallurgical thesaurus related to the entry "Creep"
Relations of concepts related to creep properties of materials
A flow of creep data analysis with Ontology and Rule
Article
A standardized data schema for material properties in XML is under development to establish a common and exchangeable expression. The next stage toward the management of knowledge about material usage, selection or processing is to define an ontology that represents the structure of concepts related to materials, e.g., definition, classification or properties of material. Material selection for designing artifacts is a process that translates required material properties into material substances, which in turn requires a definition of data analysis and rules to interpret the result. In this paper, an ontology structure to formalize this kind of process is discussed using an example of the translation of creep property data into design data.
 
Article
The 5th XLDB workshop brought together scientific and industrial users, developers, and researchers of extremely large data and focused on emerging challenges in the healthcare and genomics communities, spreadsheet-based large-scale analysis, and challenges in applying statistics to large-scale analysis, including machine learning. Major problems discussed were the lack of scalable applications, the lack of expertise in developing solutions, the lack of respect for or attention to big-data problems, data volume growth exceeding Moore's Law, poorly scaling algorithms, and poor data quality and integration. More communication between users, developers, and researchers is sorely needed. A variety of future work to help all three groups was discussed, ranging from collecting challenge problems to connecting with particular industrial or academic sectors.
 
Article
Successful resource discovery across heterogeneous repositories is strongly dependent on the semantic and syntactic homogeneity of the associated resource descriptions. Ideally, resource descriptions are easily extracted from pre-existing standardized sources, expressed using standard syntactic and semantic structures, and managed and accessed within a distributed, flexible, and scalable software framework. The Object Oriented Data Technology task has developed a standard resource description scheme that can be used across domains to describe any resource. It uses a small set of generally accepted, broad-scope descriptors while also providing a mechanism for the inclusion of domain-specific descriptors. In addition, this standard description scheme can be used to capture hierarchical, relational, and recursive relationships between resources. In this paper we present an intelligent resource discovery framework that consists of separate data and technology architectures and the standard resource description scheme, and illustrate the concept with a case study.
 
Article
The term Data Science came into use as an academic discipline in the 1990s, with the initiative taken by some of the pioneers from CODATA. Data science has developed to include the study of the capture of data; their analysis; metadata; fast retrieval; archiving; exchange; mining to find unexpected knowledge and data relationships; visualization in two or three dimensions, including movement; and management, intellectual property rights, and other legal issues. Data that were once held in large databases on centrally located mainframes are now scattered across the Internet, instantly accessible by personal computers that can themselves store gigabytes of data. Measurement technologies have also improved in quality and quantity, with measurement times reduced by orders of magnitude. Every area of science, be it astronomy, chemistry, geoscience, physics, or biology, is becoming based on models depending on large bodies of data. Data Science is now being included in the textbooks of graduate students.
 
Data-sharing among 14 CAS institutes
Data-sharing procedure perspective
METADATA STANDARDS FOR RESOURCE AND ENVIRONMENT SCIENCE DATA: Metadata are data that describe the content, quality, condition, and other characteristics of datasets (FGDC, 1997; SDI, 2001). Metadata, the core of the system, are necessary for users to locate data.
Meta-database configuration; data server configuration
Common metadata model configuration
Main user interfaces
Article
The data sharing system for the resource and environment science databases of the Chinese Academy of Sciences (CAS) has an open three-tiered architecture, which integrates the geographical databases of about 9 CAS institutes through distributed unstructured data management, metadata integration, catalogue services, and security control. The data tier consists of several distributed data servers located in each CAS institute, supporting such unstructured data formats as vector files, remote sensing images and other raster files, documents, multimedia files, tables, and other formats. For spatial data files, a format transformation service is provided. The middle tier involves a centralized metadata server, which stores metadata records for the data on all data servers. The primary function of this tier is the catalog service, supporting the creation, search, browsing, updating, and deletion of catalogs. The client tier involves an integrated client that provides end-users with interfaces to search, browse, and download data, or to create a catalog and upload data.
 
Structure of Fuzzy-Rule generation process
The Fuzzy Membership Function (MF) for the sensory scores. AV is low; AA is average, and HA is high. 
Linguistic terms in the data set
Article
The prediction of product acceptability is often an additive effect of individual fuzzy impressions developed by consumers on certain underlying attributes characteristic of the product. In this paper we present the development of a data-driven fuzzy-rule-based approach for predicting the overall sensory acceptability of food products, in this case composite cassava-wheat bread. The model was formulated using the Takagi-Sugeno-Kang (TSK) fuzzy modeling approach. Experiments with the model derived from sampled data were simulated on Windows 2000/XP running on an Intel 2 GHz machine. The fuzzy membership function for the sensory scores was implemented in MATLAB 6.0 using the fuzzy logic toolkit, and the weights of each linguistic attribute were obtained using a correlation coefficient formula. The results obtained are compared to those of human judges. Overall assessments suggest that, if implemented, this approach will facilitate a better acceptability of composite bread.
 
Comparison of CODATA 2006 recommended values with those calculated by using converted dimensionless values in zero zone system of units.
Article
Finding out how many parameters are necessary to explain and describe the complex and varied phenomena of nature has been a challenge in modern physics. This paper introduces a new formal system of units, which maintains compatibility with SI units, to express all seven SI base units by dimensionless numbers with acceptable uncertainties and to establish the number one as the fundamental parameter of everything. All seven SI base units are converted successfully into unified dimensionless numerical values via normalization of s, c, h, k, e/me, NA, and b by unity (1). In the proposed system of units, even unlike-dimensioned physical quantities become convertible and hence can be added, subtracted, or compared to one another. It is very simple and easy to analyze and validate physical equations by substituting every unit with the corresponding number. Furthermore, it is expected that new relationships will be found among unlike-dimensioned physical quantities, which is extremely difficult or even impossible in SI units.
 
Article
This research investigates the applicability of Davis's Technology Acceptance Model (TAM) to agriculturists' acceptance of a knowledge management system (KMS) developed by the authors, called AGROWIT. Although the authors used previous TAM user-acceptance research as a basis for investigating user acceptance of AGROWIT, the model had to be extended; the constructs added from the Triandis model increased the predictive power of the TAM, but only slightly. Relationships among the primary TAM constructs used are in substantive agreement with those characteristic of previous TAM research. Significant positive relationships between perceived usefulness, ease of use, and system usage were consistent with previous TAM research. The observed mediating role of perceived usefulness in the relationship between ease of use and usage was also in consonance with earlier findings. The findings are significant because they suggest that the considerable body of previous TAM-related information technology research may be usefully applied to the knowledge management domain to promote further investigation of factors affecting the acceptance and usage of knowledge management information systems such as AGROWIT by farmers, extension workers, and agricultural researchers.
 
Article
Ministers of science and technology asked the OECD in January 2004 to develop international guidelines on access to research data from public funding. The resulting Principles and Guidelines for Access to Research Data from Public Funding were recently approved by OECD governments and are discussed below. They are intended to promote data access and sharing among researchers, research institutions, and national research agencies. OECD member countries have committed to taking these principles and guidelines into account in developing their own national laws and research policies, taking account of differences in their respective national context.
 
Components of a Data Access Regime
Article
Access to and sharing of data are essential for the conduct and advancement of science. This article argues that publicly funded research data should be openly available to the maximum extent possible. To seize upon advancements in cyberinfrastructure and the explosion of data in a range of scientific disciplines, this access to and sharing of publicly funded data must be advanced within an international framework, beyond technological solutions. The authors, members of an OECD Follow-up Group, present their research findings, based closely on their report to the OECD, on key issues in data access, as well as the operating principles and management aspects necessary to successful data access regimes.
 
Article
A distributed infrastructure that would enable those who wish to do so to contribute their scientific or technical data to a universal digital commons could allow such data to be more readily preserved and accessible among disciplinary domains. Five critical issues that must be addressed in developing an efficient and effective data commons infrastructure are described. We conclude that creation of a distributed infrastructure meeting the critical criteria and deployable throughout the networked university library community is practically achievable.
 
* (Courtesy of Mark Jordan)
Article
Scholarly data, such as academic articles, research reports and theses/dissertations, traditionally have limited dissemination in that they generally require journal subscription or affiliation with particular libraries. The notion of open access, made possible by rapidly advancing digital technologies, aims to break the limitations that hinder academic developments and information exchange. This paper presents the Electronic Thesis & Dissertation (ETD) Project at the Simon Fraser University Library, British Columbia, Canada, and discusses various technological considerations associated with the Project including selection of software, capture of metadata, and long-term preservation of the digitized data. The paper concludes that a well-established project plan that takes into account not only technological issues but also issues relating to project policies, procedures, and copyright permissions that occur in the process of providing open access plays a vital role for the overall success of such projects.
 
Article
As an important part of the science and technology infrastructure platform of China, the Ministry of Science and Technology launched the Scientific Data Sharing Program in 2002. Twenty-four government agencies now participate in the Program. After five years of hard work, great progress has been achieved in the policy and legal framework, data standards, pilot projects, and international cooperation. By the end of 2005, one-third of the existing public-interest and basic scientific databases in China had been integrated and upgraded. By 2020, China is expected to build a more user-friendly scientific data management and sharing system, with 80 percent of scientific data available to the general public. To realize this objective, the project's emphases are to refine the policy and legal framework, improve the quality of data resources, expand and establish national scientific data centers, and strengthen international cooperation. It is believed that with the opening up of access to scientific data in China, the Program will play a bigger role in promoting science and national innovation.
 
Mind-Map generated by the NCASRD discussion
Article
In June 2004, an expert Task Force, appointed by the National Research Council Canada and chaired by Dr. David Strong, came together in Ottawa to plan a National Forum as the focus of the National Consultation on Access to Scientific Research Data. The Forum, which was held in November 2004, brought together more than seventy Canadian leaders in scientific research, data management, research administration, intellectual property and other pertinent areas. This article presents a comprehensive review of the issues, opportunities, and challenges identified during the Forum. Complex and rich arrays of scientific databases are changing how research is conducted, speeding the discovery and creation of new concepts. Increased access will accelerate such changes even more, creating other new opportunities. With the combination of databases within and among disciplines and countries, fundamental leaps in knowledge will occur that will transform our understanding of life, the world and the universe. The Canadian research community recognizes the need to take swift action to adapt to the substantial changes required by the scientific enterprise. Because no national data preservation organization exists, many experts believe that a national strategy on data access and policies needs to be developed, and that a "Data Task Force" should be created to prepare a full national implementation strategy. Once such a national strategy is broadly supported, it is proposed that a dedicated national infrastructure, tentatively called "Data Canada", be established to assume overall leadership in the development and execution of a strategic plan.
 
Schematic view of the development of the internet data and information transfer illustrating the history of data and information transfer methods. Adapted from a web page of the now defunct Distributed Systems Technology Center (DSTC) that was supported until June 2006 by the Australian Government's Cooperative Research Center program. 
Article
The vision of the Electronic Geophysical Year (eGY) is that we can achieve a major step forward in geoscience capability, knowledge, and usage throughout the world for the benefit of humanity by accelerating the adoption of modern and visionary practices such as virtual observatories for managing and sharing data and information. eGY has found that the biggest challenges to implementing the vision are educating program managers and senior scientists on the need for modern data management techniques and providing incentives for practitioners of the new field of geoinformatics.
 
Diagram of a policy chain 
Comparing Administrative Delegation Policy and Access Policy
Data Flow Model of Access Protocol Supporting Decentralized Authorization 
CBL Architecture Overview 
Article
In an e-Science environment, large-scale distributed resources in autonomous domains are aggregated by unified collaborative platforms to support scientific research across organizational boundaries. In order to enhance the scalability of access management, an integrated approach for decentralizing the task from resource owners to administrators on the platform is needed. We propose an extensible access management framework to meet this requirement by supporting an administrative delegation policy. This feature allows administrators on the platform to make new policies based on the original policies made by resource owners. An access protocol that merges SAML and XACML is also included in the framework. It defines how distributed parties operate with each other to make decentralized authorization decisions.
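The core idea of administrative delegation, distinguishing delegation (administrative) policies from access policies and validating the chain back to the resource owner, can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's actual policy language: the `Policy` class, its field names, and the `authorized` function are hypothetical, and a real deployment would express these as XACML policies exchanged via SAML assertions.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    issuer: str      # party that wrote the policy
    subject: str     # party the policy applies to
    resource: str    # resource the policy covers
    admin: bool = False  # True: grants the right to issue further policies,
                         # not direct access (an administrative delegation policy)

def authorized(subject, resource, policies, owners):
    """Grant access only via a non-admin policy whose issuer is the
    resource owner or holds a valid delegation chain back to the owner."""
    def trusted_issuer(issuer, seen):
        if issuer in owners:
            return True          # chain terminates at the resource owner
        if issuer in seen:
            return False         # guard against cyclic delegation
        seen.add(issuer)
        # the issuer must itself hold an admin policy from a trusted chain
        return any(p.admin and p.subject == issuer and p.resource == resource
                   and trusted_issuer(p.issuer, seen)
                   for p in policies)
    return any(not p.admin and p.subject == subject and p.resource == resource
               and trusted_issuer(p.issuer, set())
               for p in policies)
```

For example, if owner "alice" issues an admin policy delegating administration of "dataset1" to platform administrator "bob", then an access policy that bob writes for "carol" is honored, while bob's admin policy by itself grants bob no access.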
 
Structure of the system 
Article
In this paper the development of a new internet information system for analyzing and classifying melanocytic data is briefly described. This system also has some teaching functions and improves the analysis of datasets based on calculating the values of the TDS (Total Dermatoscopy Score) (Braun-Falco, Stolz, Bilek, Merkle, & Landthaler, 1990; Hippe, Bajcar, Blajdo, Grzymala-Busse, Grzymala-Busse, & Knap, et al., 2003) parameter. Calculations are based on two methods: the classical ABCD formula (Braun-Falco et al., 1990) and the optimized ABCD formula (Alvarez, Bajcar, Brown, Grzymala-Busse, & Hippe, 2003). A third method of classification is devoted to quasi-optimal decision trees (Quinlan, 1993). The developed internet-based tool enables users to make an early, non-invasive diagnosis of melanocytic lesions. This is possible using a built-in set of instructions that animates the diagnosis of the four basic lesion types: benign nevus, blue nevus, suspicious nevus, and malignant melanoma. This system is available on the Internet website: http://www.wsiz.rzeszow.pl/ksesi.
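The classical ABCD formula referenced above combines four dermatoscopic scores into the TDS with fixed weights. A minimal sketch, assuming the standard Stolz weighting (1.3, 0.1, 0.5, 0.5) and the conventional decision thresholds; the optimized formula and the decision-tree classifier used by the system are not reproduced here:

```python
def tds_classic(asymmetry, border, colors, structures):
    """Classical ABCD formula: TDS = 1.3*A + 0.1*B + 0.5*C + 0.5*D.
    A: asymmetry score (0-2), B: border score (0-8),
    C: number of colors (1-6), D: number of differential structures (1-5)."""
    return 1.3 * asymmetry + 0.1 * border + 0.5 * colors + 0.5 * structures

def classify(tds):
    # Conventional thresholds: < 4.75 benign, 4.75-5.45 suspicious, > 5.45
    # suggestive of melanoma
    if tds < 4.75:
        return "benign"
    if tds <= 5.45:
        return "suspicious"
    return "melanoma"
```

A lesion scoring A=0, B=0, C=1, D=1 yields TDS = 1.0 (benign), while maximal scores yield TDS = 8.9 (melanoma).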
 
Article
Researchers collect data from both individuals and organizations under pledges of confidentiality. The U.S. Federal statistical system has established practices and procedures that enable others to access the confidential data it collects. The two main methods are to restrict the content of the data (termed "restricted data") prior to release to the general public and to restrict the conditions under which the data can be accessed, i.e., at what locations, for what purposes (termed "restricted access"). This paper reviews restricted data and restricted access practices in several U.S. statistical agencies. It concludes with suggestions for sharing confidential social science data.
 
A schematic diagram of different metadata types.
Article
In the process of implementing a protocol for the transport of science data, the Open Source Project for a Network Data Access Protocol (OPeNDAP) group has learned a considerable amount about the internal anatomy of what are commonly considered monolithic concepts. In order to communicate among our group, we have adopted a collection of definitions and observations about data and the metadata that make them useful: differentiating between "semantic" and "syntactic" metadata, and defining categories such as "translational" and "use" metadata. We share the definitions and categorizations here in the hope that others will find them as useful as we do.
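The category distinctions above can be made concrete with a small tagging scheme. The enumeration below is only an illustrative encoding; which particular attribute falls into which category is an assumption for illustration, not the OPeNDAP group's official assignment:

```python
from enum import Enum

class MetadataKind(Enum):
    # Working categories described by the OPeNDAP group; the one-line
    # glosses here are paraphrases, not normative definitions.
    SYNTACTIC = "syntactic"          # structure: types, shapes, encodings
    SEMANTIC = "semantic"            # meaning: what the values represent
    TRANSLATIONAL = "translational"  # mapping between representations/conventions
    USE = "use"                      # guidance for applying the data

# Hypothetical tagging of common dataset attributes (assumed, for illustration)
sample_tags = {
    "dtype": MetadataKind.SYNTACTIC,
    "long_name": MetadataKind.SEMANTIC,
    "scale_factor": MetadataKind.TRANSLATIONAL,
    "valid_range": MetadataKind.USE,
}
```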
 
Article
The study goal was to investigate thyroid cancer morbidity in population groups affected by the Chernobyl catastrophe. The study period comprised 1994-2006 for clean-up workers and 1990-2006 for Chernobyl evacuees and residents of contaminated territories. A significant increase of thyroid cancer incidence was registered in all observed population groups. The most significant excess over the national level was identified in clean-up workers. This amounted to a factor of 5.9, while it was 5.5 for the evacuees and 1.7 for the residents. The highest thyroid cancer risk was observed in persons exposed to radioiodine in childhood and adolescence.
 
Article
There are massive amounts of process data in the usual course of doing engineering. How to choose and accumulate these data to provide reference for newly-built projects in designing and building is a question that project superintendents face. We propose to construct a knowledge management platform for engineering project management to realize the potential of the accumulated decision-making data and study data classification and knowledge management, using architectural engineering data as an example.
 
Article
KISTI-ACOMS has been developed as a part of the national project for raising the information society index of academic societies that began in 2001. ACOMS automates almost every activity of academic societies, including membership management, journal processing, conference organization, and e-journal management, and provides a search system. ACOMS can be customized easily by the system administrator of an academic society. The electronic databases built by ACOMS are serviced to users through the KISTI website (http://www.yeskisti.net) along with other journal databases created in a conventional way. KISTI plans to raise the usage ratio of the ACOMS database for society services to 100% in the future.
 
Article
Ultra-low frequency acoustic waves called "acoustic gravity waves" or "infrasounds" are theoretically expected to resonate between the ground and the thermosphere. This resonance is a very important phenomenon causing the coupling of the solid Earth, neutral atmosphere, and ionospheric plasma. This acoustic resonance, however, has not been confirmed by direct observations. In this study, atmospheric perturbations on the ground and ionospheric disturbances were observed and compared with each other to confirm the existence of resonance. Atmospheric perturbations were observed with a barometer, and ionospheric disturbances were observed using the HF Doppler method. An end point of resonance is in the ionosphere, where conductivity is high and the dynamo effect occurs. Thus, geomagnetic observation is also useful, so the geomagnetic data were compared with other data. Power spectral density was calculated and averaged for each month. Peaks appeared at the theoretically expected resonance frequencies in the pressure and HF Doppler data. The frequencies of the peaks varied with the seasons. This is probably because the vertical temperature profile of the atmosphere varies with the seasons, as does the reflection height of infrasounds. These results indicate that acoustic resonance occurs frequently.
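The spectral analysis described above amounts to estimating the power spectral density of evenly sampled barometer, HF Doppler, or geomagnetic time series and looking for peaks at the expected millihertz-band resonance frequencies. A minimal periodogram sketch in pure Python; the study's actual processing (monthly averaging of spectra) is not reproduced, and a production analysis would use an FFT with segment averaging:

```python
import cmath
import math

def periodogram(x, dt):
    """Naive DFT-based power spectral density estimate.
    x is a uniformly sampled series, dt the sampling interval in seconds.
    Returns (frequencies in Hz, PSD values), excluding the DC bin."""
    n = len(x)
    mean = sum(x) / n
    xs = [v - mean for v in x]  # remove the DC component
    freqs, psd = [], []
    for k in range(1, n // 2):
        X = sum(xs[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        freqs.append(k / (n * dt))
        psd.append(abs(X) ** 2 * dt / n)
    return freqs, psd
```

Applied to a pressure record, peaks in the returned PSD at the theoretically expected frequencies would indicate the acoustic resonance; averaging many such spectra over a month, as in the study, suppresses noise.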
 
Article
The Space Environment Data Acquisition equipment (SEDA), which was mounted on the Exposed Facility (EF) of the Japanese Experiment Module (JEM, also known as "Kibo") on the International Space Station (ISS), was developed to measure the space environment along the orbit of the ISS. This payload module, called the SEDA-Attached Payload (AP), began to measure the space environment in August 2009. This paper reports the mission objectives, instrumentation, and current status of the SEDA-AP.
 
Main scientific mission of the QSAT satellite project 
FMT of CHAIN network 
IPS observation network 
Article
The IHY Japanese National Steering Committee (STPP subcommittee of the Science Council of Japan) has been promoting and supporting (1) two satellite missions, (2) five ground-based networks, (3) public outreach, (4) international and domestic workshops, and (5) the nomination of IGY Gold Club members. In the present paper we introduce these IHY activities, briefly summarize them, and suggest several post-IHY activities.
 
Antarctic Data Directory System
Collected metadata per sector. Upper panel: Polar Italian Data Center. Lower panel: Antarctic Master Directory (courtesy Melanie Meaux)
Italian metadata growth 
Article
Activities performed to develop an information system for the diffusion of Italian polar research (SIRIA project) are here described. The system collects and shares information related to research projects carried out in both the Antarctic (since 1985) and Arctic (since 1997) regions. It is addressed primarily to dedicated users in order to foster interdisciplinary research but non-specialists may also be interested in the major results. SIRIA is in charge of managing the National Antarctic Data Center of Italy and confers its metadata to the Antarctic Master Directory. Since 2003, the National Antarctic Research Program has funded this project, which, by restyling its tasks, databases, and web site, is becoming the portal of Italian polar research. Issues concerning data management and policy in Italy are also covered.
 