
Utilizing Data Mining and Spatial Analysis to Evaluate the Effects of Mineral Extraction on Water Quality in South Africa

Authors:
Michael Hoefer Mining and Water Quality in South Africa
Michael Hoefer
Iowa State University
RWTH Aachen University
Faculty of Business and Economics
Chair of International Economics
MSc. Hanna Krings, Supervisor
UROP Undergraduate Research Opportunities Program
May 26th - July 25th, 2014
Aachen 2014
Table of Contents
Introduction/Abstract
Project Description
Data
  Mining
  Water Quality
  Land, Cities, and Rivers
Methodology
  Visual Representation
  Quantitative Effects of Mining
Results and Discussion
  Regression Output
Issues and future research potential
Summary
Evaluation / What I learned
Acknowledgements
References
Introduction/Abstract
The purpose of this research is to investigate the effects of resource extraction
on the environment. Existing environmental metrics that look at the environmental
condition of a country, such as CO2 emissions or deforestation percentages, cannot
be attributed specifically to resource extraction. The long-term goal is to examine the
relationship between resource extraction and the environment, and determine which
factors affect that relationship. A geographical information system (GIS) is commonly
used to evaluate the local impacts of resource extraction on the environment (Gray,
1997). If it can be shown, on a small scale, that mining is heavily associated with water
quality, then perhaps it makes sense to look at larger scale economic models for
assessing resource extraction. Once these models are empirically shown to be at
least somewhat plausible, the link between democracy, the environment, and resource
extraction can be investigated.
To do this, however, one must first look closer at a single region to determine
the specific effect of resource extraction on the environment. The country of South
Africa was chosen as a case study for multiple reasons, listed in the project description.
If the effects of resource extraction can be understood for one country, the same
techniques can be applied to other countries. Then, the countries can be compared to
determine the factors that affect the resource-environment relationship. That being
said, the purpose of this article is to examine the effects of mineral extraction on water
quality in South Africa. The hypothesis is that there is a negative correlation between
water quality and proximity to mining locations.
Per results of previous studies, the water quality metric of focus was SO4
concentrations (Silberbauer, 2011). Spatial data with SO4 measurements was
collected via Python scripts and analyzed using GIS. The resulting maps visually
indicate a negative correlation between distance from mines and water sulfur
concentrations: the closer a water quality measurement is to a mining location,
the higher the expected sulfur content. The regression coefficient on distance to the
nearest mine was statistically significant at the 1% level, indicating our hypothesis
has some merit. However, there are many additional factors that affect the SO4 levels
in water, and our R2 value is quite
low. An area of concern is that some SO4 readings are well over the EPA’s safe level
for drinking water, 250 mg/l (EPA, 2012), and indicate that improved mining regulations
and enforcement may improve overall water quality in areas near mines in South
Africa. The work in this paper also serves as a stepping stone for further research on
the environmental impact of resource extraction.
Project Description
South Africa is one of the largest mining countries in the world, producing more
chrome, manganese, platinum, vanadium, and vermiculite than any other country.
South Africa produces over 10% of the world’s gold, and produces 224 million tonnes
of coal annually. Over one million South Africans are employed in the mining industry
(Mining Intelligence Database, 2014). While this mineral extraction helps bolster the
South African economy, mining can have negative environmental impacts on water,
air, and land quality. Woldai discusses the various environmental impacts of mining, as
well as various GIS methods that can be applied to measure land impact (2001). In a
GIS case study performed by Karimipour et al., the concentration of mines was shown
to have a statistically significant positive effect on the pH level in nearby rivers in Iran
(2005, p. 71). The goal of this paper is to utilize similar techniques and determine a
quantitative effect of mining on water quality.
Data was collected from various sources online, either in precompiled
shapefiles or scraped from a website using a python script. The data was analyzed in
ArcGIS with various visualization features and ordinary least squares regressions.
Data
Mining
The attribute data of 699 mines in South Africa was taken from the U.S.
Geological Survey (USGS, 2014). This data contains coordinate location, production
dates, mineral types, and current status of the mines. However, much of the data is
incomplete (null values), making quantitative analysis difficult. For example, production
dates are only available for 245 of the 699 mining locations.
Clearly, mines that have not been built yet will have no effect on water quality.
A distribution of the first year of production for each mine is shown in Figure 1. The
latest recorded start date for a mine was 1993, with an average starting year of 1949
(Figure 1). Treating the distribution as normal, a tolerance interval indicates with
90% confidence that 95% of the mines started production before the year 1997. It is
therefore reasonable to assume that almost all of the mines were in production before
the year 2000.
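The tolerance statement above can be illustrated with a minimal sketch. It assumes a one-sided upper bound and uses Natrella's closed-form approximation for the normal tolerance factor; the paper does not state which method was used. The sample years below are hypothetical placeholders, not the USGS data, and the function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sided_tolerance_factor(n, coverage=0.95, confidence=0.90):
    """Approximate one-sided normal tolerance factor k (Natrella's formula)."""
    z_p = NormalDist().inv_cdf(coverage)    # quantile for the covered share
    z_g = NormalDist().inv_cdf(confidence)  # quantile for the confidence level
    a = 1.0 - z_g ** 2 / (2.0 * (n - 1))
    b = z_p ** 2 - z_g ** 2 / n
    return (z_p + sqrt(z_p ** 2 - a * b)) / a


def upper_tolerance_bound(years, coverage=0.95, confidence=0.90):
    """Year below which `coverage` of start years is expected to fall."""
    k = one_sided_tolerance_factor(len(years), coverage, confidence)
    return mean(years) + k * stdev(years)


# Hypothetical start years for illustration only (NOT the USGS records).
sample_years = [1890, 1910, 1925, 1938, 1949, 1955, 1962, 1970, 1981, 1993]
bound = upper_tolerance_bound(sample_years)

# Tolerance factor for a sample of 245 mines, the count with known start dates.
k_245 = one_sided_tolerance_factor(245)
```

With 245 observations the factor is only slightly larger than the plain 95% normal quantile, so the bound is driven mainly by the sample mean and standard deviation.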
While the environmental impact of an individual mine will likely change as
production levels change, abandoned mines have been shown to have a significant
effect on the environment well after the mines are no longer under production.
Abandoned mines were often not cleaned up properly, and may continue to leach
chemicals into the surrounding waters (Khalil et al., 2014). For this reason, it is
assumed that all mining activity in the past still has some effect on water quality
measurements taken recently.
The biggest limitation of the data is the lack of production volume information. It
is difficult to distinguish between mines that are producing large volumes and those
that are producing very little. Due to this lack of information, all mining locations are
treated equally. For the calculations, only the geographic coordinates of the mines are
used. In the regressions, the number of mines in each municipality is used, and the
distance from water quality measurement to the nearest mine is used.
Water Quality
Water quality measurements are available from the South Africa Water and
Sanitation Department (South Africa, 2014). Water quality data is available for both
surface water (rivers, ponds, reservoirs, etc) and ground water (drilled boreholes,
wells, etc). Regressions were performed only on surface water quality data.
Figure 1. Distribution of the first year of production for 245 mines in South Africa.
Regressions using ground water were avoided because of the difficulty involved in
controlling for depth of sample.
While GIS compatible KML files are available for each drainage region, the files
only contain the coordinate location of each water quality measurement point (and no
actual SO4 value). The SO4 measurements are only available as separate CSV files
for each point reading, downloadable as individual zip files from the Water and
Sanitation Department website. To perform spatial analysis, the data must be
combined in one shapefile. To do this, a Python script was written that automatically
scrapes each table of data points (Hoefer, 2014). The script temporarily downloads the
zip file, extracts the data from the individual CSV files, and combines the water quality
data with the geographic coordinates found in the online tables. The output of the script
is one CSV file per drainage region, taking the most recent water quality reading from
each geographic station. A Windows batch file was used to combine the output of the
Python script into one large CSV file for import into ArcGIS.
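The merge and "most recent reading per station" steps can be sketched as follows. This is not the author's scraper (which is available only by request); the column names `station_id`, `date`, and `so4_mg_l`, the date format, and the in-memory CSV strings are all assumptions made for illustration. The year cutoff parameter reflects the paper's choice to ignore readings before 2000.

```python
import csv
import io
from datetime import datetime


def merge_csv_texts(csv_texts):
    """Concatenate the rows of several CSV documents that share one header."""
    rows = []
    for text in csv_texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def latest_reading_per_station(rows, min_year=2000):
    """Keep the most recent SO4 reading per station, skipping older years."""
    latest = {}
    for row in rows:
        sampled = datetime.strptime(row["date"], "%Y-%m-%d")
        if sampled.year < min_year or row["so4_mg_l"] == "":
            continue  # too old, or no sulfate value recorded
        key = row["station_id"]
        if key not in latest or sampled > latest[key][0]:
            latest[key] = (sampled, row)
    return [row for _, row in latest.values()]


# Hypothetical per-region files (column names and values are made up).
region_a = (
    "station_id,lat,lon,date,so4_mg_l\n"
    "A1,-26.1,28.0,2003-05-01,120.5\n"
    "A1,-26.1,28.0,2011-08-15,98.2\n"
)
region_b = (
    "station_id,lat,lon,date,so4_mg_l\n"
    "B7,-33.9,18.6,1998-02-10,40.0\n"
    "B7,-33.9,18.6,2009-11-30,55.1\n"
)
combined = latest_reading_per_station(merge_csv_texts([region_a, region_b]))
```

Doing the merge in the same script would remove the need for a separate batch file, at the cost of holding all rows in memory at once.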
Readings before the year 2000 were ignored for multiple reasons. First, the
python script would take significantly longer to run, and it was infeasible given the time
constraints of the research project. Secondly, the number of data points taken after the
year 2000 is extremely large, with nearly 9000 borehole measurements and over 4000
surface water measurements. Additional data points were not necessary to make
statistically significant inferences. Finally, the data needed to be measured after the
mines were in use. As inferred above from Figure 1, nearly all of the mines were in
production before the year 2000.
Figure 2. Distribution of SO4 Concentrations of surface water measurements that are
within one decimal degree of a mine.
For all regressions, water measurements more than one decimal degree (roughly 111 km,
or 69 miles, of latitude) from the nearest mine were excluded. This removed data
points that were unlikely to be affected by mining and would only skew the results. In
addition, only the middle 99 percent of sulfur readings were used, to remove outliers.
This corresponds to values that are between 1.47 and 3858.80 mg/l (Figure 2).
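A minimal sketch of the two filters described above, assuming that "middle 99 percent" means keeping values between the 0.5th and 99.5th percentiles of the readings that pass the distance filter. The field names `dist_to_mine_deg` and `so4_mg_l` are hypothetical, and the data is synthetic.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("no values to summarize")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def filter_readings(readings, max_distance_deg=1.0, keep_middle_pct=99.0):
    """Drop readings far from any mine, then trim both tails of SO4 values."""
    near = [r for r in readings if r["dist_to_mine_deg"] <= max_distance_deg]
    values = sorted(r["so4_mg_l"] for r in near)
    tail = (100.0 - keep_middle_pct) / 2.0
    low = percentile(values, tail)
    high = percentile(values, 100.0 - tail)
    return [r for r in near if low <= r["so4_mg_l"] <= high]


# Synthetic readings: SO4 values 0..200 mg/l, all half a degree from a mine,
# plus one reading that is too far away to be kept.
readings = [{"so4_mg_l": float(v), "dist_to_mine_deg": 0.5} for v in range(201)]
readings.append({"so4_mg_l": 50.0, "dist_to_mine_deg": 1.5})
kept = filter_readings(readings)
```

The order of the two steps matters: trimming before the distance filter would compute the percentile cutoffs on a different set of readings.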
Land, Cities, and Rivers
The base shapefile was taken from the 2011 South African Census (Africa Open
Data, 2011). The 243 municipalities were used as the basic administrative units in
which the maps were created. The shapefile only has information about the geographic
location of each municipality. Additional demographic data for each municipality can
be found in the 2011 South African Census (Statistics South Africa, 2011). There
is potential for this data to be added as additional control variables in regressions.
However, extracting this data and adding the information to the existing shapefile
would require either a significant amount of hand data entry, or writing another
extraction script.
A shapefile of the cities and rivers in South Africa was downloaded from Natural
Earth and included in the maps (Natural Earth, 2014). The city data also contained
population information.
Methodology
Data was analyzed using the industry standard GIS software ArcGIS. Using the
water quality CSV files produced by the python script, the data was imported into
ArcGIS as a feature class. This created a map of all the water quality measurement
points with the respective sulfur measurements.
Visual Representation
The first step was to visualize the water quality with respect to mining locations.
The goal was to create a shaded map that shows the average water quality level in
each municipality, and overlay the averages with mine locations. The shapefile of
municipalities was spatially joined with the water quality measurements located inside
each municipality. For each municipality, an unweighted average sulfur value was
assigned based on the value of each data measurement inside that municipality. The
date of the water quality measurement was ignored, i.e., all water quality
measurements between 2000 and 2014 were averaged together and assigned to the
containing municipality. The shade of each municipality was displayed based on the
average sulfur level. Major cities and rivers were also added to the map. A separate
map was created for surface water and ground water. A copy of the maps and
discussion can be found in the results section.
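The spatial join and unweighted averaging were done in ArcGIS; the sketch below only illustrates the idea in plain Python. It assumes each municipality is a single simple polygon (no holes or multi-part shapes) and uses a standard ray-casting test. The municipality names, polygons, and readings are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def average_by_municipality(municipalities, readings):
    """Unweighted mean SO4 per municipality; None where no reading falls."""
    sums = {name: [0.0, 0] for name in municipalities}
    for lon, lat, so4 in readings:
        for name, polygon in municipalities.items():
            if point_in_polygon(lon, lat, polygon):
                sums[name][0] += so4
                sums[name][1] += 1
                break  # assign each reading to one municipality only
    return {
        name: (total / count if count else None)
        for name, (total, count) in sums.items()
    }


# Hypothetical unit-square "municipalities" and (lon, lat, SO4) readings.
municipalities = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(2, 0), (3, 0), (3, 1), (2, 1)],
}
readings = [(0.5, 0.5, 100.0), (0.25, 0.75, 200.0), (1.5, 0.5, 40.0)]
averages = average_by_municipality(municipalities, readings)
```

Municipalities without any reading are reported as `None` rather than zero, so they can be shown as "no data" on a shaded map instead of appearing clean.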
Quantitative Effects of Mining
The second methodology involved spatial regressions to investigate the
quantitative effects of mining on surface water quality. Two approaches were used.
The first approach treats each municipality as a separate data point. Each municipality
is assigned an SO4 value, calculated as an unweighted average of each water quality
measurement inside the respective municipality. In addition, each municipality is
assigned a “Number of Mines” value that represents the number of mining locations
geographically inside each respective municipality.
Avg_SO4 = β0 + β1 * Number of Mines    (1)
The next approach focused on the individual water measurements, rather than
the municipality averages. To do this, the mining data was joined to the water quality
measurements. For each water quality data point, an additional variable was added
called “Distance to nearest mine.” This represents the distance between the location
of water quality measurement and the nearest mine. Different ordinary least squares
regressions were run using the following equations.
SO4 = β0 + β1 * Distance to nearest mine    (2)
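As an illustration of equation (2), the sketch below computes a distance-to-nearest-mine variable and fits a one-variable ordinary least squares line in closed form. It assumes distances are straight-line distances in decimal degrees of longitude and latitude, which matches the units used for the coefficients in this paper but is not confirmed as the exact ArcGIS calculation. All coordinates and SO4 values are made up.

```python
from math import hypot


def distance_to_nearest(point, targets):
    """Straight-line distance in decimal degrees to the closest target."""
    lon, lat = point
    return min(hypot(lon - t_lon, lat - t_lat) for t_lon, t_lat in targets)


def simple_ols(x, y):
    """Closed-form OLS for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical mine and station coordinates as (lon, lat).
mines = [(28.0, -26.0), (18.5, -33.9)]
station = (28.3, -26.4)
d = distance_to_nearest(station, mines)

# Hypothetical distances (degrees) and SO4 readings (mg/l) on an exact line.
distances = [0.0, 0.5, 1.0]
so4 = [130.0, 80.0, 30.0]
b0, b1 = simple_ols(distances, so4)
```

A degree of longitude is shorter than a degree of latitude away from the equator, so a distance in metres on a projected coordinate system would be a more physically meaningful regressor.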
A second regression was run using the same data as above, and with a second
explanatory variable called “Distance to nearest city,” which represents the distance
from the water quality data location to the nearest city.
SO4 = β0 + β1 * Distance to nearest mine + β2 * Distance to nearest city    (3)
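Equation (3) has two explanatory variables, so the coefficients come from solving the normal equations rather than from a single closed-form ratio. The sketch below does this with Gauss-Jordan elimination using only the standard library. It is an illustration of ordinary least squares in general, not a reproduction of the ArcGIS tool, and the data points are synthetic values generated from assumed coefficients.

```python
def ols(X, y):
    """OLS coefficients [b0, b1, ...] from the normal equations (X'X) b = X'y.

    X is a list of rows of explanatory values (without the intercept column).
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic (distance to mine, distance to city) pairs in decimal degrees.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.25)]
# SO4 generated exactly from assumed coefficients 150, -80, -75.
y = [150.0 - 80.0 * dm - 75.0 * dc for dm, dc in X]
coefficients = ols(X, y)
```

If two explanatory variables are perfectly collinear, X'X is singular and this routine fails with a division by zero; a production analysis would normally rely on a statistics package that also reports standard errors and p-values.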
Results and Discussion
The visual representations of water quality in South Africa are shown in Figures 3
and 4.
Figure 3. Surface water sulfur averages by municipality.
Figure 4. Ground water sulfur averages by municipality.
Each map is shown with rivers as blue lines and cities as green dots sized
according to population. Mining locations are shown as purple dots, and the shaded
red areas represent the average sulfur levels in each municipality. The darker the red,
the higher the average sulfur value. A visual inspection of the map of surface water
(Figure 3) indicates the possibility of a correlation between mining locations and sulfur
levels. However, this also appears to correlate with the location of large cities (such as
Johannesburg and Cape Town). For surface water in general, the darker areas on the
map are closer to mines than the lighter areas.
However, sulfur measurements taken from ground water do not appear to
correlate as much with the location of mines (Figure 4). This makes sense, as borehole
water quality measurements are often taken at depths that may not be affected
by water runoff from mines. In addition, larger bodies of surface water such as rivers and
reservoirs could potentially have more contributing sources which could be polluted by
the mines.
Regression Output
After seeing the potential for correlation for the surface water quality
measurements, the regression models were run using ArcGIS.
Table 1. Regression results for Average SO4 (1)
Variable            Coefficient (p-value)
Intercept           92.3148 (0.0000)***
Number of Mines     0.31 (0.8619)
Adjusted R2         -0.0042
P-values are shown in parentheses beside the coefficients.
*, ** and *** denote significance at the 10, 5, and 1% levels, respectively.
The sign of the coefficient on Number of Mines is what was expected. Each
additional mine in a municipality is associated with an increase in the sulfur level of
about 0.31 mg/l. However, the p-value, 0.8619, shows this coefficient is not
statistically significant. In addition, the negative adjusted R2 value shows this model
does not explain the variation in average sulfur levels in the water. The model is
missing important explanatory variables, and cannot be used to make any significant
inferences.
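For readers wondering how an R2 measure can be negative, the standard definition of adjusted R2 is sketched below. Here n (the number of observations) and p (the number of explanatory variables, excluding the intercept) are generic symbols; the paper does not report n for this regression, so no value is substituted.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1},
\qquad
\bar{R}^2 < 0 \iff R^2 < \frac{p}{n - 1}
```

With a single explanatory variable (p = 1), the adjusted value drops below zero whenever the ordinary R2 is smaller than 1/(n - 1), which is what happens when the variable explains essentially none of the variation.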
The second and third regressions attempt to explain individual water
measurements, rather than municipality averages.
Table 2. Regression results for SO4 (2) and (3)
Variable                    Model (2)                Model (3)
Intercept                   125.9679 (0.0000)***     151.4724 (0.0000)***
Distance to Nearest Mine    -106.0348 (0.0000)***    -83.2402 (0.0002)***
Distance to Nearest City    N/A                      -75.1137 (0.0001)***
Adjusted R2                 0.0009                   0.0141
P-values are shown in parentheses beside the coefficients.
*, ** and *** denote significance at the 10, 5, and 1% levels, respectively.
The coefficient on Distance to Nearest Mine is significant at the 1% level for
both regressions and has the expected sign. The negative sign indicates that
the closer the water is to a mine, the higher the SO4 in the water. The adjusted R2 of
model (2), 0.0009, indicates that important explanatory factors are missing.
Adding the Distance to Nearest City as an explanatory factor improves the adjusted R2 to
0.0141. While this model explains less than 2% of the variation in sulfur
concentrations in surface water, both explanatory factors are shown to be significant at
the 1% level. Moving one decimal degree away from the nearest mine is associated with a
drop in sulfur levels of about 83.2402 mg/l. Moving one decimal degree away from the
nearest city is associated with a drop in sulfur levels of about 75.1137 mg/l.
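The interpretation of the Model (3) coefficients can be made concrete with a small worked calculation. The coefficients are copied from Table 2; the function name and the example distances are illustrative assumptions, and the output is a fitted value from a model with very low explanatory power, not a reliable prediction.

```python
# Coefficients copied from Table 2, Model (3).
INTERCEPT = 151.4724  # mg/l
B_MINE = -83.2402     # mg/l per decimal degree from the nearest mine
B_CITY = -75.1137     # mg/l per decimal degree from the nearest city


def fitted_so4(dist_mine_deg, dist_city_deg):
    """Fitted SO4 (mg/l) from Model (3) for the given distances in degrees."""
    return INTERCEPT + B_MINE * dist_mine_deg + B_CITY * dist_city_deg


at_mine_and_city = fitted_so4(0.0, 0.0)  # equals the intercept
half_degree_away = fitted_so4(0.5, 0.5)  # hypothetical example location
effect_of_one_degree = fitted_so4(1.0, 0.0) - fitted_so4(0.0, 0.0)
```

The difference between the last two fitted values reproduces the coefficient on distance to the nearest mine, which is what "a drop of about 83 mg/l per decimal degree" means when the distance to the nearest city is held fixed.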
The low R2 of these models shows the complexity involved in evaluating the
effects of mining on the environment. None of the models can be used for accurate
prediction of sulfur content based on the regression results. Regardless, these two
regressions show a statistically significant association between mining proximity and
water quality.
Issues and future research potential
One of the biggest limitations of this study was the lack of available data with which
to construct a model. If data could be collected, the following explanatory variables
could be explored to better describe the variation in sulfur levels (Wolfe, n.d.).
• Mining production volume within a 10 mile radius of the water measurement.
  o Break down mining production by commodity type to see if certain types
    of minerals cause more contamination.
• Distance from other manufacturing plants that could pollute water, such as
  paper production or metal processing.
• Agricultural runoff
• Fossil fuel consumption concentration
• Sewage levels
The locations of the water quality data are also a concern. Simply taking the
average sulfur levels of all water measurements could lead to an inaccurate
representation of a municipality’s water quality situation.
As shown in Figure 5, water quality readings are not evenly
distributed throughout the country. Some municipalities have no water quality readings
after the year 2000, while others have hundreds. Depending on the specific location of
these water measurements, the average sulfur reading could be skewed. For example,
if there are 20 water quality points in one municipality, and 19 come from the same
clean reservoir, the average will be skewed to the low end regardless of how mining
affects other bodies of water in the municipality.
Figure 5. Maps of sulfur concentrations in surface and ground water in South Africa
A big limitation of a large scale (country wide) study like this is the lack of
understanding of causal relationships on a more local level. Simply measuring the
distance from the water to the mine does not take into account water flow or other
physical factors. For example, if water quality is measured upstream from a mine, the
mine will likely have less of an effect than if the water was measured downstream.
Even if the downstream measurement is geographically further away from the mine, it
may receive more pollution than a closer measurement upstream.
This study does not differentiate between readings taken at different points in
time. Adding a time element to the study would allow for a longitudinal evaluation of
mining and environmental degradation. This is possible in ArcGIS using the ArcMap
Time Slider (ESRI, 2012).
Finally, the impact of resource extraction is not purely an economic question.
Familiarity with natural sciences, statistics, GIS, and even meteorology would be
required to truly understand how human actions interact with and impact the
environment. Enlisting the help of a cross functional team with a variety of
specializations could help to further understand the problem.
Summary
The results of this study help showcase the complex issues involved in
characterizing the environmental impact of resource extraction. Visual mapping and
spatial analysis of water quality data suggest a correlation between mining locations
and sulfur concentrations in water. Further data needs to be collected before a
complete quantitative model can be constructed from which predictions can be made.
Evaluation / What I learned
My experience in UROP has been interesting, challenging, and eye opening. As
an engineer, it was difficult for me to switch to the mindset of an economist performing
research. The original research question was looking at the impact of democracy on
resource extraction in resource-rich countries. After performing a literature review on the
subject and investigating the available data, I determined it was nearly impossible to
truly characterize the environmental impact of resource extraction without looking on a
more local scale. Available environmental indices have so many contributing factors,
and to assign differences to resource extraction is difficult.
I decided to focus closely on South Africa for a few reasons. First of all, mining
data and water quality data were readily available online. Secondly, the data was in
English and easy to understand. Lastly, South Africa is one of the largest mining
countries in the world, so an environmental impact would be more easily
noticeable.
Much of my time was spent writing the data gathering script in python and
learning GIS software. I first started off by using open source software, which was
difficult to use. In the end, my supervisor helped me get in touch with another faculty
member who uses the popular proprietary software ArcGIS.
To help me get started, I contacted a professor who works for the South African
Government in water quality evaluation. He was able to point me in the right direction
and show me resources. This helped me learn the importance of collaboration in
research. Simply asking the right people can save time and help you discover accurate
answers to your questions.
I also learned that research can be a slow process, and a process that can
change at any time. It is important to take time and fully understand the data and
methods you are working with, and not to rush the research process. In addition,
reading the works of other researchers is just as important as doing your own work.
Standing on the shoulders of giants is the best way to reach the sky.
Acknowledgements
I am grateful to Hanna Krings for providing project supervision and research
guidance, and Michael Silberbauer for his help with water quality data.
References
Africa Open Data. 2011. Census 2011 Spatial Geography. Shapefile. Downloaded
from: http://africaopendata.org/dataset/cen
EPA. (2012, March 6). Sulfate in Drinking Water. Retrieved from
http://water.epa.gov/drink/contaminants/unregulated/sulfate.cfm
ESRI. (2012). ArcGIS Help 10.1, Using the Time Slider Window.
Gray, N. F. "Environmental impact and remediation of acid mine drainage: a
management problem." Environmental Geology 30.1-2 (1997): 62-71.
Hoefer, Michael. (2014). SouthAfricaWaterQualityScraper.py [Python Script].
Available via email request to mjhoefer@iastate.edu.
Karimipour, F., Delavar, M. R., & Kinaie, M. (2005). Water quality management using
GIS data mining. Journal of Environmental Informatics, 5(2), 61-71.
Khalil, A., Hanich, L., Hakkou, R., & Lepage, M. (2014). GIS-based environmental
database for assessing the mine pollution: A case study of an abandoned
mine site in Morocco. Journal of Geochemical Exploration.
Mining Intelligence Database. (2014). Mining in South Africa.
Natural Earth. 2014. 1:10m Cultural Vectors, Populated Places. Shapefile.
Downloaded from: http://www.naturalearthdata.com/downloads/10m-cultural-vectors/
Natural Earth. 2014. 1:10m Physical Vectors, Rivers + Lake Centerlines. Shapefile.
Downloaded from: http://www.naturalearthdata.com/downloads/10m-physical-vectors/
Silberbauer, M. (2011). Multivariate point data visualisation - Geographical
information systems developments to aid in water quality management. AGILE
Conference.
South Africa Department of Water and Sanitation. (2014, June 2). [Resource Quality
Services water quality monitoring sites grouped by primary drainage region].
Unpublished raw data.
South Africa, Statistics South Africa. (2011). Census 2011 Municipal Fact Sheet.
USGS Mineral Resources Online Spatial Data. 2014. South Africa. Shapefile.
Downloaded from: http://mrdata.usgs.gov/mrds/
Woldai, T. (2001). Application of remotely sensed data and GIS in assessing the
impact of mining activities on the environment. 17th International Mining
Congress and Exhibition of Turkey.
Wolfe, V. (n.d.). Lifeboat Earth, What pollutes our water?