Big Data Analyses for Collective Opinion Elicitation in Social Networks
Yingxu Wang
Dept. of Electrical and Computer Engineering
Schulich School of Engineering, Univ. of Calgary
Calgary, Alberta, Canada T2N 1N4
e-mail: yingxu@ucalgary.ca
Victor J. Wiebe
Dept. of Electrical and Computer Engineering
Schulich School of Engineering, Univ. of Calgary
Calgary, Alberta, Canada T2N 1N4
e-mail: victor_mx@shaw.ca
Abstract—Big data are extremely large-scale data in terms of quantity, complexity, semantics, distribution, and processing costs in computer science, cognitive informatics, web-based computing, cloud computing, and computational intelligence.
Censuses and elections are a typical paradigm of big data
engineering in modern digital democracy and social networks.
This paper analyzes the mechanisms of voting systems
and collective opinions using big data analysis
technologies. A set of numerical and fuzzy models for
collective opinion analyses is presented for applications
in social networks, online voting, and general elections.
A fundamental insight on the collective opinion
equilibrium is revealed among electoral distributions
and in voting systems. Fuzzy analysis methods for
collective opinions are rigorously developed and applied
in poll data mining, collective opinion determination,
and quantitative electoral data processing.
Keywords-Big data; big data engineering; numerical
methods; fuzzy big data; social networks; voting; opinion
poll; collective opinion; quantitative analyses
I. INTRODUCTION
Big data is one of the representative phenomena of the
information era of human societies [8, 16]. Almost all fields
and hierarchical levels of human activities generate
exponentially increasing data, information, and knowledge.
Therefore, big data engineering has become one of the fundamental approaches to embodying the essence of the abstraction and induction principles in rational inference, where discrete data represent continuous mechanisms and semantics.
One field of big data applications is human memory and DNA analysis in neuroinformatics, cognitive biology, and brain science, where huge amounts of data and information have been obtained and are pending efficient processing [1, 3, 10, 17]. For instance, the biological information contained in a DNA molecule is identified as up to 33 Peta-bit, i.e., 32,985,348,833,280,000 bit or 32,985,348 Giga-bit, of genetic information according to a formal neuroinformatics model [33].
Another paradigm of big data generated in computing is the Internet traffic shown in Table I, based on statistics from 2012 [31]. The big data over the Internet indicate human communication and information searching demands via digital devices such as over 4.6 billion mobile phones and an equivalent number of tablets and portable computers. The big data in this domain have pushed the daily traffic from the rate of Terabytes (10^12) to that of Petabytes (10^15).
TABLE I. THE BIG DATA TRAFFIC ON THE INTERNET IN 2012

Data hub                 Data traffic rate/day
NYSE                     1.0 Terabytes
Twitter                  7.0 Terabytes
Facebook                 10.0 Terabytes
Google                   24.0 Petabytes
Total Internet traffic   667.0 Exabytes (10^18)
Censuses and general elections are the traditional and typical domains that demand efficient big data analysis theories and methodologies beyond number counting [5, 13]. In modern digital societies and social networks, popular opinion collection via online polls and voting systems has become necessary for policy confirmation and general elections.
One of the central sociological principles adopted in popular elections and voting systems is the majority rule, where each vote is treated with an equal weight [2, 4, 7]. The conventional methods for embodying the majority rule may be divided into two categories known as the methods of max counting and average weighted sum. The former is the most widely used technology, which determines the simple majority by the greatest number of votes on a certain opinion among multiple or binary options. The latter assigns various weights to optional opinions, which extends the binary selection to a wide range of weighted ratings. Classic implementations of these voting methods were proposed by Borda, Condorcet, and others [5, 11, 12]. Borda introduced a scale-based system where each cast vote is assigned a rank that represents an individual's preferences [5]. Condorcet developed a voting technology that determines the winner of an election as the candidate who prevails when paired against all alternatives in run-off votes [11]. However, formal voting and general elections mainly adopt the mechanism that implements a selection of only one out of n options without any preassigned weight. In this practice for implementing the majority rule in societies, the average weighted sum method is impractical.
This paper analyzes the formal mechanisms of voting
systems and collective opinion elicitation in the big data
engineering approach. The cognitive and computing
properties of big data in general, and of the electoral big data
in particular, are explored in Section II. A set of
mathematical models and numerical algorithms for collective
opinion analyses is developed in Section III and illustrated in
Section IV. Fuzzy models for collective opinion elicitation
and aggregation are rigorously described in Section V. A set
of real-world case studies on applications of the formal
methodologies is demonstrated in big poll data mining,
collective opinion determination, and quantitative electoral
data processing.
II. PROPERTIES OF DATA IN BIG DATA ENGINEERING
This section explores the intensions and extensions of big data as a term.
analyzed. Special properties of big data are elaborated in
computer science, cognitive informatics, web-based
computing, and computational intelligence.
A. The Computational Properties of Big Data
Definition 1. Data,D, are an abstract representation of
the quantity Q of real-world entities or mental objects by a
quantification mapping fq, i.e.:
q
Df: Q o (1)
Although decimal numbers and systems are mainly adopted in human civilization, the basic unit of data is a bit [9, 15], which forms the converged foundation of computer and information sciences. Therefore, the most fundamental form of information that can be represented and processed is binary data. Based on the bit, complex data representations can be aggregated into higher structures such as bytes, natural numbers (ℕ), real numbers (ℝ), structured data, and databases.
The physical model of data and data storage in computing and the IT industry is the container metaphor, where each bit of data requires a bit of physical memory.
Definition 2. Big data are extremely large-scale data across all aspects of data properties such as quantity, complexity, semantics, distribution, and processing costs.
The basic properties of big data are that they are unstructured, heterogeneous, monotonically growing, mostly nonverbal, and subject to decay in information consistency or increase of entropy over time [20]. The inherent complexity and exponentially increasing demands create unprecedented problems in all aspects of big data engineering such as big data representation, acquisition, storage, searching, retrieval, distribution, standardization, consistency, and security.
The sources of big data are human collective intelligence. Typical mathematical and computing activities that generate big data are Cartesian products (O(n^2)), sorting (O(n log n)), searching (exhaustive, O(n^2)), knowledge base updates (O(n^2)), as well as permutation and NP problems with O(2^n), O(n!), or even higher orders [9]. Typical human activities that produce big data include many-to-many communications, massive downloads of data replications, digital image collections, and networked opinion forming.
Although the syntax of data is concrete based on computation and type theories, the semantics of data is fuzzy [24, 25, 27, 32, 33]. The analysis and interpretation of big data may easily exceed the capacity of conventional counting and statistics technologies.
B. The Cognitive Properties of Big Data
The neurophysiological metaphor of data as factual
information and knowledge in human memory is a
relational network [10, 17, 19, 26], which can be
represented by the Object-Attribute-Relation (OAR) model
[19, 29] as shown in Figure 1.
Figure 1. The OAR model of data and knowledge in memory
Definition 3. The OAR model of data and knowledge as retained in long-term memory of the brain is a triple, i.e.:

$$OAR \triangleq (O, A, R) \qquad (2)$$

where O is a finite set of objects denoting the extension of a data concept, A is a finite set of attributes for characterizing the data concept, and R is a set of relations between the objects and attributes.
The OAR model enables the estimation of the capacity of human memory, which reveals the nature of big data as cognitive and semantic entities in the brain. In cognitive neurology, it is observed that there are about 10^11 neurons in the brain, each with about 10^3 synaptic connections on average [3, 10]. According to the OAR model, the estimation of the capacity of human memory for big data representation can be reduced to a classical combinatorial problem as follows.
Definition 4. The capacity of human memory C_m is determined by the total potential relational combinations, $C_n^s$, among all neurons n = 10^11 and their average synaptic connections s = 10^3 to various related subsets of the entire set of neurons, i.e.:

$$C_m \triangleq C_n^s = \frac{10^{11}!}{10^{3}!\,(10^{11} - 10^{3})!} \approx 10^{8,432}\ \text{[bit]} \qquad (3)$$

Eq. 3 provides an analytic explanation of the upper limit of the potential number of synaptic connections among neurons in the brain. The model reveals that the brain does not create new neurons to represent new information; instead, it generates new synapses between existing neurons in order to represent the newly acquired information.
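The magnitude of Eq. 3 can be cross-checked numerically. The following is a minimal MATLAB sketch (not part of the paper's algorithms) that evaluates the combination through the log-gamma function, since the factorials overflow any numeric type:

% Cross-check of Eq. 3: log10 of C(n, s) via gammaln
n = 1e11; s = 1e3;
log10Cm = (gammaln(n + 1) - gammaln(s + 1) - gammaln(n - s + 1)) / log(10);
fprintf('Cm = 10^%.0f [bit]\n', log10Cm);   % prints Cm = 10^8432 [bit]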
Both cognitive and computational foundations of data
explored in this section explain the nature of big data and
the need for big data engineering. The notion of big data
engineering is perceived as a field that studies the
properties, theories, and methodologies of big data as well
as efficient technologies for big data representation,
organization, manipulations, and applications in industries
and everyday life. It is noteworthy that, although the
appearance of data is discrete, the semantics and
mechanisms behind them are mainly continuous. This is the
essence of the abstraction and induction principles of natural
intelligence.
III. METHODS FOR BIG ELECTORAL DATA ANALYSES
Mathematical models and numerical methods for
rigorous voting data processing and representation are sought
in this section in order to reveal the nature of big data in
voting and collective opinions. This leads to a set of novel
methods beyond traditional counting technologies such as
regressions of opinion spectrums, adaptive integrations of
collective opinions, and allocation of the opinion
equilibrium.
A. Big Data Interpretation for Embodying the Majority Rule in Sociology
As reviewed in Section I, the typical method for implementing the majority rule via voting has traditionally been the max finding method.
Definition 5. The max function elicits the greatest number of votes on a certain opinion, O_i, 1 ≤ i ≤ n, as the voting result V among a set of n options, i.e.:

$$V \triangleq \max(N_{O_1}, N_{O_2}, \ldots, N_{O_n}) \qquad (4)$$

where $N_{O_i}$ is the number of votes cast for opinion O_i. When there are only two options in the vote, Eq. 4 is reduced to a binary selection.
Although the conventional max finding method is widely adopted in almost all kinds of voting systems, it is an oversimplified method for accurate opinion collection. Its major disadvantage is that the implied winner-takes-all philosophy often overlooks the entire spectrum of distributed opinions. This leads to a pseudo majority dilemma [13, 20], which is analyzed as follows.
Definition 6. The pseudo majority dilemma states that the result of a vote based on the simple max mechanism may not represent the majority opinion distribution cast in the vote, i.e.:

$$V = \max(N_{O_0}, N_{O_1}, N_{O_2}, N_{O_3}, N_{O_4}) = N_{O_{max}} \;\not\Rightarrow\; N_{O_{max}} \geq \sum_{i=1}^{n} N_{O_i}\Big|_{i \neq i_{max}} \qquad (5)$$

where $\sum_{i=1}^{n} N_{O_i}\big|_{i \neq i_{max}} = \big(\sum_{i=1}^{n} N_{O_i}\big) - N_{O_{max}}$.
A typical case of the pseudo majority dilemma in voting can be elaborated in the following example.
Example 1. A vote with a distributed political spectrum from far right (N_O0), right (N_O1), neutral (N_O2), left (N_O3), to far left (N_O4) is shown in Figure 2, where the vote distribution is X = [N_O0, N_O1, N_O2, N_O3, N_O4] = [4000, 2500, 2600, 1200, 1100]. According to the max finding method given in Eq. 4, the voting result is:

$$V = \max(N_{O_0}, N_{O_1}, N_{O_2}, N_{O_3}, N_{O_4}) = \max(4000, 2500, 2600, 1200, 1100) = 4000 \;\Rightarrow\; O_0$$
Figure 2. Distribution of collective opinions and their votes
The result indicates that opinion O_0 is the winner and the other votes would be ignored. However, the sum of the remaining opinions, $\sum_{i=1}^{n} N_{O_i}\big|_{i \neq 0} = 2500 + 2600 + 1200 + 1100 = 7300$, is significantly greater than $N_{O_0}$ according to Eq. 5. Although the maximum vote appears at 0 on the opinion spectrum, the real representative centroid of the collective opinion is actually at about 1.3 on the spectrum. In other words, the mean of the entire set of votes indicates an equilibrium point of the collective opinion between those of N_O1 and N_O2 rather than at N_O0. Therefore, in order to rationally analyze popular opinion distributions and the representative collective opinion on an opinion spectrum, advanced mathematical models, numerical methods, and fuzzy analyses [6, 21, 32, 33] are yet to be rigorously studied for voting data processing and representation.
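The location of the equilibrium can be cross-checked by the discrete weighted mean of the Example 1 votes; a brief MATLAB sketch (illustrative, not one of the paper's algorithms):

% Discrete weighted mean of the Example 1 distribution
x = 0:4;                          % positions on the opinion spectrum
N = [4000 2500 2600 1200 1100];   % votes per opinion
[vmax, imax] = max(N);            % max method: O_0 wins with 4000 votes
centroid = sum(x .* N) / sum(N)   % = 1.38, near the ~1.3 centroid noted above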
B. Numerical Regression for Analyzing Opinion Spectrum Distributions beyond Counting
On the basis of the analyses in the preceding subsection, an overall perspective on the collective opinions cast in a vote can be rigorously modeled as a nonlinear function over the opinion spectrum. In order to implement a complex polynomial regression, a numerical algorithm is developed in MATLAB as shown in Figure 3, which can be applied to analyze any popular opinion distribution against a certain political spectrum represented by big voting data. In the analysis program, a 3rd-order polynomial is adopted for curve fitting, while other orders may be chosen where appropriate. The general rule is that the order of the polynomial regression m must be less than the number of points of the collected data n. Data interpolation technologies may be adopted to improve the smoothness of raw data or to fill in missing points in numerical technologies [6, 28, 32].
function VoteRegressionAnalysis(X, Y)
% Curve fitting by polynomial regression
format long;
m = 3;                               % order of the polynomial regression
n = length(X);                       % number of data points (m < n required)
p = polyfit(X, Y, m);
% Vote distribution regression (p has m+1 = 4 coefficients for a cubic)
f = @(x) polyval(p, x);
fprintf('f(x) = %10.4f*x.^3 + (%10.4f)*x.^2 + (%10.4f)*x + (%10.4f)\n', ...
    p(1), p(2), p(3), p(4));
plot(X, Y, '*r-'); hold on;
fplot(f, [X(1) X(n)]);
xlabel('Opinion spectrum (x)'); ylabel('Vote count f(x)');
legend('Projected votes', 'PolyRegression');
% Find the opinion equilibrium
[TotalOpinionIntegration, Equilibrium] = ...
    VoteEquilibriumAnalysis(f, X(1), X(n))   % call subfunction (Figure 5)
plot(Equilibrium, 0, '+r');

Figure 3. Algorithm of polynomial regression for opinion distributions
Applying the algorithm VoteRegressionAnalysis(X, Y), a specific polynomial function and a visualized perception of the entire opinion distribution can be rigorously obtained.
Example 2. The seat distribution of Canadian parties in the House of Commons is given in Table II [30]. In Table II, the relative position of each party on the political spectrum is obtained based on statistics of historical data such as their manifestos, policies, and common public perceptions [14, 18].
TABLE II. VOTING DATA DISTRIBUTION BY SEATS IN PARLIAMENT

Political party   Seats occupied   Relative position on the spectrum
New Democrats     100              -100
Bloc Quebecois    4                -71
Green             1                -43
Liberals          34               -14
Conservatives     160              50
According to the data in Table II, i.e., X = [-100, -71, -43, -14, 0, 50] and Y = [100, 4, 1, 34, 4, 160], the voting results can be rigorously represented by the following function, f(x), as a result of the polynomial regression implemented in Figure 3:

$$f(x) = 0.0001x^3 + 0.005x^2 + 2.175x + 65.1182 \qquad (6)$$

where m = 3 and n = 6.
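A usage sketch of the Figure 3 algorithm on the Table II data (assuming both functions are saved in one file or on the MATLAB path):

% Reproducing Example 2
X = [-100 -71 -43 -14 0 50];    % party positions on the spectrum
Y = [100 4 1 34 4 160];         % seats occupied
VoteRegressionAnalysis(X, Y);   % prints f(x), plots the fit, marks the equilibrium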
The above regression analysis results are visually plotted in Figure 4. Because the polynomial characteristic function is continuous, it can be easily processed for multiple applications such as opinion spectrum representation, equilibrium determination, and analyses of policy gains based on the equilibrium benchmark, as described in the following subsections.
Figure 4. House seats of parties on the political spectrum of Canada
C. The Collective Opinion Equilibrium Elicited from a Spectrum of Opinion Distributions
It is recognized that the representative collective opinion on a spectrum of opinion distributions cast in an election is not a simple average of a weighted sum as conventionally perceived. Instead, it is the centroid covered by the curve of the characteristic regression function, as marked by the red sign in Figure 4.
Definition 7. The opinion equilibrium Ξ is the natural centroid in a given weighted opinion distribution where the total votes of the left and right wings reach a balance at the point k, i.e.:

$$\Xi \triangleq (x \mid x = k), \quad \int_{-n}^{k} v(x)\,dx = \int_{k}^{n} v(x)\,dx, \quad x, k \in [-n, n] \qquad (7)$$

where

$$\int_{-n}^{k} v(x)\,dx = \frac{1}{2}\int_{-n}^{n} v(x)\,dx = \frac{1}{2}\Big(\int_{-n}^{k} v(x)\,dx + \int_{k}^{n} v(x)\,dx\Big)$$
The integration of distributed opinions based on the regression function can be carried out using any numerical integration technology. For instance, the iterative Simpson's integration method for an arbitrary continuous function f(x) over [a, b] can be described as follows:

$$I_o = \int_a^b f(x)\,dx = \sum_{p=1}^{n} \frac{h}{3}\big[f(x_{p_1}) + 4f(x_{p_2}) + f(x_{p_3})\big] \qquad (8)$$

where h is the step width of the n subintervals and $x_{p_1}, x_{p_2}, x_{p_3}$ are the endpoints and midpoint of the pth subinterval.
The collective opinion equilibrium method as modeled in
Eq. 7 is implemented in the algorithm as shown in Figure 5.
The core integration method adopted in the algorithm is
based on a built-in function quad() in MATLAB [6] that
implements Eq. 8.
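For reference, a self-contained MATLAB sketch of the composite Simpson rule of Eq. 8 is given below; the built-in quad() plays this role in Figure 5, and the subinterval count n here is an assumption:

% Composite Simpson integration of f over [a, b] with n subintervals (Eq. 8)
function I = SimpsonIntegration(f, a, b, n)
h = (b - a) / (2 * n);   % half-width of each subinterval
x = a : h : b;           % 2n + 1 equally spaced nodes; f must accept vectors
I = (h / 3) * (f(x(1)) + 4 * sum(f(x(2:2:end-1))) ...
    + 2 * sum(f(x(3:2:end-2))) + f(x(end)));
end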
Example 3. Applying the collective opinion equilibrium determination algorithm to the opinion distribution data of the Canadian general election as given in Figure 4, the opinion equilibrium is obtained as Ξ₁ = 20.3. The result indicates that the overall national opinion equilibrium was at the mid-right as cast in 2011.
function [TotalOpinionIntegration, Equilibrium] = VoteEquilibriumAnalysis(f, Xl, Xu)
% The integration of the total opinion in the vote
TotalOpinionIntegration = quad(f, Xl, Xu);   % Simpson integration
% Find the opinion equilibrium by iterative integration
h = 0.1;
for MidPoint = Xl : h : Xu
    IGi = quad(f, Xl, MidPoint);             % Simpson iterative integration
    if IGi >= TotalOpinionIntegration / 2
        break
    end
end
Equilibrium = MidPoint;

Figure 5. Algorithm of collective opinion equilibrium determination
Because the collective opinion equilibrium Ξ is the centroid of the opinion integration as defined in Eq. 7, it is obvious that the equilibrium cannot be simply determined or empirically allocated without the computational algorithm (Figure 5), as demonstrated in Example 3.
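A usage sketch of the Figure 5 algorithm against the Eq. 6 regression (illustrative; the coefficient signs of Eq. 6 are taken as printed above):

% Locating the equilibrium of the Eq. 6 regression over [-100, 50]
f = @(x) 0.0001*x.^3 + 0.005*x.^2 + 2.175*x + 65.1182;   % Eq. 6
[Io, Xi] = VoteEquilibriumAnalysis(f, -100, 50)          % expected Xi = 20.3 per Example 3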
IV. BIG ELECTORAL DATA PROCESSING
Using the methodologies developed in Section III, useful applications are demonstrated in this section with real-world data. The case studies encompass the analysis of a series of general elections in order to find the dynamic equilibrium shifts, and the extrapolation of potential policy gains based on the historical electoral data.
A. Analysis of a Series of Historical Elections Based on Equilibrium Benchmarking
A benchmark of opinion equilibrium can be established on the basis of a series of historical electoral data. Based on it, trends of the political equilibriums can be rigorously analyzed in order to explain: a) What was the extent of the serial shifts cast in the general elections? and b) Which party was closer to the political equilibrium represented by the collective opinions cast in the general elections?
Example 4. The trend in Canadian popular votes over time can be benchmarked by the results of the last four general elections as given in Table III. Applying the opinion equilibrium determination algorithm VoteEquilibriumAnalysis as given in Figure 5, the collective opinions distributed in Figure 6 can be rigorously elicited, which indicates a dynamic shifting pattern of the collective opinion equilibriums, i.e., 5.0 → 7.4 → 7.0 → 10.9, on the political spectrum [-100, 100] from 2004 to 2011.
The opinion equilibrium determination method provides insight for revealing the implied trends and the entire collective opinions distributed on the political spectrum. An interesting finding in Example 4 is that, although several parties on the left of the spectrum, -100 ≤ x < 0, had won significant numbers of votes as shown in Table III and Figure 6, the collective opinion equilibrium had mainly remained unchanged in the area of the centre-right, where Ξ = 7.6 on average.
TABLE III. HISTORICAL ELECTORAL DATA DISTRIBUTIONS OF CANADIAN GENERAL ELECTIONS

Political party       Votes 2004   Votes 2006   Votes 2008   Votes 2011   Relative position on the spectrum
Conservatives         4,019,498    5,374,071    5,209,069    5,832,401    50
Liberals              4,982,220    4,479,415    3,633,185    2,783,175    -14
Green                 582,247      664,068      937,613      576,221      -43
Bloc Quebecois        1,680,109    1,553,201    1,379,991    889,788      -71
New Democrats         2,127,403    2,589,597    2,515,288    4,508,474    -100
Opinion equilibrium   5.0          7.4          7.0          10.9

Figure 6. Polynomial regressions for federal elections from 2004 to 2011
B. Extrapolation for Potential Policy Gains Based on the Benchmarked Collective Opinion Equilibrium
The key objective of a party in a general election is to rationally predict the potential gain of a certain policy making or shift. The theory of the collective opinion equilibrium as developed in the preceding sections suggests that this target can be reached by adapting current policies towards the equilibrium benchmark.
Definition 8. A target gain in elections can be extrapolatively projected via a necessary shift of policy Δx = n' − n that satisfies the equilibrium benchmark Ξ by an updated regression of expected opinion distributions v'(x):

$$\Delta x \triangleq \begin{cases}
n' - n, & \text{a) at the right end, when } \int_{x=k}^{n'} v'(x)\,dx = I_o/2,\ n' \in [k, n] \\[4pt]
n' - n, & \text{b) at the left end, when } \int_{x=n'}^{k} v'(x)\,dx = I_o/2,\ n' \in [-n, k] \\[4pt]
a' - a,\ a, a' \in (-n, n), & \text{c) in between, when } \begin{cases} v'(a) = 1.1\,v(a), & a' \geq a \\ \int_{x=a}^{a'} v'(x)\,dx = 1.1\,v(a'), & a' \leq a \end{cases}
\end{cases} \qquad (9)$$

where the original position n, the point of equilibrium Ξ (k | x = k), and the total integrated votes I_o are known as results of the analyses in Section III.
The extrapolation method is divided into three cases according to the position of the party on the spectrum, which can be at either end or in the middle of the spectrum.
Example 5. Given a target of a 10% gain in terms of the number of votes in a future election for a party at n = 50 on the spectrum, what kind of policy manipulation may be needed to contribute towards the expected objective, based on the equilibrium benchmark cast in the latest election as obtained in Example 4?
Based on the historical data provided in Table III, i.e., X = [-100, -71, -43, -14, 50], Y_2011 = [4508474, 889788, 576221, 2783175, 5832401], and the total opinion integration I_o = 441854649, the projected electoral improvement problem can be represented as follows:

$$\begin{cases}
X' = [-100, -71, -43, -14, n'] \\
Y' = [4508474, 889788, 576221, 2783175, 5832401 \times 1.1 = 6415641] \\
I_o = 441854649
\end{cases}$$
Solving the problem according to Eq. 9(a), the following results are obtained: n' = 48.2 and Δx = n' − n = 48.2 − 50 = −1.8. That is, in order to gain 10% more votes, the party would need to shift its policy towards the collective equilibrium Ξ = 10.9 by 1.8 points, where the negative sign indicates a move towards the middle. In cases where other factors change as well, the problem becomes a multi-party gaming system. However, at any given moment, the system is still determinable based on the same analysis methods and algorithms as presented in Sections III and IV.
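A numerical sketch of solving Eq. 9(a) with a root finder; the refitting construction below is illustrative, not the paper's implementation:

% Solving Eq. 9(a) for n' in Example 5: refit the regression with a moved
% right endpoint and balance the right-wing integral against Io/2
X  = [-100 -71 -43 -14 50];
Yp = [4508474 889788 576221 2783175 5832401 * 1.1];   % 10% target at n = 50
k  = 10.9;  Io = 441854649;                           % benchmark from Example 4
vfit = @(np) polyfit([X(1:4) np], Yp, 3);             % updated regression v'(x)
g = @(np) quad(@(x) polyval(vfit(np), x), k, np) - Io / 2;
np = fzero(g, 50)                                     % expected n' = 48.2 per Example 5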
V. FUZZY METHODS FOR COLLECTIVE OPINION ELICITATION AND ANALYSIS BASED ON BIG POLL DATA
Big data analysis technologies for collective opinion elicitation based on historical data have been demonstrated in the preceding sections, which reveal that a party may gain more votes by adapting its policy towards the political equilibrium established in past elections. It is recognized that a social system is conservative and does not change rapidly over time, because of its huge population base and human cognitive tendencies, according to the long-lifespan system theory [23]. However, the collective opinion equilibriums do shift dynamically. Therefore, an advanced technology for enhancing potential policy gains is to calibrate the current collective opinion equilibrium by polls in order to support up-to-date analyses and predictions.
A. Fuzzy Elicitation of Collective Opinion from Big Poll Data Samples
The typical technology for detecting the current collective opinion equilibrium is polling. A poll may be designed to test the impact of a potential policy in order to establish a newly projected equilibrium. The projected equilibrium is then used to update and adjust the historical benchmark. In this approach, rational predictions of policy gains towards a general election or a social network vote can be obtained in a series of analytic regressions, as formally described in the remainder of this subsection.
Definition 9. An opinion o_i on a given policy p_i is a fuzzy set of degrees of weights $\omega_i^j$ expressed by j, 1 ≤ j ≤ m, groups in the uniform scale $\mathcal{I}$, i.e.:

$$o_i \triangleq f: p_i \to \mathcal{I},\ \mathcal{I} = [0, 1] = \{(p_i, \omega_i^1), (p_i, \omega_i^2), \ldots, (p_i, \omega_i^m)\} = \mathop{R}_{j=1}^{m}(p_i, \omega_i^j) \qquad (10)$$

where the big-R notation represents recurring entities or repetitive functions indexed by the subscript [20].
The normalized scale for fuzzy analyses is universal because any other scale can be mapped onto it.
Definition 10. A collective opinion $O_P$ on a set of n policies p_i, 1 ≤ i ≤ n, is a compound opinion as a fuzzy set of average weights $\omega_{ij} = \frac{1}{N_q}\sum_{k=1}^{N_q}\omega_{ij}^k$ on each policy, i.e.:

$$O_P \triangleq \mathop{R}_{i=1}^{n}\mathop{R}_{j=1}^{m}\{(p_{ij}, \omega_{ij})\} = \mathop{R}_{i=1}^{n}\mathop{R}_{j=1}^{m}\Big\{\Big(p_{ij}, \frac{1}{N_q}\sum_{k=1}^{N_q}\omega_{ij}^k\Big)\Big\},\ \omega_{ij} \in [0, 1]$$
$$= \begin{bmatrix} (p_{11}, \omega_{11}) & (p_{12}, \omega_{12}) & \ldots & (p_{1m}, \omega_{1m}) \\ (p_{21}, \omega_{21}) & (p_{22}, \omega_{22}) & \ldots & (p_{2m}, \omega_{2m}) \\ \ldots & \ldots & \ldots & \ldots \\ (p_{n1}, \omega_{n1}) & (p_{n2}, \omega_{n2}) & \ldots & (p_{nm}, \omega_{nm}) \end{bmatrix} \qquad (11)$$

where $O_P$ may be aggregated by the averages of each row or column, which indicate the collective opinion on a certain policy cast by all groups or that on all policies of a certain group, respectively, as illustrated in Table IV.
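A minimal MATLAB sketch of the Eq. 11 aggregation; the raw-data dimensions and placeholder values are assumptions for illustration:

% Aggregating raw poll samples into the collective opinion matrix (Eq. 11)
n = 3; m = 5; Nq = 2000;   % policies, groups, and sample individuals
raw = rand(n, m, Nq);      % placeholder raw weights in [0, 1]
Wij = mean(raw, 3);        % n-by-m matrix of average weights per policy/group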
Definition 11. The effect E of a set of policies is a fuzzy matrix of the average weighted differences between the current opinions $\omega_{ij}$ and the historical ones $\omega'_{ij}$ for the ith policy on the collective opinion of the jth group, i.e.:

$$E \triangleq O_P - O'_P = \mathop{R}_{i=1}^{n}\mathop{R}_{j=1}^{m}\{(p_{ij}, \omega_{ij} - \omega'_{ij})\},\ 1 \leq i \leq n,\ 1 \leq j \leq m \qquad (12)$$
Definition 12. The impact I of a policy is a fuzzy matrix of products of the effects E and the corresponding group sizes $N_{g_j}$, i.e.:

$$I \triangleq \mathop{R}_{i=1}^{n}\mathop{R}_{j=1}^{m}(N_{g_j} \times E) = \mathop{R}_{i=1}^{n}\mathop{R}_{j=1}^{m}\{(p_{ij}, N_{g_j}(\omega_{ij} - \omega'_{ij}))\} \qquad (13)$$

where the ± sign indicates a positive or negative impact on a target group, respectively.
Definition 13. The gain of policy impacts, G, is a fuzzy set of the mathematical means of the cumulative impacts that each group obtains as a result of the series of aggregations from the initial poll data, i.e.:

$$G \triangleq \mathop{R}_{j=1}^{m}\Big(\frac{1}{n}\sum_{i=1}^{n} I_{ij}\Big) \qquad (14)$$
B. Fuzzy Analyses of Potential Policy Impacts in Votes
The fuzzy methodologies for collective opinion elicitation and analysis from big poll data as developed in Section V.A are illustrated in application case studies in the following examples.
Example 6. The collective opinion $O_P$ on a set of 3 testing policies against 5 groups on the political spectrum can be elicited based on a set of large-sample poll data as summarized in Table IV. The current average weights of opinions $\omega_{ij}$ and the historical ones $\omega'_{ij}$ are aggregated from the sample data of individual opinions according to Eqs. 10 and 11.
TABLE IV. SAMPLE POLL DATA OF COLLECTIVE OPINIONS

(ω_ij, ω'_ij)   G1         G2         G3         G4         G5
p1              0.2, 0.1   0.4, 0.4   0.9, 0.7   0.7, 0.6   0.5, 0.9
p2              1.0, 0.9   0.5, 0.7   0.8, 0.9   0.5, 0.4   0.6, 0.5
p3              0.5, 0.2   0.6, 0.3   0.3, 0.5   0.7, 0.6   1.0, 0.8
Definition 14. The complexity or size of poll data, $C_P$, is proportional to the number of testing policies |P|, the number of groups on the spectrum |G|, and the number of sample individuals $N_q$, i.e.:

$$C_P \triangleq |P| \times |G| \times N_q \qquad (15)$$

where a poll with 2,000 tested individuals will result in 30,000 raw individual opinions in the settings of Example 6.
Example 7. Based on the summarized poll data as given in Table IV with the average collective opinions, the fuzzy set of effects E of the ith policy on the collective opinion of the jth group can be quantitatively determined according to Eq. 12, with n = 3 and m = 5:

$$E = \mathop{R}_{i=1}^{3}\mathop{R}_{j=1}^{5}\{(p_{ij}, \omega_{ij} - \omega'_{ij})\}
= \begin{bmatrix} 0.2 & 0.4 & 0.9 & 0.7 & 0.5 \\ 1.0 & 0.5 & 0.8 & 0.5 & 0.6 \\ 0.5 & 0.6 & 0.3 & 0.7 & 1.0 \end{bmatrix}
- \begin{bmatrix} 0.1 & 0.4 & 0.7 & 0.6 & 0.9 \\ 0.9 & 0.7 & 0.9 & 0.4 & 0.5 \\ 0.2 & 0.3 & 0.5 & 0.6 & 0.8 \end{bmatrix}
= \begin{bmatrix} 0.1 & 0 & 0.2 & 0.1 & -0.4 \\ 0.1 & -0.2 & -0.1 & 0.1 & 0.1 \\ 0.3 & 0.3 & -0.2 & 0.1 & 0.2 \end{bmatrix}$$

where the rows correspond to policies p1, p2, and p3. The most effective policy is p3 → {G1, G2} with a 30% improvement, while the most negatively effective policy is p1 → G5 with a 40% loss.
Example 8. On the basis of Table IV and Example 7, the impact I of each tested policy is a fuzzy matrix of the products of the effects and the individual group sizes $N_{g_j}$ that projects the ith policy on the jth group, i.e.:

$$I = \mathop{R}_{i=1}^{3}\mathop{R}_{j=1}^{5}\{(p_{ij}, N_{g_j}(\omega_{ij} - \omega'_{ij}))\}$$
$$= \begin{bmatrix} 0.1 \times 4508474 & 0 \times 889788 & 0.2 \times 576221 & 0.1 \times 2783175 & -0.4 \times 5832401 \\ 0.1 \times 4508474 & -0.2 \times 889788 & -0.1 \times 576221 & 0.1 \times 2783175 & 0.1 \times 5832401 \\ 0.3 \times 4508474 & 0.3 \times 889788 & -0.2 \times 576221 & 0.1 \times 2783175 & 0.2 \times 5832401 \end{bmatrix}$$
$$= \begin{bmatrix} 450847 & 0 & 115244 & 278318 & -2332960 \\ 450847 & -177958 & -57622 & 278318 & 583240 \\ 1352542 & 266936 & -115244 & 278318 & 1166480 \end{bmatrix}$$

where $N_{g_j}$ = [4508474, 889788, 576221, 2783175, 5832401] according to the 2011 data in Table III.
Example 9. The potential average gain of policy impacts G can be derived according to Eq. 14 based on the results of Example 8, with n = 3 and m = 5:

$$G = \mathop{R}_{j=1}^{5}\Big(\frac{1}{3}\sum_{i=1}^{3} I_{ij}\Big) = \{751412,\ 29660,\ -19207,\ 278317,\ -194413\}$$
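The chain of Eqs. 12-14 across Examples 7-9 can be reproduced compactly; a MATLAB sketch:

% Effects (Eq. 12), impacts (Eq. 13), and gains (Eq. 14) from Table IV
W  = [0.2 0.4 0.9 0.7 0.5; 1.0 0.5 0.8 0.5 0.6; 0.5 0.6 0.3 0.7 1.0];   % current
Wh = [0.1 0.4 0.7 0.6 0.9; 0.9 0.7 0.9 0.4 0.5; 0.2 0.3 0.5 0.6 0.8];   % historical
Ng = [4508474 889788 576221 2783175 5832401];   % 2011 group sizes (Table III)
E = W - Wh;                                     % 3-by-5 fuzzy effects
I = E .* repmat(Ng, 3, 1);                      % impacts scaled by group sizes
G = mean(I, 1)                                  % = [751412 29660 -19207 278317 -194413]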
Figure 7. Polynomial regressions of projected voting gains
The projected gains or losses, G, over the political spectrum produce a new set of estimated electoral distributions Y = Y' + G = [4508474, 889788, 576221, 2783175, 5832401] + [751412, 29660, -19207, 278317, -194413] = [5259886, 919448, 557014, 3061492, 5637988].
On the basis of the projected gains derived from current polls of collective opinions, the potential shift of the collective opinion equilibrium on the political spectrum can be predicted using the algorithm in Figure 5. The regression result is plotted in Figure 7, which indicates a collective opinion equilibrium shift slightly towards the middle, i.e., ΔΞ = Ξ₂ − Ξ₁ = 9.8 − 10.9 = −1.1, in contrast to that of the historical vote distributions.
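A usage sketch that re-runs the Section III pipeline on the projected distribution (illustrative):

% Refit and locate the shifted equilibrium of the projected distribution
X  = [-100 -71 -43 -14 50];
Yn = [5259886 919448 557014 3061492 5637988];   % Y = Y' + G from above
VoteRegressionAnalysis(X, Yn);                  % expected equilibrium = 9.8 (Figure 7)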
VI. CONCLUSIONS
Big data engineering has been introduced into the field of sociology for collective opinion elicitation and analysis. Numerical models and fuzzy methodologies have been developed for rigorously analyzing voting and electoral data. This approach has revealed the deep implications, complex equilibria, and dynamic trends represented by popular opinion distributions on a political spectrum. A key finding in this work is the existence of the collective opinion equilibrium over a spectrum of opinion distributions in big poll data, which is not simply a weighted average but rather the natural centroid of the integrated areas of opinion distributions. Adaptive policy gains based on historical and current poll data have been formally derived from fuzzy collective opinion aggregation, effect analyses, and quantitative impact estimations. A set of interesting insights has been demonstrated on the nature of large-scale collective opinions in poll data mining, collective opinion equilibrium determination, and quantitative electoral data processing in big data engineering.
ACKNOWLEDGMENT
This work is supported in part by a discovery grant from the Natural Sciences and Engineering Research Council of Canada (NSERC). We would like to thank the anonymous reviewers for their valuable suggestions and comments on the previous version of this paper.
REFERENCES
[1] C. Benham and S. Mielke, "DNA mechanics," Annu Rev Biomed
Eng, vol. 7, 2005, pp. 21–53.
[2] M. Chevallier, M. Warynski, and A. Sandoz, “Success factors of
Geneva's e-voting system,” The Electronic Journal of e-Government,
vol. 4, 2006, pp. 55-61.
[3] M. Chicurel, “Databasing the brain,” Nature, vol. 406, 2000, pp.
822-825.
[4] O. Davis, M. Hinich, and P. Ordeshook, “An expository
development of a mathematical model of the electoral process,”
American Political Science Review, vol. 64, no. 2, 1970, pp. 426-
448.
[5] P. Emerson, “The original Borda count and partial voting,” Social Choice and Welfare, vol. 40, no. 2, 2013, pp. 353-358.
[6] A. Gilat and V. Subramaniam, Numerical Methods for Engineers
and Scientists: An Introduction with Applications using MATLAB.
2nd ed., MA: John Wiley & Sons, 2011.
[7] B. Goldsmith, Electronic Voting and Counting Technologies: A
Guide to Conducting Feasibility Studies. Washington, D.C.:
International Foundation for Electoral Systems (IFES), 2011.
[8] A. Jacobs, "The pathologies of big data," ACM Queue, July 2009,
pp.1-12.
[9] H. R. Lewis and C. H. Papadimitriou, Elements of the Theory of
Computation, 2nd ed., NY: Prentice Hall, 1998.
[10] E. N. Marieb, Human Anatomy and Physiology, 2nd ed., Redwood,
CA: The Benjamin/Cummings Publishing Co., 1992.
[11] I. McLean and N. Shephard, “A program to implement the Condorcet and Borda rules in a small-n election,” Technical Report, Oxford University, UK, 2005.
[12] R. J. Mokken, A Theory and Procedure in Scale Analysis with
Applications in Political Research. Netherlands: Mouton & Co.,
1971, pp. 29-233.
[13] D. G. Saari, “Mathematical structure of voting paradoxes: II. positional voting,” Journal of Economic Theory, vol. 15, no. 1, 2000.
[14] G. Sartori, Parties and Party Systems: A Framework for Analysis.
UK: Cambridge University Press, 1976, pp. 291.
[15] C. E. Shannon, “A mathematical theory of communication,” Bell
System Technical Journal, vol. 27, 1948, pp.379-423 and 623-656.
[16] C. Snijders, U. Matzat, and U.-D. Reips. “‘Big data’: Big gaps of
knowledge in the field of Internet,” International Journal of Internet
Science, vol. 7, 2012, pp. 1-5.
[17] R. J. Sternberg, In Search of the Human Mind. 2nd ed., NY:
Harcourt Brace & Co., 1998.
[18] K. Strom, “A behavioral theory of competitive political parties,”
American Journal Political Science, vol. 34, no.2, 1990, pp.565-598.
[19] Y. Wang and Y. Wang, “On cognitive informatics models of the
brain,” IEEE Transactions on Systems, Man, and Cybernetics, vol.
36, no. 2, March, 2006, pp. 203-207.
[20] Y. Wang, Software Engineering Foundations: A Software Science
Perspective. NY: Auerbach Publications, 2007.
[21] Y. Wang, “On cognitive computing,” International Journal of Software Science and Computational Intelligence, vol. 1, no. 3, 2009, pp. 1-15.
[22] Y. Wang, “In search of denotational mathematics: novel
mathematical means for contemporary intelligence, brain, and
knowledge sciences,” Journal of Advanced Mathematics and
Applications, vol. 1, no. 1, 2012, pp. 4-25.
[23] Y. Wang, “On long lifespan systems and applications,” Journal of
Computational and Theoretical Nanoscience, vol. 9, no. 2, 2012, pp.
208-216.
[24] Y. Wang, “Formal rules for fuzzy causal analyses and fuzzy
inferences,” International Journal of Software Science and
Computational Intelligence, vol.4, no.4, 2012, pp.70-86.
[25] Y. Wang and R.C. Berwick, “Towards a formal framework of
cognitive linguistics,”Journal of Advanced Mathematics and
Applications, vol. 1, no. 2, 2012, pp.250-263.
[26] Y. Wang, “Neuroinformatics models of human memory: mapping
the cognitive functions of memory onto neurophysiological
structures of the brain,” International Journal of Cognitive
Informatics and Natural Intelligence, vol. 7, no. 1, 2013, pp. 98-122.
[27] Y. Wang, “Fuzzy causal inferences based on fuzzy semantics of
fuzzy concepts in cognitive computing,” WSEAS Transactions on
Computers, 13, 2014, pp.430-441.
[28] Y. Wang, J. Nielsen, and V. Dimitrov, “Novel optimization theories
and implementations in numerical methods,” Int. J. of Advanced
Mathematics and Applications, vol. 2, no. 1, 2013, pp.2-12.
[29] Y. Wang and G. Fariello, “On neuroinformatics: mathematical
models of neuroscience and neurocomputing.” Journal of Advanced
Mathematics and Applications, vol. 1, no. 2, 2012, pp. 206-217.
[30] Web-J.J, Political Parties of Canada,
http://www.thecanadaguide.com/political-parties, 2013.
[31] Wiki, Internet traffic, 2012,
http://en.wikipedia.org/wiki/Internet_traffic.
[32] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, 1965,
pp. 338-353.
[33] L. A. Zadeh, “Fuzzy logic and approximate reasoning,” Synthese, vol. 30, 1975, pp. 407-428.