
Applying Moving Average Filtering for Non-interactive Differential Privacy Settings

Authors: Kato Mivule and Claude Turner

Abstract

One of the challenges in implementing differential privacy is that the utility (usefulness) of the privatized data set diminishes even as confidentiality is guaranteed. In such settings, excessive noise causes the original data to lose statistical significance even though differential privacy assures strong levels of data privacy, making the privatized data practically valueless to the consumer of the published data. Additionally, researchers have noted that finding an equilibrium between data privacy and utility requirements remains intractable, necessitating trade-offs. Therefore, as a contribution, we propose a moving average filtering model for non-interactive differential privacy settings and report the resulting empirical data. In this model, various levels of differential privacy (DP) are applied to a data set, generating various privatized data sets. The privatized data is passed through a moving average filter, and the new filtered privatized data sets that meet a set utility threshold are finally published. Preliminary results from this study show that adjusting the ε (epsilon) parameter in the differential privacy process and applying the moving average filter might yield better data utility while conserving privacy in non-interactive differential privacy settings.
Procedia Computer Science 36 (2014) 409–415
Available online at www.sciencedirect.com
1877-0509 © 2014 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/3.0/).
Peer-review under responsibility of scientific committee of Missouri University of Science and Technology
doi: 10.1016/j.procs.2014.09.013
ScienceDirect
&RPSOH[$GDSWLYH6\VWHPV3XEOLFDWLRQ
&LKDQ+'DJOL(GLWRULQ&KLHI
&RQIHUHQFH2UJDQL]HGE\0LVVRXUL8QLYHUVLW\RI6FLHQFHDQG7HFKQRORJ\
3KLODGHOSKLD3$
$SSO\LQJ0RYLQJ$YHUDJH)LOWHULQJIRU1RQLQWHUDFWLYH
'LIIHUHQWLDO3ULYDF\6HWWLQJV
.DWR0LYXOHDDQG&ODXGH7XUQHUE
abComputer Science Department, Bowie State University, Bowie, MD, USA
Keywords:'LIIHUHQWLDO3ULYDF\0DFKLQH/HDUQLQJ6LJQDO3URFHVVLQJ0RYLQJ$YHUDJH)LOWHULQJ
1. Introduction
:KLOHGLIIHUHQWLDOSULYDF\KDVFDSWXUHGWKHLQWHUHVWRIPDQ\GDWDSULYDF\UHVHDUFKHUVGXHWRWKHDELOLW\WRJXDUDQWHHFRQILGHQWLDOLW\
WKHGDWDSULYDF\WHFKQLTXHLVVWLOO IDFHGZLWKWKHFKDOOHQJHRISULYDF\YHUVXVXWLOLW\WKDWKDVEHHQVKRZQWREHLQWUDFWDEOH >@>@>@
2IUHFHQWGLIIHUHQWLDOSULYDF\KDVFRPHXQGHUKHDY\FULWLFLVPEHFDXVHRI WKHGLVWRUWLRQRIWKHTXHU\UHVXOWVWKDWPDNHWKHSULYDWL]HG
GDWDUHVXOWV YLUWXDOO\ XVHOHVV GHVSLWH SULYDF\ JXDUDQWHHV )RULQVWDQFH%DPEDXHU 0XUDOLGKDUDQG 6DUDWK\  REVHUYHGLQ WKHLU
&RUUHVSRQGLQJDXWKRU7HOID[
E-mail address:NPLYXOH#JPDLOFRP
© 2014 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/3.0/).
Peer-review under responsibility of scientific committee of Missouri University of Science and Technology
410 Kato Mivule and Claude Turner / Procedia Computer Science 36 ( 2014 ) 409 – 415
H[WHQVLYHFULWLTXHWKDWGLIIHUHQWLDOSULYDF\ ZLOOHLWKHUSURGXFHYHU\ZURQJUHVHDUFKUHVXOWVRURIIHUQRSURWHFWLRQGXHWRWKHYHU\ KLJK
OHYHOV RI GLVWRUWLRQV LQ WKH SULYDWL]HG TXHU\ UHVXOWV >@ +RZHYHU 'ZRUN  ZKR ILUVW SURSRVHGGLIIHUHQWLDO SULYDF\
DFNQRZOHGJHGWKLVWHQVLRQEHWZHHQGDWDSULYDF\DQGXWLOLW\E\VXFFLQFWO\VWDWLQJWKDW³SHUIHFWSULYDF\FDQEHDFKLHYHGE\SXEOLVKLQJ
QRWKLQJ DW DOO EXW WKLV KDV QR XWLOLW \ SHUIHFW XWLOLW\ FDQ EH REWDLQHG E\ SXEOLVKLQJ WKH GDWD H[DFWO\ DV UHFHLYHG EXW WKLV RIIHUV QR
SULYDF\´>@ 7KHUHIRUHRQHRI WKHFKDOOHQJHVLQ HPSOR\LQJ GLIIHUHQWLDOGDWD SULYDF\ LV WKDW WKHXWLOLW\RIWKHSULYDWL]HG GDWD VKULQNV
HYHQ DV FRQILGHQWLDOLW\ LV JXDUDQWHHG  ,QVXFK VHWWLQJVG XH WR H[FHVVLYH QRLVH RULJLQDO GDWD VXIIHUVORVV RI VWDWLVWLFDO  VLJQLILFDQFH
GHVSLWH WKH IDFW WKDW VWURQJ OHYHOV RI GDWD SULYDF\ LV DVVXUHG WKXV PDNLQJ WKH SULYDWL] HG GDWD SUDFWLFDOO\ YDOXHOHVV $GGLWLRQDOO\
UHVHDUFKHUVKDYHQRWHGWKDWILQGLQJHTXLOLEULXPEHWZHHQGDWDSULYDF\DQGXWLOLW\UHTXLUHPHQWVUHPDLQVLQWUDFWDEOHQHFHVVLWDWLQJWUDGH
RIIV 7KHUHIRUH DV D FRQWULEXWLRQ ZH DWWHPSW WR DGGUHVV WKLV SUREOHP E\ SURSRVLQJ D PRYLQJ DYHUDJH ILOWHULQJ PRGHO IRU QRQ
LQWHUDFWLYHGLIIHUHQWLDOSULYDF\ VHWWLQJV,QWKLV PRGHOYDULRXVOHYHOV RIGLIIHUHQWLDOSULYDF\'3 DUHDSSOLHGWRDGDWDVHWJHQHUDWLQJ
DQDVVRUWPHQWRISULYDWL]HGGDWD VHWV 7KH SULYDWL]HG GDWD LV SDVVHGWKURXJKD PRYLQJDYHUDJHILOWHUDQGWKHQHZILOWHUHG SULYDWL]HG
GDWDVHWVDUHSDVVHGWKURXJKDVHULHVRIPDFKLQHOHDUQLQJFODVVLILHUVWRJDXJHIRUGDWDXWLOLW\,IWKHFODVVLILFDWLRQDFFXUDF\PHHWVDGDWD
XWLOLW\WKUHVKROGWKHILOWHUHGGDWDVHWLVWKHQSXEOLVKHG7KHUHVWRIWKHSDSHULVRUJDQL]HGDVIROORZV,Q6HFWLRQDEULHIRYHUYLHZRI
GLIIHUHQWLDO SULYDF\ DQG WKH PRYLQJ DYHUDJH ILOWHULQJ WHFKQLTXH LV JLYHQ ,Q 6HFWLRQ H[SHULPHQW DQG HPSLULFDO UHVXOWV RI WKH
SURSRVHGPRGHODUHSUHVHQWHG)LQDOO\LQ6HFWLRQWKHFRQFOXVLRQLVJLYHQ
2. Differential privacy
,QWKHIROORZLQJVHFWLRQZHJLYHDQRYHUYLHZRQWKHZRUNLQJVRIGLIIHUHQWLDOSULYDF\DVZH KDYHDOUHDG\GHVFULEHGSUHYLRXVO\LQ
>@>@ >@ 3URSRVHG E\ 'ZRUN  GLIIHUHQWLDO SULYDF\ LV D FXUUHQW GDWD SULYDF\ SHUWXUEDWLYH SURFHVV LQ ZKLFK PDVNLQJ RI
VHQVLWLYHGDWDLVHQIRUFHGE\DGGLQJ/DSODFHQRLVHWRTXHU\ DQVZHUVIURPWKHGDWDEDVHVXFKWKDWWKHXVHUVRIWKHGDWDEDVHFDQQRWWHOO
DSDUWLIDSDUWLFXODUYDOXHKDVEHHQGLVWRUWHGLQWKDWGDWDEDVHDQGWKXVEHFRPLQJFRPSOH[IRUDQDWWDFNHUWRGHFRGHGDWDYDOXHVLQWKH
GDWDEDVH>@>@>@(VVHQWLDOO\GLIIHUHQWLDOSULYDF\FDQEHLPSOHPHQWHGXVLQJWKHIROORZLQJVWHSVDVZDVGHVFULEHGLQ>@>@>@>@
L TXHU\D GDWDEDVH LL FRPSXWHWKH PRVW VLJQLILFDQWREVHUYDWLRQ LLL GHWHUPLQHWKH /DSODFH QRLVH GLVWULEXWLRQ LYDGG /DSODFH
QRLVH GLVWULEXWLRQWR WKH TXHU\ UHVXOWVDQG Y GLVVHPLQDWH WKH SHUWXUEHG TXHU\RXWFRPH LQ HLWKHU DQ LQWHUDFWLYHRU QRQLQWHUDFWLYH
PRGH$QRQLQWHUDFWLYHPRGHRIGLVVHPLQDWLRQLVXVHGLQWKLVVWXG\
$VZDVGHVFULEHGLQ>@DQGIXUWKHUQRWHGLQ>@DQG>@WZRGDWDEDVHVD1DQGD2DUHLQGLVFHUQLEOHLIWKH\DUHGLVVLPLODUE\RQO\RQH
YDOXHVXFK WKDW ܦ ο ܦ =1$Q\GDWD SULYDF\PHWKRGݍPHHWVWKH VSHFLILFDWLRQVRIߝ-differential privacyLIWKHOLNHOLKRRGRIWKH
TXHU\UHVSRQVHRQGDWDEDVHD1DQGD2VKRXOGEHDQDORJRXVDQGVDWLVI\WKHFRQGLWLRQEHORZ>@
[()אோ]
[()אோ] ݁
7KH V\PEROVD1DQGD2UHSUHVHQWWKH WZR GDWDEDVHV PLV WKH SUREDELOLW\ RI WKH TXHU\ UHVXOW V RQD1DQGD2 qn()LVWKHSULYDF\
SHUWXUEDWLRQPHWKRGqnD1DQGqnD2LVWKHSULYDF\SHUWXUEDWLRQ PHWKRGRQTXHU\UHVXOWVIURPGDWDEDVHD1DQGD2UHVSHFWLYHO\
RLV WKH SULYDWL]HGTXHU\ UHVXOWV IURP D1DQG D2 DQG݁
P
UHSUHVHQWVWKHVPDOOH[SRQHQWLDOߝ HSVLORQ YDOXH7KHOLNHOLKRRGRI VDPH
TXHU\UXQRQD1DQG WKHQRQD2VKRXOGEHDQDORJRXV>@>@ 'LIIHUHQWLDO3ULYDF\'3ZRXOG WKXVEH LPSOHPHQWHGDV IROORZVDQG
DVLOOXVWUDWHGLQ)LJ>@>@>@>@
ܦܲ൫݂(ݔ)= ݂(ݔ)+ ܮܽ݌݈ܽܿ݁(0, ܾ)
)LJ$QRYHUYLHZRIDQLQWHUDFWLYHGLIIHUHQWLDOSULYDF\WHFKQLTXH
:KHUHDP (f(x))LVWKHGLIIHUHQWLDOSULYDF\SURFHGXUHRQWKHRULJLQDOGDWDf(x);DQG ܮܽ݌݈ܽܿ݁(0, ܾ)UHSUHVHQWVWKH /DSODFH QRLVH
JHQHUDWHGEHWZHHQDUDQJHRIDQGb, DQGXVHGWRSULYDWL]HWKHRULJLQDOTXHU\UHVXOWV7KHYDOXHbLVWKHQFRPSXWHGIRU/DSODFHQRLVH
DVIROORZV
ܾ=ο௙

7KHV\PERObLVFRPSXWHGE\WKHPD[GLIIHUHQFH ο݂GLYLGHGE\DVPDOOHSVLORQ YDOXH ߝ7KHPD[GLIIHUHQFH ο݂WKHPRVWGRPLQDQW
IRUDJLYHQTXHU\LVWKHQFRPSXWHGE\JHWWLQJWKHGLIIHUHQFHEHWZHHQWKHKLJKHVWDQGORZHVWREVHUYDWLRQVLQD1DQGD2DVIROORZV
ο݂ =ܯܽݔ|݂(ܦ) ݂(ܦ)|
)LJ7KHPRYLQJDYHUDJHILOWHULQJPRGHOIRUGL IIHUHQWLDOSULYDF\
2.1. Moving average filtering
0RYLQJDYHUDJHILOWHULQJWHFKQLTXHLVRQHRIWKHPRVWXVHGILOWHUV LQ GLJLWDO VLJQDOSURFHVVLQJD FRQYROXWLRQWKDWHPSOR\VD VLPSOH
ILOWHULQJNHUQHODQGLGHDOIRUUHGXFLQJQRLVHZKLOHNHHSLQJWKHPDLQWUDLWVRIWKHVLJQDO>@0RYLQJDYHUDJHILOWHUVZRUNE\DYHUDJLQJ
DQXPEHURISRLQWVIURPWKHLQSXWVLJQDOWRSURGXFHHDFKSRLQWLQWKHRXWSXWVLJQDODVVKRZQLQWKHIROORZLQJHTXDWLRQ>@
ݕ[݅]=
σݔ[݅+݆]
ெିଵ
௝ୀ଴ 
7KHV\PEROݔ[݅+݆]LVWKH LQSXWVLJQDOݕ[݅ ]LV WKHRXWSXW VLJQDOܯLV WKHQXPEHU RISRLQWVXVHG LQ WKH PRYLQJDYHUDJH >@$V
ZDVVKRZQE\.RYHVLUHSHDWHGXVHRIWKH PRYLQJDYHUDJHILOWHU JHQHUDWHVDQDSSUR[LPDWLRQ RIWKH*DXVVLDQILOWHU>@)LJ 
LOOXVWUDWHVWKHSURFHVVXVHGLQWKLVVWXG\WRILOWHUH[FHVVLYHQRLVHLQGLIIHUHQWLDOO\SULYDWHGDWD
3. Differential privacy (DP) experiment and empirical results
,QWKLVVHFWLRQHPSLULFDOUHVXOWVIURPLPSOHPHQWLQJERWKGLIIHUHQWLDO SULYDF\DQGILOWHULQJDUHSUHVHQWHG>@,QWKLVH[SHULPHQW
GLIIHUHQWOHYHOV RI /DSODFH QRLVH ZHUH DGGHGWRWKHRULJLQDO)LVKHU ,ULV GDWD VHWIURPWKH 8&, UHSRVLWRU\ >@JHQHUDWLQJSHUWXUEHG
SULYDWL]HGGDWDVHWV7KHİHSVLORQSDUDPHWHUXVHGWRJHQHUDWH/DSODFHQRLVHZDVILQHWXQHGXVLQJYDULRXVYDOXHVDVVKRZQLQ7DEOH
IURPİ WRİ 
7DEOH/DSODFHbYDOXHVYHUVHVİHSVLORQYDOXHV>@
Epsilon Value 6HSDO/E ¨Iİ 6HSDO:E ¨Iİ 3HWDO/E ¨Iİ 3HWDO:E ¨Iİ
İ 
İ 
İ 
İ 
İ 
İ 
İ 
İ 
İ 
İ 
İ 
İ   
İ 
İ 
$VVKRZQLQ 7DEOH DVWKHİHSVLORQ YDOXHJHWV VPDOOHUWKH /DSODFHbYDOXHJHWVELJJHUWKXVJHQHUDWLQJPRUHQRLVH)RULQVWDQFH
ZKHQWKHİHSVLORQYDOXHZDVWKHbYDOXHIRU/DSODFHQRLVHZDVDWIRUWKH6HSDOOHQJWKDWWULEXWH+RZHYHUZKHQWKH İHSVLORQ
YDOXHZDVDWWKHbYDOXHIRU/DSODFHQRLVHMXPSHGWRDPDVVLYHYDOXHIRUWKH6HSDOOHQJWKDWWULEXWH7KHVDPHFDQEH
VDLGRI WKH3HWDOOHQJWKDWWULEXWHZKHQ WKH İHSVLORQYDOXH ZDV DWWKH bYDOXHIRU/DSODFHQRLVHZDV DWYDOXH<HWZKHQWKHİ
HSVLORQ YDOXHZDV DW  WKH bYDOXHIRU /DSODFH QRLVH ZDV DW IRU WKH 3HWDO OHQJWK ,Q VXFKFDVHV WKH QRLVH OHYHO IRU
SULYDF\SURWHFWLRQZRXOGKDYHWREHJHQHUDWHGEHWZHHQDQGWRFRQFHDOYDOXHVLQWKH3HWDOOHQJWKDWWULEXWH7KLVPHDQVWKDW
412 Kato Mivule and Claude Turner / Procedia Computer Science 36 ( 2014 ) 409 – 415
IRUWKHİHSVLORQYDOXHDW/DSODFHbUDQGRPQRLVHZRXOGEHJHQHUDWHGEHWZHHQDQGbYDOXHRI+RZHYHUIRUWKHİHSVLORQ
YDOXHRIWKH/DSODFH bUDQGRPQRLVHZRXOGKDYH WREHJHQHUDWHGEHWZHHQ DQG WR SURYLGH FRQILGHQWLDOLW\ IRU
WKH6HSDO OHQJWKDWWULEXWH7KHVPDOOHUWKHİHSVLORQYDOXHVWKH JUHDWHUWKH/DSODFHQRLVHWKDW LVJHQHUDWHG :KLOHVXFKKLJKHUQRLVH
OHYHOV PLJKWJXDUDQWHH VWURQJHU OHYHOV RI SULYDF\ WKHFKDOOHQJH LV LQ ILQG LQJ WKH DSSURSULDWH İHSVLORQ YDOXH V WKDW ZRXOG JHQHUDWH
VXLWDEOH/DSODFHQRLVHOHYHOVIRUFRQILGHQWLDOLW\ZKLOHPHHWLQJDFFHSWDEOHOHYHOVRIGDWDXWLOLW\
3.1. DP classification accuracy results
,QWKLV VHFWLRQFODVVLILFDWLRQ DFFXUDF\UHVXOWV RIGLIIHUHQWLDO SULYDF\EDVHGGDWDVHWV DWYDULRXVİHSVLORQOHYHOVDUHSUHVHQWHG>@,Q
WKHH[SHULPHQWIRUHDFK İHSVLORQLQ7DEOH DSHUWXUEHG SULYDWL]HGGDWD VHWZDVJHQHUDWHG7KHJHQHUDWHG SHUWXUEHGGDWD VHWZDV
WKHQSDVVHG WKURXJKD VHULHVRIPDFKLQHOHDUQLQJFODVVLILHUV DQGWKHFODVVLILFDWLRQDFFXUDF\ZDV PHDVXUHG7KHGDWD VHWWKDW PHWWKH
WKUHVKROG FULWHULD ZDV FKRVHQ IRUGLVVHPLQDWLRQ RWKHUZLVH SDUDPHWHUV LQ WKH GDWD SULYDF\ SURFHVV DUH UHILQHG LQ WKLV FDVH WKH İ
HSVLORQYDOXH
7DEOH&ODVVLILFDWLRQDFFXUDF\RI'3GDWDVHWV>@
Epsilon Value KNN NN NB DT AdaBoost
İ 
İ 
İ 
İ 
İ 
İ 
İ(0.1009) 
İ 
İ 
İ 
İ 
İ 
İ 
İ 
,Q7DEOHWKHFODVVLILFDWLRQDFFXUDF\IRUWKHGLIIHUHQWLDOO\SULYDWHGDWDVHWVLV VKRZQIRU.111HXUDO1HWV1DwYH%D\HV'HFLVLRQ
7UHHV DQG $GD%RRVW FODVVLILHUV 7KHUHVXOWVSUHVHQWHGLQ7DEOH  DUH UHSUHVHQWDWLYH RI GLIIHUHQWLDOO\ SULYDWL]HG GDWD VHWV EHIRUH
ILOWHULQJZDVDSSOLHG$WRWDORIWULDOVZHUHUXQIRUWKLVH[SHULPHQWRQHIRUHDFKRIWKHGLIIHUHQWLDOO\SULYDWL]HGGDWDVHWV
)LJD&ODVVLILFDWLRQDFFXUDF\RI'3GDWDVHWV)LJE&ODVVLILFDWLRQDFFXUDF\IRUILOWHUHG'3EDVHGGDWDVHWV>@
(DFKYDOXHLQ7DEOHUHSUHVHQWVWKHFODVVLILFDWLRQDFFXUDF\RIDGLIIHUHQWLDOO\SULYDWL]HGGDWDVHW,WLVLQWHUHVWLQJWRQRWHWKDWIURPWKH
FODVVLILFDWLRQDFFXUDF\UHVXOWV SUHVHQWHG LQ 7DEOH  QRQH RI WKH GLIIHUHQWLDOO\ SULYDWL]HG GDWD VHWVDFKLHYHG FODVVLILFDWLRQ DFFXUDF\
DERYHSHUFHQW7KLVFDQEHFOHDUO\VHHQLQ)LJDLQZKLFKWKHxD[LVUHSUHVHQWVWKHYDULRXVİHSVLORQYDOXHVIURPWKHODUJHVWWR
WKH VPDOOHVW İHSVLORQ YDOXH 7KH yD[LV UHSUHVHQWV WKH FODVVLILFDWLRQ DFFXUDF\ DQG HDFK GLIIHUHQW VHULHV RU OLQHV UHSUHVHQWV WKH
FODVVLILFDWLRQDOJRULWKP XVHGLQWKHH[SHULPHQW$V)LJDVKRZVDVWKHİHSVLORQYDOXH GHFUHDVHGWKHFODVVLILFDWLRQ DFFXUDF\RI
WKDWSDUWLFXODUGDWDVHWGURSSHGIURPDERXWSHUFHQWFODVVLILFDWLRQDFFXUDF\WR DERXWDQ DYHUDJHRI SHUFHQW 7KLVPHDQVWKDWRQ
DYHUDJH DERXW  SHUFHQW RI WKH UHFRUGV ZHUH PLVFODVVLILHG :KLOH WKH UHVXOWV VKRZ WKDW GDWD XWLOLW \ ZDV ORZ EDVHG RQ WKH ORZ
FODVVLILFDWLRQ DFFXUDF\ UHVXOWV GLIIHUHQWLDO SULYDF\ LV VKRZQ E\ WKHVH UHVXOWV WR SUHVHQWVWURQJ SULYDF\ JXDUDQWHHV WKDW DQ DWWDFNHU
ZRXOG ILQG LW GLIILFXOWWR UHFRQVWUXFW VXFK D GDWD VHW 7KH FKDOOHQJH WKH Q LV WR ILQG DQ RSWLPDO EDOD QFH EHWZHHQ WKH VWUR QJ SULYDF\
SURYLGHGE\GLIIHUHQWLDOSULYDF\DQGGDWDXWLOLW\
3.2. Classification results after filtering DP-based data
,QWKLVVHFWLRQH[SHULPHQWDOUHVXOWVIURPDSSO\LQJILOWHULQJRQGLIIHUHQWLDOO\SULYDWL]HGGDWDVHWVDUHSUHVHQWHG>@,QWKHH[SHULPHQW
ILOWHULQJZDVDSSOLHGWRHDFKRIWKHGLIIHUHQWLDOO\SULYDWL]HGGDWDVHWV7KHQHZILOWHUHGGLIIHUHQWLDOO\SULYDWL]HGGDWDVHWVZHUHWKHQ
VXEMHFWWRDVHULHVRIPDFKLQHOHDUQLQJFODVVLILHUVDQGWKHFODVVLILFDWLRQDFFXUDF\ZDVUHWXUQHGDVVKRZQLQ7DEOH
7DEOH&ODVVLILFDWLRQDFFXUDF\UHVXOWVRI'3EDVHGGDWDDIWHUDSSO\LQJILOWHULQJ>@
Epsilon Value KNN NN NB DT AdaBoost
İ 
İ(0.9998) 
İ 
İ 
İ 
İ 
İ 
İ(0.1) 
İ 
İ 
İ 
İ 
İ 
İ 
5HVXOWVLQ 7DEOH  VKRZWKH FODVVLILFDWLRQDFFXUDF\RIHDFKILOWHUHG GLIIHUHQWLDOO\SULYDWL]HGGDWD VHWZLWKJUHDWLPSURYHPHQWZKHQ
FRPSDUHG WR WKH QRQILOWHUHG GLIIHUHQWLDOO\ SULYDWL]HG GDWD VHWV LQ 7DEOH  )RU LQVWDQFH WKH FODVVLILFDWLRQ DFFXUDF\ RI WKH
GLIIHUHQWLDOO\ SULYDWL]HG GDWD VHW JHQHUDWHGXVLQJ İ WKH 1HXUDO 1HW FODVVLILFDWLRQ DFFXUDF\ ZDV REVHUYHG DW  SHUFHQW IRU QRQ
ILOWHUHG GLIIHUHQWLDOO\ SULYDWL]HG GDWD ,Q FRPSDULVRQ WKH FODVVLILFDWLRQ DFFXUDF\ IRU WKH VDPH GDWD VHW ZDV DW S HUFHQWDIWHU
DSSO\LQJILOWHULQJDQ LPSURYHPHQW RI DSSUR[LPDWHO\  :KLOHWKH W\SHRI FODVVLILFDWLRQ DOJRULWKP XVHG GRHV PDWWHU GXH WRWKH
LQKHUHQW SDUDPHWHUV LQ WKDW FODVVLILHU RYHUDOO WKH FODVVLILFDWLRQ DFFXUDF\ RI WKH GLIIHUHQWLDOO\ SULYDWL]HG GDWD VHWV GLG LPSURYH
VLJQLILFDQWO\ DV REVHUYHG LQ WKHVH SUHOLPLQDU\ UHVXOWV 7KH RYHUDOO LPSURYHPHQW LQ FODVVLILFDWLRQ DFFXUDF\ UHVXOWV DIWHU DSSO\LQJ
ILOWHULQJDUHVKRZQLQ)LJEWKHxD[LVUHSUHVHQWVWKHYDULRXVİHSVLORQYDOXHVIURPWKHODUJHVWWRWKHVPDOOHVWİHSVLORQYDOXH7KH
yD[LV UHSUHVHQWV WKH FODVVLILFDWLRQ DFFXUDF\ DQG HDFK GLIIHUHQW VHULHV RU OLQHV UHSUHVHQWV WKH FODVVLILFDWLRQ DOJRULWKP XVHG LQWKH
H[SHULPHQW$V)LJEVKRZVDVWKHİHSVLORQYDOXHGHFUHDVHGWKHFODVVLILFDWLRQDFFXUDF\RIWKDWSDUWLFXODUGDWDVHWLPSURYHG)RU
H[DPSOHWKH DYHUDJHFODVVLILFDWLRQDFFXUDF\IRU WKH GLIIHUHQWLDOO\SULYDWL]HGGDWDVHWV DIWHU ILOWHULQJ ZDVIRU.11DQG 
IRU1HXUDO 1HWV ,Q FRPSDULVRQ WKH FODVVLILFDWLRQ DFFXUDF\ EHIRUH ILOWHULQJ ZDV DSSOLHG ZDV DSSUR[LPDWHO\ DQ DYHUDJH RI IRU
.11DQGRQDYHUDJHIRU1HXUDO1HWZRUNV7KHLPSURYHPHQWZDVDERXWSRLQWVIRUWKH.11DIWHUDSSO\LQJILOWHULQJ
3.3. Threshold determination for the filtered DP-based data
,QWKLVVHFWLRQH[SHULPHQWDOUHVXOWVRQWKUHVKROGGHWHUPLQDWLRQDUHSUHVHQWHG>@7KHWKUHVKROGZDVKHXULVWLFDOO\FKRVHQDVWKHPD[
YDOXHEHWZHHQWKHPD[PLGSRLQWDQGPD[ PHDQYDOXHVDVVKRZQLQ 7DEOH7KHJRDOZDV WRVHOHFWGDWD VHWVWKDW PHWWKHWKUHVKROG
FULWHULDIRUGDWDXWLOLW\2QO\ILOWHUHGGLIIHUHQWLDOO\SULYDWL]HGGDWDVHWVZHUHXVHGLQIRUWKLVSRUWLRQRIWKHH[SHULPHQW
7DEOH7KUHVKROG'HWHUPLQDWLRQIRU)LOWHUHG'3EDVHGGDWD>@
KNN NN NB DT ADABOOST MAX
0,'
32,17

0HDQ
MAX 94.57 88.90 76.38 75.99 60.48 94.57
7KHUHVXOWVLQ7DEOH  VKRZ WKH PHDQ DQG PLGSRLQW YDOXHV IRU WKHFODVVLILFDWLRQDFFXUDF\UHVXOWV IURP HDFK ILOWHUHG GLIIHUHQWLDOO\
SULYDWL]HGGDWDVHW7KHPD[YDOXHRIWKH PHDQDQGPLGSRLQWYDOXHVZHUHVHOHFWHGDQGWKHPD[RIWKHPD[PHDQDQGPD[PLGSRLQW
FKRVHQVXEVHTXHQWO\$VVKRZQLQ7DEOHWKHFKRVHQWKUHVKROGYDOXHLQWKLVFDVHWKHFODVVLILFDWLRQDFFXUDF\ZDV,QWKLVFDVH
D GDWD VHW WKDW PHHWV WKH WKUHVKROG FULWHULD ZDV FKRVHQ IRU GLVVHPLQDWLRQ $ GLIIHUHQWLDOO\ SULYDWH GDWD VHW ZLWK  SHUFHQW
FODVVLILFDWLRQDFFXUDF\ZRXOGRIIHUERWKSULYDF\DQGGDWDXWLOLW\7KHWUDGHRIILQWKLVFDVHZRXOGLQFOLQHWRZDUGVPRUHDFFXUDF\DQG
WKXVXWLOLW\VLQFHWKHJRDOLVWRSURYLGHXVHIXOV\QWKHWLFGDWDVHWVZKLOHRIIHULQJDFFHSWDEOHOHYHOVRIFRQILGHQWLDOLW\
3.4. Statistical analysis of DP-based data
,Q WKLV VHFWLRQDQ DVVHVV PHQW RI WKH VWDWLVWLFDOWUDLWV RI WKH RULJLQDO GDWD GLIIHUHQWLDOSULYDWL]HG GDWDDQG WKH ILOWHUHG GLIIHUHQWLDO
SULYDWL]HG GDWD LV GRQH>@ 7KH JRDO ZDV WR ILQG RXW LI GLIIHUHQWLDO SULYDF\ PDLQWDLQHG VRPH RI WKH GHVFULSWLYH VWDWLVWLFV RI D
414 Kato Mivule and Claude Turner / Procedia Computer Science 36 ( 2014 ) 409 – 415
SULYDWL]HGGDWDVHW7KHGHVFULSWLYHVWDWLVWLFVDUHVKRZQLQ7DEOHZLWKWKHVWDWLVWLFYDOXHVIRUWKHRULJLQDOGDWDGLIIHUHQWLDOSULYDWL]HG
GDWDDQGWKHILOWHUHGGLIIHUHQWLDOSULYDWL]HGGDWD7KHVWDWLVWLFDOYDOXHVIRUHDFKDWWULEXWHDUHVKRZQQDPHO\6HSDOOHQJWK6HSDOZLGWK
3HWDOOHQJWKDQG3HWDOZLGWK)URPWKHREVHUYDWLRQVLQ7DEOHWKHPHDQYDOXHIRUWKHRULJLQDO6HSDOOHQJWKZDVKRZHYHUDIWHU
DSSO\LQJ GLIIHUHQWLDO SULYDF\ RQ WKH RULJLQDO GDWD VHW WKH PHDQ YDOXH IRU WKH 6HSDO OHQJWK GURSSHG WR  0HDQZKLOHWKH PHDQ
YDOXHVIRUWKH6HSDOZLGWKLQWKHRULJLQDOGDWDZDVEXWDOPRVWGRXEOHGWRIRUWKHGLIIHUHQWLDOSULYDWL]HGGDWD,WLVLQWHUHVWLQJ
WR QRWH WKDW WKH PHDQ YDOXHV DIWHU D SSO\LQJ ILOWHULQJ RQ WKH GLIIHUHQWLDO SULYDWL]HG GDWD DUH PDLQWDLQHGDQG DUH QRW IDU RII IURP WKH
GLIIHUHQWLDOSULYDWL]HGGDWDZLWKRXWILOWHULQJ
7DEOH'HVFULSWLYH6WDWLVWLFVIRUERWK'3DQG'3EDVHGGDWDDIWHUDSSO\LQJILOWHULQJ>@
Statistics Sepal L Sepal W Petal L Petal W
2ULJLQ0HDQ
2ULJLQ0RGH
2ULJLQ0HGLDQ
2ULJLQ0D[
2ULJLQ0LQ
2ULJLQ6W'HY
2ULJLQ9DU
'36\QWK0HDQ
'36\QWK0RGH1$1$1$1$
'36\QWK0HGLDQ
'36\QWK0D[
'36\QWK0LQ   
'36\QWK6W'HY
'36\QWK9DU
)LOWHUHG'30 HDQ
)LOWHUHG'30 RGH1$1$1$1$
)LOWHUHG'30 HGLDQ
)LOWHUHG'30 D[
)LOWHUHG'30 LQ   
)LOWHUHG'36W 'HY
)LOWHUHG'39DU
)RULQVWDQFHWKHPHDQYDOXHVIRUWKH6HSDOOHQJWK6HSDOZLGWK3HWDOOHQJWKDQG3HWDOZLGWKDWWULEXWHVDUHDQG
UHVSHFWLYHO\ IRU QRQ)LOWHUHG '3 GDWD +RZHYHU IRU WKH )LOWHUHG '3 GDWD WKH PHDQ YD
OXHVZHUHDQG
UHVSHFWLYHO\7KHGLIIHUHQFHVEHWZHHQWKHPHDQYDOXHVIRUWKHQRQILOWHUHG'3DQGILOWHUHG'3GDWDLVPLQLPDO7KHUHIRUHIURPWKHVH
UHVXOWV LW FRXOG EH VXJJHVWHG WKDW ILOWHULQJ '3 GDWD G RHV PDLQWDLQ WKH PHDQ VWDWLVWLFDO SURSHUW\ ,W LV LPSRUWDQW WR QRWH WKDW RWKHU
GHVFULSWLYHVWDWLVWLFDOWUDLWVVXFKDVWKHVWDQGDUGGHYLDWLRQZHUHQRWPDLQWDLQHGDVVKRZQLQ7DEOH
)LJ'HVFULSWLYHVWDWLVWLFVIRU'3DQG)LOWHUHG'3EDVHGGDWD>@
$GGLWLRQDOO\DV SUHVHQWHGLQ)LJ ZKLOH '3GRHVQRW PDLQWDLQWKHVNHOHWDOVWDWLVWLFDOVWUXFWXUH RIWKH RULJLQDOGDWDILOWHUHG'3GLG
PDLQWDLQWKH VWDWLVWLFDO VWUXFWXUH RI WKH QRQILOWHUHG'3GDWD,Q )LJWKHxD[LV UHSUHVHQWVWKHYDULRXV VWDWLVWLFDO WUDLWVVXFKDVWKH
PHDQVWDQGDUGGHYLDWLRQ DQGPD[YDOXH7KH yD[LVUHSUHVHQWVWKHQXPHULFDO YDOXHVRI WKHVWDWLVWLFDOWUDLWV2QWKHxD[LVDUHWKUHH
VXEJUDSKV ZLWK WKH ILUVW VXEJUDSK UHSUHVHQWLQJ WKH VWDWLVWLFDO WUDLWV RI WKH RULJLQDO GDWD WKH PLGGOH VXEJUDSK UHSUHVHQWLQJ WKH
VWDWLVWLFDOWUDLWVRIWKH '3EDVHGGDWD ZLWKRXWILOWHULQJDQG ODVWO\WKHWKLUGVXEJUDSKRQ WKHULJKWUHSUHVHQWLQJWKH ILOWHUHG'3EDVHG
GDWD(DFKOLQHVHULHVLQHDFKVXEJUDSKUHSUHVHQWVWKH6HSDOOHQJWK6HSDOZLGWK3HWDOOHQJWKDQG3HWDOZLGWK
7DEOH,QIHUHQFH6WDWLVWLFVIRU'3EDVHGGDWD>@
415
Kato Mivule and Claude Turner / Procedia Computer Science 36 ( 2014 ) 409 – 415
Statistics Sepal L Sepal W Petal L Petal W
&RUU'36\QWK2ULJLQ'DWD 
&RUU)LOWHUHG'32ULJLQ'D WD
&RY'36\QWK2ULJLQ'DWD 
&RY)LOWHUHG'32ULJLQ'DWD
,Q7DEOHWKHFRUUHODWLRQDQGFRYDULDQFHYDOXHVEHWZHHQWKH'3GDWD)LOWHUHG'3GDWDDQGWKHRULJLQDOGDWDDUHSUHVHQWHG>@7KH
FRUUHODWLRQEHWZHHQ'3EDVHGGDWDDQGWKHRULJLQDOGDWDLVDWIRUWKH 6HSDOOHQJWKDWWULEXWHYDOXHVZKLOH WKHFRUUHODWLRQYDOXHV
EHWZHHQWKHILOWHUHG'3EDVHGGDWDDQGWKHRULJLQDOLVDW ,QWKLVFDVHERWKFRUUHODWLRQYDOXHVDUHYHU\ORZDQGLQGLFDWHWKDWWKHUH
LV QR UHODWLRQVKLS EHWZHHQ WKH GDWD VHWV DQ LQGLFDWLRQ RI VWUR QJGDWD SULYDF\ VLQFH WKH DWWDFNHU ZRXOG KDYH D YHU\ GLIILFXOW WLPH
UHFRQVWUXFWLQJ VXFKGDWDVHWV+RZHYHUVXFKORZFRUUHODWLRQYDOXHVFRXOGDOVRVXJJHVWWKDWWKHGDWD XWLOLW\OHYHOVRI'3 DQGILOWHUHG
'3GDWDDUHYHU\ORZ
4. Conclusion
(PSLULFDOUHVXOWVIURPWKLVVWXG\VKRZHGWKDWWKHVLJQDOSURFHVVLQJWHFKQLTXHRI ILOWHULQJPLJKWKDYHDQHIIHFWRQUHGXFLQJH[FHVVLYH
QRLVH GXH WRWKH DSSOLFDWLRQ RI GLIIHUHQWLDO SULYDF\RQ GDWD )LOWHUHG '3EDVHG SULYDWL]HG GDWD RXWSHUIRUPHG UHJXODU'3EDVHG
SULYDWL]HG GDWD ZLWK KLJKHU FODVVLILFDWLRQ DFFXUDF\+RZHYHU WKH SUREOHP RI SULYDF\ YHUVXV XWLOLW\ VWLOO SHUVLVWV DQG PRUH
H[SHULPHQWDODQG HPSLULFDOUHVHDUFKZLWKD PXOWLIDFHWHGDSSURDFKLVQHHGHGWRILQGVROXWLRQVRQDFDVHE\FDVHEDVLVDV HDFKGDWD
VHW PLJKW KDYH YDULRXVSULYDF \UHTXLUH PHQWV <HW VWLOO  DQXPEHU RI SDUDPHWHUV DUH SUHVHQW LQ ERWK WKH G DWD SULYDF\DQG  ILOWHULQJ
SURFHVVHVWKDWZRXOGUHTXLUHDQLQYHVWLJDWLRQLQWR KRZWRRSWLPDOO\ILQHWXQH VXFK SDUDPHWHUVIRUHYHQ EHWWHUUHVXOWV)XWXUH ZRUNV
LQFOXGHLQYHVWLJDWLQJWKHDSSOLFDWLRQRIRWKHUILOWHULQJDQGVLJQDOSURFHVVLQJWHFKQLTXHVQRWFRYHUHGLQWKLVSDSHU
References
 $.UDXVHDQG(+RUYLW]³$8WLOLW\7KHRUHWLF$SSURDFKWR3ULYDF\LQ2QOLQH6HUYLFHV´-$UWLI,QWHOO5HVYROSS±
 5&::RQJ$ :&)X.:DQ JDQG -3HL ³0LQLPDOLW \$WWDFN LQ3ULYDF \3UHVHUYLQJ'DWD3XEOLVKLQJ´3URFUG,QW&RQI9HU\ODUJHGDWDEDVHVSS
±
 7 /L DQG1 /L ³2QWKH WUDGHRIIEHWZHHQ SULYDF\DQG XWLOLW\LQGDWDSXEOLVKLQJ´LQ3URFHHGLQJVRIWKHWK$&0,QWHUQDWLRQDO&RQIHUHQFHRQ .QRZOHGJH
'LVFRYHU\DQG'DWD0LQLQJSS±
 -5%DPEDXHU.0XUDOLGKDUDQG56DUDWK\³)RRO¶V*ROGDQ,OOXVWUDWHG&ULWLTXHRI'LIIHUHQWLDO3ULYDF\´9DQGHUELOW-(QWHUWDLQ7HFKQRO/DZYRO
S3DSHU1R±
 & 'ZRUN ³'LIIHUHQWLDO3ULYDF\´ LQ $XWRPDWD ODQJXDJHVDQG SURJUDPPLQJ YRO QR G 0 %XJOLHVL % 3UHQHHO9 6DVVRQH DQG , :HJHQHU (GV
6SULQJHUSS±
 . 0LYXOH ³8WLOL]LQJ 1RLVH $GGLWLRQ IRU 'DWD 3ULYDF\  DQ 2YHUYLHZ´ LQ 3URFHHGLQJV RI WK H ,QWHUQDWLRQDO &RQ IHUHQFH RQ ,QIRUPDWLRQ DQG .QRZOHGJH
(QJLQHHULQJ,.(SS±
 .0LYX OH& 7XUQHU DQG6 <-L³7RZDUGV$'LIIHUHQWLDO3ULYDF\DQG8WLOLW\3UHVHUYLQJ0DFKLQH/HDUQLQJ&ODVVLILHU´LQ3URFHGLD&RPSXWHU6FLHQFH
YROSS±
 .0LYXOHDQG & 7XUQHU³$&RPSDUDWLYH $QDO\VLVRI'DWD 3ULYDF\DQG 8WLOLW\3DUDPHWHU$GMXVWPHQW8VLQJ 0DFKLQH/HDUQLQJ&ODVVLIL FDWLRQ DV D *DXJH´
3URFHGLD&RPSXW6FLYROSS±
 .0XUDOLGKDUDQG 5 6DUDWK\³'RHV'LIIHUHQWLDO3ULYDF\3URWHFW 7HUU\ *URVV ¶ 3ULYDF\"´ LQ,Q 3ULYDF\LQ 6WDWLVWLFDO'DWDEDVHV YRO 6SULQJHU9HUODJ
%HUOLQ+HLGHOEHUJSS±
 & 'ZRUN ³'LIIHUHQWLDO 3ULYDF\ $ 6XUYH\ RI 5HVXOWV´ LQ 7KHRU\ DQG $SSOLFDWLRQV RI 0RGHOV RI &RPSXWDWLRQ /1&6  6SULQJHU9HUODJ %HUOLQ
+HLGHOEHUJSS±
 56DUDWK\DQG.0XUDOLGKDU³6RPH$GGLWLRQDO,QVLJKWVRQ$SSO\LQJ 'LIIHUHQWLDO 3ULYDF\ IRU1XPHULF 'DWD´ LQ3UL YDF\LQ 6WDWLVWLFDO'DWDE DVHV9RO 
QR'ZRUN6SULQJHU%HUOLQ+HLGHOEHUJSS±
 6:6PLWK7KH6FLHQWLVWDQG(QJLQHHU¶V*XLGHWR'LJLWDO6LJQDO3URFHVVLQJ&DOLIRUQLD7HFKQLFDO3XEOLVKLQJSS±
 3.RYHVL³)DVW$OPRVW*DXVVLDQ)LOWHULQJ´LQ,QWHUQDWLRQDO&RQIHUHQFHRQ'LJLWDO,PDJH&RPSXWLQJ7HFKQLTXHVDQG$SSOLFDWLRQVSS±
 .%DFKHDQG0 /LFKPDQ³,ULV)LVKHU'DWDVHW 8&,0DFKLQH/HDUQLQJ5HSRVLWRU\´8Q LYHUVLW\RI &DOLIRUQLD 6FKRRO RI,QIRUPDW LRQDQG &RPSXWHU 6FLHQF H
,UYLQH&$
 .0LYXOH³$Q ,QYHVWLJDWLRQ2I'DWD3ULYDF\$QG 8WLOLW\8VLQJ0DFKLQH/HDUQLQJ$V $*DXJH´'LVVHUWDWLRQ&RPSXWHU6FLHQFH 'HSDUWPHQW%RZLH6WDWH
8QLYHUVLW\3UR4XHVW1R$YDL ODEOHRQOLQHKWWSSTGWRSHQSURTXHVWFRPSXEQXPKWPO
... Another utility improvement is instead of sending just masked profiles, sending filtered masked profiles. Considering that the attacker has the ability to discover and use the best filter setting and that even analyzing the filtered data, the privacy of the consumer is still preserved due the differential privacy constraint [25], one may argue that consumers may disclose already filtered profiles for utility improvement. ...
... For other types of data sets (i.e, not electrical consumption data), Mivule et al. [25] suggested that, filtering differentially private data maintains some statistical properties such as sum. In fact, according to our experiments, there is no utility improvement for billing purposes because the computed total consumption in the end of the month using a filtered masked profile is statistically the same of using a non-filtered one. ...
Article
Full-text available
Smart meters read electric usage and enable power providers to collect detailed consumption data from consumers. Based on this data, power providers can perform and improve many services such as differential tariffs and load monitoring. However, these readings also gather personal information that can be intrusive and threaten consumers’ privacy. Consequently there is an urgent need to address how to protect consumers’ privacy when using smart meter systems. We propose a lightweight approach for offering privacy using noise addition. Since the consumer behavior is very correlated with appliance usage, we measure the privacy level achieved by appliances through the state of the art in privacy (i.e., differential privacy model) and evaluate a filtering attack to eliminate the added noise. The utility is validated in a discussion regarding the smart meter benefits and an evaluation if they can still be provided when using our proposed approach.
... As one of choices to describe a future trend, an analysis time series can be applied to reflect dynamic variable from one time to another [7]. From the previous researches using Least Square [1][2][3][4][5][6] and Moving Average [7][8][9][10][11][12][13] Methods, some results of analyzed data have been gained to help future prediction. Prediction is a management process to help decision making. ...
Article
Full-text available
The poor population in South Kalimantan has decreased over the last three years compared with the few years before. The number of poor people differs from time to time. This fluctuating number has been a problem for the South Kalimantan local government in adopting proper policies to address the matter. It is therefore necessary to predict the potential number of poor people in the next year as the basis for subsequent policy making. This research applies both the Least Square and Moving Average methods to compute prediction values. From the results, we can say that prediction analysis using these two methods is valid for predicting the potential poor population from previous data, because the predictions are close to the actual condition. Reviewing the test results of the last three years, the least square method shows a validity of 92.8%, while the moving average method shows a validity of 98.8%; both are considered valid.
... Time domain smoothing reduces the contribution of high-frequency noise in the data, whereas spatial smoothing averages signals from poor channels with the surrounding fNIRS channels, reducing the effect of the noisy channel while still preserving some of its signal [46]. The moving average type of smoothing works by averaging a number of data points together, reducing high-frequency fluctuations [47]. Gaussian smoothing involves a Gaussian weighting function, which multiplies the value of each point according to where it is on the distribution. ...
Article
Full-text available
FNIRS pre-processing and processing methodologies are very important—how a researcher chooses to process their data can change the outcome of an experiment. The purpose of this review is to provide a guide on fNIRS pre-processing and processing techniques pertinent to the field of human motor control research. One hundred and twenty-three articles were selected from the motor control field and were examined on the basis of their fNIRS pre-processing and processing methodologies. Information was gathered about the most frequently used techniques in the field, which included frequency cutoff filters, wavelet filters, smoothing filters, and the general linear model (GLM). We discuss the methodologies of and considerations for these frequently used techniques, as well as those for some alternative techniques. Additionally, general considerations for processing are discussed.
... For convenience, the start time of step signal is set to 1000 s. Before compensation, the moving-average filter (The filter order is 9) is used to reduce the impact of noise [17]. ...
Article
Full-text available
The Co Self-Powered Neutron Detector (SPND) is confronted with the problem of material consumption, as a result of which the response current can neither reflect changes in neutron flux in time nor remain proportional to the neutron flux. In this paper, a deconvolution-based method is established to solve this problem. First, a step signal of neutron flux is taken as an example to analyze its performance. When the material consumption of the Co SPND is 10%, the response current after compensation corresponds to the neutron flux. Finally, the effects of this model at different signal-to-noise ratios are analyzed, which confirms its performance in compensating the Co SPND signal.
Article
The fast digital lock-in amplifier (FDLIA) detects signal amplitude and phase at high speed, but signals at odd harmonics interfere with the algorithm. To remedy this defect of the FDLIA and enhance its practicability, in this paper moving average filters with different lengths are set for each uncorrelated interference, which can effectively suppress each interference and its related interferences through an addition operation. Once the other interferences in one channel signal are suppressed, its data can continue to be used for signal processing of other channels, preserving the efficiency of the FDLIA. Simulated and practical experiments show that this method can almost completely suppress channel interference, which provides a scheme for high-speed and high-precision detection of multi-channel signals.
Article
The collusion attack combines multiple multimedia files into one new file to erase the user identity information. The traditional anti-collusion methods (which aim to trace the traitors) can defend the collusion attack, but they cannot well defend some hybrid collusion attacks (e.g., a collusion attack combined with desynchronization attacks). To address this issue, we propose a frequency spectrum modification process (FSMP) to defend the collusion attack by significantly downgrading the perceptual quality of the colluded file. The severe perceptual quality degradation can demotivate the attackers from launching the collusion attack. Because FSMP is orthogonal to the existing traitor-trace-based methods, it can be combined with the existing methods to provide a double-layer protection against different attacks. In FSMP, after several signal processing procedures (e.g., uneven framing and smoothing), multiple signals (called FSMP signals) can be generated from the host signal. Launching collusion attack using the generated FSMP signals would lead to the energy disturbance and attenuation effect (EDAE) over the colluded signals. Due to the EDAE, FSMP can significantly degrade the perceptual quality of the colluded audio file, thereby thwarting the collusion attack. In addition, FSMP can well defend different hybrid collusion attacks. Theoretical analysis and experimental results confirm the validity of the proposed method.
Chapter
With rapid economic growth, industrial processes are confronted with great pressure on power reservation and emission reduction owing to legal requirements and environmental problems. These problems can be addressed by accurate electric load forecasting. The forecasting results can be used for production planning, fault diagnosis, process optimization, etc. Based on the hybrid PSO-LSSVM (Particle Swarm Optimization-Least Squares Support Vector Machine) algorithm, this study develops a data-driven energy consumption forecasting model for the industrial short-term electric load. The lag autocorrelation function is applied to choose input variables. The electricity loads from two diverse paper mills are acquired to validate the model. Forecasting models based on other hybrid machine learning algorithms are also analyzed as contrasting cases. The forecasting results show that the PSO-LSSVM model has higher precision than the other two hybrid forecasting models. The forecasting performance, with an error of 0.17%, could meet the requirements of industrial application in the papermaking process.
Conference Paper
Parallel execution uses the power of multiple systems simultaneously and is thus an efficient approach to handling and processing complex problems, producing results in less execution time. The present paper describes the implementation of a cache-oblivious algorithm for de-noising corrupted images using a parallel processing approach. At present, there is a need to work with large images. Sequential execution of any such process results in a long execution time and ultimately in degraded performance. This paper focuses on implementing the algorithm on distributed objects in a cluster using RMI and uses multithreading to enhance the depth of distributed parallel technology.
Conference Paper
Full-text available
Cloud based distributed systems rely on scheduling and resources allocation to function. In complex distributed systems a distribution of many jobs of different types is required. At the same time, a problem of virtual machines migration to physical servers must be solved. Therefore, configuration of a cloud system may be very dynamic, meaning that not only number of existing computational servers but also their location on physical servers might change. Optimal control strategies aimed to solve these problems are effective only when updated information about system's components is available. However, gathering this information from many distributed components of a cloud system, such as physical nodes or virtual machines may significantly decrease overall performance. These problems can be solved by applying different optimization techniques such as multi-agent approach. Agents decide if the information is outdated and needs to be updated by them. This paper describes a cloud system architecture that uses agents of different types. Agents' algorithms and their interaction schemes are defined. Software implementation in form of software environment is presented. Simulation experiments to compare performance of the system when using default monitoring methods and a multi-agent approach were conducted.
Conference Paper
This paper presents an ontology-based approach to the problem of jobs scheduling in case where jobs are processed by applications running in virtual environments and number of applications and their performance varies over time. Using ontology-based framework brings benefits when system has a varying number of components and their performing properties are also non-constant. The work is focused on ontology model needed to organize information exchange for intelligent agents embedded into virtual machines and gathering information about applications performance. In cases when jobs of one type can be processed by several applications having different performance, the existence of optimal threshold queuing policy has been proven earlier. It can reduce the average job processing time. In order to calculate thresholds we need relevant information about active applications and their current performance, the rate of jobs stream, the number of jobs in the queues, etc. The presented approach solves the problem of effective gathering of relevant information about the system state based on intelligent agents interaction where each intelligent agent uses ontology to publish only information about changes that are relevant to decision making. This reduces the system’s overhead for monitoring of ongoing parameters.
Thesis
Full-text available
The purpose of this investigation is to study and pursue a user-defined approach in preserving data privacy while maintaining an acceptable level of data utility using machine learning classification techniques as a gauge in the generation of synthetic data sets. This dissertation will deal with data privacy, data utility, machine learning classification, and the generation of synthetic data sets. Hence, data privacy and utility preservation using machine learning classification as a gauge is the central focus of this study. Many organizations that transact in large amounts of data have to comply with state, federal, and international laws to guarantee that the privacy of individuals and other sensitive data is not compromised. Yet at some point during the data privacy process, data loses its utility - a measure of how useful a privatized dataset is to the user of that dataset. Data privacy researchers have documented that attaining an optimal balance between data privacy and utility is an NP-hard challenge, thus an intractable problem. Therefore we propose the classification error gauge (x-CEG) approach, a data utility quantification concept that employs machine learning classification techniques to gauge data utility based on the classification error. In the initial phase of this proposed approach, a data privacy algorithm such as differential privacy, Gaussian noise addition, generalization, and/or k-anonymity is applied on a dataset for confidentiality, generating a privatized synthetic data set. The privatized synthetic data set is then passed through a machine learning classifier, after which the classification error is measured. If the classification error is lower than or equal to a set threshold, then better utility might be achieved; otherwise, adjustment to the data privacy parameters is made and then the refined synthetic data set is sent to the machine learning classifier; the process repeats until the error threshold is reached.
Additionally, this study presents the Comparative x-CEG concept, in which a privatized synthetic data set is passed through a series of classifiers, each of which returns a classification error, and the classifier with the lowest classification error is chosen after parameter adjustments, an indication of better data utility. Preliminary results from this investigation show that fine-tuning parameters in data privacy procedures, for example in the case of differential privacy, and increasing weak learners in the ensemble classifier for instance, might lead to lower classification error, thus better utility. Furthermore, this study explores the application of this approach by employing signal processing techniques in the generation of privatized synthetic data sets and improving data utility. This dissertation presents theoretical and empirical work examining various data privacy and utility methodologies using machine learning classification as a gauge. Similarly, this study presents a resourceful approach in the generation of privatized synthetic data sets, and an innovative conceptual framework for the data privacy engineering process.
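The privatize, classify and adjust loop described in this abstract can be sketched roughly as follows. This is only an illustration under several assumptions that are not in the abstract: per-value Laplace noise with scale `sensitivity / epsilon` stands in for the privacy step, a one-feature nearest-centroid rule stands in for the machine learning classifier, and the "adjustment" is simply moving to the next value in a fixed list of ε candidates. All function names and the toy data are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize(rows, epsilon, sensitivity, rng):
    # Assumed privacy step: perturb the numeric feature, keep the label.
    scale = sensitivity / epsilon
    return [(x + laplace_noise(scale, rng), label) for x, label in rows]

def classification_error(train, test):
    # Stand-in classifier: assign each test value to the nearest class centroid.
    centroids = {}
    for label in {lab for _, lab in train}:
        xs = [x for x, lab in train if lab == label]
        centroids[label] = sum(xs) / len(xs)
    wrong = 0
    for x, label in test:
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        if predicted != label:
            wrong += 1
    return wrong / len(test)

def x_ceg(rows, epsilons, threshold, sensitivity=1.0, seed=0):
    # Try each candidate epsilon in turn; stop at the first one whose
    # classification error is lower than or equal to the threshold.
    rng = random.Random(seed)
    history = []
    for eps in epsilons:
        synthetic = privatize(rows, eps, sensitivity, rng)
        train, test = synthetic[::2], synthetic[1::2]  # simple alternating split
        err = classification_error(train, test)
        history.append((eps, err))
        if err <= threshold:
            return eps, err, history
    return None, None, history

# Toy data: two well-separated classes on a single numeric feature.
rows = [(0.1 * (i % 5), 0) for i in range(100)]
rows += [(10.0 + 0.1 * (i % 5), 1) for i in range(100)]

chosen, err, history = x_ceg(rows, [0.01, 0.1, 1.0, 10.0], threshold=0.05)
for eps, e in history:
    print("epsilon =", eps, "classification error =", e)
print("chosen epsilon:", chosen)
```

The recorded `history` plays the role of the empirical results from which a trade-off point could be chosen; a real study would substitute its own privacy mechanism, classifier and adjustment rule.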
Article
Full-text available
During the data privacy process, the utility of datasets diminishes as sensitive information such as personal identifiable information (PII) is removed, transformed, or distorted to achieve confidentiality. The intractability of attaining an equilibrium between data privacy and utility needs is well documented, requiring trade-offs, and further complicated by the fact that making such trade-offs also remains problematic. Given such complexity, in this paper, we endeavor to empirically investigate what parameters could be fine-tuned to achieve an acceptable level of data privacy and utility during the data privacy process, while making reasonable trade-offs. Therefore, we present the comparative classification error gauge (Comparative x-CEG) approach, a data utility quantification concept that employs machine learning classification techniques to gauge data utility based on the classification error. In this approach, privatized datasets are passed through a series of classifiers, each of which returns a classification error, and the classifier with the lowest classification error is chosen; if the classification error is lower than or equal to a set threshold then better utility might be achieved, otherwise, adjustments to the data privacy parameters are made to the chosen classifier. The process repeats x times until the desired threshold is reached. The goal is to generate empirical results after a range of parameter adjustments in the data privacy process, from which a threshold level might be chosen to make trade-offs. Our preliminary results show that given a range of empirical results, it might be possible to choose a tradeoff point and publish privacy compliant data with an acceptable level of utility.
Article
Full-text available
Many organizations transact in large amounts of data often containing personal identifiable information (PII) and various confidential data. Such organizations are bound by state, federal, and international laws to ensure that the confidentiality of both individuals and sensitive data is not compromised. However, during the privacy preserving process, the utility of such datasets diminishes even while confidentiality is achieved--a problem that has been defined as NP-Hard. In this paper, we investigate a differential privacy machine learning ensemble classifier approach that seeks to preserve data privacy while maintaining an acceptable level of utility. The first step of the methodology applies a strong data privacy granting technique on a dataset using differential privacy. The resulting perturbed data is then passed through a machine learning ensemble classifier, which aims to reduce the classification error, or, equivalently, to increase utility. Then, the association between increasing the number of weak decision tree learners and data utility, which informs us as to whether the ensemble machine learner would classify more correctly is examined. As results, we found that a combined adjustment of the privacy granting noise parameters and an increase in the number of weak learners in the ensemble machine might lead to a lower classification error.
Conference Paper
Full-text available
Recently Sarathy and Muralidhar (2009) provided the first attempt at illustrating the implementation of differential privacy for numerical data. In this paper, we attempt to provide further insights on the results that are observed when Laplace based noise addition is used to protect numerical data in order to satisfy differential privacy. Our results raise serious concerns regarding the viability of differential privacy and Laplace noise addition as appropriate procedures for protecting numerical data. Keywords: Differential privacy; Laplace noise addition; Numerical data
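For orientation, the Laplace noise addition referred to in this abstract is usually stated as below. This is the standard textbook form of the Laplace mechanism; the symbols $f$, $D$, $D'$, $\Delta f$ and $\varepsilon$ are notation introduced here and are not taken from the abstract.

```latex
% D and D' range over data sets that differ in a single record.
\[
  \Delta f = \max_{D, D'} \lvert f(D) - f(D') \rvert ,
  \qquad
  \mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)
\]
```

Here $\mathrm{Lap}(b)$ denotes a zero-mean Laplace random variable with scale $b$, so a smaller $\varepsilon$ (stronger privacy) means a larger noise scale and hence lower expected utility of the released value.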
Article
The internet is increasingly becoming a standard for both the production and consumption of data while at the same time cyber-crime involving the theft of private data is growing. Therefore, in efforts to securely transact in data, privacy and security concerns must be taken into account to ensure that the confidentiality of individuals and entities involved is not compromised, and that the data published is compliant with privacy laws. In this paper, we take a look at noise addition as one of the data privacy providing techniques. Our endeavor in this overview is to give a foundational perspective on noise addition data privacy techniques, provide statistical consideration for noise addition techniques and look at the current state of the art in the field, while outlining future areas of research.
Conference Paper
In data publishing, anonymization techniques such as generalization and bucketization have been designed to provide privacy protection. In the meanwhile, they reduce the utility of the data. It is important to consider the tradeoff between privacy and utility. In a paper that appeared in KDD 2008, Brickell and Shmatikov proposed an evaluation methodology by comparing privacy gain with utility gain resulted from anonymizing the data, and concluded that "even modest privacy gains require almost complete destruction of the data-mining utility". This conclusion seems to undermine existing work on data anonymization. In this paper, we analyze the fundamental characteristics of privacy and utility, and show that it is inappropriate to directly compare privacy with utility. We then observe that the privacy-utility tradeoff in data publishing is similar to the risk-return tradeoff in financial investment, and propose an integrated framework for considering privacy-utility tradeoff, borrowing concepts from the Modern Portfolio Theory for financial investment. Finally, we evaluate our methodology on the Adult dataset from the UCI machine learning repository. Our results clarify several common misconceptions about data utility and provide data publishers useful guidelines on choosing the right tradeoff between privacy and utility.
Conference Paper
Data publishing generates much concern over the protection of individual privacy. Recent studies consider cases where the adversary may possess different kinds of knowledge about the data. In this paper, we show that knowledge of the mechanism or algorithm of anonymization for data publication can also lead to extra information that assists the adversary and jeopardizes individual privacy. In particular, all known mechanisms try to minimize information loss and such an attempt provides a loophole for attacks. We call such an attack a minimality attack. In this paper, we introduce a model called m-confidentiality which deals with minimality attacks, and propose a feasible solution. Our experiments show that minimality attacks are practical concerns on real datasets and that our algorithm can prevent such attacks with very little overhead and information loss.