Khalil, M., & Ebner, M. (2016). De-identification in learning analytics. Journal of Learning Analytics, 3(1), 129–138. http://dx.doi.org/10.18608/jla.2016.31.8

ISSN 1929-7750 (online). The Journal of Learning Analytics works under a Creative Commons License, Attribution - NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)

De-Identification in Learning Analytics
Mohammad Khalil and Martin Ebner
Educational Technology
Graz University of Technology, Austria
mohammad.khalil@tugraz.at
ABSTRACT: Learning analytics has reserved its position as an important field in the educational sector. However, the large-scale collection, processing, and analyzing of data has steered the wheel beyond the borders to face an abundance of ethical breaches and constraints. Revealing learners' personal information and attitudes, as well as their activities, are major aspects that lead to identifying individuals personally. Yet, de-identification can keep the process of learning analytics in progress while reducing the risk of inadvertent disclosure of learners' identities. In this paper, the authors discuss de-identification methods in the context of the learning environment and propose a first prototype conceptual approach that describes the combination of anonymization strategies and learning analytics techniques.

Keywords: Learning analytics, anonymization, de-identification, ethics, privacy

1 INTRODUCTION
Learning analytics is an active area of the research field of online education and Technology Enhanced Learning (TEL). It applies analysis techniques to the education data stream in order to achieve several objectives. These objectives mainly aim to intervene and predict learners' performance in pursuance of enhancing the learning context and its environment. Higher Education (HE) and online course institutions are looking at learning analytics with an interest in improving retention and decreasing the total dropout rate (Slade & Galpin, 2012). However, ethical issues emerge while applying learning analytics in educational data sets (Greller & Drachsler, 2012). At the first International Conference on Learning Analytics and Knowledge (LAK '11), held in Banff, Alberta, Canada in 2011, participants agreed that learning analytics raises issues relevant to ethics and privacy and "it could be construed as eavesdropping" (Brown, 2011). The massive data collection and analysis of these educational data sets can lead to questions related to ownership, transparency, and privacy of data. These issues are not unique to the education sector only, but can be found in the human resource management and health sectors (Cooper, 2009). At its key level, learning analytics involves tracking students' steps in learning environments, such as videos of MOOCs (Wachtler, Khalil, Taraghi & Ebner, 2016), in the interest of identifying who are the students "at risk," or to help students with decisions about their futures. Nevertheless, tracking interactions of students could unveil critical issues regarding their privacy and their identities (Boyd, 2008).

Ethical issues for learning analytics fall into different categories. We mainly summarize them as the following (Khalil & Ebner, 2015b): 1) transparency of data collection, usage, and involvement of third parties; 2) anonymization and de-identification of individuals; 3) ownership of data; 4) data accessibility and accuracy of the analyzed results; 5) security of the examined data sets and student records from any
threat. These criteria point to the widely based security model CIA, which stands for Confidentiality, Integrity from alteration, and Availability for authorized parties.
The learning analytics community needs to deal carefully with the potential privacy issues while analyzing student data. Educational data analysis techniques can reveal personal information, attitudes, and activities related to learners (Bienkowski, Feng, & Means, 2012). However, there has been limited research, and there are still numerous unanswered questions related to privacy, personal information, and other ethical issues in the context of learning analytics (Bienkowski, Feng, & Means, 2012; Greller & Drachsler, 2012; Slade & Galpin, 2012; Slade & Prinsloo, 2013). For example, some educators claim that educational institutions are using applications that collect sensitive data about students without sufficiently respecting data privacy and how the data will eventually be used (Singer, 2014). Thus, data degradation (Anciaux et al., 2008), de-identification methods, or deletion of specific data records, may be required as a solution to preserve learners' information. In this paper, we will mainly focus our discussion on the de-identification process in the learning analytics atmosphere and afford a first prototype conceptual approach that combines learning environment, de-identification techniques, and learning analytics.

The paper is organized as follows: Section 2 covers the de-identification in general and the current laws associated with education, as well as the drivers linked with learning analytics. In Section 3, we propose the de-identification–learning analytics approach. The last section discusses the limitations of the de-identification process in learning analytics.

2 BACKGROUND
2.1 Personal Information and De-Identification
Personal information is any information that can identify an individual. In fields such as the health sector, it is named Personal Health Information or PHI, while in other fields, such as the education sector, it is named Personally Identifiable Information or PII. The National Institute of Standards and Technology (NIST) defines PII as "any information about an individual maintained by an agency, including 1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and 2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information" (McCallister, Grance, & Scarfone, 2010). The personal information of learners can be categorized into details such as name, sex, photograph, date of birth, age, address, religion, marital status, e-mail address, insurance number, ethnicity, et cetera, or educational details such as qualifications, courses attended, degrees, and study records. A leak of individuals' personal information can induce misuse of data, embarrassment, and loss of reputation. However, organizations may be required to publish details extracted from personal information. For instance, some educational institutions are required to provide statistics about student progress; likewise, health organizations may need to report special cases from their patient records, such as communicable diseases. As a result, de-identification helps organizations to protect privacy
while still informing the public. The de-identification process is used to prevent revealing individual identity and keeping the PII confidential.
In learning analytics, it is common for stakeholders to request additional information about the results extracted from educational data sets. Educational data mining and learning analytics mainly aim to enhance the learning environment and empower learners and instructors (Greller & Drachsler, 2012). Therefore, the analysis of these data may have interesting trends that could lead to further and deeper analysis by other institutions or researchers. Requests for more extensive analysis may involve the use of student-level data. Accordingly, ethical issues arise, such as privacy disclosure, and the need to de-identify the data becomes paramount.

Recently, Harvard and MIT universities released de-identified data from 16 courses offered in 2012–2013 from their well-known edX Massive Open Online Course (MOOC) (MIT News, 2014). The Harvard and MIT edX ensures that the anonymity of the released data complies with the Family Educational Rights and Privacy Act (FERPA).¹ Furthermore, Prinsloo and Slade (2015) suggested different approaches that inform students in higher education of the implications of learning analytics on their private data.

2.2 De-Identification Legislation
De-identification of student records has been regulated in the United States and the European Union. The United States adopted FERPA regarding the privacy of student educational records. In the European Union, the Data Protection Directive (DPD; 95/46/EC²) regulates the processing of personal data and the movement of such information. FERPA §99.31(b) deals with the de-identification of data rule. It clearly states that institutions "may release, without consent, education records, or information from education records, that has been de-identified through the removal of all Personally Identifiable Information (PII)." This section of FERPA requires institutions to use reasonable methods to identify the other parties who disclose education records. On the other hand, the most explicit citation of de-identification in the European DPD is Article 26 on anonymization, in which "principles of data protection shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable." Moreover, parties are encouraged to use de-identification techniques to render identification of data subjects impossible. It is not obvious, however, what level of de-identification is required to anonymize education records under European law. However, the Article 29 Data Protection Working Party has an opinion on the identification of data: "Once a data set is truly anonymized and individuals are no longer identifiable, European data protection law no longer applies" (2014, p. 5).

2.3 Drivers of De-Identification in Learning Analytics
A study by Petersen (2012) addressed the need to de-identify data used in academic analysis before making it available to institutions, to businesses, or for operational functions. Petersen (2012) pointed to the idea of keeping a unique identifier in case a researcher may need to study the behaviour of a particular individual. Slade and Prinsloo (2013), however, drew attention to the ambiguity of data mining techniques in monitoring student behaviour in educational settings. The authors linked de-identification with consent and privacy and stressed the need to guarantee student anonymity in their education records in order to achieve learning analytics objectives such as interventions based on student characteristics. An example of the link between consent and de-identification would be a questionnaire or survey that those filling it out are told will be used for research only. In that case, the use of their data is clearly limited to that one study. If the survey includes personal information, however, then assurances of anonymizing their data should be considered.

¹ http://www2.ed.gov/policy/gen/guid/fpco/ferpa/index.html (last access January 2015)
² http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31995L0046:EN:HTML (last access January 2015)

Ryan Baker (2013) discussed the demands of de-identifying educational data sets in his "Learning, Schooling, and Data Analytics" chapter in the Handbook on Innovations in Learning for States, Districts, and Schools. De-identification of these data sets means being able to share them among other researchers without violating FERPA regulations. Baker stressed that educational policies should include rules for anonymizing data in order to prevent identifiable information from being leaked without authorization. Furthermore, Drachsler and Greller covered the topic of anonymization in their DELICATE approach (Drachsler & Greller, 2016). A "strictly guarded key" should be held so that researchers may link their results from learning analytics and educational data mining with individual students in order to benefit the students. De-identification techniques have been reviewed as a right of access principle in learning analytics deployment (Pardo & Siemens, 2014). In addition, Pardo and Siemens further suggest that semantic analysis might be required to detect identifiable records in anonymized data sets.

3 PROPOSED APPROACH
In this section, we propose a conceptual de-identification–learning analytics framework as shown in Figure 1. The framework begins with learners involved in learning environments. Currently, a large number of learning environments support online learning, such as MOOCs, Learning Management Systems (LMS), Immersive Learning Simulations (ILS), mobile learning, and Personalized Learning Environments (PLE). These platforms offer environments with rich, vast amounts of data that can be quantitatively/qualitatively analyzed to benefit learners and enhance the learning context.
Figure 1: The proposed conceptual de-identification–learning analytics framework

The next step is the de-identification process where techniques to convert personal and private information into anonymized data take place. De-identification techniques include such methods as anonymization, masking, blurring, and perturbation. The last step includes the de-identified data linked with a unique descriptor that may be examined by learning analytics researchers and benefit stakeholders, but ultimately must be used only to the advantage of students.

3.1 De-Identification Techniques
In our proposed de-identification–learning analytics conceptual framework, there are several techniques available to de-identify student data records. Figure 3 lists several methods of de-identification and provides examples (based on Article 29 Data Protection Working Party, 2014; Cormode & Srivastava, 2009; Eurostat, 1996; Petersen, 2012).

Anonymization
Data anonymization techniques have recently been keenly researched in different structured data records with the goal of guaranteeing the privacy of sensitive information against unintended disclosure and a variety of attacks (Cormode & Srivastava, 2009). Ohm (2010) defined reasons behind anonymization when organizations want to release the data to the public, sell the information to third parties, or share the information within the same organization. The difference between anonymization
and de-identification, however, is quite misunderstood. Anonymization principles are a subset of holistic de-identification methodologies. Data anonymization is the process of de-identifying data while preserving its original format (Raghunathan, 2013). In the educational context, anonymization refers to different procedures to de-identify student data in such a way that it cannot be re-identified (the opposite of de-identification) unless there is a record code. Anonymization is not reserved only for tabular data records, but can also be applied to other types of data, such as visualized data or graphs, where institutions intend to present their outcomes without revealing sensitive information.

On the other hand, in addition to anonymization, de-identification includes masking, randomization, blurring, and so on. For instance, replacing "Bernard" with "*******" is a method of masking while altering "Bernard" to "Wolfgang" would be an example of anonymization. However, masking and blurring are not as well known as anonymization. By any means, de-identification, pseudonymization, and anonymization are interchangeable topics under the information concealing umbrella. To clarify the differences in simple terms, pseudonymization means cloaking the original data with false information with the ability to track it back to its original formation; anonymization, conversely, cannot be reversed (Raghunathan, 2013).

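The distinction between pseudonymization and anonymization drawn above can be illustrated with a minimal Python sketch. The class name Pseudonymizer, the "P-" token format, and the constant placeholder returned by anonymize are assumptions made for illustration only; they are not part of the proposed framework.

```python
import secrets


class Pseudonymizer:
    """Hypothetical helper: swaps a value for a random token and keeps the mapping."""

    def __init__(self):
        self._forward = {}  # original value -> token
        self._reverse = {}  # token -> original value

    def pseudonymize(self, value: str) -> str:
        if value not in self._forward:
            token = "P-" + secrets.token_hex(4)
            while token in self._reverse:  # regenerate on the unlikely collision
                token = "P-" + secrets.token_hex(4)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reidentify(self, token: str) -> str:
        # Reversal is possible only because the mapping was stored.
        return self._reverse[token]


def anonymize(value: str) -> str:
    """Illustrative irreversible replacement: no mapping is stored anywhere."""
    return "ANONYMOUS"
```

Only the holder of the stored mapping can trace a token back to "Bernard"; the anonymized value carries no such link.
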
As previously mentioned, educational data records may include private information, such as name or student ID, which singularly are called direct identifiers. Removing or hiding these identifiers does not assure a true data anonymization. Identifiers could be linked with other information that would allow identification of individuals (see Figure 2). However, quasi-identifiers can be used to ensure better de-identification of data. "Date of Birth + Sex + Name" is an example of a quasi-identifier. In 2006, AOL released the search records of 500,000 of its users. Several days after AOL's database release, New York Times journalists were able to reveal the identity of a 62-year-old widow using a similar process to that shown in Figure 2 (Soghoian, 2007). AOL admitted that the data release was a mistake and the research team responsible for sharing the data was fired.

Figure 2: Linking data sources leads to name identification

Another example of identifying individuals was reported in 2000 when demographic information led to retrieving the names and contact information of patients whose medical data had been released in the United States (Sweeney, 2000).

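The linking of data sources described above can be sketched in a few lines of Python. This is an illustration under assumptions: the field names (birth_date, sex, postcode), the sample rows, and the helper link are hypothetical and do not reproduce the AOL or medical data cases.

```python
def link(released, public, keys=("birth_date", "sex", "postcode")):
    """Join a 'de-identified' release with a public list on shared quasi-identifiers."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for row in released:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the released row
            matches.append((names[0], row))
    return matches


# Hypothetical release with direct identifiers removed.
released = [
    {"birth_date": "1990-04-12", "sex": "F", "postcode": "8010", "grade": "A"},
    {"birth_date": "1991-09-30", "sex": "M", "postcode": "8020", "grade": "C"},
]
# Hypothetical public source that still carries names.
public = [
    {"name": "Student One", "birth_date": "1990-04-12", "sex": "F", "postcode": "8010"},
    {"name": "Student Two", "birth_date": "1985-01-01", "sex": "M", "postcode": "8020"},
]
reidentified = link(released, public)
```

In this toy data, the first released row matches exactly one named person, so removing the name column alone did not protect that record.
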
Samarati and Sweeney (1998) provided a well-known anonymization technique, namely k-anonymization. This method addresses the problem of linking records to identify the individual's information when releasing data, thus safeguarding anonymity. The k-anonymity technique ensures that each released record cannot be distinguished from at least k − 1 other records on its quasi-identifiers, so that a record can only be narrowed down to a group of at least k individuals (Cormode & Srivastava, 2009).

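A minimal sketch of how the k-anonymity property might be checked on a small table is given below. The function name k_anonymity, the column names, and the generalized values such as "199*" are assumptions for illustration; the sketch only measures k and does not perform the generalization and suppression described by Samarati and Sweeney.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical records whose birth year has already been generalized to a decade.
rows = [
    {"birth_year": "199*", "sex": "F", "score": 71},
    {"birth_year": "199*", "sex": "F", "score": 88},
    {"birth_year": "198*", "sex": "M", "score": 64},
    {"birth_year": "198*", "sex": "M", "score": 92},
]
k = k_anonymity(rows, ["birth_year", "sex"])
```

A release would be called k-anonymous for a chosen k only if this value is at least k for the quasi-identifiers an attacker could plausibly know.
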
Figure 3: Examples of de-identification techniques

Masking
Masking is a de-identification technique that replaces sensitive data with fictional data in order to disclose results outside the institution. Data masking can modify the data records so that they remain usable while keeping personal information confidential. For instance, character masking replaces a string with special characters.

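Character masking as described above could look like the following Python sketch. The function name mask and its keep_last parameter are assumptions added for illustration; the text itself only describes replacing a string with special characters.

```python
def mask(value: str, keep_last: int = 0, mask_char: str = "*") -> str:
    """Replace characters with mask_char, optionally leaving the last few visible."""
    if keep_last <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]
```

With the defaults, mask("Bernard") gives "*******", matching the example used earlier in this section.
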
Blurring
Blurring involves reducing precision to minimize the identification of data. There are several ways to achieve blurring, such as dividing the data into subcategories, randomizing the data fields, or adding noise to data records.

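Two of the blurring strategies named above, grouping values into subcategories and adding noise, can be sketched as follows. The function names, the five-year bin width, and the uniform noise model are illustrative assumptions rather than choices made in the paper.

```python
import random


def blur_age(age: int, width: int = 5) -> str:
    """Reduce precision by reporting an age range instead of the exact age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def add_noise(score: float, scale: float = 2.0, rng=None) -> float:
    """Perturb a numeric value with uniform noise in [-scale, +scale]."""
    rng = rng or random.Random()
    return score + rng.uniform(-scale, scale)
```

Wider bins or larger noise make individual records harder to single out, at the cost of less precise analysis.
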
3.2 Coding Data Records
In scientific research, data usually requires further investigation with researchers looking deeper into the details. Having de-identified data might be insufficient for these purposes; researchers may require
additional information in order to do more analysis. The American federal Health Insurance Portability and Accountability Act (HIPAA), which is responsible for protecting the confidentiality of patient records, authorizes using an "assigned code" that can be appended to the records in order to permit the information to be re-identified for research purposes.³ Based on that HIPAA rule, we found that FERPA §99.31(b) allows for using a unique descriptor for student data records in order to match an individual's information for research and institutional use. Accordingly, we conclude that assigning a code to student records in our proposed framework can grant learning analytics researchers the ability to study behaviours of specific students and, therefore, can benefit learners. Despite the fact that learning analytics poses ethical challenges, the main goal is still to benefit learning environments and students, such as making recommendations, classifying students into profiles or predicting their performance (Ebner & Schön, 2013; Greller & Drachsler, 2012; Slade & Prinsloo, 2013; Khalil & Ebner, 2015a; Khalil, Kastl & Ebner, 2016).

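One possible way to attach such a code to student records is sketched below in Python. It assumes a keyed hash (HMAC-SHA256 from the standard library) with a secret held by the institution; the function names, the field names student_id and name, and the 16-character truncation are hypothetical and are not prescribed by HIPAA, FERPA, or the proposed framework.

```python
import hashlib
import hmac


def assign_code(student_id: str, secret_key: bytes) -> str:
    """Derive a stable code from the student ID; without the key it cannot be recomputed."""
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers (assumed field names) and append the assigned code."""
    coded = {k: v for k, v in record.items() if k not in ("student_id", "name")}
    coded["code"] = assign_code(record["student_id"], secret_key)
    return coded
```

Because the same student always receives the same code, researchers can follow one learner across data sets, while matching a code back to a person requires the guarded key and the original ID list.
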
4 LIMITATIONS
Despite the fact that de-identification protects confidential information and privacy, the de-identified data still poses some privacy risks (Petersen, 2012). In many cases, some attributes are capable of identifying individuals; in other cases, attackers can link records together from different sources and therefore "code break" the de-identification. On the other hand, in their paper "Privacy, Anonymity, and Big Data in the Social Sciences," Daries et al. (2014) asserted that with de-identification, there is no guarantee of keeping the analysis process uncorrupted. Pardo and Siemens agree that "data can be either useful or perfectly anonymous, but never both" (2014, p. 447). The bottom line is that the stricter the de-identification guidelines, the greater the negative effect on the ultimate analysis.
5 CONCLUSION
Since learning analytics first became known in 2011, it has helped learners to improve their performance based on analyzing their educational data. Nevertheless, this field raises many issues related to ethics and ownership. The massive scale of data collection and analysis leads to questions about the consent and privacy of personal information. This paper mainly discusses one of the attainable solutions for preserving learners' sensitive information, the "de-identification of data" to facilitate learning analytics applications. We shed light on this topic via US and EU regulations regarding data privacy. We proposed a conceptual approach with examples of de-identification techniques that assist us with our "iMooX" platform (http://www.imoox.at) and can help learning analytics specialists preserve confidential learner information.
Although de-identification is not a foolproof solution for protecting learner privacy, it is an imperative consideration in examining the ethical dimensions of learning analytics.

³ Rule 45 C.F.R. § 164.514(c).
REFERENCES

Article 29 Data Protection Working Party. (2014). Opinion 05/2014 on Anonymisation Techniques (0829/14/EN WP216). Retrieved from http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
Anciaux, N., Bouganim, L., Van Heerde, H., Pucheral, P., & Apers, P. M. (2008). Data degradation: Making private data less sensitive over time. In J. G. Shanahan, S. Amer-Yahia, I. Manolescu, Y. Zhang, D. A. Evans, A. Kolcz, K.-S. Choi, A. Chowdhury (Eds.), Proceedings of 17th ACM International Conference on Information and Knowledge Management (CIKM 2008) (pp. 1401–1402). New York: ACM. http://dx.doi.org/10.1145/1458082.1458301
Baker, R. S. J. d. (2013). Learning, schooling, and data analytics. In M. Murphy, S. Redding, & J. Twyman (Eds.), Handbook on innovations in learning for states, districts, and schools (pp. 179–190). Philadelphia, PA: Center on Innovations in Learning, Temple University.
Bienkowski, M., Feng, M., & Means, B. (2012). Enhancing teaching and learning through educational data mining and learning analytics: An issue brief. Retrieved from the website of the Office of Educational Technology, US Department of Education, https://tech.ed.gov/wp-content/uploads/2014/03/edm-la-brief.pdf
Brown, M. (2011). Learning analytics: The coming third wave (EDUCAUSE Learning Initiative Brief). Retrieved from EDUCAUSE library https://net.educause.edu/ir/library/pdf/ELIB1101.pdf
Cooper, N. (2009). Workforce demographic analytics yield health-care savings. Employment Relations Today 36(3), 13–18. http://dx.doi.org/10.1002/ert.20256
Cormode, G., & Srivastava, D. (2009). Anonymized data: Generation, models, usage. In C. Binnig & B. Dageville (Eds.), Proceedings of the 35th International Conference on Management of Data (pp. 1015–1018). New York: ACM. http://dx.doi.org/10.1109/ICDE.2010.5447721
Boyd, D. (2008). Facebook's privacy trainwreck: Exposure, invasion, and social convergence. Convergence: The International Journal of Research into New Media Technologies, 14(1), 13–20. http://dx.doi.org/10.1177/1354856507084416
Daries, J. P., Reich, J., Waldo, J., Young, E. M., Whittinghill, J., Ho, A. D., ... & Chuang, I. (2014). Privacy, anonymity, and big data in the social sciences. Communications of the ACM, 57(9), 56–63. http://dx.doi.org/10.1145/2643132
Drachsler, H. & Greller, W. (2016). Privacy and analytics – it's a DELICATE issue. A checklist to establish trusted learning analytics. Proceedings of the 6th International Conference on Learning Analytics and Knowledge (LAK '16), 89–98. http://dx.doi.org/10.1145/2883851.2883893
Ebner, M., & Schön, M. (2013). Why learning analytics in primary education matters. In C. Karagiannidis & S. Graf (Eds.), Bulletin of the Technical Committee on Learning Technology, 15(2), 14–17.
Eurostat. (1996). Manual on disclosure control methods. Luxembourg: Office for Official Publications of the European Communities. Retrieved from http://ec.europa.eu/eurostat/ramon/statmanuals/files/manual_on_disclosure_control_methods_1996.pdf
Greller, W., & Drachsler, H. (2012). Translating learning into numbers: A generic framework for learning analytics. Educational Technology and Society, 15(3), 42–57.
Khalil, M., & Ebner, M. (2015a). A STEM MOOC for school children: What does learning analytics tell us? International Conference on Interactive Collaborative Learning (ICL 2015), (pp. 1217–1221). Florence, Italy: IEEE.
Khalil, M., & Ebner, M. (2015b). Learning analytics: Principles and constraints. In S. Carliner, C. Fulford, & N. Ostashewski (Eds.), Proceedings of EdMedia: World Conference on Educational Media and Technology, 2015(1), 1789–1799.
... It is common, and even educationally desirable, for contributors in online discussions to refer to one another by name and to sign their own posts [21,22,24]. Before using such data for research purposes in learning analytics, it is good ethical practice, and often a strict requirement [4,10,13,18], that personally identifying information (PII) is removed. The category of PII is not limited to names and also includes email addresses, phone numbers, user names, dates of birth, places of work or study, and other pieces of data that could be used to identify an individual [13]. The content of PII is generally of little interest to educational researchers, who have no need for private information such as dates of birth. ...
... While metadata can be removed and other elements of PII can simply be redacted, personal names are often used to indicate the intended recipient of a message and to refer back to points raised by others in earlier messages. Masking, where a single replacement token (e.g., NAME) is used to redact all names throughout the data set [13], might be sufficient for some use cases [17,23], but it discards important information [19] and can harm performance on subsequent analysis tasks [2]. In order to identify the same individual across different messages, personal names must instead be replaced consistently with alternative identifiers, or pseudonyms. ...
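The difference between masking and consistent pseudonymization can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any cited paper: the list of names is assumed to be known in advance, the helper names (`mask_names`, `pseudonym`, `pseudonymize`) and the `User_` prefix are hypothetical, and a keyed hash (HMAC-SHA256) is just one possible way to derive stable pseudonyms.

```python
import hashlib
import hmac
import re

# Hypothetical secret key; it would need to be stored separately from the data.
SECRET_KEY = b"replace-with-a-random-secret"


def _name_pattern(names):
    # Longest names first, so that a full name wins over a contained first name.
    ordered = sorted(names, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b")


def mask_names(text, names, token="NAME"):
    """Masking: every listed name is replaced by the same token."""
    if not names:
        return text
    return _name_pattern(names).sub(token, text)


def pseudonym(name, key=SECRET_KEY):
    """Derive a stable pseudonym from a name with a keyed hash."""
    digest = hmac.new(key, name.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return "User_" + digest[:8]


def pseudonymize(text, names, key=SECRET_KEY):
    """Pseudonymization: each name always maps to the same replacement."""
    if not names:
        return text
    return _name_pattern(names).sub(lambda m: pseudonym(m.group(1), key), text)


posts = ["Thanks Anna, I agree with Ben.", "Ben, see Anna's earlier point."]
names = ["Anna", "Ben"]
for post in posts:
    print(mask_names(post, names), "|", pseudonymize(post, names))
```

After masking, both posts contain only the token NAME, so it is no longer possible to tell who replied to whom. After pseudonymization, the same pseudonym stands for the same person in both posts, which keeps the reply structure available for analysis.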
... However, there are some shortcomings in the current technical solutions; for example, some methods are only used in MOOCs and cannot be applied in broader scenarios, so they lack generalisability. Furthermore, technical solutions come at the expense of data utility and are usually expensive (Khalil, 2018; Khalil & Ebner, 2016). In addition, the staff who use technical tools pose a problem of their own: the tools are operated by humans, yet technical solutions come with no regulatory framework for the staff who use them. ...
Article
The field of learning analytics has advanced from infancy stages into a more practical domain, where tangible solutions are being implemented. Nevertheless, the field has encountered numerous privacy and data protection issues that have garnered significant and growing attention. In this systematic review, four databases were searched concerning privacy and data protection issues of learning analytics. A final corpus of 47 papers published in top educational technology journals was selected after running an eligibility check. An analysis of the final corpus was carried out to answer the following three research questions: (1) What are the privacy and data protection issues in learning analytics? (2) What are the similarities and differences between the views of stakeholders from different backgrounds on privacy and data protection issues in learning analytics? (3) How have previous approaches attempted to address privacy and data protection issues? The results of the systematic review show that there are eight distinct, intertwined privacy and data protection issues that cut across the learning analytics cycle. There are both cross‐regional similarities and three sets of differences in stakeholder perceptions towards privacy and data protection in learning analytics. With regard to previous attempts to approach privacy and data protection issues in learning analytics, there is a notable dearth of applied evidence, which impedes the assessment of their effectiveness. The findings of our paper suggest that privacy and data protection issues should not be relaxed at any point in the implementation of learning analytics, as these issues persist throughout the learning analytics development cycle. One key implication of this review suggests that solutions to privacy and data protection issues in learning analytics should be more evidence‐based, thereby increasing the trustworthiness of learning analytics and its usefulness. 
Practitioner notes
What is already known about this topic
• Research on privacy and data protection in learning analytics has become a recognised challenge that hinders the further expansion of learning analytics.
• Proposals to counter the privacy and data protection issues in learning analytics are blurry; there is a lack of a summary of previously proposed solutions.
What this study contributes
• Establishment of what privacy and data protection issues exist at different phases of the learning analytics cycle.
• Identification of how different stakeholders view privacy, similarities and differences, and what factors influence their views.
• Evaluation and comparison of previously proposed solutions that attempt to address privacy and data protection in learning analytics.
Implications for practice and/or policy
• Privacy and data protection issues need to be viewed in the context of the entire cycle of learning analytics.
• Stakeholder views on privacy and data protection in learning analytics have commonalities across contexts and differences that can arise within the same context.
• Before implementing learning analytics, targeted research should be conducted with stakeholders.
• Solutions that attempt to address privacy and data protection issues in learning analytics should be put into practice as far as possible to better test their usefulness.
... There is ample research setting out the costs and benefits of sharing personal data (e.g., see [10]). Additionally, it is well established that individuals are anything but rational when they routinely accept a platform's Terms and Conditions without reading them [30]. ...
Chapter
Over the past decade or so, learning analytics (LA) has matured as a research field and as operational practice within many educational institutions, mostly in the Global North. Learning analytics is commonly defined as the measurement, collection, analysis and use of students’ data to improve students’ learning. Until recently, the main sources of data for LA were restricted to institutional datasets gathered from, for example, learning management systems (LMSs), registration data, etc. Since such data gathering took place within a relatively closed digital ecosystem, institutions held the responsibility to maintain student privacy and to restrict their data collection to that needed to carry out their educational duties. The increasing digitisation and datafication of higher education, combined with increased commercialisation of teaching and learning support systems and applications, acts to destabilise this understanding of learning analytics as a digital ecosystem. Given these continuing changes, agreements with platform providers and the roles of social media, applications, plugins, and mobile learning in teaching and learning now prompt us to consider learning analytics as data ecology rather than as a ‘closed’ ecosystem. This paper first maps learning analytics as data ecology before illustrating the need to think differently about its ethical implications.
Keywords: data ecology, ethics, data interests, privacy, learning analytics
... The granularity of the data in terms of the data subject and the specificity of the data have already been emphasized in research in education (Jones et al., 2020). Moreover, aggregating data on a higher level could speak to the ethical issues proposed by Khalil & Ebner (2016) and Taylor et al. (2016) about de-identifying personal data or analysing data on a coarser level to accommodate for ethical challenges. We propose the following hypotheses: ...
Article
Algorithmic systems such as Learning Analytics (LA) are driving the datafication and algorithmization of education. In this research, we focus on the appropriateness of LA systems from the perspective of parents and students in secondary education. Anchored in the contextual integrity framework (Nissenbaum, Washington Law Review, 79, 41, 2004), we conducted two survey studies (N_students = 277, N_parents = 1013) in Flanders to investigate how they evaluate the appropriateness of the data flows in LA systems, and how both populations differ in their evaluations. The results show that the most-used student-centered LA are perceived as less appropriate than the less-used teacher-centered LA by both students and parents. The usage of personal characteristics in LA is perceived as least appropriate, in contrast to coarser class characteristics. Sharing insights of LA with institutions that are part of the traditional educational context, such as the school, is seen as the most appropriate, and more appropriate than sharing it with learning platforms or third parties (e.g., Big Tech). Overall, we found that parents evaluated the different elements of the data flows embedded in LA as less appropriate than students. In the discussion, we argue that educational institutions should include the evaluation of both parents and students to further manage expectations and construct shared norms and practices when implementing LA in education.
... When learning analytics projects are implemented, the privacy of the parties involved must be ensured as an integral part of their personal integrity, in accordance with fundamental human rights in developed countries. To address the related problems, the research community applies various approaches to protecting data while it is in use and investigates possible measures for ensuring data anonymity [32]. However, even when privacy is preserved in learning analytics initiatives from the very beginning, the situation can become more complicated when data taken from different sources are combined and integrated. ...
Article
The article examines the problems of implementing Learning Analytics in higher education. It explains the concept of "Learning Analytics" and analyzes the experience of introducing it in higher education institutions around the world. It establishes that learning analytics, as a field of scientific research, combines information technology, digital teaching and learning, and data mining methods, which shapes both the way the field has formed and the problems it faces. The article identifies the tasks that learning analytics makes it possible to solve for various aspects of e-learning: prediction, structure discovery, and the discovery of relationships and associations based on the analysis of students' digital traces in electronic educational environments. Promising directions for research at the current stage are identified. The article establishes that introducing the main types of learning analytics (descriptive, predictive, and prescriptive) into the work of higher education institutions makes it possible to obtain information about the current state of e-learning and to make prompt decisions on correcting and optimizing it. It formulates a list of problems related to strategic planning and the policy of implementing learning analytics in higher education institutions: imperfect leadership in project implementation; uneven involvement of different stakeholders; insufficient pedagogical grounding in the interpretation of the data obtained; insufficient staff training; too few studies that empirically confirm an impact on the effectiveness of the learning process; and imperfect regulation. These problems are shown to be interdisciplinary, and solving them requires close cooperation and coordinated action by administrators, IT specialists, teachers, and educational researchers throughout all stages of a project.
The article gives a theoretical grounding for proposed measures aimed at overcoming the interdisciplinary barrier in the development and operation of learning analytics projects: clear and transparent goals and initiatives; meeting the needs of all stakeholders; providing the necessary IT infrastructure; training staff who will assist in interpreting the results obtained; ensuring the security of confidential data; and developing regulations on the functioning and use of learning analytics.
... Naturally, the processing of students' data, much of which is identifiable, raises serious ethical questions. On the one hand, students' privacy is called into question by the aggregation and analysis of their learning data (Khalil & Ebner, 2016). On the other hand, the use of data analysis, especially in producing predictive measures, can limit students' autonomy and impose standards contrary to the normative expectations and values accepted in education systems. ...
Chapter
In the context of deep mediatization and platformization in which we live, interest in social media as a source or tool for research has grown within the educational sciences. As new methods adapted to the digital environment emerge (digital methods, social network analysis, computer-mediated discourse analysis, and virtual ethnography), ethical concerns about the subjects participating in these studies are also expanding. Online, issues such as the boundary between private and public, informed consent, anonymity, and risk of harm take on a dimension of their own. This chapter raises the ethical challenges inherent in social media studies and, based on questions that each researcher should ask about their own research, points to ways of ensuring the protection of participants in these studies. It emphasizes that each phase of research raises specific ethical concerns that need to be taken into account. From this perspective, drawing on the recommendations of the Association of Internet Researchers (Franzke, Bechmann, Zimmer, Ess & the AoIR, 2020) and on Townsend et al. (2017), the chapter suggests measures relating to legal issues (consulting the terms of use of the platform under study, both those aimed at users and those aimed at third parties; the guidelines of the institution for which the researcher works, of the research funders, and of the organizations representing the field of study; and the laws in force in the country of the study), privacy and risk (taking into account whether or not the subjects participating in the research expect to be observed by strangers, whether they are especially vulnerable, and whether the topic is especially sensitive), and the reuse and publication of data (in the case of vulnerable subjects or sensitive topics, anonymizing users when sharing data publicly, that is, not using images and paraphrasing users' texts to prevent identification through search engines).
... Although most present a comprehensive survey on privacy-preserving and cryptographic techniques [13][14][15], they need to link the surveyed de-identification techniques to practical privacy problems. Finally, the majority focus on different contexts, such as healthcare [16][17][18][19][20][21], data mining [22,23], social networks [24,25] and other contexts [26,27], as summarized in Table 1. An exception is [28], which presents a comprehensive review of the privacy-preserving and cryptographic techniques, and briefly elaborates on applying these technologies to some smart city scenarios. ...
Article
Smart cities, leveraging IoT technologies, are revolutionizing the quality of life for citizens. However, the massive data generated in these cities also poses significant privacy risks, particularly in de-anonymization and re-identification. This survey focuses on the privacy concerns and commonly used techniques for data protection in smart cities, specifically addressing geolocation data and video surveillance. We categorize the attacks into linking, predictive and inference, and side-channel attacks. Furthermore, we examine the most widely employed de-identification and anonymization techniques, highlighting privacy-preserving techniques and anonymization tools; while these methods can reduce the privacy risks, they are not enough to address all the challenges. In addition, we argue that de-identification must involve properties such as unlinkability, selective disclosure and self-sovereignty. This paper concludes by outlining future research challenges in achieving complete de-identification in smart cities.
... It provides opportunities for data-driven insight aiming to support teachers and learners, and to optimize pedagogy and learning environments. Learning analytics increasingly relies on supervised and unsupervised algorithmic decision-making that may add to concerns around security (Khalil & Ebner, 2016), as well as equity, responsibility, and fairness (Prinsloo et al., 2023). Such concerns might also be considered for learning analytics as a whole (Holstein & Doroudi, 2019). ...
Article
Learning analytics has the capacity to provide potential benefit to a wide range of stakeholders within a range of educational contexts. It can provide prompt support to students, facilitate effective teaching, highlight aspects of course content that might be adapted, and predict a range of possible outcomes, such as students registering for more appropriate courses, supporting students’ self-efficacy, or redesigning a course’s pedagogical strategy. It will do all these things based on the assumptions and rules that learning analytics developers set out. As such, learning analytics can exacerbate existing inequalities such as unequal access to support or opportunities based on (any combination of) race, gender, culture, age, socioeconomic status, etc., or work to overcome the impact of such inequalities on realizing student potential. In this editorial, we introduce several selected articles that explore the principles of fairness, equity, and responsibility in the context of learning analytics. We discuss existing research and summarize the papers within this special section to outline what is known, and what remains to be explored. This editorial concludes by celebrating the breadth of work set out here, but also by suggesting that there are no simple answers to ensuring fairness, trust, transparency, equity, and responsibility in learning analytics. More needs to be done to ensure that our mutual understanding of responsible learning analytics continues to be embedded in the learning analytics research and design practice.
... There are two ways of ensuring privacy in LA. On the one hand, the privacy-preserving data-publishing approach, which consists of applying data de-identification and anonymization techniques (e.g., satisfying the definition of k-anonymity [4]) and then using conventional ML methods [5,6]. On the other hand, in the privacy-preserving data mining or statistical disclosure control approach, the analyst does not directly access the data but uses a query mechanism that adds statistical noise to the response, implementing differential privacy [7]. ...
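The two approaches named in this excerpt can be illustrated with a small Python sketch. It is a toy example under stated assumptions, not the procedure of the cited works: the record fields (`age`, `country`, `grade`), the choice of quasi-identifiers, and the helper names are all hypothetical. The first part checks k-anonymity before and after generalizing an attribute; the second answers a counting query with Laplace noise of scale 1/epsilon, which is the standard calibration for a query whose result changes by at most 1 when one record is added or removed.

```python
import random
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in groups.values())


def generalize_age(record, width=10):
    """Replace an exact age with a range such as '20-29'."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


def noisy_count(records, predicate, epsilon, rng=random):
    """Count matching records and add Laplace noise with scale 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two exponential draws with rate epsilon
    # follows a Laplace distribution with location 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical learner records.
learners = [
    {"age": 21, "country": "AT", "grade": 1},
    {"age": 24, "country": "AT", "grade": 3},
    {"age": 33, "country": "DE", "grade": 2},
    {"age": 38, "country": "DE", "grade": 2},
]

print(is_k_anonymous(learners, ["age", "country"], k=2))
coarse = [generalize_age(r) for r in learners]
print(is_k_anonymous(coarse, ["age", "country"], k=2))
print(noisy_count(learners, lambda r: r["grade"] <= 2, epsilon=1.0))
```

In the first approach the analyst receives the generalized records themselves. In the second, the analyst only ever sees noisy query answers, and smaller values of epsilon add more noise.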
Article
Federated learning techniques aim to train and build machine learning models based on distributed datasets across multiple devices while avoiding data leakage. The main idea is to perform training on remote devices or isolated data centers without transferring data to centralized repositories, thus mitigating privacy risks. Data analytics in education, in particular learning analytics, is a promising scenario to apply this approach to address the legal and ethical issues related to processing sensitive data. Indeed, given the nature of the data to be studied (personal data, educational outcomes, and data concerning minors), it is essential to ensure that the conduct of these studies and the publication of the results provide the necessary guarantees to protect the privacy of the individuals involved and the protection of their data. In addition, the application of quantitative techniques based on the exploitation of data on the use of educational platforms, student performance, use of devices, etc., can account for educational problems such as the determination of user profiles, personalized learning trajectories, or early dropout indicators and alerts, among others. This paper presents the application of federated learning techniques to a well-known learning analytics problem: student dropout prediction. The experiments allow us to conclude that the proposed solutions achieve comparable results from the performance point of view with the centralized versions, avoiding the concentration of all the data in a single place for training the models.
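The core idea described in this abstract, training locally and sharing only model parameters, can be illustrated with a minimal federated averaging sketch. Everything here is an assumption made for illustration: the one-parameter linear model, the toy data, and the function names are hypothetical and far simpler than a dropout-prediction model, and the sketch does not reproduce the experiments of the cited paper.

```python
def local_update(weight, data, lr=0.05, epochs=20):
    """Gradient descent on one client's data for the model y = weight * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(global_weight, clients, rounds=10):
    """Each round, clients train locally and the server averages their weights."""
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        local_weights = [local_update(global_weight, data) for data in clients]
        # Weight each client's result by its share of the training examples.
        global_weight = sum(
            w * len(data) for w, data in zip(local_weights, clients)
        ) / total
    return global_weight


# Hypothetical per-institution datasets; the raw (x, y) pairs never leave a client.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5)],
]

print(federated_average(0.0, clients))
```

Only the scalar weight travels between the clients and the server in this sketch. Exchanging parameters instead of records reduces exposure of the raw data but does not by itself guarantee privacy.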
Technical Report
In data mining and data analytics, tools and techniques once confined to research laboratories are being adopted by forward-looking industries to generate business intelligence for improving decision making. Higher education institutions are beginning to use analytics for improving the services they provide and for increasing student grades and retention. The U.S. Department of Education's National Education Technology Plan, as one part of its model for 21st-century learning powered by technology, envisions ways of using data from online learning systems to improve instruction. With analytics and data mining experiments in education starting to proliferate, sorting out fact from fiction and identifying research possibilities and practical applications are not easy. This issue brief is intended to help policymakers and administrators understand how analytics and data mining have been, and can be, applied for educational improvement. At present, educational data mining tends to focus on developing new tools for discovering patterns in data. These patterns are generally about the microconcepts involved in learning: one-digit multiplication, subtraction with carries, and so on. Learning analytics, at least as it is currently contrasted with data mining, focuses on applying tools and techniques at larger scales, such as in courses and at schools and postsecondary institutions. But both disciplines work with patterns and prediction: If we can discern the pattern in the data and make sense of what is happening, we can predict what should come next and take the appropriate action. Educational data mining and learning analytics are used to research and build models in several areas that can influence online learning systems. One area is user modeling, which encompasses what a learner knows, what a learner's behavior and motivation are, what the user experience is like, and how satisfied users are with online learning.
At the simplest level, analytics can detect when a student in an online course is going astray and nudge him or her on to a course correction. At the most complex, they hold promise of detecting boredom from patterns of key clicks and redirecting the student's attention. Because these data are gathered in real time, there is a real possibility of continuous improvement via multiple feedback loops that operate at different time scales: immediate to the student for the next problem, daily to the teacher for the next day's teaching, monthly to the principal for judging progress, and annually to the district and state administrators for overall school improvement. The same kinds of data that inform user or learner models can be used to profile users. Profiling as used here means grouping similar users into categories using salient characteristics. These categories then can be used to offer experiences to groups of users or to make recommendations to the users and adaptations to how a system performs. User modeling and profiling are suggestive of real-time adaptations. In contrast, some applications of data mining and analytics are for more experimental purposes. Domain modeling is largely experimental with the goal of understanding how to present a topic and at what level of detail. The study of learning components and instructional principles also uses experimentation to understand what is effective at promoting learning. These examples suggest that the actions from data mining and analytics are always automatic, but that is less often the case. Visual data analytics closely involve humans to help make sense of data, from initial pattern detection and model building to sophisticated data dashboards that present data in a way that humans can act upon. K-12 schools and school districts are starting to adopt such institution-level analyses for detecting areas for instructional improvement, setting policies, and measuring results.
Making visible students' learning and assessment activities opens up the possibility for students to develop skills in monitoring their own learning and to see directly how their effort improves their success. Teachers gain views into students' performance that help them adapt their teaching or initiate tutoring, tailored assignments, and the like. Robust applications of educational data mining and learning analytics techniques come with costs and challenges. Information technology (IT) departments will understand the costs associated with collecting and storing logged data, while algorithm developers will recognize the computational costs these techniques still require. Another technical challenge is that educational data systems are not interoperable, so bringing together administrative data and classroom-level data remains a challenge. Yet combining these data can give algorithms better predictive power. Combining data about student performance (online tracking, standardized tests, teacher-generated tests) to form one simplified picture of what a student knows can be difficult and must meet acceptable standards for validity. It also requires careful attention to student and teacher privacy and the ethical obligations associated with knowing and acting on student data. Educational data mining and learning analytics have the potential to make visible data that have heretofore gone unseen, unnoticed, and therefore unactionable. To help further the fields and gain value from their practical applications, the recommendations are that educators and administrators:
• Develop a culture of using data for making instructional decisions.
• Involve IT departments in planning for data collection and use.
• Be smart data consumers who ask critical questions about commercial offerings and create demand for the most useful features and uses.
• Start with focused areas where data will help, show success, and then expand to new areas.
• Communicate with students and parents about where data come from and how the data are used.
• Help align state policies with technical requirements for online learning systems.
Researchers and software developers are encouraged to:
• Conduct research on usability and effectiveness of data displays.
• Help instructors be more effective in the classroom with more real-time and data-based decision support tools, including recommendation services.
• Continue to research methods for using identified student information where it will help most, anonymizing data when required, and understanding how to align data across different systems.
• Understand how to repurpose predictive models developed in one context to another.
A final recommendation is to create and continue strong collaboration across research, commercial, and educational sectors. Commercial companies operate on fast development cycles and can produce data useful for research. Districts and schools want properly vetted learning environments. Effective partnerships can help these organizations codesign the best tools.
Conference Paper
It is widely known that interaction, as well as communication, are very important parts of successful online courses. These features are considered crucial because they help to improve students’ attention in a very significant way. In this publication, the authors present an innovative application, which adds different forms of interactivity to learning videos within MOOCs such as multiple-choice questions or the possibility to communicate with the teacher. Furthermore, Learning Analytics using exploratory examination and visualizations have been applied to unveil learners’ patterns and behaviors as well as investigate the effectiveness of the application. Based upon the quantitative and qualitative observations, our study determined common practices behind dropping out using video indicators and suggested enhancements to increase the performance of the application as well as learners’ attention.
Conference Paper
Massive Open Online Courses (MOOCs) are remote courses that stand out for the heterogeneity and number of their students. Because of this massive scale, the large datasets generated by MOOC platforms require advanced tools to reveal hidden patterns for enhancing learning and educational environments. This paper offers a study on using one of these tools, clustering, to portray learners' engagement in MOOCs. The study analyses a MOOC that was mandatory for university students and also open to the public, in order to classify students into appropriate profiles based on their engagement. We compared the clustering results across MOOC variables and, finally, evaluated our results against a student motivation scheme from the eighties to examine the contrast between classical classes and MOOC classes. Our research indicated that MOOC participants closely follow Cryer's scheme of Elton (1996).
Conference Paper
The widespread adoption of Learning Analytics (LA) and Educational Data Mining (EDM) has somewhat stagnated recently, and in some prominent cases even been reversed following concerns by governments, stakeholders and civil rights groups about privacy and ethics applied to the handling of personal data. In this ongoing discussion, fears and realities are often indistinguishably mixed up, leading to an atmosphere of uncertainty among potential beneficiaries of Learning Analytics, as well as hesitations among institutional managers who aim to innovate their institution's learning support by implementing data and analytics with a view on improving student success. In this paper, we try to get to the heart of the matter, by analysing the most common views and the propositions made by the LA community to solve them. We conclude the paper with an eight-point checklist named DELICATE that can be applied by researchers, policy makers and institutional managers to facilitate a trusted implementation of Learning Analytics.
Conference Paper
Massive Open Online Courses (MOOCs) have spread rapidly across Science, Technology, Engineering and Mathematics (STEM) academic disciplines. These MOOCs have served a wide variety of learner groups across the world. iMooX, the leading MOOC platform in Austria, offers such courses. This paper highlights the authors’ experience of applying Learning Analytics to examine the participation of secondary school pupils in one of its courses, called “Mechanics in everyday life”. We observed different patterns and, contrary to the jubilant results expected of any educational MOOC, we show that pupils seemingly decided to treat it not as a genuinely motivating learning route but rather as optional homework.
Conference Paper
Within the evolution of technology in education, Learning Analytics has secured its position as a robust technological field that promises to empower instructors and learners in different educational fields. The 2014 Horizon Report (Johnson et al., 2014) expects it to be adopted by educational institutions in the near future. However, its processes, phases, and constraints have not yet been debated in depth. In this research study, the authors discuss the essence, objectives and methodologies of Learning Analytics and propose a first prototype life cycle that describes its entire process. Furthermore, the authors raise substantial questions related to challenges such as security, policy and ethics issues that limit the beneficial application of Learning Analytics processes.
Article
Open data has tremendous potential for science, but, in human subjects research, there is a tension between privacy and releasing high-quality open data. Federal law governing student privacy and the release of student records suggests that anonymizing student data protects student privacy. Guided by this standard, we de-identified and released a data set from 16 MOOCs (massive open online courses) from MITx and HarvardX on the edX platform. In this article, we show that these and other de-identification procedures necessitate changes to data sets that threaten replication and extension of baseline analyses. To balance student privacy and the benefits of open data, we suggest focusing on protecting privacy without anonymizing data by instead expanding policies that compel researchers to uphold the privacy of the subjects in open data sets. If we want to have high-quality social science research and also protect the privacy of human subjects, we must eventually have trust in researchers. Otherwise, we'll always have the strict tradeoff between anonymity and science illustrated here.
Conference Paper
Higher education institutions have collected and analysed student data for years, with their focus largely on reporting and management needs. A range of institutional policies exist which broadly set out the purposes for which data will be used and how data will be protected. With the advent of learning analytics, the uses to which student data is put have expanded rapidly. Generally, though, the policies setting out institutional use of student data have not kept pace with this change. Institutional policy frameworks should not only provide an enabling environment for the optimal and ethical harvesting and use of data, but also clarify who benefits and under what conditions, establish conditions for consent and the de-identification of data, and address issues of vulnerability and harm. A directed content analysis of the policy frameworks of two large distance education institutions shows that current policy frameworks do not facilitate the provision of an enabling environment for learning analytics to fulfil its promise.