Learning analytics has reserved its position as an important field in the educational sector. However, the large-scale collection, processing, and analyzing of data has steered the wheel beyond the borders to face an abundance of ethical breaches and constraints. Revealing learners’ personal information and attitudes, as well as their activities, are major aspects that lead to identifying individuals personally. Yet, de-identification can keep the process of learning analytics in progress while reducing the risk of inadvertent disclosure of learners’ identities. In this paper, the authors discuss de-identification methods in the context of the learning environment and propose a first prototype conceptual approach that describes the combination of anonymization strategies and learning analytics techniques.
(2016). De-identification in learning analytics. Journal of Learning Analytics, 3(1), 129–138. http://dx.doi.org/10.18608/jla.2016.31.8

ISSN 1929-7750 (online). The Journal of Learning Analytics works under a Creative Commons License, Attribution - NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)

De-Identification in Learning Analytics
Mohammad Khalil and Martin Ebner
Educational Technology
Graz University of Technology, Austria
mohammad.khalil@tugraz.at
ABSTRACT: Learning analytics has reserved its position as an important field in the educational sector. However, the large-scale collection, processing, and analyzing of data has steered the wheel beyond the borders to face an abundance of ethical breaches and constraints. Revealing learners' personal information and attitudes, as well as their activities, are major aspects that lead to identifying individuals personally. Yet, de-identification can keep the process of learning analytics in progress while reducing the risk of inadvertent disclosure of learners' identities. In this paper, the authors discuss de-identification methods in the context of the learning environment and propose a first prototype conceptual approach that describes the combination of anonymization strategies and learning analytics techniques.

Keywords: Learning analytics, anonymization, de-identification, ethics, privacy

1 INTRODUCTION
Learning analytics is an active area of the research field of online education and Technology Enhanced Learning (TEL). It applies analysis techniques to the education data stream in order to achieve several objectives. These objectives mainly aim to intervene and predict learners' performance in pursuance of enhancing the learning context and its environment. Higher Education (HE) and online course institutions are looking at learning analytics with an interest in improving retention and decreasing the total dropout rate (Slade & Galpin, 2012). However, ethical issues emerge while applying learning analytics in educational data sets (Greller & Drachsler, 2012). At the first International Conference on Learning Analytics and Knowledge (LAK '11), held in Banff, Alberta, Canada in 2011, participants agreed that learning analytics raises issues relevant to ethics and privacy and "it could be construed as eavesdropping" (Brown, 2011). The massive data collection and analysis of these educational data sets can lead to questions related to ownership, transparency, and privacy of data. These issues are not unique to the education sector only, but can be found in the human resource management and health sectors (Cooper, 2009). At its key level, learning analytics involves tracking students' steps in learning environments, such as videos of MOOCs (Wachtler, Khalil, Taraghi & Ebner, 2016), in the interest of identifying who are the students "at risk," or to help students with decisions about their futures. Nevertheless, tracking interactions of students could unveil critical issues regarding their privacy and their identities (Boyd, 2008).

Ethical issues for learning analytics fall into different categories. We mainly summarize them as the following (Khalil & Ebner, 2015b): 1) transparency of data collection, usage, and involvement of third parties; 2) anonymization and de-identification of individuals; 3) ownership of data; 4) data accessibility and accuracy of the analyzed results; 5) security of the examined data sets and student records from any threat. These criteria point to the widely based security model CIA, which stands for Confidentiality, Integrity from alteration, and Availability for authorized parties.
The learning analytics community needs to deal carefully with the potential privacy issues while analyzing student data. Educational data analysis techniques can reveal personal information, attitudes, and activities related to learners (Bienkowski, Feng, & Means, 2012). However, there has been limited research, and there are still numerous unanswered questions related to privacy, personal information, and other ethical issues in the context of learning analytics (Bienkowski, Feng, & Means, 2012; Greller & Drachsler, 2012; Slade & Galpin, 2012; Slade & Prinsloo, 2013). For example, some educators claim that educational institutions are using applications that collect sensitive data about students without sufficiently respecting data privacy and how the data will eventually be used (Singer, 2014). Thus, data degradation (Anciaux et al., 2008), de-identification methods, or deletion of specific data records, may be required as a solution to preserve learners' information. In this paper, we will mainly focus our discussion on the de-identification process in the learning analytics atmosphere and afford a first prototype conceptual approach that combines learning environment, de-identification techniques, and learning analytics.

The paper is organized as follows: Section 2 covers de-identification in general and the current laws associated with education, as well as the drivers linked with learning analytics. In Section 3, we propose the de-identification–learning analytics approach. Section 4 discusses the limitations of the de-identification process in learning analytics, and Section 5 concludes.

2 BACKGROUND
2.1 Personal Information and De-Identification
Personal information is any information that can identify an individual. In fields such as the health sector, it is named Personal Health Information or PHI. In other fields, such as the education sector, this information is named Personally Identifiable Information or PII. The National Institute of Standards and Technology (NIST) defines PII as "any information about an individual maintained by an agency, including 1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and 2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information" (McCallister, Grance, & Scarfone, 2010). The personal information of learners can be categorized into details such as name, sex, photograph, date of birth, age, address, religion, marital status, e-mail address, insurance number, ethnicity, et cetera, or educational details such as qualifications, courses attended, degrees, and study records. A leak of individuals' personal information can lead to misuse of data, embarrassment, and loss of reputation. However, organizations may be required to publish details extracted from personal information. For instance, some educational institutions are required to provide statistics about student progress; likewise, health organizations may need to report special cases from their patient records, such as communicable diseases. As a result, de-identification helps organizations to protect privacy while still informing the public. The de-identification process is used to prevent revealing individual identity and to keep the PII confidential.
In learning analytics, it is common for stakeholders to request additional information about the results extracted from educational data sets. Educational data mining and learning analytics mainly aim to enhance the learning environment and empower learners and instructors (Greller & Drachsler, 2012). Therefore, the analysis of these data may have interesting trends that could lead to further and deeper analysis by other institutions or researchers. Requests for more extensive analysis may involve the use of student-level data. Accordingly, ethical issues arise, such as privacy disclosure, and the need to de-identify the data becomes paramount.

Recently, Harvard and MIT universities released de-identified data from 16 courses offered in 2012–2013 from their well-known edX Massive Open Online Course (MOOC) (MIT News, 2014). The Harvard and MIT edX ensures that the anonymity of the released data complies with the Family Educational Rights and Privacy Act (FERPA).¹ Furthermore, Prinsloo and Slade (2015) suggested different approaches that inform students in higher education of the implications of learning analytics on their private data.

2.2 De-Identification Legislation
De-identification of student records has been regulated in the United States and the European Union. The United States adopted FERPA regarding the privacy of student educational records. In the European Union, the Data Protection Directive (DPD; 95/46/EC²) regulates the processing of personal data and the movement of such information. FERPA §99.31(b) sets out the rule on de-identification of data. It clearly states that institutions "may release, without consent, education records, or information from education records, that has been de-identified through the removal of all Personally Identifiable Information (PII)." This section of FERPA requires institutions to use reasonable methods to identify the other parties who disclose education records. On the other hand, the most explicit reference to de-identification in the European DPD is Recital 26 on anonymization, in which "principles of data protection shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable." Moreover, parties are encouraged to use de-identification techniques to render identification of data subjects impossible. It is not obvious, however, what level of de-identification is required to anonymize education records under European law. Nonetheless, the Article 29 Data Protection Working Party has an opinion on the identification of data: "Once a data set is truly anonymized and individuals are no longer identifiable, European data protection law no longer applies" (2014, p. 5).

2.3 Drivers of De-Identification in Learning Analytics
A study by Petersen (2012) addressed the need to de-identify data used in academic analysis before making it available to institutions, to businesses, or for operational functions. Petersen (2012) pointed to the idea of keeping a unique identifier in case a researcher may need to study the behaviour of a particular individual. Slade and Prinsloo (2013), however, drew attention to the ambiguity of data mining techniques in monitoring student behaviour in educational settings. The authors linked de-identification with consent and privacy and stressed the need to guarantee student anonymity in their education records in order to achieve learning analytics objectives such as interventions based on student characteristics. An example of the link between consent and de-identification would be a questionnaire or survey that those filling it out are told will be used for research only. In that case, the use of their data is clearly limited to that one study. If the survey includes personal information, however, then assurances of anonymizing their data should be considered.

¹ http://www2.ed.gov/policy/gen/guid/fpco/ferpa/index.html (last access January 2015)
² http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31995L0046:EN:HTML (last access January 2015)

Ryan Baker (2013) discussed the demands of de-identifying educational data sets in his "Learning, Schooling, and Data Analytics" chapter in the Handbook on Innovations in Learning for States, Districts, and Schools. De-identification of these data sets means being able to share them among other researchers without violating FERPA regulations. Baker stressed that educational policies should include rules for anonymizing data in order to prevent identifiable information from being leaked without authorization. Furthermore, Drachsler and Greller covered the topic of anonymization in their DELICATE approach (Drachsler & Greller, 2016). A "strictly guarded key" should be held so that researchers may link their results from learning analytics and educational data mining with individual students in order to benefit the students. De-identification techniques have been reviewed as a right of access principle in learning analytics deployment (Pardo & Siemens, 2014). In addition, Pardo and Siemens further suggest that semantic analysis might be required to detect identifiable records in anonymized data sets.

3 PROPOSED APPROACH
In this section, we propose a conceptual de-identification–learning analytics framework as shown in Figure 1. The framework begins with learners involved in learning environments. Currently, a large number of learning environments support online learning, such as MOOCs, Learning Management Systems (LMS), Immersive Learning Simulations (ILS), mobile learning, and Personalized Learning Environments (PLE). These platforms offer environments with rich, vast amounts of data that can be quantitatively/qualitatively analyzed to benefit learners and enhance the learning context.

Figure 1: The proposed conceptual de-identification–learning analytics framework

The next step is the de-identification process where techniques to convert personal and private information into anonymized data take place. De-identification techniques include such methods as anonymization, masking, blurring, and perturbation. The last step includes the de-identified data linked with a unique descriptor that may be examined by learning analytics researchers and benefit stakeholders, but ultimately must be used only to the advantage of students.
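
The three steps of the framework (learner records, de-identification, de-identified data with a unique descriptor) can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names, the set `DIRECT_IDENTIFIERS`, and the function `deidentify` are hypothetical and do not come from the paper or from any particular platform.

```python
import secrets

# Assumption: these field names stand in for whatever direct identifiers
# a given learning environment actually stores.
DIRECT_IDENTIFIERS = {"name", "email", "student_id"}


def deidentify(records, direct_identifiers=DIRECT_IDENTIFIERS):
    """Split learner records into releasable data and a guarded key table.

    Returns (released, key_table). `released` holds the records without
    direct identifiers, each tagged with a random unique descriptor.
    `key_table` maps each descriptor back to the removed identifiers and
    is meant to stay inside the institution.
    """
    released = []
    key_table = {}
    for record in records:
        descriptor = secrets.token_hex(8)  # random, carries no personal data
        key_table[descriptor] = {
            field: record[field] for field in direct_identifiers if field in record
        }
        body = {f: v for f, v in record.items() if f not in direct_identifiers}
        released.append({"descriptor": descriptor, **body})
    return released, key_table
```

Only `released` would be handed to learning analytics researchers; the descriptor lets them follow one learner across records without seeing who that learner is.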

3.1 De-Identification Techniques
In our proposed de-identification–learning analytics conceptual framework, there are several techniques available to de-identify student data records. Figure 3 lists several methods of de-identification and provides examples (based on Article 29 Data Protection Working Party, 2014; Cormode & Srivastava, 2009; Eurostat, 1996; Petersen, 2012).

Anonymization
Data anonymization techniques have recently been keenly researched in different structured data records with the goal of guaranteeing the privacy of sensitive information against unintended disclosure and a variety of attacks (Cormode & Srivastava, 2009). Ohm (2010) defined reasons behind anonymization when organizations want to release the data to the public, sell the information to third parties, or share the information within the same organization. The difference between anonymization and de-identification, however, is often misunderstood. Anonymization principles are a subset of holistic de-identification methodologies. Data anonymization is the process of de-identifying data while preserving its original format (Raghunathan, 2013). In the educational context, anonymization refers to different procedures to de-identify student data in such a way that it cannot be re-identified (the opposite of de-identification) unless there is a record code. Anonymization is not reserved only for tabular data records, but can also be applied to other types of data, such as visualized data or graphs, where institutions intend to present their outcomes without revealing sensitive information.

On the other hand, in addition to anonymization, de-identification includes masking, randomization, blurring, and so on. For instance, replacing "Bernard" with "*******" is a method of masking while altering "Bernard" to "Wolfgang" would be an example of anonymization. However, masking and blurring are not as well known as anonymization. By any means, de-identification, pseudonymization, and anonymization are interchangeable topics under the information concealing umbrella. To clarify the differences in simple terms, pseudonymization means cloaking the original data with false information with the ability to track it back to its original formation; anonymization, conversely, cannot be reversed (Raghunathan, 2013).
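
The practical difference between pseudonymization and anonymization is whether a mapping back to the original value is kept. The sketch below illustrates that distinction only; the class `Pseudonymizer`, the function `anonymize`, and the `student-` alias format are hypothetical names chosen for this example.

```python
import secrets


def _new_alias() -> str:
    """Generate a random stand-in that carries no personal information."""
    return "student-" + secrets.token_hex(4)


class Pseudonymizer:
    """Replace values with aliases and keep a lookup table, so an
    authorized holder of the table can trace an alias back."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}
        self._reverse: dict[str, str] = {}

    def pseudonymize(self, value: str) -> str:
        if value not in self._forward:
            alias = _new_alias()
            while alias in self._reverse:  # avoid reusing an alias
                alias = _new_alias()
            self._forward[value] = alias
            self._reverse[alias] = value
        return self._forward[value]

    def reverse(self, alias: str) -> str:
        return self._reverse[alias]


def anonymize(value: str) -> str:
    """Replace a value with a random alias and keep no mapping, so the
    substitution cannot be traced back from the alias alone."""
    return _new_alias()
```

With a `Pseudonymizer`, the same learner always receives the same alias and `reverse` recovers the name; `anonymize` discards that link by design.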

As previously mentioned, educational data records may include private information, such as name or student ID, which singularly are called direct identifiers. Removing or hiding these identifiers does not assure a true data anonymization. Identifiers could be linked with other information that would allow identification of individuals (see Figure 2). However, quasi-identifiers can be used to ensure better de-identification of data. "Date of Birth + Sex + Name" is an example of a quasi-identifier. In 2006, AOL released the search records of 500,000 of its users. Several days after AOL's database release, New York Times journalists were able to reveal the identity of a 62-year-old widow using a similar process to that shown in Figure 2 (Soghoian, 2007). AOL admitted that the data release was a mistake and the research team responsible for sharing the data was fired.

Figure 2: Linking data sources leads to name identification

Another example of identifying individuals was reported in 2000 when demographic information led to retrieving the names and contact information of patients whose medical data had been released in the United States (Sweeney, 2000).
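
The linking process behind both incidents can be illustrated with a small join on shared attributes. The sketch below uses invented records and a hypothetical `link` function; the column names (`birth_date`, `sex`, `postcode`) are assumptions made for the example and are not taken from the cited studies.

```python
def link(released, public, quasi_identifiers):
    """Join a 'de-identified' release with a public list on shared columns.

    A released record counts as re-identified only when exactly one
    person in the public list has the same quasi-identifier values.
    """
    index = {}
    for person in public:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person)

    matches = []
    for record in released:
        key = tuple(record[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # unique match: the name is recovered
            matches.append({**record, "name": candidates[0]["name"]})
    return matches


# Invented data: names were removed from the release, but not the
# attributes that also appear in a public register.
released = [
    {"birth_date": "1990-04-12", "sex": "F", "postcode": "8010", "grade": 1.7},
    {"birth_date": "1988-11-02", "sex": "M", "postcode": "8020", "grade": 3.3},
]
public = [
    {"name": "Alice Example", "birth_date": "1990-04-12", "sex": "F", "postcode": "8010"},
    {"name": "Bob Example", "birth_date": "1988-11-02", "sex": "M", "postcode": "8020"},
    {"name": "Carl Example", "birth_date": "1988-11-02", "sex": "M", "postcode": "8020"},
]

reidentified = link(released, public, ["birth_date", "sex", "postcode"])
```

Here the first released record is matched to a single person, while the second stays ambiguous because two people share the same attribute values.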

Samarati and Sweeney (1998) provided a well-known anonymization technique, namely k-anonymization. This method addresses the problem of linking records to identify the individual's information when releasing data, thus safeguarding anonymity. The k-anonymity technique ensures that each released record is indistinguishable from at least k − 1 other records with respect to its quasi-identifiers, so that no record can be linked to fewer than k individuals (Cormode & Srivastava, 2009).
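
As a rough illustration of the k-anonymity property, the sketch below measures k for a small invented table and shows how generalizing a quasi-identifier raises it. The function names `k_of` and `generalize_age` and the sample records are hypothetical; this is not the generalization and suppression algorithm of Samarati and Sweeney, only a check of the property it aims for.

```python
from collections import Counter


def k_of(records, quasi_identifiers):
    """Return the largest k for which the table is k-anonymous: the size
    of the smallest group of records sharing the same quasi-identifier
    values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def generalize_age(record, width=10):
    """Replace an exact age with a band such as '20-29'."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


records = [
    {"age": 21, "sex": "F", "score": 88},
    {"age": 24, "sex": "F", "score": 75},
    {"age": 27, "sex": "F", "score": 91},
    {"age": 33, "sex": "M", "score": 64},
    {"age": 38, "sex": "M", "score": 70},
]

raw_k = k_of(records, ["age", "sex"])              # every record is unique
generalized = [generalize_age(r) for r in records]
generalized_k = k_of(generalized, ["age", "sex"])  # smallest group has 2 records
```

In this example the raw table is only 1-anonymous, while the table with ten-year age bands is 2-anonymous, at the cost of less precise ages.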

Figure 3: Examples of de-identification techniques

Masking
Masking is a de-identification technique that replaces sensitive data with fictional data in order to disclose results outside the institution. Data masking can modify the data records so that they remain usable while keeping personal information confidential. For instance, character masking replaces a string with special characters.
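
Character masking can be sketched in a few lines. The helper names `mask_chars` and `mask_email`, the choice of `*` as the mask character, and the option to keep trailing characters are assumptions made for this example rather than anything prescribed by the paper.

```python
def mask_chars(value: str, keep_last: int = 0, mask_char: str = "*") -> str:
    """Replace characters with mask_char, optionally keeping the last few
    so the record stays recognizable to its owner."""
    if keep_last <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]


def mask_email(address: str, mask_char: str = "*") -> str:
    """Mask the local part of an e-mail address but keep the domain,
    which is often still useful for aggregate statistics."""
    local, separator, domain = address.partition("@")
    if not separator:  # not an e-mail address: mask everything
        return mask_chars(address, mask_char=mask_char)
    return mask_chars(local, mask_char=mask_char) + "@" + domain
```

For example, `mask_chars("Bernard")` hides the whole name, while `mask_chars("12345678", keep_last=3)` leaves only the final three digits visible.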

Blurring
Blurring involves reducing precision to minimize the identification of data. There are several ways to achieve blurring, such as dividing the data into subcategories, randomizing the data fields, or adding noise to data records.
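
The three blurring approaches named above can each be sketched as a small function. The names `to_band`, `shuffle_field`, and `add_noise` are hypothetical, and uniform noise is just one simple choice among many possible noise distributions.

```python
import random


def to_band(value: int, width: int) -> str:
    """Subcategories: replace an exact number with the band it falls in."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def shuffle_field(records, field, rng: random.Random):
    """Randomizing a field: permute one column across records, which keeps
    its overall distribution but breaks the link to individual rows."""
    values = [record[field] for record in records]
    rng.shuffle(values)
    return [{**record, field: value} for record, value in zip(records, values)]


def add_noise(value: float, scale: float, rng: random.Random) -> float:
    """Adding noise: shift a numeric value by a random amount in
    the range [-scale, +scale]."""
    return value + rng.uniform(-scale, scale)
```

Passing an explicit `random.Random` instance keeps the example reproducible when a seed is set; each function trades some precision for a lower chance of identification.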

3.2 Coding Data Records
In scientific research, data usually requires further investigation with researchers looking deeper into the details. Having de-identified data might be insufficient for these purposes; researchers may require additional information in order to do more analysis. The American federal Health Insurance Portability and Accountability Act (HIPAA), which is responsible for protecting the confidentiality of patient records, authorizes using an "assigned code" that can be appended to the records in order to permit the information to be re-identified for research purposes.³ Based on that HIPAA rule, we found that FERPA §99.31(b) allows for using a unique descriptor for student data records in order to match an individual's information for research and institutional use. Accordingly, we conclude that assigning a code to student records in our proposed framework can grant learning analytics researchers the ability to study behaviours of specific students and, therefore, can benefit learners. Despite the fact that learning analytics poses ethical challenges, the main goal is still to benefit learning environments and students, such as making recommendations, classifying students into profiles or predicting their performance (Ebner & Schön, 2013; Greller & Drachsler, 2012; Slade & Prinsloo, 2013; Khalil & Ebner, 2015a; Khalil, Kastl & Ebner, 2016).
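
One possible way to assign such a code is a keyed hash, so that the same student receives the same code in every data set while only the holder of the secret key can recompute the mapping. This is a sketch under assumptions, not the mechanism used by the authors: the function names, the `student_id` field, and the 16-character truncation are choices made for the example, and whether a keyed hash satisfies HIPAA or FERPA requirements is a legal question outside its scope.

```python
import hashlib
import hmac


def study_code(student_id: str, secret_key: bytes, length: int = 16) -> str:
    """Derive a stable code from a student ID with HMAC-SHA256.

    Without secret_key the code cannot be recomputed from the ID, so the
    key plays the role of a strictly guarded key.
    """
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def code_records(records, secret_key: bytes, id_field: str = "student_id"):
    """Replace the ID field of each record with its study code."""
    coded = []
    for record in records:
        rest = {k: v for k, v in record.items() if k != id_field}
        coded.append({"code": study_code(record[id_field], secret_key), **rest})
    return coded
```

To re-identify a record for an approved purpose, the key holder recomputes the codes of enrolled students and looks for the matching one; researchers who only see `code` cannot do so.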

4 LIMITATIONS
Despite the fact that de-identification protects confidential information and privacy, the de-identified data still poses some privacy risks (Petersen, 2012). In many cases, some attributes are capable of identifying individuals; in other cases, attackers can link records together from different sources and therefore "code break" the de-identification. On the other hand, in their paper "Privacy, Anonymity, and Big Data in the Social Sciences," Daries et al. (2014) noted that with de-identification, there is no guarantee of keeping the analysis process uncorrupted. Pardo and Siemens agree that "data can be either useful or perfectly anonymous, but never both" (2014, p. 447). The bottom line is that the stricter the de-identification guidelines, the greater the negative effect on the ultimate analysis.

5 CONCLUSION
Since learning analytics first became known in 2011, it has helped learners to improve their performance based on analyzing their educational data. Nevertheless, this field raises many issues related to ethics and ownership. The massive scale of data collection and analysis leads to questions about the consent and privacy of personal information. This paper mainly discusses one of the attainable solutions for preserving learners' sensitive information, the "de-identification of data" to facilitate learning analytics applications. We shed light on this topic via US and EU regulations regarding data privacy. We proposed a conceptual approach with examples of de-identification techniques that assist us with our "iMooX" platform (http://www.imoox.at) and can help learning analytics specialists preserve confidential learner information.
Although de-identification is not a foolproof solution for protecting learner privacy, it is an imperative consideration in examining the ethical dimensions of learning analytics.

³ Rule 45 C.F.R. § 164.514(c).

REFERENCES

Article 29 Data Protection Working Party. (2014). Opinion 05/2014 on Anonymisation Techniques (0829/14/EN WP216). Retrieved from http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
Anciaux, N., Bouganim, L., Van Heerde, H., Pucheral, P., & Apers, P. M. (2008). Data degradation: Making private data less sensitive over time. In J. G. Shanahan, S. Amer-Yahia, I. Manolescu, Y. Zhang, D. A. Evans, A. Kolcz, K.-S. Choi, A. Chowdhury (Eds.), Proceedings of 17th ACM International Conference on Information and Knowledge Management (CIKM 2008) (pp. 1401–1402). New York: ACM. http://dx.doi.org/10.1145/1458082.1458301
Baker, R. S. J. d. (2013). Learning, schooling, and data analytics. In M. Murphy, S. Redding, & J. Twyman (Eds.), Handbook on innovations in learning for states, districts, and schools (pp. 179–190). Philadelphia, PA: Center on Innovations in Learning, Temple University.
Bienkowski, M., Feng, M., & Means, B. (2012). Enhancing teaching and learning through educational data mining and learning analytics: An issue brief. Retrieved from the website of the Office of Educational Technology, US Department of Education, https://tech.ed.gov/wp-content/uploads/2014/03/edm-la-brief.pdf
Boyd, D. (2008). Facebook's privacy trainwreck: Exposure, invasion, and social convergence. Convergence: The International Journal of Research into New Media Technologies, 14(1), 13–20. http://dx.doi.org/10.1177/1354856507084416
Brown, M. (2011). Learning analytics: The coming third wave (EDUCAUSE Learning Initiative Brief). Retrieved from EDUCAUSE library https://net.educause.edu/ir/library/pdf/ELIB1101.pdf
Cooper, N. (2009). Workforce demographic analytics yield health-care savings. Employment Relations Today, 36(3), 13–18. http://dx.doi.org/10.1002/ert.20256
Cormode, G., & Srivastava, D. (2009). Anonymized data: Generation, models, usage. In C. Binnig & B. Dageville (Eds.), Proceedings of the 35th International Conference on Management of Data (pp. 1015–1018). New York: ACM. http://dx.doi.org/10.1109/ICDE.2010.5447721
Daries, J. P., Reich, J., Waldo, J., Young, E. M., Whittinghill, J., Ho, A. D., ... & Chuang, I. (2014). Privacy, anonymity, and big data in the social sciences. Communications of the ACM, 57(9), 56–63. http://dx.doi.org/10.1145/2643132
Drachsler, H. & Greller, W. (2016). Privacy and analytics – it's a DELICATE issue. A checklist to establish trusted learning analytics. Proceedings of the 6th International Conference on Learning Analytics and Knowledge (LAK '16), 89–98. http://dx.doi.org/10.1145/2883851.2883893
Ebner, M., & Schön, M. (2013). Why learning analytics in primary education matters. In C. Karagiannidis & S. Graf (Eds.), Bulletin of the Technical Committee on Learning Technology, 15(2), 14–17.
Eurostat. (1996). Manual on disclosure control methods. Luxembourg: Office for Official Publications of the European Communities. Retrieved from http://ec.europa.eu/eurostat/ramon/statmanuals/files/manual_on_disclosure_control_methods_1996.pdf
Greller, W., & Drachsler, H. (2012). Translating learning into numbers: A generic framework for learning analytics. Educational Technology and Society, 15(3), 42–57.
Khalil, M., & Ebner, M. (2015a). A STEM MOOC for school children: What does learning analytics tell us? International Conference on Interactive Collaborative Learning (ICL 2015), (pp. 1217–1221). Florence, Italy: IEEE.
Khalil, M., & Ebner, M. (2015b). Learning analytics: Principles and constraints. In S. Carliner, C. Fulford, & N. Ostashewski (Eds.), Proceedings of EdMedia: World Conference on Educational Media and Technology, 2015(1), 1789–1799.
Khalil, M., Kastl, C., & Ebner, M. (2016). Portraying MOOCs learners: A clustering experience using learning analytics. In M. Khalil, M. Ebner, M. Kopp, A. Lorenz, & M. Kalz (Eds.), Proceedings of the European Stakeholder Summit on experiences and best practices in and around MOOCs (EMOOCS 2016) (pp. 265–278). Norderstedt, Germany: Books on Demand GmbH.
McCallister, E., Grance, T., & Scarfone, K. (2010). Guide to protecting the confidentiality of personally identifiable information (PII) (Recommendations of the National Institute of Standards and Technology). Retrieved from the website of Computer Security Division of the National Institute of Standards and Technology http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf
MIT News. (2014, May 30). MIT and Harvard release de-identified learning data from open online courses. [Web post by the MIT News Office]. Retrieved from http://news.mit.edu/2014/mit-and-harvard-release-de-identified-learning-data-open-online-courses
Ohm, P. (2010). Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review, 57, 1701.
Pardo, A., & Siemens, G. (2014). Ethical and privacy principles for learning analytics. British Journal of Educational Technology, 45, 438–450. http://dx.doi.org/10.1111/bjet.12152
Petersen, R. J. (2012, July 18). Policy dimensions of analytics in higher education. EDUCAUSE Review. Retrieved from http://er.educause.edu/articles/2012/7/policy-dimensions-of-analytics-in-higher-education
Prinsloo, P., & Slade, S. (2013). An evaluation of policy frameworks for addressing ethical considerations in learning analytics. Proceedings of the 3rd International Conference on Learning Analytics and Knowledge, 240–244. http://dx.doi.org/10.1145/2460296.2460344
Raghunathan, B. (2013). The complete book of data anonymization: From planning to implementation. Boca Raton, FL: CRC Press.
Samarati, P., & Sweeney, L. (1998). Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression (Technical report). Retrieved from the Electronic Privacy Information Center website https://epic.org/privacy/reidentification/Samarati_Sweeney_paper.pdf
Singer, N. (2014, November 18). ClassDojo adopts deletion policy for student data. New York Times. Retrieved from http://bits.blogs.nytimes.com/2014/11/18/classdojo-adopts-deletion-policy-for-student-data/
E42-*9(E'9(_(X24?,.9(i'(!"#$"&'(L*25.,.6(2.247/,18(2.-(>,6>*5(*-K12/,3.@(W/>,124(?*58?*1/,Q*8'(N$"1++9,%-2(
")( 0A+( 5%9(<%0+$%&0,"%&'( O"%)+$+%1+( "%( *+&$%,%-( .%&'/0,12( &%9( P%"F'+9-+(Q*.P( `45S9( $%;$G'(
>//?@AA-B'-3,'356A$#'$$hHA"<<#%#$'"<<#%$#(
E42-*9( E'9( _( j5,.84339( j'( !"#$<&'( L*25.,.6( 2.247/,18@( W/>,124( ,88K*8( 2.-( -,4*RR28'( .B+$,1&%( K+A&=,"$&'(
>1,+%0,20?(IM?($H#:;$H"='(>//?@AA-B'-3,'356A$#'$$GGA###"G%h"$<hG:<%%(
E36>3,2.9( P'( !"##G9( )*1*RS*5( $&'( MeL9( F*/04,B( 2.-( />*( *.-( 30( 3?*.( 211*88( /3( 5*8*251>( -2/2'( O_E+0a(
\*/5,*Q*-( 053R( >//?@AANNN'1.*/'13RA.*N8A234+.*/04,B+2.-+/>*+*.-+30+3?*.+211*88+/3+
5*8*251>+-2/2A(
EN**.*79(L'(!"###&'(E,R?4*(-*R3652?>,18(30/*.(,-*./,07(?*3?4*(K.,^K*47'(8+&'0A(Q>&%(^$&%1,21"S9(LM49($;
<h'(E2.(i52.1,8139(PM'(
f21>/4*59(J'9(`>24,49(d'9(I2526>,9(U'9(_(WS.*59(d'(!"#$%&'(e.(K8,.6(4*25.,.6(2.247/,18(/3(/521O(/>*(21/,Q,/7(30(
,./*521/,Q*( deeP( Q,-*38'( D.( d'( X,2..2O389( )'X'( E2R?83.9( L'( `,-Y,.8O,9( M'( j25-3( !W-8'&9(
N$"1++9,%-2( ")( 0A+( *.P( 574L( \"$;2A"W( "%( >B&$0( T%=,$"%B+%02( &%9( .%&'/0,12( ,%( b,9+"_K&2+9(
*+&$%,%-( !??'=;$G&( W-,.SK56>9( E13/42.-@(PWT\E+fE'( \*/5,*Q*-( 053R( >//?@AA1*K5+N8'356At34+
$HG:A?2?*5<'?-0(
... Innovation in education through DSR involving GAI will require classrooms to be ready for EDM to determine whether technology interventions help improve learning experiences. However, the data collection and integration face ethical challenges [9], which, if not thoroughly addressed, can compromise the integrity of the research. This study proposes a framework that is efficient and deeply rooted in ethical principles. ...
... Three broad overlapping categories of issues involving ethics in LA were proposed by [16]: 1) the location and interpretation of data, 2) informed consent, privacy, and de-identification of data, and 3) management, classification, and storage of data. Personally identifiable information (PII) is "any information that can identify an individual, and de-identification is used to prevent revealing individual identity and keeping the PII confidential" [9]. This study shall use these categories of [16] as a checklist for addressing EDM ethical concerns. ...
... This study shall use these categories of [16] as a checklist for addressing EDM ethical concerns. Deidentification techniques shall be borrowed from the work of [9]: anonymization (de-identifying data while preserving its original format), masking (replacing sensitive data with fictional data while still making records usable), and blurring (adding noise to records). Design science research (DSR) calls for "creating innovative artifacts to solve real-world problems" iteratively [7]. ...
Conference Paper
Educational data mining (EDM) can be used to design better and smarter learning technology by finding and predicting aspects of learners. Insights from EDM are based on data collected from educational environments. Among these educational environments are computer-based educational systems (CBES) such as learning management systems (LMS) and conversational intelligent tutoring systems (CITS). The use of Large Language Models (LLMs) to power a CITS holds promise due to their advanced natural language understanding capabilities. These systems offer opportunities for enriching management and entrepreneurship education. Collecting data from classes experimenting with these new technologies raises some ethical challenges. This paper presents an EDM framework for analyzing and evaluating the impact of these LLM-based CITS on learning experiences in management and entrepreneurship courses and also places strong emphasis on ethical considerations. The different learning experience aspects to be tracked are 1) learning outcomes and 2) emotions or affect and sentiments. Data sources comprise Learning Management System (LMS) logs, pre-post tests, and reflection papers gathered at multiple time points. This framework aims to deliver actionable insights for course and curriculum design and development through design science research (DSR), shedding light on the LLM-based system's influence on student learning, engagement, and overall course efficacy. Classes targeted to apply this framework have 30-40 students on average, working in groups of 2-6 members. They will involve sophomore to senior students aged 18 to 22 years. One entire semester takes about 14 weeks. Designed for broad application across diverse courses in management and entrepreneurship, the framework aims to ensure that the utilization of LLMs in education is not only effective but also ethically sound.
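The citing excerpt above names three de-identification techniques: anonymization, masking, and blurring. A minimal Python sketch of how they might be applied to a learner record follows. The field names (`student_id`, `email`, `score`), the salted-hash pseudonym, the `example.org` addresses, and the uniform noise range are illustrative assumptions, not details taken from the cited works.

```python
import hashlib
import random


def anonymize_id(student_id: str, salt: str) -> str:
    """Swap an ID for a salted-hash pseudonym with the same length, digits only."""
    digest = hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()
    width = len(student_id)
    return str(int(digest, 16) % 10 ** width).zfill(width)


def mask_email(index: int) -> str:
    """Replace a real address with a fictional but well-formed one."""
    return f"learner{index}@example.org"


def blur_score(score: float, rng: random.Random, spread: float = 2.0) -> float:
    """Add uniform noise in [-spread, spread] and clamp to the 0..100 range."""
    return max(0.0, min(100.0, score + rng.uniform(-spread, spread)))


def deidentify(records, salt: str, seed: int = 0):
    """Return de-identified copies of the records; the input is left untouched."""
    rng = random.Random(seed)  # fixed seed is an assumption, only for repeatability
    return [
        {
            "student_id": anonymize_id(rec["student_id"], salt),
            "email": mask_email(i),
            "score": round(blur_score(rec["score"], rng), 1),
        }
        for i, rec in enumerate(records, start=1)
    ]


# Hypothetical sample records
records = [
    {"student_id": "0012345", "email": "anna@uni.example", "score": 88.0},
    {"student_id": "0067890", "email": "ben@uni.example", "score": 99.5},
]
cleaned = deidentify(records, salt="course-secret")
```

A salted hash truncated to a few digits is closer to pseudonymization than to full anonymization, and short pseudonyms can collide, so this sketch only illustrates the shape of each technique.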
... The advantages of DP are significant: DP is robust against composition attacks¹. It can also defend against many other attacks on sensitive data, such as bias attacks²,³ [32]. Furthermore, DP may scale to large datasets and complex queries [32]. ...
¹ A composition attack is a type of attack in which attackers combine multiple independently released anonymized datasets to uncover sensitive information about an individual.
² A bias attack generally refers to a situation where a malicious actor manipulates a system's output by exploiting biases in the model or the data it was trained on.
³ A similarity attack occurs when an attacker tries to infer information about an individual's data by comparing the output of a model or a dataset in response to different ...
... In the context of LA, responding to many privacy measures is more of a framework or policy recommendation, with limited practical application evidence [40] [30]. Additionally, prior research has shown that conventional data anonymization and de-identification methods are incapable of addressing the complexity and diversity of learning data [23] [30]. Motivated by these findings and the compelling recent calls to expand the horizons of LA by integrating insights from other fields, the current paper presents an empirical study to implement and evaluate the application of DP in LA approaches that rely on machine learning. ...
Preprint
This paper addresses the challenge of balancing learner data privacy with the use of data in learning analytics (LA) by proposing a novel framework that applies Differential Privacy (DP). The need for more robust privacy protection keeps increasing, driven by evolving legal regulations and heightened privacy concerns, and by the insufficiency of traditional anonymization methods for the complexities of educational data. To address this, we introduce the first DP framework specifically designed for LA and provide practical guidance for its implementation. We demonstrate the use of this framework through an LA usage scenario and validate DP in safeguarding data privacy against potential attacks through an experiment on a well-known LA dataset. Additionally, we explore the trade-offs between data privacy and utility across various DP settings. Our work contributes to the field of LA by offering a practical DP framework that can support researchers and practitioners in adopting DP in their work.
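The privacy and utility trade-off mentioned in the abstract above can be illustrated with the Laplace mechanism, a standard DP building block. The abstract does not say which mechanism the cited framework uses, so the choice of mechanism, the function names, and the sample scores below are assumptions made for this sketch.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count of the values that satisfy the predicate.

    Adding or removing one learner changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def passed(score: float) -> bool:
    """Hypothetical pass mark of 50 points."""
    return score >= 50


# Hypothetical course scores
scores = [35, 48, 52, 61, 67, 70, 74, 81, 90, 95]
rng = random.Random(42)

strict = dp_count(scores, passed, epsilon=0.1, rng=rng)  # stronger privacy, noisier
loose = dp_count(scores, passed, epsilon=5.0, rng=rng)   # weaker privacy, closer to the true count
```

A smaller `epsilon` gives stronger privacy but a larger expected error in the released count, which is the trade-off the abstract refers to.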
... These often take the form of standardized test scores and grade information, but AI can interpret these with more levels of complexity. For example, Pardo et al. (2019) and Khalil and Ebner (2016) describe how learning analytics platforms can track not just final grades but patterns in how students approach problems and the time they spend on different tasks. Pardo et al. argue that monitoring not only students' final grades but also how they approach a problem and how long they spend on each type of task provides valuable information about their learning process, as reflected in the different patterns learners show on LA platforms. ...
Chapter
This chapter examines the data privacy challenges posed by AI-driven education and offers strategic solutions to protect student information. The authors explore how AI systems are collecting various types of student data, from test scores to social interactions, and what this means for privacy. Through real-world examples, the authors shed light on worrying trends, like excessive surveillance and potential data breaches. The authors also tackle the legal and ethical questions that arise when AI meets education and point out how current laws often fall short in this rapidly developing field. Key findings reveal the inadequacy of current regulations and the potential for AI to exacerbate existing educational inequalities. The authors recommend implementing comprehensive data governance policies, investing in educator training on AI and privacy, and incorporating data literacy into curricula. The chapter emphasizes the need for a balanced approach that harnesses AI's benefits while protecting students' privacy through technical solutions, policy reforms, and enhanced digital literacy.
... Data privacy management is an essential part of ethical LA practice. There is a growing base of research on the measurement and mitigation of privacy risks to address ethical challenges presented by the collection and use of learner data for analytics (Chicaiza et al., 2020; Corrin et al., 2019; Drachsler & Greller, 2016; Ferguson, 2019; Gursoy et al., 2017; Hoel & Chen, 2016; Khalil & Ebner, 2016; Machado et al., 2019; Pardo & Siemens, 2014; Steiner et al., 2016). There is also some work addressing the cross-disciplinary nature of learning analytics, including privacy concerns (e.g., Teasley, 2019). ...
Article
Data is fundamental to Learning Analytics research and practice. However, the ethical use of data, particularly in terms of respecting learners’ privacy rights, is a potential barrier that could hinder the widespread adoption of Learning Analytics in the education industry. Although policies and guidelines on privacy protection are available worldwide, their availability does not guarantee successful implementation in practice. It is necessary to develop practical approaches that would allow for the translation of the existing guidelines into practice. In this study, we examine an initial set of privacy-preserving mechanisms on a large-scale education dataset. The data utility is evaluated before and after the privacy-preserving mechanisms are applied by fitting the data to commonly used Learning Analytics models, providing an evaluation of the utility loss. We further explore the balance between preserving data privacy and maintaining data utility in Learning Analytics. The results prove the compatibility between preserving learners’ privacy and Learning Analytics, providing a benchmark of utility loss to practitioners and researchers in the education sector. Our study is a reminder of the imminent concern of data privacy and advocates that privacy preservation can and should be an integral part of the design of any Learning Analytics technique.
... Moreover, AI-driven content creation tools are now generating customized reading material suited to individual literacy levels. Not to be overlooked, predictive analytics in platforms like BrightBytes predict student outcomes, enabling timely intervention [35]. With the ongoing advancements in AI, it is evident that its symbiosis with education is poised for further growth and innovation. ...
Chapter
Artificial intelligence (AI) has emerged as a revolutionary force in education, presenting both significant opportunities and challenges. This study aims to provide a comprehensive overview of the goals and challenges associated with implementing AI in education. The goals of AI in education include personalized learning, increased efficiency, and improved accessibility, which have the potential to transform traditional classrooms into more dynamic, engaging, and inclusive environments. However, the implementation of AI in education also presents several challenges, including ethical concerns, data privacy issues, and the need for adequate training and support for educators and students. This study provides a critical analysis of the current literature on AI in education, highlighting the key goals and challenges, and offers recommendations for overcoming these challenges and maximizing the benefits of implementing AI in education. Ultimately, this study aims to contribute to a better understanding of the potential impact of AI on education and to provide a roadmap for the successful implementation of AI in educational settings.
... In addition to the aforementioned technical solutions that directly address data privacy issues, there are also legal and framework-based solutions. For example, [26] proposed a conceptual framework to de-identify learning analytics data. ...
Preprint
Privacy poses a significant obstacle to the progress of learning analytics (LA), presenting challenges like inadequate anonymization and data misuse that current solutions struggle to address. Synthetic data emerges as a potential remedy, offering robust privacy protection. However, prior LA research on synthetic data lacks thorough evaluation, essential for assessing the delicate balance between privacy and data utility. Synthetic data must not only enhance privacy but also remain practical for data analytics. Moreover, diverse LA scenarios come with varying privacy and utility needs, making the selection of an appropriate synthetic data approach a pressing challenge. To address these gaps, we propose a comprehensive evaluation of synthetic data, which encompasses three dimensions of synthetic data quality, namely resemblance, utility, and privacy. We apply this evaluation to three distinct LA datasets, using three different synthetic data generation methods. Our results show that synthetic data can maintain similar utility (i.e., predictive performance) as real data, while preserving privacy. Furthermore, considering different privacy and data utility requirements in different LA scenarios, we make customized recommendations for synthetic data generation. This paper not only presents a comprehensive evaluation of synthetic data but also illustrates its potential in mitigating privacy concerns within the field of LA, thus contributing to a wider application of synthetic data in LA and promoting a better practice for open science.
... Cavoukian's (2012) privacy by design framework includes end-to-end security as one of the seven principles. Due to this connection, security is often mentioned in lists of ethical concerns related to learning analytics (Khalil & Ebner, 2016;Steiner et al., 2016). Pardo and Siemens (2014) also noted how security impacted learning analytics. ...
Article
This single-site case study will seek to answer the following question: how is the concept of privacy addressed in relation to a student success information system within a small, public institution of higher education? Three themes were found within the inductive coding process, which used interviews, documentation, and videos as data resources. Overall, the case study shows an institution in the early stages of implementing a commercial learning analytics system and provides suggestions for how it can be more proactive in implementing privacy considerations in developing policies and procedures.
Article
This review article provides a comprehensive exploration of the key pillars of trustworthy AI: security, privacy, and robustness. The article examines security measures, both traditional and cutting-edge, and identifies emerging threats and challenges in the ever-evolving landscape of artificial intelligence (AI). The discussion extends to advanced encryption techniques and privacy preservation, emphasizing the ethical considerations inherent in safeguarding user data. The section on robustness and adversarial attacks presents techniques for building robust models and for ensuring model interpretability and explainability. The exploration of federated learning (FL) elucidates its conceptual foundations and the intricate interplay between security, privacy, and collaborative model training. The section on differential privacy (DP) offers insights into its applications and challenges. The section on ethical considerations scrutinizes bias and fairness in AI. The article concludes with an examination of emerging technologies in AI security and privacy and the challenges they are anticipated to bring. This review article serves as a comprehensive guide to navigating the complex terrain of trustworthy AI.
Chapter
Educational data mining (EDM) can be used to design better and smarter learning technology by finding and predicting aspects of learners. Insights from EDM are based on data collected from educational environments. Among these educational environments are computer-based educational systems (CBES) such as learning management systems (LMS) and conversational intelligent tutoring systems (CITSs). The use of large language models (LLMs) to power a CITS holds promise due to their advanced natural language understanding capabilities. These systems offer opportunities for enriching management and entrepreneurship education. Collecting data from classes experimenting with these new technologies raises some ethical challenges. This paper presents an EDM framework for analyzing and evaluating the impact of these LLM-based CITS on learning experiences in management and entrepreneurship courses and also places strong emphasis on ethical considerations. The different learning experience aspects to be tracked are (1) learning outcomes and (2) emotions or affect and sentiments. Data sources comprise Learning Management System (LMS) logs, pre-post-tests, and reflection papers gathered at multiple time points. This framework aims to deliver actionable insights for course and curriculum design and development through design science research (DSR), shedding light on the LLM-based system’s influence on student learning, engagement, and overall course efficacy. Classes targeted to apply this framework have 30–40 students on average, grouped between 2 and 6 members. They will involve sophomore to senior students aged 18–22 years. One entire semester takes about 14 weeks. Designed for broad application across diverse courses in management and entrepreneurship, the framework aims to ensure that the utilization of LLMs in education is not only effective but also ethically sound.
Technical Report
In data mining and data analytics, tools and techniques once confined to research laboratories are being adopted by forward-looking industries to generate business intelligence for improving decision making. Higher education institutions are beginning to use analytics for improving the services they provide and for increasing student grades and retention. The U.S. Department of Education's National Education Technology Plan, as one part of its model for 21st-century learning powered by technology, envisions ways of using data from online learning systems to improve instruction. With analytics and data mining experiments in education starting to proliferate, sorting out fact from fiction and identifying research possibilities and practical applications are not easy. This issue brief is intended to help policymakers and administrators understand how analytics and data mining have been, and can be, applied for educational improvement. At present, educational data mining tends to focus on developing new tools for discovering patterns in data. These patterns are generally about the microconcepts involved in learning: one-digit multiplication, subtraction with carries, and so on. Learning analytics (at least as it is currently contrasted with data mining) focuses on applying tools and techniques at larger scales, such as in courses and at schools and postsecondary institutions. But both disciplines work with patterns and prediction: If we can discern the pattern in the data and make sense of what is happening, we can predict what should come next and take the appropriate action. Educational data mining and learning analytics are used to research and build models in several areas that can influence online learning systems. One area is user modeling, which encompasses what a learner knows, what a learner's behavior and motivation are, what the user experience is like, and how satisfied users are with online learning.
At the simplest level, analytics can detect when a student in an online course is going astray and nudge him or her on to a course correction. At the most complex, they hold promise of detecting boredom from patterns of key clicks and redirecting the student's attention. Because these data are gathered in real time, there is a real possibility of continuous improvement via multiple feedback loops that operate at different time scales: immediate to the student for the next problem, daily to the teacher for the next day's teaching, monthly to the principal for judging progress, and annually to the district and state administrators for overall school improvement. The same kinds of data that inform user or learner models can be used to profile users. Profiling as used here means grouping similar users into categories using salient characteristics. These categories then can be used to offer experiences to groups of users or to make recommendations to the users and adaptations to how a system performs. User modeling and profiling are suggestive of real-time adaptations. In contrast, some applications of data mining and analytics are for more experimental purposes. Domain modeling is largely experimental with the goal of understanding how to present a topic and at what level of detail. The study of learning components and instructional principles also uses experimentation to understand what is effective at promoting learning. These examples suggest that the actions from data mining and analytics are always automatic, but that is less often the case. Visual data analytics closely involve humans to help make sense of data, from initial pattern detection and model building to sophisticated data dashboards that present data in a way that humans can act upon. K-12 schools and school districts are starting to adopt such institution-level analyses for detecting areas for instructional improvement, setting policies, and measuring results.
Making visible students' learning and assessment activities opens up the possibility for students to develop skills in monitoring their own learning and to see directly how their effort improves their success. Teachers gain views into students' performance that help them adapt their teaching or initiate tutoring, tailored assignments, and the like. Robust applications of educational data mining and learning analytics techniques come with costs and challenges. Information technology (IT) departments will understand the costs associated with collecting and storing logged data, while algorithm developers will recognize the computational costs these techniques still require. Another technical challenge is that educational data systems are not interoperable, so bringing together administrative data and classroom-level data remains a challenge. Yet combining these data can give algorithms better predictive power. Combining data about student performance (online tracking, standardized tests, teacher-generated tests) to form one simplified picture of what a student knows can be difficult and must meet acceptable standards for validity. It also requires careful attention to student and teacher privacy and the ethical obligations associated with knowing and acting on student data. Educational data mining and learning analytics have the potential to make visible data that have heretofore gone unseen, unnoticed, and therefore unactionable. To help further the fields and gain value from their practical applications, the recommendations are that educators and administrators:
• Develop a culture of using data for making instructional decisions.
• Involve IT departments in planning for data collection and use.
• Be smart data consumers who ask critical questions about commercial offerings and create demand for the most useful features and uses.
• Start with focused areas where data will help, show success, and then expand to new areas.
• Communicate with students and parents about where data come from and how the data are used.
• Help align state policies with technical requirements for online learning systems.
Researchers and software developers are encouraged to:
• Conduct research on usability and effectiveness of data displays.
• Help instructors be more effective in the classroom with more real-time and data-based decision support tools, including recommendation services.
• Continue to research methods for using identified student information where it will help most, anonymizing data when required, and understanding how to align data across different systems.
• Understand how to repurpose predictive models developed in one context to another.
A final recommendation is to create and continue strong collaboration across research, commercial, and educational sectors. Commercial companies operate on fast development cycles and can produce data useful for research. Districts and schools want properly vetted learning environments. Effective partnerships can help these organizations codesign the best tools.
Conference Paper
It is widely known that interaction, as well as communication, are very important parts of successful online courses. These features are considered crucial because they help to improve students’ attention in a very significant way. In this publication, the authors present an innovative application, which adds different forms of interactivity to learning videos within MOOCs such as multiple-choice questions or the possibility to communicate with the teacher. Furthermore, Learning Analytics using exploratory examination and visualizations have been applied to unveil learners’ patterns and behaviors as well as investigate the effectiveness of the application. Based upon the quantitative and qualitative observations, our study determined common practices behind dropping out using videos indicator and suggested enhancements to increase the performance of the application as well as learners’ attention.
Conference Paper
Massive Open Online Courses are remote courses that excel in their students' heterogeneity and quantity. Due to their massiveness, the large datasets generated by MOOC platforms require advanced tools to reveal hidden patterns for enhancing learning and educational environments. This paper offers an interesting study on using one of these tools, clustering, to portray learners' engagement in MOOCs. The research study analyses a university-mandatory MOOC that was also open to the public, in order to classify students into appropriate profiles based on their engagement. We compared the clustering results across MOOC variables and, finally, evaluated our results against a students' motivation scheme from the eighties to examine the contrast between classical classes and MOOC classes. Our research pointed out that MOOC participants strongly follow Cryer's scheme of Elton (1996).
Conference Paper
The widespread adoption of Learning Analytics (LA) and Educational Data Mining (EDM) has somewhat stagnated recently, and in some prominent cases even been reversed following concerns by governments, stakeholders and civil rights groups about privacy and ethics applied to the handling of personal data. In this ongoing discussion, fears and realities are often indistinguishably mixed up, leading to an atmosphere of uncertainty among potential beneficiaries of Learning Analytics, as well as hesitations among institutional managers who aim to innovate their institution's learning support by implementing data and analytics with a view on improving student success. In this paper, we try to get to the heart of the matter, by analysing the most common views and the propositions made by the LA community to solve them. We conclude the paper with an eight-point checklist named DELICATE that can be applied by researchers, policy makers and institutional managers to facilitate a trusted implementation of Learning Analytics.
Conference Paper
Massive Open Online Courses (MOOCs) have spread tremendously across Science, Technology, Engineering and Mathematics (STEM) academic disciplines. These MOOCs have served an agglomeration of various learner groups across the world. The leading MOOC platform in Austria, iMooX, offers such courses. This paper highlights the authors’ experience of applying Learning Analytics to examine the participation of secondary school pupils in one of its courses, called “Mechanics in everyday life”. We observed different patterns and, contrary to the jubilant results expected of any educational MOOC, we show that pupils seemingly regarded it not as a truly motivating learning route but rather as optional homework.
Conference Paper
Within the evolution of technology in education, Learning Analytics has reserved its position as a robust technological field that promises to empower instructors and learners in different educational fields. The 2014 Horizon Report (Johnson et al., 2014) expects it to be adopted by educational institutions in the near future. However, its processes and phases, as well as its constraints, have still not been debated in depth. In this research study, the authors discuss the essence, objectives and methodologies of Learning Analytics and propose a first prototype life cycle that describes its entire process. Furthermore, the authors raise substantial questions related to challenges such as security, policy and ethics issues that limit the beneficial applications of Learning Analytics processes.
Article
Open data has tremendous potential for science, but, in human subjects research, there is a tension between privacy and releasing high-quality open data. Federal law governing student privacy and the release of student records suggests that anonymizing student data protects student privacy. Guided by this standard, we de-identified and released a data set from 16 MOOCs (massive open online courses) from MITx and HarvardX on the edX platform. In this article, we show that these and other de-identification procedures necessitate changes to data sets that threaten replication and extension of baseline analyses. To balance student privacy and the benefits of open data, we suggest focusing on protecting privacy without anonymizing data by instead expanding policies that compel researchers to uphold the privacy of the subjects in open data sets. If we want to have high-quality social science research and also protect the privacy of human subjects, we must eventually have trust in researchers. Otherwise, we'll always have the strict tradeoff between anonymity and science illustrated here.
Conference Paper
Higher education institutions have collected and analysed student data for years, with their focus largely on reporting and management needs. A range of institutional policies exist which broadly set out the purposes for which data will be used and how data will be protected. With the advent of learning analytics, the uses to which student data is put have expanded rapidly. Generally, though, the policies setting out institutional use of student data have not kept pace with this change. Institutional policy frameworks should not only provide an enabling environment for the optimal and ethical harvesting and use of data, but also clarify who benefits and under what conditions, establish conditions for consent and the de-identification of data, and address issues of vulnerability and harm. A directed content analysis of the policy frameworks of two large distance education institutions shows that current policy frameworks do not facilitate the provision of an enabling environment for learning analytics to fulfil its promise.