Michael Tsikerdekis

Western Washington University | WWU · Department of Computer Science

PhD

About

37
Publications
130,007
Reads
719
Citations
Introduction
My research lies at the intersection of computational systems and social systems, and my interests revolve around cybersecurity, online deception, social engineering, data mining, and machine learning. For my articles, send a request; some of them can also be found on my website: http://michael.tsikerdekis.com
Additional affiliations
July 2017 - September 2021
Western Washington University
Position
  • Professor (Assistant)
August 2013 - July 2017
University of Kentucky
Position
  • Professor (Assistant)
Education
February 2008 - January 2013
Masaryk University
Field of study
  • Informatics

Publications (37)
Article
Identity deception has become an increasingly important issue in the social media environment. The case of blocked users initiating new accounts, often called sockpuppetry, is widely known and past efforts, which have attempted to detect such users, have been primarily based on verbal behavior (e.g., using profile data or lexical features in text)....
Article
Proliferation of web-based technologies has revolutionized the way content is generated and exchanged through the Internet, leading to proliferation of social-media applications and services. Social media enable creation and exchange of user-generated content and design of a range of Internet-based applications. This growth is fueled not only by mo...
Article
In the past decade social networking services (SNS) flooded the Web. The nature of such sites makes identity deception easy, offering a quick way to set up and manage identities, and then connect with and deceive others. Fighting deception requires a coordinated approach by users and developers to ensure detection and prevention. This article ident...
Article
Identity deception in social media applications has negatively impacted online communities and it is likely to increase as the social media user population grows. The ease of generating new accounts on social media has exacerbated the issue. Many previous studies have focused on verbal, non-verbal, and network data produced by...
Book
Grokking Relational Database Design is a friendly illustrated guide to designing and implementing your first database. In this book, you’ll learn how to: (1) Query and create databases using Structured Query Language (SQL), (2) Design databases from scratch, (3) Implement and optimize database designs, and (4) Take advantage of generative AI when...
Article
The integration of artificial intelligence has led to significant advancements across industries but also exposed systems to security vulnerabilities. We evaluate defense methods, including robust data practices, adversarial training, model hardening, fairness-aware algorithms, and privacy-preserving techniques, and highlight each method’s effectiv...
Book
"Overnight Hercules for Network Security" by Michail Tsikerdekis is an essential guide for aspiring network security professionals, offering a practical, example-driven approach to mastering the fundamentals of cybersecurity. Written by a seasoned expert with over a decade of experience, the book covers critical skills such as threat hunting, foren...
Article
In recent years, we have witnessed growing interest in using deep learning to detect misinformation. This increased attention is being driven by deep learning technologies’ ability to accurately detect this misinformation. However, there is a diverse array of content that can be considered misinformation, such as fake news and satire. Similarly, in...
Article
Network anomaly detection solutions can analyze a network’s data volume by protocol over time and can detect many kinds of cyberattacks such as exfiltration. We use exponential random graph models (ERGMs) in order to flatten hourly network topological characteristics into a time series, and Autoregressive Moving Average (ARMA) to analyze that time...
Article
Network anomaly detection solutions are being used as defense against several attacks, especially those related to data exfiltration. Several methods exist in the literature, such as clustering or neural networks. However, these methods often focus on local and global network indicators instead of network structural properties, such as understandin...
Article
Marine network protocols are domain-specific network protocols that aim to incorporate particular features within the specialized marine context that devices are implemented in. Devices implemented in such vessels involve critical equipment; however, limited research exists for marine network protocol security. In this paper, we provide an analysis...
Article
In the last few years, we have witnessed an explosive growth of fake content on the Internet which has significantly affected the veracity of information on many social platforms. Much of this disruption has been caused by the proliferation of advanced machine and deep learning methods. In turn, social platforms have been using the same technologic...
Article
Identity deception in online social networks is a pervasive problem. Ongoing research is developing methods for identity deception detection. However, the real-world efficacy of these methods is currently unknown because they have been evaluated largely through laboratory experiments. We present a review of representative state-of-the-art results o...
Article
Online question & answer (Q & A) is a distinctive type of online interaction that is impactful on student learning. Prior studies on online interaction in large-scale classes mainly focused on online discussion and were conducted mainly in non-STEM fields. This research aims to quantify the effects of online Q & A interactions on student performanc...
Article
The surge of content (such as fake news) in the last few years has made content deception an important area of research. We identify two main types of content deception based on either fake content or misleading content. We present a classification of deception attacks along with their delivery methods. We also discuss defense measures that can det...
Article
Bots in video games have been gaining the interest of industry as well as academia as a problem that has been enabled by the recent advances in deep learning and reinforcement learning. In turn, several studies have attempted to establish bot detectors in various video games. In this article, we introduce a bot detection model that can be implemented in...
Article
Advances in hardware, software, communication, and embedded computing technologies, along with their decreasing costs and increasing performance, have led to the emergence of the Internet of Things (IoT) paradigm. Today, several billions of Internet-connected devices are part of the IoT ecosystem. IoT devices have become an integral part of the informat...
Article
As the societal demands for application and knowledge in computer science (CS) increase, CS student enrollment keeps growing rapidly around the world. By continuously improving the efficacy of computing education and providing guidelines for learning and teaching practice, computing education research plays a vital role in addressing both education...
Article
Malicious attackers have been prevalent in online communities with the ability to create accounts freely and with ease on social platforms. This negatively impacts sub-communities on these platforms that often require trusted members especially when they are intended to be used for exchange of valuable goods or services. Prior research on identity...
Conference Paper
Honeypots have been used extensively for over two decades. However, their development is rarely accompanied by an understanding of how attackers are able to detect them. Further, our understanding of effective evasion strategies that prevent the detection of honeypots is limited. We present a classification of honeypot characteristics as well as...
Article
Measuring code contribution in crowdsourced software is essential for ranking contributors to a project or distributing revenue. Past studies have demonstrated that there is variation between different code contribution measures and their ability for ranking users accurately. This study proposes a new code contribution ranking algorithm, Persistent...
Article
Intra-institutional collaboration is a type of research collaboration that is often neglected in the literature. This study aimed to understand what factors contribute to this type of collaboration as well as what factors can negatively impact its likelihood. We deployed a survey in a US research institution and measured f...
Article
Cumulative experience is often seen as a major factor for influencing content quality in collaborative projects such as Wikipedia. However, past studies often utilize cumulative experience based on the quantity of work rather than quality and context. Moreover, the perspective on cumulative experience assumes a final destination for user behavior,...
Article
Identity deception detection methods have been proposed for social media platforms with high effectiveness but their efficiency can vary. Previous literature has not examined the potential of these methods to work as real-time monitoring systems. Such implementations highlight further the challenges of applying computationally intensive methods in...
Article
Designing a collaborative platform that produces project outcomes of high quality and allows for wisdom of the crowds to come together in the achievement of a common goal can be a challenge. Literature often addresses the interplay between designing for online community needs and outcome/ product quality as coexistence, where design implementations...
Article
Information and communication technologies (ICTs) provide a distinctive structure of opportunities with the potential to promote political engagement. However, concerns remain over unequal technological access in our society, as political resources available on the internet empower those with the resources and motivation to take advantage of them,...
Chapter
Bayesian inference has a long standing history in the world of statistics and this chapter aims to serve as an introduction to anyone who has not been formally introduced to the topic before. First, Bayesian inference is introduced using a simple and analytical example. Then, computational methods are introduced. Examples are provided with common H...
Chapter
Social networks such as Facebook and LinkedIn have gained a lot of popularity in recent years. These networks use a large amount of data that are highly valuable for different purposes. Hence, social networks become a potential vector for attackers to exploit. This chapter focuses on the security attacks and countermeasures used by social networks....
Article
Online collaborative projects have been utilized in a variety of ways over the past decade, such as bringing people together to build open source software or developing the world's largest free encyclopedia. Personal communication networks as a feature do not exist in all collaborative projects. It is currently unclear if a designer's decision to i...
Article
Groupthink behavior is always a risk in online groups and group decision support systems (GDSS), especially when not all potential alternatives for problem resolution are considered. It becomes a reality when individuals simply conform to the majority opinion and hesitate to suggest their own solutions to a problem. Anonymity has long been establis...
Conference Paper
The rise in popularity of social media along with new web technologies has presented designers and developers with tremendous new interface opportunities for evaluating user-generated content. One of these new interface designs found in social media today, is the dynamic voting interface; voting results are public from the initiation of an evaluati...
Thesis
Social media today play an increasingly important role in computer science, the information technologies industry and society at large, changing people's everyday communication and interaction. The domain of social media encompasses a variety of services, such as social networking services, collaborative projects, microblogging services and even vi...
Article
Groupthink can occur under certain conditions in all collaborative groups, and online groups that use group decision support systems such as wikis are no exception. One of the measures for preventing groupthink within a group is ensuring that all alternatives are considered and evaluated in detail. Pro/con list interfaces may be mo...
Article
Anonymity is a factor that could lead to disinhibited behavior which is something that could cause damage to many online communities. Anonymity is a generic term and should be analyzed further into different states such as pseudonymity and complete anonymity. In this paper a survey was conducted in order to determine the differences between the two a...
Conference Paper
The effects of anonymity on aggression have been discussed by many social scientists in the past years. Anonymity is a factor that could lead to disinhibited behavior which could damage many online communities. This knowledge provides software engineers with a dilemma as to whether to use anonymity as an option for their users and suffer the increa...

Questions (19)
Question
I want to compare 50 different network structures that fall into two categories (A has 25 networks, and B has 25 networks). Individuals may appear in more than one of these 50 networks.
I want to find a way to quantify metrics such as density and centrality over time as these networks evolve.
Basically, I want to see whether the velocity at which a network evolves has anything to do with the network belonging to group A or group B. A simple example would be a network that adds 1.5 users per day. Any ideas on whether I could quantify density and centrality metrics over time in the same way? It does not have to be a single value, as long as the final representation of the metric can be compared as a whole.
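One possible way to make this concrete, sketched in Python under assumptions that are not part of the question: each network is available as a list of daily cumulative edge-list snapshots, edges are undirected, and "velocity" is taken to be the least-squares slope of a metric over time. The helper names (`density`, `slope`, `growth_rates`) and the input format are illustrative only.

```python
def density(n_nodes, n_edges):
    """Undirected density: realized edges over possible edges."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def growth_rates(snapshots):
    """snapshots: one edge list per day (hypothetical input format)."""
    days = list(range(len(snapshots)))
    node_counts, densities = [], []
    for edges in snapshots:
        # Treat (u, v) and (v, u) as the same undirected edge.
        unique_edges = {frozenset(e) for e in edges}
        nodes = {n for e in unique_edges for n in e}
        node_counts.append(len(nodes))
        densities.append(density(len(nodes), len(unique_edges)))
    return {
        "nodes_per_day": slope(days, node_counts),
        "density_per_day": slope(days, densities),
    }


# Toy network that gains one user and one tie per day.
toy = [
    [("a", "b")],
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("b", "c"), ("c", "d")],
]
rates = growth_rates(toy)
print(rates)
```

Each network then yields one slope per metric, so the 25 slopes from group A can be compared with the 25 from group B. A centrality measure could be summarized per snapshot (for example, as a mean) and passed through the same `slope` helper. Whether a linear slope is an adequate summary, and how to treat individuals who appear in several networks, are left open here.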
Question
I am experimenting with finding a way to trace behaviors back to design features for a game and I figured affordances may be the way to go. The goal is through participant observation and interviews to tie behaviors back to affordances that are aimed back at artifacts. Here is what I have in mind with an example.
Say I want to study players playing FIFA online.
Stage 1: Play the game and attempt to identify artifacts and affordances. Artifacts do not have to be part of the UI; they can also be non-visual (e.g., algorithms and other pieces of code). The view that I would build is still one of perceptual affordances but probably closer to a designer's model than a user's model. That's okay. Also, I expect that not all users will perceive the same affordances I discover or use all features in a game. In fact, some artifacts that lead to affordances may be a mystery to some players.
Example: 
=Artifact=              =Affordances=
Team Score indicator    tracking game progress
                        predicting game outcome*
Player rank             tracking players' skill in the world
* An artifact can have more than one affordance. Some are implicit while others are explicit. Not all affordances have to be confirmed at this stage either. This is so that I will have something to work with.
** I also may or may not categorize affordances at this point.
Stage 2: Play the game but this time attempt to actively talk to people as well as have interviews. The goal is to verify as many of the affordances identified as I can and even link them to the artifacts. At this stage, I also aim to expand affordances to further behaviors.
Example:
=Artifact=              =Affordances=                           =Behavioral outcomes/effects=
Team Score indicator    tracking game progress                  feeling pressured
                        predicting game outcome                 becoming more aggressive near the end of the game
Player rank             tracking players' skill in the world    feeling exposed
* At this point some of the affordances will be dropped if not verified. Others may be added. Behavioral outcomes are linked to affordances and, through them, to artifacts.
Grouping of artifacts and affordances under themes will be done at this point.
Stage 3: Ask actors more direct questions that aim to provide a more definitive picture of the links identified.
I omitted a lot of the process but kept the gist.
Question
I have a numeric output variable and a numeric predictor in a small sample. For example, my output variable is the percentage of domestic abuse per state and my predictor is the percentage of alcohol abuse per state.
I have multiple predictor variables similar to the one above. In a linear model, some are significant while others are not. When I convert them to binary variables based on the median across my sample (50 states in my example), some cease to be significant while others become significant when previously they were not.
In terms of interpretation, a numeric alcohol abuse causing domestic abuse makes sense, but a binary "high" alcohol abuse (alcohol abuse above the median) causing domestic abuse also makes sense. My sample size is small due to the context of the problem, so binary variables of high and low usage do seem to give me more power. However, other variables lose their significance in the linear model when converted to binary.
When is it methodologically correct to convert numeric variables to binary? Does it make sense from the standpoint of simplifying interpretation with a limited sample size?
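A small illustration of what a median split does to a predictor, using made-up numbers rather than real state-level data. The variable names and values are hypothetical, and the sketch only compares Pearson correlations; it is not a significance test and says nothing about any particular dataset.

```python
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical predictor and outcome for ten units (not real data).
alcohol = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
abuse = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

# Median split: 1 for values above the median, 0 otherwise.
cutoff = median(alcohol)
alcohol_high = [1 if x > cutoff else 0 for x in alcohol]

r_continuous = pearson(alcohol, abuse)
r_binary = pearson(alcohol_high, abuse)
print(round(r_continuous, 3), round(r_binary, 3))
```

In this toy data the dichotomized predictor correlates less strongly with the outcome than the original numeric one, because the split discards all variation within the "low" and "high" groups. In a small real sample the change can go either way for a given variable, which is consistent with the mixed pattern described in the question.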
Question
I have two models with the same binary dependent variable but different independent variables (IVs). I haven't used all IVs in one model because:
a) for the second set of IVs, data are missing for half the dataset, while for the first set of IVs I have data for the complete dataset
b) the IVs in the two models can be grouped logically as sets of variables measuring verbal and non-verbal communication. As such, using two separate models based on each category of IVs makes sense.
My question is, how do I combine the probabilities for each case based on each of the two models? One solution that I came up with is to take the mean of the two probabilities obtained. So, if one model gives .5 for positive and the second gives .8 for positive, then the mean is .65. A more sophisticated approach would be to use Bayes' theorem to combine classifiers.
and in equation 8 it shows that this is possible. I have two questions:
a) what is a and what value should it take? How do I determine this?
b) Based on my numbers, P(C) = .5 (the occurrence of a positive dependent variable), P(C|D1) = .5, and P(C|D2) = .8, Bayes' theorem, taking a into account, would give me P(C|D1,D2) = .8. Does this make sense?
I am far from an expert on this so if you have any other ideas that may be effective and apply to my problem please do share.
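The numbers in b) can be checked with a short sketch. It assumes that D1 and D2 are conditionally independent given the class (the naive Bayes assumption), in which case P(C|D1,D2) is proportional to P(C|D1)·P(C|D2)/P(C), and it treats a as the constant that makes the probabilities of the two classes sum to 1. Whether that matches the a in the referenced equation 8 is an assumption, and the function name is illustrative.

```python
def combine_posteriors(prior, posteriors):
    """Combine P(C|D_i) from separate models, assuming the D_i are
    conditionally independent given the class (naive Bayes)."""
    positive = prior
    negative = 1 - prior
    for p in posteriors:
        positive *= p / prior
        negative *= (1 - p) / (1 - prior)
    # alpha normalizes the two unnormalized class scores to sum to 1.
    alpha = 1 / (positive + negative)
    return alpha * positive


prior = 0.5
model_outputs = [0.5, 0.8]

combined = combine_posteriors(prior, model_outputs)
averaged = sum(model_outputs) / len(model_outputs)
print(combined, averaged)
```

With P(C) = .5, a model that outputs .5 carries no evidence either way, so under this assumption the combined value equals the other model's .8, whereas the simple mean gives .65.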
Question
I need your assistance and constructive criticism to a) evaluate parts of my method which are correct b) find weak points and improve on them.
I am far from an expert on ERGM and my case is rather "special" because I am dealing with a large network (most examples that I found were dealing with relatively smaller networks).
I have a network of 7 million edges and 5 million nodes. Nodes have several quantitative attributes. My main goal is to find if these attributes influence the probability of tie formation and if people with similar values tend to have a higher probability to form relations.
Since the network is too large, I took a uniform independent sample of 26697 nodes. The sampling method is favored by the literature (see for example http://www.minasgjoka.com/papers/wosn2012-kurant_coarse-topology.pdf). All of their edges, including relations to nodes that were not in the sample, were included. The sampled network had 39983 nodes and 67024 edges. Then I built a couple of models, and I have attached their results in a text file along with the gof of the last one.
I have several questions regarding my results:
1) Do I have to include network metrics (mutual, kstar, etc) if these do not revolve around my hypotheses? Even if I find any results this will probably be irrelevant to the topic that I am working on.
2) I actually did try to build a model for mutual out of curiosity but got back awful diagnostics for mcmc (even with 100,000 sample and 50000 burnin). Instead of normal plots on the right side of the plots printed by mcmc.diagnostics the plots were truly all over the place.
3) The AIC and BIC seem to be quite high compared to other examples. Does it matter? My suspicion is that this is a result of the size of the network.
4) The gof does not seem to fit the data well in several metrics while it is effective in others up to a level. Given the size of the network I am not sure that I will ever get a proper model that would fit the data exactly. Is this however even relevant? Can I still make assertions about my node attributes affecting the probabilities for tie formation?
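For reference, the sampling step described in this question (a uniform node sample plus every edge incident to a sampled node) can be sketched as follows. This is an illustration in plain Python on a toy edge list, not the code used for the 5-million-node network; the function name and the fixed seed are assumptions.

```python
import random


def sample_with_incident_edges(edges, k, seed=0):
    """Uniformly sample k nodes and keep every edge touching one of them."""
    # Simplification: nodes are read off the edge list, so isolates are ignored.
    nodes = sorted({n for e in edges for n in e})
    rng = random.Random(seed)
    sampled = set(rng.sample(nodes, k))
    kept = [(u, v) for u, v in edges if u in sampled or v in sampled]
    # Neighbors outside the sample enter the network through the kept edges.
    reached = sampled | {n for e in kept for n in e}
    return sampled, kept, reached


toy_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (2, 5)]
sampled, kept, reached = sample_with_incident_edges(toy_edges, k=2)
print(len(sampled), len(kept), len(reached))
```

As in the question, where 26697 sampled nodes produced a network of 39983 nodes, the resulting node set is larger than the sample itself because unsampled neighbors are pulled in by the retained edges.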
