
Changing Paradigms of Technical Skills for Data Engineers

  • Regis University, Denver, CO, USA

Abstract

Aim/Purpose: This paper investigates the changing paradigms for the technical skills needed by Data Engineers in 2018.

Background: A decade ago, Data Engineers needed technical skills for Relational Database Management Systems (RDBMS) such as Oracle and Microsoft SQL Server. With the advent of Hadoop and NoSQL databases in recent years, Data Engineers require new skills to support the large distributed datastores (Big Data) that now exist. Job demand for Data Scientists and Data Engineers has increased over the last five years.

Methodology: The study used the Pig programming language, running on MapReduce in the Amazon Web Services (AWS) Cloud. Data were collected from 100 job advertisements during July 2017 and uploaded to the AWS Cloud. Using MapReduce, phrases and words were counted and then sorted. The sorted counts were used to create a list of the top 20 skills needed by a Data Engineer, based on the job advertisements. This list was compared to the top 20 skills for a Data Engineer presented by Stitch, which surveyed 6,500 Data Engineers in 2016.

Contribution: This paper presents a list of the top 20 technical skills required by a Data Engineer.
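The count-and-sort step described in the methodology can be illustrated with a minimal sketch. This is not the study's Pig script; it is a plain Python approximation of the same idea, with a map phase that emits (word, 1) pairs and a reduce phase that sums them. The sample advertisements, the tokenizing pattern, the stopword handling, and the function names `map_phase`, `reduce_phase`, and `top_terms` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample ads; the study's 100 job advertisements are not reproduced here.
SAMPLE_ADS = [
    "Data Engineer with Python, SQL and Hadoop experience",
    "Seeking engineer skilled in SQL, Spark and Python",
    "Hadoop and SQL required; AWS a plus",
]


def map_phase(ad):
    """Emit a (word, 1) pair for each token in one ad, like a MapReduce mapper."""
    # Assumed tokenizer: lowercase runs of letters, digits, '+' and '#'.
    for word in re.findall(r"[a-z0-9+#]+", ad.lower()):
        yield word, 1


def reduce_phase(pairs):
    """Sum the counts for each word, like a MapReduce reducer."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


def top_terms(ads, k=20, stopwords=frozenset()):
    """Return the k most frequent words across all ads as (word, count) pairs."""
    pairs = (pair for ad in ads for pair in map_phase(ad))
    counts = reduce_phase(pairs)
    # Drop filler words that are not skills (assumed step, not stated in the source).
    for word in stopwords:
        counts.pop(word, None)
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    for word, count in top_terms(SAMPLE_ADS, k=5, stopwords={"and", "a", "in", "with"}):
        print(word, count)
```

A single-machine counter like this is enough for 100 short documents; a distributed MapReduce job performs the same map, group, and sum steps across many machines, which matters only at much larger data volumes.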
... Database technologies have received vast attention since the 1960s and remain one of the fastest growing fields in IT (Huang & Leng, 2019; Mason, 2018). Database management is seen as a foundational element of any IT program (Leidig & Salmela, 2021; Topi et al., 2010) and is increasingly included in general business programs, as the inclusion of data management and data analytics skills is recommended in the 2018 Association to Advance Collegiate Schools of Business (AACSB) guidelines (AACSB International, 2018; Larson et al., 2021). ...
The complexity of today’s organizational databases highlights the importance of hard technical skills as well as soft skills, including teamwork, communication, and problem-solving. It follows that a team approach would be useful when teaching students about databases. Team-based learning (TBL) has been developed and tested as an instructional strategy that leverages learning in small groups to increase overall effectiveness. This research studies the impact of team-based learning strategies in an undergraduate Database Management course to determine whether the methodology is effective for student learning of database technology concepts and for preparing students to work in database teams. In this study, a team-based learning strategy was implemented in an undergraduate Database Management course over two semesters. Students were assessed both individually and in teams to see whether they could effectively learn and apply course concepts on their own and in collaboration with their team. Quantitative and qualitative data were collected and analyzed to determine whether the team approach improved learning effectiveness and allowed for soft skills development. The results are compared to previous semesters in which team-based learning was not adopted. Additionally, student perceptions and feedback are captured. This research contributes to the literature on database education and team-based learning and presents a team-based learning process for faculty looking to adopt this methodology in their database courses. It also shows how the collaborative assessment aspect of team-based learning can address the conceptual and collaborative needs of database education.
This paper presents an experience report on teaching Data Engineering as a graduate-level class using a real-world project domain. Traditional computer science database courses focus on relational database theory and typically offer a background in SQL and database implementation. Our course presented databases within the context of Systems and Information Engineering, supplementing traditional relational database theory with a strong sequence of requirements engineering, data design, and analysis. The primary deliverable of the course was a semester-long project to implement an information system in a real-world application domain (that is, with a real, external customer with uncertain requirements in a practical business setting). We believe that the use of such project domains motivates students to apply good Software Engineering principles in the classroom, which in turn encourages them to extend those principles into industrial practice.