The software engineering field is weird. Truly weird. One would certainly expect to hear that, when hiring a nanny, a maid, a cook, a painter, a mechanical engineer, a graphic designer or similar, education is not key, while EXPERIENCE IS. But to hear this when one would like to get hired in the big data engineering field… well, this is not so expected!
… Yet this is right.
Big data engineering is not a job one could do by learning in class or by buying subscriptions to some online classes. Say you do not expect to be a brilliant data engineer and are satisfied with being a lousy one – education alone would still not suffice.
To become a big data engineer, one needs to be passionate about math, problem-solving, numbers, charts, graphs and, certainly, about IT. One becomes a real data engineer only after gaining hands-on experience. This might be the most important difference from other software engineers: programming can be practiced as a beginner’s exercise, while proper data engineering cannot.
…Before we elaborate, let us clarify two related concepts: what ‘big data’ is and what a ‘data analyst’ is (as this sounds pretty similar to a data engineer).
Big data refers to extremely large data sets that are collected by companies, routinely or on purpose, while conducting their business operations. When used correctly, big data can be highly beneficial for organizations in improving efficiency, profitability and scalability. However, a company’s big data is not helpful unless there is a big data engineer to build the systems that collect, maintain and extract the data. Working with the data these systems provide, a data analyst generates insights using various predictive models (some coding experience is recommended for a data analyst, but it is not a sine qua non condition).
Compared to a data analyst, a big data engineer is primarily responsible for building and maintaining the systems and processes that collect and extract data. So one is the miner, the other is the grinder. As simple as that. No wonder that data scientists, machine learning engineers and big data engineers rank among the top emerging jobs on LinkedIn these days, despite the proliferation of AI and the potential threats that some say AI (such as OpenAI’s GPT-4 or Baidu’s Ernie) will bring tomorrow (almost literally, tomorrow).
Some typical job responsibilities of a big data engineer are: creating systems for collecting and processing data; creating data architectures that meet the requirements of the business; running Extract, Transform, Load operations (the so-called ‘ETL process’); creating structured data solutions using various programming languages and tools; and mining data from multiple areas to construct efficient business models.
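For readers who have never seen an ETL process, here is a minimal sketch in Python of what the three steps could look like. Everything in it is an assumption made for illustration: the sample orders data, the `orders` table and the function names are hypothetical, and a real pipeline would read from files, APIs or queues and write to a proper data warehouse rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, used only for illustration.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"], float(row["amount"]))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text=RAW_CSV):
    """Run extract, transform and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline()
    query = (
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    for customer, total in conn.execute(query):
        print(customer, total)
```

Keeping the three steps as separate functions is a deliberate choice in this sketch: each one can be tested or replaced on its own, which is roughly what dedicated ETL tools offer at a much larger scale.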
Performing these responsibilities naturally requires collaboration, most importantly with data scientists. The way these two key functions work together and align towards the same objective can ‘make or break’ the performance of a company dealing with big data. As such, hiring a professional and well-experienced big data engineer is a very important task (please check our regularly updated open positions for such roles here: https://www.vonconsulting.ro/jobs/).
It is not common for big data engineers to possess all of the following skills:
- Computer programming with languages like C++, Java, and Python
- Databases and SQL
- ETL and data warehousing
- ETL tools such as Talend, IBM DataStage, Pentaho, and Informatica
- Operating system knowledge of Unix, Linux, Windows, and Solaris
- Hadoop
- Apache Spark
- Data mining and modeling
- …and some more.
In our more than 20 years of experience, we have learned that it is of utmost importance for a data engineer to have a strong programming background, as well as a love of, or at least an interest in, data and/or finding patterns in data. Work could become boring if these two are missing, as big data projects are on average 10 times more complex than companies’ regular software projects or small data projects.
And as in most software engineering jobs, a data engineer needs to understand the company’s (field of) business well and to keep up to date with its strategy. ‘Where do we want to arrive, by when and with what resources?’ is a question not only for managers, but for most big data engineers as well.
…So it is not easy.
But it is essential and sometimes beautiful.
And it can be summed up as follows: 1. ensure that the data pipeline (the acquisition and processing of data) works; 2. serve the needs of the data scientists and data analysts (the so-called ‘internal customers’); and 3. control the cost of moving and storing data.
…Needless to say, social and communication skills are also (ideally) required for a big data engineer, as they are for any IT engineer. Slowly but surely, the myth that they are not is being dispelled.