As a society whose name and purpose revolve around the field of data science, it seems fitting that we take a step back and examine data science itself. What constitutes this degree and this career? It might be ‘the sexiest job of the 21st century’, but sexy is hardly precise or specific, especially when the subject is an academic field rather than the aesthetic appreciation of the human (or non-human) body.
It’s a common question, but let’s ask it again: what is data science?
Imagine for a moment that you open your browser and type into your search bar the simple words “data science”. The page loads, the results pop up. You click on one site, then another. Before long, the number of open tabs has become unmanageable, and you are slowly coming to the horrific realization that everyone is saying something a little bit different, and a little in contradiction with everyone else.
Data science is, to put it simply, slippery to pin down. Common consensus tells us it is the study of big data, and of predictive as well as analytical statistics. We hear the words ‘machine learning’ thrown around, and we know that it has something to do with modelling and algorithms. We are almost sure that the field intersects with almost every other field, and that any business worth its salt would do well to employ a data scientist. And yet, when questioned on what the study of data science itself constitutes, more often than not our first response is hesitation.
Surrounded on all sides by synonymous fields, it slides between statistics and software, differentiating itself perhaps only by name rather than content. Certainly, the debate as to whether data science is an enlargement of, a subset of, or altogether separate from the field of statistics is heated.
Part of what makes definition so nebulous is that data science is still in an emergent state. It resists definition because it has not been enacted enough to be defined. The sibling term ‘data analysis’ was coined only in 1962, while the term ‘data science’ itself appeared in 1996.
Why is this?
Clearly, data existed before data science, and just as clearly ways to study data have developed throughout the histories of mathematics and science. It is, however, only recently in human history that we have gained the ability to record and draw value from the volume of data that requires, and perhaps defines, data science. Behaviour, practices, everything we do now leaves a mark of data, constructing a picture that requires the interpretation of the data scientist. Arguably more importantly, data has gone from static to responsive. The instant processing of data, and the feedback loop it produces, in which consumers ‘speak’, as it were, to enterprises through data, elevate the importance of data processing and analysis.
But what does it even mean?
Even the most enthusiastic proponents of the term ‘data science’ will concede that it is hazily defined at best. The term is at once specific and broad, and seems to be trying to capture a career emerging in the industry rather than involve itself in the process of its creation. As Harlan Harris writes in “Data Science, Moore’s Law, and Moneyball”: “‘Data Science’ is defined as what ‘Data Scientists’ do.” This tells us effectively nothing, apart from the fact that the doing comes before the defining. In fact, perhaps this whole foray into definition is just a long, rather tedious and purposeless game of catch-up, a linguistic exercise in categorisation that serves to confuse rather than clarify.
But this explanation is hardly satisfactory, especially when we consider that data science is not new, or at least is blurry in the sense of what exactly about it is not borrowed from pre-existing disciplines. If it’s not new, then why the renaming? Are we in fact recursively defining, as some would suggest, the field of statistics with the more attractive, contemporary name of data science? Are we simply putting a sexy sticker on an ancient hood?
Perhaps not, although there is certainly vast overlap between data science and other fields. However, there’s no denying that there is something new in data science, something about the career path that is eclectic and targeted towards specific issues of the age.
One might propose that data science’s newness lies primarily in the broadness of its discipline and the specificity of its subject matter. It is, as UNSW itself proudly presents, ‘interdisciplinary’, combining fields that have traditionally been neighbours into a single entity. It squashes the metaphorical sandwich of business, mathematics and computer science into a flat plane, or a disgusting lunch.
But its subject matter, the study of and extraction of information from data, big data, is more limited than any of these disciplines might cover by themselves. It casts a wide net upon a small boat, being a sort of Frankenstein’s monster pulled from the living bodies of other fields. The originality of the subject lies perhaps in the direction of its purpose and the broadness of its toolset, but the tools themselves, and even the purpose upon deconstruction, are pre-constructed rather than freshly built. Then again, this is not an indictment. In a somewhat apt comparison, as Isaac Newton ‘stood on the shoulders of giants’, data science hinges and grows on the work of its predecessors. The only difference is that Newton was a sequel while data science is more of an offshoot that branched into its own tree.
So what am I meant to take from this?
Rather than tying ourselves in knots over the definition of data science, or falling into poststructuralist anxiety over the relativity of language, perhaps it is better for everyone to simply focus on the practical. Data science, whatever it is defined as, cannot escape being what the data scientist does. Its meaning for each individual will sharpen as they enter into their niche in this confusing, vast field. Besides, isn’t there something magical about it all - that in the practice of data science we are, in fact, constructing it?