Research Plan

How the ways of processing digital data change knowledge landscapes in the West

Abstract

This plan aims to analyze how two poles of data processing, big data and augmented reality (AR), affect cultures and societies in the USA and Europe. Big data refers to very large collections of data from many different categories; current technology can process such data quickly enough to make them accessible to ordinary users, as in Google's search services. Augmented reality is a form of image processing that combines physical and virtual objects, which users can interact with through their mobile phone's camera. The former requires data centers to perform the computation, whereas the latter runs on users' end devices. Their

Past research on how technology has changed daily life has focused on the capabilities it brings, for example how Twitter and Facebook aided communication during the Jasmine Revolution and Occupy Wall Street. The techniques and technical knowledge behind these technologies have largely been ignored in the social sciences and humanities. Yet the ways data are processed are involved in the construction of knowledge, and the computation of huge amounts of data has led to qualitative changes in knowledge. This plan therefore focuses on processes and technologies, combining the perspectives of the humanities and of technical practice to show, from new angles, how technology shapes the new faces of Western cultures.

Keywords: Big data, technology, augmented reality, data processing

Introduction

This plan aims to study data processing and its effects on Western cultures and societies. The motivation for this plan grew out of my PhD project on urban Wi-Fi landscapes. In that project, I collected Wi-Fi information in cities around the world through fieldwork and then converted it into web colour grids to create contemporary urban landscapes. I adopted the identifier code as the classification unit for a house, which is the elemental unit in anthropological studies of ‘house societies.’ In those studies, anthropologists have pointed out that kinship in some societies is constructed by living and sharing food in houses. I appropriated this theory to view Wi-Fi access points as houses that share internet connections, with this sharing confirming kinship. In this house metaphor, Wi-Fi users are hybrids of flesh and Wi-Fi devices, similar to Donna Haraway's ‘cyborg’ and to Niklas Luhmann's ‘posthuman.’ Because different types of Wi-Fi networks were used in the cities studied, the final landscapes displayed a distinct urban personality in each city in terms of Wi-Fi usage.
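The plan does not say how identifier codes were turned into colours. As a purely hypothetical illustration, the sketch below assumes the identifier is an access point's BSSID (its MAC address) and uses the last three octets directly as a #RRGGBB web colour, then lays the colours out row by row. The function names and the mapping are assumptions, not the method used in the PhD project.

```python
HEX_DIGITS = set("0123456789abcdef")


def bssid_to_web_colour(bssid: str) -> str:
    """Map a BSSID such as '00:1A:2B:3C:4D:5E' to '#3c4d5e' (hypothetical scheme)."""
    octets = bssid.strip().lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 or not set(o) <= HEX_DIGITS for o in octets):
        raise ValueError(f"not a MAC address: {bssid!r}")
    # The first three octets are the vendor prefix (OUI); the last three vary per
    # device, so they are used as the red, green and blue channels.
    return "#" + "".join(octets[3:])


def colour_grid(bssids: list[str], columns: int) -> list[list[str]]:
    """Lay the colours out row by row; the last row may be shorter."""
    colours = [bssid_to_web_colour(b) for b in bssids]
    return [colours[i:i + columns] for i in range(0, len(colours), columns)]
```

Using the device-specific octets keeps access points from the same manufacturer from all receiving similar colours; hashing the full identifier would be another reasonable choice.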

One of the main differences between the Wi-Fi landscapes of Taipei and those of London and New York is the visibility of Wi-Fi access points. In Taipei, Wifly's Wi-Fi boxes were attached to street-lamp poles through a collaboration between the government and telecom companies, and Wi-Fi routers were placed in visible locations in stores. In London and New York/Chicago, Wi-Fi marks on signs (a collaboration between The Cloud and the City of London), stickers (Starbucks and Costa Coffee) or blackboards (pubs in London) were almost the only visual traces. The difference involves not only the issue of public and private spheres but also the different ways in which geographical information was stored and processed. In Taipei, physical and visible Wi-Fi machines are the data source, whereas figurative signs and online Wi-Fi maps are the main information source in London and New York. Both the UK and the US cities convert physical data into digital data and manage them without additional visual aids such as boxes. In other words, Wi-Fi access points are infrastructure in Taipei, but in London and New York they are digitized data connected to other data on the internet. This conversion between physical objects and digitized data represents a cognition grounded in culture and society. Besides big data, augmented reality, which developed alongside mobile phones and combines physical surroundings with virtual objects via the camera, also drew me to study data processing as a way to observe changes in the West.

Literature review

There are four aspects from which to observe and analyze how data processing affects America and Europe:

  1. The limit of technology and its development
  2. The economic and research landscapes reshaped by the technology
  3. The gap between computer science and social science
  4. Data visualization in computer science and art

  1. The limit of technology and its development

To understand how data processing affects the West, changes in the technology itself must be considered. The barrier of technical knowledge prevents people outside computer science from understanding and evaluating technological developments properly. A popular keyword is cloud computing, but few people really understand what it means. Cloud computing is composed of computing and storage. Its computation can be run on multiple computers, both physical and virtual, to improve speed. The other component is the database. Most databases adopted in the past 10 years are relational databases, which store data in tables linked by relations; their limits are speed and storage size. Many databases used in cloud computing are instead document-oriented: every item is stored as an independent, self-contained document. Because each document stands on its own, the data can be spread across many machines, and the database can grow with the available storage. Document-oriented data is similar to Gilles Deleuze and Félix Guattari's concept of the rhizome. They adopted this term to describe non-hierarchical data and knowledge and to emphasize multiplicity, as in music having various aspects.
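To make the contrast between relational and document-oriented storage concrete, the following minimal sketch stores the same invented Wi-Fi observations twice: once in two relational tables (using Python's built-in sqlite3) that must be joined, and once as a single self-contained JSON document. The table names, field names and values are hypothetical; no particular cloud database is being modelled.

```python
import json
import sqlite3

# Relational model: fixed columns, related facts split across tables and joined on a key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_points (id INTEGER PRIMARY KEY, city TEXT, ssid TEXT)")
conn.execute(
    "CREATE TABLE observations (ap_id INTEGER REFERENCES access_points(id), signal_dbm INTEGER)"
)
conn.execute("INSERT INTO access_points VALUES (1, 'Taipei', 'Wifly')")
conn.executemany("INSERT INTO observations VALUES (?, ?)", [(1, -60), (1, -72)])
rows = conn.execute(
    "SELECT a.city, a.ssid, o.signal_dbm FROM access_points AS a "
    "JOIN observations AS o ON o.ap_id = a.id ORDER BY o.signal_dbm DESC"
).fetchall()
conn.close()

# Document model: one self-contained record; nested data travels with the item.
document = {
    "city": "Taipei",
    "ssid": "Wifly",
    "observations": [{"signal_dbm": -60}, {"signal_dbm": -72}],
    "tags": ["public"],  # a new field needs no schema change
}
serialised = json.dumps(document)
```

In the relational version, adding a new attribute means altering a table; in the document version, a new field such as `tags` simply travels with the item, which is part of why self-contained documents are easy to distribute across machines.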

  2. The economic and research landscapes reshaped by the technology

Most people may not notice the changes in data storage and processing, but these changes have led to new developments in the computer industry, such as big data centers, and to new ways of conceiving and handling data. Most data centers were built by technology giants such as Apple, Google and Amazon. These centers are distributed across different countries and constitute a special, subtle type of international construction. Traditional global companies located their factories in developing or underdeveloped countries to reduce costs, whereas these new constructions aim to provide fast connections and data backup for global users. They have created a new type of economic landscape. Beyond the industrial aspect, another major effect of modern data processing is that it allows far larger amounts of data to be processed in the humanities and computer science than we are used to. “Digging Into Data” (2013) is a competition supported by the Office of Digital Humanities at the National Endowment for the Humanities in the USA. Its funders come from Canada, the Netherlands, the United Kingdom and the United States. The competition aims to promote how ‘big data’ gradually turns the humanities and social sciences into new research landscapes. The projects in this competition cover different themes, from exploring authorship in visual arts in ‘15th century manuscripts, 17th and 18th century maps, and 19th and 20th century quilts’ (Rehberger 2010) to how railroads shaped modern America via ‘a vast collection of textual, geographical and numerical data to allow for the visual presentation of the railroads’ (Thomas 2010). ‘Big data’ is a hybrid of new technologies for calculating and analyzing huge amounts of data at the same time. For scholars outside computer science, ‘big data’ may seem to be just another database holding huge amounts of data, but big data goes beyond the database: it involves how to arrange data so as to extract information in specific ways.

  3. The gap between computer science and social science

Although the above competition aims to bridge social science and computer science, there is still a gap between the computer sciences and the humanities. In studies of emotion in music, for example, computer science focuses on how emotions can be classified in a specific taxonomy and how related information can be retrieved automatically, creating a universal way to understand music. Zhijun Zhao et al. (2010), for instance, explored how to detect emotions in Chinese and Western music based on timbre, rhythm and pitch. They used four mood categories (anxiousness, depression, contentment and exuberance) to classify the music in their auto-detection system. Although their mood classification system is simplified, traditional structured classification cannot be used to integrate their research into the existing system.
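The kind of fixed taxonomy described above can be illustrated with a four-quadrant scheme. The sketch below is not Zhao et al.'s system: it assumes two hypothetical scores in [-1, 1], an arousal (energy) score and a valence (positive/negative) score, already derived from features such as rhythm, timbre and pitch, and assigns each piece of music to exactly one of the four mood categories.

```python
from enum import Enum


class Mood(Enum):
    EXUBERANCE = "exuberance"    # high energy, positive
    ANXIOUSNESS = "anxiousness"  # high energy, negative
    CONTENTMENT = "contentment"  # low energy, positive
    DEPRESSION = "depression"    # low energy, negative


def classify_mood(arousal: float, valence: float) -> Mood:
    """Assign one fixed mood label from two hypothetical scores in [-1, 1].

    arousal: assumed energy score (e.g. derived from rhythm features)
    valence: assumed positive/negative score (e.g. derived from timbre and pitch)
    """
    if arousal >= 0:
        return Mood.EXUBERANCE if valence >= 0 else Mood.ANXIOUSNESS
    return Mood.CONTENTMENT if valence >= 0 else Mood.DEPRESSION
```

Whatever the features, every piece must fall into exactly one box, which is what makes such a taxonomy universal in aspiration and hard to reconcile with locally defined categories.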

In addition, Benjamin Fields (2011) combined content-based information retrieval, graph networks and social networks to explore how best to generate contextualized playlists for users. His study focused on personal rather than universal music taxonomy. Data from ethnomusicology have also exposed shortcomings in existing music taxonomies. Linton C. Freeman and Alan P. Merriam (1956) considered that “the individuality of esthetic expression as it is shaped by the customs of a particular group is more sharply established.” They computed the frequencies of major seconds, minor thirds and total intervals in 20 Trinidad Rada and Brazilian Ketu songs. After various measurements were weighted via a lambda score, the statistics displayed the differences between the songs of these two areas. They thus attempted to solve the classification challenge through statistics rather than by distinguishing songs via classification terms. Computer scientists attempt to construct a universal classification system via specific terms, whereas anthropologists used a similar statistical approach to find local features in the context of songs. This is not only a debate between quantitative and qualitative approaches but also a consequence of different disciplines seldom knowing each other. The gap will be a noticeable issue in how various disciplines affect one another and themselves.
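Freeman and Merriam's counting step can be sketched as follows, under stated assumptions: melodies are encoded as MIDI note numbers, intervals are measured in semitones between successive notes (ascending and descending treated alike), a major second is 2 semitones and a minor third is 3. The encoding and function names are hypothetical, and their lambda weighting is not reproduced here.

```python
from collections import Counter

MAJOR_SECOND = 2  # semitones
MINOR_THIRD = 3   # semitones


def interval_profile(pitches: list[int]) -> dict:
    """Total intervals and relative frequency of major seconds and minor thirds."""
    intervals = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    if not intervals:
        return {"total": 0, "major_second": 0.0, "minor_third": 0.0}
    counts = Counter(intervals)
    total = len(intervals)
    return {
        "total": total,
        "major_second": counts[MAJOR_SECOND] / total,
        "minor_third": counts[MINOR_THIRD] / total,
    }


def mean_share(songs: list[list[int]], key: str) -> float:
    """Average one interval share over a group of songs (e.g. all Rada songs)."""
    profiles = [interval_profile(song) for song in songs]
    return sum(p[key] for p in profiles) / len(profiles)
```

Comparing `mean_share` for two corpora gives the kind of group-level contrast the study drew, without assigning any song a categorical label.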

  4. Data visualization in computer science and art

Data visualization is important for understanding data processing, and it is a constitutive element of this big data era. Visualization serves to extract information, and it also reflects how data are viewed in different cultures. SIGGRAPH (Special Interest Group on Computer GRAPHics and Interactive Techniques) and its related international events are held annually to present visualization in computer science. They have also added an ‘Art Gallery’ for artworks generated with new technology. This draws art and technology closer together and implies that art is another research path that can open new directions for technology.

Augmented reality (AR) is another type of data visualization. Through the camera lens, AR can combine predefined objects (such as buildings) with specific 2D/3D images, and the two can interact with each other. For example, you can point your smartphone's camera at a table and then play tennis with a virtual ball and net on the screen. Physical objects become data immediately via the camera, and they are processed on users' personal devices. This visualization breaks the boundaries between reality and the virtual and between macro and micro data, creating a new way to understand data and daily life.
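The overlay step in AR can be illustrated with the standard pinhole camera model. The sketch below assumes the device has already estimated its pose (rotation R and translation t) and its camera intrinsics (focal lengths fx, fy and principal point cx, cy, in pixels); real AR frameworks obtain these through tracking, which is not shown. It then projects a virtual ball's 3D position onto the screen. All names and numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float  # focal length in pixels along x
    fy: float  # focal length in pixels along y
    cx: float  # principal point (optical centre) in pixels
    cy: float


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def world_to_camera(p, rotation, translation):
    """Apply the assumed camera pose: p_cam = R p_world + t (R as a 3x3 nested list)."""
    return tuple(
        sum(rotation[i][j] * p[j] for j in range(3)) + translation[i] for i in range(3)
    )


def project(p_cam, k: Intrinsics):
    """Pinhole projection of a camera-space point (z forward) to pixel coordinates."""
    x, y, z = p_cam
    if z <= 0:
        return None  # behind the camera, so nothing is drawn
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy)
```

For example, with `Intrinsics(800, 800, 320, 240)`, the identity rotation and a translation of `(0, 0, 1.0)`, a ball at world position `(0.5, 0.25, 1.0)` lands at camera position `(0.5, 0.25, 2.0)` and is drawn at pixel `(520.0, 340.0)`.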

The above aspects are not independent; they relate to each other. Using all four angles allows us to observe the changes holistically rather than emphasizing one particular subject and ignoring the others.

Methodology

In this plan, I will focus on the development of the following technologies and their effects on academic research and daily applications.

a. Semantics in data processing: Apache Lucene project (http://lucene.apache.org/core/).

b. Searching document-oriented data: Elasticsearch (http://www.elasticsearch.org/).

c. Databases: Redis (http://redis.io/) and MongoDB.

d. Data types: Big Data, Bigtable.

e. Distributed Computing: Hadoop (http://hadoop.apache.org/).
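Two of the listed technologies can be related in a single toy example. Lucene-based search engines such as Elasticsearch rely on an inverted index (a map from each term to the documents containing it), and Hadoop popularized the MapReduce pattern for building such structures across many machines. The sketch below builds an inverted index in the MapReduce style in plain, single-process Python; the function names and the crude tokenizer are assumptions, and none of this is the Lucene, Elasticsearch or Hadoop API.

```python
import re
from collections import defaultdict
from itertools import chain


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens; a crude stand-in for a real text analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id: int, text: str) -> list[tuple[str, int]]:
    """Emit one (term, doc_id) pair per distinct term in a document."""
    return [(term, doc_id) for term in set(tokenize(text))]


def shuffle(pairs) -> dict[str, list[int]]:
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term: str, doc_ids: list[int]) -> tuple[str, list[int]]:
    """Produce a sorted postings list for one term."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(term, ids) for term, ids in shuffle(mapped).items())


def search(index: dict[str, list[int]], query: str) -> list[int]:
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [set(index.get(t, [])) for t in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []
```

For example, indexing `{1: "Wi-Fi in Taipei", 2: "Wi-Fi maps of London", 3: "London pubs"}` gives the postings list `[2, 3]` for `london`, and the query `"wi-fi london"` returns `[2]`. In a real cluster the map and reduce calls would run on different machines, which is what lets the index grow with the data.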

Conclusion

Data processing affects not only research landscapes but also daily life, for example through Google Search. Well-known Google services are composed of different types of databases, and the differences are difficult for outsiders to this field to understand. But it is important for us to know how data are obtained and stored, how information is extracted and how knowledge is constructed by these tools.

If we do not understand the values of these techniques, we cannot evaluate them correctly. It would be dangerous to treat them in the same way as traditional subjects in the history of technology. The impact of past technologies could be seen and felt through our bodies, such as trains transporting us to different places and printing allowing information to be reproduced easily. Data processing is hidden behind user interfaces, and we receive its results as if reading man-made reports, yet it deeply involves ontology, epistemology, ecology and economics. This plan aims to bridge the gap between computer science and the humanities through studies of the four aspects mentioned above and to explore how data processing has shaped contemporary Western societies.

Bibliography

Deleuze, Gilles and Félix Guattari. (1980). A Thousand Plateaus. Brian Massumi (trans.). London and New York: Continuum.

Office of Digital Humanities. (2013). Digging into Data Challenge. http://www.diggingintodata.org.

Fields, Benjamin. (2011). Contextualize your listening: The playlist as recommendation engine. PhD thesis. Goldsmiths, University of London.

Freeman, Linton C. and Alan P. Merriam. (1956). Statistical classification in anthropology: An application to ethnomusicology. American Anthropologist, New Series, 58(3) (June 1956):464-472.

Rehberger, Dean. (2010). Digging into Image Data to Answer Authorship Related Questions. https://securegrants.neh.gov/PublicQuery/main.aspx?f=1&gn=HJ-50001-10.

Scaringella, Nicolas, Giorgio Zoia, and Daniel J. Mlynek. (2006). Automatic genre classification of music content: A survey. IEEE Signal Processing Magazine, 23(2).

Thomas III, William G. (2010). Railroads and the Making of Modern America -- Tools for Spatio-Temporal Correlation, Analysis, and Visualization. https://securegrants.neh.gov/PublicQuery/main.aspx?f=1&gn=HJ-50028-10.

Zhao, Zhijun, Lingyun Xie, Jing Liu, and Wen Wu. (2010). The analysis of mood taxonomy comparison between Chinese and Western music. 2010 2nd International Conference on Signal Processing Systems (ICSPS), 1:606-610.