Two news articles published during the last few weeks have attracted my attention. Although neither mentions edumining or research on logged online activity at all, they immediately made me wonder about several aspects of our research. Let me summarize the two stories:
- Speed-tracking technology using existing cellular broadcasts. “Or Yarok” (Hebrew for “green light”), an Israeli non-profit road safety organization, promotes a revolutionary technology: using existing cellular broadcasts to measure drivers’ speed. The application is quite simple to understand. Cellular broadcasts can be gathered and used to determine a car’s position at certain times. Given the coordinates and timestamps of a car traveling between them, and knowing the physical layout of the road, its velocity can be calculated. This way, the organization’s researchers can measure average driving speeds for each road (for which data is available) and may, for example, correlate them with the number of accidents that occurred on that road.
- Restaurant of the Future knows exactly what, when and how (but not why) you eat and drink. This fascinating project, run by Wageningen University (Netherlands), is probably one of a kind (not counting The Truman Show). The project description, quoted from the project’s homepage, is as simple as that: “The Restaurant of the Future is not just a place to experiment with new food products, preparation methods and self-service systems, but also a facility allowing close observation of consumer eating and drinking behavior”. In case that is not clear, “the restaurant” tracks each and every movement (of both food and people) within it. This complicated research facility, as summarized in the article published in the New York Times, aims to answer a simple yet complicated question: what makes people eat and drink the way they do?
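The velocity calculation in the first story can be sketched in a few lines. This is only a rough illustration, not the organization's actual method: it assumes each position fix is a hypothetical `(timestamp_seconds, latitude, longitude)` tuple, and it approximates the shape of the road by straight great-circle segments between consecutive fixes (the function names are mine).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def average_speed_kmh(fixes):
    """Average speed over a trip.

    fixes: list of (timestamp_seconds, lat, lon) tuples, sorted by time.
    Assumption: the road between two fixes is close to a straight segment.
    """
    if len(fixes) < 2:
        raise ValueError("need at least two position fixes")
    # Sum the segment lengths between consecutive fixes.
    distance_km = sum(
        haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:])
    )
    elapsed_h = (fixes[-1][0] - fixes[0][0]) / 3600.0
    if elapsed_h <= 0:
        raise ValueError("timestamps must be increasing")
    return distance_km / elapsed_h
```

The denser the fixes, the better the straight-segment approximation follows the real road; with sparse fixes on a winding road, this sketch would underestimate the distance and hence the speed.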
One main difference distinguishes these two studies, and it is because of this difference that the first drew many aggressive talkbacks in the online version of the newspaper, while the second was treated as an anecdote: every visitor to the Restaurant of the Future is a research subject by agreement, whereas none of the drivers whose cellular broadcasts were analyzed knew about any use of their (call-independent) cellular activity.
* * *
Before inferring anything about this blog’s domain, let me first examine some more similarities and differences between these two studies:
Data collected is “anonymous”. Clearly, data regarding the same person should be recognized as such, so individual records cannot each be labeled with an unrelated random number. However, there is no need to link the data to a “true” person: it is irrelevant who was driving the car (or even which car it was) or who is eating the sushi. Any identifying parameters may be replaced by random identifiers (keeping in mind that this replacement should be injective, at least within each “session”).
Research costs differ considerably. The Wageningen sushi restaurant project is estimated at 2.3 million euros (according to this report). Clearly, the cost of the driving-speed research is much lower. The main cost gap is, obviously, in the data-collection mechanisms.
Scaling. Both the restaurant and the roads studied have physical capacity limitations, being serial in nature (each coordinate in every lane of the road can “carry” at most one car at a given time; the number of people dining at the restaurant at any given moment is, of course, limited by the number of chairs), and therefore the size of the research population is, theoretically, bounded by a number known in advance.
Research questions may be asked in retrospect. In both cases, the data is collected at the most atomic level the research designers could manage, allowing them to ask later many questions they didn’t think of beforehand.
Nature of data collected. The car research uses very basic information, each row of which documents (so I guess) mainly place and time indicators. The restaurant research uses much more complicated data (I imagine, e.g., 3-D location, weight, facial expression, table/chair configuration).
Continuity of data collection. In both cases, data is collected continuously, but in contrast to the roads, which are available 24/7, the restaurant has specific opening hours, outside of which no data is collected.
Multidisciplinarity. The restaurant research is truly multidisciplinary (e.g., psychology, anatomy, computer science, the culinary arts). The velocity research is quite straightforward.
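To make the anonymity point above a bit more concrete, here is a minimal sketch of replacing identifying parameters with random identifiers while keeping the replacement injective within a session. Everything in it is an assumption for illustration: the `Pseudonymizer` class, its method name and the 16-character token length are mine, and neither study is known to work this way.

```python
import secrets


class Pseudonymizer:
    """Maps real identifiers to random tokens, consistently within one session.

    The same real identifier always gets the same token, and two different
    identifiers never share a token (the mapping is injective).
    """

    def __init__(self):
        self._forward = {}  # real identifier -> random token
        self._used = set()  # tokens already handed out

    def pseudonym(self, real_id):
        if real_id not in self._forward:
            token = secrets.token_hex(8)  # 16 hex characters
            # Extremely unlikely, but re-draw on collision to stay injective.
            while token in self._used:
                token = secrets.token_hex(8)
            self._forward[real_id] = token
            self._used.add(token)
        return self._forward[real_id]
```

Creating a fresh `Pseudonymizer` per session, and discarding it afterwards, means records can be grouped by person within the session but cannot be traced back to the original identifier from the token alone.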
* * *
I must admit now that I really intended to draw conclusions about edumining from these two totally-offline, totally-not-educational studies, but after revisiting the points I’ve just listed, it seems quite unnecessary. Those topics of comparison (and maybe some more) apply to each and every one of our studies as well. Although I wanted this post to discuss the great benefits we gain from using Web mining techniques in education compared to similar research in the “real world”, I now find myself, having finished it, quite confused. It seems that research is research is research, no matter what the subject matter is, which population is investigated and which methods are used.
Each topic mentioned above regarding the velocity/eating research might also be (and I’m sure most of the time indeed is) kept in mind when planning an edumining study. There is no single set of answers relevant to all edumining studies: some are anonymous by the nature of the data and some use data that should be anonymized (is there such a word?!); some logs are cheap to collect, but some actions might be expensive to track; some Web-based learning environments are parallel for all users (e.g., a fully online course), some are quasi-serial (e.g., Wiki platforms); often research questions may be asked even after the data was collected; there are huge differences in the complexity of the logged data, and it is sometimes tailored to the research purposes; some online educational media are open to the public 24/7, but some are limited by time, space (e.g., by IP) or identity (e.g., for current students only); and although edumining is multidisciplinary by nature, some research within it uses only a small portion of the big package.
So, is edumining special in any manner? I guess there is a short answer and a long answer to that question. The short one is: “Yes!”; the longer one: “Well, it’s hard to explain in only a few words at the bottom of a long post; better to dedicate a separate post to it”.