Latest Publications

Social Networking Might Reveal Your Hidden Secrets

Many data mining applications deal with the very nature of data (including meta-data) about objects. This is, of course, true also for educational data mining. If we browse through EDM’09 proceedings, we can easily find that most of the articles are indeed of this type, where the main subjects are, in most cases, students. Here are just a few examples for the data researched:

  • Student’s skill knowledge
  • Student’s choice to go off-task
  • Symptoms of low performance [of students]
  • Students’ drop out
  • Students’ knowledge and learning
  • Students’ pace
  • Students’ consistency
  • Students’ mental models

In all of these examples, not only was the measure discussed a measure of students, but the data itself was built from data about students. Of course, it is not possible to predict students’ behavior without researching students’ behavior, but are these measures enough? This question is interesting because some learning configurations and learning environments enable us to learn more about students not only from their own behavior, but also from the social network underlying their collaboration or mutual actions. Two different works of that type are worth mentioning.

MIT Gaydar. Although not (yet?) published in a scientific journal, a student project at MIT has suggested a very interesting, not to say revolutionary, result: Information about Facebook users and their friends in this social network might reveal sexual orientation, and in particular might point out gay men, even when they have not indicated this fact in their profile. One may argue that many methodological details should be improved; however, the idea and its implementation are definitely intriguing (and, of course, raise a lot of ethical questions).

Discovering missing links in Wikipedia. Wikipedia, and wiki-based applications in general, have been fertile ground for dozens, even hundreds, of studies from all kinds of points of view, including the educational angle. This particular research (Adafre & Rijke, 2005) uses similarity between Wikipedia pages – i.e., finding clusters of pages by their content – to discover links that are missing from a given page (according to the links in its cluster members). In other words: Data about a page’s “close friends” reveals some important hidden information about the page itself.

Attempts have already been made to understand students’ collaboration – e.g., in (Talavera & Gaudioso, 2004; Kay, Maisonneuve, Yacef & Zaiane, 2006) – however, it seems that mining social networks is somewhat different. If we borrow ideas similar to those presented in the Gaydar and Wikipedia examples above, we might think of a few research directions using data mining methodologies for studying social networks in the learning/teaching context:

  • Predicting students’ success/failure by analyzing their online collaborators’ grades in a wiki-based learning environment;
  • Developing a homework recommendation system based on what your Twitter followers tweeted;
  • Updating a student model according to the student’s friends’ models.

These are, of course, only a few provocative(?) imaginary examples. It is clear that as we enrich our sources of information, research and its applications would only benefit. However, we should consider that not only direct data about the students may reveal important information about them, but also that indirect data may lead to some very direct conclusions. As the old saying states: “Tell me who your friends are, and I’ll tell you who you are.”
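To make the last of these imaginary directions a bit more tangible, here is a minimal Python sketch of updating a student model according to the friends’ models. Everything in it is an assumption made for illustration: a student model is taken to be a dictionary of skill-mastery estimates between 0 and 1, and the update simply pulls each estimate toward the friends’ average by a fixed weight. Nothing here is an established method, and whether such an update is valid is exactly the open research question.

```python
def blend_with_friends(own, friends, weight=0.25):
    """Pull each skill estimate toward the mean of the friends' estimates.

    own:     dict mapping skill -> mastery estimate in [0, 1]
    friends: list of dicts in the same format (skills may be missing)
    weight:  how strongly the friends' mean influences the result
             (hypothetical parameter; 0 ignores friends entirely)
    """
    updated = {}
    for skill, value in own.items():
        peer_values = [model[skill] for model in friends if skill in model]
        if not peer_values:
            # No friend has data on this skill: keep the student's own estimate.
            updated[skill] = value
            continue
        peer_mean = sum(peer_values) / len(peer_values)
        updated[skill] = (1 - weight) * value + weight * peer_mean
    return updated


# Invented example values.
student = {"fractions": 0.9, "lcm": 0.4}
friends = [{"fractions": 0.7, "lcm": 0.8}, {"lcm": 0.6}]
print(blend_with_friends(student, friends))
```

With these numbers the “lcm” estimate rises from 0.4 toward the friends’ mean of 0.7, while “fractions” drops slightly toward 0.7.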


Model Before Mine? Text Visualization of EDM Proceedings

EDM’2009 is already behind us, but the memories from lovely Cordoba (Spain) and from the well-organized and fascinating conference are still with me… I’ve met a lot of very nice people, met again very nice colleagues, heard many interesting lectures, and… ate a lot of good Spanish food.

The conference proceedings are online, and this is a great opportunity to try again the cool online (and free) tool for text visualization, Wordle. All I did was a simple Copy&Paste of the full proceedings file with its 146,506 words (according to MS Word count), and then I played a bit with the layout, color and font options. The resulting visualization – in which a word’s size corresponds to its frequency – is very interesting. Click on the thumbnail below to see it in full.

Wordle: EDM'2009

The next step was quite obvious: Repeating this visualization with EDM’2008 proceedings (123,672 words, MS Word count)… Here is the result (click on the thumbnail to see a large version):

Wordle: EDM'2008

The next and last step was to check the most common words, so for each of the proceedings I visualized the top 5 words. Here is the result (click to enlarge):

EDM’2008 Top 5 Words:
Wordle: EDM'2008 Top 5

EDM’2009 Top 5 Words:
Wordle: EDM'2009 Top 5

The only-one-word difference between these two statistics might suggest that last year we were focused on the models, while now we’re focused on the mining. Is it really so? An intensive qualitative/quantitative study (using Data Mining?) is needed to answer this question…
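For anyone who wants the numbers behind such a word cloud, the frequency count can be sketched in a few lines of Python. This is only an approximation under stated assumptions: Wordle’s own tokenization and common-word filtering may differ, the tiny stop-word list below is invented for the example, and the sample sentence stands in for the pasted proceedings text.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption, not Wordle's list).
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "on", "with", "we", "that"}


def top_words(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(n)


# Stand-in text; in practice this would be the full proceedings file.
sample = "Data mining of student data: the student model and the data."
print(top_words(sample, 2))  # [('data', 3), ('student', 2)]
```

Running top_words on the text of each year’s proceedings and comparing the two lists would reproduce the top-5 comparison above.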


Special Issue on Web Mining and Higher Education

The Internet and Higher Education: Special Issue on Web Mining and Higher Education, Call for Papers

Guest Editor: Rafi Nachmias, Tel Aviv University

Click here to the journal homepage

The purpose of this special issue is to promote the understanding of Web Mining as a novel and useful research methodology for investigating aspects related to the various usages of the Internet in higher education. Web Mining (or Web Data Mining) is the application of Data Mining tools and techniques to discover novel and potentially useful information from data drawn from the Web.

Traditionally, Web Mining consists of three types: Usage Mining, Content Mining, and Structure Mining, each of which uses different data: log files describing activity within Web pages, text from Web pages, and information about the connectivity of Web pages, respectively. As a result, many different levels of e-learning might be researched, allowing diverse points of view for instructors, researchers, curriculum developers, learning environment developers, and policy makers.

We encourage submissions of empirical and conceptual articles which address (but are not limited to) the following topics:

  • Assessing online students’ behavior throughout the learning process
  • Collaborative learning investigation
  • Cost-effectiveness of Web-supported learning
  • Measuring affective aspects of learning

Important Dates (Tentative):

  • Submission deadline: 10 January 2010
  • Authors’ notification: May 2010
  • Final papers submission: June 2010
  • Special Issue publication: January 2011

Life Signal

Well, it’s been a while since the last post… So, this is just a life signal. There are a lot of things to tell, and only little time to tell them. Meanwhile, the to-post list is getting longer and longer. And, meanwhile, EDM’09 (The Second International Conference on Educational Data Mining) submission deadline is almost behind us already, and plans for visiting Cordoba are starting to be concrete. And since this is only a life signal, it can and should be short…


JEDM: New Journal is Coming!

(It’s been a busy-busy period, I really hope to write more frequently…)

JEDM – Journal of Educational Data Mining – is on air. The first content should be published in April 2009, but the excitement is already all over (or is it just me?! ;-) ).

The journal will be published online and free of charge, and its Website is already waiting for the new contents. I was honored to be offered a role in the creation of this new “baby”, and I truly hope I’ll have a lot of stuff to take care of…

Here is the Call for Papers for the new journal. I see its great potential in becoming the main stage for presenting EDM studies and EDM-related discussions.


The Chrome Rush

Less than two weeks have passed since the official launch of the new Chrome browser by Google, but it seems like it was ages ago. The three-color circle-shaped icon is all over. is currently giving more than 53,600 results when searching “Chrome”; “google chrome” is returning more than 9 million results when searching in; and even this blog’s statistics already show the new colored icon.

Google, however, tries not only to improve the surfing-the-Internet experience, but also to enhance the company’s learning about surfing the Internet. Chrome will give Google the chance to understand not only what people search for, but also how they search for it. This is done by logging keystrokes from the Omnibox (the URL box, which is not only for URLs…). According to Google, about 2% of these keystrokes will be stored along with the IP address of the computer on which Chrome is installed. Just to be clear: These are not only the finalized search strings, but every keystroke on the way, including those which are never sent to the search engine.

Two major questions arise:

  1. What will the good people of Google do with this huge amount of data? Research-wise, I can instantly think of at least 10 ways of analyzing this data for getting insights about the users; plenty of data mining techniques might be applied for that matter. However, I have no idea as for Google’s intentions.
  2. Is it legal? Well, the answer here is quite simple: Yup! Every single user who has installed Chrome has accepted the End User License Agreement, in which all kinds of strange “agreements” between the user and the company may be found.

So, Google simply has the authority to do whatever it wants with the data of the company’s products, and the users accept it. Sounds like the wet dream of every data miner, doesn’t it? :-)


The Big Brother-Researcher

A short article in last weekend’s edition of the Israeli daily newspaper Haaretz was entitled “The Big Sisters”. The article mentions two complaints of an Israeli customer against two telephone companies (in Hebrew, the word “company” is feminine, which is why they were referred to as “sisters” and not “brothers”); the two cases have nothing to do with each other, and only by chance did they happen to the same woman:

  1. An Israeli student got a few calls to her cellular from a Jordan-based colleague, while the latter was travelling to Jerusalem through the West Bank; a few hours later, the cellular company representative called the Israeli’s mother (who’s the registrant of the phone), trying to understand her relation to the Occupied Territories.
  2. A representative of an Israeli International Calls Provider called that very same mother and offered her a special discount plan for Vienna and New York; the reason for this special offer (as the representative told her): that lady had increasingly dialed numbers in these destinations during the preceding weeks.

We might say that these two companies used the technology wisely, and for the benefit of their customers (and this is clear from their response to the journalist): In the first case, the company just made sure the phone had not been stolen from its customer; in the second case, the customer was offered a plan to reduce the amount of money she pays to the company.

Well, what’s wrong with that? It is the customer’s reaction to the two company-initiated calls: I’ve been tracked!

Now, let’s think about a totally different scenario, in which a student is using the imaginary ICanLearnAnyWhere Ubiquitous GPS-enabled Learning System. During his ubiquitous History class while travelling around the world, this student gets a pop-up message saying: “Welcome to Paris! Do you want to read about the history of the Eiffel Tower?” What will this student feel at that very moment?

And, let’s think about a second imaginary scenario: After extensively using a High-School Mathematics VITS (Very Intelligent Tutoring System) for two weeks, one student gets the following message from the system: “Hey, we see that you are very good at Fractions, however you probably need some more practice in finding the LCM; do you want to review the basics of finding the LCM?”

What will those students feel? Will they feel like they’re truly being helped by the advanced technology, or will they feel that they are being tracked? Will the answer to this question be different if a short notice is presented at the beginning of the ubiquitous/VITS course, saying: “Your actions are being tracked in order to help you use the system and earn a higher score”? Or maybe, on the other hand, such a notice will do more harm and will increase the dropout rate?

Above all, and since we know that these scenarios – both in commerce and learning – are everyday practice already, the big question is this: What is the golden path between privacy and data mining?


EDM’08 Aftermath

The heavy white book is already on my shelf, and now citations are possible. However, EDM’08 is, of course, much more than “(Baker, Barnes, & Beck, 2008)”. The wonderful two days of EDM’08 are behind us, and this is the time to do the aftermath.

First of all, cheers to the initiators and the organizers for a very-well-done conference (for the fillet we had in the first night’s banquet, “medium” was the perfect measure, however for a conference, “very-well-done” is indeed a compliment :-) ). Interesting studies, good presentations, nice people, well-served food, beautiful location, and above all – the atmosphere, oh, the atmosphere. Something new is being formed, and we are there, at the very first moments of this creation.

For now, I’d like to focus on what I may title: Bridges Over Interdisciplinarity. As far as I see it, this (i.e., the lack of them) is the main problem of our emerging domain.

The very nature of EDM is to be interdisciplinary. Just like “Bioinformatics”, “Computational Statistics”, “Music Information Retrieval”, or “Econophysics” – the name of our research area defines the two extremely different fields that (together with some others) should synergistically form a new one. However, it seems that “Education” and “Data Mining” are not yet equal in this merging equation, and that the latter is much more dominant than expected.

If EDM’08 was a reflection of the current world-wide EDM research (and I believe it was indeed), then it seems that it may be metaphorised as being built of two isolated islands: The “Education Island” (EI) and the “Data Mining Island” (DMI), and that some tour-ferries often depart from DMI towards EI, but always – as ferries do – get back home before sunset. Having two distinct islands is not the problem; the transportation between them is the main issue. Instead of the current situation, I’d prefer to see a wide, steady bridge being built, enabling a full-speed road between the islands.

Most of the questions asked in the CFP were of that highway-between-islands type, e.g.:

  • Can we use our discoveries to improve the software’s effectiveness?
  • Student learning data provides a powerful mechanism for determining which teaching actions are successful. How can we best use such data?
  • Can we use existing educational and psychological knowledge to better focus our search?

However, in my opinion, many of the conference articles do not discuss such in-between questions, but rather take only one side, choosing to settle on only one of the islands.

If EDM’08 is the mirror (and, as I mentioned before, I think it is indeed) – we are the ones who should carefully look at it. We should understand the current situation and ask ourselves if this is the desired one. Possible answers are mainly: “Sure!”, “Nay!”, “Don’t know”; and no matter what your own answer is, talking about it might help in laying the foundations of the desired bridge. Furthermore, it might help in understanding whether such a bridge is needed at all.

* * *

EDM’08 is behind us. EDM’09 is already “in the oven” (July 1-3, Cordoba, Spain). It took them only about 2 years to build the largest bridge in the world (Burapha Withi Expressway, Bangkok, Thailand). Will we finish ours by EDM’10?


See You in Montreal…

In a few hours from now I’ll leave Israel for a family trip before EDM’08. My poster is ready and will be carried over miles and miles of beautiful New England roads.

See you in Montreal on June 20-21, 2008.


At Least One Tenth of Top Young Trainers Believe in Data Mining

Training Magazine recently published a list of 40 of the training industry’s rising young stars (age 40 and younger). This is the “first annual list”, i.e. the beginning of a tradition. Candidates were nominated by industry peers or self-nominated, and the finalists were chosen by a panel comprising members of Training’s Editorial Advisory Board.

Among the Top Ten, one can find Gideon Zailer, CEO and founder of e-learning knowledge solutions of Israel. Besides being a very nice person, Gideon is very interested in data mining in the context of improving the software his company develops.

So, we can surely say that EDM is well established within the agenda of at least a tenth of the Top Young Trainers world-wide. If this is also the case for the education researchers, EDM2008 will be very crowded…
