Model Before Mine? Text Visualization of EDM Proceedings
EDM’2009 is already behind us, but the memories from lovely Cordoba (Spain) and from the well-organized and fascinating conference are still with me… I met a lot of very nice people, caught up with very nice colleagues, heard many interesting lectures, and… ate a lot of good Spanish food.
The conference proceedings are online, and this is a great opportunity to try again the cool online (and free) tool for text visualization, Wordle. All I did was a simple Copy&Paste of the full proceedings file, with its 146,506 words (according to the MS Word count), and then I played a bit with the layout, color and font options. The resulting visualization, in which a word’s size corresponds to its frequency, is very interesting. Click on the thumbnail below to see it in full.
The next step was quite obvious: repeating this visualization with the EDM’2008 proceedings (123,672 words, MS Word count)… Here is the result (click on the thumbnail to see a large version):
The next and last step was to check the most common words, so for each of the proceedings I visualized the top 5 words. Here is the result (click to enlarge):
EDM’2008 Top 5 Words:
The one-word difference between these two top-5 lists might suggest that last year we were focused on the models, while now we’re focused on the mining. Is it really so? Intensive qualitative/quantitative research (using Data Mining?) is needed to answer this question…
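For anyone who would rather count than eyeball word sizes, the top-5 comparison can be roughly reproduced in a few lines of Python. This is only a minimal sketch and not what Wordle does internally: the tiny stop-word list and the function names `top_words` and `top_word_difference` are my own assumptions for illustration, and the counts will depend on how the text is split into words.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; Wordle's own list
# of common words is not reproduced here).
STOP_WORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "for",
    "on", "we", "that", "with", "are", "as",
}


def top_words(text, n=5):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    # Lowercase the text and keep runs of letters only.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


def top_word_difference(text_a, text_b, n=5):
    """Return the words that appear in only one of the two top-n lists."""
    top_a = {word for word, _ in top_words(text_a, n)}
    top_b = {word for word, _ in top_words(text_b, n)}
    return top_a - top_b, top_b - top_a
```

Feeding the plain text of the two proceedings into `top_word_difference` would return the words unique to each year's top 5, which is the comparison made above.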