The International Conference on Machine Learning (ICML) is considered the number one conference in its field. This year more than 1500 people from all over the world met in Lille, France to participate in the 32nd ICML event. In simple terms, machine learning is about algorithms that learn from data to predict future behavior and observations.
Well, you could ask yourself: how can this be possible? It works because the observed data is correlated with the variables that control a phenomenon. For example, there is a high correlation, and a causal link, between weather conditions and the number of people riding their bicycles to work. As a Data Mining specialist, I couldn't miss this great event, and I also brought back some thoughts for you ...
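To make the bicycle example a bit more concrete, here is a minimal Python sketch of "learning from data": it measures the correlation between temperature and the number of riders, fits a straight line, and uses that line to predict. The numbers are made up purely for illustration (they are not from the conference or from any real city), and the function names are my own.

```python
import math

# Made-up illustrative observations, NOT real data:
# daily temperature in degrees Celsius and number of bicycle commuters counted.
temperatures = [5, 10, 15, 20, 25]
riders = [120, 180, 260, 310, 400]


def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


slope, intercept = fit_line(temperatures, riders)
correlation = pearson_correlation(temperatures, riders)


def predict_riders(temperature):
    """Predict the number of riders for a temperature not seen before."""
    return slope * temperature + intercept


print(f"correlation: {correlation:.3f}")
print(f"predicted riders at 30 degrees: {predict_riders(30):.0f}")
```

Real urban data is of course far noisier than this toy example, and a high correlation alone does not prove causality.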
The ICML conference covers a wide spectrum of machine learning topics and workshops. I joined the Mining Urban Data workshop (MUD), which I was really looking forward to. This workshop was being held for the second time to discuss different machine learning and data mining approaches that might help to increase the quality of urban services.
In recent years, sensor hardware has been getting cheaper and cheaper, and this has enabled the collection of huge amounts of data related to the mobility of people, public transportation, and the environment and ecology. Since urban data is normally noisy and lacks proper labeling information, typical machine learning approaches face many problems when dealing with such data; addressing these challenges was the focus of the workshop.
Keynotes - Where the inspiration comes from: Using Mobile Data to Predict User Interaction with a Certain Event in a City.
One of the most interesting keynote talks was from Eleni Pratsini (IBM), in which she described a study on using mobile data to predict user interaction with a particular event in a city.
Although it is a nice idea, a couple of valid questions were discussed after the talk. The first one relates to privacy. To sidestep this problem, companies try to collect mobile data from African and Asian countries, mainly China, where privacy is not as hotly debated as it is in Europe.
The second question is how to get this kind of data. Currently, only big companies like IBM and Google can access and make use of this data, leaving fewer opportunities for research institutions and small to mid-sized companies.
Another interesting talk was given by Boris Chidlovskii (XEROX) on improving trip planning using the history of traveler choices. He pointed out real use cases from Nancy and Adelaide, where there is a big gap between what trip planners suggest and the travel preferences or actual choices made by the users.
The major problem in adopting their solution is the lack of mobility data. In addition to this, public transportation companies normally do not know the ultimate destination of their users. A user in Germany, for example, buys a ticket specifying the source station and the number of zones to travel. As a result, there is no way to know which buses or trains a user took to reach their destination.
There was also an interesting project presented by Indre Zliobaite and colleagues from Aalto University. She showed the relationship between public transportation infrastructure and house prices in the Helsinki region. Part of her talk was a demo related to public transportation isochrones. This is a map on which areas with similar travel times from your current location are grouped through color coding.
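To give an idea of how such isochrones can be computed, here is a minimal Python sketch. It is not the method used in the demo; it simply runs Dijkstra's shortest-path algorithm on a small, hypothetical network of stops (names and travel times in minutes are invented) and then groups the stops into 10-minute bands, which a map would then draw in different colors.

```python
import heapq

# Hypothetical network: stop -> {neighboring stop: travel time in minutes}.
# Stop names and times are invented for illustration only.
network = {
    "home": {"A": 5, "B": 12},
    "A": {"home": 5, "B": 4, "C": 7},
    "B": {"home": 12, "A": 4, "D": 10},
    "C": {"A": 7, "D": 3},
    "D": {"B": 10, "C": 3},
}


def travel_times(graph, source):
    """Shortest travel time from source to every reachable stop (Dijkstra)."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        elapsed, stop = heapq.heappop(queue)
        if elapsed > best.get(stop, float("inf")):
            continue  # stale queue entry, a faster route was already found
        for neighbor, minutes in graph[stop].items():
            candidate = elapsed + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


def isochrone_bands(times, band_minutes=10):
    """Group stops by travel-time band: band 0 is [0, 10) minutes, and so on."""
    bands = {}
    for stop, minutes in times.items():
        bands.setdefault(int(minutes // band_minutes), []).append(stop)
    return bands


times = travel_times(network, "home")
bands = isochrone_bands(times)
for band in sorted(bands):
    print(f"{band * 10}-{band * 10 + 10} min: {sorted(bands[band])}")
```

A real isochrone map would also have to take timetables, waiting times, and walking distances into account; this sketch assumes fixed travel times between stops.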
Finally, I would like to point to a great experience presented by Joelle Pineau from McGill University. Basically, the city of Montreal published a huge amount of data. This includes data related to infrastructure, housing, urban planning, economy, human resources and many more.
Joelle took this open data and created a workshop with students attending the machine learning course. The main idea was to let the students come up with new ideas to link machine learning with the open data of the city of Montreal. If you are interested in the background of this, there is a paper from this project titled Analyzing Open Data from the City of Montreal.
Personally, I was impressed by the wide variety of ideas proposed in the workshop, including real estate analysis, transportation, predicting the future of the city, food safety, and many more; a complete list can be found in their paper. This also encouraged us at moovel lab to get in touch with universities interested in Data Mining to create a workshop similar to the Speculative Design Workshop we held this spring, but with Data Mining students.
Recently, interest in data analysis has been growing, driven by the field of smart cities. However, there are still a few problems.
First, there is a big gap between public transportation companies and research institutions. The issue is that such companies do not share their data for research purposes, in order to avoid privacy issues. On the other hand, big companies like IBM and Google invest and hire to build predictive mobility services from data. However, most public transportation authorities still do not see the advantages of this type of research.
Second, there is not much research in the fields of data mining and machine learning that targets smart cities. We believe that the more public data is made available by authorities and transportation companies, the more people will be interested in studying these areas.
Third, we notice that more and more cities in Europe are opening their data to the public. However, many cities, especially in Germany, still hesitate to do this.
In one sentence: Privacy issues are still a bottleneck in bridging the gap between the world of machine learning and the world of urban mobility and smart cities.