AI – An integral part of our daily life
Synthetic data, AutoML method, SAME framework / REINFORCE AI CONFERENCE 2022
08. 04. 2022.
At the Reinforce AI conference, an outstanding AI event held in March, a number of Hungarian and international artificial intelligence experts introduced the latest achievements in the field to those interested. They drew attention to problems still to be solved, suggested solutions, and showed how artificial intelligence relates to our daily lives and how it affects them.
As in the previous year, the event focused on the ethical issues, considerations and possible solutions surrounding AI methods. This is underlined by the fact that 3 of the 15 presentations were devoted entirely to this subject, and it was a recurring theme in the other lectures as well.
In addition, the event covered the vision of artificial intelligence and the role of the AI community, as well as technical solutions that offer automated ways to run AI applications in a live environment. For example, Javier Blanco Cordero, a senior data scientist at Quix, and Tomas Neubauer, co-founder of Quix, gave a live demo of a product they developed that performs semantic analysis of messages from a real-time chat application.
Use of synthetic data – Alexandra Ebert
Our data miners also gained insight into the use of synthetic data, a very popular topic that is closely related to ethics. Alexandra Ebert, a senior trust officer who previously researched the operation and use of AI systems with regard to the GDPR, gave an excellent presentation on the benefits of synthetic data.
Ebert talked about the challenges faced by those who want to create business value from the large amounts of data they collect. At first it may seem that the difficulty lies in obtaining the data, but today this is no longer necessarily true. The problem is that, in addition to the GDPR, several other regulations dictate what companies may use data for, and these often conflict with the goals of AI projects.
According to the expert, although companies have more and more data about their customers, regulations allow them to use less and less of it. Methods for “anonymizing” data have long existed, of course, but anonymization typically removes so much information that the resulting data can no longer be used for modelling. According to Alexandra, the solution is synthetic data: machine learning models are trained on the existing sensitive data and learn to generate new data that is very similar to the original.
Since the generated data cannot be traced back in any way to the entities described in the original sensitive data, yet its structure matches the original, it can be used freely and efficiently in the development of real applications.
This method is useful not only when we work with sensitive data, but also when little data is available: we can generate synthetic data to expand our data set. This is closely related to ethical considerations when, for example, we want our AI model to make social or human-related predictions and estimates. By creating synthetic data points based on the data points of under-represented classes, we can ensure that the affected groups of people are equally represented in our data set. This is a big step towards a non-discriminatory model, as it gives the model the right amount of information about everyone.
There was no shortage of new knowledge in terms of technical solutions either. The presentations showed that a significant part of the profession is looking for ways to simplify data mining workflows. This is, of course, a very multifaceted task with many possible approaches.
AutoML method – Erin LeDell
Erin LeDell, a leading machine learning scientist, introduced the so-called AutoML approach, which automatically creates machine learning solutions. The user only needs to provide the data; the computing infrastructure, model selection, training, fine-tuning, and evaluation are handled by the platform. With this method, even professionals without programming knowledge may be able to create and apply models with outstanding performance, which saves the company a great deal of time and human resources. Another advantage is that the platform evaluates the completed model from an ethical point of view, which is far from an easy task, but a basic requirement today.
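The core of the AutoML idea can be sketched in a few lines: the user supplies only the data, and the system tries several candidate models, scores each by cross-validation, and keeps the best. This is a deliberately minimal sketch of the concept using scikit-learn, not the actual platform presented in the talk (real AutoML systems also automate feature engineering, hyperparameter tuning, and ensembling).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# The user only supplies the data...
X, y = load_breast_cancer(return_X_y=True)

# ...and the "platform" tries a set of candidate models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score every candidate with 5-fold cross-validation and keep the best.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

A production AutoML platform performs this loop over a far larger search space, but the contract is the same: data in, evaluated model out.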
Ploomber Framework – Eduardo Blancas, SAME Framework – David Aronchik
Eduardo Blancas, founder of Ploomber, and David Aronchik, director and development manager at Protocol Labs, each gave a presentation on a framework. Both frameworks aim to simplify data mining work so that code written during prototyping and experimentation can be used immediately in a live environment. This is accomplished by integrating the development environment with the platforms that provide the live environment.
This is useful because it is enough to write the code well once: we can immediately present a “production ready” solution that can be modified at any time with small changes, in real time, and whose results are reproducible. Modifying or replacing the runtime environment is also easy with these methods: it is sufficient to edit configuration files, and the frameworks run our program accordingly, allowing the resulting service to be scaled to whatever performance is needed.
Author: Norbert Kiss, eNet’s Data Miner colleague