Human-robot interaction research at our company

In the summer of 2019, an enthusiastic student approached us. Marcell Balogh studies artificial intelligence at Aalborg University, where he focuses on human-robot interaction research. He was looking for a traineeship in Hungary where he could research these AI technologies over the summer.

Marcell Balogh, Aalborg University

Our company develops a voice bot capable of human-robot interaction that can be integrated, for example, into humanoid robots or into any other platform that has a microphone and a speaker. University research of this kind is extremely useful for us. Above all, artificial intelligence holds great opportunities for humanity.

That’s why we received Marcell’s research with great interest. Speech recognition and synthesis, which are the basis of human-robot interaction, are among the main development areas of our company.

In this post, I’d like to present Marcell’s research.

What is the problem you want to solve?

“From 2010 to the present, speech recognition and natural language processing (NLP) have become essential functions for systems and machines that interact with humans. And these technological revolutions pose new challenges too, such as voice identification of the speaker.”

NLP is a field shared by AI, linguistics and software development. Natural languages have evolved over millennia of human communication, but they contain many elements that are difficult for computers to interpret. NLP aims to interpret natural languages using technological tools. Learn more about the topic: University of Szeged Institute of Informatics; Budapest University of Technology and Economics Department of Automation and Applied Informatics

What was the focus of your human-robot interaction research?

“During my traineeship, I focused on how to implement a speaker recognition system using a convolutional neural network trained on voice data. The task of speaker recognition is to identify an individual based on the characteristics of that person’s voice.”

A convolutional (artificial) neural network is an information processing tool. It is one of the artificial intelligence technologies that enable machine learning. It is made up of neurons, like the nervous system, and has a learning algorithm and a retrieval algorithm. More information: Artificial Intelligence Electronic Almanach, 1.1. Definition and operation of the neural network.
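To make the word "convolutional" a little more concrete, here is a minimal sketch of the basic operation such a network repeats many times: a small filter slides over a signal and produces one value per position. This is only an illustration and not Marcell’s implementation. The names `conv1d`, `relu`, `signal` and `edge_kernel` are hypothetical, and the filter weights are fixed by hand here, whereas a real network learns them from data.

```python
def conv1d(signal, kernel):
    """Slide `kernel` over `signal` without padding and return the dot product
    at each position (cross-correlation, as deep learning libraries usually do)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive responses and set negative ones to zero."""
    return [max(0.0, v) for v in values]


# Hypothetical toy signal standing in for a short run of audio samples.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]

# Hand-picked filter that responds where the signal falls from one sample to the next.
edge_kernel = [1.0, -1.0]

features = relu(conv1d(signal, edge_kernel))
print(features)
```

In a real speaker recognition network, many such filters are stacked in layers and usually applied to a spectrogram of the recording rather than to raw samples.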

What is the difficulty in identifying a person during a human-robot interaction?

“Experience with machine learning shows that neural networks have low sample efficiency compared to humans, because they need huge amounts of training data to achieve high performance. However, collecting labeled data is too expensive and time-consuming.

So the solution is the following: 1. teach the machine to distinguish between the characteristics of individual voices, 2. identify the speaker (identifying someone after hearing their voice only once).

Transfer learning and metric learning are promising areas of active research for improving the sample efficiency of learning algorithms. These approaches are based on the intuition that a new task can be learned more easily after other tasks have been completed or presented, because the use of prior knowledge helps learning.”
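The two steps Marcell describes can be sketched in a few lines. The example below is an illustration under stated assumptions, not his actual code. It assumes that a trained network has already turned each recording into a fixed-length vector (an embedding), and it uses cosine similarity as the distance measure, which is one common choice in metric learning. The speaker names, the vectors and the `threshold` value are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, enrolled, threshold=0.8):
    """Return the enrolled speaker whose embedding is most similar to `query`,
    or None when even the best match scores below `threshold`."""
    best_name, best_score = None, -1.0
    for name, embedding in enrolled.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# One hypothetical embedding per speaker, each taken from a single recording.
enrolled = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.3],
}

# Hypothetical embedding of a new recording from an unidentified speaker.
query = [0.85, 0.15, 0.25]

print(identify(query, enrolled))
print(identify([0.0, 0.0, 1.0], enrolled))
```

Because each speaker is enrolled with a single recording, adding a new person does not require retraining the network. Only the embedding network itself has to be trained in advance, on many other voices.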

How was the reception at the university?

“The teacher liked the project. They acknowledged that I implemented the neural network from scratch and welcomed the way I processed the samples. They were also satisfied with the machine learning part. For the purpose of my university studies, Netlife Robotics was an extremely beneficial choice, so I would highly recommend the team to other students of robotics or software. Humanoid robotics is currently a dynamically developing area, and the company is making unique, pioneering developments.”

What are your plans for the future?

“So far, I have specialized in machine learning and neural networks, and I am continuing my work in this area, mainly on generative models and “trial and error” learning. I don’t have a specific project currently, but I’ll write my thesis on how to improve the user experience of conversational AIs (Siri, Google, Alexa, Cortana or even the Netlife Robotics voice bot) through automatic visual content generation.

After university, I would love to get a job at Netlife Robotics. During those months, I also learned a lot about putting machine learning into practice. I want a job where I have the opportunity to improve constantly.”

We are also very satisfied with Marcell’s results: his work gave us a solution that is fully usable in practice.

For more about our products or research, please contact us!