Machine Learning for Lunch

Machine learning is a hot topic in almost every field and is becoming increasingly important at D-CHAB. But how can expertise be built up in a discipline that is developing too quickly for textbooks and constantly poses new, field-specific challenges? Doctoral students at D-CHAB focus on exchanging experiences and have created a dedicated student training format called the “Data & Machine Learning (ML) lunch”. We spoke to the organizer, Julian Götz.

02.09.2024 by Julia Ecker

The Data & Machine Learning (ML) lunch is a student-led project and has been running since 2022. Where did the idea come from?

We are an experimental research group in chemistry, but through my doctoral project, I drifted towards machine learning and felt a bit alone with this topic in the group. Only a few colleagues could critically review this part of my work. However, at the same time, I knew that there were people in other groups who were working on ML. So, Professor Jeffrey Bode and I came up with the idea of the machine learning lunches. Prof. Bode and Prof. Copéret then brought other professors on board who made their research groups aware of the offer. People were very open, and we soon welcomed interested people from five or six groups.

How does a Data & ML lunch work?

Most of the time, someone gives a talk, which is followed by a discussion. The speaker decides to what extent the focus is on the discussion. It can range from a 40-45-minute talk with a classic question-and-answer session to a short 15-20-minute talk with a long discussion or even an open Q&A session, something like: “I've used method XY, but I'm not sure if I'm on the right track. How would you do that?”

What is the aim of people who come to the event?

The Machine Learning Lunch is open to everyone. Some are primarily looking to exchange ideas; others are specifically looking for feedback – this was also my initial motivation – and still others prefer to listen without presenting anything themselves. It's indeed challenging to stand in front of 15 people who are all experts on the topic. However, I remember that I received a lot of suggestions and constructive criticism at my first talk.

Inferring meaning from chemical structure, exemplified here by sorting constituents of a virtual chemical library into structurally similar clusters, is a frequent topic in the Data & ML lunches. (Visualization: Julian Götz)

Which topics are discussed? Can you give an example?

The topics come from very different areas. For example, we have already dealt with reaction data, that is, data on whether a reaction gives the desired product or not. How do you collect a large amount of such data? And how do you use it in a machine learning model to predict future reactions? A speaker from a different group talked about force fields for molecular modeling. These calculations reveal the structure of a molecule at the atomic level and can be used to derive its properties. We then discussed how the force fields can be approximated using ML methods. Aside from exchanging methods, the conversation also touched on data sets that can help to improve the model.

Are the topics set spontaneously or is there an annual plan?

I usually plan three or four months in advance. Either people suggest topics, or I actively ask them. Sometimes we have external speakers, which also requires some planning time.

You are almost finished with your doctorate. Who will take over the lead of the Data & ML Lunch?

I’m happy to say that two people will take over the organization. Shaun O’Hare from the Bode group – we are more experimentally oriented – together with Stefan Schmid from the Jorner group, which is heavily involved in machine learning and the representation of molecules. I think that will be a good combination. This way, the format will remain accessible to those interested in both experimental and theoretical aspects.

The Data & ML lunch takes place monthly during the semester and is geared mainly towards PhD students and postdocs. Interested people can send an email to Shaun O’Hare (shaun.ohare@org.chem.ethz.ch) and Stefan Schmid (schmiste@chem.ethz.ch) to be added to the mailing list. The ML lunch dates will also be published in the D-CHAB event calendar.

Julian Götz has been a doctoral student in Professor Jeffrey Bode's group since 2019. In his research, he conducts organic reactions in miniaturized form to create large data sets on reaction outcomes. He uses these data to train machine learning models that can predict the reaction outcome for untested substrates. Machine learning is currently experiencing a major boom in chemistry but is restricted by a lack of data. Julian Götz's research allowed him to explore the limitations of ML in chemistry.