Connecting the dots — Exploring the survey paper on AutoGL for Automated Machine Learning on Graphs

Nov 27, 2021

Source: Understanding Graph Neural Networks

Social networks, transportation systems, financial transaction systems, and biological networks are all examples of the connected world we live in. On a daily basis, we generate large amounts of data. Graphs can be used to unravel relationships and model dependencies in problems ranging from the molecular structure of proteins to predicting the spread of a pandemic like COVID-19.

What do Graphs teach us?

Given a connected dataset, we can solve the following problems:

Link Prediction
Node classification
Community Detection
Graph Classification
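
To make the first of these tasks concrete, here is a minimal sketch of link prediction using the classic common-neighbors heuristic: two unconnected nodes that share many neighbors are likely to be linked in the future. The graph, the names in it, and the function `common_neighbor_scores` are made up for illustration; this is a simple baseline, not a GNN and not part of AutoGL.

```python
from itertools import combinations

# Toy undirected friendship graph as an adjacency map (hypothetical data).
graph = {
    "ana": {"bob", "cy"},
    "bob": {"ana", "cy", "dee"},
    "cy": {"ana", "bob", "dee"},
    "dee": {"bob", "cy"},
}

def common_neighbor_scores(adj):
    """Score every unconnected node pair by its number of shared neighbors."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only score pairs that have no edge yet
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores

scores = common_neighbor_scores(graph)
best_pair = max(scores, key=scores.get)  # most likely missing link
print(best_pair, scores[best_pair])
```

Learned models replace the hand-written score with one computed from node embeddings, but the framing of the task stays the same: rank the node pairs that are not yet connected.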

Graphs provide different perspectives on a problem. They help analyze relationships and provide a way to deal with the interactions among entities. Tech giants use graph technology to harness the power of data connections.

Graph technology can give enterprises performance, flexibility, and agility when working with highly connected data. Graphs are used for fraud detection, real-time recommendation engines, identity and access management, retail, life sciences, supply chain management, etc.

Graph deep learning is an active research field concerned with analyzing and learning from graph data. Graph Neural Networks (GNNs) are at its core. A GNN is a neural network that operates directly on graphs, and it provides a convenient way to handle node-level, edge-level, and graph-level prediction tasks.
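
As a rough illustration of what "a neural network applied to a graph" means, the sketch below runs one round of message passing in plain Python: every node averages its neighbors' features, mixes the result with its own feature, and applies a ReLU. The graph, the feature values, and the fixed mixing weights are assumptions chosen for readability; in a real GNN the weights are learned and the features are vectors.

```python
# Toy graph: adjacency list and one scalar feature per node (hypothetical data).
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
features = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

def message_passing_step(neighbors, features, self_weight=0.5, neigh_weight=0.5):
    """One round of mean-aggregation message passing followed by a ReLU."""
    updated = {}
    for node, neighs in neighbors.items():
        # Aggregate: average the features of the neighboring nodes.
        mean_msg = sum(features[n] for n in neighs) / len(neighs)
        # Update: mix the node's own feature with the aggregated message.
        combined = self_weight * features[node] + neigh_weight * mean_msg
        updated[node] = max(0.0, combined)  # ReLU non-linearity
    return updated

def graph_readout(features):
    """Pool all node representations into one graph-level value (mean)."""
    return sum(features.values()) / len(features)

hidden = message_passing_step(neighbors, features)  # node-level representations
graph_embedding = graph_readout(hidden)             # graph-level representation
print(hidden, graph_embedding)
```

The per-node values in `hidden` would feed a node-level prediction, pairs of them an edge-level prediction, and `graph_embedding` a graph-level prediction.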

GNNs are widely used for analyzing and modeling connected data. Solving graph-related problems by hand, however, can be a complex and time-consuming task. From manual hyperparameter tuning to designing the model architecture, a lot of human effort and domain knowledge is required. There is also the risk of introducing human biases into the training of machine learning models.

Graph machine learning tasks are mainly divided into the following two categories:

Node-level tasks: Tasks such as node classification and link prediction.
Graph-level tasks: Tasks such as graph classification and graph generation.

Automated machine learning (AutoML) algorithms come to the rescue: they not only reduce the human effort required for hyperparameter tuning, feature engineering, model development, and deployment, but can also deliver models with better performance.

Spectral methods for building graph neural networks, for instance, largely “died” out because they were computationally expensive and inherently transductive.

AutoML approaches for graph learning, such as hyperparameter optimization (HPO) and neural architecture search (NAS), are bi-level optimization problems: an outer level searches over hyperparameters or architectures, while an inner level optimizes the model weights for each candidate so that the final model achieves the best result. Solving them exactly would require enumerating and training every feasible candidate, which is hard and computationally expensive. The search space can also grow by orders of magnitude, which poses scalability challenges.
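
The two levels can be seen in a toy sketch. Below, the inner level fits a single weight `w` by gradient descent on training data, and the outer level picks the learning rate whose trained model has the lowest validation loss. The data, the candidate learning rates, and the function names are all assumptions for illustration; real HPO and NAS methods use smarter search strategies than trying every candidate.

```python
# Toy regression data following y = 2 * x (hypothetical data).
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
valid = [(4.0, 8.0), (5.0, 10.0)]

def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train_weights(lr, steps=20):
    """Inner level: optimize the weight w on the training set for a given lr."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= lr * grad
    return w

def search(candidates):
    """Outer level: keep the hyperparameter with the lowest validation loss."""
    results = []
    for lr in candidates:
        w = train_weights(lr)                   # one full training run per candidate
        results.append((mse(w, valid), lr, w))  # score it on held-out data
    return min(results)

best_loss, best_lr, best_w = search([0.001, 0.01, 0.1])
print(best_lr, best_w, best_loss)
```

Every candidate costs one complete training run, so the total cost is the number of candidates times the cost of training. That product is what makes exhaustive search impractical once the search space grows.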

AutoGL is built on the same concepts as AutoML. It is a framework for automated machine learning on graphs developed by researchers at Tsinghua University [6].

Components of AutoGL:

AutoGL Solver: A solver object is created to define the task the user wants to achieve, and it works on an AutoGL dataset that holds the graph data given by the user. The solver object takes care of data preprocessing, feature engineering, hyperparameter optimization, and building an ensemble of the best models.

AutoGL Dataset: Used to build and maintain datasets for graph-based machine learning. It is based on the dataset classes inherited from PyTorch Geometric.

Auto Ensemble: Voting and stacking ensemble methods are available in AutoGL. An ensemble can increase the robustness and performance of the final model.
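
To show why an ensemble can be more robust than any single model, here is a minimal sketch of plain soft voting: class probabilities from several models are averaged per node, and the class with the highest average wins. The probability tables and the model names are invented for illustration, and this is a generic version of voting, not necessarily how AutoGL implements it.

```python
def soft_vote(model_probs, weights=None):
    """Average class probabilities across models, then pick the best class per node."""
    n_models = len(model_probs)
    weights = weights or [1.0 / n_models] * n_models  # equal weights by default
    n_nodes = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    labels = []
    for i in range(n_nodes):
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs))
            for c in range(n_classes)
        ]
        labels.append(avg.index(max(avg)))
    return labels

# Hypothetical predicted probabilities of three models for three nodes, two classes.
gcn = [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4]]
gat = [[0.8, 0.2], [0.3, 0.7], [0.3, 0.7]]
sage = [[0.6, 0.4], [0.6, 0.4], [0.4, 0.6]]

ensemble_labels = soft_vote([gcn, gat, sage])
print(ensemble_labels)
```

In this toy data each node is misjudged by at most one model, and the average overrules that single dissenting vote. Stacking goes one step further and trains a second model on the base models' outputs instead of averaging them.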

Auto Feature Engineering: Helps process graph data using three kinds of operations: generators, which create new node and edge features; selectors, which filter out useless features; and sub-graph generators, which generate graph-level features.
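
A small sketch of the first two kinds of operations, under simplifying assumptions: the generator derives a new node feature (the degree) from the graph structure, and the selector drops feature columns that never vary and therefore carry no signal. The function names, the column names, and the variance threshold are hypothetical and are not AutoGL's API.

```python
def degree_generator(edges, num_nodes):
    """Generator: derive a new node feature (the degree) from the edge list."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def variance_selector(columns, min_variance=1e-9):
    """Selector: keep only the feature columns whose values actually vary."""
    kept = {}
    for name, values in columns.items():
        mean = sum(values) / len(values)
        variance = sum((x - mean) ** 2 for x in values) / len(values)
        if variance > min_variance:
            kept[name] = values
    return kept

# Hypothetical undirected graph with four nodes and two raw feature columns.
edges = [(0, 1), (0, 2), (2, 3)]
columns = {
    "age": [30.0, 41.0, 25.0, 37.0],
    "is_user": [1.0, 1.0, 1.0, 1.0],  # constant, so it is useless for learning
}

columns["degree"] = [float(d) for d in degree_generator(edges, 4)]
selected = variance_selector(columns)
print(sorted(selected))
```

Automating feature engineering means searching over many such generators and selectors instead of hand-picking them.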

Model Training: This module constructs the graph machine learning model, optimizes the training process, and trains and evaluates the model.

Source: http://mn.cs.tsinghua.edu.cn/autogl/

The above diagram explains the AutoGL framework. The AutoGL Solver not only automates the tasks from feature engineering to ensemble modeling but also helps us create a model with less human bias. Auto Feature Engineering, Neural Architecture Search, Hyperparameter Optimization, Auto Model, and Auto Ensemble make up the AutoGL Solver, which handles all the stages involved in a graph machine learning problem.

Benefits of using AutoGL:

It can greatly reduce human involvement and bias in the machine learning loop.
It eases developers' efforts to quickly perform AutoML on graph datasets and tasks.
It enhances productivity by automating difficult tasks like feature engineering.

Future Works on AutoGL:

AutoGL is being actively updated. Research is ongoing to add support for neural architecture search, large-scale datasets, and more graph tasks.

For risk-sensitive applications such as healthcare and finance, model robustness is essential, and robust graph learning still needs to be incorporated into AutoGL.

Challenges related to integrating hardware-aware AutoGL techniques for real industrial use cases still need to be explored.

Conclusion:

The survey paper outlines the benefits of the library for automated graph learning problems. AutoGL is open source and free to use. Thanks to its modular design, it is easily extendable and customizable to the needs of the user. Using this library accelerates model turnaround time for graph data by tending to the uniqueness and complexity of graph tasks.

Note: A quick walkthrough, from defining the Solver and Feature Engineering to training the model, is available in the official documentation.

I have also created a video explanation here for this topic.

References:

[1] https://arxiv.org/pdf/2104.04987v2.pdf

[2] https://github.com/THUMNLab/AutoGL

[3] https://analyticsindiamag.com/complete-guide-to-autogl-the-latest-automl-framework-for-graph-datasets/

[4] https://autogl.readthedocs.io/en/latest/index.html

[5] https://arxiv.org/pdf/1904.09981.pdf

[6] https://arxiv.org/pdf/2103.00742v3.pdf