Graph Embeddings for Non-IID Data Feature Representation Learning

Research output: Chapter in Book/Conference paper > Conference paper > peer-review

Abstract

Most machine learning models, such as Random Forest (RF) and Support Vector Machine (SVM), assume that the features in a dataset are independent and identically distributed (IID). However, many real-world datasets contain structural dependencies, so neither the data observations nor the features satisfy this IID assumption. In this paper, we propose to incorporate the latent structural information in the data and learn the best embeddings for downstream classification tasks. Specifically, we build traffic knowledge graphs for a traffic-related dataset and apply node2vec and TransE to learn graph embeddings, which are then fed into three machine learning algorithms, namely SVM, RF, and k-nearest neighbours (kNN), to evaluate their performance on various classification tasks. We compare the performance of these three classification models under two different representations of the same dataset: the first is based on traffic speed, volume, and speed limit; the second is the graph embeddings learned from the traffic knowledge graph. Our experimental results show that the road network information captured in the knowledge graphs is crucial for predicting traffic risk levels. Through our empirical analysis, we demonstrate that knowledge graphs can be used effectively to capture the structural information in non-IID datasets.
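
As a rough illustration of the embedding step described in the abstract, the sketch below trains TransE-style vectors on a toy set of triples using only the Python standard library. It is not the authors' implementation: the triples, entity names (`seg_a`, `low_risk`, ...), relation names, and hyperparameters are all hypothetical, and the sketch assumes a squared L2 distance with tail-only negative sampling and no embedding normalisation, which simplifies the original TransE training procedure.

```python
import random

# Hypothetical toy triples (head, relation, tail); the paper's actual
# knowledge-graph schema is not given in the abstract.
TRIPLES = [
    ("seg_a", "connects_to", "seg_b"),
    ("seg_b", "connects_to", "seg_c"),
    ("seg_c", "connects_to", "seg_d"),
    ("seg_a", "has_risk", "low_risk"),
    ("seg_d", "has_risk", "high_risk"),
]


def transe_distance(h, r, t):
    """Squared L2 distance ||h + r - t||^2; lower means more plausible."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))


def train_transe(triples, dim=8, margin=1.0, lr=0.05, epochs=200, seed=0):
    """Fit entity and relation vectors with a margin ranking loss (a sketch)."""
    rng = random.Random(seed)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    relations = sorted({r for _, r, _ in triples})
    ent = {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in entities}
    rel = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in relations}
    known = set(triples)

    for _ in range(epochs):
        for h, r, t in triples:
            # Corrupt the tail to obtain a negative triple.
            t_neg = rng.choice(entities)
            if t_neg == h or (h, r, t_neg) in known:
                continue
            pos = transe_distance(ent[h], rel[r], ent[t])
            neg = transe_distance(ent[h], rel[r], ent[t_neg])
            if margin + pos - neg <= 0:
                continue  # margin already satisfied, no update needed
            for i in range(dim):
                g_pos = 2 * (ent[h][i] + rel[r][i] - ent[t][i])
                g_neg = 2 * (ent[h][i] + rel[r][i] - ent[t_neg][i])
                # Gradient step on: margin + d(positive) - d(negative)
                ent[h][i] -= lr * (g_pos - g_neg)
                rel[r][i] -= lr * (g_pos - g_neg)
                ent[t][i] += lr * g_pos
                ent[t_neg][i] -= lr * g_neg
    return ent, rel


ent, rel = train_transe(TRIPLES)
# One fixed-length feature vector per road segment.
features = {e: v for e, v in ent.items() if e.startswith("seg_")}
```

Under these assumptions, each vector in `features` could stand in for the hand-crafted speed, volume, and speed-limit features as input to a classifier such as SVM, RF, or kNN.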
Original language: English
Title of host publication: Data Mining - 20th Australasian Conference, AusDM 2022, Proceedings
Editors: Laurence A.F. Park, Simeon Simoff, Heitor Murilo Gomes, Maryam Doborjeh, Yee Ling Boo, Yun Sing Koh, Yanchang Zhao, Graham Williams
Publisher: Springer Science + Business Media
Pages: 43-57
Number of pages: 15
ISBN (Print): 9789811987458
Publication status: Published - 2022
Event: 20th Australasian Data Mining Conference, AusDM 2022 - Western Sydney, Australia
Duration: 12 Dec 2022 to 15 Dec 2022

Publication series

Name: Communications in Computer and Information Science
Volume: 1741 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 20th Australasian Data Mining Conference, AusDM 2022
Country/Territory: Australia
City: Western Sydney
Period: 12/12/22 to 15/12/22
