Trending Research

QA-MDT: Quality-Aware Masked Diffusion Transformer for Enhanced Music Generation

In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering an innovative approach to synthesizing musical content from textual descriptions.

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

However, the lack of holistic consistency in scenes with multiple characters hampers these methods' ability to create a cohesive narrative.

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

The increasing demand for high-quality 3D assets across various industries necessitates efficient and automated 3D content creation.

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

We build our model based on the latest Llama-3.1-8B-Instruct model.

Kolmogorov-Arnold Transformer

In this paper, we introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers to enhance the expressiveness and performance of the model.
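The idea summarized above — swapping each MLP layer for a layer whose edges carry their own learnable univariate functions — can be illustrated with a toy sketch. This is a minimal illustration, assuming a simple polynomial basis per edge (the actual KAN/KAT work uses B-splines or rational functions); the class name and parameterization are hypothetical, not the paper's API.

```python
import numpy as np

class ToyKANLayer:
    """Toy Kolmogorov-Arnold-style layer: instead of a linear map followed
    by a shared fixed activation (as in an MLP), every (input, output) edge
    has its own learnable univariate function, here approximated by a small
    polynomial basis. Illustrative only."""

    def __init__(self, in_dim, out_dim, degree=3, seed=0):
        rng = np.random.default_rng(seed)
        # one coefficient vector per edge: shape (out, in, degree + 1)
        self.coef = rng.normal(scale=0.1, size=(out_dim, in_dim, degree + 1))

    def __call__(self, x):
        # x: (batch, in_dim); evaluate each edge's polynomial, sum over inputs
        powers = np.stack([x**d for d in range(self.coef.shape[-1])], axis=-1)
        return np.einsum("bid,oid->bo", powers, self.coef)

layer = ToyKANLayer(in_dim=4, out_dim=2)
out = layer(np.zeros((3, 4)))
print(out.shape)  # (3, 2)
```

In a transformer block, such a layer would replace the two linear maps of the feed-forward sublayer; the coefficients play the role the MLP's weight matrices play.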

OmniGen: Unified Image Generation

vectorspacelab/omnigen • 17 Sep 2024

In this work, we introduce OmniGen, a new diffusion model for unified image generation.

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

stanford-oval/storm • 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning

lamm-mit/SciAgentsDiscovery • 9 Sep 2024

A key challenge in artificial intelligence is the creation of systems capable of autonomously advancing scientific understanding by exploring novel domains, identifying complex patterns, and uncovering previously unseen connections in vast scientific data.

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Visual data comes in various forms, ranging from small icons of just a few pixels to long videos spanning hours.

Qwen2 Technical Report

This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.

Journal of Machine Learning Research

The Journal of Machine Learning Research (JMLR), established in 2000, provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online.

  • 2024.02.18: Volume 24 completed; Volume 25 began.
  • 2023.01.20: Volume 23 completed; Volume 24 began.
  • 2022.07.20: New special issue on climate change.
  • 2022.02.18: New blog post: Retrospectives from 20 Years of JMLR.
  • 2022.01.25: Volume 22 completed; Volume 23 began.
  • 2021.12.02: Message from outgoing co-EiC Bernhard Schölkopf.
  • 2021.02.10: Volume 21 completed; Volume 22 began.
  • More news...

Latest papers

Rethinking Discount Regularization: New Interpretations, Unintended Consequences, and Solutions for Regularization in Reinforcement Learning Sarah Rathnam, Sonali Parbhoo, Siddharth Swaroop, Weiwei Pan, Susan A. Murphy, Finale Doshi-Velez , 2024. [ abs ][ pdf ][ bib ]      [ code ]

PromptBench: A Unified Library for Evaluation of Large Language Models Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Gaussian Interpolation Flows Yuan Gao, Jian Huang, and Yuling Jiao , 2024. [ abs ][ pdf ][ bib ]

Gaussian Mixture Models with Rare Events Xuetong Li, Jing Zhou, Hansheng Wang , 2024. [ abs ][ pdf ][ bib ]

On the Concentration of the Minimizers of Empirical Risks Paul Escande , 2024. [ abs ][ pdf ][ bib ]

Variance estimation in graphs with the fused lasso Oscar Hernan Madrid Padilla , 2024. [ abs ][ pdf ][ bib ]

Random measure priors in Bayesian recovery from sketches Mario Beraha, Stefano Favaro, Matteo Sesia , 2024. [ abs ][ pdf ][ bib ]      [ code ]

From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs Lorenz Richter, Leon Sallandt, Nikolas Nüsken , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Label Alignment Regularization for Distribution Shift Ehsan Imani, Guojun Zhang, Runjia Li, Jun Luo, Pascal Poupart, Philip H.S. Torr, Yangchen Pan , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Fairness in Survival Analysis with Distributionally Robust Optimization Shu Hu, George H. Chen , 2024. [ abs ][ pdf ][ bib ]      [ code ]

FineMorphs: Affine-Diffeomorphic Sequences for Regression Michele Lohr, Laurent Younes , 2024. [ abs ][ pdf ][ bib ]

Tensor-train methods for sequential state and parameter learning in state-space models Yiran Zhao, Tiangang Cui , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Memory of recurrent networks: Do we compute it right? Giovanni Ballarin, Lyudmila Grigoryeva, Juan-Pablo Ortega , 2024. [ abs ][ pdf ][ bib ]      [ code ]

The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis El Mehdi Achour, François Malgouyres, Sébastien Gerchinovitz , 2024. [ abs ][ pdf ][ bib ]

High Probability Convergence Bounds for Non-convex Stochastic Gradient Descent with Sub-Weibull Noise Liam Madden, Emiliano Dall'Anese, Stephen Becker , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Euler Characteristic Tools for Topological Data Analysis Olympio Hacquard, Vadim Lebovici , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization Cameron Jakub, Mihai Nica , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Fortuna: A Library for Uncertainty Quantification in Deep Learning Gianluca Detommaso, Alberto Gasparin, Michele Donini, Matthias Seeger, Andrew Gordon Wilson, Cedric Archambeau , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Characterization of translation invariant MMD on R^d and connections with Wasserstein distances Thibault Modeste, Clément Dombry , 2024. [ abs ][ pdf ][ bib ]

On the Hyperparameters in Stochastic Gradient Descent with Momentum Bin Shi , 2024. [ abs ][ pdf ][ bib ]

Improved Random Features for Dot Product Kernels Jonas Wacker, Motonobu Kanagawa, Maurizio Filippone , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Regret Analysis of Bilateral Trade with a Smoothed Adversary Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Federico Fusco, Stefano Leonardi , 2024. [ abs ][ pdf ][ bib ]

Invariant Physics-Informed Neural Networks for Ordinary Differential Equations Shivam Arora, Alex Bihlo, Francis Valiquette , 2024. [ abs ][ pdf ][ bib ]

Distribution Learning via Neural Differential Equations: A Nonparametric Statistical Perspective Youssef Marzouk, Zhi (Robert) Ren, Sven Wang, Jakob Zech , 2024. [ abs ][ pdf ][ bib ]

Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression Joseph Shenouda, Rahul Parhi, Kangwook Lee, Robert D. Nowak , 2024. [ abs ][ pdf ][ bib ]

Individual-centered Partial Information in Social Networks Xiao Han, Y. X. Rachel Wang, Qing Yang, Xin Tong , 2024. [ abs ][ pdf ][ bib ]

Data-driven Automated Negative Control Estimation (DANCE): Search for, Validation of, and Causal Inference with Negative Controls Erich Kummerfeld, Jaewon Lim, Xu Shi , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Continuous Prediction with Experts' Advice Nicholas J. A. Harvey, Christopher Liaw, Victor S. Portella , 2024. [ abs ][ pdf ][ bib ]

Memory-Efficient Sequential Pattern Mining with Hybrid Tries Amin Hosseininasab, Willem-Jan van Hoeve, Andre A. Cire , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao , 2024. [ abs ][ pdf ][ bib ]

Split Conformal Prediction and Non-Exchangeable Data Roberto I. Oliveira, Paulo Orenstein, Thiago Ramos, João Vitor Romano , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model Rashmi Ranjan Bhuyan, Adel Javanmard, Sungchul Kim, Gourab Mukherjee, Ryan A. Rossi, Tong Yu, Handong Zhao , 2024. [ abs ][ pdf ][ bib ]

Sparse Graphical Linear Dynamical Systems Emilie Chouzenoux, Victor Elvira , 2024. [ abs ][ pdf ][ bib ]

Statistical analysis for a penalized EM algorithm in high-dimensional mixture linear regression model Ning Wang, Xin Zhang, Qing Mai , 2024. [ abs ][ pdf ][ bib ]

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds Hao Liang, Zhi-Quan Luo , 2024. [ abs ][ pdf ][ bib ]

Low-Rank Matrix Estimation in the Presence of Change-Points Lei Shi, Guanghui Wang, Changliang Zou , 2024. [ abs ][ pdf ][ bib ]

A Framework for Improving the Reliability of Black-box Variational Inference Manushi Welandawe, Michael Riis Andersen, Aki Vehtari, Jonathan H. Huggins , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Understanding Entropic Regularization in GANs Daria Reshetova, Yikun Bai, Xiugang Wu, Ayfer Özgür , 2024. [ abs ][ pdf ][ bib ]

BenchMARL: Benchmarking Multi-Agent Reinforcement Learning Matteo Bettini, Amanda Prorok, Vincent Moens , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Learning from many trajectories Stephen Tu, Roy Frostig, Mahdi Soltanolkotabi , 2024. [ abs ][ pdf ][ bib ]

Interpretable algorithmic fairness in structured and unstructured data Hari Bandi, Dimitris Bertsimas, Thodoris Koukouvinos, Sofie Kupiec , 2024. [ abs ][ pdf ][ bib ]

FedCBO: Reaching Group Consensus in Clustered Federated Learning through Consensus-based Optimization José A. Carrillo, Nicolás García Trillos, Sixu Li, Yuhua Zhu , 2024. [ abs ][ pdf ][ bib ]

On the Connection between Lp- and Risk Consistency and its Implications on Regularized Kernel Methods Hannes Köhler , 2024. [ abs ][ pdf ][ bib ]

Pre-trained Gaussian Processes for Bayesian Optimization Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis Yuanxing Chen, Qingzhao Zhang, Shuangge Ma, Kuangnan Fang , 2024. [ abs ][ pdf ][ bib ]

From Small Scales to Large Scales: Distance-to-Measure Density based Geometric Analysis of Complex Data Katharina Proksch, Christoph Alexander Weikamp, Thomas Staudt, Benoit Lelandais, Christophe Zimmer , 2024. [ abs ][ pdf ][ bib ]      [ code ]

PAMI: An Open-Source Python Library for Pattern Mining Uday Kiran Rage, Veena Pamalla, Masashi Toyoda, Masaru Kitsuregawa , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Law of Large Numbers and Central Limit Theorem for Wide Two-layer Neural Networks: The Mini-Batch and Noisy Case Arnaud Descours, Arnaud Guillin, Manon Michel, Boris Nectoux , 2024. [ abs ][ pdf ][ bib ]

Risk Measures and Upper Probabilities: Coherence and Stratification Christian Fröhlich, Robert C. Williamson , 2024. [ abs ][ pdf ][ bib ]

Parallel-in-Time Probabilistic Numerical ODE Solvers Nathanael Bosch, Adrien Corenflos, Fatemeh Yaghoobi, Filip Tronarp, Philipp Hennig, Simo Särkkä , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Scalable High-Dimensional Multivariate Linear Regression for Feature-Distributed Data Shuo-Chieh Huang, Ruey S. Tsay , 2024. [ abs ][ pdf ][ bib ]

Dropout Regularization Versus l2-Penalization in the Linear Model Gabriel Clara, Sophie Langer, Johannes Schmidt-Hieber , 2024. [ abs ][ pdf ][ bib ]

Efficient Convex Algorithms for Universal Kernel Learning Aleksandr Talitckii, Brendon Colbert, Matthew M. Peet , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Manifold Learning by Mixture Models of VAEs for Inverse Problems Giovanni S. Alberti, Johannes Hertrich, Matteo Santacesaria, Silvia Sciutto , 2024. [ abs ][ pdf ][ bib ]      [ code ]

An Algorithmic Framework for the Optimization of Deep Neural Networks Architectures and Hyperparameters Julie Keisler, El-Ghazali Talbi, Sandra Claudel, Gilles Cabriel , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity Laixi Shi, Yuejie Chi , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Grokking phase transitions in learning local rules with gradient descent Bojan Žunkovič, Enej Ilievski , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Unsupervised Tree Boosting for Learning Probability Distributions Naoki Awaya, Li Ma , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Linear Regression With Unmatched Data: A Deconvolution Perspective Mona Azadkia, Fadoua Balabdaoui , 2024. [ abs ][ pdf ][ bib ]

Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit Karl Hajjar, Lénaïc Chizat, Christophe Giraud , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Sharp analysis of power iteration for tensor PCA Yuchen Wu, Kangjie Zhou , 2024. [ abs ][ pdf ][ bib ]

On the Intrinsic Structures of Spiking Neural Networks Shao-Qun Zhang, Jia-Yi Chen, Jin-Hui Wu, Gao Zhang, Huan Xiong, Bin Gu, Zhi-Hua Zhou , 2024. [ abs ][ pdf ][ bib ]

Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance Lisha Chen, Heshan Fernando, Yiming Ying, Tianyi Chen , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data Wanli Hong, Shuyang Ling , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Generalized Independent Noise Condition for Estimating Causal Structure with Latent Variables Feng Xie, Biwei Huang, Zhengming Chen, Ruichu Cai, Clark Glymour, Zhi Geng, Kun Zhang , 2024. [ abs ][ pdf ][ bib ]

Classification of Data Generated by Gaussian Mixture Models Using Deep ReLU Networks Tian-Yi Zhou, Xiaoming Huo , 2024. [ abs ][ pdf ][ bib ]

Differentially Private Topological Data Analysis Taegyu Kang, Sehwan Kim, Jinwon Sohn, Jordan Awan , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On the Optimality of Misspecified Spectral Algorithms Haobo Zhang, Yicheng Li, Qian Lin , 2024. [ abs ][ pdf ][ bib ]

An Entropy-Based Model for Hierarchical Learning Amir R. Asadi , 2024. [ abs ][ pdf ][ bib ]

Optimal Clustering with Bandit Feedback Junwen Yang, Zixin Zhong, Vincent Y. F. Tan , 2024. [ abs ][ pdf ][ bib ]

A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression Youngseok Kim, Wei Wang, Peter Carbonetto, Matthew Stephens , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Spectral Analysis of the Neural Tangent Kernel for Deep Residual Networks Yuval Belfer, Amnon Geifman, Meirav Galun, Ronen Basri , 2024. [ abs ][ pdf ][ bib ]

Permuted and Unlinked Monotone Regression in R^d: an approach based on mixture modeling and optimal transport Martin Slawski, Bodhisattva Sen , 2024. [ abs ][ pdf ][ bib ]

Volterra Neural Networks (VNNs) Siddharth Roheda, Hamid Krim, Bo Jiang , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Towards Optimal Sobolev Norm Rates for the Vector-Valued Regularized Least-Squares Algorithm Zhu Li, Dimitri Meunier, Mattes Mollenhauer, Arthur Gretton , 2024. [ abs ][ pdf ][ bib ]

Bayesian Regression Markets Thomas Falconer, Jalal Kazempour, Pierre Pinson , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Sharpness-Aware Minimization and the Edge of Stability Philip M. Long, Peter L. Bartlett , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization Sijia Chen, Yu-Jie Zhang, Wei-Wei Tu, Peng Zhao, Lijun Zhang , 2024. [ abs ][ pdf ][ bib ]

Multi-Objective Neural Architecture Search by Learning Search Space Partitions Yiyang Zhao, Linnan Wang, Tian Guo , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Fermat Distances: Metric Approximation, Spectral Convergence, and Clustering Algorithms Nicolás García Trillos, Anna Little, Daniel McKenzie, James M. Murphy , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Spherical Rotation Dimension Reduction with Geometric Loss Functions Hengrui Luo, Jeremy E. Purvis, Didong Li , 2024. [ abs ][ pdf ][ bib ]

A PDE-based Explanation of Extreme Numerical Sensitivities and Edge of Stability in Training Neural Networks Yuxin Sun, Dong Lao, Anthony Yezzi, Ganesh Sundaramoorthi , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Two is Better Than One: Regularized Shrinkage of Large Minimum Variance Portfolios Taras Bodnar, Nestor Parolya, Erik Thorsen , 2024. [ abs ][ pdf ][ bib ]

Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning Jinchi Chen, Jie Feng, Weiguo Gao, Ke Wei , 2024. [ abs ][ pdf ][ bib ]

Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning Ilnura Usmanova, Yarden As, Maryam Kamgarpour, Andreas Krause , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Cluster-Adaptive Network A/B Testing: From Randomization to Estimation Yang Liu, Yifan Zhou, Ping Li, Feifang Hu , 2024. [ abs ][ pdf ][ bib ]

On the Computational and Statistical Complexity of Over-parameterized Matrix Sensing Jiacheng Zhuo, Jeongyeol Kwon, Nhat Ho, Constantine Caramanis , 2024. [ abs ][ pdf ][ bib ]

Optimization-based Causal Estimation from Heterogeneous Environments Mingzhang Yin, Yixin Wang, David M. Blei , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Optimal Locally Private Nonparametric Classification with Public Data Yuheng Ma, Hanfang Yang , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Learning to Warm-Start Fixed-Point Optimization Algorithms Rajiv Sambharya, Georgina Hall, Brandon Amos, Bartolomeo Stellato , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Nonparametric Regression Using Over-parameterized Shallow ReLU Neural Networks Yunfei Yang, Ding-Xuan Zhou , 2024. [ abs ][ pdf ][ bib ]

Nonparametric Copula Models for Multivariate, Mixed, and Missing Data Joseph Feldman, Daniel R. Kowal , 2024. [ abs ][ pdf ][ bib ]      [ code ]

An Analysis of Quantile Temporal-Difference Learning Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney , 2024. [ abs ][ pdf ][ bib ]

Conformal Inference for Online Prediction with Arbitrary Distribution Shifts Isaac Gibbs, Emmanuel J. Candès , 2024. [ abs ][ pdf ][ bib ]      [ code ]

More Efficient Estimation of Multivariate Additive Models Based on Tensor Decomposition and Penalization Xu Liu, Heng Lian, Jian Huang , 2024. [ abs ][ pdf ][ bib ]

A Kernel Test for Causal Association via Noise Contrastive Backdoor Adjustment Robert Hu, Dino Sejdinovic, Robin J. Evans , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Assessing the Overall and Partial Causal Well-Specification of Nonlinear Additive Noise Models Christoph Schultheiss, Peter Bühlmann , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Simple Cycle Reservoirs are Universal Boyu Li, Robert Simon Fong, Peter Tino , 2024. [ abs ][ pdf ][ bib ]

On the Computational Complexity of Metropolis-Adjusted Langevin Algorithms for Bayesian Posterior Sampling Rong Tang, Yun Yang , 2024. [ abs ][ pdf ][ bib ]

Generalization and Stability of Interpolating Neural Networks with Minimal Width Hossein Taheri, Christos Thrampoulidis , 2024. [ abs ][ pdf ][ bib ]

Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression Jiading Liu, Lei Shi , 2024. [ abs ][ pdf ][ bib ]

Identifiability and Asymptotics in Learning Homogeneous Linear ODE Systems from Discrete Observations Yuanyuan Wang, Wei Huang, Mingming Gong, Xi Geng, Tongliang Liu, Kun Zhang, Dacheng Tao , 2024. [ abs ][ pdf ][ bib ]

Robust Black-Box Optimization for Stochastic Search and Episodic Reinforcement Learning Maximilian Hüttenrauch, Gerhard Neumann , 2024. [ abs ][ pdf ][ bib ]

Kernel Thinning Raaz Dwivedi, Lester Mackey , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed Smoothness Conditions Xuxing Chen, Tesi Xiao, Krishnakumar Balasubramanian , 2024. [ abs ][ pdf ][ bib ]

Variational Estimators of the Degree-corrected Latent Block Model for Bipartite Networks Yunpeng Zhao, Ning Hao, Ji Zhu , 2024. [ abs ][ pdf ][ bib ]

Statistical Inference for Fairness Auditing John J. Cherian, Emmanuel J. Candès , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Adjusted Wasserstein Distributionally Robust Estimator in Statistical Learning Yiling Xie, Xiaoming Huo , 2024. [ abs ][ pdf ][ bib ]

DoWhy-GCM: An Extension of DoWhy for Causal Inference in Graphical Causal Models Patrick Blöbaum, Peter Götz, Kailash Budhathoki, Atalanti A. Mastakouri, Dominik Janzing , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Flexible Bayesian Product Mixture Models for Vector Autoregressions Suprateek Kundu, Joshua Lukemire , 2024. [ abs ][ pdf ][ bib ]

A Variational Approach to Bayesian Phylogenetic Inference Cheng Zhang, Frederick A. Matsen IV , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Fat-Shattering Dimension of k-fold Aggregations Idan Attias, Aryeh Kontorovich , 2024. [ abs ][ pdf ][ bib ]

Unified Binary and Multiclass Margin-Based Classification Yutong Wang, Clayton Scott , 2024. [ abs ][ pdf ][ bib ]

Neural Feature Learning in Function Space Xiangxiang Xu, Lizhong Zheng , 2024. [ abs ][ pdf ][ bib ]      [ code ]

PyGOD: A Python Library for Graph Outlier Detection Kay Liu, Yingtong Dou, Xueying Ding, Xiyang Hu, Ruitong Zhang, Hao Peng, Lichao Sun, Philip S. Yu , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Blessings and Curses of Covariate Shifts: Adversarial Learning Dynamics, Directional Convergence, and Equilibria Tengyuan Liang , 2024. [ abs ][ pdf ][ bib ]

Fixed points of nonnegative neural networks Tomasz J. Piotrowski, Renato L. G. Cavalcante, Mateusz Gabor , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks Fanghui Liu, Leello Dadi, Volkan Cevher , 2024. [ abs ][ pdf ][ bib ]

A Survey on Multi-player Bandits Etienne Boursier, Vianney Perchet , 2024. [ abs ][ pdf ][ bib ]

Transport-based Counterfactual Models Lucas De Lara, Alberto González-Sanz, Nicholas Asher, Laurent Risser, Jean-Michel Loubes , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Adaptive Latent Feature Sharing for Piecewise Linear Dimensionality Reduction Adam Farooq, Yordan P. Raykov, Petar Raykov, Max A. Little , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Topological Node2vec: Enhanced Graph Embedding via Persistent Homology Yasuaki Hiraoka, Yusuke Imoto, Théo Lacombe, Killian Meehan, Toshiaki Yachimura , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length Katerina Hlaváčková-Schindler, Anna Melnykova, Irene Tubikanec , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Representation Learning via Manifold Flattening and Reconstruction Michael Psenka, Druv Pai, Vishal Raman, Shankar Sastry, Yi Ma , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Bagging Provides Assumption-free Stability Jake A. Soloff, Rina Foygel Barber, Rebecca Willett , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Fairness guarantees in multi-class classification with demographic parity Christophe Denis, Romuald Elie, Mohamed Hebiri, François Hu , 2024. [ abs ][ pdf ][ bib ]

Regimes of No Gain in Multi-class Active Learning Gan Yuan, Yunfan Zhao, Samory Kpotufe , 2024. [ abs ][ pdf ][ bib ]

Learning Optimal Dynamic Treatment Regimens Subject to Stagewise Risk Controls Mochuan Liu, Yuanjia Wang, Haoda Fu, Donglin Zeng , 2024. [ abs ][ pdf ][ bib ]

Margin-Based Active Learning of Classifiers Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice , 2024. [ abs ][ pdf ][ bib ]

Random Subgraph Detection Using Queries Wasim Huleihel, Arya Mazumdar, Soumyabrata Pal , 2024. [ abs ][ pdf ][ bib ]

Classification with Deep Neural Networks and Logistic Loss Zihan Zhang, Lei Shi, Ding-Xuan Zhou , 2024. [ abs ][ pdf ][ bib ]

Spectral learning of multivariate extremes Marco Avella Medina, Richard A Davis, Gennady Samorodnitsky , 2024. [ abs ][ pdf ][ bib ]

Sum-of-norms clustering does not separate nearby balls Alexander Dunlap, Jean-Christophe Mourrat , 2024. [ abs ][ pdf ][ bib ]      [ code ]

An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization Guy Kornowski, Ohad Shamir , 2024. [ abs ][ pdf ][ bib ]

Linear Distance Metric Learning with Noisy Labels Meysam Alishahi, Anna Little, Jeff M. Phillips , 2024. [ abs ][ pdf ][ bib ]      [ code ]

OpenBox: A Python Toolkit for Generalized Black-box Optimization Huaijun Jiang, Yu Shen, Yang Li, Beicheng Xu, Sixian Du, Wentao Zhang, Ce Zhang, Bin Cui , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Generative Adversarial Ranking Nets Yinghua Yao, Yuangang Pan, Jing Li, Ivor W. Tsang, Xin Yao , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Predictive Inference with Weak Supervision Maxime Cauchois, Suyash Gupta, Alnur Ali, John C. Duchi , 2024. [ abs ][ pdf ][ bib ]

Functions with average smoothness: structure, algorithms, and learning Yair Ashlagi, Lee-Ad Gottlieb, Aryeh Kontorovich , 2024. [ abs ][ pdf ][ bib ]

Differentially Private Data Release for Mixed-type Data via Latent Factor Models Yanqing Zhang, Qi Xu, Niansheng Tang, Annie Qu , 2024. [ abs ][ pdf ][ bib ]

The Non-Overlapping Statistical Approximation to Overlapping Group Lasso Mingyu Qi, Tianxi Li , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Faster Rates of Differentially Private Stochastic Convex Optimization Jinyan Su, Lijie Hu, Di Wang , 2024. [ abs ][ pdf ][ bib ]

Nonasymptotic analysis of Stochastic Gradient Hamiltonian Monte Carlo under local conditions for nonconvex optimization O. Deniz Akyildiz, Sotirios Sabanis , 2024. [ abs ][ pdf ][ bib ]

Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits Junpei Komiyama, Edouard Fouché, Junya Honda , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Stable Implementation of Probabilistic ODE Solvers Nicholas Krämer, Philipp Hennig , 2024. [ abs ][ pdf ][ bib ]

More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund , 2024. [ abs ][ pdf ][ bib ]

Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space Zhengdao Chen , 2024. [ abs ][ pdf ][ bib ]

QDax: A Library for Quality-Diversity and Population-based Algorithms with Hardware Acceleration Felix Chalumeau, Bryan Lim, Raphaël Boige, Maxime Allard, Luca Grillotti, Manon Flageat, Valentin Macé, Guillaume Richard, Arthur Flajolet, Thomas Pierrot, Antoine Cully , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Random Forest Weighted Local Fréchet Regression with Random Objects Rui Qiu, Zhou Yu, Ruoqing Zhu , 2024. [ abs ][ pdf ][ bib ]      [ code ]

PhAST: Physics-Aware, Scalable, and Task-Specific GNNs for Accelerated Catalyst Design Alexandre Duval, Victor Schmidt, Santiago Miret, Yoshua Bengio, Alex Hernández-García, David Rolnick , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Unsupervised Anomaly Detection Algorithms on Real-world Data: How Many Do We Need? Roel Bouman, Zaharah Bukhsh, Tom Heskes , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Multi-class Probabilistic Bounds for Majority Vote Classifiers with Partially Labeled Data Vasilii Feofanov, Emilie Devijver, Massih-Reza Amini , 2024. [ abs ][ pdf ][ bib ]

Information Processing Equalities and the Information–Risk Bridge Robert C. Williamson, Zac Cranko , 2024. [ abs ][ pdf ][ bib ]

Nonparametric Regression for 3D Point Cloud Learning Xinyi Li, Shan Yu, Yueying Wang, Guannan Wang, Li Wang, Ming-Jun Lai , 2024. [ abs ][ pdf ][ bib ]      [ code ]

AMLB: an AutoML Benchmark Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors, Erin LeDell, Sébastien Poirier, Janek Thomas, Bernd Bischl, Joaquin Vanschoren , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Materials Discovery using Max K-Armed Bandit Nobuaki Kikkawa, Hiroshi Ohno , 2024. [ abs ][ pdf ][ bib ]

Semi-supervised Inference for Block-wise Missing Data without Imputation Shanshan Song, Yuanyuan Lin, Yong Zhou , 2024. [ abs ][ pdf ][ bib ]

Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou , 2024. [ abs ][ pdf ][ bib ]

Scaling Speech Technology to 1,000+ Languages Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli , 2024. [ abs ][ pdf ][ bib ]      [ code ]

MAP- and MLE-Based Teaching Hans Ulrich Simon, Jan Arne Telle , 2024. [ abs ][ pdf ][ bib ]

A General Framework for the Analysis of Kernel-based Tests Tamara Fernández, Nicolás Rivera , 2024. [ abs ][ pdf ][ bib ]

Overparametrized Multi-layer Neural Networks: Uniform Concentration of Neural Tangent Kernel and Convergence of Stochastic Gradient Descent Jiaming Xu, Hanjing Zhu , 2024. [ abs ][ pdf ][ bib ]

Sparse Representer Theorems for Learning in Reproducing Kernel Banach Spaces Rui Wang, Yuesheng Xu, Mingsong Yan , 2024. [ abs ][ pdf ][ bib ]

Exploration of the Search Space of Gaussian Graphical Models for Paired Data Alberto Roverato, Dung Ngoc Nguyen , 2024. [ abs ][ pdf ][ bib ]

The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective Chi-Heng Lin, Chiraag Kaushik, Eva L. Dyer, Vidya Muthukumar , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy , 2024. [ abs ][ pdf ][ bib ]

Minimax Rates for High-Dimensional Random Tessellation Forests Eliza O'Reilly, Ngoc Mai Tran , 2024. [ abs ][ pdf ][ bib ]

Nonparametric Estimation of Non-Crossing Quantile Regression Process with Deep ReQU Neural Networks Guohao Shen, Yuling Jiao, Yuanyuan Lin, Joel L. Horowitz, Jian Huang , 2024. [ abs ][ pdf ][ bib ]

Spatial meshing for general Bayesian multivariate models Michele Peruzzi, David B. Dunson , 2024. [ abs ][ pdf ][ bib ]      [ code ]

A Semi-parametric Estimation of Personalized Dose-response Function Using Instrumental Variables Wei Luo, Yeying Zhu, Xuekui Zhang, Lin Lin , 2024. [ abs ][ pdf ][ bib ]

Learning Non-Gaussian Graphical Models via Hessian Scores and Triangular Transport Ricardo Baptista, Rebecca Morrison, Olivier Zahm, Youssef Marzouk , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On the Learnability of Out-of-distribution Detection Zhen Fang, Yixuan Li, Feng Liu, Bo Han, Jie Lu , 2024. [ abs ][ pdf ][ bib ]

Win: Weight-Decay-Integrated Nesterov Acceleration for Faster Network Training Pan Zhou, Xingyu Xie, Zhouchen Lin, Kim-Chuan Toh, Shuicheng Yan , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains Yicheng Li, Zixiong Yu, Guhan Chen, Qian Lin , 2024. [ abs ][ pdf ][ bib ]

Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions Maksim Velikanov, Dmitry Yarotsky , 2024. [ abs ][ pdf ][ bib ]

ptwt - The PyTorch Wavelet Toolbox Moritz Wolter, Felix Blanke, Jochen Garcke, Charles Tapley Hoyt , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Choosing the Number of Topics in LDA Models – A Monte Carlo Comparison of Selection Criteria Victor Bystrov, Viktoriia Naboka-Krell, Anna Staszewska-Bystrova, Peter Winker , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Functional Directed Acyclic Graphs Kuang-Yao Lee, Lexin Li, Bing Li , 2024. [ abs ][ pdf ][ bib ]

Unlabeled Principal Component Analysis and Matrix Completion Yunzhen Yao, Liangzu Peng, Manolis C. Tsakiris , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Distributed Estimation on Semi-Supervised Generalized Linear Model Jiyuan Tu, Weidong Liu, Xiaojun Mao , 2024. [ abs ][ pdf ][ bib ]

Towards Explainable Evaluation Metrics for Machine Translation Christoph Leiter, Piyawat Lertvittayakumjorn, Marina Fomicheva, Wei Zhao, Yang Gao, Steffen Eger , 2024. [ abs ][ pdf ][ bib ]

Differentially private methods for managing model uncertainty in linear regression Víctor Peña, Andrés F. Barrientos , 2024. [ abs ][ pdf ][ bib ]

Data Summarization via Bilevel Optimization Zalán Borsos, Mojmír Mutný, Marco Tagliasacchi, Andreas Krause , 2024. [ abs ][ pdf ][ bib ]

Pareto Smoothed Importance Sampling Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao, Jonah Gabry , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Policy Gradient Methods in the Presence of Symmetries and State Abstractions Prakash Panangaden, Sahand Rezaei-Shoshtari, Rosie Zhao, David Meger, Doina Precup , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Scaling Instruction-Finetuned Language Models Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei , 2024. [ abs ][ pdf ][ bib ]

Tangential Wasserstein Projections Florian Gunsilius, Meng Hsuan Hsieh, Myung Jin Lee , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Learnability of Linear Port-Hamiltonian Systems Juan-Pablo Ortega, Daiying Yin , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning Ariyan Bighashdel, Daan de Geus, Pavol Jancura, Gijs Dubbelman , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On Unbiased Estimation for Partially Observed Diffusions Jeremy Heng, Jeremie Houssineau, Ajay Jasra , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Improving Lipschitz-Constrained Neural Networks by Learning Activation Functions Stanislas Ducotterd, Alexis Goujon, Pakshal Bohra, Dimitris Perdios, Sebastian Neumayer, Michael Unser , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Mathematical Framework for Online Social Media Auditing Wasim Huleihel, Yehonathan Refael , 2024. [ abs ][ pdf ][ bib ]

An Embedding Framework for the Design and Analysis of Consistent Polyhedral Surrogates Jessie Finocchiaro, Rafael M. Frongillo, Bo Waggoner , 2024. [ abs ][ pdf ][ bib ]

Low-rank Variational Bayes correction to the Laplace method Janet van Niekerk, Haavard Rue , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Scaling the Convex Barrier with Sparse Dual Algorithms Alessandro De Palma, Harkirat Singh Behl, Rudy Bunel, Philip H.S. Torr, M. Pawan Kumar , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Causal-learn: Causal Discovery in Python Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, Kun Zhang , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Decomposed Linear Dynamical Systems (dLDS) for learning the latent components of neural dynamics Noga Mudrik, Yenho Chen, Eva Yezerets, Christopher J. Rozell, Adam S. Charles , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Existence and Minimax Theorems for Adversarial Surrogate Risks in Binary Classification Natalie S. Frank, Jonathan Niles-Weed , 2024. [ abs ][ pdf ][ bib ]

Data Thinning for Convolution-Closed Distributions Anna Neufeld, Ameer Dharamshi, Lucy L. Gao, Daniela Witten , 2024. [ abs ][ pdf ][ bib ]      [ code ]

A projected semismooth Newton method for a class of nonconvex composite programs with strong prox-regularity Jiang Hu, Kangkang Deng, Jiayuan Wu, Quanzheng Li , 2024. [ abs ][ pdf ][ bib ]

Revisiting RIP Guarantees for Sketching Operators on Mixture Models Ayoub Belhadji, Rémi Gribonval , 2024. [ abs ][ pdf ][ bib ]

Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization Daniel LeJeune, Jiayu Liu, Reinhard Heckel , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks Dong-Young Lim, Sotirios Sabanis , 2024. [ abs ][ pdf ][ bib ]

Axiomatic effect propagation in structural causal models Raghav Singal, George Michailidis , 2024. [ abs ][ pdf ][ bib ]

Optimal First-Order Algorithms as a Function of Inequalities Chanwoo Park, Ernest K. Ryu , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Resource-Efficient Neural Networks for Embedded Systems Wolfgang Roth, Günther Schindler, Bernhard Klein, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani , 2024. [ abs ][ pdf ][ bib ]

Trained Transformers Learn Linear Models In-Context Ruiqi Zhang, Spencer Frei, Peter L. Bartlett , 2024. [ abs ][ pdf ][ bib ]

Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh , 2024. [ abs ][ pdf ][ bib ]

Efficient Modality Selection in Multimodal Learning Yifei He, Runxiang Cheng, Gargi Balasubramaniam, Yao-Hung Hubert Tsai, Han Zhao , 2024. [ abs ][ pdf ][ bib ]

A Multilabel Classification Framework for Approximate Nearest Neighbor Search Ville Hyvönen, Elias Jääsaari, Teemu Roos , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Probabilistic Forecasting with Generative Networks via Scoring Rule Minimization Lorenzo Pacchiardi, Rilwan A. Adewoyin, Peter Dueben, Ritabrata Dutta , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Multiple Descent in the Multiple Random Feature Model Xuran Meng, Jianfeng Yao, Yuan Cao , 2024. [ abs ][ pdf ][ bib ]

Mean-Square Analysis of Discretized Itô Diffusions for Heavy-tailed Sampling Ye He, Tyler Farghly, Krishnakumar Balasubramanian, Murat A. Erdogdu , 2024. [ abs ][ pdf ][ bib ]

Invariant and Equivariant Reynolds Networks Akiyoshi Sannai, Makoto Kawano, Wataru Kumagai , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Personalized PCA: Decoupling Shared and Unique Features Naichen Shi, Raed Al Kontar , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee George H. Chen , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel , 2024. [ abs ][ pdf ][ bib ]

Convergence for nonconvex ADMM, with applications to CT imaging Rina Foygel Barber, Emil Y. Sidky , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Distributed Gaussian Mean Estimation under Communication Constraints: Optimal Rates and Communication-Efficient Algorithms T. Tony Cai, Hongji Wei , 2024. [ abs ][ pdf ][ bib ]

Sparse NMF with Archetypal Regularization: Computational and Robustness Properties Kayhan Behdin, Rahul Mazumder , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Deep Network Approximation: Beyond ReLU to Diverse Activation Functions Shijun Zhang, Jianfeng Lu, Hongkai Zhao , 2024. [ abs ][ pdf ][ bib ]

Effect-Invariant Mechanisms for Policy Generalization Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja, Susan Murphy, Jonas Peters , 2024. [ abs ][ pdf ][ bib ]

Pygmtools: A Python Graph Matching Toolkit Runzhong Wang, Ziao Guo, Wenzheng Pan, Jiale Ma, Yikai Zhang, Nan Yang, Qi Liu, Longxuan Wei, Hanxue Zhang, Chang Liu, Zetian Jiang, Xiaokang Yang, Junchi Yan , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Heterogeneous-Agent Reinforcement Learning Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Sample-efficient Adversarial Imitation Learning Dahuin Jung, Hyungyu Lee, Sungroh Yoon , 2024. [ abs ][ pdf ][ bib ]

Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi , 2024. [ abs ][ pdf ][ bib ]

Rates of convergence for density estimation with generative adversarial networks Nikita Puchkin, Sergey Samsonov, Denis Belomestny, Eric Moulines, Alexey Naumov , 2024. [ abs ][ pdf ][ bib ]

Additive smoothing error in backward variational inference for general state-space models Mathis Chagneux, Elisabeth Gassiat, Pierre Gloaguen, Sylvain Le Corff , 2024. [ abs ][ pdf ][ bib ]

Optimal Bump Functions for Shallow ReLU networks: Weight Decay, Depth Separation, Curse of Dimensionality Stephan Wojtowytsch , 2024. [ abs ][ pdf ][ bib ]

Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees Alexander Terenin, David R. Burt, Artem Artemev, Seth Flaxman, Mark van der Wilk, Carl Edward Rasmussen, Hong Ge , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On Tail Decay Rate Estimation of Loss Function Distributions Etrit Haxholli, Marco Lorenzi , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces Hao Liu, Haizhao Yang, Minshuo Chen, Tuo Zhao, Wenjing Liao , 2024. [ abs ][ pdf ][ bib ]

Post-Regularization Confidence Bands for Ordinary Differential Equations Xiaowu Dai, Lexin Li , 2024. [ abs ][ pdf ][ bib ]

On the Generalization of Stochastic Gradient Descent with Momentum Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher, Ashish Khisti, Ben Liang , 2024. [ abs ][ pdf ][ bib ]

Pursuit of the Cluster Structure of Network Lasso: Recovery Condition and Non-convex Extension Shotaro Yagishita, Jun-ya Gotoh , 2024. [ abs ][ pdf ][ bib ]

Iterate Averaging in the Quest for Best Test Error Diego Granziol, Nicholas P. Baskerville, Xingchen Wan, Samuel Albanie, Stephen Roberts , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Nonparametric Inference under B-bits Quantization Kexuan Li, Ruiqi Liu, Ganggang Xu, Zuofeng Shang , 2024. [ abs ][ pdf ][ bib ]

Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box Ryan Giordano, Martin Ingram, Tamara Broderick , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On Sufficient Graphical Models Bing Li, Kyongwon Kim , 2024. [ abs ][ pdf ][ bib ]

Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond Nathan Kallus, Xiaojie Mao, Masatoshi Uehara , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks Sebastian Neumayer, Lénaïc Chizat, Michael Unser , 2024. [ abs ][ pdf ][ bib ]

Improving physics-informed neural networks with meta-learned optimization Alex Bihlo , 2024. [ abs ][ pdf ][ bib ]

A Comparison of Continuous-Time Approximations to Stochastic Gradient Descent Stefan Ankirchner, Stefan Perko , 2024. [ abs ][ pdf ][ bib ]

Critically Assessing the State of the Art in Neural Network Verification Matthias König, Annelot W. Bosman, Holger H. Hoos, Jan N. van Rijn , 2024. [ abs ][ pdf ][ bib ]

Estimating the Minimizer and the Minimum Value of a Regression Function under Passive Design Arya Akhavan, Davit Gogolashvili, Alexandre B. Tsybakov , 2024. [ abs ][ pdf ][ bib ]

Modeling Random Networks with Heterogeneous Reciprocity Daniel Cirkovic, Tiandong Wang , 2024. [ abs ][ pdf ][ bib ]

Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment Zixian Yang, Xin Liu, Lei Ying , 2024. [ abs ][ pdf ][ bib ]

On Efficient and Scalable Computation of the Nonparametric Maximum Likelihood Estimator in Mixture Models Yangjing Zhang, Ying Cui, Bodhisattva Sen, Kim-Chuan Toh , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Decorrelated Variable Importance Isabella Verdinelli, Larry Wasserman , 2024. [ abs ][ pdf ][ bib ]

Model-Free Representation Learning and Exploration in Low-Rank MDPs Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal , 2024. [ abs ][ pdf ][ bib ]

Seeded Graph Matching for the Correlated Gaussian Wigner Model via the Projected Power Method Ernesto Araya, Guillaume Braun, Hemant Tyagi , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization Shicong Cen, Yuting Wei, Yuejie Chi , 2024. [ abs ][ pdf ][ bib ]

Power of knockoff: The impact of ranking algorithm, augmented design, and symmetric statistic Zheng Tracy Ke, Jun S. Liu, Yucong Ma , 2024. [ abs ][ pdf ][ bib ]

Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction Yuze Han, Guangzeng Xie, Zhihua Zhang , 2024. [ abs ][ pdf ][ bib ]

On Truthing Issues in Supervised Classification Jonathan K. Su , 2024. [ abs ][ pdf ][ bib ]



10 top AI and machine learning trends for 2024

Custom enterprise models, open source AI, multimodal AI -- learn about the top AI and machine learning trends for 2024 and how they promise to transform the industry.

Lev Craig, Site Editor

After the launch of ChatGPT in November 2022, 2023 marked a turning point in artificial intelligence. The past year's developments, from a vibrant open source landscape to sophisticated multimodal models, have laid the groundwork for significant advances in AI.

But although generative AI continues to captivate the tech world, attitudes are becoming more nuanced and mature as organizations shift their focus from experimentation to real-world initiatives. This year's trends reflect a deepening sophistication and caution in AI development and deployment strategies, with an eye to ethics, safety and the evolving regulatory landscape.

Here are the top 10 AI and machine learning trends to prepare for in 2024.

1. Multimodal AI

Multimodal AI goes beyond traditional single-mode data processing to encompass multiple input types, such as text, images and sound -- a step toward mimicking the human ability to process diverse sensory information.

"The interfaces of the world are multimodal," said Mark Chen, head of frontiers research at OpenAI, in a November 2023 presentation at the conference EmTech MIT. "We want our models to see what we see and hear what we hear, and we want them to also generate content that appeals to more than one of our senses."


The multimodal capabilities in OpenAI's GPT-4 model enable the software to respond to visual and audio input. In his talk, Chen gave the example of taking photos of the inside of a fridge and asking ChatGPT to suggest a recipe based on the ingredients in the photo. The interaction could even encompass an audio element if using ChatGPT's voice mode to pose the request aloud.
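To make the fridge-photo example concrete, the sketch below packages text and an inline image into a single chat message. It follows the content-parts shape used by OpenAI-style chat APIs; the function name and the fake image bytes are illustrative, and exact field names vary by provider, so treat this as an assumption-laden sketch rather than a definitive client implementation.

```python
import base64

def build_multimodal_message(prompt_text: str, image_bytes: bytes,
                             mime: str = "image/jpeg") -> dict:
    """Package text plus an inline image into one chat message.

    Uses the content-parts style of OpenAI-like chat APIs; field names
    may differ for other providers.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt_text},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }

# e.g. a photo of the inside of a fridge, plus a recipe request
message = build_multimodal_message(
    "Suggest a recipe based on the ingredients in this photo.",
    b"\xff\xd8...fake jpeg bytes...",  # placeholder, not a real image
)
```

The resulting dictionary would be sent as one element of the `messages` list in a chat request; a voice interaction would add an audio part in the same way.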

[Image: A presenter onstage with 'MIT Technology Review' projected on the wall; a presentation slide reads 'Multimodal brings us closer to AGI.']

Although most generative AI initiatives today are text-based, "the real power of these capabilities is going to be when you can marry up text and conversation with images and video, cross-pollinate all three of those, and apply those to a variety of businesses," said Matt Barrington, Americas emerging technologies leader at EY.

Multimodal AI's real-world applications are diverse and expanding. In healthcare, for example, multimodal models can analyze medical images in light of patient history and genetic information to improve diagnostic accuracy. At the job function level, multimodal models can expand what various employees can do by extending basic design and coding capabilities to individuals without a formal background in those areas.

"I can't draw to save my life," Barrington said. "Well, now I can. I'm decent with language, so ... I can plug into a capability like [image generation], and some of those ideas that were in my head that I could never physically draw, I can have AI do."

Moreover, introducing multimodal capabilities could strengthen models by offering them new data to learn from. "As our models get better and better at modeling language and start to hit the limits of what they can learn from language, we want to provide the models with raw inputs from the world so that they can perceive the world on their own and draw their own inferences from things like video or audio data," Chen said.

2. Agentic AI

Agentic AI marks a significant shift from reactive to proactive AI. AI agents are advanced systems that exhibit autonomy, proactivity and the ability to act independently. Unlike traditional AI systems, which mainly respond to user inputs and follow predetermined programming, AI agents are designed to understand their environment, set goals and act to achieve those objectives without direct human intervention.

For example, in environmental monitoring, an AI agent could be trained to collect data, analyze patterns and initiate preventive actions in response to hazards such as early signs of a forest fire. Likewise, a financial AI agent could actively manage an investment portfolio using adaptive strategies that react to changing market conditions in real time.
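The monitoring example boils down to a perceive-decide-act loop. The toy agent below shows that loop in miniature; the thresholds, sensor readings and action names are invented for illustration and are nothing like a real fire-detection system.

```python
class MonitoringAgent:
    """Toy perceive -> decide -> act loop for environmental monitoring.

    Thresholds and action names are illustrative assumptions only.
    """

    def __init__(self, temp_limit: float = 45.0, humidity_floor: float = 15.0):
        self.temp_limit = temp_limit
        self.humidity_floor = humidity_floor
        self.alerts = []  # record of conditions that triggered action

    def step(self, temperature_c: float, humidity_pct: float) -> str:
        # decide: crude heuristic for early fire risk (hot and very dry)
        if temperature_c > self.temp_limit and humidity_pct < self.humidity_floor:
            # act: stand-in for autonomously initiating a preventive action
            self.alerts.append((temperature_c, humidity_pct))
            return "dispatch_alert"
        return "continue_monitoring"

agent = MonitoringAgent()
actions = [agent.step(t, h) for t, h in [(25.0, 60.0), (52.0, 9.0), (30.0, 40.0)]]
```

A production agent would replace the hand-tuned rule with a learned model and the return strings with real effectors, but the autonomy lies in the same place: the loop decides and acts without a human prompting each step.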

"2023 was the year of being able to chat with an AI," wrote computer scientist Peter Norvig, a fellow at Stanford's Human-Centered AI Institute, in a recent blog post . "In 2024, we'll see the ability for agents to get stuff done for you. Make reservations, plan a trip, connect to other services."

In addition, combining agentic and multimodal AI could open up new possibilities. In the aforementioned presentation, Chen gave the example of an application designed to identify the contents of an uploaded image. Previously, someone looking to build such an application would have needed to train their own image recognition model and then figure out how to deploy it. But with multimodal, agentic models, this could all be accomplished through natural language prompting.

"I really think that multimodal together with GPTs will open up the no-code development of computer vision applications, just in the same way that prompting opened up the no-code development of a lot of text-based applications," Chen said.

3. Open source AI

Building large language models and other powerful generative AI systems is an expensive process that requires enormous amounts of compute and data. But using an open source model enables developers to build on top of others' work, reducing costs and expanding AI access. Open source AI is publicly available, typically for free, enabling organizations and researchers to contribute to and build on existing code.

GitHub data from the past year shows a remarkable increase in developer engagement with AI, particularly generative AI. In 2023, generative AI projects entered the top 10 most popular projects across the code hosting platform for the first time, with projects such as Stable Diffusion and AutoGPT pulling in thousands of first-time contributors.

Early in the year, open source generative models were limited in number, and their performance often lagged behind proprietary options such as ChatGPT. But the landscape broadened significantly over the course of 2023 to include powerful open source contenders such as Meta's Llama 2 and Mistral AI's Mixtral models. This could shift the dynamics of the AI landscape in 2024 by providing smaller, less resourced entities with access to sophisticated AI models and tools that were previously out of reach.

"It gives everyone easy, fairly democratized access, and it's great for experimentation and exploration," Barrington said.

Open source approaches can also encourage transparency and ethical development, as more eyes on the code means a greater likelihood of identifying biases, bugs and security vulnerabilities. But experts have also expressed concerns about the misuse of open source AI to create disinformation and other harmful content . In addition, building and maintaining open source is difficult even for traditional software, let alone complex and compute-intensive AI models.

4. Retrieval-augmented generation

Although generative AI tools were widely adopted in 2023, they continue to be plagued by the problem of hallucinations: plausible-sounding but incorrect responses to users' queries. This limitation has presented a roadblock to enterprise adoption, where hallucinations in business-critical or customer-facing scenarios could be catastrophic. Retrieval-augmented generation (RAG) has emerged as a technique for reducing hallucinations, with potentially profound implications for enterprise AI adoption.

RAG blends text generation with information retrieval to enhance the accuracy and relevance of AI-generated content. It enables LLMs to access external information, helping them produce more accurate and contextually aware responses. Bypassing the need to store all knowledge directly in the LLM also reduces model size, which increases speed and lowers costs.

"You can use RAG to go gather a ton of unstructured information, documents, etc., [and] feed it into a model without having to fine-tune or custom-train a model," Barrington said.

These benefits are particularly enticing for enterprise applications where up-to-date factual knowledge is crucial. For example, businesses can use RAG with foundation models to create more efficient and informative chatbots and virtual assistants.
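A minimal sketch of the retrieval half of RAG: rank a toy document store against the query and splice the best match into the prompt. The documents and the bag-of-words cosine scoring are stand-ins; a production system would use learned embeddings in a vector database, and the prompt template is an assumption, not a standard.

```python
import math
from collections import Counter

# Toy knowledge base; a real deployment would index documents with
# learned embeddings in a vector store.
DOCS = [
    "Customers may return items within 30 days with a receipt.",
    "Standard shipping takes 5 to 7 business days.",
    "All appliances carry a two-year limited warranty.",
]

def _bow(text: str) -> Counter:
    # bag-of-words vector with simple punctuation stripping
    return Counter(w.strip(".,?!").lower() for w in text.split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    q = _bow(query)
    return sorted(DOCS, key=lambda d: _cosine(q, _bow(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Ground the LLM in retrieved text instead of its parametric memory.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("Can I return an item for a refund?")
```

The prompt, not the model weights, now carries the up-to-date facts, which is why RAG reduces hallucinations without fine-tuning.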

5. Customized enterprise generative AI models

Massive, general-purpose tools such as Midjourney and ChatGPT have attracted the most attention among consumers exploring generative AI. But for business use cases, smaller, narrow-purpose models could prove to have the most staying power, driven by the growing demand for AI systems that can meet niche requirements.

While creating a new model from scratch is a possibility, it's a resource-intensive proposition that will be out of reach for many organizations. To build customized generative AI, most organizations instead modify existing AI models -- for example, tweaking their architecture or fine-tuning on a domain-specific data set. This can be cheaper than either building a new model from the ground up or relying on API calls to a public LLM.
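The modify-rather-than-rebuild idea can be shown on a deliberately tiny scale: start from "pretrained" weights and take a few gradient steps on domain-specific pairs instead of training from scratch. The model, data and hyperparameters below are invented for illustration; real fine-tuning operates on neural networks with parameter-efficient methods, but the mechanism is the same.

```python
# "Pretrained" generic model: y ≈ 2x. The hypothetical domain behaves more
# like y ≈ 3x, so a few gradient steps on domain data adapt the weights
# cheaply instead of training a model from scratch.
pretrained_w, pretrained_b = 2.0, 0.0
domain_data = [(1.0, 3.1), (2.0, 6.05), (3.0, 9.02)]  # invented domain pairs

def mse(w: float, b: float, data) -> float:
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fine_tune(w: float, b: float, data, lr: float = 0.02, steps: int = 500):
    # plain gradient descent on mean squared error
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * gw, b - lr * gb
    return w, b

tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, domain_data)
```

Only a handful of cheap updates are needed because the starting point is already close, which is the economic argument for adapting existing models.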

"Calls to GPT-4 as an API, just as an example, are very expensive, both in terms of cost and in terms of latency -- how long it can actually take to return a result," said Shane Luke, vice president of AI and machine learning at Workday. "We are working a lot ... on optimizing so that we have the same capability, but it's very targeted and specific. And so it can be a much smaller model that's more manageable."

The key advantage of customized generative AI models is their ability to cater to niche markets and user needs. Tailored generative AI tools can be built for almost any scenario, from customer support to supply chain management to document review. This is especially relevant for sectors with highly specialized terminology and practices, such as healthcare, finance and legal.

In many business use cases, the most massive LLMs are overkill. Although ChatGPT might be the state of the art for a consumer-facing chatbot designed to handle any query, "it's not the state of the art for smaller enterprise applications," Luke said.

Barrington expects to see enterprises exploring a more diverse range of models in the coming year as AI developers' capabilities begin to converge. "We're expecting, over the next year or two, for there to be a much higher degree of parity across the models -- and that's a good thing," he said.

On a smaller scale, Luke has seen a similar scenario play out at Workday, which provides a set of AI services for teams to experiment with internally. Although employees started out using mostly OpenAI services, Luke said, he's gradually seen a shift toward a mix of models from various providers, including Google and AWS.

Building a customized model rather than using an off-the-shelf public tool often also improves privacy and security, as it gives organizations greater control over their data. Luke gave the example of building a model for Workday tasks that involve handling sensitive personal data, such as disability status and health history. "Those aren't things that we're going to want to send out to a third party," he said. "Our customers generally wouldn't be comfortable with that."

In light of these privacy and security benefits, stricter AI regulation in the coming years could push organizations to focus their energies on proprietary models, explained Gillian Crossan, risk advisory principal and global technology sector leader at Deloitte.

"It's going to encourage enterprises to focus more on private models that are proprietary, that are domain-specific, rather than focus on these large language models that are trained with data from all over the internet and everything that that brings with it," she said.

6. Need for AI and machine learning talent

Designing, training and testing a machine learning model is no easy feat -- much less pushing it to production and maintaining it in a complex organizational IT environment. It's no surprise, then, that the growing need for AI and machine learning talent is expected to continue into 2024 and beyond.

"The market is still really hot around talent," Luke said. "It's very easy to get a job in this space."

In particular, as AI and machine learning become more integrated into business operations, there's a growing need for professionals who can bridge the gap between theory and practice. This requires the ability to deploy, monitor and maintain AI systems in real-world settings -- a discipline often referred to as MLOps, short for machine learning operations.

In a recent O'Reilly report, respondents cited AI programming, data analysis and statistics, and operations for AI and machine learning as the top three skills their organizations needed for generative AI projects. These types of skills, however, are in short supply. "That's going to be one of the challenges around AI -- to be able to have the talent readily available," Crossan said.

In 2024, look for organizations to seek out talent with these types of skills -- and not just big tech companies. With IT and data nearly ubiquitous as business functions and AI initiatives rising in popularity, building internal AI and machine learning capabilities is poised to be the next stage in digital transformation.

Crossan also emphasized the importance of diversity in AI initiatives at every level, from technical teams building models up to the board. "One of the big issues with AI and the public models is the amount of bias that exists in the training data," she said. "And unless you have that diverse team within your organization that is challenging the results and challenging what you see, you are going to potentially end up in a worse place than you were before AI."

7. Shadow AI

As employees across job functions become interested in generative AI, organizations are facing the issue of shadow AI: use of AI within an organization without explicit approval or oversight from the IT department. This trend is becoming increasingly prevalent as AI becomes more accessible, enabling even nontechnical workers to use it independently.

Shadow AI typically arises when employees need quick solutions to a problem or want to explore new technology faster than official channels allow. This is especially common for easy-to-use AI chatbots, which employees can try out in their web browsers with little difficulty -- without going through IT review and approval processes.

On the plus side, exploring ways to use these emerging technologies evinces a proactive, innovative spirit. But it also carries risk, since end users often lack relevant information on security, data privacy and compliance. For example, a user might feed trade secrets into a public-facing LLM without realizing that doing so exposes that sensitive information to third parties.

"Once something gets out into these public models, you cannot pull it back," Barrington said. "So there's a bit of a fear factor and risk angle that's appropriate for most enterprises, regardless of sector, to think through."

Like shadow IT more broadly, shadow AI exposes organizations to higher costs, increased risk, interdepartmental inconsistency and a loss of control by IT functions.

In 2024, organizations will need to take steps to manage shadow AI through governance frameworks that balance supporting innovation with protecting privacy and security. This could include setting clear acceptable AI use policies and providing approved platforms, as well as encouraging collaboration between IT and business leaders to understand how various departments want to use AI.
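One concrete governance step is to express the acceptable-use policy as data that a gateway or browser plugin could enforce, rather than as a document employees must remember. The sketch below shows the idea; the tool names and data classifications are hypothetical examples, not any real organization's policy.

```python
# Hypothetical acceptable-use policy for generative AI tools, expressed as
# data so tooling can enforce it. All names here are invented examples.
POLICY = {
    "approved_tools": {"internal-llm", "vendor-chatbot-enterprise"},
    "blocked_data_tags": {"trade_secret", "customer_pii", "health_record"},
}

def check_request(tool: str, data_tags: set[str]) -> str:
    """Return 'allowed' or a denial reason for a proposed AI interaction."""
    if tool not in POLICY["approved_tools"]:
        return "denied: tool not on the approved list"
    if data_tags & POLICY["blocked_data_tags"]:
        return "denied: request contains restricted data"
    return "allowed"
```

In practice the data tags would come from a data-classification service rather than the user, but even this simple shape makes the policy auditable and keeps trade secrets from reaching public models by default.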

"The reality is, everybody's using it," Barrington said, in reference to recent EY research finding that 90% of respondents used AI at work. "Whether you like it or not, your people are using it today, so you should figure out how to align them to ethical and responsible use of it."

8. A generative AI reality check

As organizations progress from the initial excitement surrounding generative AI to actual adoption and integration, they're likely to face a reality check in 2024 -- a phase often referred to as the "trough of disillusionment" in the Gartner Hype Cycle.

"We're definitely seeing a rapid shift from what we've been calling this experimentation phase into [asking], 'How do I run this at scale across my enterprise?'" Barrington said.

As early enthusiasm begins to wane, organizations are confronting generative AI's limitations, such as output quality, security and ethics concerns, and integration difficulties with existing systems and workflows. The complexity of implementing and scaling AI in a business environment is often underestimated, and tasks such as ensuring data quality, training models and maintaining AI systems in production can be more challenging than initially anticipated.

"It's actually not very easy to build a generative AI application and put it into production in a real product setting," Luke said.

The silver lining is that these growing pains, while unpleasant in the short term, could result in a healthier, more tempered outlook in the long run. Moving past this phase will require setting realistic expectations for AI and developing a more nuanced understanding of what AI can and can't do. AI projects should be clearly tied to business goals and practical use cases, with a clear plan in place for measuring outcomes.

"If you have very loose use cases that are not clearly defined, that's probably what's going to hold you up the most," Crossan said.

9. Increased attention to AI ethics and security risks

The proliferation of deepfakes and sophisticated AI-generated content is raising alarms about the potential for misinformation and manipulation in media and politics, as well as identity theft and other types of fraud. AI can also enhance the efficacy of ransomware and phishing attacks, making them more convincing, more adaptable and harder to detect.

Although efforts are underway to develop technologies for detecting AI-generated content, doing so remains challenging. Current AI watermarking techniques are relatively easy to circumvent, and existing AI detection software can be prone to false positives.

The increasing ubiquity of AI systems also highlights the importance of ensuring that they are transparent and fair -- for example, by carefully vetting training data and algorithms for bias. Crossan emphasized that these ethics and compliance considerations should be interwoven throughout the process of developing an AI strategy.

"You have to be thinking about, as an enterprise ... implementing AI, what are the controls that you're going to need?" she said. "And that starts to help you plan a bit for the regulation so that you're doing it together. You're not doing all of this experimentation with AI and then [realizing], 'Oh, now we need to think about the controls.' You do it at the same time."

Safety and ethics can also be another reason to look at smaller, more narrowly tailored models, Luke pointed out. "These smaller, tuned, domain-specific models are just far less capable than the really big ones -- and we want that," he said. "They're less likely to be able to output something that you don't want because they're just not capable of as many things."

10. Evolving AI regulation

Unsurprisingly, given these ethics and security concerns, 2024 is shaping up to be a pivotal year for AI regulation, with laws, policies and industry frameworks rapidly evolving in the U.S. and globally. Organizations will need to stay informed and adaptable in the coming year, as shifting compliance requirements could have significant implications for global operations and AI development strategies.

The EU's AI Act, on which members of the EU's Parliament and Council recently reached a provisional agreement, represents the world's first comprehensive AI law. If adopted, it would ban certain uses of AI, impose obligations for developers of high-risk AI systems and require transparency from companies using generative AI, with noncompliance potentially resulting in multimillion-dollar fines. And it's not just new legislation that could have an effect in 2024.

"Interestingly enough, the regulatory issue that I see could have the biggest impact is GDPR -- good old-fashioned GDPR -- because of the need for rectification and erasure, the right to be forgotten, with public large language models," Crossan said. "How do you control that when they're learning from massive amounts of data, and how can you assure that you've been forgotten?"

Together with the GDPR, the AI Act could position the EU as a global AI regulator, potentially influencing AI use and development standards worldwide. "They're certainly ahead of where we are in the U.S. from an AI regulatory perspective," Crossan said.

The U.S. doesn't yet have comprehensive federal legislation comparable to the EU's AI Act, but experts encourage organizations not to wait to think about compliance until formal requirements are in force. At EY, for example, "we're engaging with our clients to get ahead of it," Barrington said. Otherwise, businesses could find themselves playing catch-up when regulations do come into effect.

Beyond the ripple effects of European policy, recent activity in the U.S. executive branch also suggests how AI regulation could play out stateside. President Joe Biden's October executive order implemented new mandates, such as requiring AI developers to share safety test results with the U.S. government and imposing restrictions to protect against the risks of AI in engineering dangerous biological materials. Various federal agencies have also issued guidance targeting specific sectors, such as NIST's AI Risk Management Framework and the Federal Trade Commission's statement warning businesses against making false claims about their products' AI use.

Further complicating matters, 2024 is an election year in the U.S., and the current slate of presidential candidates shows a wide range of positions on tech policy questions. A new administration could theoretically change the executive branch's approach to AI oversight through reversing or revising Biden's executive order and nonbinding agency guidance.

Lev Craig covers AI and machine learning as the site editor for TechTarget Enterprise AI.


Analytics Drift

Top Machine Learning (ML) Research Papers Released in 2022

For every Machine Learning (ML) enthusiast, we bring you a curated list of the major breakthroughs in ML research in 2022.

Preetipadma K

Machine learning (ML) has gained significant traction in recent years owing to the disruption and development it brings to existing technologies. Every month, hundreds of ML papers from various organizations and universities are uploaded to the internet to share the latest breakthroughs in the domain. As the year ends, we bring you the top 22 ML research papers of 2022 that created a huge impact in the industry. The list does not reflect a ranking; the papers were selected on the basis of the recognition and awards they received at international machine learning conferences.

  • Bootstrapped Meta-Learning

Meta-learning is a promising field that investigates ways to enable machine learners or RL agents (including their hyperparameters) to learn how to learn more quickly and robustly, and it is a crucial study area for enhancing the efficiency of AI agents.

This 2022 ML paper presents an algorithm that teaches the meta-learner how to overcome the meta-optimization challenge and myopic meta-objectives. The algorithm's primary objective is meta-learning using gradients, which ensures improved performance; the paper also examines the potential benefits of bootstrapping. The authors highlight several interesting theoretical properties of the algorithm, and the empirical results achieve a new state of the art (SOTA) on the Atari ALE benchmark as well as improved efficiency in multitask learning.

  • Competition-level code generation with AlphaCode

One of the most exciting applications of deep learning and large language models is programming. The rising demand for coders has sparked a race to build tools that increase developer productivity and give non-developers the means to create software. However, such models still perform poorly when tested on harder, unseen problems that require more than simply translating instructions into code.

This popular 2022 ML paper introduces AlphaCode, a code generation system that achieved an average ranking in the top 54.3% in simulated evaluations of programming contests on the Codeforces platform. The paper describes the architecture, training, and evaluation of the deep learning model.

  • Restoring and attributing ancient texts using deep neural networks

The epigraphic evidence of ancient Greece -- inscriptions on durable materials such as stone and pottery -- was often already damaged by the time it was discovered, rendering the inscribed texts incomplete or illegible. Machine learning can help restore damaged inscriptions and identify their chronological and geographical origins, helping us better understand our past.

This ML paper presents Ithaca, a machine learning model built by DeepMind for the textual restoration and geographical and chronological attribution of ancient Greek inscriptions. Ithaca was trained on a database of just under 80,000 inscriptions from the Packard Humanities Institute. It achieved a 62% accuracy rate on restoration, compared with an average of 25% for historians working alone; when historians used Ithaca, their accuracy rose to 72%.

  • Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

Training large neural networks is resource-intensive partly because, at every scale, practitioners must estimate anew which hyperparameters to use. This groundbreaking 2022 ML paper proposes a novel zero-shot hyperparameter tuning paradigm for tuning massive neural networks more efficiently. The research, co-authored by Microsoft Research and OpenAI, describes µTransfer, a method that leverages the µP parametrization to transfer hyperparameters zero-shot from small models, producing near-optimal hyperparameters on large models without tuning them directly.

This method has been found to reduce the amount of trial and error in the costly process of training large neural networks. By drastically lowering the need to guess which training hyperparameters to use, it speeds up research on massive neural networks like GPT-3 and, potentially, its successors.
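To give a flavor of zero-shot transfer, here is a minimal sketch under a simplified assumption: in one of the µP-style rules, a hidden layer's learning rate scales inversely with model width, so a value tuned on a small proxy can be rescaled for a large target. `mup_hidden_lr` is a hypothetical helper for illustration only; the full µP recipe also rescales initializations and output multipliers.

```python
def mup_hidden_lr(base_lr: float, base_width: int, width: int) -> float:
    """Scale a hidden-layer learning rate inversely with width, a
    simplified muP-style transfer rule (illustrative, not the full recipe)."""
    return base_lr * base_width / width

# Tune on a narrow proxy model, then transfer to a much wider target.
proxy_lr = 1e-2                                   # found by sweeping a width-256 proxy
target_lr = mup_hidden_lr(proxy_lr, base_width=256, width=8192)
print(target_lr)  # → 0.0003125
```

The point of the paper is that, under µP, the optimum found on the proxy stays (approximately) optimal after such rescaling, so the expensive sweep never has to be repeated at full scale.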

  • PaLM: Scaling Language Modeling with Pathways 

Large neural networks trained for language generation and understanding have demonstrated outstanding results on various tasks in recent years. This trending 2022 ML paper introduced the Pathways Language Model (PaLM), a 540-billion-parameter dense decoder-only autoregressive transformer trained on 780 billion tokens of high-quality text.

PaLM is based on a standard transformer architecture, though it uses only a decoder and makes changes such as SwiGLU activations, parallel layers, multi-query attention, RoPE embeddings, shared input-output embeddings, and no bias terms. The paper describes Google's flagship model surpassing several human baselines while achieving state-of-the-art results on numerous zero-, one-, and few-shot NLP tasks.

  • Robust Speech Recognition via Large-Scale Weak Supervision

Machine learning developers have found it challenging to build robust speech-processing systems from the vast volume of audio and transcripts available on the internet. This year, OpenAI released Whisper, a new state-of-the-art (SOTA) speech-to-text model that can transcribe audio and translate it into English. It was trained on 680,000 hours of voice data gathered from the internet. According to OpenAI, the model is robust to accents, background noise, and technical terminology; it supports transcription in 99 different languages and translation from those languages into English.

The OpenAI paper notes that the authors ensured about one-third of the audio data is non-English; this diversified dataset helped the model outperform other supervised state-of-the-art models.

  • OPT: Open Pre-trained Transformer Language Models

Large language models have demonstrated extraordinary performance on numerous tasks (e.g., zero- and few-shot learning). However, these models are difficult to replicate without considerable funding due to their high computing costs. Even though the public can sometimes interact with these models through paid APIs, full research access remains limited to a small group of well-funded labs. This limited access has hindered researchers' ability to understand how and why these language models work, stalling progress on efforts to improve their robustness and mitigate ethical drawbacks such as bias and toxicity.

This popular 2022 ML paper introduces Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125 million to 175 billion parameters, which the authors share freely and responsibly with interested researchers. The largest model, OPT-175B (not included in the code repository but accessible upon request), is impressively shown to perform comparably to GPT-3 (which also has 175 billion parameters) while using just 15% of GPT-3's carbon footprint during development and training.

  • A Path Towards Autonomous Machine Intelligence

Yann LeCun is a prominent and respected researcher in the field of artificial intelligence and machine learning. In June, his much-anticipated paper "A Path Towards Autonomous Machine Intelligence" was published on OpenReview. In it, LeCun offered a number of approaches and architectures that might be combined to create self-supervised autonomous machines.

He presented a modular architecture for autonomous machine intelligence that combines various models operating as distinct elements of a machine's brain, mirroring the animal brain. Because all the modules are differentiable, they can be interconnected to power brain-like activities such as identification and environmental response. The architecture incorporates ideas such as a configurable predictive world model, behavior driven by intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.

  • LaMDA: Language Models for Dialog Applications 

Despite tremendous advances in text generation, many available chatbots remain irritating and unhelpful. This 2022 ML paper from Google describes LaMDA -- short for "Language Model for Dialogue Applications" -- the system that caused an uproar in the summer when former Google engineer Blake Lemoine alleged it was sentient. LaMDA is a family of large language models for dialog applications built on Google's Transformer architecture, known for its efficiency and speed in language tasks such as translation. Its most intriguing features are its ability to be fine-tuned on human-annotated data and its capability to consult external sources.

The model, which has 137 billion parameters, was pre-trained on 1.56 trillion words of publicly accessible conversation data and web documents. It is also fine-tuned along three objectives: quality, safety, and groundedness.

  • Privacy for Free: How does Dataset Condensation Help Privacy?

One of the primary proposals in this award-winning ML paper is to use dataset condensation methods to retain the data efficiency of model training while also providing membership privacy. The authors argue that dataset condensation, originally devised to improve training efficiency, is a better alternative to data generators for producing private data, since it offers privacy for free.

Existing data generators are used to produce differentially private data for model training to minimize unintended data leakage, but they incur high training costs or subpar generalization performance for the sake of data privacy. This study was published by Sony AI and received an Outstanding Paper Award at ICML 2022.

  • TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data

Any system for detecting time series anomalies relies on a model that converts a time series into an anomaly score at each time step. Recognizing and diagnosing anomalies in multivariate time series data is critical for modern industrial applications. Unfortunately, building a system that can promptly and reliably identify anomalous observations is challenging, owing to a shortage of anomaly labels, high data volatility, and the ultra-low inference times modern applications demand.

In this study, the authors present TranAD, a deep transformer network-based anomaly detection and diagnosis model that uses attention-based sequence encoders to perform inference quickly while remaining aware of broader temporal patterns in the data. TranAD employs adversarial training to achieve stability and focus score-based self-conditioning to enable robust multi-modal feature extraction. Extensive experiments on six publicly available datasets show that TranAD outperforms state-of-the-art baselines in detection and diagnosis with data- and time-efficient training.
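The score-then-threshold pattern such detectors rely on can be sketched in a few lines. Here a moving-average "reconstruction" stands in for TranAD's transformer encoder, which it is emphatically not; the sketch only shows how reconstruction error at each time step becomes an anomaly score.

```python
def anomaly_scores(series, window=3):
    """Toy anomaly scoring: 'reconstruct' each point as the mean of the
    previous `window` points and score it by squared reconstruction error.
    (TranAD replaces this naive predictor with transformer reconstructions,
    but the per-step score-then-threshold pattern is the same.)"""
    scores = []
    for i in range(window, len(series)):
        pred = sum(series[i - window:i]) / window
        scores.append((series[i] - pred) ** 2)
    return scores

readings = [1.0, 1.0, 1.0, 1.0, 10.0, 1.0]
print(anomaly_scores(readings))  # → [0.0, 81.0, 9.0]
```

In a real deployment, a threshold over such scores (often set from a validation split) decides which time steps get flagged as anomalous.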

  • Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding 

In the last few years, generative models called “diffusion models” have been increasingly popular. This year saw these models capture the excitement of AI enthusiasts around the world. 

Going beyond the text-to-image systems of recent years, this outstanding 2022 ML paper introduced Imagen, Google's viral text-to-image diffusion model. It achieves a new state-of-the-art FID score of 7.27 on the COCO dataset by combining the deep language understanding of transformer-based large language models with the photorealistic image-generating capabilities of diffusion models. A frozen text-only language model provides the text representation, and a diffusion model with two super-resolution upsampling stages, up to 1024×1024, produces the images. Training uses classifier-free guidance, so the model learns both conditional and unconditional generation. Another important feature of Imagen is dynamic thresholding, which stops the diffusion process from saturating parts of the image -- a behavior that degrades quality, particularly when the weight on text-conditional generation is large.
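Dynamic thresholding itself is simple enough to sketch. The paper picks a threshold s as a high percentile of the absolute sample values, clips to [-s, s], and divides by s so pixels stay in [-1, 1]. The version below works on a flat list of values and uses a naive percentile-index convention, so treat it as a 1-D illustration of the idea rather than the exact implementation.

```python
def dynamic_threshold(x, p=0.995):
    """Imagen-style dynamic thresholding on a flat list of sample values:
    take s as the p-th percentile of |x| (at least 1.0), clip to [-s, s],
    then rescale by s so every value lands in [-1, 1]."""
    abs_sorted = sorted(abs(v) for v in x)
    idx = min(int(p * (len(abs_sorted) - 1)), len(abs_sorted) - 1)
    s = max(abs_sorted[idx], 1.0)
    return [max(-s, min(s, v)) / s for v in x]

print(dynamic_threshold([0.5, -2.0, 3.0], p=0.5))  # → [0.25, -1.0, 1.0]
```

Because s adapts to each sample, heavy text-conditioning can push values far outside [-1, 1] without the clipping step flattening large regions of the image to pure black or white.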

  • No Language Left Behind: Scaling Human-Centered Machine Translation

This ML paper introduced one of Meta's most popular projects of 2022: NLLB-200. It describes how Meta built and open-sourced, at FAIR, a state-of-the-art AI model capable of translating between 200 languages, and covers every aspect of the technology: language analysis, ethical issues, impact analysis, and benchmarking.

Accessibility via language ensures that everyone can benefit from technological progress, no matter what language they speak. Meta claims that several of the languages NLLB-200 translates, such as Kamba and Lao, were not previously supported by any translation system in use. The company also created a dataset called FLORES-200 to evaluate the effectiveness of NLLB-200 and demonstrate that accurate translations are produced. According to Meta, NLLB-200 delivers translations of 44% higher quality on average than its prior model.

  • A Generalist Agent

AI pundits believe multimodality will play a huge role in the future of artificial general intelligence (AGI). One of the most talked-about ML papers of 2022, from DeepMind, introduces Gato, a generalist agent. Gato is a multi-modal, multi-task, multi-embodiment network, meaning the same neural network (a single architecture with a single set of weights) can perform many tasks while integrating inherently diverse types of inputs and outputs.

DeepMind claims the generalist agent can be improved with new data to perform even better on a wider range of tasks. The authors argue that a general-purpose agent reduces the need to hand-craft policy models for each domain, increases the volume and diversity of training data, and enables continuous advances in data, compute, and model scale. A general-purpose agent can also be viewed as a first step toward artificial general intelligence.

Gato demonstrates the versatility of transformer-based machine learning architectures by applying them to a variety of tasks. Unlike previous neural network systems tailored to play games, stack blocks with a real robot arm, read words, or caption images, Gato is versatile enough to perform all of these tasks on its own, using a single set of weights and a relatively simple architecture.

  • The Forward-Forward Algorithm: Some Preliminary Investigations 

AI pioneer Geoffrey Hinton is known for his foundational work on deep convolutional neural networks and backpropagation. In his latest paper, presented at NeurIPS 2022, Hinton proposed the forward-forward algorithm, a new learning procedure for artificial neural networks inspired by our understanding of neural activations in the brain. The approach draws on Boltzmann machines (Hinton and Sejnowski, 1986) and noise-contrastive estimation (Gutmann and Hyvärinen, 2010). According to Hinton, forward-forward, which is still experimental, replaces the forward and backward passes of backpropagation with two forward passes: one with positive data and one with negative data that the network itself can generate. The algorithm could also map more efficiently onto low-power hardware and offer a better account of cortical learning in the brain.

Without employing complicated regularizers, the algorithm achieved a 1.4 percent test error rate on the MNIST dataset in an empirical study, showing it can be competitive with backpropagation.

The paper also proposes a novel "mortal computation" model that suits the forward-forward algorithm and may help explain the brain's energy-efficient learning processes.
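The per-layer objective at the heart of forward-forward is compact enough to sketch. Hinton defines a layer's "goodness" as the sum of squared activations; training pushes goodness above a threshold on positive data and below it on negative data, with no backward pass through the rest of the network. The loss form below is a plausible logistic variant, not a line-for-line copy of the paper.

```python
import math

def goodness(activations):
    """Hinton's 'goodness' of a layer: sum of squared activations."""
    return sum(a * a for a in activations)

def ff_layer_loss(pos_acts, neg_acts, theta=2.0):
    """Sketch of a per-layer forward-forward objective: logistic losses
    push goodness above theta for positive data and below theta for
    negative data. Each layer optimizes this locally -- no backprop
    through other layers."""
    loss_pos = math.log1p(math.exp(-(goodness(pos_acts) - theta)))
    loss_neg = math.log1p(math.exp(goodness(neg_acts) - theta))
    return loss_pos + loss_neg

print(goodness([1.0, 2.0]))  # → 5.0
```

Because each layer only needs its own activations on the two passes, the procedure avoids storing the forward computation for a backward sweep, which is what makes it attractive for analog or neuromorphic hardware.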

  • Focal Modulation Networks

In humans, the ciliary muscles alter the shape of the eye's lens, and hence its radius of curvature, to focus on near or distant objects; changing the lens shape changes its focal length. Mimicking this focal modulation behavior in computer vision systems is tricky.

This machine learning paper introduces FocalNet, an attention-free architecture in which a focal modulation mechanism replaces self-attention for modeling token interactions in vision. Its attention-free design outperforms state-of-the-art self-attention (SA) methods on a wide range of visual benchmarks. According to the paper, focal modulation consists of three parts:

a. hierarchical contextualization, implemented using a stack of depth-wise convolutional layers, to encode visual contexts from close-up to a great distance; 

b. gated aggregation to selectively gather contexts for each query token based on its content; and  

c. element-wise modulation or affine modification to inject the gathered context into the query.
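The three parts above can be sketched on a 1-D toy sequence. Everything below is a deliberate simplification for illustration: a windowed mean stands in for the depth-wise convolutions, and the gates are fixed scalars, whereas the paper derives per-query gates from token content.

```python
def local_mean(x, k):
    """Depth-wise-convolution stand-in: average over a (2k+1)-wide window."""
    n = len(x)
    return [sum(x[max(0, i - k):min(n, i + k + 1)])
            / (min(n, i + k + 1) - max(0, i - k)) for i in range(n)]

def focal_modulation_1d(x, gates=(0.5, 0.3, 0.2), levels=(1, 2, 4)):
    # (a) hierarchical contextualization at growing receptive fields
    ctx = [local_mean(x, k) for k in levels]
    # (b) gated aggregation of the per-level contexts (scalar gates here;
    #     FocalNet computes content-dependent gates per query token)
    agg = [sum(g * c[i] for g, c in zip(gates, ctx)) for i in range(len(x))]
    # (c) element-wise modulation: inject the aggregated context into each query
    return [q * m for q, m in zip(x, agg)]
```

The key structural difference from self-attention is visible even in the toy: context is gathered around each position by fixed-shape aggregation rather than by query-key dot products over all pairs of tokens.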

  • Learning inverse folding from millions of predicted structures

The field of structural biology is being fundamentally changed by cutting-edge machine learning, protein structure prediction, and ultrafast structural aligners. Time and money are no longer obstacles to obtaining precise protein models and extensively annotating their functions. However, determining a protein sequence from its backbone atom coordinates has remained a challenge, and machine learning approaches to this inverse folding problem have so far been constrained by the limited number of experimentally determined protein structures.

In this ICML Outstanding Paper (runner-up), the authors tackle the problem by expanding the training data by almost three orders of magnitude, using AlphaFold2 to predict structures for 12 million protein sequences. With this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers recovers the native sequence on structurally held-out backbones 51% of the time, and recovers buried residues 72% of the time -- an improvement of over 10 percentage points over previous methods. The approach also generalizes to a range of harder tasks, including designing protein complexes, partially masked structures, binding interfaces, and multiple conformational states.

  • MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Using video games as a training medium for AI has gained popularity within the research community, and autonomous agents have had great success in Atari games, StarCraft, Dota, and Go. Despite these advances, such agents do not generalize beyond a narrow range of activities, in contrast to humans, who continually learn from open-ended tasks.

This thought-provoking 2022 ML paper proposes MineDojo, a unique framework for embodied agent research built on the popular game Minecraft. In addition to an internet-scale knowledge base of Minecraft videos, tutorials, wiki pages, and forum discussions, MineDojo provides a simulation suite with thousands of open-ended tasks. Using MineDojo data, the authors propose a novel agent learning methodology that employs large pre-trained video-language models as a learned reward function. Without requiring an explicitly engineered dense shaping reward, the MineDojo agent can perform a wide range of open-ended tasks specified in free-form language.

  • Is Out-of-Distribution Detection Learnable?

Supervised machine learning models are frequently trained under the closed-world assumption: that the distribution of the test data will resemble that of the training data. This assumption often fails in real-world deployments, causing a considerable drop in performance. While that loss is tolerable for applications like product recommendation, out-of-distribution (OOD) detection is crucial for preventing ML systems from making dangerously wrong predictions in settings where the data distribution drifts over time, such as self-driving cars.

In this paper, the authors explore the probably approximately correct (PAC) learning theory of OOD detection, previously posed as an open problem, to study when OOD detection is learnable. They first identify a necessary condition for the learnability of OOD detection, then prove several impossibility theorems about its learnability in a number of different scenarios.
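To make concrete what kind of detector the theory is about, here is the classic maximum softmax probability (MSP) baseline, which scores an input by the confidence of the classifier's top prediction. This detector is from the broader OOD literature, not a contribution of this paper; the paper asks when any such detector can be learned at all.

```python
import math

def msp_score(logits):
    """Maximum softmax probability (MSP): a common OOD baseline.
    A low MSP flags a likely out-of-distribution input."""
    m = max(logits)                        # subtract max to stabilize exp
    exps = [math.exp(l - m) for l in logits]
    return max(exps) / sum(exps)

print(round(msp_score([8.0, 0.0, 0.0]), 3))  # confident in-distribution → 0.999
print(round(msp_score([0.0, 0.0, 0.0]), 3))  # maximally uncertain → 0.333
```

An OOD pipeline then compares this score against a threshold chosen on held-out in-distribution data; the paper's impossibility theorems characterize distribution families for which no threshold rule of this kind can succeed.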

  • Gradient Descent: The Ultimate Optimizer 

Gradient descent is a popular optimization approach for training machine learning models and neural networks. The goal of any such method is to optimize parameters, but selecting a good step size is difficult, entailing lengthy and error-prone manual work. Many strategies exist for automated hyperparameter optimization, but they typically introduce yet more hyperparameters to govern the optimization process itself. In this study, MIT CSAIL and Meta researchers offer an approach that lets gradient descent optimizers such as SGD and Adam tune their hyperparameters automatically.

They propose learning the hyperparameters themselves by gradient descent, and the hyper-hyperparameters by gradient descent as well, and so on ad infinitum. The paper describes an efficient way for gradient descent optimizers to adjust their own hyperparameters, which can be stacked recursively to many levels. As these towers of gradient-based optimizers grow, they become significantly less sensitive to the choice of top-level hyperparameters, reducing the burden on the user to search for good values.
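The core idea, learning the learning rate by gradient descent, can be sketched with a single extra level. The update below uses the hypergradient d(loss)/d(lr) = -grad_t · grad_{t-1}, which is the classic hypergradient-descent trick; it is a one-level simplification, whereas the paper automates the differentiation and stacks such optimizers recursively.

```python
def sgd_with_hypergradient(grad, w, lr=0.01, beta=1e-5, steps=200):
    """SGD whose step size is itself updated by gradient descent, via the
    hypergradient d(loss)/d(lr) = -grad_t * grad_{t-1}. A one-level sketch
    of the paper's idea, which stacks these optimizers to many levels."""
    g_prev = 0.0
    for _ in range(steps):
        g = grad(w)
        lr += beta * g * g_prev   # descend on the hypergradient
        w -= lr * g               # ordinary SGD step with the adapted lr
        g_prev = g
    return w, lr

# Minimize f(w) = w**2 from w = 5.0; both w and the step size adapt.
w_end, lr_end = sgd_with_hypergradient(lambda w: 2 * w, w=5.0)
```

Starting from a deliberately small learning rate, the hypergradient grows it toward a useful value on its own, which is exactly the reduced sensitivity to the top-level choice that the paper reports.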

  • ProcTHOR: Large-Scale Embodied AI Using Procedural Generation 

Embodied AI, a growing research field influenced by recent advances in artificial intelligence, machine learning, and computer vision, aims to ground learning in agents that perceive and act within an environment. This paper proposes ProcTHOR, a framework for the procedural generation of Embodied AI environments. ProcTHOR lets researchers sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents on navigation, interaction, and manipulation tasks.

According to the authors, models trained on ProcTHOR using only RGB images, with no explicit mapping and no human task supervision, achieve state-of-the-art results on six embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the ongoing Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. The paper received an Outstanding Paper award at NeurIPS 2022.

  • A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog

Emotion Recognition in Spoken Dialog (ERSD) has recently attracted a lot of attention thanks to the growth of open conversational data. Integrating emotional states into intelligent spoken human-computer interaction has yielded excellent speech recognition systems, and recognizing emotions makes it possible to track the progress of an interaction, dynamically adjust conversational strategies, and influence outcomes (e.g., customer feedback). However, the limited size of current ERSD datasets restricts model development.

This ML paper proposes a Commonsense Knowledge Enhanced Network (CKE-Net) with a retrospective loss that hierarchically performs dialog modeling, external knowledge integration, and historical state retrospection.



MIT News | Massachusetts Institute of Technology


Study examines how machine learning boosts manufacturing

Two images of a defective syringe stopper. The top one, in visible light, shows a small pit in one end. The bottom one has a color highlight indicating how a machine sees the same defect.


Which companies deploy machine intelligence (MI) and data analytics successfully for manufacturing and operations? Why are those leading adopters so far ahead — and what can others learn from them?

MIT Machine Intelligence for Manufacturing and Operations (MIMO) and McKinsey and Company have the answer, revealed in a first-of-its-kind  Harvard Business Review article. The piece chronicles how MIMO and McKinsey partnered for a sweeping 100-company survey to explain how high-performing companies successfully wield machine learning technologies (and where others could improve).

Created by the MIT Leaders for Global Operations (LGO) program, MIMO is a research and educational program designed to boost industrial competitiveness by accelerating machine intelligence’s deployment and understanding. The goal is to “find the shortest path from data to impact,” says managing director Bruce Lawler SM ’92.

As such, the McKinsey project encapsulates MIMO’s mission of demystifying effective machine-learning use. The survey studied companies across sectors, probing their digital, data analytics, and MI tech usage; goals (ranging from efficiency to customer experience to environmental impact); and tracking. Respondents were drawn from MIT and McKinsey’s wide-ranging networks.

“The study is probably the broadest that anybody has done in the space: 100 companies and 21 performance indicators,” says Vijay D’Silva SM ’92, a senior partner at McKinsey and Company who collaborated with MIMO on the project.

Overall, those who extracted the biggest gains from digital technologies had strong governance, deployment, partnerships, MI-trained employees, and data availability. They also spent up to 60 percent more on machine learning than their competitors.

One standout company is biopharmaceutical giant Amgen, which uses deep-learning image augmentation to maximize the efficiency of its visual inspection systems. The technique pays off, increasing particle detection by 70 percent and reducing the need for manual inspections. AJ Tan PhD ’19, MBA ’21, SM ’21 was instrumental in the effort: he wrote his LGO thesis about the project, winning last year’s Best Thesis Award at graduation.

Lawler says Tan’s work exemplifies MIMO’s mission of bridging the gap between machine learning and manufacturing before it’s too late.

“We saw a need to bring these powerful new technologies into manufacturing more quickly. In the next 20 to 30 years, we’re going to add another 3 billion people to the globe, and they're going to want the lifestyles that you and I enjoy. Those typically require manufactured things. How do we get better at translating natural resources into human well-being? One of the big vehicles for doing that is manufacturing, and one of the newest tools is AI and machine learning,” he says.

For the survey, MIMO issued each company a 30-page playbook analyzing how it compared against other companies across a range of categories and metrics, from strategy to governance to data execution. This will help companies target areas of opportunity and decide where to invest. Lawler hopes this will become a longitudinal study with a wider scope and playbook each year — a vast but impactful undertaking with LGO brainpower as the driving engine.

“MIT was hugely important and critical to the piece of work and an amazing partner for us. We had talented MIT students on the team who did most of the analysis jointly with McKinsey, which improved the quality of the work as a result,” says D’Silva.

This collaborative approach is central to MIMO’s philosophy as an information convener and partner for the private sector. The goal is to drive “an effective transformation in industries that achieve not just technical goals, but also business goals and social goals,” says Duane Boning, engineering faculty director at MIT LGO and faculty lead at MIMO.

This fusion of research and collaboration is the logical next step for LGO, he says, because it’s always been at the forefront of problem-solving for global operations. Machine learning is definitely the latest big knowledge gap for many businesses, but not the first, and MIMO can teach companies how to apply it.

“[I liken] it to 30 years ago when LGO got started, when it was all about lean manufacturing principles. About 15 years ago, it was the supply chain idea. That sparked us to think — not just for our LGO students, but for the benefit of industry more broadly — for understanding this big change, for facilitating it, for doing research and getting connections into other actual research activities, we need some effort to catalyze this,” Boning says. “That’s [MIMO’s] real excitement: What are ideas that work? What are methodologies that work? What are technologies that work? And LGO students, in some sense, are the perfect vehicle to discover some of that.”


Related Links

  • Machine Intelligence for Manufacturing and Operations
  • Leaders for Global Operations
  • MIT Sloan School of Management




Research Group

Machine learning.


We study a range of research areas related to machine learning and their applications in robotics, health care, language processing, information retrieval, and more. These subjects include precision medicine, motion planning, computer vision, Bayesian inference, graphical models, and statistical inference and estimation. Our work is interdisciplinary and deeply rooted in systems and computer science theory.

Many of our researchers have affiliations with other groups at MIT, including the  Institute for Medical Engineering & Science  (IMES) and the Institute for Data, Systems and Society (IDSS).

Related Links

If you would like to contact us about our work, please refer to our members below and reach out to one of the group leads directly.

Last updated Jun 20 '18

Research Areas

Impact Areas

Group leads: Tamara Broderick, Tommi Jaakkola, Stefanie Jegelka, Leslie Kaelbling, David Sontag, Justin Solomon, Andreea Gane, Tamir Hazan, Gifford.

Selected projects:

  • Optimal transport for statistics and machine learning
  • Sensible deep learning for 3D data
  • Robust optimization in machine learning and data mining
  • Structured prediction through randomization
  • Interpretability in complex machine learning models
  • Diversity-inducing probability measures
  • Geometry in large-scale machine learning
  • Finite approximations of infinite models
  • Tractable models of sparse networks
  • Scalable Bayesian inference via adaptive data summaries
  • Learning optimal interventions
  • Learning Strategic Games
  • Scalable Bayesian inference with optimization
  • Learning from streaming network data
  • Different types of approximations for fast and accurate probabilistic inference
  • …and 12 more



September 24, 2024


New AI model breaks barriers in cross-modality machine vision learning

by Chinese Academy of Sciences


Recently, the research team led by Prof. Wang Hongqiang from the Hefei Institutes of Physical Science of the Chinese Academy of Sciences proposed a wide-ranging cross-modality machine vision AI model.

This model overcame the limitations of traditional single-domain models in handling cross-modality information and achieved new breakthroughs in cross-modality image retrieval technology.

Cross-modality machine vision is a major challenge in AI, as it involves finding consistency and complementarity between different types of data. Traditional methods focus on images and features but are limited by issues like information granularity and lack of data.

Compared to traditional methods, researchers found that detailed associations are more effective in maintaining consistency across modalities. The work is posted to the arXiv preprint server.

In the study, the team introduced a wide-ranging information mining network (WRIM-Net). This model created global region interactions to extract detailed associations across various domains, such as spatial, channel, and scale domains, emphasizing modality invariant information mining across a broad range.

Additionally, the research team guided the network to effectively extract modality-invariant information by designing a cross-modality key-instance contrastive loss. Experimental validation showed the model's effectiveness on both standard and large-scale cross-modality datasets, achieving more than 90% in several key performance metrics for the first time.
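A contrastive loss of this general shape can be sketched in plain Python. This is a minimal InfoNCE-style illustration of cross-modality contrastive learning, not WRIM-Net's actual key-instance formulation; the vectors and the temperature value below are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the cross-modality positive toward the
    anchor while pushing negatives away. Lower is better."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy embeddings: the positive is nearly aligned with the anchor,
# so the loss should be close to zero.
anchor = [1.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0]
negatives = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
loss = contrastive_loss(anchor, positive, negatives)
```

In the cross-modality setting, the anchor and positive would be embeddings of the same instance from different modalities (e.g., visible and infrared images), which is what drives the network toward modality-invariant features.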

This model can be applied in various fields of artificial intelligence, including visual traceability and retrieval as well as medical image analysis, according to the team.





New research reveals brain memory encoding can improve AI, memory therapies, and learning tools

Real-world applications of this research across various fields include education, artificial intelligence and brain-machine interfaces. (Image source: Dall-E 3)

A new research paper titled Human hippocampal and entorhinal neurons encode the temporal structure of experience explores how our brains organize memories by identifying patterns over time, even when we’re not consciously aware of them. The study specifically focused on neurons in the hippocampus and entorhinal cortex, two key brain regions involved in memory and learning.

In this study, researchers monitored brain activity in 17 patients with epilepsy who had intracranial electrodes implanted (tiny devices placed inside the brain to record its electrical activity). This allowed scientists to directly observe how neurons behave when people are exposed to patterns or sequences of events. The patients were shown around 120 images of people, animals, objects, and landmarks over 40 minutes, in a specific order. Researchers analyzed how neurons in the hippocampus (a part of the brain that helps store and retrieve memories) and the entorhinal cortex (a region that communicates with the hippocampus to process both time and space) responded to those images.

One major finding was that neurons gradually altered their activity as the patients were exposed to these image patterns, even though the participants were never told about the pattern. The neurons encoded both what the images were ("what" information) and in what order they appeared ("when" information), together forming a representation of the sequence. This process is known as encoding temporal sequences: how the brain tracks the order of events over time. Even when the images were later presented in random order, the neurons still reflected the original sequence.
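As a loose computational analogy for "encoding temporal structure," one can estimate the transition statistics of a stimulus sequence. This toy model is purely illustrative; the study measured neuronal firing, not Markov chains, and the stimulus labels below are made up.

```python
from collections import defaultdict

def transition_probs(sequence):
    """Estimate first-order transition probabilities from an observed
    sequence of stimuli -- a crude stand-in for the kind of 'what comes
    next' structure the recorded neurons appeared to pick up."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1  # count each observed a -> b transition
    return {
        a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
        for a, nexts in counts.items()
    }

# A repeating A -> B -> C pattern: after enough exposure, "A is always
# followed by B" is fully determined by the statistics alone.
seq = ["A", "B", "C"] * 10
probs = transition_probs(seq)
```

The point of the analogy is that sequence structure is recoverable from exposure statistics alone, with no explicit instruction, which mirrors how the patients' neurons tracked the order without the participants being told about it.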

The sequence of stimuli presentation (bottom) corresponded to a random order on a pyramid graph in the test. (Image source: UCLA / Nature)

Neuronal replay was another aspect of the study: during breaks, the neurons rapidly replayed the same sequence of events. This replay, which happens at a much faster rate than the original experience, is believed to help the brain consolidate, or integrate, the memory of the sequence. The researchers drew parallels between how the brain encodes space and time, suggesting that similar mechanisms are at work whether one is navigating through space (e.g., walking through a maze) or tracking the sequence of events in a timeline.

So what are this research's implications? It brings us closer to understanding the brain's ability to organize experiences into predictable patterns. Even without conscious awareness, our neurons are working to make sense of the world, organizing both space and time to help us remember and anticipate future events.

Real-world applications span several fields. In education, these findings could lead to improved learning methods by structuring material in a way that mirrors how the brain naturally processes sequences, yielding better memory retention. In healthcare, the research could guide the development of therapies for memory disorders like Alzheimer's. Artificial intelligence and machine learning systems could benefit from mimicking the brain's predictive abilities, leading to smarter and more adaptive technologies.

Brain-machine interfaces, similar to Neuralink, could leverage temporal encoding to aid individuals with neurological impairments, allowing them to better control prosthetics or communication devices. Finally, mental health treatments, particularly for conditions like PTSD, could be improved by targeting how traumatic memories are encoded and recalled, giving us new ways to manage intrusive thoughts.

UCLA Health via Nature



New research could extend the lifetime of key carbon-capture materials


Atomistic simulations, machine learning potential and accelerated degradation experiments reveal the complex role of CO2 in the oxidation kinetics of amine-functional sorbents for carbon capture. (Illustration concept: Sichi Li/LLNL; Illustration: Jacob Long and Adam Samuel Connell/LLNL)

Researchers at Lawrence Livermore National Laboratory (LLNL), in collaboration with the Georgia Institute of Technology, have made a significant breakthrough in understanding the impact of carbon dioxide (CO2) on the stability of amine-functionalized porous solid materials, a crucial component in direct air capture (DAC) carbon-capture technologies.

This new research, published in the Journal of the American Chemical Society and featured on the journal cover, sheds light on the complex interactions between CO2 and poly(ethylenimine) sorbents, offering important insights that could enhance the efficiency and durability of DAC systems.

“This study underscores the importance of considering all atmospheric components in the design of DAC processes and materials,” said Simon Pang, corresponding author and principal investigator of the project. “Our findings will be instrumental in developing next-generation sorbents with enhanced durability, contributing to more efficient and cost-effective carbon-capture solutions.”

Amine-based sorbents are at the forefront of DAC technology due to their exceptional ability to capture CO2 efficiently even under ultra-dilute conditions. However, the long-term stability of these materials has been a significant challenge, primarily due to oxidative degradation. The research team investigated the previously unresolved role of CO2 in the oxidative degradation of these sorbents, reconciling conflicting data in the existing literature. The study reveals that CO2 exerts a non-monotonic effect on the oxidation kinetics of poly(ethylenimine) sorbents, with its impact varying significantly with temperature and CO2 concentration.

“Our research highlights the dual role of CO2 in the oxidation process,” said Sichi Li, lead author of the paper and co-investigator of the project. “On one hand, CO2 catalyzes critical oxidation reactions, while on the other, it reduces polymer branch mobility, which slows down radical propagation. These contrasting effects are key to understanding the complex degradation profiles we observed.”

The study's conclusions extend beyond reconciling existing literature, offering practical implications for the future of DAC technology. By identifying polymer side chain mobility and the presence of acidic environments as major factors accelerating oxidation, the research suggests new strategies to enhance sorbent longevity. Potential solutions include the introduction of functional groups, additives or oxide supports with surface chemistry designed to reduce polymer mobility or neutralize acidic conditions, thereby mitigating the rate of oxidative degradation.

LLNL co-authors also include Marcos F. Calegari Andrade, Elwin Hunter-Sellars, Amitesh Maiti and Anthony J. Varni. The research is funded by the Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division. Computing resources were provided by the LLNL Grand Challenge Program.

Anne Stark


  • Last Updated: September 13, 2024

Top Machine Learning Research Papers


  • by Dr. Nivash Jeevanandam


Advances in machine learning and deep learning research are reshaping our technology. Machine learning and deep learning have accomplished various astounding feats, and key research papers have produced technical advances used by billions of people. Research in this field moves at a breakneck pace, and the following collection of important papers can help you keep up.

Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training

The authors of this work examined why ACGAN training becomes unstable as the number of classes in the dataset grows. They revealed that the instability arises from a gradient-explosion problem caused by the unboundedness of the input feature vectors and the classifier’s poor classification ability during the early training stage. To alleviate the instability and reinforce ACGAN, the researchers presented the Data-to-Data Cross-Entropy loss (D2D-CE) and the Rebooted Auxiliary Classifier Generative Adversarial Network (ReACGAN). Additionally, extensive tests demonstrate that ReACGAN is robust to hyperparameter selection and compatible with a variety of architectures and differentiable augmentations.

This article is ranked #1 on CIFAR-10 for Conditional Image Generation.

For the research paper, read here.

For code, see here.

Dense Unsupervised Learning for Video Segmentation

The authors presented a straightforward and computationally fast unsupervised strategy for learning dense spacetime representations from unlabeled videos. The approach converges quickly during training and is highly data-efficient: the researchers obtain VOS accuracy superior to previous results despite employing a fraction of the previously necessary training data. The researchers acknowledge that the findings could be misused, for instance for unlawful surveillance, and they plan to investigate learning a broader spectrum of invariances by exploiting larger temporal windows in videos with complex (ego-)motion, which is more prone to disocclusions.

This study is ranked #1 on DAVIS 2017 for Unsupervised Video Object Segmentation (val).

Temporally-Consistent Surface Reconstruction using Metrically-Consistent Atlases

The authors offer an atlas-based technique for producing unsupervised temporally consistent surface reconstructions by requiring a point on the canonical shape representation to translate to metrically consistent 3D locations on the reconstructed surfaces. Finally, the researchers envisage a plethora of potential applications for the method. For example, by substituting an image-based loss for the Chamfer distance, one may apply the method to RGB video sequences, which the researchers feel will spur development in video-based 3D reconstruction.

This article is ranked #1 on ANIM in the category of Surface Reconstruction. 

EdgeFlow: Achieving Practical Interactive Segmentation with Edge-Guided Flow

The researchers propose a revolutionary interactive architecture called EdgeFlow that uses user interaction data without resorting to post-processing or iterative optimisation. The suggested technique achieves state-of-the-art performance on common benchmarks due to its coarse-to-fine network design. Additionally, the researchers create an effective interactive segmentation tool that enables the user to improve the segmentation result through flexible options incrementally.

This paper is ranked #1 on Interactive Segmentation on PASCAL VOC.

Learning Transferable Visual Models From Natural Language Supervision

The authors of this work examined whether it is possible to transfer the success of task-agnostic web-scale pre-training in natural language processing to another domain. The findings indicate that adopting this formula resulted in the emergence of similar behaviours in the field of computer vision, and the authors examine the social ramifications of this line of research. CLIP models learn to accomplish a range of tasks during pre-training to optimise their training objective. Using natural language prompting, CLIP can then use this task learning to enable zero-shot transfer to many existing datasets. When applied at a large scale, this technique can compete with task-specific supervised models, while there is still much space for improvement.

This research is ranked #1 on Zero-Shot Transfer Image Classification on SUN.
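As an illustration of the zero-shot transfer described above, a CLIP-style classifier compares an image embedding against one text embedding per class prompt and picks the most similar class. This is a minimal numpy sketch with toy stand-in embeddings; a real system would use CLIP's image and text encoders.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """CLIP-style zero-shot classification sketch: normalise the image
    embedding and one text embedding per class prompt (e.g. "a photo of a
    dog"), then pick the class whose prompt is most cosine-similar."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                      # cosine similarity per class
    return int(np.argmax(sims))

# toy stand-in embeddings, one row per class prompt
texts = np.array([[1.0, 0.0], [0.0, 1.0]])
image = np.array([0.9, 0.1])
predicted = zero_shot_classify(image, texts)
```

Because the class set is defined purely by the text prompts, swapping in new prompts retargets the classifier to a new dataset without any retraining, which is the point of zero-shot transfer.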

CoAtNet: Marrying Convolution and Attention for All Data Sizes

The researchers in this article conduct a thorough examination of the features of convolutions and transformers, resulting in a principled approach for combining them into a new family of models dubbed CoAtNet. Extensive experiments demonstrate that CoAtNet combines the advantages of ConvNets and Transformers, achieving state-of-the-art performance across a range of data sizes and compute budgets. Take note that this article is currently concentrating on ImageNet classification for model construction. However, the researchers believe their approach is relevant to a broader range of applications, such as object detection and semantic segmentation.

This paper is ranked #1 on Image Classification on ImageNet (using extra training data).

SwinIR: Image Restoration Using Swin Transformer

The authors of this article suggest the SwinIR image restoration model, which is based on the Swin Transformer. The model comprises three modules: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. For deep feature extraction, the researchers employ a stack of residual Swin Transformer blocks (RSTB), each formed of Swin Transformer layers, a convolution layer, and a residual connection.

This research article is ranked #1 on Image Super-Resolution on Manga109 – 4x upscaling.

Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

Ways to incorporate historical data are still unclear: initialising reward estimates with historical samples can suffer from spurious and imbalanced data coverage, leading to computational and storage issues, particularly in continuous action spaces. The paper addresses these obstacles by proposing 'Artificial Replay', a meta-algorithm that incorporates historical data into any arbitrary base bandit algorithm.

Read the full paper here.
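The meta-algorithm's core idea can be sketched in a few lines: whenever the base bandit policy proposes an arm, an unused historical sample of that arm is consumed first, and the environment is only queried once that history runs out. The epsilon-greedy base policy, Gaussian reward model and function names below are illustrative assumptions, not the paper's implementation.

```python
import random

def epsilon_greedy(counts, means, eps=0.1):
    # a simple stand-in base bandit policy; any base algorithm would do
    if random.random() < eps or 0 in counts:
        return random.randrange(len(counts))
    return max(range(len(counts)), key=lambda a: means[a])

def artificial_replay(n_arms, history, true_means, horizon, seed=0):
    """Artificial Replay sketch: when the base policy proposes an arm, first
    consume an unused historical sample of that arm instead of acting; only
    query the environment once history for that arm is exhausted."""
    random.seed(seed)
    hist = {a: [r for arm, r in history if arm == a] for a in range(n_arms)}
    counts, sums = [0] * n_arms, [0.0] * n_arms
    real_pulls = 0
    for _ in range(horizon):
        means = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(n_arms)]
        a = epsilon_greedy(counts, means)
        if hist[a]:                       # replay a historical sample for free
            r = hist[a].pop()
        else:                             # history exhausted: real interaction
            r = random.gauss(true_means[a], 0.1)
            real_pulls += 1
        counts[a] += 1
        sums[a] += r
    return real_pulls
```

Because historical samples are only consumed when the base policy would have chosen that arm anyway, the meta-algorithm sidesteps the imbalanced-coverage problem of naive initialisation.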

Bootstrapped Meta-Learning

Author(s) – Sebastian Flennerhag et al.

The paper proposes an algorithm in which the meta-learner teaches itself to overcome the meta-optimisation challenge. The algorithm focuses on meta-learning with gradients, which guarantees performance improvements. Furthermore, the paper also looks at how bootstrapping opens up possibilities. 

Read the full paper here.

LaMDA: Language Models for Dialog Applications

Author(s) – Romal Thoppilan et al.

The research describes the LaMDA system which caused chaos in AI this summer when a former Google engineer claimed that it had shown signs of sentience. LaMDA is a family of large language models for dialogue applications based on Transformer architecture. The interesting feature of the model is its fine-tuning with human-annotated data and the possibility of consulting external sources. This is a very interesting model family, which we might encounter in many applications we use daily. 

Competition-Level Code Generation with AlphaCode

Author(s) – Yujia Li et al.

Code generation systems can help programmers become more productive. This research addresses the problems with incorporating recent AI innovations into such systems. AlphaCode is a system that creates solutions for competitive programming problems that require deeper reasoning.

Privacy for Free: How does Dataset Condensation Help Privacy?

Author(s) – Tian Dong et al.

The paper focuses on privacy-preserving machine learning, specifically reducing the leakage of sensitive data in machine learning. It puts forth one of the first propositions of using dataset condensation techniques to preserve data efficiency during model training while furnishing membership privacy.

Why do tree-based models still outperform deep learning on tabular data?

Author(s) – Léo Grinsztajn, Edouard Oyallon and Gaël Varoquaux

The research answers why deep learning models still find it hard to compete on tabular data compared to tree-based models. It is shown that MLP-like architectures are more sensitive to uninformative features in data compared to their tree-based counterparts. 

Multi-Objective Bayesian Optimisation over High-Dimensional Search Spaces 

Author(s) – Samuel Daulton et al.

The paper proposes 'MORBO', a scalable method for multi-objective Bayesian optimisation over high-dimensional search spaces. MORBO significantly improves sample efficiency, and where existing BO algorithms fail, it still delivers improved sample efficiency over current approaches.

A Path Towards Autonomous Machine Intelligence Version 0.9.2

Author(s) – Yann LeCun

The research offers a vision of how to progress towards general AI. The study combines several concepts: a configurable predictive world model, behaviour driven through intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.

TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data

Author(s) –  Shreshth Tuli, Giuliano Casale and Nicholas R. Jennings

This is a specialised paper applying transformer architecture to the problem of unsupervised anomaly detection in multivariate time series. Many architectures which were successful in other fields are, at some point, also being applied to time series. The research shows improved performance on some known data sets. 

Differentially Private Bias-Term only Fine-tuning of Foundation Models

Author(s) – Zhiqi Bu et al. 

In the paper, researchers study the problem of differentially private (DP) fine-tuning of large pre-trained models—a recent privacy-preserving approach suitable for solving downstream tasks with sensitive data. Existing work has demonstrated that high accuracy is possible under strong privacy constraints yet requires significant computational overhead or modifications to the network architecture.

ALBERT: A Lite BERT

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks, but the training times become longer. To address these problems, the authors presented two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. The authors also used a self-supervised loss that focuses on modelling inter-sentence coherence, which consistently helped downstream tasks with multi-sentence inputs. According to the results, this model established new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters than BERT-large.

Check the paper here.
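One of ALBERT's two parameter-reduction techniques, factorised embedding parameterisation, is easy to see in raw numbers: instead of a vocab x hidden embedding table, ALBERT routes the embedding through a small intermediate dimension E. The sizes below are illustrative, loosely following BERT-large-scale settings.

```python
def embedding_params(vocab, hidden, e=None):
    """Parameter count of the token-embedding table: BERT ties the embedding
    size to the hidden size (vocab * hidden); ALBERT factorises it through a
    small dimension E (vocab * E + E * hidden)."""
    if e is None:
        return vocab * hidden
    return vocab * e + e * hidden

bert_like = embedding_params(30000, 1024)            # 30,720,000 parameters
albert_like = embedding_params(30000, 1024, e=128)   #  3,971,072 parameters
```

The factorised table is roughly 8x smaller here; the second technique, cross-layer parameter sharing, shrinks the encoder stack in a similar spirit.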

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

Microsoft Research, along with the University of Washington and the University of California, in this paper introduced a model-agnostic and task-agnostic methodology for testing NLP models known as CheckList. This paper also won the Best Paper Award at the ACL conference that year. It included a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly.

Linformer

Linformer is a Transformer architecture for tackling the self-attention bottleneck in Transformers. It reduces self-attention to an O(n) operation in both space and time complexity. It is a new self-attention mechanism which allows the researchers to compute the contextual mapping in linear time and memory complexity with respect to the sequence length.

Read more about the paper here.
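The linear-complexity mechanism can be sketched as follows: learned projection matrices map the length-n key and value sequences down to k rows before attention, so the score matrix is n x k rather than n x n. The random matrices below stand in for the learned projections; this is an illustrative numpy sketch, not the paper's code.

```python
import numpy as np

def linformer_attention(Q, K, V, E, F):
    """Linformer sketch: E and F project the n key/value rows down to k rows,
    so the attention score matrix is (n, k) instead of (n, n)."""
    d = Q.shape[-1]
    K_proj = E @ K                                   # (k, d)
    V_proj = F @ V                                   # (k, d)
    scores = Q @ K_proj.T / np.sqrt(d)               # (n, k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V_proj                          # (n, d)

rng = np.random.default_rng(0)
n, k, d = 8, 2, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
E, F = (rng.normal(size=(k, n)) for _ in range(2))
out = linformer_attention(Q, K, V, E, F)
```

With k fixed as n grows, both memory and compute scale linearly in the sequence length.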

Plug and Play Language Models

Plug and Play Language Models ( PPLM ) are a combination of pre-trained language models with one or more simple attribute classifiers. This, in turn, assists in text generation without any further training. According to the authors, model samples demonstrated control over sentiment styles, and extensive automated and human-annotated evaluations showed attribute alignment and fluency. 

Reformer 

The researchers at Google, in this paper, introduced Reformer. This work showcased that the architecture of a Transformer can be executed efficiently on long sequences and with small memory. The authors believe that the ability to handle long sequences opens the way for the use of the Reformer on many generative tasks. In addition to generating very long coherent text, the Reformer can bring the power of Transformer models to other domains like time-series forecasting, music, image and video generation.

An Image is Worth 16x16 Words

The irony here is that Transformers, one of the most popular language model architectures, have been made to do computer vision tasks. In this paper, the authors claimed that the vision transformer could go toe-to-toe with the state-of-the-art models on image recognition benchmarks, reaching accuracies as high as 88.36% on ImageNet and 94.55% on CIFAR-100. For this, the vision transformer receives its input as a one-dimensional sequence of token embeddings: the image is reshaped into a sequence of flattened 2D patches. The transformer in this work uses constant widths through all of its layers.
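The patch-embedding step described above is a pure reshape and can be sketched in numpy; the patch and image sizes below are illustrative.

```python
import numpy as np

def image_to_patches(img, p):
    """ViT input pipeline sketch: reshape an H x W x C image into a sequence
    of flattened p x p patches, i.e. (H*W/p^2) tokens of dimension p*p*C."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0
    img = img.reshape(H // p, p, W // p, p, C)
    img = img.transpose(0, 2, 1, 3, 4)    # (H/p, W/p, p, p, C)
    return img.reshape(-1, p * p * C)     # (num_patches, patch_dim)

patches = image_to_patches(np.zeros((32, 32, 3)), 16)   # 4 tokens of dim 768
```

Each flattened patch is then linearly projected to the model width and prepended with a class token before entering the transformer.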

Unsupervised Learning of Probably Symmetric Deformable 3D Objects

Winner of the CVPR best paper award, in this work, the authors proposed a method to learn 3D deformable object categories from raw single-view images, without external supervision. This method uses an autoencoder that factored each input image into depth, albedo, viewpoint and illumination. The authors showcased that reasoning about illumination can be used to exploit the underlying object symmetry even if the appearance is not symmetric due to shading.

Generative Pretraining from Pixels

In this paper, OpenAI researchers examined whether the kind of generative pre-training that succeeded for text can teach models useful representations for images. For this, the researchers trained a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, the researchers found that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, it achieved 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. An even larger model, trained on a mixture of ImageNet and web images, is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of its features.

Deep Reinforcement Learning and its Neuroscientific Implications

In this paper, the authors provided a high-level introduction to deep RL , discussed some of its initial applications to neuroscience, and surveyed its wider implications for research on brain and behaviour and concluded with a list of opportunities for next-stage research. Although DeepRL seems to be promising, the authors wrote that it is still a work in progress and its implications in neuroscience should be looked at as a great opportunity. For instance, deep RL provides an agent-based framework for studying the way that reward shapes representation, and how representation, in turn, shapes learning and decision making — two issues which together span a large swath of what is most central to neuroscience. 

Dopamine-based Reinforcement Learning

Human behaviour is often linked to dopamine, a neurotransmitter that acts as the brain's reward signal (think: the likes on your Instagram page). With this in mind, DeepMind, with the help of Harvard labs, analysed dopamine cells in mice and recorded how the mice received rewards while they learned a task. They then checked these recordings for consistency in the activity of the dopamine neurons with standard temporal difference algorithms. This paper proposed an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning. The authors hypothesised that the brain represents possible future rewards not as a single mean but as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel.

Lottery Tickets In Reinforcement Learning & NLP

In this paper, the authors bridged natural language processing (NLP) and reinforcement learning (RL). They examined both recurrent LSTM models and large-scale Transformer models for NLP and discrete-action space tasks for RL. The results suggested that the lottery ticket hypothesis is not restricted to supervised learning of natural images, but rather represents a broader phenomenon in deep neural networks.

What Can Learned Intrinsic Rewards Capture

In this paper, the authors explored if the reward function itself can be a good locus of learned knowledge. They proposed a scalable framework for learning useful intrinsic reward functions across multiple lifetimes of experience and showed that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. 

AutoML-Zero

The progress of AutoML has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks, or similarly restrictive search spaces. In this paper, the authors showed that AutoML could go further with AutoML-Zero, which automatically discovers complete machine learning algorithms using only basic mathematical operations as building blocks. The researchers demonstrated this by introducing a novel framework that significantly reduces human bias through a generic search space.

Rethinking Batch Normalization for Meta-Learning

Batch normalization is an essential component of meta-learning pipelines, but it poses several challenges. So, in this paper, the authors evaluated a range of approaches to batch normalization for meta-learning scenarios and developed a novel approach, TaskNorm. Experiments demonstrated that the choice of batch normalization has a dramatic effect on both classification accuracy and training time for gradient-based and gradient-free meta-learning approaches alike. TaskNorm was found to consistently improve performance.

Meta-Learning without Memorisation

Meta-learning algorithms need meta-training tasks to be mutually exclusive, such that no single model can solve all of the tasks at once. In this paper, the authors designed a meta-regularisation objective using information theory that successfully uses data from non-mutually-exclusive tasks to efficiently adapt to novel tasks.

Understanding the Effectiveness of MAML

Model Agnostic Meta-Learning (MAML) consists of optimisation loops, from which the inner loop can efficiently learn new tasks. In this paper, the authors demonstrated that feature reuse is the dominant factor and led to ANIL (Almost No Inner Loop) algorithm — a simplification of MAML where the inner loop is removed for all but the (task-specific) head of the underlying neural network. 

Your Classifier is Secretly an Energy-Based Model

This paper proposed reinterpreting a standard discriminative classifier as an energy-based model. In this setting, wrote the authors, the standard class probabilities can still be easily computed. They demonstrated that energy-based training of the joint distribution improves calibration, robustness, and out-of-distribution detection, while also enabling the proposed model to generate samples rivalling the quality of recent GAN approaches. This work improves upon recently proposed techniques for scaling up the training of energy-based models, and is the first to achieve performance rivalling the state of the art in both generative and discriminative learning within one hybrid model.
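The reinterpretation is compact enough to sketch: the same logits f(x) give the usual class probabilities via a softmax, while a logsumexp over the logits defines an (unnormalised) energy for the input itself. A minimal sketch of these two readings of one set of logits follows; the helper names are illustrative.

```python
import math

def class_probs(logits):
    """Standard discriminative reading of the logits: softmax p(y|x)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def energy(logits):
    """Energy-based reading of the same logits: E(x) = -logsumexp_y f(x)[y],
    an unnormalised density over inputs; lower energy means the model assigns
    the input a higher likelihood."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))
```

Confident, large-magnitude logits yield low energy, which is what lets the classifier double as a density model for tasks like out-of-distribution detection.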

Reverse-Engineering Deep ReLU Networks

This paper investigated the commonly assumed notion that a neural network cannot be recovered from its outputs because they depend on its parameters in a highly nonlinear way. The authors claimed that by observing only its outputs, one can identify the architecture, weights, and biases of an unknown deep ReLU network. By dissecting the set of region boundaries into components associated with particular neurons, the researchers showed that it is possible to recover the weights of neurons and their arrangement within the network.

Cricket Analytics and Predictor

Authors: Suyash Mahajan, Salma Shaikh, Jash Vora, Gunjan Kandhari, Rutuja Pawar

Abstract: The paper embarks on predicting the outcomes of Indian Premier League (IPL) cricket matches using a supervised learning approach from a team-composition perspective. The study suggests that the relative strength of the competing teams forms a distinctive feature for predicting the winner. Modeling team strength boils down to modeling individual players' batting and bowling performances, which forms the basis of the approach.

Research Methodology: In this paper, two technologies have been used: a MySQL database for storing the data, and Java for the GUI. The algorithm used for prediction is a clustering algorithm. The steps followed are as follows:

  • Begin with a decision on the value of k being the number of clusters.
  • Put any initial partition that classifies the data into k clusters.
  • Take every sample in the sequence and compute its distance from the centroid of each of the clusters. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
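The steps above describe a k-means-style clustering procedure, which can be sketched in plain Python (the toy 2D points are illustrative):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal sketch of the steps above: pick k initial centroids, assign
    each sample to its nearest centroid, then recompute centroids."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                  + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
cents = kmeans(pts, 2)
```

On these two well-separated groups the procedure settles on one centroid near each group within a couple of iterations.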

For the research paper, read here

Real Time Sleep / Drowsiness Detection – Project Report

Author : Roshan Tavhare

Institute : University of Mumbai

Abstract : The main idea behind this project is to develop a non-intrusive system which can detect fatigue in any human and issue a timely warning. Drivers who do not take regular breaks when driving long distances run a high risk of becoming drowsy, a state which they often fail to recognize early enough.

Research Methodology : The method uses a training set of facial landmarks labeled on images. These images are manually labeled, specifying the (x, y)-coordinates of regions surrounding each facial structure.

  • Priors, specifically the probability of the distance between pairs of input pixels, are also used. The pre-trained facial landmark detector inside the dlib library is used to estimate the locations of the 68 (x, y)-coordinates that map to facial structures on the face.
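A common drowsiness cue built on top of these 68 landmarks is the eye aspect ratio (EAR): the ratio of the eye's vertical landmark distances to its horizontal one, which collapses toward zero as the eye closes. The formula and toy coordinates below are an illustrative sketch of that cue, not necessarily the report's exact method.

```python
import math

def eye_aspect_ratio(eye):
    """EAR from the six (x, y) eye landmarks in dlib's 68-point ordering:
    vertical distances p2-p6 and p3-p5 over twice the horizontal p1-p4.
    A low EAR sustained over several frames suggests a closed eye."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    p1, p2, p3, p4, p5, p6 = eye
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

# toy landmark coordinates for an open and a nearly closed eye
open_eye = [(0, 0), (1, -1), (2, -1), (3, 0), (2, 1), (1, 1)]
closed_eye = [(0, 0), (1, -0.1), (2, -0.1), (3, 0), (2, 0.1), (1, 0.1)]
```

A detector would threshold this ratio (commonly around 0.2 to 0.25) over consecutive video frames before raising the warning.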

A Study of Various Text Augmentation Techniques for Relation Classification in Free Text

Authors: Chinmaya Mishra Praveen Kumar and Reddy Kumar Moda,  Syed Saqib Bukhari and Andreas Dengel

Institute: German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany

Abstract: In this paper, the researchers explore various text data augmentation techniques in text space and word embedding space. They studied the effect of various augmented datasets on the efficiency of different deep learning models for relation classification in text.

Research Methodology: The researchers implemented five text data augmentation techniques (similar-word, synonym, interpolation, extrapolation and random-noise methods) and explored ways to preserve the grammatical and contextual structures of the sentences while generating new sentences automatically using data augmentation techniques.
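Two of the five techniques mentioned, synonym replacement in text space and the random-noise method in embedding space, can be sketched in plain Python. The toy lexicon, parameters and function names are illustrative assumptions, not the paper's implementation.

```python
import random

SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "joyful"]}  # toy lexicon

def synonym_augment(sentence, p=1.0, seed=0):
    """Synonym replacement: swap words for dictionary synonyms with
    probability p, keeping the sentence structure (and grammar) intact."""
    random.seed(seed)
    out = []
    for w in sentence.split():
        if w in SYNONYMS and random.random() < p:
            out.append(random.choice(SYNONYMS[w]))
        else:
            out.append(w)
    return " ".join(out)

def noise_augment(vec, scale=0.01, seed=0):
    """Random-noise method: jitter a word vector slightly to create a new
    training point near the original in embedding space."""
    random.seed(seed)
    return [x + random.uniform(-scale, scale) for x in vec]
```

Both produce label-preserving variants of the original sample, which is what makes them usable for augmenting a relation-classification training set.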

Smart Health Monitoring and Management Using Internet of Things, Artificial Intelligence with Cloud Based Processing

Author : Prateek Kaushik

Institute : G D Goenka University, Gurugram

Abstract : This research paper described a personalised smart health monitoring device using wireless sensors and the latest technology.

Research Methodology: Machine learning and deep learning techniques, which work as catalysts to improve the performance of any health monitoring system, are discussed, such as supervised machine learning algorithms, unsupervised machine learning algorithms, auto-encoders, convolutional neural networks and Restricted Boltzmann Machines.

Internet of Things with BIG DATA Analytics -A Survey

Author : A.Pavithra,  C.Anandhakumar and V.Nithin Meenashisundharam

Institute : Sree Saraswathi Thyagaraja College

Abstract : This article discusses Big Data on IoT and how the two are interrelated, along with the necessity of implementing Big Data with IoT, its benefits, and the job market.

Research Methodology : Machine learning, Deep Learning, and Artificial Intelligence are key technologies that are used to provide value-added applications along with IoT and big data, in addition to being used in a stand-alone mode.

Single Headed Attention RNN: Stop Thinking With Your Head 

Author: Stephen Merity

In this work of art, the Harvard grad author, Stephen “Smerity” Merity, investigated the current state of NLP, the models being used and other alternate approaches. In this process, he tears down the conventional methods from top to bottom, including etymology.

The author also voices the need for a Moore’s Law for machine learning that encourages a minicomputer future while also announcing his plans on rebuilding the codebase from the ground up both as an educational tool for others and as a strong platform for future work in academia and industry.

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Authors: Mingxing Tan and Quoc V. Le 

In this work, the authors propose a compound scaling method that tells when to increase or decrease the depth, width and resolution of a certain network.

Convolutional Neural Networks(CNNs) are at the heart of many machine vision applications. 

EfficientNets are believed to surpass state-of-the-art accuracy with up to 10x better efficiency (smaller and faster).
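The compound scaling rule can be sketched directly: a single coefficient phi scales depth, width and resolution together as alpha^phi, beta^phi and gamma^phi, with alpha * beta^2 * gamma^2 ~ 2 so each increment of phi roughly doubles FLOPs. The coefficient values below follow the paper's reported grid search on the base network.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """EfficientNet compound scaling sketch: return the (depth, width,
    resolution) multipliers for scaling coefficient phi. The constraint
    alpha * beta**2 * gamma**2 ~ 2 keeps each phi step near 2x FLOPs."""
    return alpha ** phi, beta ** phi, gamma ** phi

depth_mult, width_mult, res_mult = compound_scale(2)
```

Scaling all three dimensions in lockstep, rather than just one, is what the paper credits for the accuracy/efficiency gains.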

Deep Double Descent

Authors: Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal

In this paper, an attempt has been made to reconcile classical understanding and modern practice within a unified performance curve.

The “double descent” curve overtakes the classic U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. 

The Lottery Ticket Hypothesis

Authors: Jonathan Frankle, Michael Carbin

Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. 

The authors find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, they introduce the "lottery ticket hypothesis": dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that, when trained in isolation, reach test accuracy comparable to that of the original network in a similar number of iterations.
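A minimal sketch of the magnitude pruning underlying such experiments: zero out the smallest-magnitude fraction of the weights and keep a binary mask. In lottery-ticket experiments the surviving weights are then rewound to their original initial values and retrained; that training loop is omitted here.

```python
def magnitude_prune(weights, sparsity):
    """One-shot magnitude pruning sketch: zero the smallest-magnitude
    `sparsity` fraction of weights, returning (pruned weights, mask)."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return [w * m for w, m in zip(weights, mask)], mask

pruned, mask = magnitude_prune([0.5, -0.01, 0.3, 0.02, -0.9], 0.4)
```

In the iterative variant used in the paper, this prune-rewind-retrain cycle is repeated, removing a small fraction of weights each round.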

On The Measure Of Intelligence 

Authors: Francois Chollet

This work summarizes and critically assesses the definitions of intelligence and evaluation approaches, while making apparent the historical conceptions of intelligence that have implicitly guided them.

The author, also the creator of Keras, introduces a formal definition of intelligence based on Algorithmic Information Theory and, using this definition, proposes a set of guidelines for what a general AI benchmark should look like.

Zero-Shot Word Sense Disambiguation Using Sense Definition Embeddings via IISc Bangalore & CMU

Authors: Sawan Kumar, Sharmistha Jat, Karan Saxena and Partha Talukdar

Word Sense Disambiguation (WSD) is a longstanding but open problem in Natural Language Processing (NLP). Current supervised WSD methods treat senses as discrete labels and also resort to predicting the Most-Frequent-Sense (MFS) for words unseen during training.

The researchers from IISc Bangalore, in collaboration with Carnegie Mellon University, propose Extended WSD Incorporating Sense Embeddings (EWISE), a supervised model to perform WSD by predicting over a continuous sense embedding space as opposed to a discrete label space.

Deep Equilibrium Models 

Authors: Shaojie Bai, J. Zico Kolter and Vladlen Koltun 

Motivated by the observation that the hidden layers of many existing deep sequence models converge towards some fixed point, the researchers at Carnegie Mellon University present a new approach to modeling sequential data: the deep equilibrium model (DEQ).

Using this approach, training and prediction in these networks require only constant memory, regardless of the effective “depth” of the network.
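The fixed-point view can be sketched with a toy scalar "layer": instead of stacking f many times, solve z* = f(z*) directly. The paper uses a proper root-finding solver; plain iteration on a contractive toy map is used here purely for illustration.

```python
def deq_solve(f, z0, tol=1e-8, max_iter=500):
    """DEQ sketch: find the fixed point z* = f(z*) by simple iteration.
    Only the current iterate is stored, so memory is constant regardless
    of how many iterations (the effective "depth") it takes."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# a contractive toy layer: f(z) = 0.5*z + 1 has fixed point z* = 2
z_star = deq_solve(lambda z: 0.5 * z + 1.0, 0.0)
```

In the real model, gradients are obtained via implicit differentiation at the equilibrium rather than by backpropagating through the iterations, which is what makes the memory cost constant during training too.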

IMAGENET-Trained CNNs are Biased Towards Texture

Authors: Robert G, Patricia R, Claudio M, Matthias Bethge, Felix A. W and Wieland B

Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. The authors in this paper evaluate CNNs and human observers on images with a texture-shape cue conflict. They show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence.

A Geometric Perspective on Optimal Representations for Reinforcement Learning 

Authors: Marc G. B , Will D , Robert D , Adrien A T , Pablo S C , Nicolas Le R , Dale S, Tor L, Clare L

The authors propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. This work shows that adversarial value functions exhibit interesting structure and are good auxiliary tasks when learning a representation of an environment. The authors believe this work opens up the possibility of automatically generating auxiliary tasks in deep reinforcement learning.

Weight Agnostic Neural Networks 

Authors: Adam Gaier & David Ha

In this work, the authors explore whether neural network architectures alone, without learning any weight parameters, can encode solutions for a given task. They propose a search method for neural network architectures that can already perform a task without any explicit weight training.

Stand-Alone Self-Attention in Vision Models 

Authors: Prajit Ramachandran, Niki P, Ashish Vaswani,Irwan Bello Anselm Levskaya, Jonathon S

In this work, the Google researchers verified that content-based interactions can serve vision models. The proposed stand-alone local self-attention layer achieves competitive predictive performance on ImageNet classification and COCO object detection tasks while requiring fewer parameters and floating-point operations than the corresponding convolution baselines. Results show that attention is especially effective in the later parts of the network.

High-Fidelity Image Generation With Fewer Labels 

Authors: Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Z, Olivier B and Sylvain Gelly 

Modern-day models can produce high-quality, close-to-reality images when fed with a vast quantity of labelled data. To reduce this heavy data dependency, researchers from Google released this work to demonstrate how one can benefit from recent work on self- and semi-supervised learning to outperform the state of the art on unsupervised ImageNet synthesis, as well as in the conditional setting.

The proposed approach is able to match the sample quality of the current state-of-the-art conditional model BigGAN on ImageNet using only 10% of the labels and outperform it using 20% of the labels.

ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

Authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin G, Piyush Sharma and Radu S

The authors present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and to address the challenges posed by increasing model size: GPU/TPU memory limitations, longer training times, and unexpected model degradation.

As a result, this proposed model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.

GauGAN: Semantic Image Synthesis with Spatially-Adaptive Normalization

Authors: Taesung Park, Ming-Yu Liu, Ting-Chun Wang and Jun-Yan Zhu

Nvidia in collaboration with UC Berkeley and MIT proposed a model which has a spatially-adaptive normalization layer for synthesizing photorealistic images given an input semantic layout.

This model retained visual fidelity and alignment with challenging input layouts while allowing the user to control both semantics and style.


© Analytics India Magazine Pvt Ltd & AIM Media House LLC 2024



Princeton Plasma Physics Laboratory

Replacing hype about artificial intelligence with accurate measurements of success.

(Illustration: a ship with a circuitry pattern sailing on turbulent waves made of equations. Credit: Kyle Palmer / PPPL Communications Department)

PPPL researchers find overoptimism in journal articles using machine learning to solve fluid-related partial differential equations

The hype surrounding machine learning, a form of artificial intelligence, can make it seem like it is only a matter of time before such techniques are used to solve all scientific problems. While impressive claims are often made, those claims do not always hold up under scrutiny. Machine learning may be useful for solving some problems but falls short for others.

In a new paper in Nature Machine Intelligence, researchers at the U.S. Department of Energy’s Princeton Plasma Physics Laboratory (PPPL) and Princeton University performed a systematic review of research comparing machine learning to traditional methods for solving fluid-related partial differential equations (PDEs). Such equations are important in many scientific fields, including the plasma research that supports the development of fusion power for the electricity grid.

The researchers found that comparisons between machine learning methods for solving fluid-related PDEs and traditional methods are often biased in favor of machine learning methods. They also found that negative results were consistently underreported. They suggest rules for performing fair comparisons but argue that cultural changes are also needed to fix what appear to be systemic problems.

“Our research suggests that, though machine learning has great potential, the present literature paints an overly optimistic picture of how machine learning works to solve these particular types of equations,” said Ammar Hakim, PPPL’s deputy head of computational science and the principal investigator on the research.

Comparing results to weak baselines

PDEs are ubiquitous in physics and are particularly useful for explaining natural phenomena, such as heat, fluid flow and waves. For example, these kinds of equations can be used to figure out the temperatures along the length of a spoon placed in hot soup. Knowing the initial temperature of the soup and the spoon, as well as the type of metal in the spoon, a PDE could be used to determine the temperature at any point along the utensil at a given time after it was placed in the soup. Such equations are used in plasma physics, as many of the equations that govern plasmas are mathematically similar to those of fluids.
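The spoon example can be made concrete with a minimal numerical sketch: the 1D heat equation u_t = α·u_xx solved by an explicit finite-difference scheme, one of the traditional methods the review compares against. The material constants and boundary conditions below are invented for illustration.

```python
import numpy as np

def heat_1d(u0, alpha, dx, dt, steps, left_temp):
    """Explicit Euler / central differences; left end held at soup temperature."""
    u = u0.copy()
    r = alpha * dt / dx**2          # must be <= 0.5 for stability
    assert r <= 0.5, "explicit scheme unstable for this dt/dx"
    for _ in range(steps):
        u[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
        u[0] = left_temp            # end sitting in the hot soup
        u[-1] = u[-2]               # insulated handle end (zero flux)
    return u

n = 51
spoon = np.full(n, 20.0)            # spoon starts at room temperature (deg C)
spoon = heat_1d(spoon, alpha=1e-4, dx=0.004, dt=0.05, steps=2000, left_temp=80.0)
# temperature now decays monotonically from the soup end to the handle
print(spoon[0], spoon[-1])
```

Each step replaces every interior point with a positively weighted average of itself and its neighbours, which is why the scheme respects the physical intuition that heat flows from hot to cold.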

Scientists and engineers have developed various mathematical approaches to solving PDEs. One approach is known as numerical methods because it solves problems numerically, rather than analytically or symbolically, to find approximate solutions to problems that are difficult or impossible to solve exactly. Recently, researchers have explored whether machine learning can be used to solve these PDEs. The goal is to solve problems faster than they could with other methods.

The systematic review found that in most journal articles, machine learning hasn’t been as successful as advertised. “Our research indicates that there might be some cases where machine learning can be slightly faster for solving fluid-related PDEs, but in most cases, numerical methods are faster,” said Nick McGreivy. McGreivy is the lead author of the paper and recently completed his doctorate at the Princeton Program in Plasma Physics.

Numerical methods have a fundamental trade-off between accuracy and runtime. “If you spend more time to solve the problem, you’ll get a more accurate answer,” McGreivy said. “Many papers didn’t take that into account in their comparisons.”

Furthermore, there can be a dramatic difference in speed between numerical methods. In order to be useful, machine learning methods need to outperform the best numerical methods, McGreivy said. Yet his research found that comparisons were often being made to numerical methods that were much slower than the fastest methods.  
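The accuracy-runtime trade-off is easy to demonstrate: refining the grid of a finite-difference solver shrinks the error but multiplies the work, which is why comparing methods at mismatched accuracy or runtime is misleading. The test problem and resolutions below are illustrative choices with a known exact solution, not taken from the paper.

```python
import numpy as np

# u_t = u_xx on [0, 1], u(0) = u(1) = 0, exact: exp(-pi^2 t) sin(pi x)
def solve(n, t_final=0.1):
    x = np.linspace(0.0, 1.0, n + 1)
    dx = 1.0 / n
    dt = 0.25 * dx**2                    # fixed r = 0.25 keeps the scheme stable
    steps = int(round(t_final / dt))
    u = np.sin(np.pi * x)
    for _ in range(steps):
        u[1:-1] += 0.25 * (u[2:] - 2 * u[1:-1] + u[:-2])
        u[0] = u[-1] = 0.0
    exact = np.exp(-np.pi**2 * steps * dt) * np.sin(np.pi * x)
    return np.abs(u - exact).max(), steps * n   # (error, rough work estimate)

for n in (10, 20, 40):
    err, work = solve(n)
    print(f"n={n:3d}  error={err:.2e}  work~{work}")
# error shrinks roughly 4x per grid doubling while work grows ~8x:
# a fair comparison must pin down one axis before reporting the other
```

Holding either error or runtime fixed before comparing, as the paper's rules require, removes the freedom to pick whichever point on this curve flatters the baseline least.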

Two rules for making fair comparisons

Consequently, the paper proposes two rules to try to overcome these problems. The first rule is to only compare machine learning methods against numerical methods of either equal accuracy or equal runtime. The second is to compare machine learning methods to an efficient numerical method. 

Of the 82 journal articles studied, 76 claimed that the machine learning method outperformed a numerical method. The researchers found that 79% of those articles touting a machine learning method as superior actually used a weak baseline, breaking at least one of those rules. Four of the journal articles reported that their method underperformed a numerical method, and two reported similar or varied performance.


The researchers created the image above to convey the cumulative effects of weak baselines and reporting biases on samples. The circles or hexagons represent articles. Green indicates a positive result, for example, that the machine learning method was faster than the numerical method, while red represents a negative result. Column (a) shows what the system would likely look like if strong baselines were used and reporting bias was not an issue. Column (b) depicts the likely results without reporting bias. Column (c) shows the actual results seen in the published literature. (Image credit: Nick McGreivy / Princeton University)

“Very few articles reported worse performance with machine learning, not because machine learning almost always does better, but because researchers almost never publish articles where machine learning does worse,” McGreivy said.

McGreivy thinks low-bar comparisons are often driven by perverse incentives in academic publishing. “In order to get a paper accepted, it helps to have some impressive results. This incentivizes you to make your machine learning model work as well as possible, which is good. However, you can also get impressive results if the baseline method you’re comparing to doesn’t work very well. As a result, you aren’t incentivized to improve your baseline, which is bad,” he said. The net result is that researchers end up working hard on their models but not on finding the best possible numerical method as a baseline for comparison.

The researchers also found evidence of reporting biases, including publication bias and outcome reporting bias. Publication bias occurs when a researcher chooses not to publish their results after realizing that their machine learning model doesn’t perform better than a numerical method, while outcome reporting bias can involve discarding negative results from the analyses or using nonstandard measures of success that make machine learning models appear more successful. Collectively, reporting biases tend to suppress negative results and create an overall impression that machine learning is better at solving fluid-related PDEs than it is. 

“There’s a lot of hype in the field. Hopefully, our work lays guidelines for principled approaches to use machine learning to improve the state of the art,” Hakim said.

To overcome these systemic, cultural issues, Hakim argues that agencies funding research and large conferences should adopt policies to prevent the use of weak baselines or require a more detailed description of the baseline used and the reasons it was selected. “They need to encourage their researchers to be skeptical of their own results,” Hakim said. “If I find results that seem too good to be true, they probably are.” This work was completed with funding from DOE contract DE-AC02-09CH11466.


PPPL is mastering the art of using plasma — the fourth state of matter — to solve some of the world's toughest science and technology challenges. Nestled on Princeton University’s Forrestal Campus in Plainsboro, New Jersey, our research ignites innovation in a range of applications including fusion energy, nanoscale fabrication, quantum materials and devices, and sustainability science. The University manages the Laboratory for the U.S. Department of Energy’s Office of Science, which is the nation’s single largest supporter of basic research in the physical sciences. Feel the heat at https://energy.gov/science and https://www.pppl.gov.


  • Open access
  • Published: 16 October 2023

Forecasting the future of artificial intelligence with machine learning-based link prediction in an exponentially growing knowledge network

Authors: Mario Krenn, Lorenzo Buffoni, Bruno Coutinho, Sagi Eppel, Jacob Gates Foster, Andrew Gritsevskiy, Harlin Lee, Yichao Lu, João P. Moutinho, Nima Sanjabi, Rishi Sonthalia, Ngoc Mai Tran, Francisco Valente, Yangxinyu Xie, Rose Yu & Michael Kopp

Nature Machine Intelligence, volume 5, pages 1326–1335 (2023)


  • Complex networks
  • Computer science
  • Research data

A tool that could suggest new personalized research directions and ideas by taking insights from the scientific literature could profoundly accelerate the progress of science. A field that might benefit from such an approach is artificial intelligence (AI) research, where the number of scientific publications has been growing exponentially over recent years, making it challenging for human researchers to keep track of the progress. Here we use AI techniques to predict the future research directions of AI itself. We introduce a graph-based benchmark based on real-world data—the Science4Cast benchmark, which aims to predict the future state of an evolving semantic network of AI. For that, we use more than 143,000 research papers and build up a knowledge network with more than 64,000 concept nodes. We then present ten diverse methods to tackle this task, ranging from pure statistical to pure learning methods. Surprisingly, the most powerful methods use a carefully curated set of network features, rather than an end-to-end AI approach. These results indicate that there is still great untapped potential in purely ML approaches that do not rely on human domain knowledge. Ultimately, better predictions of new future research directions will be a crucial component of more advanced research suggestion tools.


The corpus of scientific literature grows at an ever-increasing speed. Specifically, in the field of artificial intelligence (AI) and machine learning (ML), the number of papers every month is growing exponentially with a doubling rate of roughly 23 months (Fig. 1 ). Simultaneously, the AI community is embracing diverse ideas from many disciplines such as mathematics, statistics and physics, making it challenging to organize different ideas and uncover new scientific connections. We envision a computer program that can automatically read, comprehend and act on AI literature. It can predict and suggest meaningful research ideas that transcend individual knowledge and cross-domain boundaries. If successful, it could greatly improve the productivity of AI researchers, open up new avenues of research and help drive progress in the field.

Figure 1: The doubling rate of papers per month is roughly 23 months, which might at some point lead to problems for publishing in these fields. The categories are cs.AI, cs.LG, cs.NE and stat.ML.

In this work, we address the ambitious vision of developing a data-driven approach to predict future research directions 1 . As new research ideas often emerge from connecting seemingly unrelated concepts 2 , 3 , 4 , we model the evolution of AI literature as a temporal network. We construct an evolving semantic network that encapsulates the content and development of AI research since 1994, with approximately 64,000 nodes (representing individual concepts) and 18 million edges (connecting jointly investigated concepts).

We use the semantic network as an input to ten diverse statistical and ML methods to predict the future evolution of the semantic network with high accuracy. That is, we can predict which combinations of concepts AI researchers will investigate in the future. Being able to predict what scientists will work on is a first crucial step for suggesting new topics that might have a high impact.

Several methods were contributions to the Science4Cast competition hosted by the 2021 IEEE International Conference on Big Data (IEEE BigData 2021). Broadly, we can divide the methods into two classes: methods that use hand-crafted network-theoretical features and those that automatically learn features. We found that models using carefully hand-crafted features outperform methods that attempt to learn features autonomously. This (somewhat surprising) finding indicates a great potential for improvements of models free of human priors.

Our paper introduces a real-world graph benchmark for AI, presents ten methods for solving it, and discusses how this task contributes to the larger goal of AI-driven research suggestions in AI and other disciplines. All methods are available at GitHub 5 .

Semantic networks

The goal here is to extract knowledge from the scientific literature that can subsequently be processed by computer algorithms. At first glance, a natural first step would be to use a large language model (such as GPT-3 6 , Gopher 7 , Megatron 8 or PaLM 9 ) on each article to extract concepts and their relations automatically. However, these methods still struggle with reasoning capabilities 10 , 11 ; thus, it is not yet clear how these models can be used for identifying and suggesting new ideas and concept combinations.

Rzhetsky et al. 12 pioneered an alternative approach, creating semantic networks in biochemistry from co-occurring concepts in scientific papers. There, nodes represent scientific concepts, specifically biomolecules, and are linked when a paper mentions both in its title or abstract. This evolving network captures the field’s history and, using supercomputer simulations, provides insights into scientists’ collective behaviour and suggests more efficient research strategies 13 . Although creating semantic networks from concept co-occurrences extracts only a small amount of knowledge from each paper, it captures non-trivial and actionable content when applied to large datasets 2 , 4 , 13 , 14 , 15 . PaperRobot extends this approach by predicting new links from large medical knowledge graphs and formulating new ideas in human language as paper drafts 16 .

This approach was applied and extended to quantum physics 17 by building a semantic network of over 6,000 concepts. There, the authors (including one of us) formulated the prediction of new research trends and connections as an ML task, with the goal of identifying concept pairs not yet jointly discussed in the literature but likely to be investigated in the future. This prediction task was one component for personalized suggestions of new research ideas.

Link prediction in semantic networks

We formulate the prediction of future research topics as a link-prediction task in an exponentially growing semantic network in the AI field. The goal is to predict which unconnected nodes, representing scientific concepts not yet jointly researched, will be connected in the future.

Link prediction is a common problem in computer science, addressed with classical metrics and features, as well as ML techniques. Network theory-based methods include local motif-based approaches 18 , 19 , 20 , 21 , 22 , linear optimization 23 , global perturbations 24 and stochastic block models 25 . ML works optimized a combination of predictors 26 , with further discussion in a recent review 27 .

In ref. 17 , seventeen hand-crafted features were used for this task. In the Science4Cast competition, the goal was to find more precise methods for link-prediction tasks in semantic networks (here, a semantic network of AI that is ten times larger than the one in ref. 17 ).

Potential for idea generation in science

The long-term goal of predictions and suggestions in semantic networks is to provide new ideas to individual researchers. In a way, we hope to build a creative artificial muse in science 28 . We can bias or constrain the model to give topic suggestions that are related to the research interest of individual scientists, or a pair of scientists to suggest topics for collaborations in an interdisciplinary setting.

Generation and analysis of the dataset

Dataset construction.

We create a dynamic semantic network using papers published on arXiv from 1992 to 2020 in the categories cs.AI, cs.LG, cs.NE and stat.ML. The 64,719 nodes represent AI concepts extracted from 143,000 paper titles and abstracts using Rapid Automatic Keyword Extraction (RAKE) and normalized via natural language processing (NLP) techniques and custom methods 29 . Although high-quality taxonomies such as the Computer Science Ontology (CSO) exist 30 , 31 , we choose not to use them for two reasons: the rapid growth of AI and ML may result in new concepts not yet in the CSO, and not all scientific domains have high-quality taxonomies like CSO. Our goal is to build a scalable approach applicable to any domain of science. However, future research could investigate merging these approaches (see ‘Extensions and future work’).

Concepts form the nodes of the semantic network, and edges are drawn when concepts co-appear in a paper title or abstract. Edges have time stamps based on the paper’s publication date, and multiple time-stamped edges between concepts are common. The network is edge-weighted, and the weight of an edge stands for the number of papers that connect two concepts. In total, this creates a time-evolving semantic network, depicted in Fig. 2 .
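The construction can be sketched in a few lines of Python. The papers and dates below are invented stand-ins for the arXiv metadata, and concept extraction is assumed to have already happened upstream.

```python
from itertools import combinations
from collections import defaultdict

# toy stand-ins for (extracted concepts, publication date) per paper
papers = [
    ({"neural network", "deep learning", "image classification"}, "2015-06"),
    ({"neural network", "reinforcement learning"}, "2016-02"),
    ({"deep learning", "neural network"}, "2017-09"),
]

edges = defaultdict(list)   # (concept_a, concept_b) -> list of time stamps
for concepts, date in papers:
    for a, b in combinations(sorted(concepts), 2):
        edges[(a, b)].append(date)

# the edge weight is the number of papers connecting the two concepts
weight = {pair: len(stamps) for pair, stamps in edges.items()}
print(weight[("deep learning", "neural network")])  # 2
```

Keeping the full list of time stamps per edge, rather than just the weight, is what makes the later time-sliced snapshots of the network possible.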

Figure 2: Utilizing 143,000 AI and ML papers on arXiv from 1992 to 2020, we create a list of concepts using RAKE and other NLP tools, which form nodes in a semantic network. Edges connect concepts that co-occur in titles or abstracts, resulting in an evolving network that expands as more concepts are jointly investigated. The task involves predicting which unconnected nodes (concepts not yet studied together) will connect within a few years. We present ten diverse statistical and ML methods to address this challenge.

Network-theoretical analysis

The published semantic network has 64,719 nodes and 17,892,352 unique undirected edges, with a mean node degree of 553. Many hub nodes greatly exceed this mean degree, as shown in Fig. 3. For example, the highest node degrees are 466,319 (neural network), 198,050 (deep learning), 195,345 (machine learning), 169,555 (convolutional neural network), 159,403 (real world), 150,227 (experimental result), 127,642 (deep neural network) and 115,334 (large scale). We fit a power-law curve to the degree distribution p(k) using the tools of ref. 32 and obtained p(k) ∝ k^−2.28 for degree k ≥ 1,672. However, real complex network degree distributions often follow power laws with exponential cut-offs 33 . Recent work 34 has indicated that lognormal distributions fit most real-world networks better than power laws. Likelihood ratio tests from ref. 32 suggest that truncated power law (P = 0.0031), lognormal (P = 0.0045) and lognormal positive (P = 0.015) all fit better than a pure power law, while exponential (P = 3 × 10^−10) and stretched exponential (P = 6 × 10^−5) fit worse. We could not conclusively determine the best fit with P ≤ 0.1.
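At the heart of such a fit is the continuous maximum-likelihood estimator α = 1 + n / Σ ln(k_i / k_min); the `powerlaw` package used in the paper wraps this estimator together with the likelihood-ratio tests. The sketch below applies it to synthetic Pareto-tail samples (invented for illustration) to show that it recovers the exponent.

```python
import numpy as np

def powerlaw_alpha(degrees, k_min):
    """Continuous MLE for the exponent of p(k) ~ k^-alpha, k >= k_min."""
    k = np.asarray([d for d in degrees if d >= k_min], dtype=float)
    return 1.0 + len(k) / np.log(k / k_min).sum()

rng = np.random.default_rng(0)
true_alpha, k_min, n = 2.28, 1672.0, 200_000
# inverse-transform sampling from the Pareto tail
u = rng.random(n)
samples = k_min * (1.0 - u) ** (-1.0 / (true_alpha - 1.0))
print(round(powerlaw_alpha(samples, k_min), 2))   # close to 2.28
```

Deciding between power-law, lognormal and truncated alternatives additionally requires the pairwise likelihood-ratio tests reported in the text, not just this point estimate.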

Figure 3: Nodes with the highest (466,319) and lowest (2) non-zero degrees are neural network and video compression technique, respectively. The most frequent non-zero degree is 64 (which occurs 313 times). The plot, in log scale, omits 1,247 nodes with zero degrees.

We observe changes in network connectivity over time. Although degree distributions remained heavy-tailed, the ordering of nodes within the tail changed due to popularity trends. The most connected nodes and the years they became so include decision tree (1994), machine learning (1996), logic program (2000), neural network (2005), experimental result (2011), machine learning (2013, for a second time) and neural network (2015).

Connected component analysis in Fig. 4 reveals that the network grew more connected over time, with the largest group expanding and the number of connected components decreasing. Mid-sized connected components’ trajectories may expose trends, like image processing. A connected component with four nodes appeared in 1999 (brightness change, planar curve, local feature, differential invariant), and three more joined in 2000 (similarity transformation, template matching, invariant representation). In 2006, a paper discussing support vector machine and local feature merged this mid-sized group with the largest connected component.

Figure 4: Primary (left, blue) vertical axis: number of connected components with more than one node. Secondary (right, orange) vertical axis: number of nodes in the largest connected component. For example, the network in 2019 comprises one large connected component with 63,472 nodes and 1,247 isolated nodes, that is, nodes with no edges. The 2001 network, by contrast, has 19 connected components with size greater than one, the largest of which has 2,733 nodes.

The semantic network reveals increasing centralization over time, with a smaller percentage of nodes (concepts) contributing to a larger fraction of edges (concept combinations). Figure 5 shows that the fraction of edges for high-degree nodes rises, while it decreases for low-degree nodes. The decreasing average clustering coefficient over time supports this trend, suggesting nodes are more likely to connect to high-degree central nodes. This could be due to the AI community’s focus on a few dominating methods or more consistent terminology use.

Figure 5: This cumulative histogram illustrates the fraction of nodes (concepts) corresponding to the fraction of edges (connections) for given years (1999, 2003, 2007, 2011, 2015 and 2019). The graph was generated by adding edges and nodes dated before each year. Nodes are sorted by increasing degree. The y value at x = 80 represents the fraction of edges contributed by all nodes in and below the 80th percentile of degrees.

Problem formulation

At a high level, we aim to make predictions in an exponentially growing semantic network. The specific task involves predicting which two nodes v1 and v2, each with degree d(v1), d(v2) ≥ c and lacking an edge in the year (2021 − δ), will have w edges in 2021. We use δ = 1, 3, 5, c = 0, 5, 25 and w = 1, 3, where c is a minimal degree. Note that c = 0 is an intriguing special case where the nodes may not have an associated edge in the initial year, requiring the model to predict which nodes will connect to entirely new edges. The task w = 3 goes beyond simple link prediction and seeks to identify uninvestigated concept pairs that will appear together in at least three papers. An interesting alternative task could be predicting the fastest-growing links, denoted as ‘trend’ prediction.

In this task, we provide a list of 10 million unconnected node pairs (each node having a degree ≥ c ) for the year (2021 −  δ ), with the goal of sorting this list by descending probability that they will have at least w edges in 2021.
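A hypothetical sketch of how such candidate lists and labels can be assembled; the toy edge lists below are invented and stand in for the network snapshots at years (2021 − δ) and 2021.

```python
from itertools import combinations

def make_dataset(edges_past, edges_future, nodes, c=1, w=1):
    """Unconnected pairs with degree >= c in the past network, labelled by
    whether they gain at least w edges in the future network."""
    deg = {v: 0 for v in nodes}
    past = set(frozenset(e) for e in edges_past)
    for a, b in edges_past:
        deg[a] += 1
        deg[b] += 1
    future_count = {}
    for a, b in edges_future:              # multi-edges are kept: weights matter
        key = frozenset((a, b))
        future_count[key] = future_count.get(key, 0) + 1
    data = []
    for a, b in combinations(sorted(nodes), 2):
        key = frozenset((a, b))
        if key in past or deg[a] < c or deg[b] < c:
            continue                       # already connected or below the cut-off
        data.append(((a, b), int(future_count.get(key, 0) >= w)))
    return data

past = [("A", "B"), ("B", "C")]
future = past + [("A", "C")] * 3           # (A, C) appears in three new papers
print(make_dataset(past, future, ["A", "B", "C", "D"], c=1, w=3))
# [(('A', 'C'), 1)]
```

At the benchmark's scale the candidate list is subsampled to 10 million pairs rather than enumerated exhaustively, but the labelling logic is the same.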

For evaluation, we employ the receiver operating characteristic (ROC) curve 35 , which plots the true-positive rate against the false-positive rate at various threshold settings. We use the area under the curve (AUC) of the ROC curve as our evaluation metric. The advantage of AUC over mean square error is its independence from the data distribution. Specifically, in our case, where the two classes have a highly asymmetric distribution (with only about 1–3% of newly connected edges) and the distribution changes over time, AUC offers meaningful interpretation. Perfect predictions yield AUC = 1, whereas random predictions result in AUC = 0.5. AUC represents the percentage that a random true element is ranked higher than a random false one. For other metrics, see ref. 36 .
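That rank-based interpretation of AUC can be implemented directly from its definition (an O(n²) sketch; production code would use a rank statistic or a library routine):

```python
def auc(scores, labels):
    """Probability that a random positive is scored above a random negative;
    ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# perfect ranking -> 1.0; one positive ranked below the negative -> 2/3
print(auc([0.9, 0.8, 0.4], [1, 1, 0]))          # 1.0
print(auc([0.9, 0.8, 0.4, 0.2], [1, 1, 0, 1]))  # 0.6666666666666666
```

Because only the ordering of scores matters, the metric is unaffected by the 1–3% positive rate mentioned above, which is exactly why it is preferred here over mean square error.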

To tackle this task, models can use the complete information of the semantic network from the year (2021 −  δ ) in any way possible. In our case, all presented models generate a dataset for learning to make predictions from (2021 − 2 δ ) to (2021 −  δ ). Once the models successfully complete this task, they are applied to the test dataset to make predictions from (2021 −  δ ) to 2021. All reported AUCs are based on the test dataset. Note that solving the test dataset is especially challenging due to the δ -year shift, causing systematic changes such as the number of papers and density of the semantic network.

AI-based solutions

We demonstrate various methods to predict new links in a semantic network, ranging from pure statistical approaches and neural networks with hand-crafted features (NF) to ML models without NF. The results are shown in Fig. 6 , with the highest AUC scores achieved by methods using NF as ML model inputs. Pure network features without ML are competitive, while pure ML methods have yet to outperform those with NF. Predicting links generated at least three times can achieve a quasi-deterministic AUC > 99.5%, suggesting an interesting target for computational sociology and science of science research. We have performed numerous tests to exclude data leakage in the benchmark dataset, overfitting or data duplication both in the set of articles and the set of concepts. We rank methods based on their performance, with model M1 as the best performing and model M8 as the least effective (for the prediction of a new edge with δ  = 3, c  = 0). Models M4 and M7 are subdivided into M4A, M4B, M7A and M7B, differing in their focus on feature or embedding selection (more details in Methods ).

Figure 6: Here we show the AUC values for different models that use machine learning techniques (ML), hand-crafted network features (NF) or a combination thereof. The left plot shows results for the prediction of a single new link (that is, w = 1) and the right plot shows results for the prediction of new triple links (w = 3). The task is to predict δ = [1, 3, 5] years into the future, with cut-off values c = [0, 5, 25]. We sort the models by the results for the task (w = 1, δ = 3, c = 0), which was the task in the Science4Cast competition. Data points that are not shown have an AUC below 0.6 or were not computed due to computational costs. All AUC values reported are computed on a validation dataset δ years ahead of the training dataset that the models have never seen. Note that the prediction of new triple edges can be performed nearly deterministically. It will be interesting to understand the origin of this quasi-deterministic pattern in AI research, for example, by connecting it to the research interests of scientists 88 .

Model M1: NF + ML. This approach combines tree-based gradient boosting with graph neural networks, using extensive feature engineering to capture node centralities, proximity and temporal evolution 37 . The Light Gradient Boosting Machine (LightGBM) model 38 is employed with heavy regularization to combat overfitting due to the scarcity of positive examples, while a time-aware graph neural network learns dynamic node representations.

Model M2: NF + ML. This method utilizes node and edge features (as well as their first and second derivatives) to predict link formation probabilities 39 . Node features capture popularity, and edge features measure similarity. A multilayer perceptron with rectified linear unit (ReLU) activation is used for learning. Cold start issues are addressed with feature imputation.

Model M3: NF + ML. This method captures hand-crafted node features over multiple time snapshots and employs a long short-term memory (LSTM) to learn time dependencies 40 . The features were selected to be highly informative while having a low computational cost. The final configuration uses degree centrality, degree of neighbours and common neighbours as features. The LSTM outperforms fully connected neural networks.

Model M4: pure NF. Two purely statistical methods, preferential attachment 41 and common neighbours 27 , are used 42 . Preferential attachment is based on node degrees, while common neighbours relies on the number of shared neighbours. Both methods are computationally inexpensive and perform competitively with some learning-based models.
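The two heuristics behind model M4 fit in a few lines: preferential attachment scores an unconnected pair by the product of its node degrees, while common neighbours counts shared neighbours. The toy adjacency below is invented for illustration.

```python
adj = {                              # node -> set of neighbours
    "A": {"B", "C", "D"},
    "B": {"A", "C", "E"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"B"},
}

def preferential_attachment(u, v):
    return len(adj[u]) * len(adj[v])

def common_neighbours(u, v):
    return len(adj[u] & adj[v])

# rank unconnected candidate pairs by the heuristic score
candidates = [("A", "E"), ("C", "D"), ("D", "E")]
ranked = sorted(candidates, key=lambda p: -preferential_attachment(*p))
print(ranked)   # [('A', 'E'), ('C', 'D'), ('D', 'E')]
```

Since both scores reduce to set lookups and multiplications, they scale to millions of candidate pairs, which is why they remain competitive baselines despite involving no learning.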

Model M5: NF + ML. Here, ten groups of first-order graph features are extracted to obtain neighbourhood and similarity properties, with principal component analysis 43 applied for dimensionality reduction 44 . A random forest classifier is trained on the balanced dataset to predict new links.

Model M6: NF + ML. The baseline solution uses 15 hand-crafted features as input to a four-layer neural network, predicting the probability of link formation between node pairs 17 .

Model M7: end-to-end ML (auto node embedding). The baseline solution is modified to use node2vec 45 and ProNE embeddings 46 instead of hand-crafted features. The embeddings are input to a neural network with two hidden layers for link prediction.

Model M8: end-to-end ML (transformers). This method learns features in an unsupervised manner using transformers 47 . Node2vec embeddings 45 , 48 are generated for various snapshots of the adjacency matrix, and a transformer model 49 is pre-trained as a feature extractor. A two-layer ReLU network is used for classification.

Extensions and future work

Developing an AI that suggests research topics to scientists is a complex task, and our link-prediction approach in temporal networks is just the beginning. We highlight key extensions and future work directly related to the ultimate goal of AI for AI.

High-quality predictions without feature engineering. Interestingly, the most effective methods utilized carefully crafted features on a graph with extracted concepts as nodes and edges representing their joint publication history. Investigating whether end-to-end deep learning can solve tasks without feature engineering will be a valuable next step.

Fully automated concept extraction. Current concept lists, generated by RAKE’s statistical text analysis, demand time-consuming code development to address irrelevant term extraction (for example, verbs, adjectives). A fully automated NLP technique that accurately extracts meaningful concepts without manual code intervention would greatly enhance the process.

Leveraging ontology taxonomies. Alongside fully automated concept extraction, utilizing established taxonomies such as the CSO 30 , 31 , Wikipedia-extracted concepts, book indices 17 or PhySH key phrases is crucial. Although not comprehensive for all domains, these curated datasets often contain hierarchical and relational concept information, greatly improving prediction tasks.

Incorporating relation extraction. Future work could explore relation extraction techniques for constructing more accurate, sparser semantic networks. By discerning and classifying meaningful concept relationships in abstracts 50 , 51 , a refined AI literature representation is attainable. Using NLP tools for entity recognition, relationship identification and classification, this approach may enhance prediction performance and novel research direction identification.

Generation of new concepts. Our work predicts links between known concepts, but generating new concepts using AI remains a challenge. This unsupervised task, as explored in refs. 52 , 53 , involves detecting concept clusters with dynamics that signal new concept formation. Incorporating emerging concepts into the current framework for suggesting research topics is an intriguing future direction.

Semantic information beyond concept pairs. Currently, abstracts and titles are compressed into concept pairs, but more comprehensive information extraction could yield meaningful predictions. Exploring complex data structures such as hypergraphs 54 may be computationally demanding, but clever tricks could reduce complexity, as shown in ref. 55 . Investigating sociological factors or drawing inspiration from material science approaches 56 may also improve prediction tasks. A recent dataset for the study of the science of science also includes more complex data structures than the ones used in our paper, including data from social networks such as Twitter 57 .

Predictions of scientific success. While predicting new links between concepts is valuable, assessing their potential impact is essential for high-quality suggestions. Introducing a metric of success, like estimated citation numbers or citation growth rate, can help gauge the importance of these connections. Adapting citation prediction techniques from the science of science 58 , 59 , 60 , 61 to semantic networks offers a promising research direction.

Anomaly detection. Predicting likely connections may not align with finding surprising research directions. One method for identifying surprising suggestions involves constraining the cosine similarity between vertices 62 , which measures shared neighbours and can be associated with semantic (dis)similarity. Another approach is detecting anomalies in semantic networks, that is, potential links with extreme properties 63 , 64 . While scientists often focus on familiar topics 3 , 4 , greater impact results from unexpected combinations of distant domains 12 , encouraging the search for surprising associations.
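The cosine-similarity constraint mentioned above can be illustrated directly on an adjacency matrix. The following is a minimal sketch, not the implementation of refs. 62–64; the function name and the toy triangle graph are illustrative assumptions:

```python
import numpy as np

def neighbourhood_cosine(A, i, j):
    """Cosine similarity of the neighbourhood indicator vectors of nodes i and j,
    |Γ(i) ∩ Γ(j)| / sqrt(k_i * k_j); low values flag 'distant' candidate pairs."""
    shared = A[i] @ A[j]                      # number of common neighbours
    denom = np.sqrt(A[i].sum() * A[j].sum())  # square root of the degree product
    return shared / denom if denom > 0 else 0.0

# Triangle graph: nodes 0 and 1 share exactly one neighbour (node 2)
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])
sim = neighbourhood_cosine(A, 0, 1)           # 1 / sqrt(2 * 2)
```

Candidate pairs with unusually low similarity would then be the "surprising" suggestions.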

End-to-end formulation. Our method breaks down the goal of extracting knowledge from scientific literature into subtasks, contrasting with end-to-end deep learning that tackles problems directly without subproblems 65 , 66 . End-to-end approaches have shown great success in various domains 67 , 68 , 69 . Investigating whether such an end-to-end solution can achieve similar success in our context would be intriguing.

Our method represents a crucial step towards developing a tool that can assist scientists in uncovering novel avenues for exploration. We are confident that our outlined ideas and extensions pave the way for achieving practical, personalized, interdisciplinary AI-based suggestions for new impactful discoveries. We firmly believe that such a tool holds the potential to become an influential catalyst, transforming the way scientists approach research questions and collaborate in their respective fields.

Details on concept set generation and application

In this section, we provide details on the generation of our list of 64,719 concepts. For more information, the code is accessible on GitHub. The entire approach is designed for immediate scalability to other domains.

Initially, we utilized approximately 143,000 arXiv papers from the categories cs.AI, cs.LG, cs.NE and stat.ML spanning 1992 to 2020. The omission of earlier data has a negligible effect on our research question, as we show below. We then iterated over each individual article, employing RAKE (with an extended stopword list) to suggest concept candidates, which were subsequently stored.

Following the iteration, we retained concepts composed of at least two words (for example, neural network) appearing in six or more articles, as well as concepts comprising a minimum of three words (for example, recurrent neural network) appearing in three or more articles. This initial filter substantially reduced noise generated by RAKE, resulting in a list of 104,948 concepts.
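The two frequency thresholds can be sketched as a simple counting filter. This is a hypothetical reimplementation of the filtering rule described above, not the released code; `candidates_per_article` is an assumed input holding the RAKE candidates of each paper:

```python
from collections import Counter

def filter_concepts(candidates_per_article):
    """Keep candidates of >=2 words appearing in >=6 articles,
    or >=3 words appearing in >=3 articles (thresholds from the text)."""
    counts = Counter()
    for candidates in candidates_per_article:
        counts.update(set(candidates))        # count each concept at most once per article
    kept = set()
    for concept, n_articles in counts.items():
        n_words = len(concept.split())
        if (n_words >= 2 and n_articles >= 6) or (n_words >= 3 and n_articles >= 3):
            kept.add(concept)
    return kept

# 'neural network' appears in 6 articles, 'recurrent neural network' in 3, 'deep model' in 5
articles = ([["neural network", "recurrent neural network"]] * 3
            + [["neural network"]] * 3 + [["deep model"]] * 5)
kept = filter_concepts(articles)              # keeps the first two, drops 'deep model'
```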

Lastly, we developed an automated filtering tool to further enhance the quality of the concept list. This tool identified common, domain-independent errors made by RAKE, which primarily included phrases that were not concepts (for example, dataset provided or discuss open challenge). We compiled a list of 543 words not part of meaningful concepts, including verbs, ordinal numbers, conjunctions and adverbials. Ultimately, this process produced our final list of 64,719 concepts employed in our study. No further semantic concept/entity linking is applied.

By construction, the test sets with c  = 0 could suffer very rare contamination of the dataset, because each concept has at least one edge in the final dataset. The effect, however, is negligible.

The distribution of concepts in the articles can be seen in Extended Data Fig. 1 . As an example, we show the extraction of concepts from five randomly chosen papers:

Memristor hardware-friendly reinforcement learning 70 : ‘actor critic algorithm’, ‘neuromorphic hardware implementation’, ‘hardware neural network’, ‘neuromorphic hardware system’, ‘neural network’, ‘large number’, ‘reinforcement learning’, ‘case study’, ‘pre training’, ‘training procedure’, ‘complex task’, ‘high performance’, ‘classical problem’, ‘hardware implementation’, ‘synaptic weight’, ‘energy efficient’, ‘neuromorphic hardware’, ‘control theory’, ‘weight update’, ‘training technique’, ‘actor critic’, ‘nervous system’, ‘inverted pendulum’, ‘explicit supervision’, ‘hardware friendly’, ‘neuromorphic architecture’, ‘hardware system’.

Automated deep learning analysis of angiography video sequences for coronary artery disease 71 : ‘deep learning approach’, ‘coronary artery disease’, ‘deep learning analysis’, ‘traditional image processing’, ‘deep learning’, ‘image processing’, ‘f1 score’, ‘video sequence’, ‘error rate’, ‘automated analysis’, ‘coronary artery’, ‘vessel segmentation’, ‘key frame’, ‘visual assessment’, ‘analysis method’, ‘analysis pipeline’, ‘coronary angiography’, ‘geometrical analysis’.

Demographic influences on contemporary art with unsupervised style embeddings 72 : ‘classification task’, ‘social network’, ‘data source’, ‘visual content’, ‘graph network’, ‘demographic information’, ‘social connection’, ‘visual style’, ‘historical dataset’, ‘novel information’.

The utility of general domain transfer learning for medical language tasks 73 : ‘natural language processing’, ‘long short term memory’, ‘logistic regression model’, ‘transfer learning technique’, ‘short term memory’, ‘average f1 score’, ‘class classification model’, ‘domain transfer learning’, ‘weighted average f1 score’, ‘medical natural language processing’, ‘natural language process’, ‘transfer learning’, ‘f1 score’, ‘natural language’, ‘deep model’, ‘logistic regression’, ‘model performance’, ‘classification model’, ‘text classification’, ‘regression model’, ‘nlp task’, ‘short term’, ‘medical domain’, ‘weighted average’, ‘class classification’, ‘bert model’, ‘language processing’, ‘biomedical domain’, ‘domain transfer’, ‘nlp model’, ‘main model’, ‘general domain’, ‘domain model’, ‘medical text’.

Fast neural architecture construction using envelopenets 74 : ‘neural network architecture’, ‘neural architecture search’, ‘deep network architecture’, ‘image classification problem’, ‘neural architecture search method’, ‘neural network’, ‘reinforcement learning’, ‘deep network’, ‘image classification’, ‘objective function’, ‘network architecture’, ‘classification problem’, ‘evolutionary algorithm’, ‘neural architecture’, ‘base network’, ‘architecture search’, ‘training epoch’, ‘search method’, ‘image class’, ‘full training’, ‘automated search’, ‘generated network’, ‘constructed network’, ‘gpu day’.

Time gap between the generation of edges

We use articles from arXiv, which only goes back to the year 1992. However, the field of AI has of course existed since at least the 1960s 75 . This raises the question of whether the omission of the first 30–40 years of research has a crucial impact on the prediction task we formulate; specifically, whether edges that we consider new might not be so new after all. Thus, in Extended Data Fig. 2, we compute the time between the formation of edges between the same concepts, taking into account all edges or just the first. We see that the vast majority of edges form within short time periods; thus, the omission of early publications has a negligible effect on our question. Of course, different questions might be crucially affected by the early data, so a careful choice of data source is crucial 61 .

Positive examples in the test dataset

Table 1 shows the number of positive cases within the 10 million examples in the 18 test datasets that are used for evaluation.

Publication rates in quantum physics

Another field of research that has gained a lot of attention in recent years is quantum physics. This field is also a strong adopter of arXiv. Thus, we analyse it in the same way as AI in Fig. 1. We find in Extended Data Fig. 3 no obvious exponential increase in papers per month. A detailed analysis of other domains is beyond the current scope. It will be interesting to investigate the growth rates of different scientific disciplines in more detail, especially given that exponential increase has been observed in several aspects of the science of science 3 , 76 .

Details on models M1–M8

What follows are more detailed explanations of the models presented in the main text. All code is available on GitHub. The feature importance of the best model, M1, is shown here; those of the other models are analysed in the respective workshop contributions (cited in the subsections below).

Details on M1

The best-performing solution is based on a blend of a tree-based gradient boosting approach and a graph neural network approach 37 . Extensive feature engineering was conducted to capture the centralities of the nodes, the proximity between node pairs and their evolution over time. The centrality of a node is captured by the number of neighbours and the PageRank score 77 , while the proximity between a node pair is derived using the Jaccard index. We refer the reader to ref. 37 for the list of all features and their feature importance.

The tree-based gradient boosting approach uses LightGBM 38 and applies heavy regularization to combat overfitting due to the scarcity of positive samples. The graph neural network approach employs a time-aware graph neural network to learn node representations on dynamic semantic networks. The feature importance of model M1, averaged over the 18 datasets, is shown in Table 2. It shows that the temporal features contribute substantially to the model performance, but the model remains strong even when they are removed. An example of the evolution of the training set (from 2016 to 2019) and test set (2019 to 2021) for δ  = 3, c  = 25, ω  = 1 is shown in Extended Data Fig. 4.

Details on M2

The second method assumes that the probability that nodes u and v form an edge in the future is a function of the node features f(u), f(v) and some edge feature h(u, v). We chose node features f that capture popularity at the current time t_0 (such as degree, clustering coefficient 78 , 79 and PageRank 77 ). We also use these features' first and second time derivatives to capture the evolution of the node's popularity over time. After variable selection during training, we chose h to consist of the HOP-rec score (high-order proximity for implicit recommendation) 80 , 81 and a variation of the Dice similarity score 82 as a measure of similarity between nodes. In summary, we use 31 node features for each node and two edge features, which gives 31 × 2 + 2 = 64 features in total. These features are then fed into a small multilayer perceptron (five layers, each with 13 neurons) with ReLU activation.

Cold start is the problem that some nodes in the test set do not appear in the training set. Our strategy for a cold start is imputation. We say a node v is seen if it appeared in the training data, and unseen otherwise; similarly, we say that a node is born at time t if t is the first time stamp where an edge linking this node has appeared. The idea is that an unseen node is simply a node born in the future, so its features should look like a recently born node in the training set. If a node is unseen, then we impute its features as the average of the features of the nodes born recently. We found that with imputation during training, the test AUC scores across all models consistently increased by about 0.02. For a complete description of this method, we refer the reader to ref. 39 .
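The imputation strategy can be sketched as follows. This is a minimal illustration with hypothetical feature arrays; the actual feature set and windowing are those of ref. 39:

```python
import numpy as np

def impute_unseen(features, birth_time, seen_mask, recent_window=1):
    """Give unseen nodes the mean feature vector of recently born seen nodes,
    treating an unseen node as one 'born in the future'."""
    features = features.copy()
    latest = birth_time[seen_mask].max()
    recent = seen_mask & (birth_time >= latest - recent_window)
    features[~seen_mask] = features[recent].mean(axis=0)
    return features

# Nodes 0 and 1 were born recently; node 3 is unseen and receives their mean features
feats = np.array([[1.0, 1.0], [3.0, 3.0], [10.0, 10.0], [0.0, 0.0]])
born = np.array([5, 5, 1, 0])
seen = np.array([True, True, True, False])
imputed = impute_unseen(feats, born, seen)
```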

Details on M3

This approach, detailed in ref. 40 , uses hand-crafted node features captured in multiple time snapshots (for example, every year) and then uses an LSTM to learn the time dependencies of these features. The final configuration uses two main types of feature: node features, including degree and degree of neighbours, and edge features, including common neighbours. In addition, to balance the training data, the same number of positive and negative instances were randomly sampled and combined.

One of the goals was to identify features that are highly informative at a very low computational cost. We found that the degree centrality of the nodes is the most important feature, and that the degree centrality of the neighbouring nodes and the degree of mutual neighbours gave us the best trade-off. As all of the extracted features' distributions are highly skewed to the right (most features take values near zero), using a power transform such as Yeo–Johnson 83 helps make the distributions more Gaussian, which boosts learning. Finally, for the link-prediction task, we saw that LSTMs perform better than fully connected neural networks.
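The effect of the Yeo–Johnson transform on a right-skewed feature can be illustrated with scikit-learn's PowerTransformer. The exponential sample below is synthetic, standing in for the actual degree-based features:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

def sample_skewness(x):
    """Third standardized moment; roughly 2 for an exponential distribution."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=(5000, 1))  # heavily right-skewed, like the degree features

pt = PowerTransformer(method="yeo-johnson")     # also standardizes by default
z = pt.fit_transform(x)                         # near-symmetric, zero-mean output
```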

Details on M4

The following two methods are based on a purely statistical analysis of the test data and are explained in detail in ref. 42 .

Preferential attachment. In the network analysis, we concluded that the growth of this dataset tends to maintain a heavy-tailed degree distribution, often associated with scale-free networks. As mentioned before, the γ value of the degree distribution is very close to 2, suggesting that preferential attachment 41 is probably the main organizational principle of the network. As such, we implemented a simple prediction model following this procedure. Preferential attachment scores in link prediction are often quantified as s(i, j) = k_i k_j,

with k_i and k_j the degrees of nodes i and j. However, this assumes the scoring of links between nodes that are already connected to the network, that is, k_i, k_j > 0, which is not the case for all the links we must score in the dataset. As a result, we define our preferential attachment model as s(i, j) = (k_i + 1)(k_j + 1).

Using this simple model with no free parameters, we could score new links and compare them with the other models. We immediately note that preferential attachment outperforms some learning-based models: it never reaches the top AUC, but it is extremely simple and has negligible computational cost.

Common neighbours. We explore another network-based approach to score the links. Indeed, while the preferential attachment model we derived performed well, it uses no information about the distance between i and j, which is a popular feature in link-prediction methods 27 . As such, we decided to test a method known as common neighbours 18 . We define Γ(i) as the set of neighbours of node i and Γ(i) ∩ Γ(j) as the set of common neighbours of nodes i and j. We can easily score the nodes with s(i, j) = |Γ(i) ∩ Γ(j)|,

the intuition being that nodes that share a larger number of neighbours are more likely to be connected than distant nodes that do not share any.

Evaluating this score for each pair (i, j) in the dataset of unconnected pairs (the counts of common neighbours can be read off the second power of the adjacency matrix, A^2), we obtained an AUC that is sometimes higher than preferential attachment and sometimes lower, but still consistently close to the best learning-based models.
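Both statistical scores can be computed directly from the adjacency matrix. A minimal sketch on a toy graph follows; the +1 degree shift in the preferential attachment score is the zero-degree fix described above:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]])          # node 3 is isolated (degree 0)

k = A.sum(axis=1)                     # node degrees
pa = np.outer(k + 1, k + 1)           # shifted preferential attachment score
cn = A @ A                            # (A^2)[i, j] = number of common neighbours of i and j

# Nodes 0 and 1 share one neighbour (node 2); node 3 can still be scored by pa
```

Ranking the candidate pairs by either score and computing the AUC against the held-out links reproduces the comparison described in the text.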

Details on M5

This method is based on ref. 44 . First, ten groups of first-order graph features are extracted to get some neighbourhood and similarity properties from each pair of nodes: degree centrality of nodes, pair’s total number of neighbours, common neighbours index, Jaccard coefficient, Simpson coefficient, geometric coefficient, cosine coefficient, Adamic–Adar index, resource allocation index and preferential attachment index. They are obtained for three consecutive years to capture the temporal dynamics of the semantic network, leading to a total of 33 features. Second, principal component analysis 43 is applied to reduce the correlation between features, speed up the learning process and improve generalization, which results in a final set of seven latent variables. Lastly, a random forest classifier is trained (using a balanced dataset) to estimate the likelihood of new links between the AI concepts.
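The M5 pipeline (PCA down to seven components, followed by a random forest) can be sketched with scikit-learn. The feature matrix and labels below are synthetic stand-ins for the 33 graph features and the balanced link labels:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.random((600, 33))                       # 33 pair features over three snapshots
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)       # synthetic, roughly balanced labels

model = make_pipeline(
    PCA(n_components=7),                        # seven latent variables, as in the text
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X, y)
auc = roc_auc_score(y, model.predict_proba(X)[:, 1])  # training-set AUC, illustrative only
```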

In this paper, a modification was made relative to the original formulation of the method 44 : two of the original features, average neighbour degree and clustering coefficient, were infeasible to extract for some of the tasks covered here, as their computation can be heavy for such a large network, and they were discarded. Owing to computational memory constraints, it was not possible to run the model for some of the tasks in this study, so those results are missing.

Details on M6

The baseline solution for the Science4Cast competition was closely related to the model presented in ref. 17 . It uses 15 hand-crafted features of a pair of nodes v1 and v2: the degrees of v1 and v2 in the current year and the previous two years (six features); the total numbers of neighbours of v1 and v2 in those three years (six features); and the number of shared neighbours between v1 and v2 in those three years (three features). These 15 features are the input of a neural network with four layers (15, 100, 10 and 1 neurons), which predicts whether the nodes v1 and v2 will have w edges in the future. After training, the model computes the probability for all 10 million evaluation examples; this list is sorted and the AUC is computed.
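The baseline can be sketched with a multilayer perceptron of the stated layer sizes. The 15 features and the labels below are random stand-ins for the hand-crafted ones:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.random((1000, 15))                    # 15 pair features, as in the text
y = (X.sum(axis=1) > 7.5).astype(int)         # synthetic labels

# Hidden layers of 100 and 10 neurons; sklearn supplies the 15-unit input
# and 1-unit output layers implicitly
net = MLPClassifier(hidden_layer_sizes=(100, 10), max_iter=1000, random_state=0)
net.fit(X, y)
auc = roc_auc_score(y, net.predict_proba(X)[:, 1])   # training-set AUC, illustrative only
```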

Details on M7

Solution M7 was not part of the Science4Cast competition and is therefore not described in the corresponding proceedings; thus, we provide more details here.

The most immediate way to apply ML to this problem is by automating the detection of features. Quite simply, the baseline solution M6 is modified such that, instead of 15 hand-crafted features, the neural network is trained on features extracted from a graph embedding. We use two different embedding approaches. The first employs node2vec (M7A) 45 , for which we use the implementation provided in the nodevectors Python package 84 . The second uses the ProNE embedding (M7B) 46 , which is based on sparse matrix factorizations modulated by the higher-order Cheeger inequality 85 .

The embeddings generate a 32-dimensional representation for each node, resulting in edge representations in [0, 1]^64. These features are input into a neural network with two hidden layers of sizes 1,000 and 30. Like M6, the model computes the probability for the evaluation examples, from which the AUC is determined. We compare ProNE to node2vec, a common graph-embedding method that uses a biased random-walk procedure with return and in–out parameters, which greatly affect the network encoding. Initial experiments used default values for a 64-dimensional encoding before inputting into the neural network. The higher variance in node2vec predictions is probably due to its sensitivity to hyperparameters. While ProNE is better suited for general multi-dataset link prediction, node2vec's sensitivity may help identify crucial network features for predicting temporal evolution.
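Because node2vec and ProNE require external packages, the sketch below substitutes a truncated SVD of the adjacency matrix as a stand-in node embedding; the 32-dimensional node vectors, the 64-dimensional concatenated edge features and the (1,000, 30) hidden layers mirror the setup described above, while the random graph is a toy assumption:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_nodes, dim = 100, 32

# Random sparse symmetric adjacency matrix as toy data
A = (rng.random((n_nodes, n_nodes)) < 0.05).astype(float)
A = np.triu(A, 1)
A = A + A.T

# Stand-in for node2vec/ProNE: scaled left singular vectors of A
U, s, _ = np.linalg.svd(A)
emb = U[:, :dim] * s[:dim]                    # one 32-d vector per node

pairs = [(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes)]
X = np.array([np.concatenate([emb[u], emb[v]]) for u, v in pairs])  # 64-d edge features
y = np.array([A[u, v] for u, v in pairs])

clf = MLPClassifier(hidden_layer_sizes=(1000, 30), max_iter=30, random_state=0)
clf.fit(X, y)
```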

Details on M8

This model, which is detailed in ref. 47 , does not use any hand-crafted features but learns them in a completely unsupervised manner. To do so, we extract various snapshots of the adjacency matrix through time, capturing graphs in the form of A_t for t = 1994, …, 2019. We then embed each of these graphs into 128-dimensional Euclidean space via node2vec 45 , 48 . For each node u in the semantic graph, we extract different 128-dimensional vector embeddings n_u(A_1994), …, n_u(A_2019).

Transformers have performed extremely well in NLP tasks 49 ; thus, we apply them to learn the dynamics of the embedding vectors. We pre-train a transformer to help classify node pairs. For the transformer, the encoder and decoder had 6 layers each; we used 128 as the embedding dimension, 2,048 as the feed-forward dimension and 8-headed attention. This transformer acts as our feature extractor. Once we pre-train our transformer, we add a two-layer ReLU network with hidden dimension 128 as a classifier on top.
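The data preparation for M8 (per-node sequences of snapshot embeddings that the transformer consumes) can be sketched as follows. Random vectors stand in for the node2vec embeddings, and the transformer itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
years = range(1994, 2020)                     # snapshots A_1994, ..., A_2019
n_nodes, dim = 100, 128

# Stand-in for node2vec(A_t): one 128-d embedding per node per yearly snapshot
snapshots = {t: rng.standard_normal((n_nodes, dim)) for t in years}

def node_sequence(u):
    """Sequence n_u(A_1994), ..., n_u(A_2019) for node u."""
    return np.stack([snapshots[t][u] for t in years])

seq = node_sequence(0)                        # one training sequence for the transformer
```

Each such sequence is one input to the pre-training stage; pairs of sequences are then classified by the two-layer ReLU head.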

Data availability

All 18 datasets tested in this paper are available via Zenodo at https://doi.org/10.5281/zenodo.7882892 (ref. 86).

Code availability

All of the models and code described above can be found via GitHub at https://github.com/artificial-scientist-lab/FutureOfAIviaAI (ref. 5) and in a permanent Zenodo record at https://zenodo.org/record/8329701 (ref. 87).

Clauset, A., Larremore, D. B. & Sinatra, R. Data-driven predictions in the science of science. Science 355 , 477–480 (2017).


Evans, J. A. & Foster, J. G. Metaknowledge. Science 331 , 721–725 (2011).


Fortunato, S. et al. Science of science. Science 359 , eaao0185 (2018).

Wang, D. & Barabási, A.-L. The Science of Science (Cambridge Univ. Press, 2021).

Krenn, M. et al. FutureOfAIviaAI. GitHub https://github.com/artificial-scientist-lab/FutureOfAIviaAI (2023).

Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33 , 1877–1901 (2020).


Rae, J. W. et al. Scaling language models: methods, analysis & insights from training gopher. Preprint at https://arxiv.org/abs/2112.11446 (2021).

Smith, S. et al. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model. Preprint at https://arxiv.org/abs/2201.11990 (2022).

Chowdhery, A. et al. Palm: scaling language modeling with pathways. Preprint at https://arxiv.org/abs/2204.02311 (2022).

Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. Preprint at https://arxiv.org/abs/2205.11916 (2022).

Zhang, H., Li, L. H., Meng, T., Chang, K.-W. & Broeck, G. V. d. On the paradox of learning to reason from data. Preprint at https://arxiv.org/abs/2205.11502 (2022).

Rzhetsky, A., Foster, J. G., Foster, I. T. & Evans, J. A. Choosing experiments to accelerate collective discovery. Proc. Natl Acad. Sci. USA 112 , 14569–14574 (2015).

Foster, J. G., Rzhetsky, A. & Evans, J. A. Tradition and innovation in scientists’ research strategies. Am. Sociol. Rev. 80 , 875–908 (2015).

Van Eck, N. J. & Waltman, L. Text mining and visualization using vosviewer. Preprint at https://arxiv.org/abs/1109.2058 (2011).

Van Eck, N. J. & Waltman, L. in Measuring Scholarly Impact: Methods and Practice (eds Ding, Y. et al.) 285–320 (Springer, 2014).

Wang, Q. et al. Paperrobot: Incremental draft generation of scientific ideas. Preprint at https://arxiv.org/abs/1905.07870 (2019).

Krenn, M. & Zeilinger, A. Predicting research trends with semantic and neural networks with an application in quantum physics. Proc. Natl Acad. Sci. USA 117 , 1910–1916 (2020).

Liben-Nowell, D. & Kleinberg, J. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. 58 , 1019–1031 (2007).

Albert, I. & Albert, R. Conserved network motifs allow protein–protein interaction prediction. Bioinformatics 20 , 3346–3352 (2004).

Zhou, T., Lü, L. & Zhang, Y.-C. Predicting missing links via local information. Eur. Phys. J. B 71 , 623–630 (2009).


Kovács, I. A. et al. Network-based prediction of protein interactions. Nat. Commun. 10 , 1240 (2019).

Muscoloni, A., Abdelhamid, I. & Cannistraci, C. V. Local-community network automata modelling based on length-three-paths for prediction of complex network structures in protein interactomes, food webs and more. Preprint at bioRxiv https://doi.org/10.1101/346916 (2018).

Pech, R., Hao, D., Lee, Y.-L., Yuan, Y. & Zhou, T. Link prediction via linear optimization. Physica A 528 , 121319 (2019).

Lü, L., Pan, L., Zhou, T., Zhang, Y.-C. & Stanley, H. E. Toward link predictability of complex networks. Proc. Natl Acad. Sci. USA 112 , 2325–2330 (2015).

Guimerà, R. & Sales-Pardo, M. Missing and spurious interactions and the reconstruction of complex networks. Proc. Natl Acad. Sci. USA 106 , 22073–22078 (2009).

Ghasemian, A., Hosseinmardi, H., Galstyan, A., Airoldi, E. M. & Clauset, A. Stacking models for nearly optimal link prediction in complex networks. Proc. Natl Acad. Sci. USA 117 , 23393–23400 (2020).

Zhou, T. Progresses and challenges in link prediction. iScience 24 , 103217 (2021).

Krenn, M. et al. On scientific understanding with artificial intelligence. Nat. Rev. Phys. 4 , 761–769 (2022).

Rose, S., Engel, D., Cramer, N. & Cowley, W. in Text Mining: Applications and Theory (eds Berry, M. W. & Kogan, J.) Ch. 1 (Wiley, 2010).

Salatino, A. A., Thanapalasingam, T., Mannocci, A., Osborne, F. & Motta, E. The computer science ontology: a large-scale taxonomy of research areas. In Proc. Semantic Web–ISWC 2018: 17th International Semantic Web Conference Part II Vol. 17, 187–205 (Springer, 2018).

Salatino, A. A., Osborne, F., Thanapalasingam, T. & Motta, E. The CSO classifier: ontology-driven detection of research topics in scholarly articles. In Proc. Digital Libraries for Open Knowledge: 23rd International Conference on Theory and Practice of Digital Libraries Vol. 23, 296–311 (Springer, 2019).

Alstott, J., Bullmore, E. & Plenz, D. powerlaw: a Python package for analysis of heavy-tailed distributions. PLoS ONE 9 , e85777 (2014).

Fenner, T., Levene, M. & Loizou, G. A model for collaboration networks giving rise to a power-law distribution with an exponential cutoff. Soc. Netw. 29 , 70–80 (2007).

Broido, A. D. & Clauset, A. Scale-free networks are rare. Nat. Commun. 10 , 1017 (2019).

Fawcett, T. ROC graphs: notes and practical considerations for researchers. Pattern Recognit. Lett. 31 , 1–38 (2004).

Sun, Y., Wong, A. K. & Kamel, M. S. Classification of imbalanced data: a review. Int. J. Pattern Recognit. Artif. Intell. 23 , 687–719 (2009).

Lu, Y. Predicting research trends in artificial intelligence with gradient boosting decision trees and time-aware graph neural networks. In 2021 IEEE International Conference on Big Data (Big Data) 5809–5814 (IEEE, 2021).

Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. In Proc. 31st International Conference on Neural Information Processing Systems 3149–3157 (Curran Associates Inc., 2017).

Tran, N. M. & Xie, Y. Improving random walk rankings with feature selection and imputation Science4Cast competition, team Hash Brown. In 2021 IEEE International Conference on Big Data (Big Data) 5824–5827 (IEEE, 2021).

Sanjabi, N. Efficiently predicting scientific trends using node centrality measures of a science semantic network. In 2021 IEEE International Conference on Big Data (Big Data) 5820–5823 (IEEE, 2021).

Barabási, A.-L. Network science. Phil. Trans. R. Soci. A 371 , 20120375 (2013).

Moutinho, J. P., Coutinho, B. & Buffoni, L. Network-based link prediction of scientific concepts—a Science4Cast competition entry. In 2021 IEEE International Conference on Big Data (Big Data) 5815–5819 (IEEE, 2021).

Jolliffe, I. T. & Cadima, J. Principal component analysis: a review and recent developments. Phil. Trans. R. Soc. A 374 , 20150202 (2016).

Valente, F. Link prediction of artificial intelligence concepts using low computational power. In 2021 IEEE International Conference on Big Data (Big Data) 5828–5832 (2021).

Grover, A. & Leskovec, J. node2vec: scalable feature learning for networks. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 855–864 (ACM, 2016).

Zhang, J., Dong, Y., Wang, Y., Tang, J. & Ding, M. ProNE: fast and scalable network representation learning. In Proc. Twenty-Eighth International Joint Conference on Artificial Intelligence 4278–4284 (International Joint Conferences on Artificial Intelligence Organization, 2019).

Lee, H., Sonthalia, R. & Foster, J. G. Dynamic embedding-based methods for link prediction in machine learning semantic network. In 2021 IEEE International Conference on Big Data (Big Data) 5801–5808 (IEEE, 2021).

Liu, R. & Krishnan, A. PecanPy: a fast, efficient and parallelized python implementation of node2vec. Bioinformatics 37 , 3377–3379 (2021).

Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems 6000–6010 (Curran Associates Inc., 2017).

Zelenko, D., Aone, C. & Richardella, A. Kernel methods for relation extraction. J. Mach. Learn. Res. 3 , 1083–1106 (2003).


Bach, N. & Badaskar, S. A review of relation extraction. Literature Review for Language and Statistics II 2 , 1–15 (2007).

Salatino, A. A., Osborne, F. & Motta, E. How are topics born? Understanding the research dynamics preceding the emergence of new areas. PeerJ Comput. Sc. 3 , e119 (2017).

Salatino, A. A., Osborne, F. & Motta, E. AUGUR: forecasting the emergence of new research topics. In Proc. 18th ACM/IEEE on Joint Conference on Digital Libraries 303–312 (IEEE, 2018).

Battiston, F. et al. The physics of higher-order interactions in complex systems. Nat. Phys. 17 , 1093–1098 (2021).

Coutinho, B. C., Wu, A.-K., Zhou, H.-J. & Liu, Y.-Y. Covering problems and core percolations on hypergraphs. Phys. Rev. Lett. 124 , 248301 (2020).

Article   MathSciNet   Google Scholar  

Olivetti, E. A. et al. Data-driven materials research enabled by natural language processing and information extraction. Appl. Phys. Rev. 7 , 041317 (2020).

Lin, Z., Yin, Y., Liu, L. & Wang, D. SciSciNet: a large-scale open data lake for the science of science research. Sci. Data 10 , 315 (2023).

Azoulay, P. et al. Toward a more scientific science. Science 361 , 1194–1197 (2018).

Liu, H., Kou, H., Yan, C. & Qi, L. Link prediction in paper citation network to construct paper correlation graph. EURASIP J. Wirel. Commun. Netw. 2019 , 1–12 (2019).

Reisz, N. et al. Loss of sustainability in scientific work. New J. Phys. 24 , 053041 (2022).

Frank, M. R., Wang, D., Cebrian, M. & Rahwan, I. The evolution of citation graphs in artificial intelligence research. Nat. Mach. Intell. 1 , 79–85 (2019).

Newman, M. Networks (Oxford Univ. Press, 2018).

Kwon, D. et al. A survey of deep learning-based network anomaly detection. Cluster Comput. 22 , 949–961 (2019).

Pang, G., Shen, C., Cao, L. & Hengel, A. V. D. Deep learning for anomaly detection: a review. ACM Comput. Surv. 54 , 1–38 (2021).

Collobert, R. et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12 , 2493–2537 (2011).

MATH   Google Scholar  

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444 (2015).

Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60 , 84–90 (2017).

Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518 , 529–533 (2015).

Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529 , 484–489 (2016).

Wu, N., Vincent, A., Strukov, D. & Xie, Y. Memristor hardware-friendly reinforcement learning. Preprint at https://arxiv.org/abs/2001.06930 (2020).

Zhou, C. et al. Automated deep learning analysis of angiography video sequences for coronary artery disease. Preprint at https://arxiv.org/abs/2101.12505 (2021).

Huckle, N., Garcia, N. & Nakashima, Y. Demographic influences on contemporary art with unsupervised style embeddings. In Proc. Computer Vision–ECCV 2020 Workshops Part II Vol. 16, 126–142 (Springer, 2020).

Ranti, D. et al. The utility of general domain transfer learning for medical language tasks. Preprint at https://arxiv.org/abs/2002.06670 (2020).

Kamath, P., Singh, A. & Dutta, D. Fast neural architecture construction using envelopenets. Preprint at https://arxiv.org/abs/1803.06744 (2018).

Minsky, M. Steps toward artificial intelligence. Proc. IRE 49 , 8–30 (1961).

Bornmann, L., Haunschild, R. & Mutz, R. Growth rates of modern science: a latent piecewise growth curve approach to model publication numbers from established and new literature databases. Humanit. Soc. Sci. Commun. 8 , 224 (2021).

Brin, S. & Page, L. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30 , 107–117 (1998).

Holland, P. W. & Leinhardt, S. Transitivity in structural models of small groups. Comp. Group Studies 2 , 107–124 (1971).

Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393 , 440–442 (1998).

Yang, J.-H., Chen, C.-M., Wang, C.-J. & Tsai, M.-F. HOP-rec: high-order proximity for implicit recommendation. In Proc. 12th ACM Conference on Recommender Systems 140–144 (2018).

Lin, B.-Y. OGB_collab_project. GitHub https://github.com/brucenccu/OGB_collab_project (2021).

Sorensen, T. A. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on danish commons. Biol. Skar. 5 , 1–34 (1948).

Yeo, I.-K. & Johnson, R. A. A new family of power transformations to improve normality or symmetry. Biometrika 87 , 954–959 (2000).

Ranger, M. nodevectors. GitHub https://github.com/VHRanger/nodevectors (2021).

Bandeira, A. S., Singer, A. & Spielman, D. A. A Cheeger inequality for the graph connection Laplacian. SIAM J. Matrix Anal. Appl. 34 , 1611–1630 (2013).

Krenn, M. et al. Predicting the future of AI with AI. Zenodo https://doi.org/10.5281/zenodo.7882892 (2023).

Krenn, M. et al. FutureOfAIviaAI code. Zenodo https://zenodo.org/record/8329701 (2023).

Jia, T., Wang, D. & Szymanski, B. K. Quantifying patterns of research-interest evolution. Nat. Hum. Behav. 1 , 0078 (2017).


Acknowledgements

We thank IARAI Vienna and IEEE for supporting and hosting the IEEE BigData Competition Science4Cast. We are specifically grateful to D. Kreil, M. Neun, C. Eichenberger, M. Spanring, H. Martin, D. Geschke, D. Springer, P. Herruzo, M. McCutchan, A. Mihai, T. Furdui, G. Fratica, M. Vázquez, A. Gruca, J. Brandstetter and S. Hochreiter for helping to set up and successfully execute the competition and the corresponding workshop. We thank X. Gu for creating Fig. 2, and M. Aghajohari and M. Sadegh Akhondzadeh for helpful comments on the paper. The work of H.L., R.S. and J.G.F. was supported by grant TWCF0333 from the Templeton World Charity Foundation. H.L. is additionally supported by NSF grant DMS-1952339. J.P.M. acknowledges the support of FCT (Portugal) through scholarship SFRH/BD/144151/2019. B.C. thanks the support from FCT/MCTES through national funds and when applicable co-funded EU funds under the project UIDB/50008/2020, and FCT through the project CEECINST/00117/2018/CP1495/CT0001. N.M.T. and Y.X. are supported by NSF grant DMS-2113468, the NSF IFML 2019844 award to the University of Texas at Austin, and the Good Systems Research Initiative, part of University of Texas at Austin Bridging Barriers.

Open access funding provided by Max Planck Society.

Author information

Authors and Affiliations

Max Planck Institute for the Science of Light (MPL), Erlangen, Germany

Mario Krenn

Instituto de Telecomunicações, Lisbon, Portugal

Lorenzo Buffoni, Bruno Coutinho & João P. Moutinho

University of Toronto, Toronto, Ontario, Canada

Sagi Eppel & Andrew Gritsevskiy

University of California Los Angeles, Los Angeles, CA, USA

Jacob Gates Foster, Harlin Lee & Rishi Sonthalia

Cavendish Laboratories, Cavendish, VT, USA

Andrew Gritsevskiy

Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, Austria

Andrew Gritsevskiy & Michael Kopp

Alpha 8 AI, Toronto, Ontario, Canada

Independent Researcher, Barcelona, Spain

Nima Sanjabi

University of Texas at Austin, Austin, TX, USA

Ngoc Mai Tran

Independent Researcher, Leiria, Portugal

Francisco Valente

University of Pennsylvania, Philadelphia, PA, USA

Yangxinyu Xie

University of California, San Diego, CA, USA


Contributions

M. Krenn and R.Y. initiated the research. M. Krenn and M. Kopp organized the Science4Cast competition. M. Krenn generated the datasets and initial codes. S.E. and H.L. analysed the network-theoretical properties of the semantic network. M. Krenn, L.B., B.C., J.G.F., A.G., H.L., Y.L., J.P.M., N.S., R.S., N.M.T., F.V., Y.X. and M. Kopp provided codes for the ten models. M. Krenn wrote the paper with input from all co-authors.

Corresponding author

Correspondence to Mario Krenn .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Machine Intelligence thanks Alexander Belikov, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Mirko Pieropan, in collaboration with the Nature Machine Intelligence team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1

Number of concepts per article.

Extended Data Fig. 2

Time gap between the generation of edges. The left panel shows the time it takes to create a new edge between two vertices; the right panel shows the time between the first and the second edge.

Extended Data Fig. 3

Publications in Quantum Physics.

Extended Data Fig. 4

Evolution of the AUC during training for Model M1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Krenn, M., Buffoni, L., Coutinho, B. et al. Forecasting the future of artificial intelligence with machine learning-based link prediction in an exponentially growing knowledge network. Nat Mach Intell 5 , 1326–1335 (2023). https://doi.org/10.1038/s42256-023-00735-0

Download citation

Received : 21 January 2023

Accepted : 11 September 2023

Published : 16 October 2023

Issue Date : November 2023

DOI : https://doi.org/10.1038/s42256-023-00735-0




Precision molecular insights for prostate cancer prognosis: tumor immune microenvironment and cell death analysis of senescence-related genes by machine learning and single-cell analysis

  • Open access
  • Published: 27 September 2024
  • Volume 15, article number 487 (2024)


  • Yuni Wu 1 ,
  • Jing Wang 3 &
  • Zhibin Luo 1  

Prostate cancer (PCa) is a prevalent malignancy among men, primarily originating from the prostate epithelium. It ranks first in global cancer incidence and second in mortality rates, with a rising trend in China. PCa's subtle initial symptoms, such as urinary issues, necessitate diagnostic measures like digital rectal examination, prostate-specific antigen (PSA) testing, and tissue biopsy. Advanced PCa management typically involves a multifaceted approach encompassing surgery, radiation, chemotherapy, and hormonal therapy. The involvement of aging genes in PCa development and progression, particularly through the mTOR pathway, has garnered increasing attention.

This study aimed to explore the association between aging genes and biochemical PCa recurrence and construct predictive models. Utilizing public gene expression datasets (GSE70768, GSE116918, and TCGA), we conducted extensive analyses, including Cox regression, functional enrichment, immune cell infiltration estimation, and drug sensitivity assessments. The constructed risk score model, based on aging-related genes (ARGs), demonstrated superior predictive capability for PCa prognosis compared to conventional clinical features. High-risk genes positively correlated with risk, while low-risk genes displayed a negative correlation.

An ARGs-based risk score model was developed and validated for predicting prognosis in prostate adenocarcinoma (PRAD) patients. LASSO regression analysis and cross-validation plots were employed to select ARGs with prognostic significance. The risk score outperformed traditional clinicopathological features in predicting PRAD prognosis, as evidenced by its high AUC (0.787). The model demonstrated good sensitivity and specificity, with AUC values of 0.67, 0.675, 0.696, and 0.696 at 1, 3, 5, and 8 years, respectively, in the GEO cohort. Similar AUC values were observed in the TCGA cohort (0.67, 0.659, 0.667, and 0.743 at 1, 3, 5, and 8 years). The model included 12 genes, with high-risk genes positively correlated with risk and low-risk genes negatively correlated.

Conclusions

This study presents a robust ARGs-based risk score model for predicting biochemical recurrence in PCa patients, highlighting the potential significance of aging genes in PCa prognosis and offering enhanced predictive accuracy compared to traditional clinical parameters. These findings open new avenues for research on PCa recurrence prediction and therapeutic strategies.


1 Introduction

PCa is one of the most common malignant tumors in men, mostly arising from malignant transformation of the prostate epithelium [1]. According to the latest global cancer statistics from 2019, PCa ranks first in incidence and second in mortality, and its incidence in China has been rising in recent years [2, 3]. Early symptoms are insidious, such as urinary frequency, urgency, and reduced urinary flow, comparable to those of benign prostate enlargement [4]. PCa cannot be diagnosed accurately from symptoms alone; digital rectal examination and prostate-specific antigen (PSA) testing assist the diagnosis, and tissue biopsy confirms it [5, 6]. Patients with advanced disease usually require a combination of therapies, including surgery, radiation, chemotherapy, and hormonal therapy [7, 8].

The mechanisms and significance of aging are becoming increasingly important as the population ages. Aging genes are a group of genes involved in the aging process; their main role is to control molecular biological processes of organismal aging, such as the DNA damage response and cell-cycle regulatory pathways [9, 10]. Aging genes may be involved in PCa development and progression by regulating cancer-cell proliferation, the cell cycle, and apoptosis [11]. For example, over-activation of the mTOR pathway, which activates Akt, can inhibit apoptosis in cancer cells [12]. These findings provide valuable insights into the relationship between aging genes and PCa.

The aim of this study was to construct a predictive model by exploring the association between senescence genes and biochemical recurrence (BCR) in PCa. PCa remains a significant clinical challenge due to its high incidence and potential for recurrence after initial treatment. Despite advances in therapeutic strategies, predicting BCR remains difficult, which underscores the urgent need for reliable biomarkers; the specific contribution of senescence genes to PCa recurrence has not been fully elucidated. By utilizing two public gene expression profile datasets, GSE70768 and GSE116918, as training sets, and validating the findings with the TCGA dataset, this study aims to identify key genes that are strongly associated with the biochemical recurrence of PCa. Through rigorous statistical analyses, including analysis of variance, univariate analysis, and the application of LASSO and stepwise multifactorial screening, a prognostic model was developed. The innovative aspect of this study lies in its integration of senescence-related gene expression profiles to predict BCR in PCa, an area that has been underexplored. This study includes data from a total of 323 patients, offering new theoretical insights that may inform future research on predicting and treating biochemical recurrence in PCa, thereby filling a critical gap in existing research.

2.1 Data sources

We downloaded PCa expression profile microarray data from the public Gene Expression Omnibus (GEO) database, which includes peripheral blood samples from PRAD patients and controls, comprising GSE70769 [13] and GSE116918 [14] (268 samples in total); the TCGA database served as the validation group (55 samples).

2.2 Prognostic analysis and nomogram construction

We conducted univariate and multivariate Cox regression analyses to determine whether risk scores could serve as independent prognostic indicators. Utilizing the "rms" R package, we integrated these risk scores with clinicopathologic features to create nomograms that predict patient survival at 1, 3, and 5 years within the TCGA-PRAD cohort.

2.3 Estimation of infiltrating immune cells using CIBERSORT analysis

CIBERSORT ( https://cibersort.stanford.edu/ ) is a robust tool that leverages gene expression data to estimate the relative proportions of various immune cell types within complex tissue samples. To assess immune cell infiltration in cancer tissues, we utilized this method, allowing us to precisely quantify the relative abundance of different immune cells. This approach provides a detailed understanding of the immune landscape in heterogeneous samples, offering valuable insights into the tumor microenvironment [15].
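The essence of such deconvolution is a linear mixture model: bulk expression is modeled as a signature matrix of cell-type profiles multiplied by unknown cell fractions. CIBERSORT itself fits this with ν-support vector regression against a curated signature matrix; the toy sketch below solves a 2-gene, 2-cell-type version with plain algebra, and all numbers are invented for illustration.

```python
# Toy linear-deconvolution sketch of what CIBERSORT-style estimation does.
# CIBERSORT itself fits nu-SVR against a curated signature matrix; this
# 2x2 version solves the same mixture model directly. Numbers are invented.

def deconvolve_2x2(signature, bulk):
    """Solve signature @ fractions = bulk for a 2x2 toy case (Cramer's rule)."""
    (a, b), (c, d) = signature
    det = a * d - b * c
    f1 = (bulk[0] * d - b * bulk[1]) / det
    f2 = (a * bulk[1] - bulk[0] * c) / det
    # Clip tiny negatives and renormalise so the fractions sum to 1,
    # mirroring the constraints real deconvolution tools impose.
    f1, f2 = max(f1, 0.0), max(f2, 0.0)
    total = f1 + f2
    return f1 / total, f2 / total

# Rows = marker genes, columns = cell types (say, CD8 T cell vs macrophage).
signature = [[10.0, 1.0],
             [2.0, 8.0]]
bulk = [7.3, 3.8]  # bulk expression of a 70%/30% mixture of the two columns
fracs = deconvolve_2x2(signature, bulk)
print(fracs)  # ~(0.7, 0.3)
```

Real tools differ mainly in how they regularize this fit and handle noise; the recovered fractions are what the downstream correlation analyses operate on.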

2.4 Drug sensitivity

To evaluate the efficacy of treatments across different risk categories, we utilized the "pRRophetic" R package. This tool allowed us to analyze treatment responses in patients classified as either high-risk or low-risk based on their prognostic scores. We drew on data from the Genomics of Drug Sensitivity in Cancer (GDSC) database, which provides comprehensive information on drug responses. Specifically, we used the dataset to obtain half-maximal inhibitory concentration (IC50) values, which measure the concentration of a drug required to inhibit a biological process by 50%. This approach enabled us to assess how effectively various treatments could suppress cancer growth in PRAD patients, providing insights into potential therapeutic strategies [16].
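The IC50 definition above can be made concrete with a small sketch: given a decreasing Hill-type dose-response curve, the IC50 is simply the dose at which the curve crosses half-maximal response. The curve parameters below (true IC50 = 2.0, slope = 1.5) are invented; real analyses fit such curves to measured viabilities before reading off the IC50.

```python
# IC50 sketch: locate the dose where a decreasing dose-response curve
# crosses 50% of its maximum. Hill parameters below are invented.

def viability(dose, ic50=2.0, hill=1.5):
    """Fraction of signal remaining at a given dose (decreasing Hill curve)."""
    return 1.0 / (1.0 + (dose / ic50) ** hill)

def estimate_ic50(curve, lo=1e-6, hi=1e6, tol=1e-9):
    """Bisect for the dose where a decreasing curve crosses 0.5."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if curve(mid) > 0.5:   # still above half-maximal: dose too low
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(estimate_ic50(viability), 6))  # recovers ~2.0
```

A lower estimated IC50 for a patient group means the model predicts that group needs less drug for the same inhibition, i.e. greater sensitivity.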

2.5 GeneSet cancer analysis database

GSCALite ( http://bioinfo.life.hust.edu.cn/web/GSCALite/ ) is a comprehensive online tool that integrates genomic data from 33 types of cancer available in The Cancer Genome Atlas (TCGA) with normal tissue data from the Genotype-Tissue Expression (GTEx) project. In our study, we utilized GSCALite to perform a detailed analysis of various genomic alterations, including copy number variations, DNA methylation patterns, and pathway activities related to ARGs in PRAD. This platform facilitated a thorough examination of how these genomic features influence the biological behaviors of ARGs in PRAD.

2.6 Tumor immune single cell hub database

TISCH ( http://tisch.comp-genomics.org ) is a comprehensive database dedicated to single-cell RNA sequencing data specifically focused on the tumor microenvironment (TME) [17]. Utilizing this resource, we systematically investigated the heterogeneity of the tumor microenvironment across various cell types and datasets.

2.7 Statistical analysis

All statistical analyses were conducted using R software (V. 4.2.0). To evaluate the reliability of the diagnostic model, ROC curves were generated, with the area under the curve (AUC) used to determine predictive accuracy, applying a significance threshold of P  < 0.05. Furthermore, the goodness-of-fit of the constructed nomograms was assessed using the Hosmer–Lemeshow test. This rigorous analysis ensured a thorough evaluation of the model's performance and fit.
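The AUC used throughout these analyses has a simple probabilistic reading: it is the probability that a randomly chosen patient who relapses receives a higher risk score than a randomly chosen patient who does not (the Mann-Whitney interpretation). A minimal sketch, with invented scores:

```python
# Rank-based AUC: the fraction of (positive, negative) score pairs that the
# model orders correctly, ties counting half. Scores below are invented.

def auc(pos_scores, neg_scores):
    """AUC as the Mann-Whitney pairwise-ordering statistic."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos_scores) * len(neg_scores))

relapsed = [0.9, 0.8, 0.4]   # hypothetical risk scores, patients with BCR
censored = [0.7, 0.3, 0.2]   # hypothetical risk scores, patients without BCR
result = auc(relapsed, censored)
print(result)  # 8 of 9 pairs ordered correctly -> 0.888...
```

An AUC of 0.5 means the score is no better than chance at this ordering; the paper's reported 0.787 means roughly 79% of such pairs are ordered correctly.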

3.1 Construction and validation of ARGs signatures

A risk score model based on ARGs was developed to identify prognostic biomarkers for PRAD patients. LASSO regression analysis (Fig. 1A) was applied to DE-ARGs with prognostic significance, and cross-validation plots (Fig. 1B) identified 12 key genes: HDAC3, IRS2, HIF1A, PRKCA, MSRA, APOE, HSPA9, CDKN2A, TP53BP1, CNR1, CDKN2B, and SERPINE1. High-risk genes were found to have a positive correlation with risk, whereas low-risk genes were negatively associated. The risk score model demonstrated strong predictive power for PRAD prognosis, with an AUC of 0.787, outperforming traditional clinicopathologic features (Fig. 1C). In the GEO cohort, the model's predictive performance was further validated, showing high sensitivity and specificity with AUC values of 0.67, 0.675, 0.696, and 0.696 at 1, 3, 5, and 8 years, respectively (Fig. 1D). Similarly, in the TCGA cohort, the AUC values were 0.67, 0.659, 0.667, and 0.743 at 1, 3, 5, and 8 years (Fig. 1E). Survival analysis in the GEO cohort revealed that patients with higher risk scores had significantly increased mortality, while those in the low-risk group demonstrated a better prognosis (P < 0.001) (Fig. 1F). This trend was consistent in the TCGA cohort, where a better prognosis was also observed in the low-risk group (P = 0.003) (Fig. 1G). Overall, this ARG-based risk score model offers superior predictive power and may serve as a valuable tool for identifying prognostic biomarkers and guiding clinical decisions for PRAD patients.

figure 1

Construction and validation of ARGs signatures. A Ten-fold cross-validation used to adjust parameter selection in the LASSO model. B Lasso coefficient profiles. C Multi-exponential ROC analysis. D ROC profile analysis of the GEO cohort over time. E ROC profile analysis of the TCGA cohort over time. F KM curves comparing overall PRAD patients between low and high risk groups in the GEO cohort. G KM curves comparing overall PRAD patients between low and high risk groups in the TCGA cohort
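Downstream of the LASSO fit, a signature of this kind is applied as a weighted sum of gene-expression values, with patients split into high- and low-risk groups at the cohort median score. The sketch below uses gene names from the paper's 12-gene signature, but the coefficients and expression values are invented for illustration.

```python
# Sketch of applying a LASSO-derived risk score: a linear combination of
# signature-gene expression, thresholded at the cohort median.
# Coefficients and expression values below are invented.

COEFS = {"HDAC3": 0.31, "IRS2": -0.12, "HIF1A": 0.24, "SERPINE1": 0.18}

def risk_score(expr):
    """Linear risk score: sum of coefficient * expression over signature genes."""
    return sum(coef * expr.get(gene, 0.0) for gene, coef in COEFS.items())

def median_split(scores):
    """Label each patient high/low risk relative to the cohort median score."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = ranked[n // 2] if n % 2 else (ranked[n // 2 - 1] + ranked[n // 2]) / 2.0
    return ["high" if s > mid else "low" for s in scores], mid

patients = [
    {"HDAC3": 2.1, "IRS2": 0.5, "HIF1A": 1.9, "SERPINE1": 2.4},
    {"HDAC3": 0.4, "IRS2": 1.8, "HIF1A": 0.6, "SERPINE1": 0.3},
    {"HDAC3": 1.2, "IRS2": 1.0, "HIF1A": 1.1, "SERPINE1": 0.9},
]
scores = [risk_score(p) for p in patients]
labels, median = median_split(scores)
print(labels)  # -> ['high', 'low', 'low']
```

Note the sign convention: a positive coefficient (e.g. the invented one for HDAC3) makes high expression raise the score, matching the paper's observation that "high-risk genes" correlate positively with risk while protective genes carry negative weights.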

3.2 Exploring the relationship between the prognostic model's scoring profile of prostate patients and biochemical recurrence (BCR) in the clinic

We collected BCR status and time to recurrence (BCR time) for prostate patients and plotted them against the prognostic model's risk scores. From the plot, we observed that as patients' risk scores rise, both the number of BCR events and the spread of recurrence times increase, from roughly 0–6 years to 0–8 years (Fig. 2A, B), suggesting that our prognostic model can, to some extent, also predict the occurrence and timing of biochemical recurrence in prostate patients. We then categorized the patients into high- and low-risk groups based on the median risk score (Fig. 2C, D) and examined the expression of the model genes between the two groups using the limma package: all model genes except HSPA9 were more highly expressed in the high-risk group (Fig. 2E, F). Principal component analysis (PCA) showed that the high- and low-risk labels separate the patients well, with very little overlap (Fig. 2G, H).

figure 2

Exploring the relationship between the prognostic model's scoring profile of prostate patients and biochemical recurrence (BCR) in the clinic. A, B Scatterplot of the prognostic model's risk scores for prostate patients versus the time to clinical biochemical recurrence (BCR time). C, D Scatterplot of the distribution of risk scores for prostate patients. E, F Heatmap of differential expression of modeled genes between the high- and low-risk groups. G, H Principal component analysis of prostate patients

3.3 The construction of the nomogram

To explore the relationship between risk factors, such as the prognostic model score and clinical characteristics, and BCR outcomes in prostate patients, we plotted forest plots from the results of univariate and multivariate Cox regression analyses and log-rank tests (Fig. 3A, B). The P values of T stage and the risk score were < 0.05; the hazard ratio (HR) of T stage was 0.569 in the univariate and 0.585 in the multivariate Cox regression, and the P value of the risk score was < 0.001 in both analyses. Subsequently, we constructed a multi-indicator clinical prognostic model based on the results of the multivariate Cox analysis and plotted the nomogram and the corresponding calibration curves (Fig. 3C, D). Based on the nomogram, each sample was assigned points from the Cox analysis, from which the patient's OS can be predicted in clinical diagnosis and treatment. The calibration curves show that the Cox model has excellent predictive performance, as the difference between the predicted results and the actual outcomes is very small.

figure 3

The construction of the nomogram. A Forest plot for the univariate Cox analysis. B Forest plot for the multivariate Cox analysis. C Nomogram of the multivariate Cox model. D Calibration curve for the multivariate Cox model

3.4 Correlation between clinical characteristics and biochemical recurrence of ARGs in PRAD patients

BCR outcomes in the high-risk (HR) and low-risk (LR) populations differed dramatically across individual clinical attributes. To study this difference in depth and compare it more precisely, we categorized patients diagnosed with PRAD into subgroups based on clinical parameters: age (≤ 65 and > 65 years), Gleason score (6–7 and 8–9), PSA (> 10 and ≤ 10), and T stage (T1–2 and T3–4). Notably, in all subgroups, the survival of LR patients was significantly better than that of HR patients, characterized by a longer survival period (Fig. 4A–H). Based on these results, we further strengthened our confidence in the reliability of the ARGs profile as a clinical predictive tool.

figure 4

Correlation between clinical characteristics and biochemical recurrence of ARGs in PRAD patients. A–H KM curves between different clinical features

3.5 Analysis of GSCALite and cBioPortal Data

Figure 5 highlights genomic alterations involving ARGs and hub genes across three domains: single-nucleotide variation (SNV) (Fig. 5A, B, I), copy-number variation (CNV) (Fig. 5D, H), and methylation (Fig. 5F, G). ARGs did not show significant mutations in KIRP (Fig. 5A).

figure 5

Analysis of GSCALite and cBioPortal Data. A SNV of all mutated genes in the gene set in PRAD. B SNV classes of hub-gene set in PRAD, C Survival difference between high and low methylation in each cancer. D Survival difference between CNV groups. E Correlations between methylation and mRNA expression of ARGs in PRAD. F Correlation between methylation and mRNA expression; G Methylation differences among tumor and normal samples of SKA3 and top ten hub genes in PRAD; H Pie plot summarizing CNV of ARGs. I SNV of ARGs and hub genes in PRAD. J Correlation between CTRP drug sensitivity and mRNA expression

CNV in PRAD patients encompassed heterozygous and homozygous amplifications and deletions (Fig. 5H). Importantly, no apparent overall correlation was observed between heterozygous or homozygous CNV and mRNA expression. However, CNV in genes such as MSRA, TP53BP1, IRS2, HSPA9, and HDAC3 displayed significant associations with mRNA expression, with MSRA exhibiting a particularly strong correlation (Fig. 5I). In PRAD patients, CNV groups including CNR1, HSPA9, and TP53BP1 demonstrated a negative correlation with overall survival (OS) and progression-free survival (PFS), while others showed varying degrees of positive correlation (Fig. 5D).

The analysis also unveiled differential methylation patterns of PRAD genes between tumor and normal samples (Fig. 5G). Specifically, low methylation of APOE and CDKN2A was associated with poorer overall survival (OS) in KIRP (Fig. 5C). Furthermore, methylation of ARGs displayed a negative correlation with mRNA expression (Fig. 5F). For the majority of genes, there was a positive correlation between CTRP drug sensitivity and mRNA expression. SERPINE1 exhibited a significant positive correlation, while a few genes, including CNR1, displayed negative correlations. This suggests a degree of specificity in the drug sensitivity experiment, providing valuable insights for future clinical research in developing treatment strategies.

3.6 Correlation Analysis of ARGs with Clinicopathological Characteristics

In Fig. 6A, our analysis revealed a notable influence of the ARGs on the distribution of four specific clinicopathological features within both the high-risk and low-risk groups. Patients aged 65 and older, those with PSA greater than 10, and individuals with Gleason 7 disease comprised a larger proportion of the high-risk (HR) group. Furthermore, the heatmap illustrates various clinicopathological features, including T stage, age, Gleason score, PSA, and risk scores across the entire cohort of TCGA-PRAD patients (Fig. 6B). We extended our analysis to explore the relationship between risk scores and various clinicopathological factors, including tumor grade, disease stage, T stage, patient age, and gender. These correlations were systematically evaluated to understand how each factor interacts with the risk scores, as illustrated in Fig. 6C–F. The analysis revealed significant variations in risk scores among patients with differing age, PSA, Gleason score, and T stage, with patients at more advanced stages showing higher risk scores. Based on our findings, we concluded that a significant positive correlation exists between risk scores and various clinicopathological factors.

figure 6

Distribution of risk scores in different clinical subtypes. A The proportion of patients with different clinical subtypes (age, PSA, Gleason, T stage) in the HR group and the LR group. B Heatmap of clinicopathological variables in the HR group and the LR group. C–F Risk score distribution of different clinical subtypes

3.7 Immunoassays in patients with PRAD

Immune cell infiltration represents a fundamental aspect of the TME. Utilizing the CIBERSORT algorithm for Spearman correlation analysis, we observed a notable association between risk scores and the abundance of immune cells in the PRAD TME. Specifically, CD8+ T cells were predominantly correlated with CD4+ T cells (Fig. 7A). In the combined analysis of the 12 ARGs with immune cells, HSPA9 was found to be highly correlated with M1 macrophages (Fig. 7B). To assess the distribution and correlation of the 22 tumor-infiltrating immune cells (TICs) in the GEO cohort, we utilized CIBERSORT as the immune analysis tool. The results indicated that BCR samples exhibited significantly higher levels of immune infiltration than non-BCR samples, particularly in B cells, plasma cells, and macrophages (Fig. 7C). The ARG-based risk score model effectively differentiated between various immune subtypes, thereby influencing the response to immunotherapy. To further investigate changes in immune function, we compared single-sample GSEA (ssGSEA) scores, revealing a significant increase in scores for the high-risk group (Fig. 7D). Additionally, we examined differences in the expression of immune checkpoint genes, which are critical for tumor immunotherapy. In the low-risk group, 13 immune checkpoint genes, including BTNL2, CD244, CD28, CD40LG, CTLA4, LAIR1, NRP1, PDCD1, TIGIT, TNFRSF25, TNFRSF8, TNFRSF9, and TNFSF9, were significantly upregulated. In contrast, the high-risk group showed upregulation of only TNFSF9 and TNFRSF25 (Fig. 7E). The upregulation of immune checkpoints suggests the presence of inflammation within the TME [18], implying that low-risk patients may have an inflammatory microenvironment. Targeted therapies against these elevated immune checkpoints could potentially benefit this tumor subtype [19].

Figure 7

Immunoassays in patients with PRAD. A Histogram of immune cells. B Correlation of 12 genes with immune cells. C Differences in immune cell infiltration between high- and low-risk groups. D Immune function ssGSEA scores between high- and low-risk groups. E Differences in immune checkpoints between high- and low-risk groups

3.8 Correlation study of ARGs with the immune microenvironment of PRAD

We scrutinized the expression of the 12 ARGs in the immune microenvironment using the PRAD_GSE143791 single-cell dataset retrieved from the TISCH database. There are 15 different immune cell types in GSE143791 (Fig. 8A). We used pie charts to represent the proportional composition of different immune cells and their distribution in the samples (Fig. 8B). To investigate the expression levels of individual ARGs in immune cells in depth, we generated a dimensionality-reduced distribution map of the ARGs in immune cells (Fig. 8C-N). Our analysis showed that HDAC3, HIF1A, HSPA9, and TP53BP1 were widely expressed across PRAD immune cells, whereas the expression of CNR1 and CDKN2B in the immune microenvironment was almost negligible. These findings, based on the PRAD dataset, validate the correlation between ARGs and the immune microenvironment, complementing and refining the clinical targeting of PRAD shaped by the immune microenvironment.

Figure 8

Correlation study of ARGs with the immune microenvironment of PRAD. A Dimensionality-reduced distribution of immune cell subpopulations in PRAD_GSE143791. B Pie chart showing the percentage of immune cells. C–N Distribution of the 12 ARGs in PRAD_GSE143791

4 Discussion

PCa is the most common malignancy among men, emphasizing the importance of effective screening and detection methods. PSA testing has proven valuable in identifying localized PCa; however, it is limited by its lack of sensitivity and specificity. While PSA screening has raised the lifetime risk of a PCa diagnosis to 16%, the mortality rate remains relatively low at 3.4% [20]. This discrepancy suggests that increased detection of slow-growing or relatively benign cancers, which do not necessarily require definitive treatment, has led to concerns about overdiagnosis and overtreatment, exposing patients to unnecessary risks and potential urinary and bowel dysfunction after treatment [21]. Recent reports indicate that a significant proportion of men with low PSA levels still develop PCa, many of which are high-grade malignancies. Thus, PSA is less effective as a screening tool for differentiating between high- and low-risk cases. Research is ongoing to identify other markers that could more accurately pinpoint malignancies that are clinical threats while avoiding interventions for indolent disease. Preventive strategies tailored to genetic or other risks may help reduce the incidence of PCa [22]. PCa incidence escalates markedly with age. Data from the US Surveillance, Epidemiology, and End Results Program (2000–2008) indicate that the rate of PCa is 9.2 per 100,000 men in the 40–44 age group; this incidence rises sharply to 984.8 per 100,000 in men aged 70–74 years before declining slightly [20]. PCa often develops gradually, typically preceded by dysplastic lesions that may remain undetected for years or even decades. Autopsy studies have indicated that if most men lived to 100 years old, they would likely develop PCa [23]. Macrophage-tumor cell interactions have been found to promote androgen resistance and increase PCa invasion through tissue factor expression. Studies by Parrinello et al. demonstrated increased macrophage infiltration in the prostate glands of aged mice, reflecting the role of immune cells in aging and its association with PCa development [24]. Thus, prognostic models based on senescence-related biomarkers can complement PSA screening for early diagnosis and predict genetic risk related to senescence [25].

We merged two prostate tumor cohort transcriptome datasets, GSE70768 and GSE116918, collected from public databases. After batch-effect removal and normalization, we extracted the aging-related differentially expressed genes (Aging-DEGs) and subjected them to univariate Cox analysis, followed by a LASSO machine learning approach and stepwise multivariate Cox analysis to screen the genes used to construct the prognostic model (HDAC3, IRS2, HIF1A, PRKCA, MSRA, APOE, HSPA9, CDKN2A, TP53BP1, CNR1, CDKN2B, and SERPINE1). A 12-gene (12-Agings) prognostic model was then constructed using a logistic regression algorithm. In addition, we depicted a nomogram for prostate tumor patients and, using the CIBERSORT database, explored the immune microenvironment, immune function, mutation load, pathway enrichment, and clinical subgroup survival of high- and low-risk prostate tumor patients.
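The final risk score from a signature of this kind is typically a linear combination of the selected genes' expression values weighted by their fitted coefficients, with a median split defining the high-risk (HR) and low-risk (LR) groups. A minimal sketch on synthetic data (the gene list comes from the text, but the coefficients and expression matrix are simulated, not the paper's fitted values):

```python
import numpy as np

rng = np.random.default_rng(0)

# The 12 model genes named in the text; the coefficients below are made up
# for illustration (the paper's fitted values are not reproduced here).
genes = ["HDAC3", "IRS2", "HIF1A", "PRKCA", "MSRA", "APOE",
         "HSPA9", "CDKN2A", "TP53BP1", "CNR1", "CDKN2B", "SERPINE1"]
coefs = rng.normal(0, 0.3, size=len(genes))

# Synthetic normalized expression matrix: 100 patients x 12 genes.
expr = rng.normal(0, 1, size=(100, len(genes)))

# Risk score = sum_i(coefficient_i * expression_i), the standard form for
# signatures derived from a multivariate regression model.
risk_score = expr @ coefs

# Median split into HR and LR groups.
median = np.median(risk_score)
group = np.where(risk_score > median, "HR", "LR")

print(dict(zip(*np.unique(group, return_counts=True))))
```

The median split guarantees equal-sized groups in the training cohort; survival curves and immune comparisons are then computed between the two groups.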

Histone deacetylase 3 (HDAC3) is an enzyme with histone deacetylase activity that plays a critical role in transcription regulation. By binding to the promoter region, HDAC3 inhibits transcription. Additionally, it modulates gene expression through interaction with the zinc finger transcription factor YY1 and suppresses p53 activity, which is essential for regulating cell growth and apoptosis. HDAC3 is recognized as a potential tumor suppressor gene. It has been proposed that the corepressor SMRT, together with N-CoR and HDAC3, forms a complex that inhibits AR activity and interacts with AR nuclear steroid receptors to suppress specific protein expression in PCa cell lines [ 26 ].

Hypoxia-inducible factor 1 subunit alpha (HIF1A) is a crucial transcriptional regulator that enables cells to adapt to low oxygen environments. In hypoxic conditions, HIF1A drives the expression of over 40 genes that enhance oxygen delivery and support metabolic adaptation. These include genes for HILPDA, vascular endothelial growth factor, glycolytic enzymes, glucose transporters, and erythropoietin [ 27 , 28 ]. HIF1A plays a crucial role in embryonic and tumor angiogenesis, as well as in the pathophysiology of ischemic diseases, influencing both cell proliferation and survival [ 29 ]. Early prostatic intraepithelial neoplasia (PIN) is hypoxic, and HIF1A signaling in luminal cells enhances malignant progression by suppressing immune surveillance and promoting luminal plasticity, leading to the emergence of cells that impair androgen signaling [ 30 ].

Protein kinase C (PKC) encompasses a family of serine- and threonine-specific kinases that are activated by calcium and diacylglycerol. As key receptors for tumor-promoting phorbol esters, PKC family members display unique expression profiles and contribute to various cellular functions, including adhesion, transformation, cycle checkpoints, and volume regulation. Aberrant PKC expression is a well-recognized cancer hallmark, with elevated levels linked to enhanced cell proliferation and diminished apoptosis in several malignancies, such as bladder cancer [31], glioma, and PCa [32]. Aggressive PCa cells with high PKCα expression require PKCα for mitogenic activity [33].

Apolipoprotein E (APOE) is a protein-coding gene. APOE is a core component of plasma lipoproteins and is involved in their production, transformation, and clearance [34]. Venanzoni MC et al. examined protein expression in 20 prostatectomy specimens by immunohistochemistry and determined the association between the Gleason score of each sample and ApoE expression. ApoE expression was positively associated with Gleason score, hormone independence, and both local and distant invasiveness in prostate tissue sections. In contrast, while ApoE was positive in prostatic intraepithelial neoplasia (PIN) adjacent to clinically evident cancer, more distant PINs showed negative expression of ApoE [35]. Additionally, ApoE genotyping has been performed on blood samples from patients with prostate tumors. The E3/E3 genotype was found at a significantly higher frequency in patients than in controls (P = 0.004). Carriers of the E3/E3 genotype had a 3.6-fold increased likelihood of being patients compared with controls (OR = 3.67, 95% CI = 1.451–9.155; P = 0.004). Moreover, patients with the E3/E3 genotype exhibited significantly higher Gleason scores (P = 0.017) and a greater prevalence of Gleason scores above 7 (P = 0.007). In contrast, the E4 allele was more prevalent in the control group (P = 0.006) (PMID 26851028).
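For context, an odds ratio with a Wald 95% confidence interval, like the one reported for the E3/E3 genotype, is computed from a 2×2 genotype-by-status table. A worked sketch with hypothetical counts (not the cited study's data), chosen only so the resulting OR lands near 3.7:

```python
import math

# Hypothetical 2x2 table (counts are NOT from the cited study):
#                E3/E3   other genotype
a, b = 40, 20     # patients
c, d = 20, 37     # controls

# Odds ratio = cross-product ratio of the table.
odds_ratio = (a * d) / (b * c)

# Wald 95% CI: exponentiate log(OR) +/- 1.96 * SE, where
# SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d).
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

A CI that excludes 1, as here and in the cited study, is what makes the genotype-disease association statistically significant at the 5% level.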

Heat shock protein family A (Hsp70) member 9 (HSPA9) is a chaperone protein crucial for mitochondrial iron-sulfur cluster (ISC) biogenesis. HSPA9 interacts with and stabilizes ISC assembly proteins, including FXN, NFU1, NFS1, and ISCU [36]. HSPA9 regulates erythropoiesis by stabilizing ISC assembly and may also play a role in controlling cell proliferation and cellular senescence [36, 37]. It has been shown that the inhibitor variant JG-70 blocks aerobic respiration by targeting mitochondrial HSP70 (HSPA9) and re-sensitizes therapy-resistant PCa to androgen deprivation drugs. In addition, Hirth CG et al. retrospectively reviewed the records of 636 patients who underwent radical prostatectomy and mounted paraffin-embedded adenocarcinomatous and non-tumor tissues on microarrays. They evaluated the ability of HSPA9 to predict postoperative PSA outcome, response to adjuvant/salvage therapy, and systemic disease. The results showed that HSPA9 was diffusely expressed in tumor cells and that diagnostic HSPA9 staining helped identify patients at increased risk of recurrence after salvage therapy [38].

Serpin family E member 1 (SERPINE1), also known as plasminogen activator inhibitor 1 (PAI-1), inhibits tissue-type plasminogen activator (tPA) and urokinase (uPA). These enzymes convert plasminogen into plasmin, which in turn activates matrix metalloproteinases (MMPs) to degrade the extracellular matrix (ECM), thereby promoting invasion and metastasis. SERPINE1 blocks cancer cell invasion by inhibiting uPA protease activity. Additionally, knocking down six-transmembrane epithelial antigen of the prostate 2 (STEAP2) in PCa cells upregulates SERPINE1, reducing their invasive potential. This indicates that SERPINE1 may serve as a downstream effector of certain oncogenes to regulate prostate cancer cell migration [39].

Additionally, genetic and epigenetic changes in the remaining biomarkers are thought to be associated with severe subtypes of prostate tumors and with metastatic and invasive capacity. These include mutations in CDKN2A [40], large amounts of uncleaved TP53BP1 [41], and hypermethylated CDKN2B [42].

To analyze mutations in the genes of prostate tumor patients more systematically and to explore their correlation with the high- and low-risk groups predicted by the 12-Agings prognostic model, we obtained patient mutation data from the TCGA database and conducted analyses at multiple omics levels, including the genomic and copy number levels. The analysis revealed that single nucleotide variants (SNVs) were the most common mutations in the cohort, with single nucleotide polymorphisms (SNPs) being the predominant type. We also identified the genes with the highest mutation frequencies. Subsequently, we analyzed the proportion and type of homozygous versus heterozygous mutations among the copy number variants (CNVs) in the sample and conducted Spearman correlation analysis to explore the relationship between CNVs and gene expression. Moreover, significant correlations were found between the expression of senescence biomarkers and drug sensitivity in the Cancer Therapeutics Response Portal (CTRP) and Genomics of Drug Sensitivity in Cancer (GDSC) databases. These results suggest that our risk markers could serve as predictors of chemotherapy drug sensitivity or be targeted in future drug development efforts.
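The cohort-level mutation summary described above amounts to tallying variant classes across samples and reporting the predominant one. A minimal sketch with hypothetical MAF-style records (the field values are illustrative, not TCGA data):

```python
from collections import Counter

# Hypothetical variant-type records mimicking the Variant_Type column of a
# TCGA MAF file; in a real analysis these would come from parsed MAF rows.
variants = ["SNP", "SNP", "SNP", "DEL", "INS", "SNP", "DNP", "SNP", "DEL", "SNP"]

# Tally variant classes and report the predominant type, as in the
# cohort-level mutation summary.
counts = Counter(variants)
most_common_type, n = counts.most_common(1)[0]
print(most_common_type, n)
```

The same tallying pattern extends to per-gene mutation frequencies by counting (gene, sample) pairs instead of variant classes.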

However, our prognostic model still has several shortcomings, including the lack of real clinical cases and of in vivo and ex vivo experiments validating the expression and enrichment of the corresponding genes and pathways as the disease progresses. These further experimental studies will be addressed in our subsequent papers.

In conclusion, our proposed 12-Agings signature is a novel biomarker with significant potential for predicting patient prognosis and serving as a therapeutic target in prostate tumor patients. The 12-Agings signature is capable of predicting clinical outcomes, thereby assisting physicians in identifying cases that are at risk of deterioration and recurrence. Moreover, it can characterize the immune environment of prostate tumors, enabling a more precise stratification of patients and the development of individualized treatment plans. Additionally, the signature facilitates the early identification of patient subgroups that may benefit from immunotherapy and chemotherapy based on mRNA expression profiles. These capabilities underscore the potential of the 12-Agings signature to improve clinical decision-making and patient management in prostate tumors.

5 Conclusion

In conclusion, our study presents a robust prognostic tool for PRAD utilizing ARGs. Validated through LASSO regression and cross-validation, the risk score model identified 12 pivotal genes that illuminate the molecular mechanisms underlying PRAD. High-risk genes were positively correlated with increased risk, whereas low-risk genes were negatively correlated. These findings are anticipated to enhance PRAD treatment and facilitate the development of more targeted therapeutic strategies.

6 Limitations

Although the GEO and TCGA datasets have been meticulously curated in terms of scale and quality, the diversity of their origins may introduce sample heterogeneity, potentially affecting the generalizability of our findings. Additionally, since these datasets were generated by different research centers, variations in sample collection and processing methods might result in batch effects. Despite implementing appropriate bioinformatics techniques to mitigate these issues, some inherent variability may still influence the interpretation of the results.

7 Contributions

Our study makes a significant contribution to PRAD research by introducing a novel prognostic model rooted in ARGs. Validated by rigorous statistical methods, the model outperforms traditional clinicopathologic factors and provides greater accuracy for PRAD prognosis. The identification of 12 key genes provided valuable insights into the molecular mechanisms that drive PRAD prognosis. By demonstrating the robustness and clinical relevance of the model, we facilitate more informed therapeutic decisions for patients with PRAD, potentially enabling personalized treatment. Furthermore, our work highlights the importance of exploring ARGs in cancer prognosis, paving the way for future research in this critical area of oncology.

Data availability

The datasets employed in this study can be accessed through the GEO repository ( https://www.ncbi.nlm.nih.gov/geo/ ) and the TCGA portal ( https://portal.gdc.cancer.gov/ ). Additionally, the raw data files, code files, and images supporting this research are available for download via the following link: https://www.jianguoyun.com/p/DY7CFbEQkeKyCxiQvaAFIAA .

Litwin MS, Tan HJ. The diagnosis and treatment of prostate cancer: a review. JAMA. 2017;317(24):2532–42.

Liu X, et al. Trends and age-period-cohort effect on incidence and mortality of prostate cancer from 1990 to 2017 in China. Public Health. 2019;172:70–80.

Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019;69(1):7–34.

Schatten H. Brief overview of prostate cancer statistics, grading, diagnosis and treatment strategies. Adv Exp Med Biol. 2018;1095:1–14.

Partin AW, et al. Combination of prostate-specific antigen, clinical stage, and Gleason score to predict pathological stage of localized prostate cancer. A multi-institutional update. JAMA. 1997;277(18):1445–51.

Barry MJ, Simmons LH. Prevention of prostate cancer morbidity and mortality: primary prevention and early detection. Med Clin North Am. 2017;101(4):787–806.

Teo MY, Rathkopf DE, Kantoff P. Treatment of advanced prostate cancer. Annu Rev Med. 2019;70:479–99.

Grozescu T, Popa F. Prostate cancer between prognosis and adequate/proper therapy. J Med Life. 2017;10(1):5–12.

Morgan AE, Davies TJ, Mc Auley MT. The role of DNA methylation in ageing and cancer. Proc Nutr Soc. 2018;77(4):412–22.

Avelar RA, et al. A multidimensional systems biology analysis of cellular senescence in aging and disease. Genome Biol. 2020;21(1):91.

Bottazzi B, Riboli E, Mantovani A. Aging, inflammation and cancer. Semin Immunol. 2018;40:74–82.

Rubie C, et al. microRNA-496 - a new, potentially aging-relevant regulator of mTOR. Cell Cycle. 2016;15(8):1108–16.

Ross-Adams H, et al. Integration of copy number and transcriptomics provides risk stratification in prostate cancer: a discovery and validation cohort study. EBioMedicine. 2015;2(9):1133–44.

Jain S, et al. Validation of a metastatic assay using biopsies to improve risk stratification in patients with prostate cancer treated with radical radiation therapy. Ann Oncol. 2018;29(1):215–22.

Cho SY, et al. Amplification of transglutaminase 2 enhances tumor-promoting inflammation in gastric cancers. Exp Mol Med. 2020;52(5):854–64.

Geeleher P, Cox NJ, Huang RS. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol. 2014;15(3):R47.

Sun D, et al. TISCH: a comprehensive web resource enabling interactive single-cell transcriptome visualization of tumor microenvironment. Nucleic Acids Res. 2021;49(D1):D1420-d1430.

Spranger S, et al. Up-regulation of PD-L1, IDO, and T(regs) in the melanoma tumor microenvironment is driven by CD8(+) T cells. Sci Transl Med. 2013;5(200):200ra116.

Oliva M, et al. Immune biomarkers of response to immune-checkpoint inhibitors in head and neck squamous cell carcinoma. Ann Oncol. 2019;30(1):57–67.

Frankel S, et al. Screening for prostate cancer. Lancet. 2003;361(9363):1122–8.

Smither AR, et al. Quantifying the natural history of post-radical prostatectomy incontinence using objective pad test data. BMC Urol. 2007;7:2.

Nogueira L, Corradi R, Eastham JA. Prostatic specific antigen for prostate cancer detection. Int Braz J Urol. 2009;35(5):521–9 ( discussion 530-2 ).

Leitzmann MF, Rohrmann S. Risk factors for the onset of prostatic cancer: age, location, and behavioral correlates. Clin Epidemiol. 2012;4:1–11.

Taverna G, et al. Senescent remodeling of the innate and adaptive immune system in the elderly men with prostate cancer. Curr Gerontol Geriatr Res. 2014;2014: 478126.

Stenman UH, et al. Prognostic value of serum markers for prostate cancer. Scand J Urol Nephrol Suppl. 2005;216:64–81.

Trtkova K, et al. Binding of AR to SMRT/N-CoR complex and its co-operation with PSA promoter in prostate cancer cells treated with natural histone deacetylase inhibitor NaB. Neoplasma. 2010;57(5):406–14.

Jaakkola P, et al. Targeting of HIF-alpha to the von Hippel-Lindau ubiquitylation complex by O2-regulated prolyl hydroxylation. Science. 2001;292(5516):468–72.

Masson N, et al. Independent function of two destruction domains in hypoxia-inducible factor-alpha chains activated by prolyl hydroxylation. Embo j. 2001;20(18):5197–206.

Shan B, et al. RSUME is implicated in HIF-1-induced VEGF-A production in pituitary tumour cells. Endocr Relat Cancer. 2012;19(1):13–27.

Abu El Maaty MA, et al. Hypoxia-mediated stabilization of HIF1A in prostatic intraepithelial neoplasia promotes cell plasticity and malignant progression. Sci Adv. 2022;8(29):eabo2295.

Mandil R, et al. Protein kinase Calpha and protein kinase Cdelta play opposite roles in the proliferation and apoptosis of glioma cells. Cancer Res. 2001;61(11):4612–9.

Stewart JR, O’Brian CA. Protein kinase C-{alpha} mediates epidermal growth factor receptor transactivation in human prostate cancer cells. Mol Cancer Ther. 2005;4(5):726–32.

Cooke M, et al. Protein kinase C alpha is a central node for tumorigenic transcriptional networks in human prostate cancer. Cancer Res Commun. 2022;2(11):1372–87.

Marcel YL, Vezina C, Milne RW. Cholesteryl ester and apolipoprotein E transfer between human high density lipoproteins and chylomicrons. Biochim Biophys Acta. 1983;750(2):411–7.

Venanzoni MC, et al. Apolipoprotein E expression in localized prostate cancers. Int J Oncol. 2003;22(4):779–86.

Shan Y, Cortopassi G. Mitochondrial Hspa9/Mortalin regulates erythroid differentiation via iron-sulfur cluster assembly. Mitochondrion. 2016;26:94–103.

Chen TH, et al. Knockdown of Hspa9, a del(5q31.2) gene, results in a decrease in hematopoietic progenitors in mice. Blood. 2011;117(5):1530–9.

Hirth CG, et al. Immunoexpression of HSPA9 and CUL2 in prostatic tissue and adenocarcinoma. Ann Diagn Pathol. 2022;56: 151843.

Mao Y, et al. Silencing of ELK3 induces S-M Phase arrest and apoptosis and upregulates SERPINE1 expression reducing migration in prostate cancer cells. Biomed Res Int. 2020;2020:2406159.

Machiela MJ, et al. Limited evidence that cancer susceptibility regions are preferential targets for somatic mutation. Genome Biol. 2015;16(1):193.

Jaworski D, et al. Expression differences between proteins responsible for DNA damage repair according to the Gleason grade as a new heterogeneity marker in prostate cancer. Arch Med Sci. 2023;19(2):499–506.

Yegnasubramanian S, et al. Hypermethylation of CpG islands in primary and metastatic human prostate cancer. Cancer Res. 2004;64(6):1975–86.

This study did not receive any specific grants from funding agencies in the public, commercial, or nonprofit sectors.

Author information

Authors and Affiliations

Department of Oncology, Chongqing General Hospital, Chongqing University, Chongqing, 401147, China

Yuni Wu & Zhibin Luo

School of Clinical Medicine, North Sichuan Medical College, Nanchong, 637100, China

Department of Oncology, Chongqing Hospital of Traditional Chinese Medicine, Chongqing, 400021, China

Contributions

The study was conceptualized by YW, RX, and JW. ZL, YW and RX were responsible for drafting the manuscript. YW and ZL conducted the literature search and gathered the relevant data. Subsequently, YW and JW analyzed and presented the data in a visual format. The final version of the manuscript was reviewed by ZL and JW, and necessary revisions were made. All authors critically evaluated and provided their approval for the final version of the manuscript.

Corresponding authors

Correspondence to Jing Wang or Zhibin Luo .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Wu, Y., Xu, R., Wang, J. et al. Precision molecular insights for prostate cancer prognosis: tumor immune microenvironment and cell death analysis of senescence-related genes by machine learning and single-cell analysis. Discov Onc 15 , 487 (2024). https://doi.org/10.1007/s12672-024-01277-6

Received : 15 July 2024

Accepted : 26 August 2024

Published : 27 September 2024

DOI : https://doi.org/10.1007/s12672-024-01277-6


  • Aging-related Genes
  • Machine Learning
  • Immune microenvironment
  • Prostate Cancer
  • Single-Cell Analysis
  • Biochemical recurrence

Advertisement

  • Find a journal
  • Publish with us
  • Track your research

IMAGES

  1. 2023 emerging AI and Machine Learning trends

    latest research on machine learning

  2. Top Machine Learning (ML) Research Papers Released in 2022

    latest research on machine learning

  3. Latest 15 machine learning research topics

    latest research on machine learning

  4. Latest Thesis Topics in Machine Learning for Research Scholars

    latest research on machine learning

  5. 10 top AI and Machine Learning Trends for 2023

    latest research on machine learning

  6. Latest Research Projects in Machine Learning 2024

    latest research on machine learning

VIDEO

  1. Qualcomm Research: SceneDetect

  2. Machine Learning for Ph.D. Scholar's #machinelearning #labtech

  3. Introduction to Machine Learning in Python with Scikit-Learn

  4. What is machine learning?

  5. Stan

  6. Netflix Research: Machine Learning

COMMENTS

  1. The latest in Machine Learning

    Discover the latest trends and innovations in machine learning research and code. Browse papers with code by topics, tasks, methods, and datasets.

  2. Machine learning

    Machine learning is the ability of a machine to improve its performance based on previous results. Machine learning methods enable computers to learn without being explicitly programmed and have ...

  3. A Data-Centric Approach to improve performance of deep learning models

    In 35, the author discussed multi-task learning (MTL) techniques, which by allowing parameters to be shared across many machine learning tasks, improves generalization and so represents a Data ...

  4. Machine learning

    Helping robots practice skills independently to adapt to unfamiliar environments. A new algorithm helps robots practice skills like sweeping and placing objects, potentially helping them improve at important tasks in houses, hospitals, and factories. August 8, 2024. Read full story.

  5. Machine Learning: Algorithms, Real-World Applications and Research

    In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, the knowledge of artificial intelligence (AI ...

  6. Journal of Machine Learning Research

    Browse the most recent articles published in JMLR, a leading peer-reviewed journal in all areas of machine learning. Find topics such as reinforcement learning, large language models, Gaussian processes, fairness, and more.

  7. Machine learning

    Read the latest Research articles in Machine learning from Nature. ... A cutting-edge global model of the atmosphere combines machine learning with a numerical model based on the laws of physics ...

  8. Machine Learning

    See today's new changes. Total of 80 entries : 1-25 26-50 51-75 76-80. Showing up to 25 entries per page: fewer ... 8 of 8 entries ) arXiv:2409.13342 [pdf, html, other] Title: Validity of Feature Importance in Low-Performing Machine Learning for Tabular Biomedical Data Youngro Lee, Giacomo Baruzzo, Jeonghwan Kim, Jongmo Seo, ...

  9. Machine learning-based approach: global trends, research directions

    Machine learning-based approach: global trends, research directions, and regulatory standpoints ... core of artificial intelligence (AI) and data science. Recent progress in ML has been driven both by the development of new learning algorithms theory, and by the ongoing explosion in the availability of vast amount of data (often referred to as ...

  10. Home

    Overview. Machine Learning is an international forum focusing on computational approaches to learning. Reports substantive results on a wide range of learning methods applied to various learning problems. Provides robust support through empirical studies, theoretical analysis, or comparison to psychological phenomena.

  11. 777306 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on MACHINE LEARNING. Find methods information, sources, references or conduct a literature review on ...

  12. 10 top AI and machine learning trends for 2024

    Here are the top 10 AI and machine learning trends to prepare for in 2024. 1. Multimodal AI. Multimodal AI goes beyond traditional single-mode data processing to encompass multiple input types, such as text, images and sound -- a step toward mimicking the human ability to process diverse sensory information.

  13. Top Machine Learning (ML) Research Papers Released in 2022

    This 2022 ML paper presents an algorithm that teaches the meta-learner how to overcome the meta-optimization challenge and myopic meta goals. The algorithm's primary objective is meta-learning using gradients, which ensures improved performance. The research paper also examines the potential benefits due to bootstrapping.

  14. Study examines how machine learning boosts manufacturing

    This fusion of research and collaboration is the logical next step for LGO, he says, because it's always been at the forefront of problem-solving for global operations. Machine learning is definitely the latest big knowledge gap for many businesses, but not the first, and MIMO can teach companies how to apply it.

  15. Artificial Intelligence and Machine Learning in Clinical Medicine, 2023

    As computers and the concept of artificial intelligence (AI) were almost simultaneously developed in the 1940s and 1950s, the field of medicine was quick to see their potential relevance and ...

  16. Machine Learning

    All the latest science news on machine learning from Phys.org. Find the latest news, advancements, and breakthroughs.

  17. Machine Learning

    We study a range of research areas related to machine learning and their applications for robotics, health care, language processing, information retrieval and more. Among these subjects include precision medicine, motion planning, computer vision, Bayesian inference, graphical models, statistical inference and estimation. Our work is ...

  18. Nature Machine Intelligence

    Nature Machine Intelligence is an online-only journal publishing research and perspectives from the fast-moving fields of artificial intelligence, machine learning and robotics.

  19. New AI model breaks barriers in cross-modality machine vision learning

    Multimodal technique for analyzing audio and visual data improves performance of machine-learning models. Jun 6, 2023. New neural framework enhances reconstruction of high-resolution images. Sep 5, 2024. ... Daily science news on research developments and the latest scientific innovations. Medical Xpress. Medical research advances and health news.

  20. New research reveals brain memory encoding can improve AI, memory

    New research reveals brain memory encoding can improve AI, memory therapies, and learning tools. Real-world applications of this research across various fields include education, artificial ...

  21. Machine Learning News

    Oct 8, 2023. Generative AI Throwdown: Open Source Vs. Proprietary Models. Generative AI, dominated by proprietary models locked inside big tech companies, is being disrupted by a new wave of open ...

  22. Research Associate

    Strong theoretical understanding and practical experience in deep learning-based machine learning. Strong research experience (e.g., evidenced by publication record). Excellent programming skills and in-depth computer science knowledge. Preferred Knowledge, Skills, and Abilities: Practical experience developing novel AI/ML algorithms and models.

  23. New research could extend the lifetime of key carbon-capture materials

    Atomistic simulations, machine learning potential and accelerated degradation experiments reveal the complex role of CO2 in the oxidation kinetics of amine-functional sorbents for carbon capture. ... This new research, published in the Journal of the ...

  24. Machine learning articles within Scientific Reports

    Read the latest Research articles in Machine learning from Scientific Reports. ... Teaching old docks new tricks with machine learning enhanced ensemble docking. Roshni Bhatt, Ann Wang

  25. Top Machine Learning Research Papers 2024

    Abstract: This research paper describes a personalised smart health monitoring device using wireless sensors and the latest technology. Research Methodology: Machine learning and deep learning techniques are discussed, which work as a catalyst to improve the performance of any health monitoring system, such as supervised machine learning algorithms ...

  26. NSF Award Search: Award # 2347658

    Despite recent advances in the deployment of machine learning techniques to materials science, the creation of materials with desired mechanical properties in multiple loading directions remains a significant challenge. ... This collaborative research will create and test a new physics-informed deep learning (PIDL) framework to tailor the ...

  27. Replacing hype about artificial intelligence with accurate measurements

    In a new paper in Nature Machine Intelligence, researchers at the U.S. Department of Energy's Princeton Plasma Physics Laboratory (PPPL) and Princeton University performed a systematic review of research comparing machine learning to traditional methods for solving fluid-related partial differential equations (PDEs). Such equations are ...

  28. Featured Discovery: Girish Melkani and Team Leverage Machine Learning

    Girish Melkani, Ph.D., associate professor of medicine in the Department of Pathology's Division of Molecular and Cellular Pathology, is the latest recipient of the Heersink School of Medicine's Featured Discovery. This honor recognizes significant research contributions by Heersink faculty. Melkani's study, "Automated assessment of cardiac dynamics in aging and dilated ...

  29. Forecasting the future of artificial intelligence with machine learning

    The most connected nodes and the years they became so include decision tree (1994), machine learning (1996), logic program (2000), neural network (2005), experimental result (2011), machine ...

  30. Precision molecular insights for prostate cancer prognosis: tumor

    These findings open new avenues for research on PCa recurrence prediction and therapeutic strategies. Prostate cancer (PCa) is a prevalent malignancy among men, primarily originating from the prostate epithelium. ... and then the Aging-DEGs were analysed by one-way Cox analysis, followed by a lasso machine learning approach and stepwise ...
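    The last entry describes a common feature-selection pipeline: univariate Cox analysis, then a lasso to keep only the most informative genes. As a rough illustration of the lasso step only, here is a minimal NumPy sketch of lasso feature selection via coordinate descent on synthetic data; all names and data here are hypothetical, and a real survival analysis would use a Cox-specific penalized likelihood rather than this least-squares form.

    ```python
    import numpy as np

    def soft_threshold(rho, lam):
        """Soft-thresholding operator at the core of lasso coordinate descent."""
        return np.sign(rho) * max(abs(rho) - lam, 0.0)

    def lasso_select(X, y, lam=0.1, n_iter=200):
        """Return indices of features the lasso keeps (non-zero coefficients)."""
        n, p = X.shape
        beta = np.zeros(p)
        for _ in range(n_iter):
            for j in range(p):
                # Partial residual excluding feature j, then soft-threshold.
                r = y - X @ beta + X[:, j] * beta[j]
                rho = X[:, j] @ r
                beta[j] = soft_threshold(rho, lam * n) / (X[:, j] @ X[:, j])
        return np.flatnonzero(beta)

    # Synthetic stand-in for expression data: only features 0 and 2 matter.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 6))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.standard_normal(100)
    print(lasso_select(X, y))
    ```

    The L1 penalty drives the coefficients of uninformative features exactly to zero, which is what makes the lasso usable as a selection step before a stepwise or multivariable model.
    
    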